In Scrum, there is some confusion around the various types of estimation that are done. Product Backlog Items are estimated and Sprint Backlog Tasks are also estimated. These two estimates, despite some similarities, are used for very different purposes.
Scrum advocates estimating the Product Backlog Items (Work Queue) using relative points. Each item is given a number of points that represents its size relative to the other items in the backlog. The team does this by identifying a small, well-understood backlog item and giving it, quite arbitrarily, 2 points of effort. All other items are estimated relative to this item.
For the Sprint Backlog Tasks, the estimation is usually done in “ideal hours”. This is because tasks tend to be relatively fine-grained: in the context of developing software, the team typically breaks backlog items into tasks that take less than a couple of days of real effort. Again, there is a strong “relative” estimation component here. The team might look at the tasks that are simple and easy to implement and estimate those first, while the rest are estimated relative to those simple ones.
These two types of estimation are used for different purposes. Product backlog estimates in points are used for planning purposes. Task estimates in ideal hours are used for team commitment purposes.
Points -> Planning
When the project starts, an initial Product Backlog is created to be at least a skeleton of all the functionality that needs to be built. As described in a number of places, including both of Ken Schwaber’s books, the items at the top of the Product Backlog are at a more detailed level (and therefore smaller in effort) than those items at the bottom of the backlog.
To plan out the project, the team must estimate all the backlog items using points. The team guesses how many items on the top of the backlog it can do in its first sprint. Then, taking drag factors into account, the Product Owner can divide the total number of points in the backlog by the number of points estimated for the first sprint to find out how many sprints the work might take. As the team goes along, this forecast can be redone based on the number of points completed in the previous sprint.
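As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `forecast_sprints` and the sample numbers are invented for this example, and it assumes the points-per-sprint figure has already been adjusted for any drag factors.

```python
import math


def forecast_sprints(backlog_points, points_per_sprint):
    """Estimate how many sprints the backlog might take.

    Hypothetical helper: points_per_sprint is assumed to be the team's
    guess for the first sprint (or the points completed in the previous
    sprint), already adjusted for drag factors.
    """
    if points_per_sprint <= 0:
        raise ValueError("points_per_sprint must be positive")
    # Round up: a partly filled sprint is still a sprint on the calendar.
    return math.ceil(backlog_points / points_per_sprint)


# Initial plan: 240 points in the backlog, 20 points guessed for sprint one.
initial_forecast = forecast_sprints(240, 20)

# After the first sprint only 18 points were completed, so re-plan the
# remaining 222 points using the observed figure.
revised_forecast = forecast_sprints(240 - 18, 18)
```

With these made-up numbers the initial forecast is 12 sprints, and the revised forecast after the first sprint is 13 more sprints. The result is a planning aid for the Product Owner, not a commitment by the team.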
At no time are the points used by the team to determine how much work they will commit to in a sprint. The team’s capacity for planning purposes is measured in points/sprint, but this is not used to constrain the team’s commitment. Instead, the team uses its estimates at the Task level.
Digression: Bananas -> Commitment
“Bananas”? Yup, bananas. A different word than points, and a word that has no connection to actual effort. The problem with estimating in Ideal Hours is that people think there should be some predictable relationship to real hours. This is both false and unnecessary. Tasks in the Sprint Backlog can also be estimated in relative units, and since the word “points” is already used for Product Backlog Items, I would like to introduce “bananas”.
The team estimates the tasks in relative bananas. Now, when the team completes its first sprint, it can measure how many bananas it estimated at the start of the sprint, and how many remained at the end of the sprint. The difference is the team’s velocity, which is to be used for commitment purposes. You can’t use this velocity for planning purposes because the team hasn’t broken the whole Product Backlog into tasks, only the backlog items that it is doing in the current sprint.
Based on velocity, the team can make an informed commitment at the start of the next iteration: never take on more bananas than your velocity allows.
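To make the rule concrete, here is a small Python sketch of the banana bookkeeping. The function names and the sample task estimates are made up for illustration; nothing here is a prescribed Scrum tool.

```python
def banana_velocity(estimated_at_start, remaining_at_end):
    """Bananas actually finished in a sprint: estimated minus remaining."""
    return estimated_at_start - remaining_at_end


def within_commitment(task_bananas, velocity):
    """True if the proposed tasks do not exceed the last sprint's velocity."""
    return sum(task_bananas) <= velocity


# Last sprint: 50 bananas estimated at the start, 8 left at the end.
velocity = banana_velocity(50, 8)

# Proposed tasks for the next sprint, in bananas (illustrative values).
proposed = [13, 8, 8, 5, 5, 3]
fits = within_commitment(proposed, velocity)

# Adding one more 2-banana task would push the total past the velocity.
overloaded = within_commitment(proposed + [2], velocity)
```

Here the velocity is 42 bananas, the first proposal totals exactly 42 and fits, and the second totals 44 and does not. The numbers only mean something inside this one team.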
I have deliberately used the nonsense term “bananas” to talk about the units of effort for tasks. I could have used “ping pong balls”, “elephants”, or “nebulous units of time” (NUTs). The point is that these units can never be converted directly into man-hours or any other unit that might accidentally be used for comparison or performance evaluation purposes. One team’s velocity cannot be compared to another team’s velocity. Even if both teams use “bananas”, quite a number of factors would prevent comparison: what tasks are used to baseline, the skills of the teams, the organizational and environmental obstacles, the definition of done, the mood and amount of sleep of the person working on the task, and so on.