In my past articles on project and sprint planning, we touched on the concept of relative estimates. Those articles focused more on the planning aspect and how the estimates are used, and less on the actual process of estimation. So let's talk about the estimation techniques my colleagues and I have found useful.
As I have touched on before, there is a huge misunderstanding about what makes a feature development estimate exact. People intuitively think that an exact estimate is a precise number with no tolerance. Something like 23.5 man-days of work. Not a tad more or less.
How much can we trust that number? I think we all feel that not much, unless we know more about how the estimate was created. What precise information did the estimator base his estimate on? What assumptions did he make about future progress? What risks did he consider? What experience does he have with similar tasks?
We use this knowledge to make our own assessment of how likely it is that the job's duration will vary from the estimate. In effect, we make our own estimate of the probable range in which we feel the task's real duration is going to land.
It is quite a paradoxical situation, isn't it? We force someone to come up with precise numbers so that we can build our own probability model around them.
Wouldn't it be much more useful for the estimate to consider this probability in the first place?
That also means that (in my world) a task estimate is never an exact number, but rather a qualified prediction of the probable range in which a job's duration is going to land. The more experience the estimator has with similar tasks, the narrower that range is going to be. A routine task that one has already done hundreds of times can be estimated with a very narrow range.
But even with a narrow range, there are always variables. You might be distracted by someone calling you. You mistype something and have to spend time figuring it out. Even though those variables are quite small and are unlikely to alter the job's duration by an order of magnitude, they still make an absolutely precise estimate impossible.
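To make the idea concrete, here is a minimal sketch in Python of treating an estimate as a probable range rather than a single number. All names and figures here are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class RangeEstimate:
    """A task estimate expressed as a probable range, not a single number."""
    optimistic_days: float   # best case we still consider realistic
    pessimistic_days: float  # worst case we still consider realistic

    def width(self) -> float:
        # A narrower range signals more experience with similar tasks.
        return self.pessimistic_days - self.optimistic_days

    def contains(self, actual_days: float) -> bool:
        # Did the real duration land inside the predicted range?
        return self.optimistic_days <= actual_days <= self.pessimistic_days

# A routine task the estimator has done hundreds of times:
routine = RangeEstimate(optimistic_days=1.0, pessimistic_days=1.5)
# An unfamiliar task, so the honest range is much wider:
novel = RangeEstimate(optimistic_days=3.0, pessimistic_days=10.0)

print(routine.width())      # 0.5
print(novel.contains(7.0))  # True
```

The width of the range is an honest signal of uncertainty: the less familiar the work, the wider the range should be.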
Linear and non-linear estimates
On top of all that, people are generally very bad at estimating linear numbers due to a variety of cognitive biases. I mentioned some of them here [link: Wishful plans - Planning fallacies]. So our experience (and not just ours) has shown that it is generally better to do relative estimates.
What is it? Basically, you compare future tasks against ones you already have experience with. You try to figure out whether a given task (or user story or job or anything else for that matter) is going to be more, less, or similarly challenging compared to a set benchmark. The more the complexity increases, the more unknowns and risks there generally are. That is why relative estimates use non-linear scales.
One well-known scale is the pseudo-Fibonacci numerical series, which usually goes 0, 1, 2, 3, 5, 8, 13, 20, 40, 100. An alternative is T-shirt sizes (e.g. XS, S, M, L, XL, XXL). The point is that the further you move up the scale, the bigger the jump from the size below. That takes a lot of the painful (and mostly wildly inaccurate) decision-making out of the process. You're not arguing about whether an item should be sized 21 or 22. You just choose a value from the list.
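As an illustration (the scale values come from the article, the helper function is hypothetical), snapping a gut-feel number to the nearest allowed value shows how the scale removes the 21-vs-22 argument entirely:

```python
# The pseudo-Fibonacci scale described above.
SCALE = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]

def nearest_size(raw_feel: float) -> int:
    """Snap a gut-feel number to the closest value on the scale."""
    return min(SCALE, key=lambda s: abs(s - raw_feel))

print(nearest_size(22))  # 20
print(nearest_size(11))  # 13
```

Whether 22 "really" rounds to 20 is beside the point; the coarse scale reflects the fact that the estimate never had single-point precision to begin with.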
We have had good experience with planning poker. Planning poker is a process in which the development team discusses aspects of a backlog item and then each developer makes up his or her own mind as to how "big" that item is on the given scale (e.g. the pseudo-Fibonacci numbers). When everyone is finished, all developers present their estimates simultaneously to minimize any mutual influence.
A common practice is that everyone has a deck of cards with the size values. When ready, each developer puts the card of choice on the table, face down. Once everyone has chosen a card, all of the cards are revealed.
Now each developer comments on his choice. Why did he or she choose that value? We found it helpful for everyone to answer at least the following questions:
- What are similarly complex backlog items that the team has already done in the past?
- What makes the complexity similar to such items?
- What makes the estimated item more complex than already completed items that were labeled one size smaller?
- What makes the estimated item less complex than already completed items that were labeled one size larger?
A few typical situations can arise.
1) Similar estimates
For a mature team and well-prepared backlog items, this is a swift process where all the individual estimates land close together. The team can then discuss and decide together what value it will agree on.
2) An outlying individual estimate
Another situation is that most individual estimates are similar, but one or two are completely different. This can have several causes. Either the outlying individual has a good idea that no one else has figured out, or he misunderstands the backlog item itself. Or he has not realized all the technical implications of developing that particular item. Or he sees a potential problem that the others overlook.
In such situations we usually took the following approach. People with lower estimates explain the work they expect to be done. Then the developers with
higher estimates state the additional work they think needs to be done in comparison to the colleagues with lower estimates. By doing this, the difference
in their assumptions can be identified and now it is up to the team to decide if that difference is actually necessary work.
After the discussion is finished, the round of planning poker is repeated. Usually, the results are now closer to the first case.
3) All estimates vary greatly
It can also happen that there is no obviously prevailing complexity value. All the estimates are scattered across the scale. This usually happens when there is a misunderstanding about the backlog item's actual purpose and business rationale. In essence, one developer imagines a simple user function while another sees that a sophisticated mechanism is required.
This is often a symptom of a poorly groomed backlog that lacks mutual understanding among the devs. In this case, it is usually necessary to review the
actual backlog item's description and goal and discuss it with the product owner from scratch. The estimation process also needs to be repeated.
Alternatively, this can also happen to new teams with little technical or business experience with their product in the early stages of development.
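The three situations above can be sketched as a simple classifier. This is purely illustrative: the function, the thresholds, and the example votes are made up, and it only handles a single outlier, but it captures the decision logic a team applies after the reveal:

```python
SCALE = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]  # pseudo-Fibonacci

def classify_round(votes: list[int]) -> str:
    """Classify one planning-poker round (hypothetical thresholds)."""
    # Compare positions on the scale, since the scale itself is non-linear.
    positions = sorted(SCALE.index(v) for v in votes)
    spread = positions[-1] - positions[0]
    if spread <= 1:
        return "consensus"   # case 1: agree on a value together
    # Would dropping the single lowest or highest vote restore consensus?
    if positions[-1] - positions[1] <= 1 or positions[-2] - positions[0] <= 1:
        return "outlier"     # case 2: discuss the difference, then re-vote
    return "scattered"       # case 3: revisit the item with the product owner

print(classify_round([5, 5, 8, 5]))     # consensus
print(classify_round([5, 5, 5, 20]))    # outlier
print(classify_round([2, 8, 20, 100]))  # scattered
```

In practice, of course, the team makes this call by discussion rather than by formula; the sketch just makes the three cases explicit.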
It's a learning process
Each product is unique, each project is unique, each development environment is different. That means the development team builds its perception of complexity references anew when it starts a project. It is also a constant process of re-calibration. The few backlog items that served as benchmark reference sizes at the beginning of a project usually need to be replaced with something else later on. The perception of scale shifts over time.
The team evolves and gains experience. That means the team members need to revisit past backlog items and ask themselves whether they would have estimated such an item differently with the experience they have now. It is also useful, at the end of a sprint, to review items that turned out to be far easier or far more difficult than the team initially expected.
What caused the difference? Is there any pattern we can observe and be cautious about in the future? For instance, our experience from many projects shows that work involving integrations with external systems usually turns out to be far more difficult than the team anticipates. So whenever the devs see such a backlog item, the team knows it needs to think really carefully about what could go wrong.
Don't forget the purpose
In individual cases, the team will sometimes slightly overestimate and sometimes slightly underestimate. And sometimes estimates are going to be completely off. But by self-calibrating through retrospective practices and the averaging effect over many backlog items, the numbers can usually be relied on in the long run.
Always bear in mind that the objective of estimating backlog items is to produce a reasonably accurate prediction of the future with a reasonable amount of
effort invested. This needs to be done as honestly as possible given the current circumstances. We won't know the future better unless we actually do the
work we're estimating.