Infoq published the video recording of my talk at Agile 2008 titled “Extremely Short Iterations as a Catalyst for Effective Prioritization of Work“.
One of the challenges with agile methods is to get a clear perspective on how to measure process improvements. I recently had a brief discussion with a C-level executive at a small organization about this. His concern was that cycle time was meaningless because it depended so much upon the size of the work package. So how do we use cycle time as a meaningful measurement? What else can we use to measure process improvement?
Let’s look at the difference in measuring cycle time in an agile vs. non-agile environment. Then we’ll get to other measurements.
Cycle Time , Waterfall and Agile
First, let’s define cycle time. From iSixSigma we have:
Cycle time is the total time from the beginning to the end of your process, as defined by you and your customer. Cycle time includes process time, during which a unit is acted upon to bring it closer to an output, and delay time, during which a unit of work is spent waiting to take the next action.
This definition is important because it gives us a clue about the potential difference between a waterfall vs. agile method of delivering value. Let’s imagine the typical process used in a waterfall environment. The following are the high-level steps:
- Customer / User / Stakeholder sees a need, validates it and submits a request to have that need fulfilled. This is when we start the clock on cycle time.
- The fulfillment organization (IT, Product Development, R&D) puts the request in a queue, backlog or requirements management system.
- Along with other requests, the fulfillment organization schedules the work on the request, usually by creating a project to fulfill it and other related requests. The project is estimated at a high level, the current status of in-flight projects is noted, and the new project is prioritized relative to other projects.
- At some point, based on the schedule and the reality of the work on other projects, the project containing our customer’s request is started. Here, “started” means that detailed requirements are gathered.
- After sufficient requirements are gathered, a detailed technical analysis is done including architecture, high-level design, risk analysis, etc.
- Development begins. (Note: many people mistakenly start measuring cycle time here.)
- Developers and testers work to validate the results of development and fix any problems discovered.
- Final acceptance testing is done.
- The results of the project are deployed to users, sold to the client, or in some other way passed back to the original requestor. This is when we stop the clock on cycle time.
So from the start of the customer request formally submitted to the time that the fulfillment of that request is made is our true cycle time. There are a few important things to note here. First, there is a queue of work based on requests made but not yet scheduled. There is another queue for work scheduled but not yet started. We know that if we can reduce the size of these queues, we can improve cycle time in a general sense. Second, we know that most organizations of any significant size will have different queues based on the urgency of the request. For example, a high severity bug discovered in the production system of a company’s largest client will be treated differently than a wish list item for a small not-yet-client. These two requests won’t even go in the same queue: the high priority problem will be quickly escalated to a support or development team that can work on it immediately. Third, it is tempting for the development group to measure their local cycle time. This is a Really Bad Idea since it leads to sub-optimizing behaviors. For example, it is easy for the development team to improve their cycle time by sacrificing quality… but this just causes the QA cycle time to increase, and probably the overall cycle time (true cycle time) is affected more than the local improvement in the development group’s cycle time.
Now let’s look at the steps that occur in an ideal agile environment:
- As before, the Customer / User / Stakeholder sees a need, validates it and submits a request to have that need fulfilled.
- That request is immediately placed in a ready state for the next iteration (cycle, sprint) of a delivery team. Elapsed time: maximum one month.
- Team completes the request including all work to actually deliver/deploy and work is delivered to the stakeholder at the end of the iteration. Elapsed time: maximum two months.
So the ideal method of doing agile has a maximum cycle time of two months to deliver from the time a request is made… how many teams are doing this? Not many.
The ideal is extremely difficult to accomplish. Getting to that state requires that the development organization catches up to the business side so that there are zero pending requests at the start of each iteration. It also requires that the business side users and stakeholders are able to articulate their requests so that they are small, and appropriately detailed for the team doing the work.
A realistic agile implementation actually is a lot more messy. Depending on the type of request, the cycle time for a piece of work can vary widely. Some low priority items may take years even in an agile environment. A low priority request is made and approved but then never quite makes it into a project… and then once in a project never quite makes it to the top of the team’s product backlog. This is interesting to look at sometimes, but it points out another important aspect of measuring cycle time: mostly we care about average cycle time (or some other statistically interesting aggregate measure).
The predominant factor in most organizations’ cycle time is the number and size of the queues they use as work is processed. In most organizations there are several queues and most of them contain large numbers of requests or bits of work in process. Queues represent huge amounts of waste. It is easy to see that queue size and cycle time are closely related: the more items in a queue, the longer the cycle time.
This leads to a simple conclusion: regardless of lifecycle approach, reducing the size of an organization’s queues is one of the easiest ways to reduce cycle time. What are some common queues? There are often queues of projects, queues of enhancement requests, queues of defects to be fixed, queues of features, queues of tasks, queues of email (large inboxes), queues of approval requests, queues of production database changes. The number of queues increases the more an organization is oriented around functional groups, and the number of queues decreases the more an organization arranges work to be handled by cross-functional teams.
Cycle Time and Work Package Size
This is where queueing theory and agile methods intersect really well. Cycle time is related to the load on your system, in particular your units of work processing. In most organizations, teams are created to handle work. The more work given to a team simultaneously, the higher their utilization level. Many organizations like high utilization levels because it gives them a guarantee that people are doing valuable work all the time that they are paid to work. This is a completely false benefit and in fact is extremely destructive to overall productivity. From queueing theory we know that the cycle time for a piece of work increases exponentially to the utilization level. We see this whenever we over-load a server… but for some reason we fail to see this when we overload a person or a team or an organization even though it still happens.
Cycle time is also related to the variability in the size of the work packages. Low variability means that the exponential factor related to load is low, and high variability means that the exponential factor is high. In other words, if you have a highway that only allows motorbikes, you can have a very high load without getting bad traffic jams. On the other hand, if you have a highway that allows anything on it, you get traffic jams even with low levels of load. This is why HOV or commuter lanes and the left lane in multi-lane highways don’t typically allow transport trucks and buses. This result from queueing theory is not intuitively obvious so it is even harder for us to apply to software development.
But apply these two ideas, load and work size variability, we must if we wish to create a high performance development organization. The simplest way to do this is to have a single team work on a single project at a time and use iterations to ensure that the work being done is always exactly the same size – the size of the iteration.
Improving Cycle Time
It is possible to have very short iterations and still have a long cycle time. Many organizations make a few common mistakes with agile that cause this. If the work done inside each iteration is restricted to pure development work and everything else is done outside the iterations, then cycle time likely stays long. A common example of this is having the QA folks remain separate from the development team and do their work after a development team releases their work.
There is really only one way to avoid this: have a comprehensive definition of “done” that is met by the team every single iteration. This ensures that all work from idea to release for a given customer request is done inside a single iteration. A side effect of this is that all the pieces of work need to be small. It also gets rid of all the queues except one: the queue of ideas approved for delivery. With a single queue to manage, it becomes easy to measure cycle time, and therefore easy to improve it.
Improving cycle time can now be done in a few ways:
- Put a cap on the number of items in the work queue. Since cycle time is directly related to the size of the queues in a system, this is a sure way of putting a maximum on cycle time.
- Go through all existing requests and throw as many away as possible. This can be tough to do, but if you are able to do a cost benefit analysis, you will typically find that older items in the queue are no longer worth while.
- Provide more stringent gating functions for allowing requests onto the queue. The few items added, the faster the size of the queue is reduced.
- And of course, increase the performance of your team(s) so that they go through items on the queue more quickly.
Productivity and Cycle Time
Once you have control of cycle time, it is possible to make reasonable measurements of productivity and two more metrics become extremely important (not that they weren’t important before, but they are easier to work with now). The first is Return on Investment (ROI) and the second is customer satisfaction.
ROI is in its simplest form a measure of how much benefit there is to doing something as compared to the cost of doing it. It takes into account the importance of time and timing, the importance of other options you may have, and of course, hopefully takes into account the business reality of your work. It also takes into account costs.
In software development, the primary cost is the cost of the staff doing the work, and the time factor is your cycle time (Ah! that’s where we use it). If you have a consistent team working on iterations that are always the same size and if you have little or no work being done outside of the iterations, it is very easy to calculate ROI in a useful way. Simply measure how much value a given iteration worth of work will generate and divide by the cost of the team for an iterations (and if the team is not yet doing work as it comes in, take into account the time value of money since the work might not be done for several iterations). Now, productivity is simply a measure of the Return for each Team-Iteration. Dollars/iteration. Simple. If the team’s productivity goes down, you can ask some really simple questions:
- Did the expected return of the work go down? If so, is there more valuable work the team should be doing? This becomes an opportunity for product improvement.
- If not, what caused the team to get less done? Was the work harder than expected? Was there a skill gap? Was there an organizational obstacle that was revealed? Was someone sick? This becomes an opportunity for process and team improvement.
Customer satisfaction can be measured in many ways. If you have already started using agile practices, there is a good chance that your customers will already be more satisfied than they were before. This will show up informally through word-of-mouth. However, it is good to have a more systematic way of measuring customer satisfaction. One of the simplest and most commonly used methods of measuring customer satisfaction is the Net Promoter Score. From WikiPedia:
Companies obtain their Net Promoter Score by asking customers a single question (usually, “How likely is it that you would recommend us to a friend or colleague?”). Based on their responses, customers can be categorized into one of three groups: Promoters, Passives, and Detractors. In the net promoter framework, Promoters are viewed as valuable assets that drive profitable growth because of their repeat/increased purchases, longevity and referrals, while Detractors are seen as liabilities that destroy profitable growth because of their complaints, reduced purchases/defection and negative word-of-mouth. Companies calculate their Net Promoter Score by subtracting their % Detractors from their % Promoters.
The Net Promoter Score is closely linked to quality including the hard-to-measure parts of quality like responsiveness, ease of use, and fitness for purpose.
Cycle time also affects customer satisfaction. The faster you can respond to requests by customer, users or other stakeholders, the more likely they are to be satisfied. This happens for two reasons: fast response time means that solutions are more likely to still be useful and correct when actually delivered, and it also gives more opportunities for feedback.
In fact, if we look at these three measures, cycle time, ROI and customer satisfaction, we see that they form a mutually supporting and cross-checking system of ensuring productivity and effectiveness. Measuring anything else muddies the waters and can cause sub-optimal behaviors. The real challenge for most teams is realizing that all their local measures of performance and effectiveness may actually be causing harm (unintentionally) because they draw the team’s attention away from the three organizationally important measures.
Cycle time is the measure that is most closely related to process improvements, but ROI and customer satisfaction should also be used to ensure that process improvements don’t accidentally harm the organization.
There was an interesting discussion on the LeanAgileScrum Yahoo Group early in December regarding the difference between flow (lean) and iterations (agile) that caught my eye. I only just now have had the time to write about it.
Work can often be divided up so that the smaller pieces are valuable on their own. By dividing work this way, a team can deliver value incrementally. The team can choose a short period of time called an iteration and select a small amount of work to complete in that time. This work should be valuable on its own. For example, if a team is building something, then at the end of each iteration whatever is built is usable as it is. This means that each iteration includes all the planning and design as well as construction or creation necessary to deliver a final product or result.
For example, a volunteer group may desire to attract new members. A non-agile approach would have the group plan their membership campaign completely before actually executing on it. An agile approach using iterative delivery would have the group plan a small piece of work that will attract some small number of new members, execute it, and then start a new iteration. One iteration may cover the creation of and delivery of a door-to-door flyer in a neighborhood. Another iteration may cover the design, creation and publishing of a small advertisement in a local newspaper. Each iteration includes all the steps necessary to produce a furthering of the group’s goal of attracting new members.
In a business environment, iterative delivery allows for a much faster return on investment. The following diagram compares delivering value iteratively with a non-agile project delivery where results are delivered only at the end of the project:
One can see clearly from the diagrams that the non-agile delivery of value at the end of a project is also extremely risk prone and suseptible to change. If the project is cancelled just before it delivers, then a fairly substantial amount of effort is wasted. In the agile iterative delivery situation, an endeavor can be cancelled at almost any time and it is likely that substantial value has already been delivered.
Even if the work cannot actually be delivered incrementally, it almost always can be divided in a way so that it can be inspected in stages. Either method of dividing work allows us to do the work in iterations.
Iterations are fixed and consistent units of time during which work is performed and between which planning, inspection and adjustment is done. The empowered team will decide on the length of iterations for their work. As a rule of thumb iterations should be shorter than the horizon of predictability. Generally, iterations should never be longer than one month, no matter what the endeavor.
At the end of each iteration, a demonstration of the work completed is given to the stakeholders in order to amplify learning and feedback. Between iterations, the stakeholders collaborate with the team to prioritize the remaining work and choose what will be worked on during the next iteration. During the iteration, the stakeholders need to be accessible for questions and clarifications.
Iterative and incremental delivery is used to allow for the early discovery and correction of mistakes and the incorporation of learning and feedback while at the same time delivering value early.