Report average velocity and fail 50% of the time

The question of “expected velocity” and long-term planning has come up at more than one client. A recent client conversation, however, got me thinking about how to interpret velocity when estimating and plotting a roadmap based on a current backlog of features. Assume, for a moment, a backlog of story-pointed features, and 10 good iterations (consistent team, no odd occurrences that would affect velocity). Mathematically, average velocity (well, a mean really, which sits near the 50/50 point when velocities are spread fairly evenly around it) is a 50/50 proposition for any subsequent iteration. Some organizations don’t find this level of confidence acceptable. What velocity should be reported as expected for iteration/sprint planning and roadmap forecasting, and how should it be used?

Context

Interpreting velocity, before anything else, requires some context. An agile organization that sees estimates as hypothetical might find this article of less use. In fact, a good question is whether estimation is even a value-added activity. For this post, assume an organization that sees strong value in estimation and planning.

Culture

The biggest piece of context is the organizational culture. Two cultural factors matter in particular, because they shape how velocity is understood within the organization.

What is Failure?

First is the meaning of failure in the organization. Is failure to deliver what was committed to by the planned date considered a failure of the team, or is it simply a fact to be understood and accounted for in future planning? Even in Agile organizations, the former is often true and a hard habit to break. If not delivering to expectations is considered failure and has negative consequences, then that means that estimation is being treated not as estimation, but as prediction and contract. Velocity is therefore a commitment, and should therefore be used conservatively.

Consistency or Speed?

The second item to know is whether consistency and predictability of delivery are of higher strategic value than the actual rate of delivery. This is often unstated. Usually people want fast and consistent delivery. The truth is that you can get consistent software development, or fast software development, or a balance between the two. Lack of trust, a history of quality problems, and the like are usually strong motivations to favour consistency over speed. In this case, as well, velocity is more of a boundary than an indicator.

Emotional Loading in Estimation (or why not Low-ball?)

If estimation is seen as binding, contractual, or limiting, then additional emotions get overloaded. Trust, promise, and betrayal are words used in such organizational cultures. Distrust is usually a strong factor, especially between silos (business vs. technology, company vs. project management vs. customer, etc.). So when people are asked to give estimates, even using agile-friendly mechanisms such as story points, there is usually a process of cementing that estimate into a part of an accountability model, so estimates start to get conservative. People are then accused of low-balling, others are accused of irrational expectations… we’ve all seen this. The language clearly becomes one of contention and blame. Even the term low-balling is often an outright pejorative term for estimating too conservatively.

This doesn’t happen only in agile environments, and project managers in traditional PMBOK frameworks have long factored risk into “contingency budgets”. Interestingly, however, if a Project Manager were to factor risk into the task estimates, they’d be “low-balling capacity,” yet if they were to factor it out and layer it on top of the project work, it’s “contingency budgeting” (at least in a few experiences I’ve had). Either way, someone’s adding a factor for uncertainty, based on the need to predict conservatively or liberally or somewhere in between.

That’s the point of the article: how can Agile projects use velocity to estimate as conservatively (or liberally) as is appropriate?

An average is a 50% chance to succeed (or fail)

Velocity is not a constant. It’s a set of instantaneous values on a curve, with instances being iterations. That means that it varies, and is therefore only meaningful statistically. So how do you reasonably use velocity statistically, and improve confidence? One way is to stop delivering against “average” velocity.

A lot of coaches use average velocity over the previous N iterations. If estimation is a commitment, this is not helpful, for all sorts of reasons. By definition, an average (strictly it is the median that splits outcomes 50/50, but the mean is usually close) is a 50/50 proposition. If you report the average team velocity (assuming it’s accurate), then about half the time the team will be under and about half the time the team will be over, statistically. So basically an average is a crap shoot when taken in any given instance. It can only be good in the long run. For this to work, the long haul has to include permission to fail and a lot of trust. Teams need to be able to miss dates sometimes and beat them at other times, and it should all wash out in the end. In organizations such as I’m describing, that trust isn’t there. Additionally, if the language of commitment is around meeting instantaneous iteration commitments (as opposed to delivering high-quality customer value as quickly as is sustainably possible) then you aren’t playing the long game, you’re playing a very short game.
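To make the 50/50 point concrete, here is a minimal sketch in Java. The ten velocities are invented sample data, and the class and method names are mine, purely for illustration. It computes the mean and then counts how many of those same iterations would have met it.

```java
import java.util.Arrays;

// Illustrative only: the velocities in main() are invented sample data.
final class MeanVelocityCheck {

    // Arithmetic mean of the observed velocities.
    static double mean(int[] velocities) {
        return Arrays.stream(velocities).average().orElse(0.0);
    }

    // Fraction of iterations whose delivered velocity met or beat the target.
    static double hitRate(int[] velocities, double target) {
        long hits = Arrays.stream(velocities).filter(v -> v >= target).count();
        return (double) hits / velocities.length;
    }

    public static void main(String[] args) {
        int[] lastTen = {18, 22, 25, 19, 27, 21, 24, 20, 26, 23};
        double avg = mean(lastTen);
        System.out.println("mean velocity: " + avg);
        System.out.println("share of iterations at or above the mean: " + hitRate(lastTen, avg));
    }
}
```

With this sample the mean is 22.5 and five of the ten iterations reach it, so a team committing to the mean would have missed half the time. Real data will be less tidy, but the shape of the problem is the same.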

Simulate Velocity, not work

In a PMI training course I took when I was at Sun Microsystems, we were nicely informed that two-point estimates of tasks are a perfect way to fail half the time, per the above logic. One-point estimates are just idiotic. Three-point estimates were better. We simulated with a Monte Carlo algorithm and found a curve and a distribution, and then determined a confidence level, yadda yadda. Well, we’re trying to avoid wasting a lot of time estimating up-front, but one way to start representing velocity properly is to do the same kind of statistical modelling done in traditional project management, only simulate velocity, not work items.

In this approach, you take the last N iterations (say 10). Determine the maximum velocity (optimistic) and the minimum velocity (pessimistic), and then the mode (the velocity value that seems to occur most frequently). Then you run a Monte Carlo simulation to get a statistical pattern. Now you can actually determine an answer based on confidence. If you want to be right with 80% confidence, you pick a velocity where 80% of the simulated runs were successful. (Note – there is a paucity of Excel templates to do this math automatically, and often they are for sale. It would be nice to have a few functions with arbitrary distributions based on min-max-mode to help this along.)
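The steps above can be sketched roughly as follows. This is one possible reading, not a definitive implementation: I am assuming a triangular distribution built from min, mode, and max (a common choice for three-point inputs, but the post doesn't prescribe one), breaking ties for the mode by taking the smallest value, and using invented sample velocities. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch: simulate velocity (not work items) from min / mode / max of past iterations.
final class VelocitySimulation {

    // Most frequent value; ties go to the smallest value (an arbitrary assumption).
    static int mode(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        int best = values[0];
        for (int v : values) {
            int c = counts.get(v);
            int b = counts.get(best);
            if (c > b || (c == b && v < best)) {
                best = v;
            }
        }
        return best;
    }

    // One draw from a triangular distribution, using inverse-CDF sampling.
    static double sampleTriangular(double min, double mode, double max, Random rng) {
        if (max == min) {
            return min; // degenerate case: every iteration delivered the same velocity
        }
        double u = rng.nextDouble();
        double cut = (mode - min) / (max - min);
        if (u < cut) {
            return min + Math.sqrt(u * (max - min) * (mode - min));
        }
        return max - Math.sqrt((1 - u) * (max - min) * (max - mode));
    }

    // Velocity that the given fraction of simulated iterations met or exceeded.
    static double velocityAtConfidence(int[] history, double confidence, int runs, long seed) {
        int min = Arrays.stream(history).min().getAsInt();
        int max = Arrays.stream(history).max().getAsInt();
        int mode = mode(history);
        Random rng = new Random(seed);
        double[] simulated = new double[runs];
        for (int i = 0; i < runs; i++) {
            simulated[i] = sampleTriangular(min, mode, max, rng);
        }
        Arrays.sort(simulated);
        // 80% confidence means reading off the 20th percentile of the simulated values.
        int index = (int) Math.floor((1.0 - confidence) * (runs - 1));
        return simulated[index];
    }

    public static void main(String[] args) {
        int[] lastTen = {18, 22, 25, 19, 27, 22, 24, 20, 26, 22}; // invented sample data
        System.out.println("80% confidence velocity: "
                + velocityAtConfidence(lastTen, 0.80, 100_000, 42L));
        System.out.println("50% confidence velocity: "
                + velocityAtConfidence(lastTen, 0.50, 100_000, 42L));
    }
}
```

The 80% figure comes out lower than the 50% figure, which is the whole point: higher confidence costs reported capacity. With only ten data points the min, mode, and max are themselves shaky, so treat the result as a planning aid, not a measurement.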

It’s not perfect, and it’s a potentially huge amount of administrative overhead. Elsewhere I’ve referenced blogs that entirely oppose any estimation at all, but if you are going to estimate, then working statistically with simulation is the only way to make small-sample numbers meaningful.

Commitment Velocity: Low-Ball as a policy.

Another approach, perhaps a controversial one but taught by some Scrum trainers, is to pick the lowest historical delivered velocity. This is a commitment-based approach, on the assumption that building trust around consistent delivery is critical to building sound relationships where product owners and teams can safely state their needs and get things done with a minimum of contractual behaviour. By taking the minimum, you force a low-ball capacity, which means you can have high confidence of success after a few iterations. You will likely have, after a while, some spare time on your hands. Teams can then choose to pull more work in (without adjusting their commitment velocity), work on “technical debt”, improve their skills, etc. A team could raise their commitment velocity at certain inflection points in the project: say a new team member is added who provides a necessary skill not previously available, and after a few iterations the team is consistently hitting a higher number. But this is a careful process to ensure that they are committing, and if they don’t make their new number, it goes down to what they got accomplished.
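As a rough sketch of that policy in Java: start at the lowest delivered velocity, drop to the delivered number on a miss, and raise only after a run of iterations above the commitment. The length of that run (`raiseAfter`) and the choice to raise to the lowest value in the run are my assumptions; the post only says “after a few iterations” and “consistently”. The class name is hypothetical.

```java
// Sketch of a "commitment velocity" policy; thresholds are assumptions, not a standard.
final class CommitmentVelocity {
    private int committed;
    private final int raiseAfter;               // consecutive better iterations needed before raising
    private int streak = 0;
    private int streakFloor = Integer.MAX_VALUE; // lowest velocity seen during the current streak

    CommitmentVelocity(int[] history, int raiseAfter) {
        int min = history[0];
        for (int v : history) {
            min = Math.min(min, v);
        }
        this.committed = min;                    // low-ball: lowest historical delivered velocity
        this.raiseAfter = raiseAfter;
    }

    int committed() {
        return committed;
    }

    // Record the velocity actually delivered in the latest iteration.
    void record(int delivered) {
        if (delivered < committed) {
            committed = delivered;               // missed: drop to what was accomplished
            resetStreak();
        } else if (delivered > committed) {
            streak++;
            streakFloor = Math.min(streakFloor, delivered);
            if (streak >= raiseAfter) {
                committed = streakFloor;         // raise cautiously, to the weakest iteration of the run
                resetStreak();
            }
        } else {
            resetStreak();                       // met the commitment exactly: no evidence for a raise
        }
    }

    private void resetStreak() {
        streak = 0;
        streakFloor = Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        CommitmentVelocity cv = new CommitmentVelocity(new int[] {18, 22, 25, 19}, 3);
        System.out.println("initial commitment: " + cv.committed());
        for (int delivered : new int[] {20, 21, 23, 17}) {
            cv.record(delivered);
            System.out.println("delivered " + delivered + ", commitment now " + cv.committed());
        }
    }
}
```

In the sample run the commitment starts at 18, rises to 20 after three better iterations, and falls to 17 as soon as the team delivers 17. Whether a raise should be automatic at all is a team decision; the code only shows the bookkeeping.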

Indemnify teams’ learning

An arguably healthier option, if you have built enough trust, is to simply indemnify a team from failing to meet the estimate. Since you’re doing mathematics on actuals to generate an expected future number, everyone can acknowledge that past behaviour is no guarantee of future behaviour, and simply use it for capacity planning. In this case, estimation is actually estimation, not commitment or contract. The team is expected to be ahead sometimes, and behind sometimes. The upside of this is that a lot of extra time isn’t spent playing with fictional numbers. Teams are spending their efforts on delivery as quickly-yet-sustainably as they can, and the organization treats them as trusted professionals in this. The temptation to assume you can predict the future is seen as folly, and the estimates are used to guide overall direction, not to make outward customer commitments.

Don’t be mindless

There may be other approaches, I’m sure. The agile community is certainly not short of people who love this topic and can talk for hours on “proper” estimation. The point of this post is merely to point out some options, and ask you to look at your organizational culture, team culture, customer culture, the meaning of terms like commitment, failure, success, consistency, speed, etc. As you understand the culture, balance consistency vs. speed, trust, and other factors to choose a method of estimation that meets your goals. Don’t do estimation based on your own, internal cultural assumptions, as you may have developed or been taught techniques that were useful when and where they were taught, but may no longer be so. Or maybe they weren’t so useful then either. Regardless, because estimation cuts to the heart of the dialogue between producer and consumer, and establishes parameters for that discussion, it’s critical that you think your choice through.

[Christian also blogs at http://www.geekinasuit.com/]

Try out our Virtual Scrum Coach with the Scrum Team Assessment tool - just $500 for a team to get targeted advice and great how-to information


ANN: Agile Software Engineering Practices training by Isráfíl Consulting

Isráfíl Consulting is finally prepared for the first series of its Agile Software Engineering Practices training courses. This series is offered in partnership with Berteig Consulting who are graciously hosting the registration process. Their team has also helped greatly in shaping the presentation style and structure of the course. The initial run will be in Ottawa, Toronto (Markham), and Kitchener/Waterloo.   

Topics covered will include Test Driven Development (TDD), testability, supportive infrastructure such as build and continuous integration, team metrics, incremental design and evolutionary architecture, dependency injection, and so much more. (This course won’t present the planning side of XP, but covers many other aspects common to XP projects.) It makes a great complement to training in Agile Processes such as XP, Scrum, or OpenAgile. The overview slide presentation is available for free download from the Isráfíl web site.

The courses are scheduled for:

The course is $1250 CAD per student, and participants receive a transferrable discount of $100 CAD for other training with Berteig Consulting as a part of our ongoing partnership. I initially prototyped this course in Ottawa this December, and am very excited to see this through in several locales. Class size is limited to 15, so we can keep the instruction style more involved. The above schedules are linked to Berteig Consulting’s course system and have registration links at the bottom of the description. Locations are TBD, but will be updated at the above links as soon as they’re finalized.

A further series is planned for several US cities in March, and we’ll be sure to announce them as well.


Yet another misunderstanding of TDD, testing, and code coverage

I was vaguely annoyed to see this blog article featured in JavaLobby’s recent mailout. Not because Kevin Pang doesn’t make some good points about the limits of code coverage, but because his title is needlessly controversial. And, because JavaLobby is engaging in some agile-baiting by publishing it without some editorial restraint.

In asking the question, “Is code coverage all that useful,” he asserts at the beginning of his article that Test Driven Development (TDD) proponents “often tend to push code coverage as a useful metric for gauging how well tested an application is.” This statement is true, but the remainder of the blog post takes apart code coverage as a valid “one true metric,” a claim that TDD proponents don’t make, except in Kevin’s interpretation.

He further asserts that “100% code coverage has long been the ultimate goal of testing fanatics.” This isn’t true. High code coverage is a desired attribute of a well tested system, but the goal is to have a fully and sufficiently tested system. Code coverage is indicative, but not proof, of a well-tested system. How do I mean that? Authors who have taken the time to test a system so thoroughly that it gets > 95% code coverage are likely (in my experience) thinking through how to test their system in order to fully express its happy paths, edge cases, etc. However, the code coverage here is a symptom, not a cause, of a well-tested system. And the metric can be gamed. Actually, when imposed as a management quality criterion, it usually is gamed. Good metrics should confirm a result obtained by other means, or provide leading indicators. Few numeric measurements are subtle enough to really drive system development.

Having said that, I have used code-coverage in this way, but in context, as I’ll mention later in this post.

Kevin provides example code similar to the following:

String foo(boolean condition) {
    if (condition)
        return "true";
    else
        return "false";
}

… and talks about how if the unit tests are only testing the true path, then this is only working on 50% coverage. Good so far. But then he goes on to express that “code coverage only tells us what was executed by our unit tests, not what executed correctly.” He is carefully telling us that a unit test executing a line doesn’t guarantee that the line is working as intended. Um… that’s obvious. And if the tests didn’t pass correctly, then the line should not be considered covered. It seems there are some unclear assumptions on how testing needs to work, so let me get some assertions out of the way…

  1. Code coverage is only meaningful in the context of well-written tests. It doesn’t save you from crappy tests.
  2. Code coverage should only be measured on a line/branch if the covering tests are passing.
  3. Low code coverage suggests insufficient testing, but high coverage doesn’t guarantee sufficiency.
  4. Test-driven code will likely have the symptom of nearly perfect coverage.
  5. Test-driven code will be sufficiently tested, because the author wrote all the tests that form, in full, the requirements/spec of that code.
  6. Perfectly covered code will not necessarily be sufficiently tested.
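To illustrate assertions 1 and 6, here is a small sketch based on the `foo` example above. The bug is deliberately introduced, and the two “tests” are plain methods rather than tests in any real framework, so everything except `foo` itself is hypothetical.

```java
// Sketch: full coverage from a test that checks nothing, next to a test that verifies.
final class CoverageIsNotVerification {

    // Deliberately buggy variant of foo: the else branch returns the wrong string.
    static String foo(boolean condition) {
        if (condition) {
            return "true";
        } else {
            return "true"; // bug: should be "false"
        }
    }

    // Executes both branches (full line and branch coverage) but checks nothing,
    // so it passes no matter what foo returns.
    static boolean coverageOnlyTest() {
        foo(true);
        foo(false);
        return true;
    }

    // Executes the same branches and checks the results, so it catches the bug.
    static boolean verifyingTest() {
        return "true".equals(foo(true)) && "false".equals(foo(false));
    }

    public static void main(String[] args) {
        System.out.println("coverage-only test passed: " + coverageOnlyTest());
        System.out.println("verifying test passed: " + verifyingTest());
    }
}
```

Both tests produce identical coverage numbers, yet only the second one says anything about correctness. That is why coverage only means something in the context of well-written tests.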

What I’m driving at is that Kevin is arguing against something entirely different from what TDD proponents argue. He’s arguing against a common misunderstanding of how TDD works. On point 1 he and I are in agreement. Many of his commenters mention #3 (and he states it in various ways himself). His description of what code coverage doesn’t give you is absurd when you take #2 into account (we assume that a line of covered code is only covered if the covering test is passing). But most importantly – “TDD proponents” would, in my experience, find this whole line of explanation rather irrelevant, as it is an argument against code-coverage as a single metric for code quality, and they would attempt to achieve code quality through thoroughness of testing by driving the development through tests. TDD is a design methodology, not a testing methodology. You just get tests as side-effect artifacts of the approach. Useful in their own right? Sure, but it’s only sort of the point. It isn’t just writing the tests first.

In other words – TDD implies high or perfect coverage. But the converse is not necessarily true.

How do you achieve thoroughness by driving your development with tests? You imagine the functionality you need next (your next increment of useful change), and you write or modify your tests to “require” the new piece of functionality. Then you write it, then you go green. Code coverage doesn’t enter into it, because you should have near perfect coverage at all times by implication, because every new piece of functionality you develop is preceded by tests which test its main paths and error states, upper and lower bounds, etc. Code coverage in this model is a great way to notice that you screwed up and missed something, but nothing else.
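A small illustration of what “tests which test its main paths and error states, upper and lower bounds” might look like. The function, its 1 to 100 range, and the helper names are all invented for this sketch, and the checks are plain Java, not a real test framework. In a test-driven flow the checks in `allTestsPass` would be written first and `parse` would be written to satisfy them.

```java
// Sketch: the checks come first; parse() exists only to make them pass.
final class StoryPoints {

    // Hypothetical function under test: parses an estimate in the range 1..100.
    static int parse(String text) {
        if (text == null) {
            throw new IllegalArgumentException("estimate is required");
        }
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + text);
        }
        if (value < 1 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return value;
    }

    // True if parse() rejects the input with an IllegalArgumentException.
    static boolean rejects(String text) {
        try {
            parse(text);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // The requirements, expressed as checks on main path, bounds, and error states.
    static boolean allTestsPass() {
        return parse("8") == 8        // main path
            && parse(" 13 ") == 13    // surrounding whitespace is tolerated
            && parse("1") == 1        // lower bound
            && parse("100") == 100    // upper bound
            && rejects("0")           // just below the lower bound
            && rejects("101")         // just above the upper bound
            && rejects("eight")       // error state: not numeric
            && rejects(null);         // error state: missing input
    }

    public static void main(String[] args) {
        System.out.println("all tests pass: " + allTestsPass());
    }
}
```

Every branch of `parse` is exercised here, but nobody aimed at a coverage number to get there. The coverage falls out of having specified each behaviour before writing it.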

So, is code-coverage useful? Heck yeah! I’ve used coverage to discover lots of waste in my system. I’ve removed whole sets of APIs that were “just in case I need them” APIs, because they become rote (lots of accessors/mutators that are not called in normal operations). Is code coverage the only way I would find them? No. If I’m dealing with a system that wasn’t driven with tests, or was poorly tested in general, I may use coverage as a quick health meter, but probably not. Going from zero to 90% on legacy code is likely to be less valuable than just re-writing whole subsystems using TDD… and often more expensive.

Regardless, while Kevin is formally asking “is code coverage useful?” he’s really asking (rhetorically) whether it is reasonable to worship code coverage as the primary metric. But if no one’s asserting the positive, why is he questioning it? He may be dealing with a lot of people with misunderstandings of how TDD works. He could be dealing with metrics bigots. He could be dealing with management-imposed-metrics initiatives which often fail. It might be a pet peeve or he’s annoyed with TDD and this is a great way to do some agile-baiting of his own. I don’t know him, so I can’t say. His comments seem reasonable, so I assume no ill intent. But the answer to his rhetorical question is “yes, but in context.” Not surprising, since most rhetorically asked questions are answerable in this fashion. Hopefully it’s a bit clearer where it’s useful, and where and how it’s not.

(This article is a cross-post from “Geek in a Suit”)


Dependency Injection on J2ME/CLDC devices.

This post is a little geeky and technical and product-related for AgileAdvice, and is a shameless self-promotion. Nevertheless, since testability, test-driven-development, and incremental design are non-exclusive sub-topics of Agile, I thought I’d report this here anyway.

Many developers use the Dependency Injection and Inversion of Control (IoC) patterns through such IoC containers as Spring, Hivemind, Picocontainer, and others. They have all sorts of benefits to testability, flexibility, etc. that I won’t repeat here, but can be read about here, here, and here. A great summary of the history of “IoC” can be found here. J2ME developers, however, especially those on limited devices that use the CLDC configuration of J2ME, cannot use the substantial number of IoC/DI containers out there, because they nearly all rely on reflection. These also often make use of APIs not present in the CLDC – APIs which could not easily be added. Lastly there’s a tendency among developers of “embedded software” to be very suspicious of complexity.

In working out some examples of DI as part of a testability workshop at one of my clients, I whipped up a quick DI container, and being the freak that I am, hardened it until it was suitable for production, because I hate half-finished products. So allow me to introduce the Israfil Micro Container. (That is, the Container from the Israfil Micro project). As I mention in the docs, “FemtoContainer” just was too ridiculous, and this container is smaller than pico-container. The project is BSD licensed, and hosted on googlecode, so source is freely available and there’s an issue/feature tracker, yadda yadda.

Essentially I believe that people working on cellphones and set-top boxes shouldn’t be constrained out of some basic software design approaches – you just have to bend the design approach to fit the environment. So hopefully this is of use to more than one of my clients. It currently supports auto-wiring registration and delayed object creation (until first need), and forthcoming are some basic lifecycle support and a few other niceties. It does not use reflection (you use a little adapter for object creation instead), and performs quicker than pico-container. Low, low overhead. It’s also less than 10 classes and interfaces (including the two classes in the util project). It’s built with Maven2, so you can use it in any Maven2-built project with ease, but of course you can always also just download the jar (and the required util jar too). Enjoy…
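To give a feel for the general idea of reflection-free injection with a creation adapter and delayed creation, here is a minimal sketch. It is emphatically not the Israfil Micro Container’s API: every name below is hypothetical, it does no auto-wiring, and it is written in modern Java (generics, lambdas) for brevity, which a CLDC build would have to replace with pre-generics collections and anonymous classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a reflection-free container: each component is built by a hand-written factory.
final class TinyContainer {

    // Creation adapter: does by hand what a reflective container would do automatically.
    interface Factory<T> {
        T create(TinyContainer container);
    }

    // Hypothetical components used only for the demonstration.
    interface Clock {
        long now();
    }

    static final class Greeter {
        private final Clock clock;

        Greeter(Clock clock) {
            this.clock = clock;
        }

        String greet() {
            return "hello at " + clock.now();
        }
    }

    private final Map<Class<?>, Factory<?>> factories = new HashMap<>();
    private final Map<Class<?>, Object> instances = new HashMap<>();

    <T> void register(Class<T> type, Factory<? extends T> factory) {
        factories.put(type, factory);
    }

    // Creates the component on first request and returns the cached instance afterwards.
    @SuppressWarnings("unchecked")
    <T> T get(Class<T> type) {
        Object existing = instances.get(type);
        if (existing == null) {
            Factory<?> factory = factories.get(type);
            if (factory == null) {
                throw new IllegalStateException("No factory registered for " + type.getName());
            }
            existing = factory.create(this);
            instances.put(type, existing);
        }
        return (T) existing; // safe as long as register() was the only way in
    }

    public static void main(String[] args) {
        TinyContainer container = new TinyContainer();
        container.register(Clock.class, c -> () -> 42L); // fixed clock, handy for tests
        container.register(Greeter.class, c -> new Greeter(c.get(Clock.class)));
        System.out.println(container.get(Greeter.class).greet());
    }
}
```

The factory for `Greeter` asks the container for its `Clock`, so the wiring lives in one place and a test can register a different `Clock` without touching `Greeter`. Nothing is constructed until the first `get`, which is the delayed-creation behaviour described above.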

P.S. There are a few other bits on googlecode that I’m working on in the micro-zone: some minimalist backports of parts of java.util.concurrent (just the locks), as well as some of the java.util collections stuff. Not finished, but also part of the googlecode project.


Quality is not an attribute, it’s a mindset

This was actually cribbed from a Bruce Schneier blog post about security…

Security engineers see the world differently than other engineers. Instead of focusing on how systems work, they focus on how systems fail, how they can be made to fail, and how to prevent–or protect against–those failures. Most software vulnerabilities don’t ever appear in normal operations, only when an attacker deliberately exploits them. So security engineers need to think like attackers. People without the mindset sometimes think they can design security products, but they can’t. And you see the results all over society–in snake-oil cryptography, software, Internet protocols, voting machines, and fare card and other payment systems. Many of these systems had someone in charge of “security” on their teams, but it wasn’t someone who thought like an attacker.

There’s an interesting parallel between this statement and how most software quality is handled. Quality and Security are similar. In fact, I see security as a very specific subset of quality-mindedness. Certainly both require the same mindset to ensure – rather than thinking merely “how will this work”, a quality-focused person will also, or perhaps alternately, think: “how might this be breakable”. From this simple change in thinking flow several important approaches:

  • Constraint-based thinking (as opposed to solution based thinking): allows an architect/developer to conceive of the set of possible solutions, rather than an enumeration of solutions. By looking at constraints, a developer implements the lean principle of deciding as late as possible, with as full information as possible.
  • Test-First: As one thinks of how it might break, scenarios emerge that can form the basis of test cases. These cases form a sort of executable acceptance criteria.
  • Lateral Thinking: The constraint+test approach starts to get people into a very different mode, where vastly different kinds of solutions show up. The creative exercise of trying to break something provides insights that can change the whole approach of the system.

Schneier goes on to ponder:

This mindset is difficult to teach, and may be something you’re born with or not. But in order to train people possessing the mindset, they need to search for and find security vulnerabilities–again and again and again. And this is true regardless of the domain. Good cryptographers discover vulnerabilities in others’ algorithms and protocols. Good software security experts find vulnerabilities in others’ code. Good airport security designers figure out new ways to subvert airport security. And so on.  

Here again – I think it’s possible to help people get a mindset about quality, but some do seem to have a knack. It’s important to have some of these people on your teams, as they’ll disturb the waters and identify potential failure modes. These are going to be the ones who want to “mistake proof” (to borrow Toyota’s phrase) the system by writing more unit tests and other executable proofs of the system. But most importantly (and I can personally testify to this) it is critical that people just write more tests. It is a learned skill to start to think of “how might this fail” until it becomes a background mental thread, always popping up risk models.

A related concept is Deming’s “systems-thinking”, which, applied to software quality, causes one to start looking at whole ecosystems of error states. This is when fearless re-factoring starts to pay off, because the elimination of duplication allows one to catch classes of error in fewer and fewer locations, where they’re easier to fix. There are many and multifarious spin-off effects of this inverted questioning and the mindset it generates.

Try it yourself. When you’re writing code, ask yourself how you might break it. What inputs, external state, etc. might cause it to fail, crash, or behave in odd ways? This starts to show you where you might have state leaking into the wild, or side-effects from excessively complex interactions in your code. So quality focus can start to improve not only the external perception of your product, but also its fitness to new requirements by making it more resilient and less brittle. Cleaner interactions and less duplication allow for much faster implementation of new features.

I could go on, but I just wanted to convey this sense of “attitude” or “mindset,” over mere technique. Technique can help you get to a certain level, but you have to let it “click”, and the powerful questions can sometimes help.
