As a technical agile coach and trainer, I help teams discover ways of testing. Some teams ignore tests altogether, while others write every test they can think of, wasting valuable time and failing to deliver at a good pace.
My first question is always this: How much does the customer pay for tests?
That’s right! Not a dime. I don’t even ship my product to them with any tests. They aren’t even compiled into bytecode for them. They are not going to pick up my application and open a debugger to make sure I’ve written tests that pass. They don’t care how many tests I’ve written or my code coverage ratio. They don’t care about unit, integration and acceptance tests, or how much time I spent on mocking and stubbing to isolate my functions. They only pay for working software.
So why write them?
I don’t write tests for the user. I don’t write tests for management. I write tests for me. I write tests for my future self. I write tests for my team members and any other developer that will need to change my code.
I write tests to prove that what I have written is what I have intended. I write tests to make my code manageable, to help me refactor when, inevitably, a new feature or change request arrives. I write tests so that I can fearlessly alter a system and know what I will break, and to find and repair bugs quickly before they are pushed into production.
I write tests so that my team members can feel a sense of code ownership, so that they too can alter, improve and remove my code and be able to predict the outcome. I write tests so that it becomes a form of documentation of the capability of the system.
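To make that concrete, here is a minimal sketch in Python of tests doubling as documentation. The function `apply_discount` and its rules are hypothetical, invented purely for illustration. Each test name states one intended behaviour, so a future developer can change the implementation and see right away which intentions still hold.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical function under test: take a whole-number percentage off a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


# Each test documents one thing the code is intended to do.
def test_ten_percent_off_reduces_price():
    assert apply_discount(2000, 10) == 1800


def test_zero_percent_leaves_price_unchanged():
    assert apply_discount(2000, 0) == 2000


def test_rejects_percent_above_100():
    try:
        apply_discount(2000, 101)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a percent above 100")
```

A test runner such as pytest would collect and run these functions by name; they can also be called directly.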
Certainly they take time to write, but they save all kinds of time when it comes to changing things later. They allow me to do the one thing that software needs to do in the rapidly evolving market: adapt. I can adapt quickly to the needs of my customers to deliver quality features rapidly.
I write as many tests as it takes to make myself and my team feel confident that we can continuously develop a quality product at a sustainable pace, responding to the changes of the market and the needs of our customers. I write enough tests that I am releasing nearly bug-free code, and I write as many tests as needed to satisfy just that!
In my Agile Software Developer training events, I help developers learn ways of writing tests to improve the quality of their work, so that they are able to spend more time developing new features rather than debugging old ones.
This article is adapted from a session proposal to Toronto Agile Conference 2018.
Leadership occurs as conscious choice carried out as actions.
Everyone has the ability to carry out acts of leadership. Therefore, everyone is a potential leader.
For leadership to be appropriate and effective, acts of leadership need to be tuned to the receptivity of those whose behaviour the aspiring leader seeks to influence. Tuning leadership requires the ability to perceive and discern meaningful signals from people and, more importantly, the system and environment in which they work.
As leaders, the choices we make and the actions we carry out are organic with our environment. That is, leaders are influenced by their environments (often in ways that are not easily perceived), and on the other hand influence their environments in ways that can have a powerful impact on business performance, organizational structures and the well-being of people. Leaders who are conscious of this bidirectional dynamic can greatly improve their ability to sense and respond to the needs of their customers, their organizations and the people with whom they interact in their work. The following list is one way of describing the set of capabilities that such leaders can develop over time.
1. Create Identity: Real leaders understand that identity rules. They work with the reality that “Who?” comes first (“Who are we?”), then “Why?” (“Why do we do what we do?”).
2. Focus on Customers: Real leaders help everyone in their organization focus on understanding and fulfilling the needs of customers. This is, ultimately, how “Why?” is answered.
3. Cultivate a Service Orientation: Real leaders design and evolve transparent systems for serving the needs of customers. A leader’s effectiveness in this dimension can be gauged both by the degree of customer satisfaction with deliverables and by the extent to which those working in the system are able to self-organize around the work.
4. Limit Work-In-Progress: Real leaders know the limits of the capacity of systems and never allow them to become overburdened. They understand that overburdened systems also mean overburdened people and dissatisfied customers.
5. Manage Flow: Real leaders leverage transparency and sustainability to manage the flow of customer-recognizable value through the stages of knowledge discovery of their services. The services facilitated by such leaders are populated with work items whose value is easily recognizable by their customers, and the delivery capability of each service is timely and predictable (trustworthy).
6. Let People Self-Organize: As per #3 above, when people doing the work of providing value to customers can be observed as self-organizing, this is a strong indication that there is a real leader doing actions 1-5 (above).
7. Measure the Fitness of Services (Never People): Real leaders never measure the performance of people, whether individuals, teams or any other organization structure. Rather, real leaders practicing actions 1-6 (above) understand that the only true metrics are those that provide signals about customers’ purposes and the fitness of services for such purposes. Performance evaluation of people is a management disease that real leaders avoid like the plague.
8. Foster a Culture of Learning: Once a real leader has established all of the above, people involved in the work no longer need be concerned with “safe boundaries”. They understand the nature of the enterprise and the risks it takes in order to pursue certain rewards. With this understanding and the transparency and clear limits of the system in which they work, they are able to take initiative, run experiments and carry out their own acts of leadership for the benefit of customers, the organization and the people working in it. Fear of failure finds no place in environments cultivated by real leaders. Rather, systematic cycles of learning take shape in which all can participate and contribute. Feedback loop cadences enable organic organizational structures to evolve naturally towards continuous improvement of fitness for purpose.
9. Encourage Others to Act as Leaders: Perhaps the highest degree of leadership is when other people working with the “real leader” begin to emerge as real leaders themselves. At this level, it can be said that the culture of learning has naturally evolved into a culture of leadership.
10. Stay Humble: Real leaders never think that they have it all figured out or that they have reached some higher state of consciousness that somehow makes them superior to others in any way. They are open and receptive to the contributions of others and always seek ways to improve themselves. Such humility also protects them from the inevitable manipulations of charlatans who will, from time to time, present them with mechanical formulas, magic potions, palm readings and crystal ball predictions. Real leaders keep both feet on the ground and are not susceptible to the stroking of their egos.
If you live in Toronto, and you would like to join a group of people who are thinking together about these ideas, please feel welcome to join the KanbanTO Meetup.
Register here for a LeanKanban University accredited leadership class with Travis.
Over the years I have done a number of talks for local chapters of the Project Management Institute. They have covered a range of topics, but one common theme that comes up over and over is that Scrum is not the best Agile method for delivering an IT Project. I even published a short video on the topic:
Several years ago, I also published a short article describing what Scrum is good for:
So… if Scrum isn’t so good for IT project work, then what can bring real agility to IT projects?
IT Project Attributes
Most of my work experience prior to running my business was in IT projects in banking, capital markets, insurance and a bit in government and healthcare. I mention that merely to indicate that my discussion of this isn’t just theoretical: I’ve seen good projects and bad projects. I’ve been on death-march projects, small projects and massive projects ($1b+). I’ve dealt with regulatory issues, vendor issues, offshoring issues, telecommuting issues, architectural issues, political issues, and seen enough problems to understand the complexity of reality.
IT projects have some common characteristics:
Like any project, there’s a deadline and a scope of work and a budget. These things don’t work well with Scrum. It’s possible to force them to fit together, but you lose a lot of what makes Scrum effective.
IT (as opposed to, say, tech startups) tends to use more mature technology platforms. Scrum is neutral about technology, but there are other Agile methods that address this type of technology more effectively.
IT Projects are often not the only thing going on in the technology organization. In particular, operations and user support add to IT project complexity, and require different “classes of service” than Scrum provides.
The issues that I mentioned above such as regulation, vendors, offshoring etc. are also common attributes of IT projects. Scrum makes harsh demands on an organization that challenge the approach to dealing with these issues. The change required to accommodate Scrum may not be worth it.
The Bad News about IT Project Agility
The whole project orientation to IT work is questionable. It’s just not a good fit. In most mid- to large-size organizations, IT does two things: it provides technology services to the rest of the organization, and it provides technical product development capacity to lines of business. For example, upgrading the office wi-fi routers and adding a new payment type to the online customer portal, respectively. The work of the IT department, therefore, falls into several different categories:
New artifacts that need to be created. Usually this is the stuff like coding algorithms and other business logic, creating new databases, configuring purchased systems, etc.
Repetitive activities that need to be sustained for a period or indefinitely, or which occur on-demand but at irregular times. For example, running a nightly batch process or deploying an update to a production environment.
Quality problems that need to be fixed. Defects and production problems are the obvious categories here, but also quality problems that are causing user confusion or time wastage.
Obstacles to work that need to be overcome. Often obstacles come from outside the project team in the form of interruptions. Other forms of obstacles can be unexpected bureaucracy, shifting funding, problems with a vendor, etc.
Calendar events that need to be accommodated. Milestones in the project, particularly regulatory milestones, are crucial in IT project work, but there are many other types such as all-hands meetings, statutory holidays, hiring or contract end dates, etc.
Of these, only repetitive activities and calendar events fit well into a project perspective. The others typically have a level of uncertainty… complexity… that makes it very difficult to approach with the project perspective of fixed deadlines and scope.
On the other hand, Scrum only really handles new artifacts and obstacles directly, and quality problems indirectly. These are the kinds of activities that are the focus of product development. Repetitive activities and calendar events are anathema to the core Scrum framework. If I think about this from a scoring perspective, Scrum supports these kinds of work as follows (-5 means totally counter, 0 means no impact, +5 means total support):
Scrum Support for IT Project Work Types:
New artifacts: +5
Repetitive activities: -2
Quality problems: 0
Obstacles: +4
Calendar events: -5
SCORE: +2 – barely positive impact on IT project work!!!
The bad news, therefore: neither a project orientation nor Scrum really covers all the needs of an IT project environment.
There are many alternatives to Scrum, but these are my three favourites: Extreme Programming, Kanban and OpenAgile. All three of them cover the five types of work more effectively than Scrum. All three of them are oriented to more generic types of work. After describing each briefly, I’ll also mention which one is my top choice for IT project work.
Extreme Programming for IT Project Work
Historically, Extreme Programming (XP) emerged in an IT Project context: the famous C3 project at Chrysler. This approach to IT project work has many things in common with other approaches to agility (which are described in the Agile Manifesto). XP allows the five types of work as follows:
New artifacts are the core of XP and are usually expressed as User Stories. This is common to Scrum and many other Agile methods. These are typically the features and functionality of a system… the scope of the project work. XP does not make any strong assertions about the size or stability of the backlog of new artifacts and as such can accommodate the project orientation in IT with relatively fixed scope.
Repetitive activities are not explicitly addressed in XP, but nor is there anything in XP which would cause problems if an XP team is required to do operational or support work which is the source of most repetitive activities in an IT environment.
Quality problems are addressed directly with both preventative and reactive measures. Specifically, Test-Driven Development and Acceptance Test-Driven Development are preventative, while Refactoring and Continuous Integration are reactive. XP has a deep focus on quality.
Obstacles are not directly addressed in XP, but indirectly through the XP value of courage. Implicitly, then, obstacles would be overcome (or attempted) with courage.
Calendar events are, for the most part, not addressed directly; the exception is release planning toward a release date. Other calendar-related activities are not explicitly handled. XP is less antagonistic to such things than Scrum, but only by implication: Scrum would often put calendar events in the category of obstacles to be removed to help a team focus.
XP Support for IT Project Work Types:
New artifacts: +5
Repetitive activities: 0
Quality problems: +5
Obstacles: +2
Calendar events: +1
SCORE: +13 – moderate to strong positive impact on IT project work!
Summary: much better than Scrum, but still with some weaknesses.
Kanban for IT Project Work
Kanban is different from most other approaches to agility in that it is a “continuous flow” method, rather than an iterative/incremental method. This distinction basically means that we move packages of work through a process based on capacity instead of based on a fixed cadence. Kanban asks that we visualize the current state of all work packages, limit the amount of work in progress at any stage in our delivery process, and use cadences only for iterative and incremental improvement of our process (not our work products).
Kanban is much gentler than Scrum or Extreme Programming in that it does not require leader-led reorganization of staff into cross-functional team units. Instead, we identify a service delivery value stream and leaders manage that stream as it currently operates.
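The pull-based, capacity-limited flow described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names `Stage` and `Board`, the exception `WipLimitExceeded`, and the stage names in the usage note are all hypothetical, not part of any Kanban tool or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WipLimitExceeded(Exception):
    """Raised when placing an item would overburden a stage."""


@dataclass
class Stage:
    name: str
    wip_limit: Optional[int] = None  # None means the stage is unlimited
    items: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.items) < self.wip_limit


class Board:
    """Hypothetical board: an ordered list of stages in one delivery process."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages

    def add(self, item: str) -> None:
        """Commit a new work item to the first stage, if it has capacity."""
        self._place(self.stages[0], item)

    def pull(self, item: str) -> None:
        """Move an item one stage downstream, only if that stage has capacity."""
        for index, stage in enumerate(self.stages[:-1]):
            if item in stage.items:
                # Check capacity first so a refused pull leaves the board unchanged.
                self._place(self.stages[index + 1], item)
                stage.items.remove(item)
                return
        raise ValueError(f"{item!r} is not in a stage it can be pulled from")

    def snapshot(self) -> Dict[str, List[str]]:
        """Visualize the current state of all work items, stage by stage."""
        return {stage.name: list(stage.items) for stage in self.stages}

    @staticmethod
    def _place(stage: Stage, item: str) -> None:
        if not stage.has_capacity():
            raise WipLimitExceeded(f"{stage.name} is at its limit of {stage.wip_limit}")
        stage.items.append(item)
```

With `Board([Stage("Ready", 2), Stage("In progress", 1), Stage("Done")])`, pulling a second item into "In progress" while the first is still there raises `WipLimitExceeded`. Work moves when there is capacity for it, not when a calendar says so.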
New artifacts in Kanban are supported, and certainly welcome, but Kanban does not seem to acknowledge the problem of formal complexity (creativity, problem-solving, human dynamics) in the creation of new artifacts. There are good attempts to apply statistical methods to the management of new artifacts, but their fundamentally unknowable cost/end (undecidable problem) is not really effectively addressed.
Repetitive activities are handled extremely well in Kanban including different classes of service. Repetitive activities are handled well partly as a result of the history of Kanban as a signalling system in manufacturing environments.
Quality problems are handled similarly to new artifacts: supported, welcome, and even possibly addressed in the cadences of continuous improvement that Kanban supports. However, quality problems are another area where technical complexity makes proper analysis of these activities difficult.
Kanban relegates the handling of obstacles to the manager of service delivery. There is no explicit support for strong organizational change efforts. In fact, Kanban discourages “transformative” change which is sometimes required given the problem of Nash equilibriums.
Kanban works well with Calendar events by treating them as activities with a particular class of service required.
Kanban Support for IT Project Work Types:
New artifacts: +3
Repetitive activities: +5
Quality problems: +3
Obstacles: 0
Calendar events: +5
SCORE: +16 – strong positive impact on IT project work!!
Summary: even better than XP, easier to adopt. (Actually, almost anything is easier to adopt than XP!!!)
OpenAgile for IT Project Work
OpenAgile is an obscure non-technology-oriented method based on the work I and a few others did about 10 years ago. The OpenAgile Primer is the current reference on the core of the OpenAgile framework. OpenAgile has been applied to general management, small business startups, sales management, mining project management, emergency services IT, and many other areas of work. I’m partial to it because I helped to create it!
OpenAgile emerged from consulting work I did at CapitalOne in 2004 and 2005 and work I did with my own business in 2006 and 2007. Many of the older articles on this blog are forerunners of OpenAgile as it was being developed. See, for example, Seven Core Practices of Agile Work.
The types of work listed above are indeed the core types of work described in OpenAgile. As such, OpenAgile fully supports (nearly) all five types of activities found in IT projects. However, OpenAgile is not just a work delivery method. It is also a continuous improvement system (like Kanban and Scrum) and so it also assumes that a team or organization using OpenAgile must also support learning. This support for learning means that OpenAgile does not over-specify or give precise definitions on how to handle all five types of work. Thus, my scores below are not all +5’s…
OpenAgile Support for IT Project Work Types:
New artifacts: +4
Repetitive activities: +4
Quality problems: +4
Obstacles: +4
Calendar events: +4
SCORE: +20 – very strong positive impact on IT project work!!!
Summary: OpenAgile is the best approach I know of for general IT project environments.
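For anyone who wants to check my arithmetic, here is a small Python tally. It assumes, as the scoring scale suggests, that each SCORE is a plain sum over the five work types; the function names are my own for this sketch. Subtracting the scores for new artifacts, repetitive activities, quality problems and calendar events from each stated total leaves the score for the fifth work type, obstacles.

```python
from typing import Dict, List

# Per-work-type scores as listed for each method (scale: -5 to +5).
LISTED_SCORES: Dict[str, Dict[str, int]] = {
    "Scrum": {"new artifacts": 5, "repetitive activities": -2,
              "quality problems": 0, "calendar events": -5},
    "XP": {"new artifacts": 5, "repetitive activities": 0,
           "quality problems": 5, "calendar events": 1},
    "Kanban": {"new artifacts": 3, "repetitive activities": 5,
               "quality problems": 3, "calendar events": 5},
    "OpenAgile": {"new artifacts": 4, "repetitive activities": 4,
                  "quality problems": 4, "calendar events": 4},
}

# The SCORE line stated for each method.
STATED_TOTALS: Dict[str, int] = {"Scrum": 2, "XP": 13, "Kanban": 16, "OpenAgile": 20}


def obstacle_score(method: str) -> int:
    """Score for obstacles: the stated total minus the four scores listed above."""
    return STATED_TOTALS[method] - sum(LISTED_SCORES[method].values())


def ranking() -> List[str]:
    """Methods ordered from strongest to weakest stated total."""
    return sorted(STATED_TOTALS, key=STATED_TOTALS.get, reverse=True)


for method in ranking():
    print(f"{method}: total {STATED_TOTALS[method]:+d}, obstacles {obstacle_score(method):+d}")
```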
Regrettably, I wouldn’t always recommend OpenAgile – there are just too few people who really understand it or know how to help an organization adopt it effectively. If you are interested, I’d be happy to help, and we can certainly arrange private training and consulting, but mostly I would recommend Kanban to people interested in taking the next step in effectiveness in IT projects. Please check out our Kanban learning events and consider registering for one or asking for us to come to your organization to deliver training, coaching or consulting privately.
So, what “awful circumstances” led to Equifax’s recent breach?
Let’s reflect: News has surfaced (TechCrunch, Reuters) that hackers have scraped untold amounts of sensitive data from Equifax systems. 143+ million people are affected as hackers have amassed a huge database of names, addresses, credit records, license numbers, banking histories. (That probably includes you too!)
I want to be clear though, the news broke yesterday but the problem started long ago. The security vulnerability has existed for (probably) years and I have no doubt some Equifax staff have known about it.
Equifax! We’re not talking about some high-school project with junior coders and tech newbs. We’re talking about one of the world’s most trusted organizations. We’re talking about a company whose very existence (their whole business!) is to protect our collateral. This is supposed to be one of the best-funded, most secure, most technologically-advanced companies on the planet.
But I’m not surprised. Here’s why…
I teach Scrum and my classrooms are filled regularly with people who work in companies exactly like Equifax. I hear their stories every day:
“Our managers don’t provide the tools we need to do the job.”
“Our managers don’t understand the time required to deliver high-quality software. We’re always pressured to cut corners to meet arbitrary and impossible deadlines.”
“Our systems are broken, everyone knows it, but managers continue to outsource and off-shore our QA.”
“We don’t have authority to decide the implementation, we’re often told what to implement by architects and supervisors, even if we know it to be rotten.”
“Our managers never ask us about quality…they ask us only ‘when will this be done’?”
And that’s the crux of the problem: people are hired by companies like Equifax to provide technical expertise, then their advice is ignored and their implementation decisions aren’t trusted.
Some important questions to consider…
1. Does Equifax lack the money to hire excellent technical staff?
No, their offices are filled with some of the best programmers in the world. I meet them (or people like them) regularly in my classes and I have no doubt that the technical staff at Equifax have warned the managers for years of security holes and technical defects. I have no doubt those managers have ignored the alarms and have pushed the staff to deliver deficient code.
2. Does Equifax lack the time to build high-quality systems?
No, last I checked they’ve been at it a long time and their existing contracts will carry their operation years into the future. And as mentioned earlier, securing our data is the reason the company exists. It’s the one thing they’re supposed to get right – I’d think their time should be entirely devoted to building high-quality systems.
3. Does Equifax lack the financial resources to invest in proper tools and training for security/quality testing?
No, such techniques and tools are widely available and inexpensive (even hackers can afford them!). Managers at Equifax have denied budgets for training, tools, and upgrades because “it’s too costly” – hmm…I wonder what this data breach will cost?
4. And my favourite question of the day: Are the hackers “smarter”?
Absolutely not. But they’re more dedicated and have equipped themselves with good techniques and tools for penetration testing. In my personal experience as a hacker (er, I use that term loosely) security holes are all around us if we look for them. Equifax simply wasn’t looking!
What to do about it…
First, it’s clear to me the problem isn’t technical or financial. It’s cultural. I see it all the time. Teams of good product developers are surrounded by bureaucracy instead of support. Teams of good coders aren’t trusted to see the source code of the systems used by the company – yes, that’s as crazy as it sounds! Teams of good coders are kept silent about the security vulnerabilities they see. Solutions are ignored by management and the arguments are: “improving the security isn’t a priority right now” or “we know there are some possible security concerns and we are in discussion with vendors or outside agencies to address it” or “we have a budget for security improvements scheduled for next quarter; let’s focus for now on new features instead”. Managers are more concerned with deadlines than with quality. Managers scrutinize the number of hours a developer works on a task, and outsource or off-shore the quality assurance and testing! Managers conduct endless planning activities then compress the implementation into tight budgets and timelines – evidently, a lot of energy is spent getting the plan “right” but getting the software right is not a priority. I could go on.
If you’re interested to know how things work at Equifax, just think of the Dilbert cartoons. I mean it. I am very serious. Dilbert isn’t funny because it’s fiction; it’s funny because it’s NON-fiction. Sadly. Typically, for enterprises like Equifax, their technical staff and customers take a back seat to management “theatre”. This needs to be fixed and it starts by asking the technical staff a single, simple question: “Who among you have raised concerns about technical debt with your managers/supervisors and were ignored?” That question will unearth bugs which have been deprioritized by managers, budgets that have been denied for technical training and automated testing, projects which have been reported as “done” before they were actually ready for deployment – in other words, that question will reveal the many (fixable) ways managers get in the way of quality.
Second, executive staff at Equifax need a crash course in automated testing. Yes, THE EXECUTIVE STAFF! It is essential they understand and see with their own eyes that:
Automated testing is cheaper and exponentially more effective than manual testing;
ALL defects are discoverable and fixable before hitting production environments;
Quality is not something one outsources
and the techniques to achieve ZERO DEFECTS are well-known, teachable, repeatable, and proven. I’m of course referring to techniques like Test-Driven Development, Continuous Integration, Refactoring, and Swarming. For example, these technical topics form the bulk of our Certified Scrum Developer classes. (Shameless plug.)
And third, technical staff need to stop behaving like sheep. So far in this article, I’ve been very critical of managers, sure, and anyone who knows me personally knows I have no time for inept management. But too often I meet smart, well-meaning developers who deliver shoddy code – perhaps under pressure and against their better judgement, but in the end whose code is it? Developers! I understand you might feel trapped in a pattern of quantity-over-quality and you are frequently coerced by your management to cut corners. I get it… I understand it… it’s easy to feel that deadlines are some sort of immutable truth and that managers wield all the power. But the fact is, developers, YOU hold all the responsibility and therefore you need to be the professional. You need to say “no” and demand the latitude you require to deliver high quality. You’re the one closest to the code and therefore directly responsible for the safety and well-being of your users.
So, Equifax and enterprises everywhere, I’m speaking now as your user or stakeholder or customer…
Equifax has failed. Miserably. The company deserves all the class-action suits coming their way. From leaders to developers. Everyone.
Most members of society are unwilling participants in all this. Most of us aren’t your direct customers. Example: I’m not a direct customer of Equifax – nobody has chosen Equifax as their personal agent. Instead, our banks and our governments have selected Equifax on our behalf. This presents a problem: if I were a direct customer of Equifax I’d call them today and close my accounts; but I can’t do that. Instead, the best I can do as an individual is contact my banks, lenders, and insurance agents to demand change. (Yes, I likely will do that. I’m that sort.)
However, the larger issue is that we are at the mercy of YOU who produce software. I’m talking about the software in our vehicles, in our heart-monitors, in our subway systems, in our air-traffic-control centres, in our banks – this is serious stuff! We must be able to trust those systems…with our lives, with our security. We must be able to trust you even though we don’t and won’t ever know you.
A hacker friend of mine once said, “if self-driving cars are being produced without complete automated test coverage, then that’s a future I don’t want.”
Many organizations won’t survive the next decade. Of those that survive, even they are likely to be extinct before century’s end — especially the largest of contemporary organizations.
I was thinking today of a few essential adaptations that enterprises must make immediately in order to stave off their own almost-inevitable death.
With Regard to Business Strategy
Measure value delivered and make decisions empirically based on those data.
Strive toward a single profit-and-loss statement. Understand which value streams contribute to profit, yes, but minimize fine-grained inspection of cost.
Direct-to-consumer, small-batch delivery is winning. It will continue to win.
With Regard to People
Heed Conway’s Law. Understand that patterns of communication between workers directly affect the design and structure of their results. Organize staff flexibly and in a way which resembles future states or ‘desired next-states’ so those people produce the future or desired next-architectures. This implies that functional business units and structures based on shared services must be disassembled; instead, organize people around products and then finance the work as long-term initiatives instead of finite projects.
Distribute all decision-making to people closest to the market and assess their effectiveness by their results; ensure they interact directly with end users and measure (primarily) trailing indicators of value delivered. Influence decision-making with guiding principles, not policies.
The words ‘manager’ and ‘management’ are derogatory terms and not to be used anymore.
Teams are the performance unit, not individuals. Get over it.
With Regard to Technology
Technical excellence must be known by all to be the enabler of agility.
Technical excellence cannot be purchased — it is an aspect of organizational culture.
For example, in the realm of software delivery, extremely high levels of quality are found in organizations with the shortest median times-to-market and the most code deployments per minute. The topic of Continuous Delivery is so important currently because reports show a direct correlation between (a) the frequency of deployment and (b) quality.
That is, as teams learn to deploy more frequently, their time-to-market (lead time), recovery rates, and success rates all change for the better — dramatically!
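If you want to watch these numbers in your own organization, here is a rough Python sketch of how they might be computed from a deployment log. The `Deployment` record and its fields are hypothetical, and the definitions (lead time as commit-to-deploy, interval as the gap between consecutive deployments) are my assumptions for illustration rather than a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Sequence


@dataclass(frozen=True)
class Deployment:
    """Hypothetical log record for one deployment."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    succeeded: bool         # False if it had to be rolled back or hot-fixed


def median_lead_time(deployments: Sequence[Deployment]) -> timedelta:
    """Median time from commit to production."""
    return median(d.deployed_at - d.committed_at for d in deployments)


def median_deployment_interval(deployments: Sequence[Deployment]) -> timedelta:
    """Median gap between consecutive deployments (needs at least two)."""
    times = sorted(d.deployed_at for d in deployments)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return median(gaps)


def success_rate(deployments: Sequence[Deployment]) -> float:
    """Fraction of deployments that succeeded."""
    return sum(d.succeeded for d in deployments) / len(deployments)
```

Tracked over time, a shrinking deployment interval alongside a steady or rising success rate is the pattern described above.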
I have a theory which is exemplified in the following graph.
Sabine’s Principle of Cumulative Quality Advantage Explained
As the intervals between deployments decrease (blue/descending line)
…quality increases (gold/ascending line)
…and the amount & cost of technical debt decreases (red area)
The best architectures, requirements and designs emerge from self-organizing teams.
The quality of our software systems depends on refactoring. In fact, I believe that the only way that an organization can avoid refactoring is by going out of business. Maybe I should explain that.
Refactor or Die
Every software system that we build is inside a dynamic environment. The organization(s) using the software are all in a state of constant change. The people using the software are also constantly changing. Due to this constant change, every software system needs to be adapted to the environment in which it is used. Most of the time, businesses think of this constant change in terms of new features and enhancements – the scope of functionality that a system can handle. Less commonly, businesses think of this change in terms of the obvious external qualities and attributes of the system such as performance or security. But almost never does an organization, from a business perspective, think of the invisible qualities of the software system such as simplicity and technical excellence.
What happens when the business does not recognize those invisible qualities? I’m sure almost every software developer reading this can answer this question easily: the system becomes “crufty”, hard to maintain, bug-prone, costly to change, maze-like, complex. Some people refer to this as legacy code or technical debt.
The longer this state is allowed to continue, the more it costs to add new features – the stuff that the business really cares about. It is pretty easy to see how this works – for someone who has a technical background. But for those without a technical background it can be hard to understand. Here is a little analogy to help out.
Imagine that you set up a system for giving allowance to your kids. In this system, every week your kids have to fill out a simple form that has their name, the amount that they are requesting, and their signature. After a few weeks of doing this, you realize that it would be helpful to have the date on the form. You do this so that you can enter their allowance payments in your personal bookkeeping records. Then you decide that you need to add a spot for you to counter-sign so that the paper becomes a legal record of the allowance payment. Then your kids want extra allowance for a special outing. So you add some things on the form to allow them to make these special requests. Your accountant tells you that some portions of your kids’ allowance might be good to track for tax purposes. So, the form gets expanded to have fields for the several different possible uses that are beneficial to your taxes. Your form is getting quite complex by this point. Your kids start making other requests, like to be paid by cheque or direct-deposit instead of in cash, or to be paid advances against future allowances. Every new situation adds complexity to the form. The form expands over multiple pages. Filling out the form weekly starts to take significant time for each child, and for you to review them. You realize that in numerous places on the form it would be more efficient to ask for information in a different way, but you’re not sure if it will have tax implications, so you decide not to make the changes… yet. You decide you need your own checklist to make sure that the forms are being filled out correctly. A new tax law means that you could claim some refunds if you have some additional information… and it can be applied retroactively, so you ask your kids to help transcribe all the old versions of the form into the latest version. It takes three days, and there is lots of guess-work. Your allowance tracking forms have become a bureaucratic nightmare.
The forms and their handling are what software developers have to deal with on a daily basis – and the business usually doesn’t give them time to do that simplification step. The difference is that in software development there are tools, techniques and skills that allow your developers to maintain a system so that it doesn’t get into that nightmare state.
For a more in-depth description of this process of systems gradually becoming more and more difficult to improve, please see these two excellent articles by Kane Mar:
Ultimately, a software system can become so crufty that it costs more to add features than the business benefit of adding those features. If the business has the capacity, it is usually at this point that the business makes a hard decision: let’s re-write the system from scratch.
I used the word “decision” in that last sentence. What are the other options in that decision? Ignoring the problem might be okay for a while longer: if the company is still getting benefit from the operation of the system, then this can go on for quite a while. Throwing more bodies at the system can seem to help for a bit, but there are rapidly diminishing returns on that approach (see The Mythical Man-Month for details). At some point, however, another threshold is reached: the cost of maintaining the operation of the system grows to the point where it is more expensive than the operational value of the system. Again, the business can make a hard decision, but it is in a worse place to do so: to replace the system (either by re-writing or buying a packaged solution), but without the operating margin to fund the replacement.
In his articles, Kane Mar describes this like so:
It’s pretty clear that a company in this situation has some difficult decisions ahead. There may be some temporary solution that would allow [a company] to use the existing system while building a new product, [a company] may decide to borrow money to fund the rewrite, or [a company] may want to consider returning any remaining value to their shareholders.
There are a few principles that are important in helping to answer these questions. All of these principles assume that we are talking about refactoring in an Agile team using a framework like Scrum, OpenAgile, or Kanban.
Refactoring Principle One: Keep It Small
Refactoring is safest and cheapest when it is done in many small increments rather than in large batches. The worst extreme is the complete system re-write refactoring. The best refactoring activities take seconds or minutes to execute. Small refactorings create a constant modest “overhead” in the work of the team. This overhead then becomes a natural part of the pace of the team.
Not all refactoring moves can be kept so small. For example, upgrading a component or module from a third party might show that your system has many dependencies on that module. In this case, efforts should be made to allow your system to use both the old and the new versions of the component simultaneously. This allows your system to be partially refactored. In other words, to break a large refactoring into many small refactorings. This, in turn, may force you to refactor your system to be more modular in its dependencies.
Another common problem with keeping refactorings small is the re-write problem. Your own system may have a major component that needs to be re-written. Again, finding creative technical means to allow for incremental refactoring of the component is crucial. This can often mean having temporary structures in your system to allow for the old and new parts to work harmoniously. One system that I was working on had to have two separate database platforms with some shared data in order to enable this “bi-modal” operation.
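To make the idea of running old and new parts side by side more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the `legacy_tax` and `new_tax` functions stand in for the old and new versions of a component, and `TaxRouter` is one possible temporary structure that lets callers be moved over one at a time. It is not taken from any particular system.

```python
from typing import Set


def legacy_tax(amount: float) -> float:
    """Hypothetical old component (illustrative flat 10% rate)."""
    return round(amount * 0.10, 2)


def new_tax(amount: float) -> float:
    """Hypothetical rewritten component; same contract, same results."""
    return round(amount / 10, 2)


class TaxRouter:
    """Temporary structure: sends each caller to the old or the new component."""

    def __init__(self) -> None:
        # Names of the callers that have already been moved to the new component.
        self._migrated: Set[str] = set()

    def migrate(self, caller: str) -> None:
        """Move a single caller over; one small refactoring step."""
        self._migrated.add(caller)

    def implementation_for(self, caller: str) -> str:
        """Report which version a caller currently uses."""
        return "new" if caller in self._migrated else "legacy"

    def calculate(self, caller: str, amount: float) -> float:
        """Dispatch to whichever version this caller is currently on."""
        impl = new_tax if caller in self._migrated else legacy_tax
        return impl(amount)


router = TaxRouter()
before = router.calculate("invoicing", 100.0)  # served by the old component
router.migrate("invoicing")                    # small, reversible step
after = router.calculate("invoicing", 100.0)   # served by the new component
```

Once every caller has been migrated, the router and the old component can be deleted in one last small step. Until then, each migration is a refactoring that takes minutes rather than months.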
Refactoring Principle Two: Business Catalysts
When is the earliest that a refactoring should be done? Not whenever the technical team wants to do it. Instead, the technical team needs to use business requests as catalysts for refactoring. If the business needs a new feature, then refactoring should only be done on those parts of the system that are required to enable that feature. In other words, don’t refactor the whole user interface, just refactor the parts that relate to the specific business request.
Again, there can be exceptions to doing this… but only in the sense that some refactorings might be delayed until a later date. This is tricky: we want to make sure that we are not accumulating technical debt or creating legacy code. So, instead, we need to allow the technical team to refactor when they detect duplication: duplication of code, data or structure in the system. A business request might impact a particular part of the system and the team sees how it might be necessary to refactor a large swath of the system as a result. But the cost of doing so is not yet justified: the single request is not enough of a catalyst, and the team can also choose a simple temporary solution. Later, the business makes another request that also implies the same large refactoring. Now is the time to seriously consider it. It is now a question of duplication: a second simple temporary solution would duplicate the first. The business may not be happy with the extra expense of the large refactoring, so the principle of keeping it small still applies. However, the technical team must also be willing to push back to the business under the right circumstances.
Refactoring Principle Three: Team Cohesion
Teamwork in Agile requires high levels of communication and collaboration. In refactoring work, teamwork applies just as much as in any other activity. Here, it is critical that all members of the team have a unified understanding of the principles and purpose of refactoring. But that is just the first level of team cohesion around refactoring.
The next level of team cohesion comes in the tools, techniques and practices that a team uses in refactoring. Examples include the unit testing frameworks, the mocking frameworks, the automation provided by development tools, continuous integration, and perhaps most importantly, the team working agreements about standard objectives of refactoring. This last idea is best expressed by the concept of refactoring to patterns.
The highest level of team cohesion in refactoring comes from collective code ownership and trust. Usually, this is built from practices such as pair programming or mob programming. These practices create deep levels of shared understanding among team members. This shared understanding leads to self-organizing behaviour in which team members make independent decisions that they know the other team members will support. It also impacts research and learning processes so that teams can do experiments and try alternatives quickly. All of which leads to the ability to do refactoring, large and small, quickly and without fear.
Refactoring Principle Four: Transparency
In many ways, this is the simplest refactoring principle: the team needs to be completely open and honest with all stakeholders about the cost of refactoring. This can be difficult at first. Another analogy helps to see the value of this. A surgeon does not hide the fact that care is put into creating a clean operating environment: washing hands, sterilizing instruments, wearing face masks and hair covers, restricted spaces, etc. In fact, all of those things contribute to the cost of surgery. A surgeon is a professional who has solid reasons for doing all those things and is open about the need for them. Likewise, software professionals need to be open about the costs of refactoring. This comes back to the main point of the first part of this article: hidden and deferred costs will still need to be paid… but with interest. Software professionals are up-front about the costs because doing so both minimizes the costs and gives stakeholders important information to make decisions.
The challenge for business stakeholders is to accept the costs. Respecting the team and trusting their decisions can sometimes be very hard. Teams sometimes make mistakes too, which complicates trust-building. The business stakeholders (for example, the Product Owner) must allow the team freedom to do refactoring. Ideally, it is continuous, small, and low-level. But once in a while, a team will have to do a large refactoring. How do you know if the cost is legitimate? Unfortunately, as a non-technical stakeholder, you can’t know with certainty. However, there are a few factors that can help you understand the cost and its legitimacy, namely, the principles that are described here.
If the refactoring is small, it is more likely to be legitimate.
If the refactoring is in response to a business catalyst, it is more likely to be legitimate.
If the refactoring is reflective of team cohesion, it is more likely to be legitimate.
And, of course, if the refactoring is made transparent, it is more likely to be legitimate.
As a product owner, what are the best ways to record technical debt and what are some approaches to prioritizing that work amid the continuous delivery of working software?
Hi Meredith! This is an interesting question. I’ll start by answering the second part of your question first. The two most common ways of handling technical debt, quality debt and legacy debt are:
Fix as you go. The Scrum Team works on new PBIs every Sprint, but every time a PBI touches a technical, quality or legacy debt area, the team fixes “just enough” to make the PBI implementation have no debt. This means that refactoring and the creation of automated tests (usually through TDD) are done on the parts of the product/system that have the problems.
Allocate a percentage. In this scenario, the Scrum Team reduces its velocity (sometimes significantly) to allow for time to deal with the technical, quality and legacy issues. This reduction could be adjusted every Sprint, but is usually consistent for several Sprints in a row.
In both approaches, the business is paying for the debt accumulated, and the cost includes an “interest” fee. In other words, the sooner you fix technical, quality and legacy debt, the less it costs. This approach to thinking about your product/system is essential for long-term sustainability. One organization I worked with took three years working on their system to clean it up without being able to add any new features! Don’t let your system get to that point.
Now to the first part of your question…
As a Product Owner, you shouldn’t really be making decisions about this cleanup work. Your authority is limited to the Product Backlog which should not include technical items. The only grey area here is with defects which may be hard to classify as either fully business or fully technical. But technical design, duplication of code, technical defects, and legacy code all are under the full authority of the Scrum Development Team. Practically, this means that every Sprint the team has the authority to choose however few PBIs they feel they can take while considering the technical state of the product/system. We trust and respect the team to make wise decisions.
Therefore, your main job as a Product Owner is to provide the team with as much information as possible about the business consequences of the work they are doing. With strong communication and collaboration about this aspect of their work, the technical members of your team can make good trade-off decisions, and balance the need for new features with the need to clean up previous compromises in quality.
A final note: in order for this to work well, it is critical that the team not be pushed to further sacrifice quality and that they are given the support to learn the techniques and skills to create debt-free code. (You might consider sending someone to our CSD training to learn these techniques and skills.)
Using these techniques, I have been able to help teams get very close to defect-free software deliveries (defect rates of 1 or 2 in production per year!).
Let me know in the comments if you would like any further clarification.
When asked to provide metrics to assess “how well” an Agile transformation is going, re-frame the discussion around measuring changes in the impact the IT organization is having (or not) on its Business environment, and define a small set of “fitness for purpose” metrics.
The Inevitable Question about Agile Transformation Metrics
Sooner or later, as an IT organization embarks on a transformation towards Agile mindset and practices, someone will be asked to provide “hard evidence” that the effort is paying off, and the conclusion will be that metrics is the vehicle to satisfy that request. What are your Agile transformation metrics?
It’s been my experience that this request usually leads to a discussion about measuring the specific Agile initiatives the IT organization has launched. In organizations where the emphasis has been around engineering disciplines, such metrics might be things like unit test code coverage, or integration build times. If the focus was around teams and process, then counting number of teams “converted” to Scrum, or people sent to Scrum Master training may appear as the choice.
While those measurements might be useful indicators in some contexts, they have two problems. First, they are akin to measuring the performance of the car engine without looking outside the window; the engine might be performing well, but if the car doesn’t have the wheels attached, we’re going nowhere. More importantly, though, these figures are usually meaningless for Business stakeholders, who are the ones usually asking for them in the first place. Agile transformation metrics need to be meaningful to the Business.
Re-framing the Agile Transformation Metrics Question
Agile transformation efforts can be very costly exercises, so it is legitimate to ask about the results of such an endeavour. The important thing to realize, though, is that this question is really equivalent to another question: “is the IT organization improving its impact on its Business environment?” Another way to put it, borrowing from the terminology used by the Kanban community, is: “is the IT organization becoming more and more fit for purpose?” Answering this question, of course, requires a clear understanding of what it is that the Business expects from its interactions with IT.
The IT organization can be seen as providing various services to customers. Arguably, if IT has decided to “transform” in some way (perhaps by moving towards an Agile mindset), it’s doing so to improve its impact on those customers, so this is what needs to be measured to know “how the transformation” is going.
Some of those customers are different areas of the organization (like Finance, or HR). But it doesn’t stop there, because the Business’ engagement with IT doesn’t have value for its own sake. Ultimately, the Business is using IT as a way to optimize its operations so that it can provide external customers with more effective products and services. Moreover, IT is these days the direct channel through which those products and services are delivered to external customers (for example, through web sites and mobile applications). Therefore, the concept of “fitness for purpose” of the IT organization can be extended to consider the fitness for purpose of the Business with respect to the external customers it intends to serve.
Defining the “Agile” Transformation Metrics
Measuring “agile transformation success” really means measuring the success of the exchanges between IT and the Business, and between the Business and its external customers. Measuring the internal processes and practices that IT puts in place as part of that “transformation” is beside the point. This implies starting with a careful definition of the boundaries that delineate the exchanges to be measured. There might be more to external customer fitness for purpose than IT operations, for example, and that needs to be considered when defining Agile transformation metrics, especially if we’re later going to be drawing causation conclusions.
Defining Agile transformation metrics will be, of course, a highly contextual exercise because every business organization is different. We can, however, draw again from the Kanban community for some general guidelines on what to look for. Their thought leaders talk about classifying metrics into three categories: fitness for purpose metrics, health indicators and improvement drivers. Using this framework, when talking about “agile transformation metrics” we are referring mainly to the first category, and perhaps a bit to the second. Based on those, improvement initiatives can be put in place, and perhaps driven with metrics belonging to the third category.
A fitness for purpose metric (also known as KPI) is an indicator of something a customer will care about. This is a key distinction: if the metric is not easily recognizable and meaningful for the customer, then it’s not a KPI. Another key characteristic is that a minimum threshold for its value can be defined: if the metric goes below the threshold, the Business is putting the relation with its customers at risk (perhaps they will walk away, initiate legal actions, etc.). In other words, the Business is no longer “fit for purpose”. We can then measure the effectiveness of the “agile transformation” by analyzing how KPI values over time compare to their respective thresholds. A typical KPI is delivery time, measured from the moment a customer request is accepted and committed to, until the moment it’s delivered to production. This is usually a good Agile transformation metric.
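As a rough sketch of what comparing a KPI to its threshold over time could look like, here is a small Python example for delivery time. The sample figures, the 30-day threshold and all of the names are invented for illustration. Note that for delivery time the threshold acts as a ceiling rather than a floor: the Business stops being fit for purpose when delivery takes longer than customers will tolerate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KpiSample:
    period: str           # reporting period label (made up for this sketch)
    delivery_days: float  # delivery time from commitment to production


def periods_over_threshold(samples: List[KpiSample], max_days: float) -> List[str]:
    """Return the periods in which delivery time exceeded the agreed ceiling."""
    return [s.period for s in samples if s.delivery_days > max_days]


# Invented sample data: delivery time improving quarter over quarter.
samples = [
    KpiSample("Q1", 45.0),
    KpiSample("Q2", 32.0),
    KpiSample("Q3", 21.0),
]

# Assumed threshold: customers tolerate at most 30 days.
unfit_periods = periods_over_threshold(samples, max_days=30.0)
print(unfit_periods)  # ['Q1', 'Q2']
```

A shrinking list of unfit periods is the kind of evidence a Business stakeholder can recognize without knowing anything about the practices behind it.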
Health indicators are metrics that are inwards facing. Customers don’t really care about them (or even understand them), but they indicate how a given aspect of the system is operating. The key characteristic is that they are not directly actionable; they only provide information that needs to be analyzed and put in context. As the value of a health indicator changes, we can draw some conclusions about how the system works, or explain why something is happening (or not), but it doesn’t necessarily lead to concrete action. Defect count is an example of this. Customers will certainly care about quality of the product, and we can make inferences about that quality by looking at how many defects we have, but the absolute number of defects will not necessarily make the product more or less fit for purpose. It may happen that customers consider the current quality to be “good enough”, irrespective of the number of defects.
Finally, improvement driver metrics are metrics put in place to influence behaviour towards a particular change. Their key characteristic is that they are temporary: we set a target on them and once the target is achieved, the metric is no longer necessary. The reason for this is related to the unintended behaviours that a metric might encourage in people, which may lead to locally optimizing the metric at the expense of other aspects, leading to global sub-optimization of the system. An example is unit testing code coverage: if we have determined that a given service is not fit for purpose and the cause is related to poor unit test coverage, then establishing a target for minimum coverage may influence developers to work on adding tests to reverse the situation.
Technical Debt is a term which captures sloppy code, unmaintainable architecture, clumsy user experience, cluttered visual layout, bloated feature-sets, etc. My stance is that the term, Technical Debt, includes all the problems which occur when people defer professional discipline — regarding any/every technical practice such as product management, visual and UX design, or code.
I assert that the change we need to catalyze in the business community is larger than any one discipline, and I am worried that I have seen an increase in blog articles in recent years about concepts like “Design Debt”, “UX Debt” and “Experience Debt”. These articles unfortunately are not helping and have served only to divide the community. They are divisive, not because we shouldn’t be discussing the discrete facets in which Technical Debt can manifest, but because authors often take a decidedly combative approach in their writing. Take these phrases for example:
“Product Design Debt Versus Technical Debt” written by Andrew Chen
“User Experience Debt: Technical debt is only half the battle” written by Clinton Christian
“Design debt is more dangerous because…” written by James Engwall.
I agree with Andrew Chen that Product Design Debt is a problem — I just don’t like that he chose to impose a dichotomy where there is none. Why must he argue one “versus” another? Clinton Christian has implied that we’re in a “battle”. James Engwall has compared the “danger” of Design Debt relative to Technical Debt. These words are damaging, I argue, because they divert attention to symptoms and away from root causes.
The root cause of Technical Debt is that people forget this simple principle of the Agile Manifesto: “Continuous attention to technical excellence and good design enhances agility.”
The root solution to Technical Debt — all of its forms — is to help business leaders realize there is a difference between “incremental” development and “iterative” development so they may understand the ROI of refactoring. No technical expert should ever have to justify the business case for feature-pruning, refreshing a user interface, refactoring code, prioritizing defects. Every business leader should trust that their technical staff are disciplined and excellent.
Yes, please blog about UX Debt and Product Development Debt, etc. But please do so in a way that encourages cohesion and unity within the Product Development community.
The other day a technology leader was asking questions as if he didn’t agree that things like pair programming and code review should be part of the Definition of “Done” because they are activities that don’t show up in a tangible way in the end product. But if these things are part of your quality standards, they should be included in the definition of “Done” because they inform the “right way” of getting things done. In other words, the Definition of “Done” is not merely a description of the “Done” state but also the way(s) of getting to “Done” – the “how” in terms of quality standards. In fact, if you look at most of any team’s definition of “Done”, a lot of it is QA activity, carried out either as a practice or as an operation that is automated. Every agile engineering practice is essentially a quality standard and as they become part of a team’s practice, should be included as part of the definition of “Done”. The leader’s question was “if we’re done and we didn’t do pair programming and pair programming is part of our definition of “Done”, then does that mean we’re not done?” Which is sort of a backwards question because if you are saying you’re done and you haven’t done pair programming, then by definition pair programming isn’t part of your definition of done. But there are teams out there who would never imagine themselves to be done without pair programming because pair programming is a standard that they see as being essential to delivering quality product.
Everything that a Scrum Development Team does to ensure quality should be part of their definition of “Done”. The definition of “Done” isn’t just a description of the final “Done” state of an increment of product. In fact, if that were true, then we should be asking why anything is part of the definition of “Done”. This is the whole problem that this artifact solves. If this were the case, the team could just say that they are done whenever they say they are done and never actually identify better ways of getting to done and establishing better standards. We could just say (and we did and still do), “there’s the software, it’s done,” the software itself being its own definition of “Done”.
On the contrary the definition of “Done” is what it means for something to be done properly. In other words, it is the artifact that captures the “better ways of developing software” that the team has uncovered and established as practice because their practices reflect their belief that “Continuous attention to technical excellence and good design enhances agility” (Manifesto for Agile Software Development). The definition of “Done” is essentially about integrity—what is done every Sprint in order to be Agile and get things done better. When we say that testing is part of our definition of “Done”, that is our way of saying that as a team we have a shared understanding that it is better to test something before we say that it is done than to say that it’s done without testing it because without testing it we are not confident that it is done to our standards of quality. Otherwise, we would be content in just writing a bunch of code, seeing that it “works” on a workstation or in the development environment and pushing it into production as a “Done” feature with a high chance that there are a bunch of bugs or that it may even break the build.
This is similar to saying a building is “Done” without an inspection (activity/practice) confirming that it meets certain safety standards, or to a surgeon saying that he or she has done a good enough job of performing a surgical operation without monitoring the vital signs of the patient (partly automated, partly a human activity). Of course, this is false. The same logic holds true when we add other activities (automated or otherwise) that reflect more stringent quality standards around our products. The definition of “Done”, therefore, is partly made up of a set of activities that make up the standard quality practices of a team.
Professions have standards. For example, it is a standard practice for a surgeon to wash his or her hands between performing surgical operations. At one time it wasn’t. Much like TDD or pair programming, it was discovered as a better way to get a job done. In this day and age, we would not say that a surgeon had done a good job if he or she failed to carry out this standard practice. It would be considered preposterous for someone to say that they don’t care whether or not surgeons wash their hands between operations as long as the results are good. If a dying patient said to a surgeon, “don’t waste time washing your hands just cut me open and get right to it,” of course this would be dismissed as an untenable request. Regardless of whether or not the results of the surgery were satisfactory to the patient, we would consider it preposterous that a surgeon would not wash his or her hands because we know that this is statistically extremely risky, even criminal behaviour. We just know better. Hand washing was discovered, recognized as a better way of working, formalized as a standard and is now understood by even the least knowledgable members of society as an obvious part of the definition of “Done” of surgery. Similarly, there are some teams that would not push anything to production without TDD and automated tests. This is a quality standard and is therefore part of their definition of “Done”, because they understand that manual testing alone is extremely risky. And then there are some teams with standards that would make it unthinkable for them to push a feature that has not been developed with pair programming. For these teams, pair programming is a quality standard practice and therefore part of their definition of “Done”.
“As Scrum Teams mature,” reads the Scrum Guide, “it is expected that their definitions of “Done” will expand to include more stringent criteria for higher quality.” What else is pair programming, or any other agile engineering practice, if it is not a part of a team’s criteria for higher quality? Is pair programming not a more stringent criterion than, say, traditional code review? Therefore, any standard, be it a practice or an automated operation, that exists as part of the criteria for higher quality should be included as part of the definition of “Done”. If it’s not part of what it means for an increment of product to be “Done” (that is, “done right”), then why are you doing it?
I was asked yesterday what measurements a team could start to take to track their progress towards continuous delivery. Here are some initial thoughts.
Lead time per work item to production
Lead time starts the moment we have enough information that we could start the work (i.e., it’s “ready”). Most teams that measure lead time will stop the clock when that item reaches the team’s definition of “done”, which may or may not mean that the work is in production. In this case, we want to explicitly keep tracking the time until it really is in production.
Note that when we’re talking about continuous delivery, we make the distinction between deploy and release. Deploy is when we’ve pushed it to the production environment and release is when we turn it on. This measurement stops at the end of deploy.
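Here is a minimal Python sketch of that measurement, assuming each work item has two timestamps: when it became “ready” and when it was deployed to production. The timestamps and names are invented for illustration; in practice they would come from whatever tracking tool the team uses.

```python
from datetime import datetime
from statistics import median


def lead_time_days(ready_at: datetime, deployed_at: datetime) -> float:
    """Lead time from 'ready' to the end of the production deploy, in days."""
    return (deployed_at - ready_at).total_seconds() / 86400


# Invented work items: (ready timestamp, production deploy timestamp).
items = [
    (datetime(2015, 3, 2, 9, 0), datetime(2015, 3, 12, 9, 0)),  # 10 days
    (datetime(2015, 3, 4, 9, 0), datetime(2015, 3, 8, 21, 0)),  # 4.5 days
    (datetime(2015, 3, 9, 9, 0), datetime(2015, 3, 30, 9, 0)),  # 21 days
]

lead_times = [lead_time_days(ready, deployed) for ready, deployed in items]
median_lead_time = median(lead_times)
print(median_lead_time)  # 10.0
```

The median is used here instead of the mean so that one unusually slow item does not dominate the number. That is a design choice for this sketch, not a rule.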
Cycle time to “done”
If the lead time above is excessively long then we might want to track just cycle time. Cycle time starts when we begin working on the item and stops when we reach “done”.
When teams are first starting their journey to continuous delivery, lead times to production are often measured in months and it can be hard to get sufficient feedback with cycles that long. Measuring cycle time to “done” can be a good intermediate measurement while we work on reducing lead time to production.
If a bug is discovered after the team said the work was done then we want to track that. Prior to hitting “done”, it’s not really a bug – it’s just unfinished work.
Shipping buggy code is bad and this should be obvious. Continuously delivering buggy code is worse. Let’s get the code in good shape before we start pushing deploys out regularly.
Defect fix times
How old is the oldest reported bug? I’ve seen teams that had bug lists that went on for pages and where the oldest were measured in years. Really successful teams fix bugs as fast as they appear.
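Answering the “oldest bug” question takes very little code. The following Python sketch assumes you can export the report dates of all currently open bugs. The dates and the function name are made up for illustration.

```python
from datetime import date
from typing import List, Optional


def oldest_open_bug_age_days(reported_on: List[date], today: date) -> Optional[int]:
    """Age in days of the oldest open bug, or None when no bugs are open."""
    if not reported_on:
        return None
    return (today - min(reported_on)).days


# Invented report dates for bugs that are still open.
open_bugs = [date(2014, 1, 15), date(2015, 2, 1)]
age = oldest_open_bug_age_days(open_bugs, today=date(2015, 3, 1))
print(age)  # 410
```

Tracking this number over time shows whether the team is moving towards fixing bugs as fast as they appear.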
Total regression test time
Track the total time it takes to do a full regression test. This includes both manual and automated tests. Teams that have primarily manual tests will measure this in weeks or months. Teams that have primarily automated tests will measure this in minutes or hours.
This is important because we would like to do a full regression test prior to any production deploy. Not doing that regression test introduces risk to the deployment. We can’t turn on continuous delivery if the risk is too high.
Time the build can be broken
How long can your continuous integration build be broken before it’s fixed? We all make mistakes. Sometimes something gets checked in that breaks the build. The question is how important is it to the team to get that build fixed? Does the team drop everything else to get it fixed or do they let it stay broken for days at a time?
Continuous delivery isn’t possible with a broken build.
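If your continuous integration server can export a history of build results, the time the build stays broken can be derived from it. The following is a minimal sketch in Python, assuming a chronological list of `(finished_at, passed)` pairs; that input shape and the function name are assumptions for this example, not the output format of any specific CI tool.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def broken_durations(builds: List[Tuple[datetime, bool]]) -> List[timedelta]:
    """Measure each stretch from the first failing build to the next passing one.

    `builds` must be in chronological order. A build that is still broken
    at the end of the history is not counted, because it has no end time yet.
    """
    durations: List[timedelta] = []
    broken_since: Optional[datetime] = None
    for finished_at, passed in builds:
        if not passed and broken_since is None:
            broken_since = finished_at  # the build just broke
        elif passed and broken_since is not None:
            durations.append(finished_at - broken_since)  # the build was fixed
            broken_since = None
    return durations


# Hypothetical build history for one morning.
builds = [
    (datetime(2015, 3, 2, 9, 0), True),
    (datetime(2015, 3, 2, 10, 0), False),
    (datetime(2015, 3, 2, 10, 30), False),
    (datetime(2015, 3, 2, 11, 15), True),
]
print([d.total_seconds() / 60 for d in broken_durations(builds)])  # [75.0]
```

The longest value in the result is a reasonable proxy for how seriously the team treats a broken build.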
Number of branches in version control
By the time you’re ready to turn on continuous delivery, you’ll only have one branch. Measuring how many you have now and tracking that over time will give you some indication of where you stand.
If your code isn’t in version control at all then stop taking measurements and just fix that one right now. I’m aware of teams in 2015 that still aren’t using version control and you’ll never get to continuous delivery that way.
Production outages during deployment
If your production deployments require taking the system offline then measure how much time it’s offline. If you achieve zero-downtime deploys then stop measuring this one. Some applications such as batch processes may never require zero-downtime deploys. Interactive applications like webapps absolutely do.
I don’t suggest starting with everything at once. Pick one or two measurements and start there.
At Berteig Consulting we have been working for 10 years to learn how to help organizations transform people, process and culture. The problem is simple to state: there is a huge amount of opportunity waste and process waste in most normal enterprise-scale organizations. If you have more than a couple hundred people in your organization, this almost certainly affects you.
We like to call this problem “the Bureaucratic Beast”. The Bureaucratic Beast is a self-serving monster that seems to grow and grow and grow. As it grows, this Beast makes it progressively more difficult for business leaders to innovate, respond to changes in the market, satisfy existing customers, and retain great employees.
Real Agility, a system to tame the Bureaucratic Beast, comes from our experience working with numerous enterprise Agile adoptions. This experience, in turn, rests on the shoulders of giants like John Kotter (“Leading Change”), Edgar Schein (“The Corporate Culture Survival Guide”), Jim Collins (“Good to Great” and “Built to Last”), Mary Poppendieck (“Lean Software Development”), Jon Katzenbach (“The Wisdom of Teams”) and Frederick Brooks (“The Mythical Man-Month”). Real Agility is designed to tame all the behaviours of the Bureaucratic Beast: inefficiency, disengaged staff, poor quality and slow time-to-market.
Studies have proven that Agile methods work in IT. In 2012, the Standish Group observed that 42% of Agile projects succeed vs. just 14% of projects done with traditional “Bureaucratic Beast” methods. Agile and associated techniques aren’t just for IT. There is growing use of these same techniques in non-technology environments such as marketing, operations, sales, education, healthcare, and even heavy industry like mining.
Real Agility Basics: Agile + Lean
Real Agility is a combination of Agile and Lean; both systems used harmoniously throughout an enterprise. Real Agility affects delivery processes by taking long-term goals and dividing them into short cycles of work that deliver valuable results rapidly while providing fast feedback on scope, quality and most importantly value. Real Agility affects management processes by finding and eliminating wasteful activities with a system view. And Real Agility affects human resources (people!) by creating “Delivery Teams” which have clear goals, are composed of multi-skilled people who self-organize, and are stable in membership over long periods of time.
There are lots of radical differences between Real Agility and traditional management (that led to the Bureaucratic Beast in the first place). Real Agility prioritizes work by value instead of critical path, encourages self-organizing instead of command-and-control management, a team focus instead of project focus, evolving requirements instead of frozen requirements, skills-based interactions instead of roles-based interaction, continuous learning instead of crisis management, and many others.
Real Agility is built on a rich Agile and Lean ecosystem of values, principles and tools. Examples include the Agile Manifesto, the “Stop the Line” practice, various retrospective techniques, methods and frameworks such as Scrum and OpenAgile, and various thinking tools compatible with the Agile – Lean ecosystem such as those developed by Edward de Bono (“Lateral Thinking”) and Genrich Altshuller (“TRIZ”).
Real Agility acknowledges that there are various approaches to Agile adoption at the enterprise level: Ad Hoc (not usually successful – Nortel tried this), Grassroots (e.g. Yahoo!), Pragmatic (SAFe and DAD fall into this category), Transformative (the best balance of speed of change and risk reduction – this is where the Real Agility Program falls), and Big-Bang (only used in situations of true desperation).
Why Choose Transformative?
One way to think about these five approaches to Agile adoption is to compare the magnitude of actual business results. This is certainly the all-important bottom line. But most businesses also consider risk (or certainty of results). Ad-Hoc approaches to Agile adoption have poor business results and a very high level of risk. Big-Bang approaches (changing a whole enterprise to Agile literally overnight) often have truly stunning business results, but are also extremely high risk. Grassroots, where leaders give staff a great deal of choice about how and when to adopt Agile, is a bit better in that the risk is lower, but the business results often take quite a while to manifest themselves. Pragmatic approaches tend to be very low risk because they often accommodate the Bureaucratic Beast, but that also limits their business results to merely “good” and not great. Transformative approaches, which systematically address organizational culture, are just a bit riskier than Pragmatic approaches, but the business results are generally outstanding.
More specifically, Pragmatic approaches such as SAFe (Scaled Agile Framework) are popular because they are designed to fit in with existing middle management structures (where the Bureaucratic Beast is most often found). As a result, there is slow incremental change that typically has to be driven top-down from leadership. Initial results are good, but modest. And the long term? These techniques haven’t been around long enough to know, but in theory it will take a long time to get to full organizational Agility. Bottom line is that Pragmatic approaches are low risk but the results are modest.
Transformative approaches such as the Real Agility Program (there are others too) are less popular because there is significantly more disruption: the Bureaucratic Beast has to be completely tamed to serve a new master: business leadership! Transformative approaches require top-to-bottom organizational and structural change. They include a change in power relationships to allow for grassroots-driven change that is empowered by servant leaders. Transformative approaches are moderate in some ways: they are systematic and they don’t require all change to be done overnight. Nevertheless, often great business results are obtained relatively quickly. There is a moderate risk that the change won’t deliver the great results, but that moderate risk is usually worth taking.
Regardless of adoption strategy (Transformative or otherwise) there are a few critical success factors. Truthfulness is the foundation because without it, it is impossible to see the whole picture including organizational culture. And love is the strongest driver of change because cultural and behavioural change requires emotional commitment on the part of everyone.
Culture change is often challenging. There are unexpected problems. Two steps forward are often followed by one step back. Some roadblocks to culture change will be surprisingly persistent. Leaders need patience and persistence… and a systematic change program.
The Real Agility Program
The Real Agility Program has four tracks or lines of action:
Recommendations: consultants assess an organization and create a playbook that customizes the other tracks of the Real Agility Program as well as dealing with any important outliers.
Execution: coaches help to launch project, product and operational Delivery Teams and Delivery Groups that learn the techniques of grassroots-driven continuous improvement.
Accompaniment: trainer/coaches help you develop key staff into in-house Real Agility Coaches that learn to manage Delivery Groups for sustainable long-term efforts such as a product or line of business.
Leadership: coaches help your executive team to drive strategic change for long-term results with an approach that helps executives lead by example for enterprise culture change.
Structurally an enterprise using Real Agility is organized into Delivery Groups. A Delivery Group is composed of one or more Delivery Teams (up to 150 people) who work together to produce business results. Key roles include a Business leader, a People leader and a Technology leader all of whom become Real Agility Coaches and take the place of traditional functional management. As well, coordination across multiple Delivery Teams within a Delivery Group is done using an organized list of “Value Drivers” maintained by the Business leader and a supporting Business Leadership Group. Cross-team support is handled by a People and Technology Support Group co-led by the People and Technology leaders. Depending on need there may also be a number of communities of practice for Delivery Team members to help spread learning.
At an organizational or enterprise level, the Leadership Team includes top executives from business, finance, technology, HR, operations and any other critical parts of the organization. This Leadership Team communicates the importance of the changes that the Delivery Groups are going through. They lead by example using techniques from Real Agility to execute organizational changes. And, of course, they manage the accountability of the various Delivery Groups throughout the enterprise.
The results of using the Real Agility Program are usually exceptional. Typical results include:
20x improvement in quality
10x improvement in speed to market
5x improvement in process efficiency
and 60% improvement in employee retention.
Of course, these results depend on baseline measures and on key risk factors being properly managed by the Leadership Team.
Not every organization needs (or is ready for) the Real Agility Program. Your organization is likely a good candidate if three or more of the following problems are true for your organization:
Leaders of Agile Transformations for the Enterprise need to have good sources of information, concepts and techniques that will guide and assist them. This short list of twelve books (yes, books) is what I consider critical reading for any executive, leader or enterprise change agent. Of course, there are many books that might also belong on this list, so if you have suggestions, please make them in the comments.
I want to be clear about the focus of this list: it is for leaders that need to do a deep and complete change of culture throughout their entire organization. It is not a list for people who want to do Agile pilot projects and maybe eventually lots of people will use Agile. It is about urgency and need, and about a recognition that Agile is better than not-Agile. If you aren’t in that situation, this is not the book list for you.
These books all help you to understand and work with the deeper aspects of corporate behaviour which are rooted in culture. Becoming aware of culture and learning to work with it is probably the most difficult part of any deep transformation in an organization.
The Corporate Culture Survival Guide – Edgar Schein
Beyond the Culture of Contest – Michael Karlberg
The Heart of Change – John Kotter
This set of books gets a bit more specific: it is the “how” of managing and leading in high-change environments. These books all touch on culture in various ways, and build on the ideas in the books about culture. For leaders of an organization, there are dozens of critical, specific, management concepts that often challenge deeply held beliefs and behaviours about the role of management.
Good to Great – Jim Collins
The Leader’s Guide to Radical Management – Steve Denning
The Mythical Man-Month – Frederick Brooks
Agile at Scale
These books discuss how to get large numbers of people working together effectively. They also start to get a bit technical and definitely assume that you are working in technology or IT. However, they are focused on management, organization and process rather than the technical details of software development. I highly recommend these books even if you have a non-technical background. There will be parts where it may be a bit more difficult to follow along with some examples, but the core concepts will be easily translated into almost any type of work that requires problem-solving and creativity.
Scaling Lean and Agile Development – Bas Vodde, Craig Larman
Scaling Software Agility – Dean Leffingwell
Lean Software Development – Mary and Tom Poppendieck
These books (including some free online books) are related to some of the key supporting ideas that are part of any good enterprise Agile transformation.
Toyota Talent: Developing Your People the Toyota Way – Jeffrey Liker, David Meier
Whenever I run a Certified Scrum Product Owner training session, one concept stands out as critical for participants: the relationship of the Product Owner to the technical demands of the work being done by the Scrum team.
The Product Owner is responsible for prioritizing the Product Backlog. This responsibility is, of course, also matched by their authority to do so. When the Product Owner collaborates with the team in the process of prioritization, there may be ways in which the team “pushes back”. There are two possible reasons for push-back. One is good, one is bad.
Bad Technical Push-Back
The team may look at a product backlog item or a user story and say “Oh gosh! There’s a lot there to think about! We have to build this fully-architected infrastructure before we can implement that story.” This is old waterfall thinking. It is bad. The team should always be thinking (and doing) YAGNI and KISS. Technical challenges should be solved in the simplest responsible way. Features should be implemented with the simplest technical solution that actually works.
As a Product Owner, one technique you can use to help teams with this is to aggressively keep the user story as simple as possible whenever the team asks questions. The questions that are asked may lead to the creation of new stories, or to splitting the existing story. Here is an example…
Suppose the story is “As a job seeker I can post my resume to the web site…” If the technical team makes certain assumptions, they may create a complex system that allows resumes to be uploaded in multiple formats with automatic keyword extraction, and even beyond that, they may anticipate that the code needs to be ready for edge cases like WordPerfect format. The technical team might also assume that the system needs a database schema that includes users, login credentials, one-to-many relationships with resumes, detailed structures about jobs, organizations, positions, dates, educational institutions, etc. The team might insist that creating a login screen in the UI is an essential prerequisite to allowing a user to upload their resume. And as for business logic, they might decide that in order to implement all this, they need some sort of standard intermediate XML format that all resumes will be translated into so that searching features are easier to implement in the future.
Why is all of this bad? Because that’s not what the Product Owner asked for. The thing that’s really difficult for a team of techies to get with Scrum is that software is to be built incrementally. The very first feature built is built in the simplest responsible way without assuming anything about future features. In other words, build it like it is the last feature you will build, not the first. In the Agile Manifesto this is described as:
Simplicity, the art of maximizing the amount of work not done, is essential.
The second feature the team builds should only add exactly what the Product Owner asks for. Again, as if it was going to be the last feature built. Every single feature (User Story / Product Backlog Item) is treated the same way. Whenever the team starts to anticipate the business in any of these three ways, the team is wrong:
Building a feature because the team thinks the Product Owner will want it.
Building a feature because the Product Owner has put it later on the Product Backlog.
Building a technical aspect of the system to support either of the first types of anticipation, even if the team doesn’t actually build the feature they are anticipating.
Okay, but what about architecture? Fire your architects. No kidding.¹
Good Technical Push-Back
Sometimes stuff gets non-simple: complicated, messy, hard to understand, hard to change. This happens despite us techies all being super-smart. Sometimes, in order to implement a new feature, we have to clean up what is already there. The Product Owner might ask the Scrum Team to build this Product Backlog Item next and the team says something like: “yes, but it will take twice as long as we initially estimated, because we have to clean things up.” This can be greatly disappointing for the Product Owner. But, this is actually the kind of push-back a Product Owner wants. Why? In order to avoid destroying your business! (Yup, that serious.)
This is called “Refactoring” and it is one of the critical Agile Engineering practices. Martin Fowler wrote a great book about this about 15 years ago. Refactoring is, simply, improving the design of your system without changing its business behaviour. A simple example is changing a set of 3 radio buttons in the UI to a drop-down box with 3 options… so that later, the Product Owner can add 27 more options. Refactoring at the level of code is often described as removing duplication. But some types of refactoring are large: replacing a relational database with a NoSQL database, moving from Java to Python for a significant component of your system, doing a full UX re-design on your web application. All of these are changes to the technical attributes of your system that are driven by an immediate need to add a new feature (or feature set) that is not supported by the current technology.
The Product Owner has asked for a new feature, now, and the team has decided that in order to build it, the existing system needs refactoring. To be clear: the team is not anticipating that the Product Owner wants some feature in the future; it’s the very next feature that the team needs to build.
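To make “removing duplication without changing business behaviour” concrete, here is a small sketch in Python. Everything in it is invented for illustration (the discount rule, the shipping fee, the function names); the point is only that the refactored functions return exactly the same results as the originals, which the final checks confirm.

```python
SHIPPING_CENTS = 500  # hypothetical flat shipping fee


# Before: the same discount rule is written out twice.
def invoice_total_before(prices_cents):
    total = 0
    for price in prices_cents:
        if price > 10000:
            total += price - price // 10  # 10% off items over $100
        else:
            total += price
    return total + SHIPPING_CENTS


def quote_total_before(prices_cents):
    total = 0
    for price in prices_cents:
        if price > 10000:
            total += price - price // 10  # duplicated rule
        else:
            total += price
    return total


# After: the rule lives in one place, so the next feature
# (say, a new discount tier) only has to change one function.
def discounted(price_cents):
    if price_cents > 10000:
        return price_cents - price_cents // 10
    return price_cents


def invoice_total(prices_cents):
    return sum(discounted(p) for p in prices_cents) + SHIPPING_CENTS


def quote_total(prices_cents):
    return sum(discounted(p) for p in prices_cents)


# Business behaviour is unchanged: old and new versions agree.
sample = [2500, 10000, 20000]
assert invoice_total(sample) == invoice_total_before(sample)
assert quote_total(sample) == quote_total_before(sample)
print(invoice_total(sample), quote_total(sample))  # 31000 30500
```

Checks like the two assertions at the end are what make this kind of clean-up safe: they show that the design changed while the behaviour the Product Owner cares about did not.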
This all relates to another two principles from the Agile Manifesto:
Continuous attention to technical excellence and good design enhances agility.
The best architectures, requirements, and designs emerge from self-organizing teams.
In this case, the responsibilities of the team for technical excellence and creating the best system possible override the short-term (and short-sighted) desire of the business to trade off quality in order to get speed. That trade-off always bites you in the end! Why? Because the cost of fixing quality problems increases exponentially as time passes from when they were introduced.
Refactoring is not a bad word.
Keep your code clean.
Let your team keep its code clean.
Oh. And fire your architects.
Update Sep. 8, 2015: Check out this YouTube video on the closely related topic of who has authority over the Product Backlog and why developers should not set the order of PBIs:
¹ I used to be a senior architect reporting directly to the CTO of Charles Schwab. Effectively, I fired myself and launched an incredibly successful enterprise architecture re-write project… with no up-front architecture plan. Really… fire your architects. Everything they do is pure waste and overhead. Someday I’ll write that article 🙂