Many people have concerns about using Scrum or other Agile methods on large projects that don't directly involve software development. Data warehousing projects are commonly brought up as examples where, just maybe, Scrum wouldn't work.
I have worked as a coach on a couple of such projects. Here is a brief description of how it worked (both the good and the bad) on one such project:
The project was a data warehouse migration from Oracle to Teradata. The organization had about 30 people allocated to the project. Before adopting Scrum, they had done a large amount of up-front analysis, which produced a dependency map of approximately 25,000 tables, views and ETL scripts. The dependency map was stored in an MS Access DB (!). When I arrived as the coach, the expectation was that the work would be done in dependency order and that the "team" would simply follow that sequence.
I learned all of this in the first week, while I was running boot-camp-style training on Scrum and Agile with the team and helping them prepare for their first Sprint.
I decided to challenge the assumption that the work had to follow the dependencies. I spoke with the Product Owner about possible ways to order the work by value. We discussed several factors, including:
- retiring Oracle data warehouse licenses and servers,
- retiring disk space and hardware, and
- saving CPU time with the new hardware.
The Product Owner started working on getting metrics for these three factors. He found that the data could be collected through instrumentation that was quick to implement, so we implemented it. It took about a week to get initial data from the instrumentation.
In the meantime, the Scrum teams (4 of them) started their Sprints working from the dependency analysis. I "fought" with them to address the technical challenges of letting the Product Owner order the migration work by value rather than by dependency: that is, to find a technical solution that would break the dependencies. We discussed the underlying ETL technologies, which included bash scripts, Ab Initio and a few others. We also worked on the problems of deploying every Sprint, including getting approval from the organization's architectural review board on a Sprint-by-Sprint basis. We also moved the teams a few times until we found an ideal team workspace.
After the Product Owner had the data, we sorted (ordered) the MS Access DB by business value. This involved a fairly simple calculation based primarily on the disk space and CPU time associated with each item in the DB. This database of 25,000 items became the Product Backlog. I started insisting that the teams work in this order, but the technical leads resisted strongly. This led to a few weeks of arguing around whiteboards about the underlying data warehouse ETL technology. Fundamentally, I wanted the teams to treat the data warehouse tables as the Product Backlog Items (PBIs) and to have Oracle and Teradata running simultaneously in production, with updates every Sprint to migrate data between the two platforms. The technical team kept insisting this was impossible. I didn't believe them. Frankly, I rarely believe a technical team when they cite "technical dependencies" as a reason for doing things in a particular order.
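The post doesn't give the actual formula, so the following is only a minimal sketch of what "a fairly simple calculation based primarily on disk space and CPU time" might look like, assuming a linear weighted sum. The `BacklogItem` fields, the example item names, the units and the weights are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str         # table, view or ETL script (hypothetical example names)
    disk_gb: float    # disk space attributed to the item (hypothetical unit)
    cpu_hours: float  # CPU time attributed to the item (hypothetical unit)


# Hypothetical weights: the real calculation's factors are not known.
DISK_WEIGHT = 1.0
CPU_WEIGHT = 2.0


def business_value(item: BacklogItem) -> float:
    """Score an item by the resources that migrating it would free up."""
    return DISK_WEIGHT * item.disk_gb + CPU_WEIGHT * item.cpu_hours


def order_backlog(items: list[BacklogItem]) -> list[BacklogItem]:
    """Return items highest value first; equal scores keep their original order."""
    return sorted(items, key=business_value, reverse=True)


# Usage with made-up items:
backlog = [
    BacklogItem("sales_fact", disk_gb=500, cpu_hours=40),
    BacklogItem("dim_customer", disk_gb=20, cpu_hours=5),
    BacklogItem("etl_load_orders", disk_gb=5, cpu_hours=120),
]
ordered = order_backlog(backlog)
print([item.name for item in ordered])
```

The key design point is that the ordering uses only value and ignores the dependency map entirely. That is exactly why the teams needed a technical way (running both platforms in parallel) to migrate items out of dependency order.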
Finally, after 4 Sprints of 3 weeks each, we had a breakthrough. In a one-on-one meeting, the most senior tech lead admitted to me that what I was arguing for was actually possible, but that the technical people didn't want to do it that way because it would require them to touch many of the ETL scripts multiple times; they wanted to avoid rework. I was (internally) furious about the wasted time, but I controlled my feelings and asked whether I could bring the Product Owner into the discussion. The tech lead agreed, and we had the conversation again with the PO present. The tech lead admitted that breaking the dependencies was possible and explained how it could lead to the teams touching ETL scripts more than once. The PO basically said: "Awesome! Next Sprint we're doing tables ordered by business value."
A couple of Sprints later, the first of 5 Oracle licenses was retired. The 2-year, $20M project was a success: nearly every Sprint went into production, and Oracle and Teradata ran simultaneously until the last Oracle license was retired. Although I no longer remember the financial details, the savings from delivering value early were huge. The apprentice coach on that project went on to become a well-known coach at the organization and, 10 years later, is still a huge Agile advocate!