Scrum Data Warehouse Project

Many people have concerns about the possibility of using Scrum or other Agile methods on large projects that don’t directly involve software development.  Data warehousing projects are commonly brought up as examples where, just maybe, Scrum wouldn’t work.
I have worked as a coach on a couple of such projects.  Here is a brief description of how it worked (both the good and the bad) on one such project:
The project was a data warehouse migration from Oracle to Teradata.  The organization had about 30 people allocated to the project.  Before adopting Scrum, they had done a bunch of up-front analysis work.  This analysis work resulted in a dependency map among approximately 25,000 tables, views and ETL scripts.  The dependency map was stored in an MS Access DB (!).  When I arrived as the coach, there was an expectation that the work would be done according to dependencies and that the “team” would just follow that sequence.
I learned about all of this in the first week, while I was doing boot-camp-style training on Scrum and Agile with the team and helping them prepare for their first Sprint.
I decided to challenge the assumption about working based on dependencies.  I spoke with the Product Owner about the possible ways to order the work based on value.  We spoke about a few factors including:
  • retiring Oracle data warehouse licenses / servers,
  • retiring disk space / hardware,
  • and saving CPU time with new hardware.
The Product Owner started to work on getting metrics for these three factors.  He found that the data could be made available through some instrumentation that was quick to implement, so we did that.  It took about a week to get initial data from the instrumentation.
In the meantime, the Scrum teams (4 of them) started their Sprints working on the basis of the dependency analysis.  I “fought” with them to address the technical challenges of allowing the Product Owner to order the migration work based more on value – to break the dependencies with a technical solution.  We discussed the underlying technologies for the ETL, which included bash scripts, AbInitio and a few other technologies.  We also worked on problems related to deploying every Sprint, including getting approval from the organization’s architectural review board on a Sprint-by-Sprint basis.  We also had the teams moved a few times until an ideal team workspace was found.
After the Product Owner found the data, we sorted (ordered) the MS Access DB by business value.  This involved a fairly simple calculation based primarily on the disk space and CPU time associated with each item in the DB.  This database of 25,000 items became the Product Backlog.  I started to insist that the teams work based on this order, but there was extreme resistance from the technical leads.  This led to a few weeks of arguing around whiteboards about the underlying data warehouse ETL technology.  Fundamentally, I wanted the teams to treat the data warehouse tables as the PBIs and have both Oracle and Teradata running simultaneously (in production), with updates every Sprint for migrating data between the two platforms.  The technical team kept insisting this was impossible.  I didn’t believe them.  Frankly, I rarely believe a technical team when they claim “technical dependencies” as a reason for doing things in a particular order.
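To give a flavour of what that ordering looked like, here is a minimal sketch in Python.  The field names, weights and sample items are all hypothetical (the real data lived in an MS Access DB and I no longer remember its schema); the point is just that the “business value” calculation was a simple weighted sum of disk space and CPU time, and the backlog was the full item list sorted by that score:

```python
# Hypothetical sketch of ordering migration items by business value.
# Field names and weights are illustrative, not from the actual project.

def value_score(item, disk_weight=1.0, cpu_weight=1.0):
    """Business value of retiring one item: weighted sum of the disk
    space it frees (GB) and the CPU time it saves (hours)."""
    return disk_weight * item["disk_gb"] + cpu_weight * item["cpu_hours"]

def order_backlog(items):
    """Return the migration items (tables, views, ETL scripts)
    sorted by descending business value."""
    return sorted(items, key=value_score, reverse=True)

backlog = order_backlog([
    {"name": "sales_fact",  "disk_gb": 500,  "cpu_hours": 120},
    {"name": "ref_country", "disk_gb": 1,    "cpu_hours": 2},
    {"name": "clickstream", "disk_gb": 2000, "cpu_hours": 40},
])
print([item["name"] for item in backlog])
```

In practice the Product Owner could tune the weights as the license-retirement and hardware-savings metrics came in, and the sorted list simply became the Product Backlog order.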
Finally, after a total of 4 Sprints of 3 weeks each, we had a breakthrough.  In a one-on-one meeting, the most senior tech lead admitted to me that what I was arguing for was actually possible, but that the technical people didn’t want to do it that way because it would require them to touch many of the ETL scripts multiple times – they wanted to avoid re-work.  I was (internally) furious at the wasted time, but I controlled my feelings and asked if it would be okay to bring the Product Owner into the discussion.  The tech lead allowed it and we had the conversation again with the PO present.  The tech lead admitted that breaking the dependencies was possible and explained how it could lead to the teams touching ETL scripts more than once.  The PO basically said: “Awesome!  Next Sprint we’re doing tables ordered by business value.”
A couple Sprints later, the first of 5 Oracle licenses was retired, and the 2-year $20M project was a success, with nearly every Sprint going into production and with Oracle and Teradata running simultaneously until the last Oracle license was retired.  Although I don’t remember the financial details anymore, the savings were huge due to the early delivery of value.  The apprentice coach there went on to become a well-known coach at this organization and still is a huge Agile advocate 10 years later!


4 thoughts on “Scrum Data Warehouse Project”

  1. Hi Mishkin

    Thanks for sharing your experience with this project. I followed a similar approach for a Business Intelligence platform migration from an older software version to the newest release. I based my work on the literature of Scott Ambler’s Disciplined Agile Delivery framework, focusing on a proper inception phase and handling the various dependencies between different teams in a large organisation.
    Although these kinds of migration projects relate to data warehousing, I’m wondering if you could share your experience with more “classical” DWH projects where it is about implementing user requirements in new data marts and business intelligence frontend solutions.
    Best regards

    1. Hi Raphael,

      My experience with data warehousing is relatively limited. I’ll ask someone I know who works in this space to take a look at the article and see if he can offer a comment here.

    2. Hi Raphael.
      Even though your comment is a little bit older, I can share my point of view: agile DWH development works if it is built on top of a Data Vault architecture. Only DV separates the working units in a way that different people can work at different points in time on different topics while you can still be sure that the thing as a whole will work together. To support my claim: Bill Inmon and Scott Ambler are speakers at the next World Wide Data Vault conference.
      Even though some people try to manually create a Data Vault, I am personally convinced that it is faster and more cost-effective to use a data warehousing automation tool which not only covers design and development but also automates the DWH lifecycle.
      But I have to admit that I’m biased on this topic as our company is creating the Datavault Builder DWA tool.

  2. I’m curious about this, because you refer to wasted time, but the technical team advised you that not doing the work in dependency order would itself lead to wasted time in re-work. I’m working on a Scrum basis for the first time on a DWH, and one of the things that irks me is that Scrum inherently forces re-work and extra regression testing.

    Like Raphael, I’d be interested in knowing more about how Scrum is used in a proper DWH project delivering marts, and also delivering a 3NF enterprise data layer. I don’t think your migration project specifically needed Scrum to prioritise the order of migration to maximise cost savings in retiring licences incrementally.

    I’ve worked on a corporate DWH that over 10 years delivered hundreds of projects in a more traditional project management methodology and we rarely had problems with requirements changing. Perhaps we were just lucky!
