Dreams of Lean

In 1926, Sakichi Toyoda founded Toyoda Automatic Loom Works. It was he who invented the principle of jidoka, which later became one of the cornerstone principles of the Toyota Production System (TPS). Jidoka means “automation with a human touch”. The idea behind it is very simple: detect a problem, stop, correct it quickly, find the root cause, and implement a long-term solution. TPS has evolved over the following decades and is still evolving. It is the most sophisticated and successful engineering framework we know today. Ed Catmull, in his book Creativity, Inc., talks about being influenced by the lean ideas of TPS from the early days of Pixar. He also mentions that lean was the talk of Silicon Valley back in those days, but in general, Lean probably did not start grabbing western hearts and minds until the 1990s.

Any new invention has its critics and doubters. It’s our distinct ability as a collaborative species to doubt, learn, and accept new ideas when we are ready to accept them. Today, 90 years later, after dozens of books (my favorite is “The Toyota Way” by Jeffrey Liker) and papers have been written about it, most people still do not understand Lean and are afraid of it. Giordano Bruno died for defending the Copernican view that the Earth revolves around the Sun. TPS adoption seems to be repeating the fate of Copernicus’ vision. We know it is true. We’ve been blind for too long, and the time to accept it is now.

Back in 1970, Winston Royce published his famous paper “Managing the Development of Large Software Systems”, which, very controversially, made him the father of the Waterfall Model. I have read and re-read this paper several times and have come to believe that his ideas, which were just another step in the evolution of software processes, were greatly misinterpreted at the time and even more so in the years since. Royce made a rather simple observation: customers typically want to pay for the design and the actual code written. They do not want to know the specifics of all the work that goes into building a shiny new app. Think about building a new house on a pristine piece of land. You would hire an architect to come up with blueprints and then a general contractor to build the house. It is the architect’s job to know the building codes, ask you how many rooms you want, and provide you with the blueprints; it is also his job to get those blueprints approved by the municipality. The general contractor takes care of permits and all the other details; his job is to hand you the keys to your new dream house. You would not want to know how much topsoil must be removed for the foundation or how many two-by-fours need to be brought in. In the software world, abstracting work in progress down to just design and code raises an interesting dilemma. To do a good job building software, we need requirements, architecture, design, code, testing, and so on. Yet a customer will often ask: if software is code, why should I pay for all these extra things? So Winston Royce correctly pointed out three things:
1. It takes certain steps to do a good job of building software.
2. It takes iterations, because changes or defects may need to be addressed at each stage.
3. We need a process framework that sets customer expectations about the work developers must do.
To me, much of the criticism of “Managing the Development of Large Software Systems” is misguided, and I am not the only one who thinks so. I see nothing in it that would not fit a good lean or agile process. It is the people who interpreted Winston’s ideas who got it totally wrong. Over the next 30 years or so, the phases (steps) of the waterfall model turned into rigid sequences, each step taking weeks to complete, with the smallest change sometimes requiring a complete redesign of much of the already written software. Just as a general contractor knows what it takes to build a house, a project manager knows what it takes to build software, and this knowledge should be reflected in the cost of crafting it. The additional transparency of having the customer know every detail of the process adds unnecessary complexity and is waste. Winston wrote that “It is important to involve the customer in a formal way so that he has committed himself at earlier points before final delivery.” This means keeping the customer aware of the issues and working with the customer to address them; it does not mean complete transparency! The world was very hierarchical in the 1970s, and people took Royce’s ideas and interpreted them the only way they could see fit. Thus the Waterfall Model was born.

Waterfall projects were often delivered on time, at cost, and of reasonable quality. Though more often, they were not. I was lucky to spend the late 1990s working at Motorola on various projects that used Waterfall. I have reason to believe that the company kept time, cost, and quality largely in check (a three-month overrun is not much for a Waterfall). But I did observe two major issues arise as the process overhead became more and more of a burden over the years. Since the lead time was so long and defects, especially late in the life-cycle, were so expensive, management excelled at excessive padding in order to keep the schedule in check. This, in turn, could create days, weeks, or even months of idle time for large teams of developers if they did, in fact, deliver on time. Then, in order to increase scheduling accuracy, management rightly observed that they needed to reduce the defect rate. This rather obvious observation, nevertheless, did not lead to the discovery that the root cause of all the inefficiencies was the process itself. Instead, it forced management to stop innovating: the less code we wrote, the fewer defects we had, the better the big bosses looked during performance reviews, and the more opportunities they had for bonuses and promotions. Innovation at the company stalled; it missed important new markets, and a company that used to be a household name has now largely passed into oblivion.

The late 1990s saw an explosive growth of small software companies. And small companies, with much smaller budgets and talent pools, could hardly afford the process-heavy Waterfall. Agile methodologies were born, with Extreme Programming (XP) and Scrum among the first. While Waterfall made things possible, Agile focused on fixing what did not work in the original process. XP focused on development techniques. Some of them, such as small iterations, a short feedback loop, getting developers and the business together, stories, code reviews, TDD, and automation, made a lot of sense and worked great. Others, the authors told us, would be counter-intuitive but would work once we became enlightened. Those failed every time management tried them. We would not be enlightened, just as we would not be enlightened by Waterfall, and for one simple reason: while opening the doors for experimentation and learning, XP shut the conversation down when it came to what the authors believed was the Holy Grail and thus could not be disputed. As we would rightly learn from Scrum, we needed to inspect and adapt. Pair programming could work well for sharing knowledge, for brainstorming designs, and for improving unit test quality. It would largely fail when developers of a similar skill level were paired, or when people simply did not want to work together. The latter scenario would usually result in less than 50% efficiency, since at most one of the two people could be productive at any given point in time. XP was more prescriptive than it should have been.

Scrum, on the other hand, was a lot more open and focused on the interaction of people. It was born from a 1986 article by Hirotaka Takeuchi and Ikujiro Nonaka titled “The New New Product Development Game”, which described “a flexible, holistic product development strategy where a development team works as a unit to reach a common goal”. “The Scrum Guide” defines Scrum as a framework for developing and sustaining complex products, based on empirical process control. It is a framework, not a process. It defines events, roles, and artifacts rather than prescribing specific actions, and therefore allows a lot more flexibility about exactly which processes to follow. In fact, the very practices XP is famous for are used as best practices by virtually every Scrum team. In Scrum, fast cycles of inspection and adaptation work toward transparency among developers themselves, as well as transparency between the development team and the customers represented by the product owner. Customers no longer need to wait months to see any results at all, because every iteration has to produce a potentially shippable product. Customers can provide feedback at the end of every iteration, or even in between when their input is requested. Fast-changing requirements no longer lock up the development process, since the sprint backlog (not the product backlog) cannot be changed in the middle of an iteration. But the product backlog does get re-ordered for every iteration, so only the most value-adding things get done. To me, the belief in self-organizing, empowered teams is the most important feature of Scrum. Instead of a hierarchical management structure inefficiently telling people what to do, teams themselves decide how to attack each problem. Team members are aware of what everyone is working on and can step in to help other team members when help is needed.
Teams decide how much work they can take on during each iteration and can engage the customers not just at the beginning or the end, but at any time they have questions. Scrum forces customers to channel their sometimes-conflicting requests through a single product owner, thus organizing requirements and acceptance criteria.

As liberating as Scrum may initially seem, it introduces a number of very serious drawbacks that we all have to overcome. The fast-moving mentality favors coding over design and architecture. Difficult decisions can get deferred because re-factoring may seem too expensive to take on during the next iteration. So how do we architect and design? During the preparation phase for a new project, which some people erroneously call Sprint 0, the team gets its main shot at design and architecture. Is this sufficient? Absolutely not, so gurus have suggested two different workarounds:
1. Design throughout each iteration, as needed.
2. Bring in an outside architect, or a team of architects, to tell developers what to do.
Design-as-needed is a great idea and clearly works when there are very few road-bumps. But let’s say you are building a new trading system and you simply have to experiment with a new piece of third-party software that customers really want. Your entire back-end is Linux, the vendor says their APIs only work with .NET, and you need to research whether to stick with your old platform and develop using Mono on Linux, or to work around the problem by ferrying events between two incompatible platforms. Optimally, a prototype or two exercising different design approaches should be implemented, and very serious bench-marking and stability tests performed. Even with a dedicated resource, it may take a long time to make a well-educated decision. Often, a quick and possibly sub-optimal solution will be rushed through instead. Having self-organizing teams means that it is hard to keep a full-time specialist-architect on a team. Also, specialists don’t like doing generalists’ work. When I worked with teams that had full-time specialists, success always depended on keeping both the specialists and the rest of the team happy, and walking that fine line was not easy. Some companies bring in a team of specialists from outside the Scrum team, so that the team does Scrum and the specialists do their work. Everybody is happy and it works, except that it is cheating. In my view it is one of those Scrum-buts, and there is really no right solution for this.

Scrum also fails to address the very problem that Winston Royce warned against in his seminal paper: how do I sell all this extra work to a customer without looking either greedy or incompetent? As a result, many teams simply take shortcuts and keep kicking the can down the road. Scrum, which advocates constant re-factoring, often makes significant and difficult re-factoring impossible because of its cost to the business. Estimation can be a sore point too. Very few product owners are savvy enough to create similar-sized stories for developers to take on. By working with the development team over time, a product owner may learn to create proper stories that can be easily estimated. But what if a story has a technical dependency on another story that is not related to customer requirements? For example, a new component needs to be added to the system that neither adds features nor visibly improves performance; it is simply needed to enable work in this sprint or the next that cannot be done without it. Frequently, developers are afraid to raise the issue of such so-called technical stories with the product owner, and instead inflate the estimate on a dependent story to make up for the complexity. Are we cheating again?

Thus far, we have wandered through the desert of sub-par software development processes for over 40 years. It was not until 2003 that David Anderson published his book “Agile Management for Software Engineering: Applying the Theory of Constraints for Business Results”. His idea was to find a way to bring Lean concepts over into software development, and so Kanban was born. Scrum had looked at TPS and copied a few tricks from its tool-set: empiricism, self-organizing teams, and visualization, to name a few. It was designed to make it easier for traditional functional managers to start viewing software development in a different light. Yet it totally didn’t get it. While Scrum was a framework, a set of tools and ideas based on empiricism, TPS was a framework within a larger philosophical context, based on a number of core principles (the 4Ps): long-term thinking (Philosophy), waste elimination (Process), growing people (People and Partners), and continuous improvement and learning (Problem Solving). We have come to know it as The Toyota Way. On the surface, Scrum advocates all the same things, but I have come to realize that it didn’t quite understand TPS. I struggle to find any serious relationship between Scrum principles and long-term thinking, except perhaps the idea that the customer knows best and that the Product Owner will think ahead. In fact, quick iterations and re-factoring as needed seriously handicap long-term thinking, frequently producing the opposite result. As for waste elimination, we are often not even looking for waste in the right place. Short of drawing the value-stream map of Scrum and pointing out its inefficiencies, here are the biggest offenders in my view:
1. Planning and estimating: hours of meetings that do not need to happen! What’s worse, planning often forces us to overestimate, because customers may feel we are not in control of the situation if we miss the target. Reaching for the stars has become a punishable offense.
2. Specialized tasks that could be done more efficiently by specialists must be up for grabs by anyone on the team in the name of learning and cross-pollination.
3. Instead of leveling out the workflow, we level out the iteration time. Since we often overestimate, the result can be idle time at the end of a sprint, which we can now spend on continuous improvement tasks. This is great, but not always, depending on the urgency of items in the backlog. And now the customer has to sit waiting for the sprint to complete. In my career, I have seen cases where customers required daily production releases on a brand-new product for a short period immediately after its release. Waiting two weeks for the next release would have been totally unacceptable for the business.

Growing people in Scrum is done mainly through cross-pollination. After all, teams are supposed to be self-organizing and cross-functional. This works for a lot of people, including myself. But imagine a scenario where a specialist, say a quantitative developer on a cross-functional team, is expected to learn outside of his or her domain area. That kind of learning does little to improve the specialist’s skills, and the person is quite likely to quit sooner rather than later. Lastly, continuous improvement and learning also get a bit shortchanged. While Scrum prescribes sprint reviews and retrospectives, TPS’ genchi genbutsu (go and see for yourself) is subtly discouraged, because it may require spikes (quick technical prototyping), which in turn may require already tied-up resources or take longer than an iteration. And while Scrum does not say that decisions cannot be made slowly, by consensus, the idea that design happens throughout the iteration makes such a slow design process quite difficult to implement.

Kanban is just one of the tools used by TPS to provide visual flow. Though Kanban today has come to stand for a process, it is really just a board with sticky notes on it. The board is typically divided into columns that represent buffers and stations. A station is a step where actual work is done by work cells, for example developers taking story cards and turning them into code. A buffer is where these story cards can accumulate to even out the flow. If users create more stories than developers can implement, the number of items in the buffer grows; when customers slow down, developers keep pulling cards at an even speed and do not need to slow down. In Kanban, we want to arrange stations and buffers in such a way that the pull system is the most efficient, the flow even, and the lead time (the time from the moment a customer adds a new story to the backlog until the item is completely done) minimal. Optimizing lead time gives customers more of an incentive to create tasks of similar size and complexity. If we have to, we can now have a specialist station looking at tasks that require specialist intervention; this can be done as another station or as a parallel swim lane. The release cycle can be separated from the development cycle, so estimation (while still useful for release planning) is no longer required for development.
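The mechanics described above, stations with WIP limits, buffers that absorb uneven demand, and lead time measured from backlog to done, can be sketched in a few lines of code. This is only an illustrative simulation (all names are mine, not from any Kanban tool), assuming each station hands off at most one card per tick:

```python
from collections import deque

class KanbanBoard:
    """Minimal sketch of a pull-based kanban board: each station has a
    WIP limit, and cards move downstream only when capacity frees up."""

    def __init__(self, stations, wip_limits):
        self.stations = stations                   # ordered station names
        self.wip = {s: deque() for s in stations}  # cards queued at each station
        self.limits = dict(zip(stations, wip_limits))
        self.start_tick = {}                       # card -> tick it entered the board
        self.lead_times = []                       # lead times of completed cards
        self.tick = 0

    def add_card(self, card):
        # Customers push new stories into the first buffer (the backlog).
        self.wip[self.stations[0]].append(card)
        self.start_tick[card] = self.tick

    def step(self):
        # Pull from right to left: a downstream station pulls one card
        # from its upstream neighbor only while under its WIP limit.
        self.tick += 1
        for i in range(len(self.stations) - 1, 0, -1):
            src, dst = self.stations[i - 1], self.stations[i]
            if self.wip[src] and len(self.wip[dst]) < self.limits[dst]:
                self.wip[dst].append(self.wip[src].popleft())
        # Cards reaching the last station are done: record their lead time.
        done = self.stations[-1]
        while self.wip[done]:
            card = self.wip[done].popleft()
            self.lead_times.append(self.tick - self.start_tick[card])

board = KanbanBoard(["backlog", "dev", "test", "done"], [99, 2, 2, 99])
for n in range(5):
    board.add_card(f"story-{n}")
for _ in range(10):
    board.step()
print(board.lead_times)  # → [3, 4, 5, 6, 7]
```

Note how lead time grows by one tick for each story queued behind the others: the stations process at an even pace regardless of how fast the backlog fills, which is exactly the leveling-of-flow idea, and the recorded lead times are the quantity a Kanban team would try to minimize.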

So far, I haven’t even scratched the surface of what Lean can give us. I think of Kanban as a way station on the road to Lean. Ancient humans first invented the spear, and only then learned how to use it for hunting. Similarly, we can learn tools, such as Kanban, in order to gain an understanding of the philosophy of The Toyota Way. I’ve heard many excuses for why Lean doesn’t work: we can’t make tasks (or stories) of small and even size; it was designed for manufacturing and will never work in a creative environment; companies work around release cycles, and Lean makes it hard to create release schedules. These are just a few examples of what people might say. In reality, if we applied the five whys technique to any of these arguments, they would collapse like a house of cards. The Lean ideas are simple and generic, and with minor modifications they can be applied anywhere something is being produced, whether it is a car or a software application. I have done it, and so can anyone. I just hope that after wandering in the desert all these years, we are now ready to embrace the inevitable.

