After describing the Software Crisis in the previous episode, we discuss the various methodologies and practices implemented over the years to combat the complexities of software development. We’ll tell the sad story of the FBI’s VCF project – perhaps the most expensive failed software project ever – and hear about Dr. Fred Brooks’ classic book, ‘The Mythical Man-Month’.
Big thanks to Aviran Mordo from Wix.com for appearing in the episode.
Subscribe: iTunes | Android App | RSS link | Facebook | Twitter
Are Software Bugs Inevitable?
Written By: Ran Levi
Part II: The Most Expensive Failed Software Project Ever
In this episode, we’re continuing our discussion about the Software Crisis, which we introduced last week. If you missed that episode, you might want to go back and have a read before listening to this one.
The question we asked was: why do so many large software projects fail so often? THAT is the Software Crisis, a term that was first coined in 1968.
“Failure”, in the context of software, has many aspects: software projects tend to deviate from schedule and budgets, produce software that does not meet the customer’s specs or expectation – and often contains a significant number of errors and bugs. It is a question that troubled many engineers and computer scientists over the years: what makes software so complicated to engineer?
The solutions to this problem changed over the years, along with changes in the business culture of the High Tech world.
The Waterfall Methodology
In the 1960s, The dominant approach to software development – especially when it came to complicated projects – was known as the “Waterfall” methodology. This approach divides the software projects into well-defined stages: first, the customer defines the requirements for the product. A software architect – usually an experienced programmer – creates the outline of the system that fits these requirements, and then a team of programmers writes the actual code that fits this outline. In essence, the Waterfall approach is the same approach a carpenter would use when creating a new piece of furniture: Learn what the customer wants, draw a schematic outline of the product, measure twice – and cut once.
The name “Waterfall” hints at the progression between stages: a new stage in the project will begin only when the last one is complete, just like water flowing down a waterfall. You can’t start writing code until the client has defined their needs and wants. Seems like a sensible approach, and indeed the Waterfall methodology served engineers outside of the software industry for hundreds of years. Why shouldn’t it be used here as well?
But in the last twenty years, the “Waterfall” method has been under constant and profound criticism, coming from software developers and business leaders. The main argument against Waterfall is that even though it served other engineering disciplines, from architecture to electronics – it is not well-suited to the software field.
And why is that? Let’s examine this question through an example which is, most likely, one of the most expensive failures in the history of software engineering.
VCF – Virtual Case File
In the year 2000, the FBI decided to replace its entire computer system and network. Many of the agency’s computers were old and outdated and had no longer suited the needs of agents and investigators. Some of the computers still used 1980s green screens and didn’t even support using a mouse…After September 11th, FBI agents had to fax photos of suspects because the software they used couldn’t attach a document to their e-mails! It can’t get much worse than that…
Finally, at the end of that year, Congress approved a budget of four hundred million dollars for upgrading the FBI computer system and network. The project was supposed to take three years, and replace all computer stations with modern and capable machines, connected via a fast optic cable network system.
The crowning glory of the new system was a software called VCF, for “Virtual Case File”. VCF was supposed to allow agents at a crime scene to upload documents, images, audio files, and any other investigation material to a central database, in order to cross-reference information on suspects that they could later present in court. The company hired to write the advanced software was called SAIC.
The FBI Goes Waterfall
It’s important to note that the FBI has hundreds of thousands of people, and as most large organizations – it tends to be very bureaucratic and conservative. Naturally, the preferred methodology for the VCF project was the Waterfall, and so the project managers begun by writing an eight hundred page long document that specified all the requirements from the new software. This document was extremely detailed, with sentences like: “In such and such screen, there will be a button on the top left corner. The button’s label will read – ‘E-Mail’.” They didn’t leave the developers a lot of room for questions…
But in an organization as huge and varied as the FBI, it’s doubtful that there is one person, or even one group, who understands the practical daily needs of all the departments and groups for which the program was written. As the project progressed, it became clear that the original requirements didn’t meet the day-to-day needs of agents.
So, special groups were assigned to study the needs of the agents in the field, and they constantly updated the project’s requirements. As you might imagine, the constant changes made the requirements document almost irrelevant to the developers in SAIC who actually wrote the code. The events of September 11th gave the project a new sense of urgency, and the tight schedule soon created conflicts between the programmers and FBI management. The software developers were frustrated over the ever-changing requirements, while FBI agents felt their needs were being ignored.
A Dramatic Failure
Things got worse and worse, and by the beginning of 2003, it was clear that the new software wouldn’t be ready on time. Since the VCF project was deemed a matter of importance to national security, Congress approval an additional budget of two hundred million dollars. But that didn’t help either.
In December of 2003, a year after it was supposed to be ready, SAIC finally released the VCF’s first version. The FBI rejected it almost immediately! Not only was the software buggy and error-prone, it also lacked basic functions like “bookmarking” and search history. It was totally inadequate for field or office work.
In an effort to save the failing project, the FBI invited a committee of outside experts to consult the agency. One of the committee members, Professor Matt Blaze, later said that he and his colleagues were shocked once they analyzed the software. He jokingly told a reporter later, quote, “That was a little bit horrifying. A bunch of us were planning on committing a crime spree the day they switched over. If the new system didn’t work, it would have just put the FBI out of business.”
In January 2005, the head of the FBI decided to abandon the project. This wasn’t an easy decision since it meant that all the agency’s personnel would have to continue using the ancient computers from the 1980s and 90s for at least five more years. This had a sizeable impact on national security, not to mention all that money that was spent for nothing.
Why Did VCF Fail?
The VCF project failed even though the FBI used the age old approach of “measure twice, cut once”. It defined all the software requirements up front, and left nothing to chance – or so it seemed. Critics of the Waterfall methodology claim that the problem was that the FBI was never able to define its needs perfectly, or even adequately. In such a complex and big organization, defining all the software requirements up front is an almost hopeless task, since no single person knows everything that’s going on in the organization and also has a good grasp of what the technology can and can’t do.
The FBI’s VCF fiasco is typical, it seems, for large-scale software projects. I recently had a chance to speak with a very experienced programmer who has worked on a different large scale project.
“My name is Aviran Mordo, I’m the Head of Engineering for Wix.com. I’ve been working in the software industry for over twenty years, from startup companies here in Israel to developing the US National Archives.
“So working on a government project is everything that you hear that is wrong with Software Development, like Waterfall and long processes. We tried to be as Agile as we can, and during the prototyping phase, we actually succeeded – that is why we won the project. But when we started the actual writing of the project, hundreds of people came and worked on humongous documents that nobody read, and built this huge architecture with very very long processes. You could see that this is going to take a long time, and the project just suffered so badly… That was the point that I switched to a smaller team, to do that on the civilian market. We were just five people. We started about six months after the big project started. We were five people against a hundred people. After six months we were a year ahead of their development schedule.”
“They actually had to restart the project twice. [Ran: when you’re saying ‘restart’ you mean they just developed everything from the beginning?] Yes, they had to develop everything from the beginning. The architecture, everything. “
I asked Aviran why, in his opinion, the Waterfall methodology – which does a great job in other engineering disciplines – fails so miserably in software engineering.
“Several Things. One, you do the architecture up front and think you have all the answers. If you don’t have all the answers, you plan for the things you don’t actually know. So you build a lot of things that are wasting your time for future features that you think may affect the product or for requirements that may change in the future – but since the release cycle is so long, and it costs so much to do a new version, you try to cramp up as many features as you can into a project. This derails you from whatever really need to be achieved and have a quick feedback from the market and from the client. [it’s] A waste of time.”
Aviram’s view is echoed by many other developers, including the ones who investigated the failed FBI project. Waterfall assumes you have all the information you need before the project begins. As we already saw, software projects tend to be so complex, that this assumption is often wrong.
The Agile Methodology
So in the 1990s and early 2000s, an alternative to the unsuccessful Waterfall methodology appeared: its name was Agile Programming. The Agile methodology is almost the exact opposite of Waterfall: it encourages maximum flexibility during the development process. The strict requirement documents are abandoned in favor of constant and direct communication with the client.
Say, for example, that the customer wants an email client. The developers work for a week or two, and create a rough skeleton of the software: it might have just the very basic functions, or maybe just buttons that don’t do anything yet. They show the mockup to the customer, who then gives them his feedback: that button is good, that one is wrong, etc. The developers work on the mockup some more, fixing what needs to be fixed and maybe adding a few more features. There’s another demonstration and more feedback from the customer. This process repeats itself over and over until the software is perfected. Experience shows that projects developed using the Agile methodology are much less prone to failure: this is because the customers have ample time and opportunity to understand their true needs and “mold” the software to fit.
Not A Perfect Solution
So, is Agile the solution for the software crisis? Well, Probably not. Aviran, Wix’s Head of Engineering says that Agile is not suited for every kind of software project.
“It’s harder to plan ahead what’s the end goal. This [Agile] works well for a product which is innovative but it won’t work well for a product which is revolutionary. For example, the iPhone. The iPhone cannot work this way – it’s a really huge project with a lot of things in it, and it revolutionized the market. It could be developed with Agile, but you miss the step of getting it in front of your customer. So if you’re doing something which is ‘more of the same’ or you’re not trying to change the market – Agile works extremely well. If you’re trying to change the market and do something revolutionary, you could do Agile to some extent until you actually get to market.”
Also, Agile has been around for twenty years – and large-scale projects do still fail all around us. It is a big improvement over Waterfall, no doubt, but it probably won’t solve the software crisis.
So, what is the solution? Maybe a new and improved methodology? A higher-level programming language?
Dr. Frederick Brooks
One of the many computer scientists who tried to tackle this question was Dr. Frederick Brooks. In the 1960s and 70s Brooks managed some of IBM’s most innovative software projects – and naturally suffered some failures himself. Prompt by his own painful experiences, Brooks wrote a book in 1975 called “The Mythical Man-Month”. A “Man-Month” is a sort of a standard work-unit in a software project: the work an average programmer does in a single month.
Brooks’ book, and another important article he published called “No silver bullet”, influenced a whole generation of software programmers and managers. In his writings, Brooks analyzed the main differences between the craft of software writing and other engineering disciplines, focusing on what he perceived to be the most critical and important characteristic of software: its complexity.
Brooks claims there is no such thing as a “simple software”. Software, he writes, is a product of the human thinking process, which is unique to every individual. No two developers will write the exact same code, even if they are faced with the exact same problem. Each piece of software is always unique since each brain is unique. Imagine a world where each and every clock has its own unique clockwork mechanism. Now imagine a clockmaker trying to fix these clocks: each and every clock he opens is different from the rest! The levers, the tiny screws, the springs – there’s no uniformity. A new clock, a whole new mechanism. This uniqueness would make clock-fixing a complex task that requires a great deal of expertise – and that’s the same situation with software.
Now, high complexity can be found in other engineering disciplines – but we can almost always find ways to overcome it. Take electronic circuits, for example: in many cases, it is possible to overcome complexity by duplicating identical sub-circuits. You could increase a computer’s memory capacity by adding more memory cells: you’ll get an improved computer but the design’s complexity would hardly change since we’re basically just adding more of the same thing. Sadly, that’s often not true for software: each feature you add to a piece of software is usually unique.
What about architecture? Well, architects overcome the complexities of their designs by creating good schematics. Humans are generally good at working with drawings: an architect can open the schematics, take a quick look, and get a fairly good idea of what’s going on in the project. But software development, explains Brooks, does not allow such visualization. A typical software has both a dimension of time – that is, do this, then do that – and a dimension of space, such as moving information from one file to another. A diagram can usually represent only a single dimension, and so can never capture the whole picture. Imagine an architectural schematic that tries to capture both the walls and levels of a building – and the organizational structure of the company that will populate it… many times a software programmer is left with no choice but to build a mental model of the entire project he is working on; or if it is too complex, parts of it.
In other words, Brooks argues that unlike electronics or architecture, it is impossible to avoid the “built-in” complexity of software. And if this is true, then computer software will always contain bugs and errors, and large projects will always have a high risk of failure. There is no “Silver Bullet” to solve the Software Crisis, says Brooks.
But you might be asking – what about high-level languages? As we learned in the previous episode, advanced software languages like FORTRAN, C, and others greatly improved a programmer’s ability to tackle software complexity. They made programming easier, even suitable for kids. Isn’t is possible that in the future someone will invent a new, more successful programming language that would overcome the frustrating complexity of software?
The answer, according to Brooks, is no. High-level languages allow us to ignore the smaller details of programming and focus on the more abstract problems, like thinking up new algorithms. In a way, they remove some of the barriers between our thought processes and their implementation in the computer. But since software complexity is a direct result of our own thought processes, the natural complexity of the human mind – a higher level language will not solve the software crisis.
But Fredrik Brooks does have a rather simple solution – one which might solve the problems of most companies. His suggestion: don’t write software – buy it!
Purchasing software, says Brooks, will always be cheaper and easier than developing it. After all, buying Microsoft’s Office suite takes minutes, while developing it from scratch might take years – and fail. True, a purchased one-size-fits-all software will never meet all the organization’s needs – but it’s often easier to adjust to an existing software then create a new one. Brooks uses salary calculation software to illustrate his point: in the 1960s, most companies preferred writing custom software to handle their salary calculation. Why? Because back then, a computer system cost millions of dollars while writing software only a few tens of thousands. The costs of customized software seemed reasonable when compared to the cost of the hardware. Nowadays, however, computers cost just a few thousand dollars – while large-scale software projects cost millions. Therefore, almost all organizations learned to compromise and purchase standard office software such as “Excel” and “Word”. They adjusted their needs to fit the available, ready-made, software.
Brooks gave another piece of advice, this time to the software companies themselves: nurture your developers! Some brains are better suited to handle the natural complexity of software development. Just as with sports and music, not all people are born equally talented. There are “good programmers” – and then there are “outstanding programmers”. Brooks’ experience taught him that a gifted programmer will create software that is ten times better than the one made by an average programmer”. Software companies should try and identify these gifted individuals early in their careers, and nurture them properly: assign them good tutors, invest in their professional education, and so on. That kind of nurturing isn’t cheap – but neither is a failed software project.
To summarize, we started by asking whether software bugs and errors are inevitable – or can we hope to eliminate them sometime in the future. Digging deeper, we discovered an even more fundamental problem: large-scale software projects not only have plenty of bugs – they also tend to fail spectacularly. This is known as the Software Crisis.
Modern software development methodologies such as Agile can sometimes improve the prospects of large-scale projects – but not always, and not completely. The only real solution, at least according to Dr. Fredrick Brooks, is finding developers who are naturally gifted at handling the complexities of software – and helping them grow to their full potential. And until we find the Silver Bullet that will solve the Software Crisis, we’ll just have to…well – bite the bullet.