Note: this page is part of the “Essays on Software Engineering”
When you realize your interfaces do not match, too late in the development process.
On 15 March 1986, the Hotel New World in Singapore disintegrated in less than a minute at about 11:25 am.
The investigation found that the original structural engineer had made a serious error in calculating the building’s structural load. The engineer had calculated the building’s live load (the weight of the building’s potential inhabitants, furniture, fixtures, and fittings) but had completely omitted its dead load (the weight of the building itself). This meant that the building as constructed could not support its own weight. Collapse was only a matter of time. After three supporting columns failed in the days before the disaster, the other columns, which took on the added weight no longer supported by the failed columns, could not support the building.
Does that story sound crazy to you? It should. Architectural failures are rare, which is a good thing. Yet this is how we build software today, and failures are still very common in the software industry.
The building metaphor
Building a new house follows a clear process that is more or less like this:
- Establish the requirements (number of bathrooms, bedrooms, placement, type of heat, etc.)
- Inspect them and validate them with the different stakeholders (customers, suppliers, etc.)
- Work on an execution plan with milestones
- Have the work inspected by an independent inspector
- Enjoy your new place.
For example, if you build a house, you start with the foundations, build the structure, install the plumbing and electricity, and finally finish with the drywall and paint. Everybody understands that you cannot install the bathtub before the plumbing is in place. You need to follow a specific order; otherwise, you face major delays and extra costs (e.g. if you install the bathroom before the plumbing is validated and checked, you might have to demolish the bathroom to fix it later).
Designing software follows the same rules and processes.
For example, when you design a system with a database, you need to understand the type of data and the frequency and type of access before adding indexes or thinking about potential sharding schemes. These are your requirements. They affect how you access or cache the data, and they can also affect how you deploy your system.
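To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the query are hypothetical, chosen only for illustration. The assumed requirement is that the dominant access pattern is "look up orders by customer"; the index is added only because that requirement is known.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Assumed requirement: most reads look up orders by customer.
QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# The index is justified by the access pattern, not added by reflex.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a table scan and the second should mention `idx_orders_customer`. If the workload were write-heavy and rarely filtered by customer, the same index would mostly add cost.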
However, engineers very often overlook the most basic requirements. They build systems that either fail to satisfy them (and under-perform) or go full berserk mode and over-satisfy them (at a higher implementation cost). Neither is efficient. A great example is the launch of healthcare.gov: the system was designed to handle 50,000 simultaneous connections, but 250,000 people visited the website on launch day (wikipedia).
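The arithmetic behind that example is simple enough to write down as a check. The sketch below is illustrative only: the `CapacityRequirement` class and the 2x headroom threshold used to flag over-provisioning are assumptions, not part of any real tool or of the healthcare.gov project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityRequirement:
    """Hypothetical record of a capacity requirement and the expected load."""

    designed_connections: int
    expected_peak_connections: int

    def load_ratio(self) -> float:
        # How many times the expected peak exceeds the designed capacity.
        return self.expected_peak_connections / self.designed_connections

    def status(self, max_headroom: float = 2.0) -> str:
        # max_headroom is an assumed threshold: capacity above
        # max_headroom times the peak is treated as over-provisioned.
        ratio = self.load_ratio()
        if ratio > 1.0:
            return "under-provisioned"
        if ratio < 1.0 / max_headroom:
            return "over-provisioned"
        return "adequate"


# Figures quoted above: designed for 50,000, visited by 250,000.
launch = CapacityRequirement(
    designed_connections=50_000, expected_peak_connections=250_000
)
print(launch.load_ratio(), launch.status())
```

With the figures quoted above, the peak is five times the designed capacity. A check like this is only useful if the expected peak is established as a requirement before the system is built.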
The typical argument for paying little attention to quality is that software can be changed easily once delivered. This argument is a fallacy, especially once the system is deployed in production. Changing a system reliably while it is running in production is way harder than changing it during the design phase.
An engineer upgrading a system in production
The case of safety-critical systems
Surprisingly, safety-critical systems (such as the ones used in planes or rockets) do not fail as often as regular ones.
The main reason is that, like construction, these types of systems are regulated and must follow a rigorous production process that catches such mistakes. Avionics and aerospace software follow more or less the same constraints as software running nuclear power plants. These processes follow the same steps as those used to build a new house (start with the requirements, validate them, have independent inspections, etc.). They also have clear inspections and requirements to satisfy by law (interestingly, cars are not as regulated).
The takeaway: when you carefully follow a (good) process, you avoid mistakes. As Bezos says: Good intentions don’t work. Processes do.
Should we learn better?
The main problem is a lack of knowledge. Projects do not fail intentionally; they fail because engineers do not know better. There is a knowledge gap: most engineers see software development only through the lens of programming and have never learned what software architecture is.
We need to teach and explain what software architecture is and how systems should be designed and built. These concepts are being overlooked, especially in a world where people are promised they can become software engineers after just a few weeks of learning. In a world where most future software engineers will not get a degree before they enter the workforce, it is becoming urgent to teach these values.
The solution for better software: be architects