Why I don’t like the concept of a root cause
I don’t like the concept of a so-called "root cause" because there are always multiple factors involved, all of which need to align for an incident to happen. Change just one of those factors, anywhere in the chain of events, and you will usually find that the incident would not have happened at all.
Another way of looking at this is that a root cause is typically identified as either a single individual or a single event that takes the blame for the entire situation:
- Somebody didn’t follow the correct procedure.
- A specific piece of equipment broke down.
- A machine or system didn’t work as designed.
Such a focus on a single cause misses the fact that these events happen within complex socio-technical systems, and that failures within these systems are often the result of organizational failure.
For example:
- Did the individual receive the right training? If not, why not?
- Did they have adequate tooling at their disposal to correctly interpret the information and make the right decision given what they knew at the time?
- Did the environment around the affected person or service change in an unpredictable way?
- Did the system have sufficient fail-safes built in to handle unexpected situations?
- Was this a known failure mode that was never considered a priority?
“What you call ‘root cause’ is simply the place where you stop looking any further.”
– Sidney Dekker, in The Field Guide to Understanding Human Error.