I see some common ingrained misunderstandings around "Incident Reviews" / "Post Mortems" in technical orgs.

🧵

1. There is no single "root cause". IMO this term is harmful because, while it makes for an easily graspable concept, the metaphor encourages identifying a _single_ cause. There is _never_ a single reason behind an incident. Instead there are always several "contributing factors".

2. "Human error" is _never_ a contributing factor (or "root cause" 🤬). The problem with blaming it is that, until Human 2.0 comes out, human error itself is completely unfixable. Humans don't make decisions or take actions in a vacuum. There is _always_ an outdated procedure, bad policy, false belief, missing documentation, poor tooling, or lack of training behind a mistake made by a human. That is something you can fix!

#DevOps #platform #sre #infosec

@smlx yeah, the airline industry has had this mindset for decades, and it has made flying the safest (?) form of travel. We can learn from that and become a true engineering discipline.

Librem Social

Librem Social is an opt-in public network. Messages are shared under Creative Commons BY-SA 4.0 license terms.
