This is a brief note on Postmortem Culture: Learning from Failure
The cost of failure is education - Devin Carraway
The postmortem concept is well known in the technology industry. A postmortem is a written record of an incident: its impact, the actions taken to mitigate or resolve it, its root causes, and the follow-up actions taken to prevent it from recurring.
The primary goals of writing a postmortem are to ensure that the incident is documented, that all contributing root causes are well understood, and, especially, that effective preventive actions are put in place to reduce the likelihood or impact of a recurrence.
Writing a postmortem is not a punishment; it is a learning opportunity for the entire system.
Blameless postmortems are a tenet of SRE culture. For a postmortem to be blameless, it must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or inappropriate behavior. A blameless postmortem assumes that everyone involved in an incident had good intentions and did the right thing with the information they had. If a culture of finger-pointing and shaming individuals or teams for doing the "wrong" thing prevails, people will not bring issues to light for fear of punishment.
Blameless culture originated in the healthcare and avionics industries, where mistakes can be fatal. These industries nurture an environment where every "mistake" is seen as an opportunity to strengthen the system. When postmortems shift from allocating blame to investigating the systematic reasons why an individual or team had incomplete or incorrect information, effective prevention plans can be put in place. You can't "fix" people, but you can fix systems and processes to better support people in making the right choices when designing and maintaining complex systems.
Avoid Blame and Keep It Constructive