How To Run a Post-Mortem With Humans (Not Robots)
Dan Milstein
Hut 8 Labs
@danmil
Act I: What The Hell Is a Post-Mortem Anyways?
Ahhhhhh! Something Very Bad Just Happened
What Is a Post-Mortem Anyways?
• Something you do when your company has badly screwed up
• E.g. your CEO demos your cloud storage system to an early prospective
customer, and, when he runs a search, it shows other customers’ data (I have
done this, it was not awesome)
• You get a bunch of people into a room and say: “How on earth did that
happen? And how can we make sure it never, ever happens again?”
• That’s a Post-Mortem
• But, there’s a problem....
Shameful Mistakes: Humans vs Robots
Human Beings Will Eff It Up
• Humans (unlike robots) feel this intense emotion called shame
• Shame will suggest (strongly) “Slow Down, Stop Making So Many Mistakes”
• Aka “Destroy your company by way of opportunity costs, immediately!”
• Has potential to be incredibly damaging to your startup
• And I have some bad news...
You Will Totally Experience Shame (I Still Do)
F.A.E.
This Emotional Experience Cannot Be Avoided
• I’ve run c. 50 post-mortems, have studied failure... and I still have this
emotional reaction
• You will, too. And so will your team.
• Much more strongly than you realize right now
• This is the “Fundamental Attribution Error” (FAE), from psychology
• FAE = humans vastly underestimate the power of a situation on our behavior
Big Idea: Adopt Economic, Not Moral Mindset
$, FTW
What Does That Mean?
• Let me tell you a story...
Parable: A Tale of Two Factories
Two Factories
• Both make widgets
• Both are missing their monthly Widget Production goals by 10%
• But for different reasons...
Factory 1... Broken Machine
When The Machine Breaks...
• Belt slips off every once in a while
• Ruins a bunch of widgets
• Gotta replace it, drift a little behind plan
• So... what questions do humans ask in this situation?
• “How much is it costing us?”
• “How much does it cost to repair?”
• “Can we kludge a partial fix?”
• “What are risks if we delay a fix?”
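The four questions above can be turned into rough arithmetic. What follows is a minimal sketch, not anything from the talk itself: the `FixOption` type, the option names, and every dollar figure are hypothetical, picked only to show how "cost", "partial", and "risk" become numbers you can compare.

```python
from dataclasses import dataclass


@dataclass
class FixOption:
    name: str
    cost: float          # one-time cost of the fix (hypothetical dollars)
    loss_avoided: float  # fraction of the monthly loss the fix removes, 0..1


def net_benefit(option, monthly_loss, months):
    """Loss avoided over the planning horizon, minus what the fix costs."""
    return monthly_loss * option.loss_avoided * months - option.cost


def rank_options(options, monthly_loss, months):
    """Best option first, judged purely on net benefit."""
    return sorted(
        options,
        key=lambda o: net_benefit(o, monthly_loss, months),
        reverse=True,
    )


# Hypothetical numbers for the slipping belt in Factory 1.
options = [
    FixOption("do nothing", 0, 0.0),
    FixOption("kludge: tighten belt weekly", 500, 0.5),
    FixOption("replace the machine", 20000, 1.0),
]
ranked = rank_options(options, monthly_loss=2000, months=6)
for option in ranked:
    print(option.name, net_benefit(option, 2000, 6))
```

With these made-up figures the partial kludge comes out ahead of the full replacement over six months. Stretch the horizon and the answer can flip, which is exactly the kind of conversation an economic mindset allows.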
Economic Mindset = Broken Machine
Note the Key Words
• “Cost”, “Partial”, “Risk”
• These are things you hear a lot in an economic discussion
• Okay, meanwhile in Factory 2, also missing by 10%, different reason...
Factory 2... One Employee Is an Axe Murderer
After Every Axe Murdering...
• Have to, like, hire a new guy, train him on the machine, takes forever
• Questions we asked before are now somehow deeply wrong:
• “What if we just cut down on the rate, so there’s less axe murdering?”
• “Hey, we can train a pool of temps on all the machines, when someone gets
killed, we’ll just swap some new guy in, bang, problem solved!”
• “How much is it really costing us, anyways?”
• These ideas seem obscene, not merely bad
Moral Mindset = Axe Murderer
“Search for villains,
elevation of accusers,
and mobilization of authority to
mete out punishment”
(Pinker, The Blank Slate)
Moral Mindset, Key Words
• “Villains”, “Accusers”, “Authority”, “Punishment”
• I believe that most companies, in investigating outages, act much more
like they’re looking for an axe murderer, than trying to fix a broken
machine
Act II: What To Do in the Post-Mortem Room
Challenge #1, As Person Running Post-Mortems
Get team out of moral mindset.
Note: this is not, in fact, easy.
Why It’s Hard
• Mindsets control how we interpret the world...
• ...including what people say to us
• So, a team sitting there, fearing moral censure, hears you say “We’re not
looking to blame anyone”, they just think you’re lying. How could you mean
that, when the thing that happened was so terrible and wrong?
• The deep trick (and this is the point of this whole presentation, frankly), is that
you have to take advantage of the thing that separates humans and robots...
Fundamental Tool: Make ‘Em Laugh
Humor == Breaking Frames
• That’s what humor actually is -- something that stretches or breaks the
mental frame that people are using to interpret a situation
• So, you use humor to break the frame, release people from the blame/fear/
punishment of the moral mindset, and then refocus them on the economic
challenges you’re facing
• The humor is, IMHO, not a nice-to-have. It’s absolutely central. I’ve seen
smart, caring leaders get this one wrong, and finish their post-mortems with a
room full of tense, closed-up team members (and no good ideas on the table)
• Talk has specific examples of this, but this is a central point
Tip 1: Share Your Personal “Bad Things”
Place The Bad Thing on a Continuum
• Moral mindset is very absolutist: this bad thing is The Worst Thing Ever
• I like to say “Okay, well it’s pretty bad, let’s compare it to some things”
• Did we irretrievably lose customer data? (I’ve done that, not awesome)
• Did we almost get our customer fired by her boss? (also, not awesome)
• Did we send hundreds of emails to everyone on our customer’s mailing list...
but the emails were all question marks? For a customer who was in the
proofreading business? (done that, very much not awesome)
• People laugh, and then say “Okay, how bad was this, really?” Win.
More Stories of Actual Failures (Just For Fun)
• Did we break our allergies-to-medicines module, and risk having a doctor
prescribe the wrong medication to someone?
• Did our internet-connected home thermostat system have a server crash,
causing all the thermostats to set the temp to the default... of 85 degrees?
• Did our high-frequency trading program have flaws that led to our company
losing 450 million dollars? (that is a tough one to beat, IMHO)
• Collect your own! It’s fun!
Tip 2: Mock Hindsight Bias To Its Face
“Let’s plan for a future
where we’re all as stupid
as we are today.”
How Hindsight Bias Shows up in Post-Mortems
• Someone says “Oh, yeah, I screwed that one up, I knew I had to run the
deploy in that one order, and I just forgot. I’m really sorry, I won’t make that
mistake again, totally my bad.”
• You have to utterly reject this. It’s pure hindsight bias (easy to see errors after
the fact, very difficult in the moment).
• I say “It’s like we’re saying ‘I was stupid, this one time, and we’ll fix that
problem by never being stupid again.’”
• Hence: “planning for a future where we’re as stupid as we are today”
• aka “Must create a system which is resilient to occasional bouts of really
intense stupidity”.
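One way to read "resilient to occasional bouts of stupidity" for the forgotten deploy order: stop relying on memory and let the tooling refuse out-of-order steps. This is a minimal sketch under assumptions. The step names, `DeployRunner`, and `DeployOrderError` are all hypothetical and not taken from any real deploy tool.

```python
# Hypothetical step names; the real order lives wherever your deploy docs do.
DEPLOY_ORDER = ["migrate_db", "deploy_backend", "deploy_frontend"]


class DeployOrderError(Exception):
    """Raised when a step is attempted out of sequence."""


class DeployRunner:
    def __init__(self, order):
        self.order = list(order)
        self.done = []

    def run(self, step, action=lambda: None):
        # The next expected step is whichever one follows the steps already done.
        if len(self.done) < len(self.order):
            expected = self.order[len(self.done)]
        else:
            expected = None
        if step != expected:
            raise DeployOrderError(f"expected {expected!r}, got {step!r}")
        action()  # the real work for this step would go here
        self.done.append(step)


runner = DeployRunner(DEPLOY_ORDER)
runner.run("migrate_db")
try:
    runner.run("deploy_frontend")  # someone forgot the backend step
except DeployOrderError as err:
    print("refused:", err)
```

The person running the deploy can be exactly as forgetful as they were today, and the mistake gets caught before it ships.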
Tip 3: Relish Absurdities of Your System
You Will Find That Your Code is a Mess
• E.g. you’ve refactored, and rewritten in Python (or Node or something), and
moved to the cloud, but this 5 whys is making clear that your most important
report is still run by a VisualCron job on a Windows server that never quite
made it out of the office... and someone just tripped on the power cord
• Team will feel ashamed, you have to give them license to relish absurdity
• I often point out “There are two kinds of startups: the ones that achieve some
modest traction on top of a pile of code of which they are vaguely ashamed...
and the ones that go out of business. That’s it. No third kind.”
• Also sometimes it helps to just laugh: “It’s kind of amazing this works at all”
Interlude: A Worked Example
Three Axioms For Leading Post-Mortems
• Everyone involved acted in good faith
• Everyone involved is competent
• We’re doing this to find improvements
Axioms == Ground Truth From Which You Start
• If you don’t start with these as givens...
• ...you’ll find yourself seeing every incident as human error
• Whereas, if you can convince/trick yourself into such beliefs...
• ...you’ll find a thousand valuable improvements to make
• Or, to put it another way:
Human Error is the Question, Not the Answer
Restate the Problem To Include TTR
We broke the db access code.
Restate the Problem To Include TTR
We pushed a deploy...
which broke the db access code.
Restate the Problem To Include TTR
We pushed a deploy...
which broke the db access code...
and didn’t find out until customers complained.
Restate the Problem To Include TTR
We pushed a deploy...
which broke the db access code...
didn’t find out until customers complained...
and couldn’t fix it for three hours.
Redefining Problem Is Very Valuable
• People tend to focus on a single mistake
• Broaden that, to include full cycle back to restored service
• At what point was the triggering decision made?
• How long did it take to find out something was wrong?
• How long did it take to restore service?
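The three questions above map onto three timestamps. Here is a minimal sketch that turns them into durations, reading TTR as "time to restore" (an assumption, since the slides do not spell it out). The function name and the timestamps are hypothetical; only the three-hour fix comes from the worked example.

```python
from datetime import datetime


def incident_durations(triggered, detected, restored):
    """Split an incident into detection, fix, and total restore time."""
    if not triggered <= detected <= restored:
        raise ValueError("timestamps must be in chronological order")
    return {
        "time_to_detect": detected - triggered,
        "time_to_fix": restored - detected,
        "time_to_restore": restored - triggered,
    }


# Hypothetical timeline: deploy pushed, customers complain, service restored.
timeline = incident_durations(
    triggered=datetime(2013, 6, 18, 14, 0),
    detected=datetime(2013, 6, 18, 14, 45),
    restored=datetime(2013, 6, 18, 17, 45),
)
for label, duration in timeline.items():
    print(label, duration)
```

Seeing detection and fix as separate numbers makes it harder to collapse the whole incident into "someone broke the db access code".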
“Broadest Fix” vs “Root Cause”
Handling a Fork in the Road
• Which is the Root Cause? DB access bug or monitoring failure?
• Answer: don’t care about “root causes”. They don’t exist (multiple things
conspire for failures to happen). Also, kind of moral/blame-ish.
• Ask instead: if we made an incremental improvement in area A or area B,
which would prevent the broadest class of problems going forward?
• Much better conversation. Answer here is clear: monitoring.
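If the room cannot agree on which area is "broadest", a quick tally over past incidents can help. This is a rough sketch with invented data: the incident list and the `helped_by` labels are hypothetical, and in practice the team would fill them in by judgment.

```python
from collections import Counter

# Hypothetical incident log: for each incident, the improvement areas
# that would plausibly have prevented it or shortened it.
past_incidents = [
    {"name": "bad deploy broke db access", "helped_by": {"monitoring", "db access tests"}},
    {"name": "disk filled up on report server", "helped_by": {"monitoring"}},
    {"name": "cert expired", "helped_by": {"monitoring"}},
    {"name": "off-by-one in query builder", "helped_by": {"db access tests"}},
]


def broadest_fix(incidents):
    """Count how many incidents each improvement area would have touched."""
    counts = Counter(area for inc in incidents for area in inc["helped_by"])
    return counts.most_common()


for area, count in broadest_fix(past_incidents):
    print(area, count)
```

With this invented log, monitoring touches three of four incidents, which matches the answer the slide reaches by discussion.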
Act III: Corrective Actions / Remediations / Fixes
Incrementalism Or You’re Fired
Require Small Steps From Your Team
• Team will tell you they have no option but to do Some Huge Thing
• You have to totally reject this, push for a small step
• e.g. “What’s the simplest, dumbest thing that will make it slightly better?”
• After some hemming and hawing, great, cheap ideas emerge
• Might be: small steps towards Huge Thing
• Or: installing data collection to prove Huge Thing is necessary
“Automation” vs “Tools”
“Automation” => Humans Cause Your Problems
• Strong
• Silent
• Clumsy
• Difficult to Direct
David Woods, “Decomposing Automation: Apparent Simplicity, Real Complexity”
Automation Written By People Who Don’t Do the Job
“Tooling” => Humans Solve Your Problems
• How do the humans currently do their jobs?
• What tools do they use?
• When you give them a new tool, do they actually use it?
• How badly did you just screw up their jobs?
• YOU MUST ITERATE
Dan Mongers Some Fear
Here’s What’s Happening, Right Now
• Your systems are experiencing constant, small-scale failures... invisibly
• Your team is struggling to keep your systems running... but are so habituated
to it, they don’t even realize that’s true
• Your smart people are spending their smart cycles just trying to work around
the complexity in your system
• The business side is making plans which aren’t supported by your
infrastructure
• Customers are getting ready to surprise you, and it won’t be fun
Do This
• Elect a Post-Mortem Boss (Man|Lady)
• Look for a Goldilocks incident
• Expect awkwardness
• THERE MUST BE FIXES
• Incrementally improve the incremental improvements
Read This
• How Complex Systems Fail, Richard Cook (SOOOOO GOOOD)
• How the Mind Works, Steven Pinker (moral instinct, much other awesome)
• Thinking, Fast and Slow, Daniel Kahneman
• The Field Guide to Understanding Human Error, Sidney Dekker
• Complications and Better, Atul Gawande (marvelous narratives)
• Kitchen Soap, blog by John Allspaw
Photo Credits, I
• “Wonderworks Upside Down Building”, by Andy Leonard, http://www.flickr.com/photos/rover75/3901166997/
• “Robot de Martillo”, by Luis Perez, http://www.flickr.com/photos/pe5pe/2454661748/
• “Helios-Factory floor”, http://commons.wikimedia.org/wiki/File:Helioshall2.jpg
• “old machine”, by Jun Aoyama, http://www.flickr.com/photos/jam343/1730140/
• “Axe Marks The Spot”, by Alan Levine, http://www.flickr.com/photos/cogdog/4461665810/
• “Failboat Has Arrived”, http://www.rotskyinstitute.com/rotsky/wp-content/uploads/2008/02/failboat2.jpg
Photo Credits, II
• “14 plugs but only 6 sockets”, by Jason Rogers, http://www.flickr.com/photos/restlessglobetrotter/2661016046/
• “shame in scranton”, by Shira Golding Evergreen, http://www.flickr.com/photos/boojee/3613772785/
• “tiny dollhouse steps”, by Yi-Tao “Timo” Lee, http://www.flickr.com/photos/timojazz/6235519218/
• “Computers can be stupid”, by Brent Moore, http://www.flickr.com/photos/brent_nashville/2634912345/
• “Robot Uprising”, http://gordonandthewhale.com/wp-content/uploads/2010/10/How-To-Survive-a-Robot-Uprising.jpeg
• “Shark”, by Steve Garner, http://www.flickr.com/photos/22032337@N02/8314569214/
Thanks...
Dan Milstein
Hut 8 Labs
@danmil

How to Run a Post-Mortem (With Humans, Not Robots), Velocity 2013

How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)Dan Milstein
 
Corp Web Risks and Concerns
Corp Web Risks and ConcernsCorp Web Risks and Concerns
Corp Web Risks and ConcernsPINT Inc
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialWill Gallego
 
Supercharging your bug reports
Supercharging your bug reportsSupercharging your bug reports
Supercharging your bug reportsNeil Studd
 
Cross Functional Teams and the Product Manager
Cross Functional Teams and the Product ManagerCross Functional Teams and the Product Manager
Cross Functional Teams and the Product ManagerSVPMA
 
Startups and Smalltak - Presented at Smalltalks2014 Córdoba, Argentina
Startups and Smalltak - Presented at Smalltalks2014 Córdoba, ArgentinaStartups and Smalltak - Presented at Smalltalks2014 Córdoba, Argentina
Startups and Smalltak - Presented at Smalltalks2014 Córdoba, Argentinasebastian sastre
 
How PBworks Used Lean Startup Techniques
How PBworks Used Lean Startup TechniquesHow PBworks Used Lean Startup Techniques
How PBworks Used Lean Startup TechniquesDavid E. Weekly
 
Get Kudos from customers (without bribing them)
Get Kudos from customers (without bribing them)Get Kudos from customers (without bribing them)
Get Kudos from customers (without bribing them)Clairetalbott
 
The basics of e-service
The basics of e-serviceThe basics of e-service
The basics of e-serviceEric Reiss
 
Scrum and-xp-from-the-trenches 03 sprint backlog & daily scrum
Scrum and-xp-from-the-trenches 03 sprint backlog & daily scrumScrum and-xp-from-the-trenches 03 sprint backlog & daily scrum
Scrum and-xp-from-the-trenches 03 sprint backlog & daily scrumHossam Hassan
 
Conflict Management in Technology
Conflict Management in Technology Conflict Management in Technology
Conflict Management in Technology Denton Farley
 
Special Topics Day for Engineering Innovation Lecture on Cybersecurity
Special Topics Day for Engineering Innovation Lecture on CybersecuritySpecial Topics Day for Engineering Innovation Lecture on Cybersecurity
Special Topics Day for Engineering Innovation Lecture on CybersecurityMichael Rushanan
 
Uncharted lands, or why games are not designed but discovered
Uncharted lands, or why games are not designed but discoveredUncharted lands, or why games are not designed but discovered
Uncharted lands, or why games are not designed but discoveredJakub Stokalski
 
All That Glitters Is Not Gold: Usability Design for "When Things Go Wrong"
All That Glitters Is Not Gold: Usability Design for "When Things Go Wrong"All That Glitters Is Not Gold: Usability Design for "When Things Go Wrong"
All That Glitters Is Not Gold: Usability Design for "When Things Go Wrong"⌨️ Steven Proctor
 
Howtostopsucking
HowtostopsuckingHowtostopsucking
HowtostopsuckingHugo Pinto
 
How to stop sucking and be awesome instead
How to stop sucking and be awesome insteadHow to stop sucking and be awesome instead
How to stop sucking and be awesome insteadcodinghorror
 
Howtostopsuckingandbeawesomeinstead 120601013410-phpapp01
Howtostopsuckingandbeawesomeinstead 120601013410-phpapp01Howtostopsuckingandbeawesomeinstead 120601013410-phpapp01
Howtostopsuckingandbeawesomeinstead 120601013410-phpapp01Hugo Pinto
 

Similaire à How to Run a Post-Mortem (With Humans, Not Robots), Velocity 2013 (20)

How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)How To Run a 5 Whys (With Humans, Not Robots)
How To Run a 5 Whys (With Humans, Not Robots)
 
Corp Web Risks and Concerns
Corp Web Risks and ConcernsCorp Web Risks and Concerns
Corp Web Risks and Concerns
 
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose TutorialArchitecting a Post Mortem - Velocity 2018 San Jose Tutorial
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
 
The alignment
The alignmentThe alignment
The alignment
 
Supercharging your bug reports
Supercharging your bug reportsSupercharging your bug reports
How to Run a Post-Mortem (With Humans, Not Robots), Velocity 2013

  • 1. How To Run a Post-Mortem With Humans (Not Robots) Dan Milstein Hut 8 Labs @danmil
  • 2. Act I: What The Hell Is a Post-Mortem Anyways?
  • 3. Ahhhhhh! Something Very Bad Just Happened
  • 4. What Is a Post-Mortem Anyways? • Something you do when your company has badly screwed up • E.g. your CEO demos your cloud storage system to an early prospective customer, and, when he runs a search, it shows other customers’ data (I have done this, it was not awesome) • You get a bunch of people into a room and say: “How on earth did that happen? And how can we make sure it never, ever happens again?” • That’s a Post-Mortem • But, there’s a problem....
  • 6. Human Beings Will Eff It Up • Humans (unlike robots) feel this intense emotion called shame • Shame will suggest (strongly) “Slow Down, Stop Making So Many Mistakes” • Aka “Destroy your company by way of opportunity costs, immediately!” • Has potential to be incredibly damaging to your startup • And I have some bad news...
  • 7. You Will Totally Experience Shame (I Still Do) F.A.E.
  • 8. This Emotional Experience Can Not Be Avoided • I’ve run c. 50 post-mortems, have studied failure... and I still have this emotional reaction • You will, too. And so will your team. • Much more strongly than you realize right now • This is the “Fundamental Attribution Error” (FAE), from psychology • FAE = humans vastly underestimate the power of a situation on our behavior
  • 9. Big Idea: Adopt Economic, Not Moral Mindset $, FTW
  • 10. What Does That Mean • Let me tell you a story...
  • 11. Parable: A Tale of Two Factories
  • 12. Two Factories • Both make widgets • Both are missing their monthly Widget Production goals by 10% • But for different reasons...
  • 14. When The Machine Breaks... • Belt slips off every once in a while • Ruins a bunch of widgets • Gotta replace it, drift a little behind plan • So... what questions do humans ask in this situation?
  • 15. Economic Mindset = Broken Machine • “How much is it costing us?” • “How much does it cost to repair?” • “Can we kludge a partial fix?” • “What are risks if we delay a fix?”
  • 16. Note the Key Words • “Cost”, “Partial”, “Risk” • These are things you hear a lot in an economic discussion • Okay, meanwhile in Factory 2, also missing by 10%, different reason...
  • 17. Factory 2... One Employee Is an Axe Murderer
  • 18. After Every Axe Murdering... • Have to, like, hire a new guy, train him on the machine, takes forever • Questions we asked before are now somehow deeply wrong: • “What if we just cut down on the rate, so there’s less axe murdering?” • “Hey, we can train a pool of temps on all the machines, when someone gets killed, we’ll just swap some new guy in, bang, problem solved!” • “How much is it really costing us, anyways?” • These ideas seem obscene, not merely bad
  • 19. Moral Mindset = Axe Murderer “Search for villains, elevation of accusers, and mobilization of authority to mete out punishment” (Pinker, The Blank Slate)
  • 20. Moral Mindset, Key Words • “Villains”, “Accusers”, “Authority”, “Punishment” • I believe that most companies, in investigating outages, act much more like they’re looking for an axe murderer, than trying to fix a broken machine
  • 21. Act II: What To Do in the Post-Mortem Room
  • 22. Challenge #1, As Person Running Post-Mortems Get team out of moral mindset. Note: this is not, in fact, easy.
  • 23. Why It’s Hard • Mindsets control how we interpret the world... • ...including what people say to us • So, when a team sitting there, fearing moral censure, hears you say “We’re not looking to blame anyone”, they just think you’re lying. How could you mean that, when the thing that happened was so terrible and wrong? • The deep trick (and this is the point of this whole presentation, frankly) is that you have to take advantage of the thing that separates humans and robots...
  • 24. Fundamental Tool: Make ‘Em Laugh
  • 25. Humor == Breaking Frames • That’s what humor actually is -- something that stretches or breaks the mental frame that people are using to interpret a situation • So, you use humor to break the frame, release people from the blame/fear/punishment of the moral mindset, and then refocus them on the economic challenges you’re facing • The humor is, IMHO, not a nice-to-have. It’s absolutely central. I’ve seen smart, caring leaders get this one wrong, and finish their post-mortems with a room full of tense, closed-up team members (and no good ideas on the table) • Talk has specific examples of this, but this is a central point
  • 26. Tip 1: Share Your Personal “Bad Things”
  • 27. Place The Bad Thing on a Continuum • Moral mindset is very absolutist: this bad thing is The Worst Thing Ever • I like to say “Okay, well it’s pretty bad, let’s compare it to some things” • Did we irretrievably lose customer data? (I’ve done that, not awesome) • Did we almost get our customer fired by her boss? (also, not awesome) • Did we send hundreds of emails to everyone on our customer’s mailing list... but the emails were all question marks? For a customer who was in the proofreading business? (done that, very much not awesome) • People laugh, and then say “Okay, how bad was this, really?” Win.
  • 28. More Stories of Actual Failures (Just For Fun) • Did we break our allergies-to-medicines module, and risk having a doctor prescribe the wrong medication to someone? • Did our internet-connected home thermostat system have a server crash, causing all the thermostats to set the temp to the default... of 85 degrees? • Did our high-frequency trading program have flaws that led to our company losing 450 million dollars? (that is a tough one to beat, IMHO) • Collect your own! It’s fun!
  • 29. Tip 2: Mock Hindsight Bias To Its Face “Let’s plan for a future where we’re all as stupid as we are today.”
  • 30. How Hindsight Bias Shows up in Post-Mortems • Someone says “Oh, yeah, I screwed that one up, I knew I had to run the deploy in that one order, and I just forgot. I’m really sorry, I won’t make that mistake again, totally my bad.” • You have to utterly reject this. It’s pure hindsight bias (easy to see errors after the fact, very difficult in the moment). • I say “It’s like we’re saying ‘I was stupid, this one time, and we’ll fix that problem by never being stupid again.’” • Hence: “planning for a future where we’re as stupid as we are today” • aka “Must create a system which is resilient to occasional bouts of really intense stupidity”.
  • 31. Tip 3: Relish Absurdities of Your System
  • 32. You Will Find That Your Code is a Mess • E.g. you’ve refactored, and rewritten in Python (or Node or something), and moved to the cloud, but this 5 Whys is making clear that your most important report is still run by a VisualCron job on a Windows server that never quite made it out of the office... and someone just tripped on the power cord • Team will feel ashamed, you have to give them license to relish absurdity • I often point out “There are two kinds of startups: the ones that achieve some modest traction on top of a pile of code of which they are vaguely ashamed... and the ones that go out of business. That’s it. No third kind.” • Also sometimes it helps to just laugh: “It’s kind of amazing this works at all”
  • 34. Three Axioms For Leading Post-Mortems • Everyone involved acted in good faith • Everyone involved is competent • We’re doing this to find improvements
  • 35. Axioms == Ground Truth From Which You Start • If you don’t start with these as givens... • ...you’ll find yourself seeing every incident as human error • Whereas, if you can convince/trick yourself into such beliefs... • ...you’ll find a thousand valuable improvements to make • Or, to put it another way:
  • 36. Human Error is the Question, Not the Answer
  • 37. Restate the Problem To Include TTR We broke the db access code.
  • 38. Restate the Problem To Include TTR We pushed a deploy... which broke db access code.
  • 39. Restate the Problem To Include TTR We pushed a deploy... which broke the db access code... and didn’t find out until customers complained.
  • 40. Restate the Problem To Include TTR We pushed a deploy... which broke the db access code... didn’t find out until customers complained... and couldn’t fix it for three hours.
  • 41. Redefining Problem Is Very Valuable • People tend to focus on a single mistake • Broaden that, to include full cycle back to restored service • At what point was the triggering decision made? • How long did it take to find out something was wrong? • How long did it take to restore service?
  • 42. “Broadest Fix” vs “Root Cause”
  • 43. Handling a Fork in the Road • Which is the Root Cause? DB access bug or monitoring failure? • Answer: don’t care about “root causes”. They don’t exist (multiple things conspire for failures to happen). Also, kind of moral/blame-ish. • Ask instead: if we made an incremental improvement in area A or area B, which would prevent the broadest class of problems going ahead? • Much better conversation. Answer here is clear: monitoring.
  • 44. Act III: Corrective Actions / Remediations / Fixes
  • 46. Require Small Steps From Your Team • Team will tell you they have no option but to do Some Huge Thing • You have to totally reject this, push for a small step • e.g. “What’s the simplest, dumbest thing that will make it slightly better?” • After some hemming and hawing, great, cheap ideas emerge • Might be: small steps towards Huge Thing • Or: installing data collection to prove Huge Thing is necessary
  • 48. “Automation” => Humans Cause Your Problems • Strong • Silent • Clumsy • Difficult to Direct (David Woods, “Decomposing Automation: Apparent Simplicity, Real Complexity”)
  • 49. Automation Written By People Who Don’t Do Job
  • 50. “Tooling” => Humans Solve Your Problems • How do the humans currently do their jobs? • What tools do they use? • When you give them a new tool, do they actually use it? • How badly did you just screw up their jobs? • YOU MUST ITERATE
  • 52. Here’s What’s Happening, Right Now • Your systems are experiencing constant, small-scale failures... invisibly • Your team is struggling to keep your systems running... but are so habituated to it, they don’t even realize that’s true • Your smart people are spending their smart cycles just trying to work around the complexity in your system • The business side is making plans which aren’t supported by your infrastructure • Customers are getting ready to surprise you, and it won’t be fun
  • 53. Do This • Elect a Post-Mortem Boss (Man|Lady) • Look for a Goldilocks incident • Expect awkwardness • THERE MUST BE FIXES • Incrementally improve the incremental improvements
  • 54. Read This • How Complex Systems Fail, Richard Cook (SOOOOO GOOOD) • How the Mind Works, Steven Pinker (moral instinct, much other awesome) • Thinking, Fast and Slow, Daniel Kahneman • The Field Guide to Understanding Human Error, Sidney Dekker • Complications and Better, Atul Gawande (marvelous narratives) • Kitchen Soap, blog by John Allspaw
  • 55. Photo Credits, I • “Wonderworks Upside Down Building”, by Andy Leonard, http://www.flickr.com/photos/rover75/3901166997/ • “Robot de Martillo”, by Luis Perez, http://www.flickr.com/photos/pe5pe/2454661748/ • “Helios-Factory floor”, http://commons.wikimedia.org/wiki/File:Helioshall2.jpg • “old machine”, by Jun Aoyama, http://www.flickr.com/photos/jam343/1730140/ • “Axe Marks The Spot”, by Alan Levine, http://www.flickr.com/photos/cogdog/4461665810/ • “Failboat Has Arrived”, http://www.rotskyinstitute.com/rotsky/wp-content/uploads/2008/02/failboat2.jpg
  • 56. Photo Credits, II • “14 plugs but only 6 sockets”, by Jason Rogers, http://www.flickr.com/photos/restlessglobetrotter/2661016046/ • “shame in scranton”, by Shira Golding Evergreen, http://www.flickr.com/photos/boojee/3613772785/ • “tiny dollhouse steps”, by Yi-Tao “Timo” Lee, http://www.flickr.com/photos/timojazz/6235519218/ • “Computers can be stupid”, by Brent Moore, http://www.flickr.com/photos/brent_nashville/2634912345/ • “Robot Uprising”, http://gordonandthewhale.com/wp-content/uploads/2010/10/How-To-Survive-a-Robot-Uprising.jpeg • “Shark”, by Steve Garner, http://www.flickr.com/photos/22032337@N02/8314569214/
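
A rough sketch of the "Restate the Problem To Include TTR" idea from slides 37 to 41: instead of recording only the single mistake, record the full cycle back to restored service and read the two intervals off it. This is an illustration, not something from the talk. The class name, field names, and timestamps are all made up; the only number taken from the slides is the three hours it took to fix.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    """Hypothetical record of one incident; field names are illustrative."""
    deployed_at: datetime  # when the triggering change went out
    detected_at: datetime  # when the team found out something was wrong
    restored_at: datetime  # when service was back to normal

    def time_to_detect(self) -> timedelta:
        # "How long did it take to find out something was wrong?"
        return self.detected_at - self.deployed_at

    def time_to_restore(self) -> timedelta:
        # "How long did it take to restore service?"
        return self.restored_at - self.detected_at

    def total_impact(self) -> timedelta:
        # Full cycle, from the triggering deploy back to restored service.
        return self.restored_at - self.deployed_at


# Made-up timestamps: the 45-minute detection gap is an assumption,
# the 3-hour fix matches "couldn't fix it for three hours".
incident = IncidentTimeline(
    deployed_at=datetime(2013, 1, 1, 14, 0),
    detected_at=datetime(2013, 1, 1, 14, 45),
    restored_at=datetime(2013, 1, 1, 17, 45),
)

print("time to detect: ", incident.time_to_detect())
print("time to restore:", incident.time_to_restore())
print("total impact:   ", incident.total_impact())
```

Laying the incident out this way makes the "broadest fix" question easier to ask: which of the two intervals would shrink the most, across the most kinds of incident, for a small improvement?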