I like the term he uses to describe the unexpected in production: “operational surprises.”
What do we hope to learn?
- Tooling gaps
- Process gaps
- Knowledge gaps
- Mental model gaps
- Resource gaps: The law of stretched systems … socio-technical systems. Any team / system will eventually be running at capacity. What happens when you try to put more work into the system?
- How can you tell when a skilled engineer becomes overloaded? It just looks like they’re busy until they fail.
- Is a system staffed properly for its responsibilities?
“How did we get here?” What was the series of events that got us here? Incident review template title: “How did we get here?”
“How did X seem reasonable in the moment?”
Narrative story as a critical section in the incident review (postmortem). This isn’t the timeline, though a timeline is important too. The timeline data is fed back into a story that humans can hear, tell, and feel something about.