I like the term he uses to describe the unexpected in production: “operational surprises.”

What do we hope to learn?

  • Tooling gaps
  • Process gaps
  • Knowledge gaps
    • Mental model gaps
  • Resource gaps: the law of stretched systems … socio-technical systems. Any team or system will eventually be running at capacity. What happens when you try to put more work into the system?
    • How can you tell when a skilled engineer becomes overloaded? It just looks like they’re busy until they fail.
    • Is a system staffed properly for its responsibilities?

“How did we get here?” What series of events led us here? Incident review template title: “How did we get here?”

“How did X seem reasonable in the moment?”

A narrative story is a critical section of the incident review (postmortem). This isn’t the timeline, though a timeline is important too. The timeline data is fed back into a story that humans can hear, tell, and feel something about.