
AI Incident Postmortem Prompts (2026)

Prompt patterns for incident postmortems — timeline reconstruction, blameless root-cause analysis, and action-item extraction with owners and deadlines.

SurePrompts Team
April 20, 2026
12 min read

TL;DR

AI postmortem prompts should bake in blameless framing and focus on system-level causes. Three patterns: timeline reconstruction, root-cause analysis, action-item extraction. Prompts that blame individuals produce bad postmortems.

The pager fires at 02:41. By 03:20 the site is back up. The engineer who caught the page writes a short Slack update, goes to bed, and a week later the "postmortem" is a half-filled template with a timeline nobody trusts and three action items nobody owns. The incident happens again in six weeks. The labor of a good postmortem is high; the leverage is higher. The reason postmortems decay is that the labor comes due when everyone is already tired.

This is exactly the kind of work AI speeds up without hollowing out — if you prompt it right. Reconstructing a timeline from raw logs, chat transcripts, and ticket updates is mechanical. Turning a tangled incident into a blameless narrative is pattern work an AI can do consistently. Extracting action items with owners and deadlines is pure extraction. All three are low-judgment, high-labor tasks, and they are exactly where postmortems break down first when humans are doing them alone at 2am.

What AI will not do on its own is hold the blameless frame. Given a transcript where one engineer deployed the change that triggered the incident, default outputs drift toward naming that engineer. Good postmortem prompts carry the frame explicitly: focus on systems, not individuals; ask what conditions made the failure possible, not who clicked the button.

This post sits in the engineering track of our prompt engineering for business teams guide and pairs with AI architecture review prompts, AI technical spec prompts, and AI SOP writing prompts.

Why Postmortems Are Hard and Why They Matter

The labor is grinding. A medium-severity incident produces an hour of Slack chatter across three channels, dozens of log snippets pasted into threads, a half-dozen tickets updated in flight, and a handful of dashboards that were open during the response but not captured anywhere. Reconstructing what happened, in order, is mostly sorting and deduplication.

The leverage is where an engineering organization's prevention value lives. A good postmortem catches a latent fault — a missing alert, a fragile dependency, a silent degradation — before it causes the next incident. It also builds institutional memory: the second time someone hits the same class of bug, they find the first postmortem and skip ahead.

Labor and leverage live on different schedules. The labor is due now; the leverage shows up months later, as an incident that did not happen. Teams under pressure cut the labor. AI closes the gap — less labor for the same leverage, which makes the postmortem survive the 2am reality.

Pattern 1: Timeline Reconstruction from Raw Data

The first pattern feeds the AI everything: log lines with timestamps, Slack transcripts, PagerDuty events, deploy notifications, ticket updates. The output is a single chronological timeline with each event tagged by source, timestamp normalized to one time zone, and ambiguous events flagged rather than guessed at.

Three things make this prompt work. One, you specify the normalized timestamp format explicitly — the model will otherwise mix ISO 8601 and human-readable strings. Two, you tell it to preserve the source of every event, so a reviewer can audit a suspicious entry back to the original log or message. Three, you instruct it to flag gaps and contradictions rather than paper over them. A reconstructed timeline that quietly smooths a ten-minute gap is worse than one that notes the gap explicitly.

```text
ROLE:
  You are an SRE reconstructing an incident timeline from raw
  operational data. You order events strictly by timestamp. You
  preserve the source of every event. You flag gaps and
  contradictions — you do not guess to fill them.

CONTEXT:
  Incident ID: [paste]
  Time zone for normalized output: UTC
  Raw inputs (each in a separate block, clearly labeled):
    - Slack transcript from #incidents channel
    - Slack transcript from #eng-oncall channel
    - PagerDuty event log
    - Deploy notifications from CI
    - Log lines from affected services
    - Ticket updates from the incident ticket

TASK:
  Produce a single chronological timeline. For each event, include:
    - Timestamp in UTC, ISO 8601 format, second precision.
    - Source (which input block it came from).
    - Event description — one sentence, factual, no interpretation.
    - Actor role if visible (e.g., "on-call engineer", "CI system",
      "automated alert") — NEVER a person's name.

  If two events have the same timestamp, order by source precedence:
  automated alerts, logs, deploy notifications, PagerDuty, Slack,
  tickets.

  If there is a gap longer than five minutes with no events from any
  source, insert a row: "[GAP: N minutes with no recorded events]".

  If two sources contradict (e.g., Slack says service recovered at
  03:14 but logs show errors until 03:17), insert a row:
  "[CONTRADICTION: <source A> says X, <source B> says Y]".

FORMAT:
  Markdown table: timestamp, source, event, actor role.

ACCEPTANCE:
  - No individual names anywhere in the output — actor roles only.
  - Every event traceable to a source block.
  - Gaps and contradictions flagged, not smoothed.
  - No interpretation — facts only. Interpretation belongs in the
    root-cause analysis, not the timeline.
```

The "actor role, never a name" constraint matters. A timeline with names reads as a record of who did what. A timeline with roles reads as a record of how the system behaved. The same events, different framing, very different postmortem.
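The ordering, tie-break, and gap rules in the prompt are deterministic, which means you can spot-check the model's timeline against a mechanical merge. A minimal Python sketch, assuming hypothetical `(timestamp, source, description, actor_role)` tuples and a `SOURCE_PRECEDENCE` list mirroring the prompt's tie-break order:

```python
from datetime import datetime, timedelta

# Tie-break order from the prompt: automated alerts first, tickets last.
SOURCE_PRECEDENCE = ["alert", "logs", "deploy", "pagerduty", "slack", "ticket"]

def merge_timeline(events, gap_minutes=5):
    """Merge multi-source events into one timeline, inserting gap rows.

    events: iterable of (iso_timestamp_utc, source, description, actor_role).
    """
    def key(ev):
        ts, source, _, _ = ev
        # Same timestamp → order by source precedence, as the prompt specifies.
        return (datetime.fromisoformat(ts), SOURCE_PRECEDENCE.index(source))

    ordered = sorted(events, key=key)
    rows, prev_ts = [], None
    for ts, source, desc, role in ordered:
        t = datetime.fromisoformat(ts)
        if prev_ts and (t - prev_ts) > timedelta(minutes=gap_minutes):
            minutes = int((t - prev_ts).total_seconds() // 60)
            rows.append((None, None,
                         f"[GAP: {minutes} minutes with no recorded events]",
                         None))
        rows.append((ts, source, desc, role))
        prev_ts = t
    return rows
```

Running the model's reconstruction and this merge over the same inputs and diffing the two is a cheap way to catch a timeline the model quietly smoothed.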

Pattern 2: Root-Cause Analysis with Blameless Framing Baked In

Once the timeline is stable, root-cause analysis turns facts into explanation. Why did the incident happen? Why was it not caught earlier? Why did the response take as long as it did? These are system questions, and the prompt has to enforce that — otherwise the default answer is "because someone did X."

The blameless frame is not a disclaimer at the top of the prompt. It is a set of hard constraints threaded through the task description. The model is told to ask "what conditions made this failure possible" instead of "who caused this failure." It is told that humans acted reasonably given the information available to them at the time, and that any analysis which treats a human action as the root cause must go one layer deeper — what allowed the human to take that action, what would have caught it, what training or tooling or alerting was missing.

A useful structure is the layered "why" with explicit constraints at each layer:

| Layer | Question the layer answers | What counts as a valid answer |
| --- | --- | --- |
| Trigger | What event immediately preceded the failure? | A specific change, request, or condition — never an individual's action alone. |
| Proximate cause | What system behavior caused the user-visible symptom? | A component, dependency, or interaction between components. |
| Contributing factors | What conditions allowed the proximate cause to have impact? | Missing safeguards, capacity limits, stale assumptions, gaps in monitoring. |
| Root causes | What systemic properties made those contributing factors present? | Process, architecture, or organizational conditions — never "engineer X should have known." |
| Detection and response | What slowed discovery or recovery? | Gaps in alerting, runbook coverage, access, or escalation — not individual response time. |

The prompt supplies this structure and forbids the model from naming individuals in any layer. If the analysis would otherwise name someone, it is told to replace the name with the role plus the condition that enabled the action — "the on-call engineer, operating from a runbook that had not been updated since the last schema change." The frame is preserved by construction.
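The layered structure can also be checked after the model responds. A minimal Python sketch, assuming hypothetical layer keys matching the table above and a crude substring check for blame language (a real implementation would need a richer marker list):

```python
LAYERS = ["trigger", "proximate_cause", "contributing_factors",
          "root_causes", "detection_and_response"]

# Illustrative blame markers — extend for your own review checklist.
BLAME_MARKERS = ("human error", "should have known", "failed to notice")

def validate_analysis(analysis):
    """Return a list of problems that break the blameless/layered structure."""
    problems = []
    for layer in LAYERS:
        answer = analysis.get(layer, "").strip()
        if not answer:
            problems.append(f"{layer}: missing — every layer needs an answer")
        elif any(marker in answer.lower() for marker in BLAME_MARKERS):
            problems.append(f"{layer}: blame language — go one layer deeper")
    return problems
```

An empty return means the structure held; anything else goes back to the model with the specific layer named.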

Pattern 3: Action-Item Extraction with Owners, Deadlines, Verification

Postmortems produce action items. Many of those action items never ship. The reason is usually not laziness — it is that the action items, as written, are not actionable. "Improve monitoring" has no owner, no deadline, no definition of done. It sits in a doc until the next incident surfaces the same gap.

The extraction prompt refuses to produce an action item without three fields filled in: an owner (role, then team, then individual — in that order), a deadline, and a verification criterion. If the model cannot fill all three, it flags the item as "incomplete — needs triage" instead of producing a malformed row.

The verification criterion is the field that usually gets skipped. "Add alerting on queue depth" is underspecified; "alert fires within 60 seconds when queue depth exceeds 10,000 messages, verified by injecting synthetic load" is shippable. The prompt enforces verification of the form "observable, measurable, reproducible" — three properties checked before accepting a row.
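The three-field rule is easy to enforce programmatically before action items reach the tracker. A minimal Python sketch, with hypothetical dict fields for owner, deadline, and verification:

```python
REQUIRED = ("owner", "deadline", "verification")

def triage_action_items(items):
    """Split items into shippable rows and rows flagged for triage.

    An item is shippable only when owner, deadline, and verification
    are all present — mirroring the prompt's refusal rule.
    """
    shippable, needs_triage = [], []
    for item in items:
        missing = [field for field in REQUIRED if not item.get(field)]
        if missing:
            needs_triage.append({
                **item,
                "flag": "incomplete — needs triage "
                        f"(missing: {', '.join(missing)})",
            })
        else:
            shippable.append(item)
    return shippable, needs_triage
```

The flagged list becomes the first agenda item for the postmortem meeting, which is a better use of the hour than discovering the gaps weeks later.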

Feeding Raw Incident Data to the Model

Output quality depends on input structure. Five sources pasted as one undifferentiated wall produces confused output. Clearly labeled blocks, one per source, produce structured output the model can cite back.

A practical ingestion pattern:

  • Slack transcripts — export the relevant channel time windows with timestamps and usernames. Consider redacting usernames to role labels before feeding in, if the tooling allows.
  • PagerDuty event log — export as CSV or JSON, preserve timestamps, acknowledgments, and resolution events.
  • Deploy notifications — pull from CI or the deploy bot, include commit SHAs and which service deployed.
  • Log lines — include 15 minutes before first alert through 15 minutes after resolution, from every service touched during the incident.
  • Ticket updates — chronological comment stream from the incident ticket, including any linked tickets for dependent services.

Each block gets a clear header the prompt can reference. Large log dumps should be pre-filtered to error and warning levels unless the incident specifically requires debug output — context limits are real, and a wall of info-level logs buries the useful signal.
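The labeled-block structure can be assembled mechanically. A minimal Python sketch, with hypothetical source labels and a simple substring filter standing in for real log-level parsing:

```python
def build_incident_context(sources, log_levels=("ERROR", "WARN")):
    """Assemble labeled source blocks; pre-filter logs to error/warning.

    sources: dict mapping a source label to its raw text. The "logs"
    label triggering the filter is an illustrative convention.
    """
    blocks = []
    for label, text in sources.items():
        if label == "logs":
            # Keep only lines carrying a level of interest — crude, but
            # enough to stop info-level noise from eating the context window.
            text = "\n".join(
                line for line in text.splitlines()
                if any(level in line for level in log_levels)
            )
        blocks.append(f"=== SOURCE: {label} ===\n{text}")
    return "\n\n".join(blocks)
```

The `=== SOURCE: … ===` headers are what the timeline prompt's "traceable to a source block" acceptance clause points back to.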

Avoiding Individual Blame — Explicit Prompt Constraints

The blameless frame is load-bearing and easy to lose. Five constraints that keep it in place:

  • No individual names in any output. Replace with roles. Enforced at the acceptance clause.
  • "Human error" is not a valid root cause. If the analysis lands there, the model is told to go one layer deeper and find the condition that made the error possible.
  • Ask "what made this possible" not "who did this." Phrased in the task description, repeated in the structure.
  • Reasonable action given information at the time. The prompt states that humans in the transcript acted reasonably given what they knew; analysis must respect that.
  • Systemic language in action items. "Update the runbook" not "train the engineer." "Add a pre-deploy check" not "require a second reviewer to be more careful."

None of this is new to experienced SREs — the blameless postmortem has been the industry standard for years. What is new is that it has to be encoded in the prompt. The pattern itself is a kind of prompt template — a reusable structure with slots for incident-specific data and constraints that do not change between incidents.
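The first constraint, no names in any output, is most reliable when enforced before the transcript ever reaches the model. A minimal Python sketch, assuming a hypothetical `ROLE_MAP` built from the on-call schedule:

```python
import re

# Hypothetical username-to-role map, built from the on-call schedule
# and bot inventory for the incident window.
ROLE_MAP = {
    "alice.w": "on-call engineer",
    "bob.k": "incident commander",
    "deploy-bot": "CI system",
}

def mask_usernames(transcript):
    """Replace known usernames with role labels before feeding the model."""
    pattern = re.compile("|".join(re.escape(name) for name in ROLE_MAP))
    return pattern.sub(lambda m: ROLE_MAP[m.group(0)], transcript)
```

Masking at ingestion means the acceptance clause in the prompt becomes a second line of defense rather than the only one.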

Common Anti-Patterns

  • Pasting raw Slack without role-masking. Names in the input become names in the output. Fix: mask usernames to roles before feeding in, or add an explicit "never name individuals" constraint at the acceptance clause.
  • Asking for "the root cause" singular. Incidents have multiple contributing factors. Singular framing forces the model to pick one and call it the cause. Fix: prompt for layered analysis (trigger, proximate, contributing, root, detection).
  • Timeline and analysis in one prompt. The model mixes facts with interpretation, and interpretation creeps into the timeline. Fix: two prompts, timeline first, analysis second, with the stable timeline as input.
  • Action items without verification criteria. Produces unshippable items. Fix: require observable, measurable, reproducible verification for every item.
  • Generic "improve monitoring" items. Fix: require the action item to cite the specific gap from the analysis — which alert, which metric, which threshold.
  • No deadline column. Items without deadlines do not ship. Fix: require a deadline, and if the model cannot infer one, flag for triage.
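The "two prompts, timeline first" fix is just prompt composition: stage two consumes the stable output of stage one. A minimal sketch, with hypothetical template strings standing in for the full prompts shown earlier:

```python
# Illustrative stubs — in practice these would be the full ROLE/CONTEXT/
# TASK/ACCEPTANCE templates from the patterns above.
TIMELINE_PROMPT = (
    "Reconstruct the incident timeline. Facts only, no interpretation.\n\n"
    "{raw_blocks}"
)
ANALYSIS_PROMPT = (
    "Using ONLY the timeline below as your factual record, produce a "
    "layered root-cause analysis (trigger, proximate, contributing, "
    "root, detection).\n\n{timeline}"
)

def build_two_stage_prompts(raw_blocks, timeline_output=None):
    """Stage 1 prompt from raw data; stage 2 prompt from stage-1 output.

    The stage-2 prompt is only buildable once a human has reviewed and
    accepted the stage-1 timeline.
    """
    first = TIMELINE_PROMPT.format(raw_blocks=raw_blocks)
    second = (ANALYSIS_PROMPT.format(timeline=timeline_output)
              if timeline_output else None)
    return first, second
```

Keeping the human review step between the two calls is the point of the split: interpretation never gets a chance to leak into the factual record.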

For adjacent engineering prompts, pair this guide with AI architecture review prompts, AI technical spec prompts, and AI SOP writing prompts.

FAQ

Should the on-call engineer run the postmortem prompt themselves, or someone else?

Someone else, ideally. The engineer closest to the incident has the most context but also the most implicit narrative — they already know what happened, which makes them the worst reviewer of a reconstructed timeline. A peer running the prompt with the raw inputs catches gaps the close party would paper over.

How much raw data is too much?

A medium incident fits in one pass — Slack transcripts, PagerDuty events, deploy notifications, and filtered logs together run a few thousand lines. For long incidents, chunk by phase (detection, response, mitigation, recovery) and concatenate. Do not feed hours of debug logs whole — pre-filter to error and warning.

What about privacy and security of incident data?

Treat incident transcripts as sensitive. They often contain internal system names, customer identifiers, and vulnerability details. If your AI tooling does not have a data-handling agreement that matches your incident policy, redact before feeding in — replace customer IDs with placeholder tokens, scrub internal hostnames, mask usernames to roles.

Will this replace our postmortem meeting?

No, and you should not want it to. The meeting is where action items get owned and priorities get negotiated — those are human conversations. The AI-assisted draft gets the meeting started from a better place: timeline reconstructed, analysis structured, action items extracted. The meeting spends its hour on decisions instead of reconstruction.

Postmortems are one of the highest-leverage artifacts an engineering organization produces, and also one of the first to decay under pressure. The labor is real. AI closes the gap between the labor required and the labor available at 2am after an incident. Keep the blameless frame in the prompt, feed raw data with structure, split timeline from analysis, and require shippable action items. The postmortem that would otherwise be a half-filled template becomes a document that actually prevents the next incident.

Build prompts like these in seconds

Use the Template Builder to customize 350+ expert templates with real-time preview, then export for any AI model.

Open Template Builder