The Challenge
After an incident, teams must document what happened, why it happened, and how to prevent a recurrence. Compiling that record from alerts, Slack threads, and memory takes hours, so it often gets deprioritized.
The AI Desk Solution
AI Desk automatically compiles a postmortem draft from alerts, Slack threads, metrics, and runbooks, so documentation starts from the full record instead of memory.
The Workflow
Step 1: Incident Resolved
Trigger: Incident marked resolved in PagerDuty
Sources: Alerts, Slack, metrics, runbooks
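The trigger in Step 1 can be sketched as a small filter over incoming webhook events. This is a minimal illustration, not AI Desk's actual integration: it assumes a PagerDuty V3-style webhook body in which `event.event_type` equals `incident.resolved`, and the `PostmortemJob` type, the `job_from_webhook` function, and the source labels are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Source labels taken from Step 1; how each one is fetched is out of scope here.
SOURCES = ("alerts", "slack", "metrics", "runbooks")


@dataclass(frozen=True)
class PostmortemJob:
    """Hypothetical work item describing one postmortem to compile."""
    incident_id: str
    title: str
    resolved_at: str
    sources: tuple


def job_from_webhook(payload: dict) -> Optional[PostmortemJob]:
    """Return a job only for 'incident.resolved' events, otherwise None.

    Assumes a PagerDuty V3-style body: {"event": {"event_type": ..., "data": {...}}}.
    """
    event = payload.get("event", {})
    if event.get("event_type") != "incident.resolved":
        return None  # ignore triggered, acknowledged, and other event types
    data = event.get("data", {})
    return PostmortemJob(
        incident_id=data.get("id", ""),
        title=data.get("title", ""),
        resolved_at=event.get("occurred_at", ""),
        sources=SOURCES,
    )


if __name__ == "__main__":
    # Illustrative payload; the ID and timestamp are made up for the example.
    sample = {
        "event": {
            "event_type": "incident.resolved",
            "occurred_at": "2024-01-01T15:10:00Z",
            "data": {"id": "INC-2847", "title": "API Latency Spike"},
        }
    }
    print(job_from_webhook(sample))
```

Keeping the filter a pure function makes it easy to test without a running webhook endpoint.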
Step 2: Timeline Reconstruction
- Alert sequence
- Response actions
- Communication threads
- Resolution steps
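One way to picture timeline reconstruction is as a merge of per-source event streams into a single chronological list. The sketch below assumes each source has already been reduced to timestamped events. The `Event` type, the `build_timeline` helper, and the assignment of sample events to the four streams are illustrative assumptions, not a description of how AI Desk works internally.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass(frozen=True)
class Event:
    at: datetime   # when the event happened
    source: str    # which stream it came from (hypothetical labels)
    text: str      # human-readable description


def at(hhmm: str) -> datetime:
    """Parse a same-day HH:MM time; the date part is a placeholder."""
    return datetime.strptime(hhmm, "%H:%M")


def build_timeline(*streams):
    """Merge several event streams into one list ordered by timestamp."""
    by_time = lambda e: e.at
    # Sort each stream first, because heapq.merge expects sorted inputs.
    ordered = (sorted(stream, key=by_time) for stream in streams)
    return list(merge(*ordered, key=by_time))


alerts = [Event(at("14:23"), "alerts", "Alert triggered (p95 > 2000ms)")]
responses = [
    Event(at("14:31"), "responses", "Root cause identified (DB locks)"),
    Event(at("14:25"), "responses", "On-call acknowledged (Jamie)"),
]
comms = [Event(at("15:10"), "communication", "All-clear communicated")]
resolution = [
    Event(at("14:45"), "resolution", "Mitigation applied (query kill)"),
    Event(at("14:52"), "resolution", "Metrics normalized"),
]

if __name__ == "__main__":
    for event in build_timeline(alerts, responses, comms, resolution):
        print(event.at.strftime("%H:%M"), event.text)
```

A merge keeps each source's extraction logic separate, so adding a new source only means producing one more stream of `Event` values.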
Step 3: Postmortem Draft
Incident Postmortem: INC-2847
INCIDENT SUMMARY
├── Title: API Latency Spike
├── Severity: P2
├── Duration: 47 minutes
├── Impact: 15% of API requests >2s
└── Detection: Datadog alert
TIMELINE
├── 14:23 Alert triggered (p95 > 2000ms)
├── 14:25 On-call acknowledged (Jamie)
├── 14:31 Root cause identified (DB locks)
├── 14:45 Mitigation applied (query kill)
├── 14:52 Metrics normalized
└── 15:10 All-clear communicated
ROOT CAUSE
Database lock contention from long-running analytics query during peak hours.
WHAT WENT WELL
├── Fast detection (2 min)
├── Clear runbook followed
└── Good team communication
WHAT COULD IMPROVE
├── Analytics query not time-limited
├── No separate read replica for reports
└── Alert threshold could be tighter
ACTION ITEMS
├── Add query timeout for analytics
├── Evaluate read replica for reports
├── Update runbook with this scenario
└── Review alert thresholds
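The derived figures in the draft follow directly from the timeline: 14:23 to 15:10 is 47 minutes, and 14:23 to 14:25 is 2 minutes. The sketch below shows that arithmetic and one possible way to render a section of the draft. The helper names `minutes_between` and `render_section` are hypothetical, the box-drawing connectors are a presentation choice assumed here, and the calculation assumes the incident does not cross midnight.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def render_section(title: str, items: list) -> str:
    """Render a heading followed by tree-style entries."""
    lines = [title]
    for index, item in enumerate(items):
        # The last entry gets a closing connector, the others a branch.
        branch = "└──" if index == len(items) - 1 else "├──"
        lines.append(f"{branch} {item}")
    return "\n".join(lines)


timeline = [
    ("14:23", "Alert triggered (p95 > 2000ms)"),
    ("14:25", "On-call acknowledged (Jamie)"),
    ("14:31", "Root cause identified (DB locks)"),
    ("14:45", "Mitigation applied (query kill)"),
    ("14:52", "Metrics normalized"),
    ("15:10", "All-clear communicated"),
]

# Duration runs from the first alert to the all-clear message.
duration = minutes_between(timeline[0][0], timeline[-1][0])
# Time from the alert to the on-call acknowledgment.
time_to_ack = minutes_between(timeline[0][0], timeline[1][0])

if __name__ == "__main__":
    print(render_section("INCIDENT SUMMARY", [f"Duration: {duration} minutes"]))
    print(render_section("TIMELINE", [f"{t} {text}" for t, text in timeline]))
```

Computing these values from the timeline, rather than typing them by hand, keeps the summary consistent with the events it is based on.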
Value Proposition
- Time Saved: 2 hours per incident
- Complete Record: Nothing forgotten
- Better Prevention: Systematic learning
Part of the 100 Days 100 Usecases campaign.