← Back to list

incident-runbook-generator
by patricio0312rev
Comprehensive library of +100 production-ready development skills covering every aspect of modern software engineering. From project setup to production deployment, from security hardening to performance optimization.
⭐ 6🍴 0📅 Jan 19, 2026
SKILL.md
name: incident-runbook-generator description: Creates step-by-step incident response runbooks for common outages with actions, owners, rollback procedures, and communication templates. Use for "incident runbook", "outage response", "incident management", or "on-call procedures".
Incident Runbook Generator
Create actionable runbooks for common incidents.
Runbook Template
# Runbook: Database Connection Pool Exhausted
**Severity:** P1 (Critical)
**Estimated Time to Resolve:** 15-30 minutes
**Owner:** Database Team (On-call)
## Symptoms
- Application errors: "connection pool exhausted"
- Increased API latency (>5s)
- Failed health checks
- CloudWatch alarm: `DatabaseConnectionsHigh`
## Detection
- Alert: DatabaseConnectionPoolExhausted
- Metrics: `active_connections > max_connections * 0.9`
- Logs: "Error: connect ETIMEDOUT"
## Immediate Actions (5 min)
1. **Verify the issue**
```bash
# Check current connections
SELECT count(*) FROM pg_stat_activity;
```
-
Identify long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10; -
Kill blocking queries (if safe)
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND now() - state_change > interval '5 minutes';
Mitigation (10 min)
-
Scale up connection pool (temporary)
# Update RDS parameter group aws rds modify-db-parameter-group \ --db-parameter-group-name prod-params \ --parameters "ParameterName=max_connections,ParameterValue=200" -
Restart application (if needed)
kubectl rollout restart deployment/api -
Monitor recovery
watch -n 5 'psql -c "SELECT count(*) FROM pg_stat_activity;"'
Root Cause Investigation
Check for:
- Recent deployment (new code with connection leaks)
- Traffic spike (legitimate or DDoS)
- Slow queries holding connections
- Connection pool configuration too small
- Application not releasing connections
Rollback Steps
If caused by deployment:
# Rollback to previous version
kubectl rollout undo deployment/api
# Verify
kubectl rollout status deployment/api
Communication Template
Initial (within 5 min):
🚨 INCIDENT: Database connection pool exhausted
Status: Investigating
Impact: API errors and slowness
ETA: 15-30 min
Next update: 10 min
Update (every 10 min):
UPDATE: Killed long-running queries
Status: Mitigating
Impact: Still degraded, improving
Actions: Scaling connection pool
Next update: 10 min
Resolution:
✅ RESOLVED: Database connections normalized
Duration: 25 minutes
Root cause: Connection leak in v2.3.4
Fix: Rolled back to v2.3.3
Follow-up: Bug fix PR #1234
Postmortem: [link]
Prevention
- Add connection pool metrics to dashboards
- Implement connection timeout (30s)
- Add connection leak detection in tests
- Set up pre-deployment load testing
- Review connection pool sizing
Related Runbooks
- Database High CPU
- Slow Database Queries
- Application OOM
## Output Checklist
- [ ] Symptoms documented
- [ ] Detection criteria
- [ ] Step-by-step actions
- [ ] Owner assigned
- [ ] Rollback procedure
- [ ] Communication templates
- [ ] Prevention measures
ENDFILE
Score
Total Score
70/100
Based on repository quality metrics
✓SKILL.md
SKILL.mdファイルが含まれている
+20
✓LICENSE
ライセンスが設定されている
+10
✓説明文
100文字以上の説明がある
+10
○人気
GitHub Stars 100以上
0/15
✓最近の活動
1ヶ月以内に更新
+10
○フォーク
10回以上フォークされている
0/5
✓Issue管理
オープンIssueが50未満
+5
○言語
プログラミング言語が設定されている
0/5
✓タグ
1つ以上のタグが設定されている
+5
Reviews
💬
Reviews coming soon


