observability
by 5dlabs
Cognitive Task Orchestrator - GitOps on Bare Metal or Cloud for AI Agents
⭐ 2🍴 1📅 Jan 25, 2026
SKILL.md
name: observability
description: Query Prometheus metrics, Loki logs, and Grafana dashboards for diagnostics and incident response.
agents: [rex, grizz, nova, blaze, bolt, cipher, cleo, tess]
triggers: [metrics, logs, prometheus, loki, grafana, monitoring, alerts, incident]
Observability Tools
Query metrics, logs, and dashboards for diagnostics and incident response.
Prometheus (Metrics)
Query metrics for performance analysis and alerting.
# CPU usage by pod
prometheus_query({
query: 'rate(container_cpu_usage_seconds_total{namespace="my-service"}[5m])'
})
# Memory usage
prometheus_query({
query: 'container_memory_usage_bytes{namespace="my-service"}'
})
# HTTP request rate
prometheus_query({
query: 'rate(http_requests_total{namespace="my-service"}[5m])'
})
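# Error rate (fraction of requests returning 5xx; sum both sides so the division matches across status labels)
prometheus_query({
query: 'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'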
})
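For context, the queries above are plain PromQL strings, so they can also be sent to Prometheus's HTTP API at `/api/v1/query`. The sketch below only builds the request URL; the host `prometheus.example:9090` and the helper name `build_instant_query_url` are assumptions for illustration, and how the `prometheus_query` tool talks to Prometheus internally is not specified here.

```python
from typing import Optional
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str, time: Optional[str] = None) -> str:
    """Build a URL for Prometheus's instant-query endpoint (/api/v1/query).

    base_url is a hypothetical Prometheus address; time is an optional
    evaluation timestamp (RFC 3339 or Unix seconds).
    """
    params = {"query": promql}
    if time is not None:
        params["time"] = time
    # urlencode percent-escapes braces, quotes, and brackets in the PromQL.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


url = build_instant_query_url(
    "http://prometheus.example:9090",  # assumed host, replace with your own
    'rate(http_requests_total{namespace="my-service"}[5m])',
)
print(url)
```

Building the URL separately from sending it keeps the escaping easy to inspect before any request is made.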
Loki (Logs)
Query logs for debugging and incident investigation.
# Application logs
loki_query({
query: '{namespace="my-service", app="api"} |= "error"',
limit: 100
})
# Structured log parsing
loki_query({
query: '{namespace="my-service"} | json | level="error"'
})
# Time-based filtering
loki_query({
query: '{namespace="my-service"}',
start: "2024-01-01T00:00:00Z",
end: "2024-01-01T01:00:00Z"
})
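As a rough illustration of what a time-bounded log query looks like at the HTTP level, the sketch below builds a URL for Loki's `/loki/api/v1/query_range` endpoint, converting the timestamps to Unix nanoseconds. The host `loki.example:3100` and both helper names are assumptions; the `loki_query` tool may handle this differently.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def rfc3339_to_unix_ns(ts: str) -> int:
    """Convert a UTC timestamp such as 2024-01-01T00:00:00Z to Unix nanoseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000


def build_range_query_url(base_url: str, logql: str, start: str, end: str, limit: int = 100) -> str:
    """Build a URL for Loki's range-query endpoint with an explicit time window."""
    params = {
        "query": logql,
        "start": rfc3339_to_unix_ns(start),
        "end": rfc3339_to_unix_ns(end),
        "limit": limit,
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = build_range_query_url(
    "http://loki.example:3100",  # assumed host, replace with your own
    '{namespace="my-service"}',
    "2024-01-01T00:00:00Z",
    "2024-01-01T01:00:00Z",
)
print(url)
```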
Common Queries
| Scenario | Query Type | Example |
|---|---|---|
| High latency | Prometheus | histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) |
| Error spike | Loki | sum by (error_type) (count_over_time({app="api"} \|= "error" \| json [5m])) |
| Memory leak | Prometheus | container_memory_usage_bytes{pod=~"api.*"} |
| Failed requests | Loki | {app="api"} \| json \| status >= 500 |
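To make the high-latency row more concrete, here is a simplified sketch of how a quantile is estimated from cumulative histogram buckets: find the bucket containing the target rank, then interpolate linearly inside it. It mirrors the general idea behind `histogram_quantile` but is not Prometheus's implementation; the function name `estimate_quantile` and the bucket values are made up for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with at least two entries and a final +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical request-duration buckets in seconds.
sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(estimate_quantile(0.95, sample))
```

Because the estimate is interpolated, its accuracy depends on how finely the bucket boundaries are spaced around the latency range of interest.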
Incident Response Flow
- Check alerts - What triggered?
- Query metrics - Is it resource exhaustion?
- Query logs - What errors are occurring?
- Correlate - Match timestamps across metrics and logs
- Identify root cause - Database? Network? Code bug?
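The correlation step can be sketched as a simple timestamp join: given the time a metric spiked, keep only the log entries that fall within a window around it. The entry shape (`ts`, `msg`), the helper names, and the sample data below are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2024-01-01T00:30:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def logs_near(spike_time: str, log_entries: list, window_seconds: int = 60) -> list:
    """Return log entries whose timestamp is within window_seconds of the spike."""
    center = parse_ts(spike_time)
    window = timedelta(seconds=window_seconds)
    return [e for e in log_entries if abs(parse_ts(e["ts"]) - center) <= window]


# Hypothetical log entries for illustration only.
entries = [
    {"ts": "2024-01-01T00:10:00Z", "msg": "request ok"},
    {"ts": "2024-01-01T00:29:30Z", "msg": "connection pool exhausted"},
    {"ts": "2024-01-01T00:30:45Z", "msg": "upstream timeout"},
    {"ts": "2024-01-01T00:45:00Z", "msg": "request ok"},
]

for entry in logs_near("2024-01-01T00:30:00Z", entries):
    print(entry["ts"], entry["msg"])
```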
Best Practices
- Start broad, then narrow - Filter down to specific pods
- Use time ranges - Don't query unbounded
- Correlate metrics + logs - Same time window
- Check dashboard first - Grafana may have pre-built views
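One way to picture "start broad, then narrow" is to build the label selector incrementally, adding a label at each step. The helper `selector` and the pod name `api-7d9f` are assumptions for illustration; the resulting strings use the same `{label="value"}` syntax shared by the PromQL and LogQL queries above.

```python
def selector(labels: dict) -> str:
    """Render a dict of labels as a {key="value", ...} selector string."""
    inner = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return "{" + inner + "}"


# Start with the whole namespace, then narrow to an app, then to one pod.
broad = {"namespace": "my-service"}
narrower = {**broad, "app": "api"}
narrowest = {**narrower, "pod": "api-7d9f"}  # hypothetical pod name

for step in (broad, narrower, narrowest):
    print(selector(step))
```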


