monitoring-skill

by pluginagentmarketplace

DevOps automation plugin for Claude AI - CI/CD, deployment, monitoring, and infrastructure management tools for plugin development

2 stars, 0 forks, Jan 5, 2026

SKILL.md


---
name: monitoring-skill
description: Monitoring and observability with Prometheus, Grafana, ELK Stack, and distributed tracing.
sasmp_version: "1.3.0"
bonded_agent: 06-monitoring-observability
bond_type: PRIMARY_BOND

parameters:
  - name: pillar
    type: string
    required: false
    enum: ["metrics", "logs", "traces", "all"]
    default: "all"
  - name: tool
    type: string
    required: false
    enum: ["prometheus", "grafana", "elk", "jaeger"]
    default: "prometheus"

retry_config:
  strategy: exponential_backoff
  initial_delay_ms: 1000
  max_retries: 3

observability:
  logging: structured
  metrics: enabled
---
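
The `retry_config` above names an exponential backoff strategy with a 1000 ms initial delay and 3 retries. A minimal sketch of what that could look like follows. The doubling factor, the absence of jitter, and the helper names `backoff_delays` and `call_with_retry` are assumptions for illustration; the front matter does not specify them.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(initial_delay_ms: int = 1000, max_retries: int = 3) -> List[int]:
    """Delay in milliseconds before each retry, doubling each time (assumed factor of 2)."""
    return [initial_delay_ms * 2 ** attempt for attempt in range(max_retries)]


def call_with_retry(
    fn: Callable[[], T],
    initial_delay_ms: int = 1000,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to max_retries times with growing delays."""
    delays = backoff_delays(initial_delay_ms, max_retries)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt] / 1000.0)  # time.sleep expects seconds
    raise ValueError("max_retries must be >= 0")
```

With the defaults, `backoff_delays()` yields waits of 1 s, 2 s, and 4 s. The `sleep` parameter is injectable so the retry loop can be exercised without actually waiting.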

Monitoring & Observability Skill

Overview

Master the three pillars of observability: metrics, logs, and traces.

Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| pillar | string | No | all | Observability pillar |
| tool | string | No | prometheus | Tool focus |

Core Topics

MANDATORY

  • Prometheus metrics and PromQL
  • Grafana dashboards
  • ELK Stack basics
  • SLIs, SLOs, error budgets
  • Alerting rules

OPTIONAL

  • Distributed tracing
  • OpenTelemetry
  • Custom exporters
  • Log correlation

ADVANCED

  • High cardinality handling
  • Recording rules
  • Federation
  • Continuous profiling
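
Of the topics above, SLOs and error budgets lend themselves to a short worked example. The sketch below assumes a request-based availability SLI (good requests divided by total requests); the function name `error_budget` and its return keys are hypothetical, chosen only for illustration.

```python
from typing import Dict


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> Dict[str, float]:
    """Summarize error budget usage for a request-based availability SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: fraction of requests that succeeded in the window.
    sli = 1 - failed_requests / total_requests
    # The budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,        # 1.0 means the budget is fully spent
        "budget_remaining": 1 - consumed,   # negative once the SLO is violated
    }
```

For example, a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures, so 250 failed requests consume roughly a quarter of the budget.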

Quick Reference

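# PromQL
# Per-service request rate over the last 5 minutes
sum(rate(http_requests_total[5m])) by (service)
# p99 latency; aggregate buckets by le before taking the quantile
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Percentage of requests that returned a 5xx status
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# Prometheus API
curl http://localhost:9090/api/v1/targets
curl 'http://localhost:9090/api/v1/query?query=up'
# Reload the configuration; requires Prometheus to run with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload

# Alertmanager (assumes alertmanager.url is set in the amtool config)
# Silence HighLatency for 2 hours; amtool requires a comment by default
amtool silence add alertname="HighLatency" --duration=2h --comment="investigating"
# List currently firing alerts
amtool alert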
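
The `curl` query above can also be issued from a script. The following is a minimal sketch using only the Python standard library, assuming Prometheus listens on `http://localhost:9090` as in the examples; the helper names are hypothetical. It handles only instant-vector results from `/api/v1/query`.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query, URL-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> Dict[str, float]:
    """Map each returned series (rendered as sorted label pairs) to its sample value."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    samples = {}
    for series in data["result"]:
        key = ",".join(f'{k}="{v}"' for k, v in sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        samples[key] = float(value)
    return samples


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> Dict[str, float]:
    """Run an instant query against a live Prometheus server."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a running server.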

SRE Golden Signals

| Signal | Metric |
|--------|--------|
| Latency | `histogram_quantile(0.99, ...)` |
| Traffic | `sum(rate(requests_total[5m]))` |
| Errors | `rate(errors_total[5m])` |
| Saturation | `node_memory_MemAvailable_bytes` |
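
To make the latency row less of a black box, here is a simplified sketch of the estimate that `histogram_quantile` produces for classic histograms: find the bucket containing the target rank, then interpolate linearly inside it. This is an illustration under stated assumptions, not Prometheus's implementation; it takes cumulative bucket counts directly, accepts only `0 < q <= 1`, and skips edge cases such as non-positive bucket bounds.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The list must include the +Inf bucket, whose count is the total.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus the +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total

    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly within the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
```

Because the result is interpolated, its accuracy depends on bucket boundaries: a p99 that lands in a wide bucket is only a rough estimate.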

Troubleshooting

Common Failures

| Symptom | Root Cause | Solution |
|---------|------------|----------|
| No data | Scrape failing | Check targets page |
| Alert not firing | PromQL error | Test in UI |
| High cardinality | Too many labels | Reduce labels |
| Slow queries | Too much data | Add aggregation |
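
The "high cardinality" row is easier to reason about with numbers: the worst-case series count for a metric is the product of the distinct values of each label. A small sketch, where the label names and counts are made up for illustration:

```python
from math import prod
from typing import Dict


def max_series(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of per-label value counts."""
    return prod(label_cardinalities.values())


# Hypothetical labels on an HTTP request counter.
baseline = {"method": 5, "status": 10, "path": 200}
with_user_id = {**baseline, "user_id": 10_000}

print(max_series(baseline))      # 5 * 10 * 200
print(max_series(with_user_id))  # the same, multiplied by 10,000
```

Real series counts are usually lower because not every label combination occurs, but the product shows why a single unbounded label such as a user ID dominates everything else.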

Debug Checklist

  1. Check targets: /targets
  2. Test query in UI
  3. Check logs: journalctl -u prometheus
  4. Verify time sync (NTP)
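
Step 1 of the checklist can be scripted against the JSON returned by `/api/v1/targets`. The sketch below only parses an already-fetched response; the function name `unhealthy_targets` is hypothetical, and the sample payload is invented to show the shape of the data.

```python
from typing import List, Tuple


def unhealthy_targets(payload: dict) -> List[Tuple[str, str]]:
    """Return (scrape URL, last error) for every active target that is not 'up'."""
    targets = payload["data"]["activeTargets"]
    return [
        (t["scrapeUrl"], t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


# Invented sample response, trimmed to the fields used above.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://app:8080/metrics", "health": "down",
             "lastError": "connection refused"},
        ],
        "droppedTargets": [],
    },
}

for url, error in unhealthy_targets(sample):
    print(f"{url}: {error}")
```

An empty result means every active target is being scraped successfully, which points the investigation toward the query or the alert rule instead.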

Recovery Procedures

Prometheus OOM

  1. Check cardinality
  2. Reduce retention
  3. Add federation
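
For step 1, Prometheus exposes head-block statistics at `/api/v1/status/tsdb`, including series counts per metric name. The sketch below ranks metrics from an already-fetched response; the function name `top_series_by_metric` and the sample numbers are assumptions for illustration.

```python
from typing import List, Tuple


def top_series_by_metric(payload: dict, limit: int = 5) -> List[Tuple[str, int]]:
    """Rank metric names by series count using the TSDB status response."""
    stats = payload["data"]["seriesCountByMetricName"]
    ranked = sorted(stats, key=lambda entry: entry["value"], reverse=True)
    return [(entry["name"], int(entry["value"])) for entry in ranked[:limit]]


# Invented sample, trimmed to the field used above.
sample_tsdb = {
    "status": "success",
    "data": {
        "seriesCountByMetricName": [
            {"name": "up", "value": 12},
            {"name": "http_request_duration_seconds_bucket", "value": 48_000},
            {"name": "http_requests_total", "value": 3_200},
        ]
    },
}

for name, count in top_series_by_metric(sample_tsdb, limit=2):
    print(f"{name}: {count} series")
```

The metrics at the top of this ranking are the first candidates for dropping labels or adding recording rules before retention or federation changes are considered.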

Resources

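Score

Total Score: 75/100 (based on repository quality metrics)

| Criterion | Requirement | Points |
|-----------|-------------|--------|
| SKILL.md | Contains a SKILL.md file | +20 |
| LICENSE | A license is set | +10 |
| Description | Description of 100 or more characters | +10 |
| Popularity | 100 or more GitHub stars | 0/15 |
| Recent activity | Updated within the last 3 months | +5 |
| Forks | Forked 10 or more times | 0/5 |
| Issue management | Fewer than 50 open issues | +5 |
| Language | A programming language is set | +5 |
| Tags | At least one tag is set | +5 |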
