telemetry

by 5dlabs

Cognitive Task Orchestrator - GitOps on Bare Metal or Cloud for AI Agents

2🍴 1📅 Jan 25, 2026

SKILL.md


name: telemetry
description: CTO observability stack expertise - Prometheus metrics, Loki logs, Grafana dashboards. Use when querying logs, metrics, or debugging via telemetry.

Telemetry Skill

Access the CTO observability stack for logs, metrics, and dashboards.

When to Use

  • Querying pod logs via Loki
  • Checking metrics via Prometheus
  • Viewing dashboards in Grafana
  • Debugging agent failures
  • Monitoring Play workflow health

Stack Overview

| Service | Port | Purpose |
|---|---|---|
| Prometheus | 9090 | Metrics collection and querying |
| Loki | 3100 | Log aggregation (like Prometheus for logs) |
| Grafana | 3000 | Dashboards and visualization |

Port Forwards (Required for Local Access)

kubectl port-forward svc/prometheus-server -n observability 9090:80
kubectl port-forward svc/loki-gateway -n observability 3100:80
kubectl port-forward svc/grafana -n observability 3000:80
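
Before running queries, it can help to confirm that the three forwards are actually listening. The following is a minimal sketch, assuming the local ports from the commands above; `port_open` and `check_forwards` are hypothetical helper names, not part of the CTO tooling.

```python
import socket

# Local ports taken from the port-forward commands above.
LOCAL_PORTS = {"prometheus": 9090, "loki": 3100, "grafana": 3000}


def port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_forwards(ports=None):
    """Map each service name to whether its local port accepts connections."""
    ports = LOCAL_PORTS if ports is None else ports
    return {name: port_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_forwards().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

A port that reports as not reachable usually means the matching `kubectl port-forward` process has exited and needs to be restarted.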

MCP Tools Available

Prometheus Tools

| Tool | Purpose |
|---|---|
| prometheus_query | Instant query (current value) |
| prometheus_query_range | Range query (time series) |
| prometheus_labels | List all label names |
| prometheus_series | Find series matching labels |

Loki Tools

| Tool | Purpose |
|---|---|
| loki_query | Query logs with LogQL |
| loki_labels | List all label names |
| loki_label_values | Get values for a label |

Grafana Tools

| Tool | Purpose |
|---|---|
| grafana_search_dashboards | Find dashboards by name |
| grafana_get_dashboard | Get dashboard definition |
| grafana_query_prometheus | Query Prometheus via Grafana |
| grafana_query_loki_logs | Query Loki via Grafana |
| grafana_list_alert_rules | List configured alerts |

Loki (Logs)

LogQL Basics

# All logs from CTO namespace
{namespace="cto"}

# Filter by pod name
{namespace="cto", pod=~"coderun-.*"}

# Search for errors
{namespace="cto"} |= "error"

# JSON parsing
{namespace="cto"} | json | level="error"

# Regex filter
{namespace="cto"} |~ "tool.*mismatch"

Common Queries for CTO

# All CodeRun pod logs
{namespace="cto", app="coderun"}

# Morgan intake logs
{namespace="cto", pod=~"intake-.*"}

# Play workflow logs
{namespace="cto", pod=~"play-.*"}

# Errors only
{namespace="cto"} |= "error" | json

# Tool inventory issues (A10)
{namespace="cto"} |~ "tool.*(mismatch|missing)"

# MCP initialization failures (A12)
{namespace="cto"} |~ "mcp.*failed"

# Config issues (A11)
{namespace="cto"} |~ "cto-config.*(missing|invalid)"
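
The queries above all share one shape: a label selector, an optional line filter, and an optional parser stage. When queries are assembled in code, a small builder keeps the quoting consistent. This is a sketch only; `build_logql` and its parameters are hypothetical names introduced for illustration.

```python
from typing import Optional


def _quote(value: str) -> str:
    """Double-quote a string, escaping backslashes and embedded quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_logql(
    labels: dict,
    regex_labels: Optional[dict] = None,
    contains: Optional[str] = None,
    matches: Optional[str] = None,
    parse_json: bool = False,
) -> str:
    """Assemble a LogQL log query from its parts.

    labels        exact-match label selectors      (name="value")
    regex_labels  regex label selectors            (name=~"pattern")
    contains      substring line filter            (|= "text")
    matches       regex line filter                (|~ "pattern")
    parse_json    append the `| json` parser stage
    """
    matchers = [f"{k}={_quote(v)}" for k, v in labels.items()]
    matchers += [f"{k}=~{_quote(v)}" for k, v in (regex_labels or {}).items()]
    query = "{" + ", ".join(matchers) + "}"
    if contains is not None:
        query += f" |= {_quote(contains)}"
    if matches is not None:
        query += f" |~ {_quote(matches)}"
    if parse_json:
        query += " | json"
    return query


if __name__ == "__main__":
    print(build_logql({"namespace": "cto"}, regex_labels={"pod": "coderun-.*"}))
    print(build_logql({"namespace": "cto"}, matches="tool.*(mismatch|missing)"))
```

Backslashes are doubled because LogQL double-quoted strings treat the backslash as an escape character, so a pattern such as `\s+` has to be sent as `\\s+`.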

Via MCP Tool

loki_query(query='{namespace="cto"} |= "error"', limit=100)

Via curl

curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="cto"} |= "error"' \
  --data-urlencode 'limit=100' | jq
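
The same request can be made from a script. The sketch below uses only the Python standard library, assumes the Loki port forward is running on localhost:3100, and adds an explicit time window through the `start` and `end` parameters (nanosecond Unix timestamps). The helper names and the 15-minute default are assumptions made for illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

LOKI_URL = "http://localhost:3100"  # assumes the port forward shown earlier


def query_range_url(query, minutes=15, limit=100, now_s=None, base=LOKI_URL):
    """Build a query_range URL covering the last `minutes` minutes."""
    end_s = int(time.time()) if now_s is None else int(now_s)
    params = {
        "query": query,
        "limit": limit,
        "start": (end_s - minutes * 60) * 10**9,  # nanoseconds since epoch
        "end": end_s * 10**9,
    }
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


def extract_lines(response):
    """Flatten a Loki `streams` response into (timestamp_ns, line) pairs,
    sorted oldest first."""
    pairs = [
        (int(ts), line)
        for stream in response["data"]["result"]
        for ts, line in stream["values"]
    ]
    return sorted(pairs)


def fetch_lines(query, **kwargs):
    """Run the query against Loki and return the flattened log lines."""
    with urlopen(query_range_url(query, **kwargs), timeout=10) as resp:
        return extract_lines(json.load(resp))
```

With the port forward running, `fetch_lines('{namespace="cto"} |= "error"', minutes=30)` would return the matching lines from the last half hour across all streams.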

Prometheus (Metrics)

PromQL Basics

# Cumulative CPU seconds consumed (a counter; wrap in rate() for current usage)
container_cpu_usage_seconds_total{namespace="cto"}

# Memory usage
container_memory_usage_bytes{namespace="cto"}

# Rate of requests
rate(http_requests_total{namespace="cto"}[5m])

# Pod restarts
kube_pod_container_status_restarts_total{namespace="cto"}

Common Queries for CTO

# CodeRun pod count
count(kube_pod_info{namespace="cto", pod=~"coderun-.*"})

# Memory by pod
container_memory_usage_bytes{namespace="cto", container!=""}

# CPU by pod
rate(container_cpu_usage_seconds_total{namespace="cto"}[5m])

# OOM killed containers
kube_pod_container_status_last_terminated_reason{namespace="cto", reason="OOMKilled"}

# Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="cto"}) by (pod)

Via MCP Tool

prometheus_query(query='count(kube_pod_info{namespace="cto"})')

Via curl

curl "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=kube_pod_info{namespace="cto"}' | jq
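
An instant query returns a JSON envelope with a `status` field and, for vector results, one sample per series. The sketch below shows one way to turn that into a plain mapping; it assumes the Prometheus port forward on localhost:9090, and `instant_query_url`, `vector_to_dict`, and `query_by_label` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumes the port forward shown earlier


def instant_query_url(query, base=PROMETHEUS_URL):
    """Build the URL for an instant query."""
    params = urlencode({"query": query})
    return f"{base}/api/v1/query?{params}"


def vector_to_dict(response, label="pod"):
    """Map the value of `label` to the sample value for each series
    in an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {labels...}, "value": [unix_ts, "value-as-string"]}
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


def query_by_label(query, label="pod"):
    """Run an instant query and return {label value: sample value}."""
    with urlopen(instant_query_url(query), timeout=10) as resp:
        return vector_to_dict(json.load(resp), label=label)
```

For example, passing the restart-count query from the section above to `query_by_label` would yield one entry per pod, keyed by pod name.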

Grafana (Dashboards)

Access

With the Grafana port forward running, open http://localhost:3000 in a browser.

Common Dashboards

| Dashboard | Purpose |
|---|---|
| Kubernetes / Pods | Pod resource usage |
| Loki / Logs | Log explorer |
| CTO Overview | Platform health (if configured) |

Via MCP Tool

grafana_search_dashboards(query="kubernetes")
grafana_get_dashboard(uid="abc123")

kubectl Alternatives

When MCP tools aren't available, use kubectl directly:

Stream Logs

# All CTO pods
kubectl logs -n cto -l app.kubernetes.io/part-of=cto -f --tail=100

# CodeRun pods only
kubectl logs -n cto -l app=coderun -f

# With grep
kubectl logs -n cto -l app=coderun -f | grep -E "error|mismatch|failed"

Get Pod Status

kubectl get pods -n cto -o wide
kubectl describe pod -n cto <pod-name>

Events

kubectl get events -n cto --sort-by='.lastTimestamp'

Healer Integration

Healer uses Loki to watch for patterns:

// From crates/healer/src/scanner.rs
// Patterns that trigger alerts:
"tool\\s+inventory\\s+mismatch"  // A10
"cto-config.*(missing|invalid)" // A11
"mcp.*failed\\s+to\\s+initialize" // A12

Query Healer-Relevant Logs

# Broad match that covers all Healer detection patterns (A10, A11, A12)
{namespace="cto"} |~ "tool.*mismatch|cto-config.*(missing|invalid)|mcp.*failed"
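
The LogQL query above is deliberately looser than the Healer patterns, so it can return lines that Healer itself would not flag. To see which alert code a given line would map to, the three patterns can be replayed locally. This is a sketch using Python's `re` module rather than the Rust regex engine Healer uses; it assumes case-sensitive matching as the patterns are written, and `classify` is a hypothetical helper name.

```python
import re

# Patterns copied from the scanner excerpt above, keyed by alert code.
HEALER_PATTERNS = {
    "A10": r"tool\s+inventory\s+mismatch",
    "A11": r"cto-config.*(missing|invalid)",
    "A12": r"mcp.*failed\s+to\s+initialize",
}

_COMPILED = {code: re.compile(p) for code, p in HEALER_PATTERNS.items()}


def classify(line):
    """Return the alert codes whose pattern matches anywhere in the line."""
    return [code for code, rx in _COMPILED.items() if rx.search(line)]


if __name__ == "__main__":
    # Illustrative log lines, invented for this example.
    for line in (
        "tool inventory mismatch: expected 12, got 9",
        "mcp server failed to initialize",
        "mcp call failed",  # matched by the broad LogQL query, not by A12
    ):
        print(classify(line), line)
```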

Troubleshooting

No Logs Appearing

  1. Check port forward is running: lsof -i :3100
  2. Verify Loki pods: kubectl get pods -n observability -l app=loki
  3. Check labels: run loki_labels() to see which labels are available

Metrics Not Found

  1. Check port forward: lsof -i :9090
  2. Verify Prometheus: kubectl get pods -n observability -l app=prometheus
  3. List label names with prometheus_labels(), or check scrape target health at http://localhost:9090/targets

Grafana Not Loading

  1. Check port forward: lsof -i :3000
  2. Verify pod: kubectl get pods -n observability -l app.kubernetes.io/name=grafana
