
production-observability
by carmentacollective
name: production-observability
# prettier-ignore
description: Use when adding logging, error monitoring, metrics, Sentry, debugging production issues, or improving observability
version: 1.0.0
Structured Logging
Log with context that helps debug issues. Use structured JSON logging with consistent fields across the application.

    // Include relevant context in every log
    logger.info(
      {
        userId,
        action: 'subscription_created',
        subscriptionType,
        paymentMethod,
        amount,
      },
      'User subscribed successfully'
    );

    logger.error(
      {
        error,
        userId,
        operation: 'process_payment',
        paymentProvider: 'stripe',
        attemptCount,
      },
      'Payment processing failed'
    );

    logger.warn(
      {
        userId,
        resourceType: 'api_request',
        endpoint: '/api/data',
        responseTime: 2500,
        threshold: 1000,
      },
      'Slow API response detected'
    );

    // Create a child logger that carries the request context
    const requestLogger = logger.child({ requestId, userEmail });

    requestLogger.info('Processing request');
    // All subsequent logs include requestId + userEmail automatically
    requestLogger.debug({ queryParams }, 'Executing database query');
    requestLogger.info({ duration }, 'Request completed');

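To make the child-logger idea concrete, here is a minimal sketch of a structured JSON logger written from scratch. It is not the logging library the snippets above rely on; `createLogger`, the `write` sink, and the `service` field are names invented for this illustration, and the field layout (`level`, `time`, `msg`) is an assumption.

```javascript
// Minimal structured logger sketch (hypothetical, for illustration only).
// `write` is injectable so the output can be captured instead of printed.
function createLogger(bindings = {}, write = (line) => console.log(line)) {
  const log = (level) => (fields, message) => {
    // Allow logger.info('message') as well as logger.info({ ... }, 'message').
    if (typeof fields === 'string') {
      message = fields;
      fields = {};
    }
    write(
      JSON.stringify({
        level,
        time: new Date().toISOString(),
        ...bindings, // context inherited from parent loggers
        ...fields, // context for this call only
        msg: message,
      })
    );
  };

  return {
    debug: log('debug'),
    info: log('info'),
    warn: log('warn'),
    error: log('error'),
    // A child logger merges extra bindings into every line it writes.
    child: (extra) => createLogger({ ...bindings, ...extra }, write),
  };
}

const lines = [];
const logger = createLogger({ service: 'billing' }, (line) => lines.push(line));
const requestLogger = logger.child({ requestId: 'req-123', userEmail: 'user@example.com' });

requestLogger.info('Processing request');
requestLogger.info({ duration: 42 }, 'Request completed');

lines.forEach((line) => console.log(line));
```

Every line written through `requestLogger` carries `service`, `requestId`, and `userEmail` without the call site repeating them, which is what keeps field names consistent across the application.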
Error Monitoring
Capture exceptions with rich context that enables debugging. Use Sentry or similar tools to track errors in production.

    try {
      await criticalOperation();
    } catch (error) {
      logger.error({ error, userId, operation }, 'Critical operation failed');

      Sentry.captureException(error, {
        tags: {
          operation: 'payment_processing',
          provider: 'stripe',
        },
        extra: {
          userId,
          subscriptionId,
          attemptCount,
          paymentMethod,
        },
        level: 'error',
      });

      throw error; // Let error bubble to boundary
    }

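Record breadcrumbs as an operation progresses (for example with Sentry.addBreadcrumb). When an error occurs later, the breadcrumbs show the sequence of events that led to it.
Set the user context once per request or session (for example with Sentry.setUser). All subsequent errors then include that user context automatically.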
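One detail worth checking when errors are logged as structured fields: a plain `JSON.stringify` drops an Error's `message`, `stack`, and `cause`, because those properties are not enumerable. Many logging libraries ship their own error serializer; the sketch below shows the idea with a hypothetical `serializeError` helper that is not taken from any particular library.

```javascript
// Hypothetical helper: turn an Error into a plain object that survives JSON.stringify.
function serializeError(error) {
  if (!(error instanceof Error)) {
    return { message: String(error) };
  }
  const serialized = {
    name: error.name,
    message: error.message,
    stack: error.stack,
  };
  // Preserve the causal chain created with `new Error(msg, { cause })`.
  if (error.cause !== undefined) {
    serialized.cause = serializeError(error.cause);
  }
  return serialized;
}

const failure = new Error('Payment processing failed', {
  cause: new Error('Card declined'),
});

// Without a serializer the error collapses to an empty object.
console.log(JSON.stringify({ error: failure })); // {"error":{}}

// With the serializer the message, stack, and cause are kept.
console.log(JSON.stringify({ error: serializeError(failure) }, null, 2));
```

If the logger in use already serializes errors, this helper is unnecessary; the point is to confirm that the error details actually reach the log output.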
Performance Monitoring
Track operation timing to catch performance degradation early.

    // Trace important operations
    const result = await Sentry.startSpan(
      {
        op: 'http.client',
        name: `${method} ${url}`,
      },
      async (span) => {
        span.setAttribute('http.method', method);
        span.setAttribute('http.url', url);
        span.setAttribute('user.id', userId);

        const response = await fetch(url, options);

        span.setAttribute('http.status_code', response.status);
        span.setStatus({ code: 1, message: 'Success' }); // code 1 marks the span as successful

        return response;
      }
    );

    // Warn when a query exceeds the 1000 ms threshold
    if (queryDuration > 1000) {
      logger.warn(
        {
          query: sanitizedQuery,
          duration: queryDuration,
          threshold: 1000,
        },
        'Slow query detected'
      );
    }

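The slow-query check above needs a measured duration. One possible way to obtain it is a small timer helper, sketched below. `startTimer` and its options are hypothetical names, and the clock is injectable only so the example can run deterministically.

```javascript
// Hypothetical helper: start a timer and classify the operation when it stops.
// `now` is injectable so the sketch can be exercised with a fake clock.
function startTimer({ thresholdMs = 1000, now = () => performance.now() } = {}) {
  const startedAt = now();
  return {
    stop() {
      const durationMs = now() - startedAt;
      return { durationMs, thresholdMs, slow: durationMs > thresholdMs };
    },
  };
}

// Usage around an async operation (`runQuery` and `logger` are assumed to exist):
//   const timer = startTimer({ thresholdMs: 1000 });
//   const rows = await runQuery(sanitizedQuery);
//   const timing = timer.stop();
//   if (timing.slow) logger.warn({ query: sanitizedQuery, ...timing }, 'Slow query detected');

// Deterministic demonstration with a fake clock that advances 2500 ms per reading.
let fakeTime = 0;
const fakeNow = () => (fakeTime += 2500);
const timing = startTimer({ thresholdMs: 1000, now: fakeNow }).stop();
console.log(timing); // { durationMs: 2500, thresholdMs: 1000, slow: true }
```

Returning the threshold alongside the duration keeps the log entry self-describing, so a reader does not have to look up which limit was in force.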
Health Checks
Expose endpoints that verify system health and dependencies.

    // Inside the health check handler. `checks` holds the Promise.allSettled()
    // results for the database, Redis, and external API probes, in that order.
    const results = {
      status: checks.every(c => c.status === 'fulfilled') ? 'healthy' : 'degraded',
      timestamp: new Date().toISOString(),
      checks: {
        database: checks[0].status === 'fulfilled' ? 'ok' : 'failed',
        redis: checks[1].status === 'fulfilled' ? 'ok' : 'failed',
        externalAPI: checks[2].status === 'fulfilled' ? 'ok' : 'failed',
      },
    };

    logger.info({ health: results }, 'Health check completed');

    // Return 503 so load balancers and monitors can detect the degraded state
    return Response.json(results, {
      status: results.status === 'healthy' ? 200 : 503,
    });

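The excerpt above starts after the probes have already run. A self-contained sketch of the whole flow might look like the following. The probe functions are placeholders invented for this example (a real probe would ping the database or Redis), and `buildHealthReport` and `runHealthChecks` are hypothetical names.

```javascript
// Hypothetical probes; real ones would ping the database, Redis, and so on.
const probes = {
  database: async () => 'connected',
  redis: async () => {
    throw new Error('connection refused');
  },
};

// Turn Promise.allSettled() results into a report plus an HTTP status code.
function buildHealthReport(names, settled) {
  const checks = {};
  names.forEach((name, i) => {
    checks[name] = settled[i].status === 'fulfilled' ? 'ok' : 'failed';
  });
  const healthy = settled.every((result) => result.status === 'fulfilled');
  return {
    httpStatus: healthy ? 200 : 503,
    body: {
      status: healthy ? 'healthy' : 'degraded',
      timestamp: new Date().toISOString(),
      checks,
    },
  };
}

async function runHealthChecks(probeMap) {
  const names = Object.keys(probeMap);
  // allSettled never rejects, so one failing probe cannot hide the others.
  const settled = await Promise.allSettled(names.map((name) => probeMap[name]()));
  return buildHealthReport(names, settled);
}

runHealthChecks(probes).then((report) => {
  console.log(report.httpStatus, report.body.status, report.body.checks);
});
```

Keying the report by probe name rather than by array position means adding or reordering probes cannot silently mislabel a result.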
Metrics and Monitoring
Track key business and system metrics that indicate application health.

    // Track business metrics
    Sentry.metrics.increment('user.signup', 1, {
      tags: { source: 'google_oauth', plan: 'free' },
    });

    // Track the distribution of response times
    Sentry.metrics.distribution('api.response_time', responseTime, {
      tags: { endpoint: '/api/chat', method: 'POST' },
      unit: 'millisecond',
    });

    // Track a value that rises and falls over time
    Sentry.metrics.gauge('active_connections', connectionCount, {
      tags: { service: 'websocket' },
    });

Error Boundaries
Let errors bubble to boundaries where they can be handled appropriately. Don't silently catch and hide errors.
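
    // Inside the catch (error) block of the route handler's top-level try/catch
    Sentry.captureException(error, {
      tags: { endpoint: '/api/process' },
    });

    // Known error types map to specific client responses
    if (error instanceof ValidationError) {
      return Response.json({ error: error.message }, { status: 400 });
    }

    if (error instanceof NotFoundError) {
      return Response.json({ error: error.message }, { status: 404 });
    }

    // Anything else is unexpected: report it, but do not leak details
    return Response.json({ error: 'Internal server error' }, { status: 500 });
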
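The mapping above can be pulled into a reusable boundary so that each route does not repeat it. The sketch below is one possible shape, under stated assumptions: the two error classes are defined locally to mirror the names used above, `withErrorBoundary` and `toErrorResponse` are hypothetical helpers, and `report` stands in for an error tracker call such as `Sentry.captureException`.

```javascript
// Hypothetical application error types, mirroring the names used above.
class ValidationError extends Error {}
class NotFoundError extends Error {}

// Map any thrown value to the status and body the client should see.
function toErrorResponse(error) {
  if (error instanceof ValidationError) {
    return { status: 400, body: { error: error.message } };
  }
  if (error instanceof NotFoundError) {
    return { status: 404, body: { error: error.message } };
  }
  // Unknown failures: never leak internal details to the client.
  return { status: 500, body: { error: 'Internal server error' } };
}

// Wrap a handler so every route shares one boundary.
// `report` stands in for an error tracker call.
function withErrorBoundary(handler, report = console.error) {
  return async (request) => {
    try {
      return await handler(request);
    } catch (error) {
      report(error);
      return toErrorResponse(error);
    }
  };
}

const reported = [];
const route = withErrorBoundary(
  async () => {
    throw new NotFoundError('Order not found');
  },
  (error) => reported.push(error)
);

route({}).then((response) => console.log(response));
```

Handlers wrapped this way can simply throw; the boundary reports the error once and decides what the client sees.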
Correlation IDs
Track requests across service boundaries using correlation IDs.

    // `correlationId` is taken from the incoming request when one is present
    // (for example an X-Correlation-ID header) and generated otherwise.
    const requestLogger = logger.child({ correlationId });

    // Pass correlation ID to downstream services
    const response = await fetch(upstreamService, {
      headers: {
        'X-Correlation-ID': correlationId,
      },
    });

    // All logs include correlationId, making distributed tracing possible
    requestLogger.info({ service: 'upstream' }, 'Called upstream service');

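The snippet above uses a `correlationId` without showing where it comes from. A common approach, sketched here as an assumption rather than as the original implementation, is to reuse the caller's ID when the request carries one and to generate a new one otherwise. `resolveCorrelationId` and `withCorrelationId` are hypothetical helpers, and the sketch assumes headers arrive as a plain object with lower-cased keys.

```javascript
const CORRELATION_HEADER = 'x-correlation-id';

// Reuse the caller's correlation ID when present; otherwise start a new one.
// `generateId` is injectable; the default relies on the global Web Crypto API.
function resolveCorrelationId(headers, generateId = () => crypto.randomUUID()) {
  const incoming = headers[CORRELATION_HEADER];
  return typeof incoming === 'string' && incoming.trim() !== ''
    ? incoming.trim()
    : generateId();
}

// Build headers for a downstream call so the same ID keeps flowing.
function withCorrelationId(headers, correlationId) {
  return { ...headers, 'X-Correlation-ID': correlationId };
}

const fromUpstream = resolveCorrelationId({ 'x-correlation-id': 'abc-123' });
const fresh = resolveCorrelationId({}, () => 'generated-id');

console.log(fromUpstream); // abc-123
console.log(fresh); // generated-id
console.log(withCorrelationId({ accept: 'application/json' }, fromUpstream));
```

Reusing an incoming ID instead of always generating a new one is what lets a single request be followed across every service it touches.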
Monitor these signals to catch production issues early:
Error rates - Track 4xx and 5xx errors. Sudden spikes indicate problems.
Response times - P50, P95, P99 latencies. Degradation affects user experience.
Resource usage - CPU, memory, disk, network. Exhaustion causes failures.
External dependencies - API availability, database connections, third-party services.
Business metrics - User signups, purchases, key user actions. Drops indicate broken flows.
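To illustrate why P50, P95, and P99 are tracked rather than a single average, here is a small sketch that computes percentiles with the nearest-rank method. The sample values are invented, and monitoring backends may use other definitions (for example interpolation), so treat the exact numbers as illustrative.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds, with one slow outlier.
const responseTimes = [120, 95, 110, 2400, 130, 105, 98, 115, 125, 101];

const summary = {
  p50: percentile(responseTimes, 50),
  p95: percentile(responseTimes, 95),
  p99: percentile(responseTimes, 99),
};
console.log(summary); // { p50: 110, p95: 2400, p99: 2400 }
```

The median looks healthy at 110 ms, while the tail percentiles expose the 2400 ms request that some users actually experienced.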
Before deploying code to production, verify:
Structured logging covers critical paths - Can you debug issues from logs alone?
Errors are captured to Sentry - Will you know when things break?
Performance is tracked - Can you identify slow operations?
Health checks are implemented - Can monitoring detect degraded state?
Correlation IDs flow through requests - Can you trace requests across services?
Alerts are configured - Will someone be notified when thresholds are exceeded?
When investigating production problems:
Start with error tracking - Check Sentry for exceptions and error patterns.
Review structured logs - Filter by user, request, or correlation ID to trace execution.
Check metrics - Look for anomalies in response times, error rates, resource usage.
Verify external dependencies - Confirm third-party services are operational.
Reproduce locally - Use production logs to recreate the scenario.
Add instrumentation - If debugging is difficult, add more logging and redeploy.
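The log-review step above depends on being able to filter structured logs by a field such as the correlation ID. Log platforms normally provide this as a query; the sketch below shows the same idea over newline-delimited JSON, using hypothetical helper names and invented log lines.

```javascript
// Parse newline-delimited JSON logs, skipping lines that are not valid JSON.
function parseLogLines(text) {
  const entries = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Ignore non-JSON noise such as stack traces printed by other tools.
    }
  }
  return entries;
}

// Keep entries whose fields match every key/value pair in `criteria`.
function filterLogs(entries, criteria) {
  return entries.filter((entry) =>
    Object.entries(criteria).every(([key, value]) => entry[key] === value)
  );
}

// Hypothetical log output from two interleaved requests.
const rawLogs = [
  '{"level":"info","correlationId":"abc-123","msg":"Processing request"}',
  '{"level":"info","correlationId":"def-456","msg":"Processing request"}',
  'not json',
  '{"level":"error","correlationId":"abc-123","msg":"Payment processing failed"}',
].join('\n');

const trace = filterLogs(parseLogLines(rawLogs), { correlationId: 'abc-123' });
console.log(trace.map((entry) => entry.msg));
// [ 'Processing request', 'Payment processing failed' ]
```

This only works when every service writes the same field name, which is why consistent field names matter more than the choice of tool.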
Treat observability as a core feature, not an afterthought:
Log before releasing - Instrumentation is harder to add after deployment.
Monitor what matters - Focus on user-impacting metrics, not vanity numbers.
Make logs searchable - Use consistent field names and structured data.
Review errors regularly - Sentry notifications should trigger investigation.
Celebrate transparency - Visible problems get fixed. Hidden problems accumulate.


