
# model-deployment

by secondsky

Production-ready skills for Claude Code CLI: Cloudflare, React, Tailwind v4, and AI integrations

## SKILL.md

```yaml
name: model-deployment
description: >-
  Deploy ML models with FastAPI, Docker, and Kubernetes. Use for serving
  predictions, containerization, monitoring, and drift detection, or when
  encountering latency issues, health check failures, or version conflicts.
keywords: model deployment, FastAPI, Docker, Kubernetes, ML serving, model monitoring, drift detection, A/B testing, CI/CD, mlops, production ml, model versioning, health checks, Prometheus, containerization, rolling updates, blue-green deployment, canary deployment, model registry
license: MIT
```
# ML Model Deployment
Deploy trained models to production with proper serving and monitoring.
## Deployment Options
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
## FastAPI Model Server
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    return PredictionResponse(prediction=prediction, probability=probability)
```
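To smoke-test the server locally, a minimal sketch (assumes the app above is saved as `app.py` and `model.pkl` is a scikit-learn classifier trained on four features; adjust the feature vector to your model):

```bash
# Start the server
uvicorn app:app --port 8000

# In another shell: check health, then request a prediction
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
```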
## Docker Deployment
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
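The Dockerfile expects a `requirements.txt` alongside it; a minimal sketch (the pins are illustrative; match scikit-learn to the version used at training time so the pickle loads cleanly):

```text
fastapi==0.110.0
uvicorn[standard]==0.29.0
scikit-learn==1.4.2
joblib==1.4.0
numpy==1.26.4
```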
## Model Monitoring
```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })
        self.latencies.append(latency)

    def detect_drift(self, reference_distribution):
        # Compare current predictions to the reference (see sketch below)
        pass
```
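A minimal sketch of `detect_drift`, assuming numeric predictions and using SciPy's two-sample Kolmogorov-Smirnov test (the KS approach matches the monitoring reference; the sample-size guard and significance level are illustrative choices):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(self, reference_distribution, alpha=0.05):
    """Return True if recent predictions diverge from the reference."""
    current = np.array([p['prediction'] for p in self.predictions])
    if len(current) < 30:  # too few samples for a meaningful test
        return False
    statistic, p_value = ks_2samp(reference_distribution, current)
    return p_value < alpha
```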
## Deployment Checklist
- Model validated on test set
- API endpoints documented
- Health check endpoint
- Authentication configured
- Logging and monitoring setup
- Model versioning in place
- Rollback procedure documented
## Quick Start: Deploy Model in 6 Steps
```python
# 1. Save trained model
import joblib
joblib.dump(model, 'model.pkl')
```

```bash
# 2. Create FastAPI app (see references/fastapi-production-server.md)
#    app.py with /predict and /health endpoints

# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```
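Step 6 assumes a `deployment.yaml`; a minimal sketch (names, replica count, and registry URL are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.0.0
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: model-api
spec:
  selector:
    app: model-api
  ports:
    - port: 80
      targetPort: 8000
```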
## Known Issues Prevention
### 1. No Health Checks = Downtime
**Problem:** Load balancer sends traffic to unhealthy pods, causing 503 errors.

**Solution:** Implement both liveness and readiness probes:
```python
# app.py
from fastapi import HTTPException

@app.get("/health")  # Liveness: is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```
```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```
### 2. Model Not Found Errors in Container
**Problem:** `FileNotFoundError: model.pkl` when the container starts.

**Solution:** Verify the model file is copied in the Dockerfile and the path matches:
```dockerfile
# ❌ Wrong: model copied to one directory while the code expects another
COPY model.pkl /app/models/   # but the code loads /app/model.pkl

# ✅ Correct: one consistent, configurable path
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

```python
# In Python, read the same path the image sets
import os

model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```
### 3. Unhandled Input Validation = 500 Errors
**Problem:** Invalid inputs crash the API with unhandled exceptions.

**Solution:** Use Pydantic for automatic validation:
```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator  # Pydantic v1-style API

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # Request is guaranteed valid here
    pass
```
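For example, a request whose features parse to a non-finite value is rejected before the handler runs (a sketch; the exact error body comes from FastAPI/Pydantic and varies by version):

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": ["NaN", 1.0]}'
# -> HTTP 422 Unprocessable Entity with a JSON "detail" explaining the failure
```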
### 4. No Drift Monitoring = Silent Degradation
**Problem:** Model performance degrades over time and no one notices until users complain.

**Solution:** Implement drift detection (see references/model-monitoring-drift.md):
```python
import time

# This richer ModelMonitor (reference data, threshold, should_retrain) and
# alert_manager come from references/model-monitoring-drift.md
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    start = time.perf_counter()
    prediction = model.predict(features)
    latency = time.perf_counter() - start
    monitor.log_prediction(features, prediction, latency)

    # Alert if drift detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")
    return prediction
```
### 5. Missing Resource Limits = OOM Kills
**Problem:** Pod is killed by the Kubernetes OOM killer and the service goes down.

**Solution:** Set memory/CPU limits and requests:
```yaml
resources:
  requests:
    memory: "512Mi"   # guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"     # maximum allowed
    cpu: "1000m"
```

```bash
# Monitor actual usage:
kubectl top pods
```
### 6. No Rollback Plan = Stuck on Bad Deploy
**Problem:** A new model version has bugs and there is no way to revert quickly.

**Solution:** Tag images with versions and keep the previous deployment:
```bash
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If issues arise, roll back to the previous revision
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```
### 7. Synchronous Prediction = Slow Batch Processing
**Problem:** Processing 10,000 predictions one-by-one takes hours.

**Solution:** Implement a batch endpoint:
```python
class BatchPredictionRequest(BaseModel):
    instances: list[list[float]]  # one feature row per instance

@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process all rows at once (vectorized) instead of one request each
    features = np.array(request.instances)
    predictions = model.predict(features)  # much faster than a Python loop
    return {"predictions": predictions.tolist()}
```
### 8. No CI/CD Validation = Deploy Bad Models
**Problem:** Deploying a model that fails basic tests breaks production.

**Solution:** Validate in the CI pipeline (see references/cicd-ml-models.md):
```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # fail the job if below threshold
```
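A sketch of what `scripts/validate_model.py` could look like (the script path and flags come from the step above; the accuracy metric, `label` column, and exit-code convention are assumptions):

```python
import argparse
import sys

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--test-data", required=True)
    parser.add_argument("--min-accuracy", type=float, required=True)
    args = parser.parse_args()

    model = joblib.load(args.model)
    df = pd.read_csv(args.test_data)  # assumes a 'label' target column
    X, y = df.drop(columns=["label"]), df["label"]

    accuracy = accuracy_score(y, model.predict(X))
    print(f"accuracy={accuracy:.4f} (threshold={args.min_accuracy})")
    sys.exit(0 if accuracy >= args.min_accuracy else 1)  # non-zero fails CI

if __name__ == "__main__":
    main()
```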
## Best Practices
- **Version everything**: models (semantic versioning), Docker images, deployments
- **Monitor continuously**: latency, error rate, drift, resource usage
- **Test before deploy**: unit tests, integration tests, performance benchmarks
- **Deploy gradually**: canary (e.g. 10% of traffic), then full rollout; see the sketch after this list
- **Plan for rollback**: keep the previous version and document the procedure
- **Log predictions**: enables debugging and drift detection
- **Set resource limits**: prevent OOM kills and resource contention
- **Use health checks**: enable proper load balancing
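A minimal sketch of the canary pattern with plain Kubernetes objects (labels, names, and the replica-ratio traffic split are illustrative; an ingress controller or service mesh gives precise percentages):

```yaml
# A stable Deployment (not shown) runs 9 replicas of v1.1.0 with the
# labels {app: model-api, track: stable}. The canary below runs 1 replica
# of v1.2.0, so the shared Service spreads traffic roughly 90/10.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api-canary
spec:
  replicas: 1
  selector:
    matchLabels: {app: model-api, track: canary}
  template:
    metadata:
      labels: {app: model-api, track: canary}
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.2.0
---
apiVersion: v1
kind: Service
metadata:
  name: model-api
spec:
  selector:
    app: model-api   # matches both stable and canary pods
  ports:
    - port: 80
      targetPort: 8000
```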
## When to Load References
Load reference files for detailed implementations:

- **FastAPI Production Server**: Load `references/fastapi-production-server.md` for a complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
- **Model Monitoring & Drift**: Load `references/model-monitoring-drift.md` for a ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints
- **Containerization & Deployment**: Load `references/containerization-deployment.md` for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
- **CI/CD for ML Models**: Load `references/cicd-ml-models.md` for a complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies
