
prompt-caching
by yonatangross
The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.
SKILL.md
name: prompt-caching
description: Provider-native prompt caching for Claude and OpenAI. Use when optimizing LLM costs with cache breakpoints, caching system prompts, or reducing token costs for repeated prefixes.
tags: [llm, caching, cost-optimization, anthropic]
context: fork
agent: llm-integrator
version: 1.0.0
author: OrchestKit
user-invocable: false
Prompt Caching
Cache LLM prompt prefixes for up to 90% savings on repeated input tokens.
Supported Models (2026)
| Provider | Models |
|---|---|
| Claude | Opus 4.1, Opus 4, Sonnet 4.5, Sonnet 4, Sonnet 3.7, Haiku 4.5, Haiku 3.5, Haiku 3 |
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini (automatic caching) |
Claude Prompt Caching
```python
def build_cached_messages(
    system_prompt: str,
    few_shot_examples: str | None,
    user_content: str,
    use_extended_cache: bool = False,
) -> list[dict]:
    """Build messages with cache breakpoints.

    Cache structure (processing order: tools → system → messages):
      1. System prompt (cached)
      2. Few-shot examples (cached)
      ─────── CACHE BREAKPOINT ───────
      3. User content (NOT cached)
    """
    # TTL: "5m" (default, 1.25x write cost) or "1h" (extended, 2x write cost)
    ttl = "1h" if use_extended_cache else "5m"

    content_parts = []

    # Breakpoint 1: System prompt
    content_parts.append({
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    })

    # Breakpoint 2: Few-shot examples (up to 4 breakpoints allowed)
    if few_shot_examples:
        content_parts.append({
            "type": "text",
            "text": few_shot_examples,
            "cache_control": {"type": "ephemeral", "ttl": ttl},
        })

    # Dynamic content (NOT cached)
    content_parts.append({
        "type": "text",
        "text": user_content,
    })

    return [{"role": "user", "content": content_parts}]
```
Cache Pricing (2026)
┌─────────────────────────────────────────────────────────────┐
│ Cache Cost Multipliers (relative to base input price) │
├─────────────────────────────────────────────────────────────┤
│ 5-minute cache write: 1.25x base input price │
│ 1-hour cache write: 2.00x base input price │
│ Cache read: 0.10x base input price (90% off!) │
└─────────────────────────────────────────────────────────────┘
Example: Claude Sonnet 4 @ $3/MTok input
Without Prompt Caching:
System prompt: 2,000 tokens @ $3/MTok = $0.006
Few-shot examples: 5,000 tokens @ $3/MTok = $0.015
User content: 10,000 tokens @ $3/MTok = $0.030
───────────────────────────────────────────────────
Total: 17,000 tokens = $0.051
With 5m Caching (first request = cache write):
Cached prefix: 7,000 tokens @ $3.75/MTok = $0.02625 (1.25x)
User content: 10,000 tokens @ $3/MTok = $0.03000
Total first req: = $0.05625
With 5m Caching (subsequent = cache read):
Cached prefix: 7,000 tokens @ $0.30/MTok = $0.0021 (0.1x)
User content: 10,000 tokens @ $3/MTok = $0.0300
Total cached req: = $0.0321
Savings: 37% per cached request, break-even after 2 requests
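The arithmetic above generalizes to a small helper; a sketch with the multipliers from the pricing table (function name and defaults are illustrative):

```python
def caching_cost(prefix_tokens: int, dynamic_tokens: int, requests: int,
                 base_per_mtok: float = 3.00, write_mult: float = 1.25,
                 read_mult: float = 0.10) -> dict[str, float]:
    """Compare total cost with and without prompt caching (illustrative only)."""
    per_tok = base_per_mtok / 1_000_000
    uncached = requests * (prefix_tokens + dynamic_tokens) * per_tok
    cached = (
        prefix_tokens * per_tok * write_mult                     # first request: cache write
        + (requests - 1) * prefix_tokens * per_tok * read_mult   # later requests: cache reads
        + requests * dynamic_tokens * per_tok                    # dynamic content, never cached
    )
    return {"uncached": uncached, "cached": cached, "savings": 1 - cached / uncached}

# Reproduces the Sonnet 4 example: 7k cached prefix + 10k dynamic content
print(caching_cost(prefix_tokens=7_000, dynamic_tokens=10_000, requests=10))
```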
Extended Cache (1-hour TTL)
Use 1-hour cache when:
- Prompt reused > 10 times per hour
- System prompts are highly stable
- Token count > 10k (maximize savings)
```python
# Extended cache: 2x write cost but persists 12x longer
"cache_control": {"type": "ephemeral", "ttl": "1h"}

# Break-even vs. no caching: ~1x extra write cost ÷ 0.9x saved per read ≈ 2 reads
# within the hour; the 2x premium over a 5m write also pays off as soon as it
# avoids a single 5m cache expiry and re-write.
```
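A hypothetical helper for choosing the TTL from the guidelines above (name and thresholds are illustrative):

```python
def pick_cache_ttl(expected_reads_per_hour: float, prefix_tokens: int) -> str:
    """Hypothetical TTL selector based on the guidelines above."""
    if expected_reads_per_hour > 10 and prefix_tokens > 10_000:
        return "1h"  # stable, heavily reused prefix: extended cache
    return "5m"      # default: cheaper write, TTL refreshes on each hit
```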
OpenAI Automatic Caching
```python
# OpenAI caches prompt prefixes (>= 1,024 tokens) automatically;
# no cache_control markers are needed, just keep the stable prefix first.
from openai import AsyncOpenAI

client = AsyncOpenAI()

response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},  # cached prefix
        {"role": "user", "content": user_content},     # variable, not cached
    ],
)

# Check cache usage in the response
cached_tokens = response.usage.prompt_tokens_details.cached_tokens
```
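A sketch for turning those usage details into a hit rate (field names per the current OpenAI SDK; treat as an assumption):

```python
# Fraction of prompt tokens served from cache; zero for prompts under
# the 1,024-token automatic-caching threshold.
details = response.usage.prompt_tokens_details
if details and response.usage.prompt_tokens:
    hit_rate = (details.cached_tokens or 0) / response.usage.prompt_tokens
    print(f"OpenAI cache hit rate: {hit_rate:.1%}")
```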
Cache Processing Order
The cache covers the prompt prefix in this order:
1. Tools (cached first)
2. System messages (cached second)
3. User messages (cached last)
⚠️ Changing extended-thinking settings invalidates message caches
(but NOT the tools/system caches)
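Because tools are hashed into the prefix first, a breakpoint on the system block also covers the tool definitions ahead of it. A sketch using the Anthropic Messages API (assumes the client and LONG_SYSTEM_PROMPT from the earlier sketch; tool schema abbreviated, model id is a placeholder):

```python
response = await client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    tools=[{
        "name": "search_docs",
        "description": "Search internal documentation.",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}}},
    }],
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,                # stable, >1,024 tokens
        "cache_control": {"type": "ephemeral"},    # caches tools + system
    }],
    messages=[{"role": "user", "content": user_content}],
)
```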
Monitoring Cache Effectiveness
```python
# Track these fields in the API response
response = await client.messages.create(...)

cache_created = response.usage.cache_creation_input_tokens  # new cache write
cache_read = response.usage.cache_read_input_tokens         # cache hit
regular = response.usage.input_tokens                       # not cached

# Calculate cache hit rate
if cache_created + cache_read > 0:
    hit_rate = cache_read / (cache_created + cache_read)
    print(f"Cache hit rate: {hit_rate:.1%}")
```
Best Practices
```python
# ✅ Good: long, stable prefix first
messages = [
    {"role": "system", "content": LONG_SYSTEM_PROMPT},
    {"role": "user", "content": FEW_SHOT_EXAMPLES},
    {"role": "user", "content": user_input},  # variable content last
]

# ❌ Bad: variable content early (breaks the cache for everything after it)
messages = [
    {"role": "user", "content": user_input},  # breaks cache!
    {"role": "system", "content": LONG_SYSTEM_PROMPT},
]
```
Key Decisions
| Decision | Recommendation |
|---|---|
| Min prefix size | 1,024 tokens (Claude) |
| Breakpoint count | 2-4 per request |
| Content order | Stable prefix first |
| Default TTL | 5m for most cases |
| Extended TTL | 1h if >10 reads/hour |
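A hypothetical pre-flight check that encodes these decisions (the token count is a rough character-based estimate, not a tokenizer):

```python
def check_cacheable_prefix(prefix_text: str, breakpoints: int) -> list[str]:
    """Hypothetical pre-flight check against the decisions above."""
    warnings = []
    approx_tokens = len(prefix_text) // 4  # rough chars-per-token estimate
    if approx_tokens < 1_024:
        warnings.append("Prefix likely under 1,024 tokens; it won't be cached.")
    if breakpoints > 4:
        warnings.append("The API accepts at most 4 cache breakpoints per request.")
    return warnings
```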
Common Mistakes
- Variable content before cached prefix
- Too many breakpoints (overhead)
- Prefix too short (min 1024 tokens)
- Not checking cache_read_input_tokens
- Using 1h TTL for infrequent calls (wastes the 2x write cost)
Related Skills
- semantic-caching - Redis similarity caching
- cache-cost-tracking - Cost monitoring
- llm-streaming - Streaming with caching
Capability Details
anthropic-caching
Keywords: anthropic, claude, cache_control, ephemeral
Solves:
- Use Anthropic prompt caching
- Set cache breakpoints
- Reduce API costs
openai-caching
Keywords: openai, gpt, cached_tokens, automatic
Solves:
- Use OpenAI prompt caching
- Structure prompts for cache hits
- Monitor cache effectiveness
wrapper-template
Keywords: wrapper, template, implementation, python
Solves:
- Prompt cache wrapper template
- Python implementation
- Drop-in caching layer
