
context-optimization
by 5dlabs
Cognitive Task Orchestrator - GitOps on Bare Metal or Cloud for AI Agents
SKILL.md
name: context-optimization
description: Extend effective context capacity through compression, masking, caching, and partitioning techniques.
agents: [blaze, rex, nova, tap, spark, grizz, bolt, cleo, cipher, tess, morgan, atlas, stitch]
triggers: [optimize context, reduce tokens, token costs, context limits, observation masking, context budgeting]
Context Optimization Techniques
Context optimization extends effective context capacity through strategic compression, masking, caching, and partitioning. Applied well, these techniques can double or triple the amount of context a task can usefully draw on.
When to Activate
- Context limits constrain task complexity
- Optimizing for cost reduction (fewer tokens = lower costs)
- Reducing latency for long conversations
- Building production systems at scale
Core Strategies
Compaction
Summarize the context contents when approaching limits, then reinitialize the context with the summary (a minimal sketch follows the lists below).
Priority for compression:
- Tool outputs → replace with summaries
- Old turns → summarize early conversation
- Retrieved docs → summarize if recent versions exist
- Never compress system prompt
Summary preservation by type:
- Tool outputs: Key findings, metrics, conclusions
- Conversations: Key decisions, commitments, context shifts
- Documents: Key facts and claims
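A minimal compaction sketch in Python, assuming OpenAI-style message dictionaries and a caller-supplied `summarize` function (both are illustrative assumptions, not part of this skill):

```python
from typing import Callable

def compact(messages: list[dict], summarize: Callable[[str], str],
            keep_recent: int = 4) -> list[dict]:
    """Reinitialize the context as: system prompt + summary of older turns + recent turns."""
    system = [m for m in messages if m["role"] == "system"]      # never compress the system prompt
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return messages                                          # nothing old enough to compact yet
    summary = summarize("\n\n".join(m["content"] for m in old))  # preserve key decisions, findings, metrics
    return system + [{"role": "user",
                      "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```

The system prompt is filtered out before summarization so it is never compressed, matching the priority list above.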
Observation Masking
Tool outputs can comprise 80%+ of token usage. Replace verbose outputs with compact references once their purpose is served.
Masking Strategy:
| Category | Action |
|---|---|
| Never mask | Current task observations, most recent turn, active reasoning |
| Consider masking | 3+ turns ago, verbose outputs with extractable key points |
| Always mask | Repeated outputs, boilerplate, already summarized |
Example:
```python
def mask_observation(observation: str, max_length: int) -> str:
    if len(observation) > max_length:
        ref_id = store_observation(observation)   # persist the full text for on-demand retrieval
        return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
    return observation
```
KV-Cache Optimization
Reuse cached computations across requests with identical prefixes.
Cache-friendly ordering:
- System prompt (stable, first)
- Tool definitions (stable)
- Frequently reused elements
- Unique content (last)
Design tips:
- Avoid dynamic content like timestamps
- Use consistent formatting
- Keep structure stable across sessions
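A sketch of cache-friendly prompt assembly, assuming plain-string prompts and provider-side prefix caching; the constants and the `build_prompt` helper are illustrative placeholders, not part of this skill:

```python
# Placeholder contents; in practice these come from your agent configuration.
SYSTEM_PROMPT = "You are ..."
TOOL_DEFINITIONS = "..."       # serialized with a fixed key order so the bytes never change
FEW_SHOT_EXAMPLES = "..."

def build_prompt(task_context: str, user_message: str) -> str:
    """Stable, cacheable prefix first; unique per-request content last."""
    stable_prefix = "\n\n".join([SYSTEM_PROMPT, TOOL_DEFINITIONS, FEW_SHOT_EXAMPLES])
    # No timestamps, request IDs, or randomized ordering in the prefix: any byte
    # change invalidates the cached KV entries for everything that follows it.
    return f"{stable_prefix}\n\n{task_context}\n\n{user_message}"
```

Because cache reuse requires the prefix to match exactly, even a single timestamp near the top of the prompt invalidates the cache for everything after it.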
Context Partitioning
Split work across sub-agents with isolated contexts so that each one operates in a clean context focused on its subtask (a sketch follows the aggregation pattern below).
Aggregation pattern:
- Validate all partitions completed
- Merge compatible results
- Summarize if still too large
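A hypothetical sketch of the partition-and-aggregate flow; `run_subagent`, the result schema, and the size threshold are placeholder assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_subagent(subtask: str) -> dict:
    """Placeholder: invoke a sub-agent whose context contains only this subtask."""
    raise NotImplementedError

def partition_and_aggregate(subtasks: list[str], summarize: Callable[[str], str],
                            max_chars: int = 8_000) -> str:
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_subagent, subtasks))
    # Validate that all partitions completed before merging.
    if any(r.get("status") != "ok" for r in results):
        raise RuntimeError("a partition failed; retry or re-plan before aggregating")
    merged = "\n".join(r["output"] for r in results)                 # merge compatible results
    return summarize(merged) if len(merged) > max_chars else merged  # summarize if still too large
```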
Budget Management
Design explicit token budgets:
- System prompt: X tokens
- Tool definitions: Y tokens
- Retrieved docs: Z tokens
- Message history: W tokens
- Reserved buffer: 10-20%
Trigger optimization when:
- Token utilization > 70%
- Response quality degrades
- Costs increase due to long contexts
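A sketch combining the budget and trigger checks above; the budget numbers are placeholders rather than recommendations, and `count_tokens` stands in for whatever tokenizer your stack provides:

```python
from typing import Callable

# Placeholder per-section budgets: tune for your model and context window.
BUDGET = {
    "system_prompt": 2_000,
    "tool_definitions": 3_000,
    "retrieved_docs": 8_000,
    "message_history": 15_000,
}
RESERVED_BUFFER = 0.15  # keep 10-20% of the window free for the response

def should_optimize(sections: dict[str, str], context_window: int,
                    count_tokens: Callable[[str], int]) -> bool:
    """True when any per-section budget is exceeded or utilization passes 70%."""
    used = sum(count_tokens(text) for text in sections.values())
    usable = context_window * (1 - RESERVED_BUFFER)
    over_budget = any(count_tokens(sections.get(name, "")) > limit
                      for name, limit in BUDGET.items())
    return over_budget or used > 0.7 * usable
```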
Decision Framework
| Dominant component | Apply |
|---|---|
| Tool outputs | Observation masking |
| Retrieved documents | Summarization or partitioning |
| Message history | Compaction with summarization |
| Multiple | Combine strategies |
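The table above can be read as a small dispatch rule. A sketch, using a 40% share as an assumed threshold for "dominant":

```python
def choose_strategies(token_counts: dict[str, int], dominance: float = 0.4) -> list[str]:
    """Map each dominant context component to the strategy from the table above."""
    total = sum(token_counts.values()) or 1
    mapping = {
        "tool_outputs": "observation masking",
        "retrieved_documents": "summarization or partitioning",
        "message_history": "compaction with summarization",
    }
    chosen = [strategy for name, strategy in mapping.items()
              if token_counts.get(name, 0) / total > dominance]
    return chosen or list(mapping.values())   # no single dominant component: combine strategies
```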
Performance Targets
- Compaction: 50-70% reduction, <5% quality degradation
- Masking: 60-80% reduction in masked observations
- Cache optimization: 70%+ hit rate for stable workloads
Guidelines
- Measure before optimizing—know current state
- Apply compaction before masking when possible
- Design for cache stability with consistent prompts
- Partition before context becomes problematic
- Balance token savings against quality preservation