
hyde-retrieval
by yonatangross
The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.
⭐ 29 · 🍴 4 · 📅 Jan 23, 2026
SKILL.md
name: hyde-retrieval
description: HyDE (Hypothetical Document Embeddings) for improved semantic retrieval. Use when queries don't match document vocabulary, retrieval quality is poor, or implementing advanced RAG patterns.
tags: [rag, retrieval, hyde, semantic-search]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
HyDE (Hypothetical Document Embeddings)
Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.
The Problem
Direct query embedding often fails due to vocabulary mismatch:
Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance
The Solution
Instead of embedding the query, generate a hypothetical answer document:
Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology
Implementation
from openai import AsyncOpenAI
from pydantic import BaseModel


class HyDEResult(BaseModel):
    """Result of HyDE generation."""

    original_query: str
    hypothetical_doc: str
    embedding: list[float]


async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: callable,
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate hypothetical document and embed it."""
    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query},
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )
    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
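A minimal usage sketch, assuming OpenAI's embeddings endpoint as the embed_fn; the text-embedding-3-small model and the embed_text helper are illustrative choices, not part of the skill, so swap in whatever embedding provider your pipeline already uses.

import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment


async def embed_text(text: str) -> list[float]:
    """Embed a single string with OpenAI's embeddings endpoint."""
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


async def main() -> None:
    result = await generate_hyde(
        "scaling async data pipelines",
        llm=client,
        embed_fn=embed_text,
    )
    print(result.hypothetical_doc)
    print(len(result.embedding))  # 1536 dimensions for text-embedding-3-small

asyncio.run(main())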
With Caching
import hashlib


class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)
        if key in self._cache:
            return self._cache[key]
        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
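The in-memory dict above grows without bound in a long-lived service. One possible refinement, sketched here as an assumption rather than part of the skill, is to cap it with simple LRU eviction:

from collections import OrderedDict


class BoundedHyDECache:
    """LRU-bounded cache for HyDE results (illustrative; the size cap is arbitrary)."""

    def __init__(self, max_size: int = 1024):
        self.max_size = max_size
        self._cache: OrderedDict[str, HyDEResult] = OrderedDict()

    def get(self, key: str) -> HyDEResult | None:
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)  # mark as recently used
        return self._cache[key]

    def put(self, key: str, result: HyDEResult) -> None:
        self._cache[key] = result
        self._cache.move_to_end(key)
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict least recently used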
Per-Concept HyDE (Advanced)
For multi-concept queries, generate HyDE for each concept:
import asyncio


async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
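How the per-concept results are combined is left open above. One common option, sketched here under the assumption of a hypothetical vector_search(embedding, k) coroutine that returns ranked document IDs, is to run one search per embedding and merge the rankings with reciprocal rank fusion:

async def retrieve_multi_concept(
    concepts: list[str],
    hyde_service: HyDEService,
    vector_search,  # hypothetical: async (embedding, k) -> list[str] of doc IDs
    k: int = 10,
    rrf_k: int = 60,  # standard RRF damping constant
) -> list[str]:
    """Search once per concept, then merge rankings with reciprocal rank fusion."""
    hyde_results = await batch_hyde(concepts, hyde_service)
    ranked_lists = await asyncio.gather(
        *[vector_search(r.embedding, k) for r in hyde_results]
    )

    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank + 1)

    return sorted(scores, key=scores.get, reverse=True)[:k]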
When to Use HyDE
| Scenario | Use HyDE? |
|---|---|
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
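The table translates naturally into a routing heuristic. The patterns and word-count threshold below are illustrative assumptions, not part of the original skill:

import re


def should_use_hyde(query: str) -> bool:
    """Heuristically decide whether HyDE is worth the extra LLM call."""
    stripped = query.strip()
    # Code snippets and quoted exact-term searches: skip HyDE, use keyword/direct search
    looks_like_code = bool(re.search(r"[{}();=]|def |class |import ", stripped))
    exact_term_search = stripped.startswith('"') and stripped.endswith('"')
    if looks_like_code or exact_term_search:
        return False
    # Natural-language, conceptual questions benefit most from HyDE
    return len(stripped.split()) >= 3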
Fallback Strategy
import asyncio


async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: callable,
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    try:
        async with asyncio.timeout(timeout):  # requires Python 3.11+
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fallback to direct query embedding
        return await embed_fn(query)
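End to end, the fallback embedding feeds directly into the vector search. A wiring sketch, reusing the should_use_hyde heuristic and the hypothetical vector_search coroutine from earlier:

async def retrieve(
    query: str,
    hyde_service: HyDEService,
    vector_search,  # hypothetical: async (embedding, k) -> list[str] of doc IDs
    k: int = 10,
) -> list[str]:
    """Retrieve documents, applying HyDE only where it is likely to help."""
    if should_use_hyde(query):
        embedding = await hyde_with_fallback(query, hyde_service, hyde_service.embed_fn)
    else:
        embedding = await hyde_service.embed_fn(query)
    return await vector_search(embedding, k)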
Performance Tips
- Use fast model (gpt-4o-mini, claude-3-haiku) for generation
- Cache aggressively (queries often repeat)
- Set tight timeouts (2-3s) with fallback
- Keep hypothetical docs concise (100-200 tokens)
- Combine with query decomposition for best results
Related Skills
- rag-retrieval - Core RAG patterns that HyDE enhances for better retrieval
- embeddings - Embedding models used to embed hypothetical documents
- query-decomposition - Complementary technique for multi-concept queries
- semantic-caching - Cache HyDE results to avoid repeated LLM calls
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Generation model | gpt-4o-mini / claude-3-haiku | Fast and cheap for hypothetical doc generation |
| Temperature | 0.3 | Low temperature for consistent, factual hypothetical docs |
| Max tokens | 100-200 | Concise docs match embedding sweet spot |
| Timeout with fallback | 2-3 seconds | Graceful degradation to direct query embedding |