---
name: reranking-patterns
description: Reranking patterns for improving search precision. Use when implementing cross-encoder reranking, LLM-based relevance scoring, or improving retrieval quality in RAG pipelines.
tags: [rag, retrieval, reranking, relevance]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
---

# Reranking Patterns
Improve search precision by re-scoring retrieved documents with more powerful models.
## Overview
Use reranking when:
- Improving precision after initial retrieval
- Bi-encoder embeddings miss semantic nuance
- Combining multiple relevance signals
- Building production RAG systems that require high accuracy
## Why Rerank?
Initial retrieval (bi-encoder) prioritizes speed over accuracy:
- Bi-encoder: Embeds query and docs separately → fast but approximate
- Cross-encoder/LLM: Processes query+doc together → slow but accurate
Solution: Retrieve many (top-50), rerank few (top-10)
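A minimal sketch of the two scoring styles using sentence-transformers; the model names are common defaults and the example strings are purely illustrative.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "how do I rotate an API key?"
doc = "You can rotate keys from the dashboard settings page."

# Bi-encoder: embed query and doc separately, then compare vectors (fast, approximate).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
bi_score = util.cos_sim(bi_encoder.encode(query), bi_encoder.encode(doc)).item()

# Cross-encoder: one joint forward pass over the (query, doc) pair (slow, accurate).
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
cross_score = cross_encoder.predict([(query, doc)])[0]
```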
## Pattern 1: Cross-Encoder Reranking

```python
from sentence_transformers import CrossEncoder


class CrossEncoderReranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank documents using cross-encoder."""
        # Create query-document pairs
        pairs = [(query, doc["content"]) for doc in documents]

        # Score all pairs
        scores = self.model.predict(pairs)

        # Sort by score
        scored_docs = list(zip(documents, scores))
        scored_docs.sort(key=lambda x: x[1], reverse=True)

        # Return top-k with updated scores
        return [
            {**doc, "score": float(score)}
            for doc, score in scored_docs[:top_k]
        ]
```
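A minimal end-to-end sketch of the retrieve-many, rerank-few flow using the class above; `vector_search` is a hypothetical placeholder for your bi-encoder retrieval step, and documents are assumed to be dicts with a `content` key.

```python
reranker = CrossEncoderReranker()


def search(query: str) -> list[dict]:
    # `vector_search` is a hypothetical bi-encoder retrieval function (not shown here).
    candidates = vector_search(query, top_k=50)          # retrieve many: fast, approximate
    return reranker.rerank(query, candidates, top_k=10)  # rerank few: slow, accurate
```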
## Pattern 2: LLM Reranking (Batch)

```python
from openai import AsyncOpenAI


async def llm_rerank(
    query: str,
    documents: list[dict],
    llm: AsyncOpenAI,
    top_k: int = 10,
) -> list[dict]:
    """Rerank using LLM relevance scoring."""
    # Build prompt with all candidates
    docs_text = "\n\n".join([
        f"[Doc {i+1}]\n{doc['content'][:300]}..."
        for i, doc in enumerate(documents)
    ])

    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap
        messages=[
            {"role": "system", "content": """
Rate each document's relevance to the query (0.0-1.0).
Output one score per line, in order:
0.95
0.72
0.45
..."""},
            {"role": "user", "content": f"Query: {query}\n\nDocuments:\n{docs_text}"},
        ],
        temperature=0,
    )

    # Parse scores
    scores = parse_scores(response.choices[0].message.content, len(documents))

    # Sort and return
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    return [
        {**doc, "score": score}
        for doc, score in scored_docs[:top_k]
    ]


def parse_scores(response: str, expected_count: int) -> list[float]:
    """Parse LLM response into scores."""
    scores = []
    for line in response.strip().split("\n"):
        try:
            score = float(line.strip())
            scores.append(max(0.0, min(1.0, score)))
        except ValueError:
            scores.append(0.5)  # Default on parse error

    # Pad if needed
    while len(scores) < expected_count:
        scores.append(0.5)
    return scores[:expected_count]
```
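A hedged usage sketch for the batch LLM reranker above; it assumes `OPENAI_API_KEY` is set in the environment and that candidates carry `id` and `content` fields, and the example documents are made up.

```python
import asyncio
import os

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])  # assumes the env var is set
    candidates = [
        {"id": "a", "content": "Reranking re-scores retrieved documents for precision."},
        {"id": "b", "content": "Unrelated release notes about the billing UI."},
    ]
    top = await llm_rerank("how does reranking work?", candidates, client, top_k=1)
    print(top[0]["id"], top[0]["score"])


asyncio.run(main())
```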
## Pattern 3: Cohere Rerank API

```python
import cohere


class CohereReranker:
    def __init__(self, api_key: str):
        self.client = cohere.Client(api_key)

    def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank using Cohere's rerank API."""
        results = self.client.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=[doc["content"] for doc in documents],
            top_n=top_k,
        )
        return [
            {**documents[r.index], "score": r.relevance_score}
            for r in results.results
        ]
```
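Usage is a short sketch, assuming a `COHERE_API_KEY` environment variable and a `candidates` list from an earlier retrieval step:

```python
import os

reranker = CohereReranker(api_key=os.environ["COHERE_API_KEY"])  # assumes the env var is set
top_docs = reranker.rerank(query="how do I rotate an API key?", documents=candidates, top_k=10)
```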
## Pattern 4: Combined Scoring

Combine multiple signals with a weighted average:

```python
from dataclasses import dataclass


@dataclass
class ReRankScore:
    doc_id: str
    base_score: float     # Original retrieval score
    llm_score: float      # LLM relevance score
    recency_score: float  # Metadata-based (e.g., freshness)
    final_score: float


def combined_rerank(
    documents: list[dict],
    llm_scores: dict[str, float],
    alpha: float = 0.3,  # Base weight
    beta: float = 0.5,   # LLM weight
    gamma: float = 0.2,  # Recency weight
) -> list[dict]:
    """Combine multiple scoring signals."""
    scored = []
    for doc in documents:
        base = doc.get("score", 0.5)
        llm = llm_scores.get(doc["id"], 0.5)
        recency = calculate_recency_score(doc.get("created_at"))

        final = (alpha * base) + (beta * llm) + (gamma * recency)
        scored.append({
            **doc,
            "score": final,
            "score_components": {
                "base": base,
                "llm": llm,
                "recency": recency,
            },
        })

    scored.sort(key=lambda x: x["score"], reverse=True)
    return scored
```
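`calculate_recency_score` is referenced above but not shown. One possible sketch uses exponential decay over document age, assuming `created_at` is an ISO-8601 timestamp; the half-life default is an arbitrary illustrative choice.

```python
import math
from datetime import datetime, timezone


def calculate_recency_score(created_at: str | None, half_life_days: float = 30.0) -> float:
    """Freshness score: 1.0 for brand-new docs, halving every `half_life_days`."""
    if not created_at:
        return 0.5  # Neutral score when no timestamp is available
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    age_days = max((datetime.now(timezone.utc) - created).total_seconds() / 86400, 0.0)
    return math.exp(-math.log(2) * age_days / half_life_days)
```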
## Complete Reranking Service

```python
import asyncio


class ReRankingService:
    def __init__(
        self,
        llm: AsyncOpenAI,
        timeout_seconds: float = 5.0,
    ):
        self.llm = llm
        self.timeout = timeout_seconds

    async def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank with timeout and fallback."""
        if len(documents) <= top_k:
            return documents

        try:
            async with asyncio.timeout(self.timeout):
                return await llm_rerank(
                    query, documents, self.llm, top_k
                )
        except TimeoutError:
            # Fallback: return by original score
            return sorted(
                documents,
                key=lambda x: x.get("score", 0),
                reverse=True,
            )[:top_k]
```
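`asyncio.timeout` requires Python 3.11+; on older interpreters use `asyncio.wait_for` instead. A minimal usage sketch, assuming an `AsyncOpenAI` client and the `llm_rerank` helper from Pattern 2:

```python
# Inside an async function:
service = ReRankingService(llm=client, timeout_seconds=5.0)
top_docs = await service.rerank(query, candidates, top_k=10)
# On timeout, results fall back to the original retrieval order (top_k by base score).
```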
## Model Selection Guide

| Model | Latency | Cost | Quality |
|---|---|---|---|
| cross-encoder/ms-marco-MiniLM-L-6-v2 | ~50ms | Free | Good |
| BAAI/bge-reranker-large | ~100ms | Free | Better |
| Cohere rerank-english-v3.0 | ~200ms | $1 / 1K searches | Best |
| gpt-4o-mini (LLM) | ~500ms | $0.15 / 1M tokens | Great |
## Best Practices
- Retrieve more, rerank less: retrieve 50-100 candidates, rerank down to 10
- Truncate content: 200-400 chars per doc for LLM reranking
- Set timeouts: always fall back to the base ranking
- Cache scores: the same query+doc pair always produces the same score (see the caching sketch below)
- Batch when possible: one LLM call for all docs
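A minimal in-process sketch of the "cache scores" practice, wrapping the cross-encoder reranker from Pattern 1. It assumes each document dict carries a stable `id`; a shared cache such as Redis would follow the same key scheme.

```python
class CachedReranker:
    """Illustrative score cache: identical (query, doc id) pairs reuse previous scores."""

    def __init__(self, reranker: CrossEncoderReranker):
        self.reranker = reranker
        self._cache: dict[tuple[str, str], float] = {}

    def rerank(self, query: str, documents: list[dict], top_k: int = 10) -> list[dict]:
        misses = [d for d in documents if (query, d["id"]) not in self._cache]
        if misses:
            # Score only uncached documents, then remember their scores.
            for doc in self.reranker.rerank(query, misses, top_k=len(misses)):
                self._cache[(query, doc["id"])] = doc["score"]
        scored = [{**d, "score": self._cache[(query, d["id"])]} for d in documents]
        scored.sort(key=lambda x: x["score"], reverse=True)
        return scored[:top_k]
```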
## Related Skills
- rag-retrieval - Core RAG pipeline that reranking enhances
- contextual-retrieval - Contextual embeddings combined with reranking for best results
- embeddings - Bi-encoder embeddings for initial retrieval before reranking
- llm-evaluation - Evaluation patterns for measuring reranking quality
## Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Retrieve/rerank ratio | Retrieve 50-100, rerank to 10 | Balance coverage and precision |
| Default reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Good quality, free, fast (~50ms) |
| LLM reranking | Batch all docs in one call | Reduces latency vs per-doc calls |
| Timeout handling | Fallback to base ranking | Graceful degradation on slow reranking |