---
name: reranking-patterns
description: Reranking patterns for improving search precision. Use when implementing cross-encoder reranking, LLM-based relevance scoring, or improving retrieval quality in RAG pipelines.
tags: [rag, retrieval, reranking, relevance]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
---

# Reranking Patterns
Improve search precision by re-scoring retrieved documents with more powerful models.
## Overview
Use reranking when:
- Improving precision after initial retrieval
- Bi-encoder embeddings miss semantic nuance
- Combining multiple relevance signals
- Building production RAG systems that require high accuracy
## Why Rerank?
Initial retrieval (bi-encoder) prioritizes speed over accuracy:
- Bi-encoder: Embeds query and docs separately → fast but approximate
- Cross-encoder/LLM: Processes query+doc together → slow but accurate
Solution: Retrieve many (top-50), rerank few (top-10)
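A minimal sketch of the two scoring styles using sentence-transformers; the model names are common defaults and the example strings are purely illustrative.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "how do I rotate an API key?"
doc = "You can rotate keys from the dashboard settings page."

# Bi-encoder: embed query and doc separately, then compare vectors (fast, approximate).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
bi_score = util.cos_sim(bi_encoder.encode(query), bi_encoder.encode(doc)).item()

# Cross-encoder: one joint forward pass over the (query, doc) pair (slow, accurate).
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
cross_score = cross_encoder.predict([(query, doc)])[0]
```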
## Pattern 1: Cross-Encoder Reranking

```python
from sentence_transformers import CrossEncoder


class CrossEncoderReranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank documents using cross-encoder."""
        # Create query-document pairs
        pairs = [(query, doc["content"]) for doc in documents]

        # Score all pairs
        scores = self.model.predict(pairs)

        # Sort by score
        scored_docs = list(zip(documents, scores))
        scored_docs.sort(key=lambda x: x[1], reverse=True)

        # Return top-k with updated scores
        return [
            {**doc, "score": float(score)}
            for doc, score in scored_docs[:top_k]
        ]
```
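A minimal end-to-end sketch of the retrieve-many, rerank-few flow using the class above; `vector_search` is a hypothetical placeholder for your bi-encoder retrieval step, and documents are assumed to be dicts with a `content` key.

```python
reranker = CrossEncoderReranker()


def search(query: str) -> list[dict]:
    # `vector_search` is a hypothetical bi-encoder retrieval function (not shown here).
    candidates = vector_search(query, top_k=50)          # retrieve many: fast, approximate
    return reranker.rerank(query, candidates, top_k=10)  # rerank few: slow, accurate
```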
## Pattern 2: LLM Reranking (Batch)

```python
from openai import AsyncOpenAI


async def llm_rerank(
    query: str,
    documents: list[dict],
    llm: AsyncOpenAI,
    top_k: int = 10,
) -> list[dict]:
    """Rerank using LLM relevance scoring."""
    # Build prompt with all candidates
    docs_text = "\n\n".join([
        f"[Doc {i+1}]\n{doc['content'][:300]}..."
        for i, doc in enumerate(documents)
    ])

    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap
        messages=[
            {"role": "system", "content": """
Rate each document's relevance to the query (0.0-1.0).
Output one score per line, in order:
0.95
0.72
0.45
..."""},
            {"role": "user", "content": f"Query: {query}\n\nDocuments:\n{docs_text}"},
        ],
        temperature=0,
    )

    # Parse scores
    scores = parse_scores(response.choices[0].message.content, len(documents))

    # Sort and return
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    return [
        {**doc, "score": score}
        for doc, score in scored_docs[:top_k]
    ]


def parse_scores(response: str, expected_count: int) -> list[float]:
    """Parse LLM response into scores."""
    scores = []
    for line in response.strip().split("\n"):
        try:
            score = float(line.strip())
            scores.append(max(0.0, min(1.0, score)))
        except ValueError:
            scores.append(0.5)  # Default on parse error

    # Pad if needed
    while len(scores) < expected_count:
        scores.append(0.5)
    return scores[:expected_count]
```
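A hedged usage sketch for the batch LLM reranker above; it assumes `OPENAI_API_KEY` is set in the environment and that candidates carry `id` and `content` fields, and the example documents are made up.

```python
import asyncio
import os

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])  # assumes the env var is set
    candidates = [
        {"id": "a", "content": "Reranking re-scores retrieved documents for precision."},
        {"id": "b", "content": "Unrelated release notes about the billing UI."},
    ]
    top = await llm_rerank("how does reranking work?", candidates, client, top_k=1)
    print(top[0]["id"], top[0]["score"])


asyncio.run(main())
```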
## Pattern 3: Cohere Rerank API

```python
import cohere


class CohereReranker:
    def __init__(self, api_key: str):
        self.client = cohere.Client(api_key)

    def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank using Cohere's rerank API."""
        results = self.client.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=[doc["content"] for doc in documents],
            top_n=top_k,
        )
        return [
            {**documents[r.index], "score": r.relevance_score}
            for r in results.results
        ]
```
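Usage is a short sketch, assuming a `COHERE_API_KEY` environment variable and a `candidates` list from an earlier retrieval step:

```python
import os

reranker = CohereReranker(api_key=os.environ["COHERE_API_KEY"])  # assumes the env var is set
top_docs = reranker.rerank(query="how do I rotate an API key?", documents=candidates, top_k=10)
```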
## Pattern 4: Combined Scoring

Combine multiple signals with a weighted average:

```python
from dataclasses import dataclass


@dataclass
class ReRankScore:
    doc_id: str
    base_score: float     # Original retrieval score
    llm_score: float      # LLM relevance score
    recency_score: float  # Metadata-based (e.g., freshness)
    final_score: float


def combined_rerank(
    documents: list[dict],
    llm_scores: dict[str, float],
    alpha: float = 0.3,  # Base weight
    beta: float = 0.5,   # LLM weight
    gamma: float = 0.2,  # Recency weight
) -> list[dict]:
    """Combine multiple scoring signals."""
    scored = []
    for doc in documents:
        base = doc.get("score", 0.5)
        llm = llm_scores.get(doc["id"], 0.5)
        recency = calculate_recency_score(doc.get("created_at"))

        final = (alpha * base) + (beta * llm) + (gamma * recency)
        scored.append({
            **doc,
            "score": final,
            "score_components": {
                "base": base,
                "llm": llm,
                "recency": recency,
            },
        })

    scored.sort(key=lambda x: x["score"], reverse=True)
    return scored
```
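`calculate_recency_score` is referenced above but not shown. One possible sketch uses exponential decay over document age, assuming `created_at` is an ISO-8601 timestamp; the half-life default is an arbitrary illustrative choice.

```python
import math
from datetime import datetime, timezone


def calculate_recency_score(created_at: str | None, half_life_days: float = 30.0) -> float:
    """Freshness score: 1.0 for brand-new docs, halving every `half_life_days`."""
    if not created_at:
        return 0.5  # Neutral score when no timestamp is available
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    age_days = max((datetime.now(timezone.utc) - created).total_seconds() / 86400, 0.0)
    return math.exp(-math.log(2) * age_days / half_life_days)
```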
## Complete Reranking Service

```python
import asyncio


class ReRankingService:
    def __init__(
        self,
        llm: AsyncOpenAI,
        timeout_seconds: float = 5.0,
    ):
        self.llm = llm
        self.timeout = timeout_seconds

    async def rerank(
        self,
        query: str,
        documents: list[dict],
        top_k: int = 10,
    ) -> list[dict]:
        """Rerank with timeout and fallback."""
        if len(documents) <= top_k:
            return documents

        try:
            async with asyncio.timeout(self.timeout):
                return await llm_rerank(
                    query, documents, self.llm, top_k
                )
        except TimeoutError:
            # Fallback: return by original score
            return sorted(
                documents,
                key=lambda x: x.get("score", 0),
                reverse=True,
            )[:top_k]
```
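`asyncio.timeout` requires Python 3.11+; on older interpreters use `asyncio.wait_for` instead. A minimal usage sketch, assuming an `AsyncOpenAI` client and the `llm_rerank` helper from Pattern 2:

```python
# Inside an async function:
service = ReRankingService(llm=client, timeout_seconds=5.0)
top_docs = await service.rerank(query, candidates, top_k=10)
# On timeout, results fall back to the original retrieval order (top_k by base score).
```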
## Model Selection Guide

| Model | Latency | Cost | Quality |
|---|---|---|---|
| cross-encoder/ms-marco-MiniLM-L-6-v2 | ~50ms | Free | Good |
| BAAI/bge-reranker-large | ~100ms | Free | Better |
| Cohere rerank-english-v3.0 | ~200ms | $1 / 1K searches | Best |
| gpt-4o-mini (LLM) | ~500ms | $0.15 / 1M tokens | Great |
## Best Practices
- Retrieve more, rerank less: retrieve 50-100 candidates, rerank down to 10
- Truncate content: 200-400 chars per doc for LLM reranking
- Set timeouts: always fall back to the base ranking
- Cache scores: the same query+doc pair always produces the same score (see the caching sketch below)
- Batch when possible: one LLM call for all docs
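A minimal in-process sketch of the "cache scores" practice, wrapping the cross-encoder reranker from Pattern 1. It assumes each document dict carries a stable `id`; a shared cache such as Redis would follow the same key scheme.

```python
class CachedReranker:
    """Illustrative score cache: identical (query, doc id) pairs reuse previous scores."""

    def __init__(self, reranker: CrossEncoderReranker):
        self.reranker = reranker
        self._cache: dict[tuple[str, str], float] = {}

    def rerank(self, query: str, documents: list[dict], top_k: int = 10) -> list[dict]:
        misses = [d for d in documents if (query, d["id"]) not in self._cache]
        if misses:
            # Score only uncached documents, then remember their scores.
            for doc in self.reranker.rerank(query, misses, top_k=len(misses)):
                self._cache[(query, doc["id"])] = doc["score"]
        scored = [{**d, "score": self._cache[(query, d["id"])]} for d in documents]
        scored.sort(key=lambda x: x["score"], reverse=True)
        return scored[:top_k]
```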
## Related Skills
- rag-retrieval - Core RAG pipeline that reranking enhances
- contextual-retrieval - Contextual embeddings combined with reranking for best results
- embeddings - Bi-encoder embeddings for initial retrieval before reranking
- llm-evaluation - Evaluation patterns for measuring reranking quality
## Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Retrieve/rerank ratio | Retrieve 50-100, rerank to 10 | Balance coverage and precision |
| Default reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Good quality, free, fast (~50ms) |
| LLM reranking | Batch all docs in one call | Reduces latency vs per-doc calls |
| Timeout handling | Fallback to base ranking | Graceful degradation on slow reranking |