yonatangross

add-golden

by yonatangross

The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.

29🍴 4📅 Jan 23, 2026

SKILL.md


name: add-golden
description: Curate and add documents to the golden dataset with multi-agent validation. Use when adding test data, creating golden datasets, saving examples.
context: fork
version: 1.0.0
author: OrchestKit
tags: [curation, golden-dataset, evaluation, testing]
user-invocable: true

Add to Golden Dataset

Multi-agent curation workflow for adding high-quality documents.

Quick Start

/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx

Phase 1: Input Collection

Get URL and detect content type:

  • article (blog post, tech article)
  • tutorial (step-by-step guide)
  • documentation (API docs, reference)
  • research_paper (academic, whitepaper)
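
The skill does not say how the content type is detected. As a minimal sketch, assuming detection can start from the URL alone, the four types above could be guessed like this. The host list and path hints are hypothetical and would need tuning against real sources.

```python
from urllib.parse import urlparse

# Hypothetical heuristics: none of these hosts or path hints come from the skill.
RESEARCH_HOSTS = {"arxiv.org", "openreview.net"}


def detect_content_type(url: str) -> str:
    """Guess one of the four content types from the URL alone."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    path = parsed.path.lower()

    if host in RESEARCH_HOSTS or path.endswith(".pdf"):
        return "research_paper"
    if host.startswith("docs.") or "/docs/" in path or "/reference/" in path:
        return "documentation"
    if "tutorial" in path or "getting-started" in path:
        return "tutorial"
    # Fall back to the most general type.
    return "article"
```

A URL-only guess is cheap but coarse, so in practice it would likely be confirmed against the fetched content in Phase 2.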

Phase 2: Fetch and Extract

Extract document structure:

  • Title and sections
  • Code blocks
  • Key technical terms
  • Metadata (author, date)
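
A rough sketch of the extraction step, assuming the fetched page has already been converted to Markdown. The `ExtractedDocument` type and `extract_structure` function are hypothetical names; key-term and metadata extraction are left out because the skill does not describe them.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ExtractedDocument:
    title: str
    sections: list[str] = field(default_factory=list)
    code_blocks: list[str] = field(default_factory=list)


FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_structure(markdown: str) -> ExtractedDocument:
    """Pull the title, section headings, and fenced code out of Markdown text."""
    code_blocks = [m.group(1) for m in CODE_BLOCK.finditer(markdown)]
    # Strip code first so '#' comments inside code are not read as headings.
    prose = CODE_BLOCK.sub("", markdown)
    headings = [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(prose)]
    title = next((text for level, text in headings if level == 1), "")
    sections = [text for level, text in headings if level > 1]
    return ExtractedDocument(title, sections, code_blocks)
```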

Phase 3: Parallel Analysis (4 Agents)

| Agent | Task |
| --- | --- |
| code-quality-reviewer | Quality evaluation |
| Explore #1 | Difficulty classification |
| Explore #2 | Domain tagging |
| Explore #3 | Test query generation |
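
In Claude Code the agents are dispatched by the agent runtime, not by user code. Purely to illustrate the fan-out pattern, the sketch below runs four stand-in coroutines concurrently with `asyncio.gather`. `run_agent` is a hypothetical placeholder and does no real analysis.

```python
import asyncio

AGENT_TASKS = [
    ("code-quality-reviewer", "Quality evaluation"),
    ("Explore #1", "Difficulty classification"),
    ("Explore #2", "Domain tagging"),
    ("Explore #3", "Test query generation"),
]


async def run_agent(name: str, task: str, document: str) -> dict:
    """Hypothetical stand-in for dispatching one agent."""
    await asyncio.sleep(0)  # yield control, simulating an I/O-bound call
    return {"agent": name, "task": task, "chars_seen": len(document)}


async def analyze_in_parallel(document: str) -> list[dict]:
    # gather() runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(
        *(run_agent(name, task, document) for name, task in AGENT_TASKS)
    )


results = asyncio.run(analyze_in_parallel("example document text"))
```

Because the four tasks do not depend on each other's output, they can run side by side and be merged once all have finished.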

Quality Dimensions

| Dimension | Weight |
| --- | --- |
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |
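
The weights sum to 1.0, which suggests a weighted average. A minimal sketch, assuming each dimension is scored on a 0 to 1 scale and the overall score is the weighted sum (the skill does not state the aggregation rule explicitly):

```python
WEIGHTS = {"accuracy": 0.25, "coherence": 0.20, "depth": 0.25, "relevance": 0.30}


def quality_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 0..1) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
```

For example, scores of 0.8 (accuracy), 0.7 (coherence), 0.6 (depth), and 0.9 (relevance) give 0.20 + 0.14 + 0.15 + 0.27 = 0.76.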

Difficulty Levels

  • trivial: Direct keyword match (>0.85 score)
  • easy: Common synonyms (>0.70 score)
  • medium: Paraphrased intent (>0.55 score)
  • hard: Multi-hop reasoning (>0.40 score)
  • adversarial: Edge cases, robustness
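
One way to read the thresholds above is as strict lower bounds checked from hardest-to-meet to easiest. The sketch below assumes that reading. Since no score is given for the adversarial level, it is modelled here as an explicit flag, and scores at or below 0.40 are left unclassified; both choices are assumptions.

```python
from typing import Optional

# (level, strict lower bound), ordered from highest bound to lowest.
DIFFICULTY_FLOORS = [
    ("trivial", 0.85),
    ("easy", 0.70),
    ("medium", 0.55),
    ("hard", 0.40),
]


def classify_difficulty(score: float, adversarial: bool = False) -> Optional[str]:
    """Map a score to a difficulty level, or None if no bound is exceeded."""
    if adversarial:
        # Assumption: adversarial cases are labelled by hand, not by score.
        return "adversarial"
    for level, floor in DIFFICULTY_FLOORS:
        if score > floor:
            return level
    return None
```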

Phase 4: Validation Checks

  • URL validation (no placeholders)
  • Schema validation (required fields)
  • Duplicate check (>80% similarity)
  • Quality gates (min sections, content length)
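
The checks above could be sketched as a single function that returns a list of problems. The field names, the placeholder markers, and the use of `difflib.SequenceMatcher` for similarity are all assumptions made for illustration; the skill does not name a schema or a similarity method, and a real pipeline may well compare embeddings instead. The minimum content length is omitted because no value is given.

```python
from difflib import SequenceMatcher

REQUIRED_FIELDS = ("url", "title", "content", "tags", "sections")  # hypothetical schema
PLACEHOLDER_MARKERS = ("example.com", "localhost", "xxxxx")  # hypothetical markers
DUPLICATE_THRESHOLD = 0.80  # from the ">80% similarity" rule


def validate(candidate: dict, existing_contents: list[str]) -> list[str]:
    """Return a list of validation problems; an empty list means the candidate passes."""
    errors = []

    missing = [name for name in REQUIRED_FIELDS if not candidate.get(name)]
    if missing:
        errors.append(f"missing fields: {', '.join(missing)}")

    url = candidate.get("url", "")
    if any(marker in url for marker in PLACEHOLDER_MARKERS):
        errors.append("placeholder URL")

    content = candidate.get("content", "")
    for other in existing_contents:
        # Character-level ratio in [0, 1]; a simple stand-in for semantic similarity.
        if SequenceMatcher(None, content, other).ratio() > DUPLICATE_THRESHOLD:
            errors.append("near-duplicate of an existing document")
            break

    if len(candidate.get("tags", [])) < 2 or len(candidate.get("sections", [])) < 2:
        errors.append("needs at least 2 tags and 2 sections")

    return errors
```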

Phase 5: Decision Thresholds

| Score | Decision |
| --- | --- |
| >= 0.75 | INCLUDE |
| >= 0.55 | REVIEW |
| < 0.55 | EXCLUDE |
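
Read top to bottom, the thresholds form a simple cascade: the first row whose bound is met wins. A minimal sketch, with `decide` as a hypothetical function name:

```python
def decide(score: float) -> str:
    """Map an overall quality score to a curation decision."""
    if score >= 0.75:
        return "INCLUDE"
    if score >= 0.55:
        return "REVIEW"
    return "EXCLUDE"
```

REVIEW therefore covers scores from 0.55 up to, but not including, 0.75.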

Phase 6: User Approval

Present results for user decision:

  • Approve: Add with generated queries
  • Modify: Edit details before adding
  • Reject: Do not add

Phase 7: Write to Dataset

Update fixture files:

  • documents_expanded.json
  • source_url_map.json
  • queries.json

Validate fixture consistency after writing.

Summary

Total Parallel Agents: 4

  • 1 code-quality-reviewer
  • 3 Explore agents

Quality Gates:

  • Minimum score: 0.55 for review
  • No placeholder URLs
  • No duplicates (>80% similar)
  • At least 2 tags, 2 sections

Related Skills

  • golden-dataset-validation - Validate existing golden datasets for quality and coverage
  • llm-evaluation - LLM output evaluation patterns used in quality scoring
  • test-data-management - General test data strategies and fixture management

Key Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Quality Threshold | >= 0.55 for review | Balances precision with recall for dataset curation |
| Duplicate Detection | 80% similarity | Prevents near-duplicates while allowing related content |
| Parallel Agents | 4 concurrent | Optimal parallelism for quality/difficulty/tagging analysis |
| Weighting | Relevance highest (0.30) | Retrieval relevance most critical for RAG evaluation |
