
golden-dataset-management

by yonatangross

The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.

⭐ 29 · 🍴 4 · 📅 Jan 23, 2026

SKILL.md


name: golden-dataset-management
description: Use when backing up, restoring, or validating golden datasets. Prevents data loss and ensures test data integrity for AI/ML evaluation systems.
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit AI Agent Hub
tags: [golden-dataset, backup, data-protection, testing, regression, 2026]
allowed-tools:

  • Read
  • Grep
  • Glob
  • Bash # For backup/restore scripts

user-invocable: false

Golden Dataset Management

Protect and maintain high-quality test datasets for AI/ML systems

Overview

A golden dataset is a curated collection of high-quality examples used for:

  • Regression testing: Ensure new code doesn't break existing functionality
  • Retrieval evaluation: Measure search quality (precision, recall, MRR)
  • Model benchmarking: Compare different models/approaches
  • Reproducibility: Consistent results across environments
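Of these metrics, MRR (mean reciprocal rank) is the least self-explanatory: it averages the reciprocal rank of the first relevant result per query. A minimal sketch (the function and toy data here are illustrative, not part of OrchestKit's codebase):

```python
def mean_reciprocal_rank(results: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if not found)."""
    total = 0.0
    for hits, relevant in zip(results, expected):
        for rank, doc_id in enumerate(hits, start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break
    return total / len(results)

# Query 1 finds its expected doc at rank 2, query 2 at rank 1.
score = mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], ["b", "x"])
print(score)  # (1/2 + 1/1) / 2 = 0.75
```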

When to use this skill:

  • Building test datasets for RAG systems
  • Implementing backup/restore for critical data
  • Validating data integrity (URL contracts, embeddings)
  • Migrating data between environments

OrchestKit's Golden Dataset

Stats (Production):

  • 98 analyses (completed content analyses)
  • 415 chunks (embedded text segments)
  • 203 test queries (with expected results)
  • 91.6% pass rate (retrieval quality metric)

Purpose:

  • Test hybrid search (vector + BM25 + RRF)
  • Validate metadata boosting strategies
  • Detect regressions in retrieval quality
  • Benchmark new embedding models
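The RRF step above (Reciprocal Rank Fusion) merges the vector and BM25 rankings without having to normalize their score scales. A minimal sketch of the idea (`rrf_fuse` and the doc IDs are hypothetical; k=60 is the constant commonly used in the RRF literature):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc as sum(1 / (k + rank)) across rankings, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # dense similarity order
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword match order
print(rrf_fuse([vector_hits, bm25_hits]))  # doc_b ranks first: high in both lists
```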

Core Concepts

Data Integrity Contracts

The URL Contract: Golden dataset analyses MUST store real canonical URLs, not placeholders.

```python
# WRONG - placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```

Why this matters:

  • Enables re-fetching content if embeddings need regeneration
  • Allows validation that source content hasn't changed
  • Provides audit trail for data provenance
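The contract is cheap to enforce mechanically. A sketch of a per-URL check (hypothetical helper; the marker strings mirror the placeholder-URL validation rule used elsewhere in this skill):

```python
from urllib.parse import urlparse

# Substrings that indicate a placeholder rather than a real canonical URL.
PLACEHOLDER_MARKERS = ("orchestkit.dev", "placeholder")

def check_url_contract(url: str) -> list[str]:
    """Return contract violations for one analysis URL (empty list = OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("not an absolute http(s) URL")
    if any(marker in url for marker in PLACEHOLDER_MARKERS):
        problems.append("placeholder URL")
    return problems

print(check_url_contract("https://orchestkit.dev/placeholder/123"))          # ['placeholder URL']
print(check_url_contract("https://docs.python.org/3/library/asyncio.html"))  # []
```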

Backup Strategy Comparison

| Strategy | Version Control | Restore Speed | Portability | Inspection |
| --- | --- | --- | --- | --- |
| JSON (recommended) | Yes | Slower (regenerates embeddings) | High | Easy |
| SQL dump | No (binary) | Fast | DB-version dependent | Hard |

OrchestKit uses JSON backup for version control and portability.


Quick Reference

Backup Format

```jsonc
{
  "version": "1.0",
  "created_at": "2025-12-19T10:30:00Z",
  "metadata": {
    "total_analyses": 98,
    "total_chunks": 415,
    "total_artifacts": 98
  },
  "analyses": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "url": "https://docs.python.org/3/library/asyncio.html",
      "content_type": "documentation",
      "status": "completed",
      "created_at": "2025-11-15T08:20:00Z",
      "chunks": [
        {
          "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
          "content": "asyncio is a library...",
          "section_title": "Introduction to asyncio"
          // embedding NOT included (regenerated on restore)
        }
      ]
    }
  ]
}
```

Key Design Decisions:

  • Embeddings excluded (regenerate on restore with current model)
  • Nested structure (analyses -> chunks -> artifacts)
  • Metadata for validation
  • ISO timestamps for reproducibility
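A sketch of assembling a backup with these properties (hypothetical `build_backup` helper, not the actual scripts/backup_golden_dataset.py; assumes analyses are already dicts in the shape shown above):

```python
import json
from datetime import datetime, timezone

def build_backup(analyses: list[dict]) -> str:
    """Serialize analyses to the JSON backup format, stripping embeddings."""
    payload = {
        "version": "1.0",
        "created_at": datetime.now(timezone.utc).isoformat(),  # ISO timestamp
        "metadata": {
            "total_analyses": len(analyses),
            "total_chunks": sum(len(a.get("chunks", [])) for a in analyses),
        },
        "analyses": [
            {
                **a,
                # Embeddings are dropped; they are regenerated on restore.
                "chunks": [
                    {k: v for k, v in c.items() if k != "embedding"}
                    for c in a.get("chunks", [])
                ],
            }
            for a in analyses
        ],
    }
    return json.dumps(payload, indent=2)
```

The metadata counts are computed from the payload itself, so the count-mismatch validation check can later detect truncated or hand-edited files.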

CLI Commands

```bash
cd backend

# Backup golden dataset
poetry run python scripts/backup_golden_dataset.py backup

# Verify backup integrity
poetry run python scripts/backup_golden_dataset.py verify

# Restore from backup (WARNING: deletes existing data)
poetry run python scripts/backup_golden_dataset.py restore --replace

# Restore without deleting (adds to existing)
poetry run python scripts/backup_golden_dataset.py restore
```
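Because backups exclude embeddings, a restore has to regenerate them with the current model. The core step can be sketched like this (hypothetical `restore_chunks`; `embed` stands in for whatever embedding client the backend uses):

```python
from typing import Callable

def restore_chunks(
    chunks: list[dict],
    embed: Callable[[list[str]], list[list[float]]],
) -> list[dict]:
    """Re-attach embeddings to restored chunks in one batched call."""
    vectors = embed([c["content"] for c in chunks])
    return [{**c, "embedding": v} for c, v in zip(chunks, vectors)]

# Toy embedder for demonstration: vector = [length of the text].
restored = restore_chunks(
    [{"id": "c1", "content": "asyncio is a library..."}],
    lambda texts: [[len(t)] for t in texts],
)
```

Batching the texts into a single `embed` call matters in practice: regenerating 415 chunks one request at a time is the slow part of a restore.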

Validation Checks

| Check | Severity | Description |
| --- | --- | --- |
| Count mismatch | Error | Analysis/chunk count differs from metadata |
| Placeholder URLs | Error | URLs containing `orchestkit.dev` or `placeholder` |
| Missing embeddings | Error | Chunks without embeddings after restore |
| Orphaned chunks | Warning | Chunks with no parent analysis |
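The file-level checks can be run in a single pass over the backup JSON. A sketch (hypothetical `verify_backup`; the missing-embedding and orphaned-chunk checks need a database connection after restore, so they are omitted here):

```python
def verify_backup(backup: dict) -> dict[str, list[str]]:
    """Run file-level integrity checks; returns errors and warnings."""
    errors: list[str] = []
    warnings: list[str] = []
    analyses = backup.get("analyses", [])
    meta = backup.get("metadata", {})

    # Count mismatch: metadata must agree with the actual payload.
    n_chunks = sum(len(a.get("chunks", [])) for a in analyses)
    if meta.get("total_analyses") != len(analyses) or meta.get("total_chunks") != n_chunks:
        errors.append("count mismatch between metadata and payload")

    # Placeholder URLs violate the URL contract.
    for a in analyses:
        url = a.get("url", "")
        if "orchestkit.dev" in url or "placeholder" in url:
            errors.append(f"placeholder URL: {url}")

    return {"errors": errors, "warnings": warnings}
```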

Best Practices Summary

  1. Version control backups - Commit to git for history and diffs
  2. Validate before deployment - Run verify before production changes
  3. Test restore in staging - Never test restore in production first
  4. Document changes - Track additions/removals in metadata

Disaster Recovery Quick Guide

| Scenario | Steps |
| --- | --- |
| Accidental deletion | `restore --replace` -> `verify` -> run tests |
| Migration failure | `alembic downgrade -1` -> `restore --replace` -> fix migration |
| New environment | Clone repo -> set up DB -> `restore` -> run tests |

References

For detailed implementation patterns, see:

  • references/storage-patterns.md - Backup strategies, JSON format, backup script implementation, CI/CD automation
  • references/versioning.md - Restore implementation, embedding regeneration, validation checklist, disaster recovery scenarios

Related skills:

  • golden-dataset-validation - Schema and integrity validation
  • golden-dataset-curation - Quality criteria and curation workflows
  • pgvector-search - Retrieval evaluation using golden dataset
  • ai-native-development - Embedding generation for restore

Version: 1.0.0 (December 2025)
Status: Production-ready patterns from OrchestKit's 98-analysis golden dataset

Capability Details

backup

Keywords: golden dataset, backup, export, json backup, version control data

Solves:

  • How do I backup the golden dataset?
  • Export analyses to JSON for version control
  • Protect critical test datasets
  • Create portable database snapshots

restore

Keywords: restore dataset, import analyses, regenerate embeddings, disaster recovery, new environment

Solves:

  • How do I restore from backup?
  • Import golden dataset to new environment
  • Regenerate embeddings after restore
  • Disaster recovery procedures

validation

Keywords: verify dataset, url contract, data integrity, validate backup, placeholder urls

Solves:

  • How do I validate dataset integrity?
  • Check URL contracts (no placeholders)
  • Verify embeddings exist
  • Detect orphaned chunks

ci-cd-automation

Keywords: automated backup, github actions, ci cd backup, scheduled backup

Solves:

  • How do I automate dataset backups?
  • Set up GitHub Actions for weekly backups
  • Commit backups to git automatically
  • CI/CD integration patterns

disaster-recovery

Keywords: disaster recovery, accidental deletion, migration failure, rollback

Solves:

  • What if I accidentally delete the dataset?
  • Database migration gone wrong
  • Restore after data corruption
  • Rollback procedures

orchestkit-golden-dataset

Keywords: orchestkit, 98 analyses, 415 chunks, retrieval evaluation, real world

Solves:

  • What is OrchestKit's golden dataset?
  • How does OrchestKit protect test data?
  • Real-world backup/restore examples
  • Production golden dataset stats
