
golden-dataset-management

by yonatangross

The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.

⭐ 29 · 🍴 4 · 📅 Jan 23, 2026

SKILL.md


name: golden-dataset-management
description: Use when backing up, restoring, or validating golden datasets. Prevents data loss and ensures test data integrity for AI/ML evaluation systems.
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit AI Agent Hub
tags: [golden-dataset, backup, data-protection, testing, regression, 2026]
allowed-tools:

  • Read
  • Grep
  • Glob
  • Bash # For backup/restore scripts

user-invocable: false

Golden Dataset Management

Protect and maintain high-quality test datasets for AI/ML systems

Overview

A golden dataset is a curated collection of high-quality examples used for:

  • Regression testing: Ensure new code doesn't break existing functionality
  • Retrieval evaluation: Measure search quality (precision, recall, MRR)
  • Model benchmarking: Compare different models/approaches
  • Reproducibility: Consistent results across environments
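Of these metrics, MRR (mean reciprocal rank) is the least self-explanatory: it averages the reciprocal rank of the first relevant result per query. A minimal sketch (the function and toy data here are illustrative, not part of OrchestKit's codebase):

```python
def mean_reciprocal_rank(results: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if not found)."""
    total = 0.0
    for hits, relevant in zip(results, expected):
        for rank, doc_id in enumerate(hits, start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break
    return total / len(results)

# Query 1 finds its expected doc at rank 2, query 2 at rank 1.
score = mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], ["b", "x"])
print(score)  # (1/2 + 1/1) / 2 = 0.75
```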

When to use this skill:

  • Building test datasets for RAG systems
  • Implementing backup/restore for critical data
  • Validating data integrity (URL contracts, embeddings)
  • Migrating data between environments

OrchestKit's Golden Dataset

Stats (Production):

  • 98 analyses (completed content analyses)
  • 415 chunks (embedded text segments)
  • 203 test queries (with expected results)
  • 91.6% pass rate (retrieval quality metric)

Purpose:

  • Test hybrid search (vector + BM25 + RRF)
  • Validate metadata boosting strategies
  • Detect regressions in retrieval quality
  • Benchmark new embedding models
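The RRF step above (Reciprocal Rank Fusion) merges the vector and BM25 rankings without having to normalize their score scales. A minimal sketch of the idea (`rrf_fuse` and the doc IDs are hypothetical; k=60 is the constant commonly used in the RRF literature):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc as sum(1 / (k + rank)) across rankings, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # dense similarity order
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword match order
print(rrf_fuse([vector_hits, bm25_hits]))  # doc_b ranks first: high in both lists
```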

Core Concepts

Data Integrity Contracts

The URL Contract: Golden dataset analyses MUST store real canonical URLs, not placeholders.

```python
# WRONG - placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```

Why this matters:

  • Enables re-fetching content if embeddings need regeneration
  • Allows validation that source content hasn't changed
  • Provides audit trail for data provenance
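The contract is cheap to enforce mechanically. A sketch of a per-URL check (hypothetical helper; the marker strings mirror the placeholder-URL validation rule used elsewhere in this skill):

```python
from urllib.parse import urlparse

# Substrings that indicate a placeholder rather than a real canonical URL.
PLACEHOLDER_MARKERS = ("orchestkit.dev", "placeholder")

def check_url_contract(url: str) -> list[str]:
    """Return contract violations for one analysis URL (empty list = OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("not an absolute http(s) URL")
    if any(marker in url for marker in PLACEHOLDER_MARKERS):
        problems.append("placeholder URL")
    return problems

print(check_url_contract("https://orchestkit.dev/placeholder/123"))          # ['placeholder URL']
print(check_url_contract("https://docs.python.org/3/library/asyncio.html"))  # []
```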

Backup Strategy Comparison

| Strategy | Version Control | Restore Speed | Portability | Inspection |
| --- | --- | --- | --- | --- |
| JSON (recommended) | Yes | Slower (regenerates embeddings) | High | Easy |
| SQL dump | No (binary) | Fast | DB-version dependent | Hard |

OrchestKit uses JSON backup for version control and portability.


Quick Reference

Backup Format

```jsonc
{
  "version": "1.0",
  "created_at": "2025-12-19T10:30:00Z",
  "metadata": {
    "total_analyses": 98,
    "total_chunks": 415,
    "total_artifacts": 98
  },
  "analyses": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "url": "https://docs.python.org/3/library/asyncio.html",
      "content_type": "documentation",
      "status": "completed",
      "created_at": "2025-11-15T08:20:00Z",
      "chunks": [
        {
          "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
          "content": "asyncio is a library...",
          "section_title": "Introduction to asyncio"
          // embedding NOT included (regenerated on restore)
        }
      ]
    }
  ]
}
```

Key Design Decisions:

  • Embeddings excluded (regenerate on restore with current model)
  • Nested structure (analyses -> chunks -> artifacts)
  • Metadata for validation
  • ISO timestamps for reproducibility
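A sketch of assembling a backup with these properties (hypothetical `build_backup` helper, not the actual scripts/backup_golden_dataset.py; assumes analyses are already dicts in the shape shown above):

```python
import json
from datetime import datetime, timezone

def build_backup(analyses: list[dict]) -> str:
    """Serialize analyses to the JSON backup format, stripping embeddings."""
    payload = {
        "version": "1.0",
        "created_at": datetime.now(timezone.utc).isoformat(),  # ISO timestamp
        "metadata": {
            "total_analyses": len(analyses),
            "total_chunks": sum(len(a.get("chunks", [])) for a in analyses),
        },
        "analyses": [
            {
                **a,
                # Embeddings are dropped; they are regenerated on restore.
                "chunks": [
                    {k: v for k, v in c.items() if k != "embedding"}
                    for c in a.get("chunks", [])
                ],
            }
            for a in analyses
        ],
    }
    return json.dumps(payload, indent=2)
```

The metadata counts are computed from the payload itself, so the count-mismatch validation check can later detect truncated or hand-edited files.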

CLI Commands

```bash
cd backend

# Backup golden dataset
poetry run python scripts/backup_golden_dataset.py backup

# Verify backup integrity
poetry run python scripts/backup_golden_dataset.py verify

# Restore from backup (WARNING: deletes existing data)
poetry run python scripts/backup_golden_dataset.py restore --replace

# Restore without deleting (adds to existing)
poetry run python scripts/backup_golden_dataset.py restore
```
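Because backups exclude embeddings, a restore has to regenerate them with the current model. The core step can be sketched like this (hypothetical `restore_chunks`; `embed` stands in for whatever embedding client the backend uses):

```python
from typing import Callable

def restore_chunks(
    chunks: list[dict],
    embed: Callable[[list[str]], list[list[float]]],
) -> list[dict]:
    """Re-attach embeddings to restored chunks in one batched call."""
    vectors = embed([c["content"] for c in chunks])
    return [{**c, "embedding": v} for c, v in zip(chunks, vectors)]

# Toy embedder for demonstration: vector = [length of the text].
restored = restore_chunks(
    [{"id": "c1", "content": "asyncio is a library..."}],
    lambda texts: [[len(t)] for t in texts],
)
```

Batching the texts into a single `embed` call matters in practice: regenerating 415 chunks one request at a time is the slow part of a restore.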

Validation Checks

| Check | Severity | Description |
| --- | --- | --- |
| Count mismatch | Error | Analysis/chunk count differs from metadata |
| Placeholder URLs | Error | URLs containing `orchestkit.dev` or `placeholder` |
| Missing embeddings | Error | Chunks without embeddings after restore |
| Orphaned chunks | Warning | Chunks with no parent analysis |
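The file-level checks can be run in a single pass over the backup JSON. A sketch (hypothetical `verify_backup`; the missing-embedding and orphaned-chunk checks need a database connection after restore, so they are omitted here):

```python
def verify_backup(backup: dict) -> dict[str, list[str]]:
    """Run file-level integrity checks; returns errors and warnings."""
    errors: list[str] = []
    warnings: list[str] = []
    analyses = backup.get("analyses", [])
    meta = backup.get("metadata", {})

    # Count mismatch: metadata must agree with the actual payload.
    n_chunks = sum(len(a.get("chunks", [])) for a in analyses)
    if meta.get("total_analyses") != len(analyses) or meta.get("total_chunks") != n_chunks:
        errors.append("count mismatch between metadata and payload")

    # Placeholder URLs violate the URL contract.
    for a in analyses:
        url = a.get("url", "")
        if "orchestkit.dev" in url or "placeholder" in url:
            errors.append(f"placeholder URL: {url}")

    return {"errors": errors, "warnings": warnings}
```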

Best Practices Summary

  1. Version control backups - Commit to git for history and diffs
  2. Validate before deployment - Run verify before production changes
  3. Test restore in staging - Never test restore in production first
  4. Document changes - Track additions/removals in metadata

Disaster Recovery Quick Guide

| Scenario | Steps |
| --- | --- |
| Accidental deletion | `restore --replace` -> `verify` -> run tests |
| Migration failure | `alembic downgrade -1` -> `restore --replace` -> fix migration |
| New environment | Clone repo -> set up DB -> `restore` -> run tests |

References

For detailed implementation patterns, see:

  • references/storage-patterns.md - Backup strategies, JSON format, backup script implementation, CI/CD automation
  • references/versioning.md - Restore implementation, embedding regeneration, validation checklist, disaster recovery scenarios

Related skills:

  • golden-dataset-validation - Schema and integrity validation
  • golden-dataset-curation - Quality criteria and curation workflows
  • pgvector-search - Retrieval evaluation using golden dataset
  • ai-native-development - Embedding generation for restore

Version: 1.0.0 (December 2025)
Status: Production-ready patterns from OrchestKit's 98-analysis golden dataset

Capability Details

backup

Keywords: golden dataset, backup, export, json backup, version control data

Solves:

  • How do I backup the golden dataset?
  • Export analyses to JSON for version control
  • Protect critical test datasets
  • Create portable database snapshots

restore

Keywords: restore dataset, import analyses, regenerate embeddings, disaster recovery, new environment

Solves:

  • How do I restore from backup?
  • Import golden dataset to new environment
  • Regenerate embeddings after restore
  • Disaster recovery procedures

validation

Keywords: verify dataset, url contract, data integrity, validate backup, placeholder urls

Solves:

  • How do I validate dataset integrity?
  • Check URL contracts (no placeholders)
  • Verify embeddings exist
  • Detect orphaned chunks

ci-cd-automation

Keywords: automated backup, github actions, ci cd backup, scheduled backup

Solves:

  • How do I automate dataset backups?
  • Set up GitHub Actions for weekly backups
  • Commit backups to git automatically
  • CI/CD integration patterns

disaster-recovery

Keywords: disaster recovery, accidental deletion, migration failure, rollback

Solves:

  • What if I accidentally delete the dataset?
  • Database migration gone wrong
  • Restore after data corruption
  • Rollback procedures

orchestkit-golden-dataset

Keywords: orchestkit, 98 analyses, 415 chunks, retrieval evaluation, real world

Solves:

  • What is OrchestKit's golden dataset?
  • How does OrchestKit protect test data?
  • Real-world backup/restore examples
  • Production golden dataset stats
