Back to list
athola

subagent-testing

by athola

Marketplace repo for Claude Code Plugins developed from personal projects and workflow

127🍴 16📅 Jan 23, 2026

SKILL.md


name: subagent-testing description: | TDD-style testing methodology for skills using fresh subagent instances to prevent priming bias and validate skill effectiveness.

Triggers: test skill, validate skill, skill testing, subagent testing, fresh instance testing, TDD for skills, skill validation

Use when: validating skill improvements, testing skill effectiveness, preventing priming bias, measuring skill impact on behavior

DO NOT use when: implementing skills (use skill-authoring instead), creating hooks (use hook-authoring instead) version: 1.0.0 category: testing tags: [testing, validation, TDD, subagents, fresh-instances] token_budget: 30 progressive_loading: true

Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Overview

Fresh instances prevent priming: Each test uses a new Claude conversation to verify the skill's impact is measured, not conversation history effects.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

  • Prior context influences responses
  • Skill effects get mixed with conversation history
  • Can't isolate skill's true impact

Fresh Instance Benefits

  • Isolation: Each test starts clean
  • Reproducibility: Consistent baseline state
  • Measurement: Clear before/after comparison
  • Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill's anti-rationalization guardrails.

Quick Start

# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work

Detailed Testing Guide

For complete testing patterns, examples, and templates:

Success Criteria

  • Baseline: Document 5+ diverse baseline scenarios
  • Improvement: ≥50% improvement in skill-related metrics
  • Consistency: Results reproducible across fresh instances
  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts

See Also

  • skill-authoring: Creating effective skills
  • bulletproof-skill: Anti-rationalization patterns
  • test-skill: Automated skill testing command

Score

Total Score

70/100

Based on repository quality metrics

SKILL.md

SKILL.mdファイルが含まれている

+20
LICENSE

ライセンスが設定されている

+10
説明文

100文字以上の説明がある

0/10
人気

GitHub Stars 100以上

+5
最近の活動

1ヶ月以内に更新

+10
フォーク

10回以上フォークされている

+5
Issue管理

オープンIssueが50未満

0/5
言語

プログラミング言語が設定されている

+5
タグ

1つ以上のタグが設定されている

+5

Reviews

💬

Reviews coming soon