---
name: evidence-verification
description: "This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification."
---

Evidence-Based Verification Skill

Version: 1.0.0
Type: Quality Assurance
Auto-activate: Code review, task completion, production deployment

Overview

This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.

Key Principle: Show, don't tell. No task is complete without verifiable evidence.


When to Use This Skill

Auto-Activate Triggers

  • Completing code implementation
  • Finishing code review
  • Marking tasks complete in Squad mode
  • Before agent handoff
  • Production deployment verification

Manual Activation

  • When user requests "verify this works"
  • Before creating pull requests
  • During quality assurance reviews
  • When troubleshooting failures

Core Concepts

1. Evidence Types

Test Evidence

  • Exit code (must be 0 for success)
  • Test suite results (passed/failed/skipped)
  • Coverage percentage (if available)
  • Test duration

Build Evidence

  • Build exit code (0 = success)
  • Compilation errors/warnings
  • Build artifacts created
  • Build duration

Deployment Evidence

  • Deployment status (success/failed)
  • Environment deployed to
  • Health check results
  • Rollback capability verified

Code Quality Evidence

  • Linter results (errors/warnings)
  • Type checker results
  • Security scan results
  • Accessibility audit results

2. Evidence Collection Protocol

## Evidence Collection Steps

1. **Identify Verification Points**
   - What needs to be proven?
   - What could go wrong?
   - What does "complete" mean?

2. **Execute Verification**
   - Run tests
   - Run build
   - Run linters
   - Check deployments

3. **Capture Results**
   - Record exit codes
   - Save output snippets
   - Note timestamps
   - Document environment

4. **Store Evidence**
   - Add to shared context
   - Reference in task completion
   - Link to artifacts
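
As a concrete illustration, here is a minimal sketch of steps 2–4 in TypeScript (assumed here because the document's examples lean on npm tooling). The helper name collectEvidence and the JSON evidence shape are illustrative, not a defined API; the output path follows the Evidence Storage conventions described later.

import { spawnSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";

// Run one verification command and capture the evidence fields this
// protocol calls for: exit code, duration, output snippet, timestamp.
function collectEvidence(label: string, command: string, args: string[]) {
  const started = Date.now();
  const run = spawnSync(command, args, { encoding: "utf8" });
  const evidence = {
    label,
    command: [command, ...args].join(" "),
    exit_code: run.status ?? -1,
    duration_seconds: (Date.now() - started) / 1000,
    // First 10 lines of output, matching the templates below
    output_snippet: (run.stdout ?? "").split("\n").slice(0, 10).join("\n"),
    timestamp: new Date().toISOString(),
  };
  // Store evidence as a file for later reference (step 4)
  mkdirSync(".claude/quality-gates/evidence", { recursive: true });
  const stamp = evidence.timestamp.replace(/[:.]/g, "-");
  writeFileSync(
    `.claude/quality-gates/evidence/${label}-${stamp}.json`,
    JSON.stringify(evidence, null, 2)
  );
  return evidence;
}

For example, collectEvidence("tests", "npm", ["test"]) executes the test suite and records its exit code, output snippet, and timestamp in one pass.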

3. Verification Standards

Minimum Evidence Requirements:

  • ✅ At least ONE verification type executed
  • ✅ Exit code captured (0 = pass, non-zero = fail)
  • ✅ Timestamp recorded
  • ✅ Evidence stored in context

Production-Grade Requirements:

  • ✅ Tests run with exit code 0
  • ✅ Coverage ≥70% (or project standard)
  • ✅ Build succeeds with exit code 0
  • ✅ No critical linter errors
  • ✅ Security scan passes

Evidence Collection Templates

Template 1: Test Evidence

Use this template when running tests:

## Test Evidence

**Command:** `npm test` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Results:**
- Tests passed: X
- Tests failed: X
- Tests skipped: X
- Coverage: X%

**Output Snippet:**

[First 10 lines of test output]


**Timestamp:** YYYY-MM-DD HH:MM:SS
**Environment:** Node vX.X.X, OS, etc.

Template 2: Build Evidence

Use this template when building:

## Build Evidence

**Command:** `npm run build` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Artifacts Created:**
- dist/bundle.js (XXX KB)
- dist/styles.css (XXX KB)

**Errors:** X
**Warnings:** X

**Output Snippet:**

[First 10 lines of build output]


**Timestamp:** YYYY-MM-DD HH:MM:SS

Template 3: Code Quality Evidence

Use this template for linting and type checking:

## Code Quality Evidence

**Linter:** ESLint / Ruff / etc.
**Command:** `npm run lint`
**Exit Code:** 0 ✅ / non-zero ❌
**Errors:** X
**Warnings:** X

**Type Checker:** TypeScript / mypy / etc.
**Command:** `npm run typecheck`
**Exit Code:** 0 ✅ / non-zero ❌
**Type Errors:** X

**Timestamp:** YYYY-MM-DD HH:MM:SS

Template 4: Combined Evidence Report

Use this comprehensive template for task completion:

## Task Completion Evidence

### Task: [Task description]
### Agent: [Agent name]
### Completed: YYYY-MM-DD HH:MM:SS

### Verification Results

| Check | Command | Exit Code | Result |
|-------|---------|-----------|--------|
| Tests | `npm test` | 0 | ✅ 45 passed, 0 failed |
| Build | `npm run build` | 0 | ✅ Bundle created (234 KB) |
| Linter | `npm run lint` | 0 | ✅ No errors, 2 warnings |
| Types | `npm run typecheck` | 0 | ✅ No type errors |

### Coverage
- Statements: 87%
- Branches: 82%
- Functions: 90%
- Lines: 86%

### Evidence Files
- Test output: `.claude/quality-gates/evidence/tests-2025-XX-XX.log`
- Build output: `.claude/quality-gates/evidence/build-2025-XX-XX.log`

### Conclusion
All verification checks passed. Task ready for review.

Step-by-Step Workflows

Workflow 1: Code Implementation Verification

When: After writing code for a feature or bug fix

Steps:

  1. Save all files - Ensure changes are written

  2. Run tests

    npm test
    # or: pytest, cargo test, go test, etc.
    
    • Capture exit code
    • Note passed/failed counts
    • Record coverage if available
  3. Run build (if applicable)

    npm run build
    # or: cargo build, go build, etc.
    
    • Capture exit code
    • Note any errors/warnings
    • Verify artifacts created
  4. Run linter

    npm run lint
    # or: ruff check, cargo clippy, golangci-lint run
    
    • Capture exit code
    • Note errors/warnings
  5. Run type checker (if applicable)

    npm run typecheck
    # or: mypy, tsc --noEmit
    
    • Capture exit code
    • Note type errors
  6. Document evidence

    • Use Template 4 (Combined Evidence Report)
    • Add to shared context under quality_evidence
    • Reference in task completion message
  7. Mark task complete (only if all evidence passes)
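
Putting steps 2–6 together, a sketch of the full verification pass, assuming the collectEvidence helper from the Evidence Collection Protocol and npm scripts named as above:

// Run each check in order; stop and keep the task open on first failure.
const checks: [string, string, string[]][] = [
  ["tests", "npm", ["test"]],
  ["build", "npm", ["run", "build"]],
  ["lint", "npm", ["run", "lint"]],
  ["typecheck", "npm", ["run", "typecheck"]],
];

for (const [label, cmd, args] of checks) {
  const evidence = collectEvidence(label, cmd, args);
  if (evidence.exit_code !== 0) {
    // Step 7 gate: never mark complete while any check fails
    console.error(`${label} failed (exit ${evidence.exit_code}); task stays open.`);
    process.exit(1);
  }
}
console.log("All checks passed; document evidence and mark the task complete.");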

Workflow 2: Code Review Verification

When: Reviewing another agent's code or user's PR

Steps:

  1. Read the code changes

  2. Verify tests exist

    • Are there tests for new functionality?
    • Do tests cover edge cases?
    • Are existing tests updated?
  3. Run tests

    • Execute test suite
    • Verify exit code 0
    • Check coverage didn't decrease
  4. Check build

    • Ensure project still builds
    • No new build errors
  5. Verify code quality

    • Run linter
    • Run type checker
    • Check for security issues
  6. Document review evidence

    • Use Template 3 (Code Quality Evidence)
    • Note any issues found
    • Add to context
  7. Approve or request changes

    • Approve only if all evidence passes
    • If issues found, document them with evidence
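
Step 3's "coverage didn't decrease" check can also be made mechanical. A minimal sketch, assuming a baseline coverage figure stored from a previous run (the baseline path is hypothetical):

import { readFileSync } from "node:fs";

// Compare current statement coverage to the stored baseline percentage.
function coverageDidNotDecrease(currentPercent: number): boolean {
  const baseline = JSON.parse(
    readFileSync(".claude/quality-gates/coverage-baseline.json", "utf8")
  );
  return currentPercent >= baseline.coverage_percent;
}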

Workflow 3: Production Deployment Verification

When: Deploying to production or staging

Steps:

  1. Pre-deployment checks

    • All tests pass (exit code 0)
    • Build succeeds
    • No critical linter errors
    • Security scan passes
  2. Execute deployment

    • Run deployment command
    • Capture output
  3. Post-deployment checks

    • Health check endpoint responds
    • Application starts successfully
    • No immediate errors in logs
    • Smoke tests pass
  4. Document deployment evidence

    ## Deployment Evidence
    
    **Environment:** production
    **Timestamp:** YYYY-MM-DD HH:MM:SS
    **Version:** vX.X.X
    
    **Pre-Deployment:**
    - Tests: ✅ Exit 0
    - Build: ✅ Exit 0
    - Security: ✅ No critical issues
    
    **Deployment:**
    - Command: `kubectl apply -f deployment.yaml`
    - Exit Code: 0 ✅
    
    **Post-Deployment:**
    - Health Check: ✅ 200 OK
    - Smoke Tests: ✅ All passed
    - Error Rate: <0.1%
    
  5. Verify rollback capability

    • Ensure previous version can be restored
    • Document rollback procedure
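
The post-deployment health check in step 3 can be scripted. A sketch, assuming an HTTP health endpoint that returns 200 when the service is up (the URL is illustrative):

// Require HTTP 200 from the health endpoint within a 5-second timeout.
async function healthCheck(url: string): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(5000) });
    return res.status === 200;
  } catch {
    return false; // unreachable or timed out counts as a failed check
  }
}

const healthy = await healthCheck("https://example.com/health");
console.log(healthy ? "Health Check: ✅ 200 OK" : "Health Check: ❌ failed");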

Evidence Storage

Where to Store Evidence

Shared Context (Primary)

{
  "quality_evidence": {
    "tests_run": true,
    "test_exit_code": 0,
    "coverage_percent": 87,
    "build_success": true,
    "build_exit_code": 0,
    "linter_errors": 0,
    "linter_warnings": 2,
    "timestamp": "2025-11-02T10:30:00Z"
  }
}
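
How the shared context is persisted depends on the host system; as one possible sketch, assuming it lives in a JSON file (the .claude/context.json path is hypothetical):

import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Merge a quality_evidence record into the shared context file.
function storeEvidence(evidence: object): void {
  const path = ".claude/context.json";
  const context = existsSync(path)
    ? JSON.parse(readFileSync(path, "utf8"))
    : {};
  context.quality_evidence = evidence;
  writeFileSync(path, JSON.stringify(context, null, 2));
}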

Evidence Files (Secondary)

  • .claude/quality-gates/evidence/ directory
  • One file per verification run
  • Format: {type}-{timestamp}.log
  • Example: tests-2025-11-02-103000.log

Task Completion Messages

  • Include evidence summary
  • Link to detailed evidence files
  • Example: "Task complete. Tests passed (exit 0, 87% coverage), build succeeded."

Quality Standards

Minimum Acceptable

  • ✅ Tests executed with captured exit code
  • ✅ Timestamp recorded
  • ✅ Evidence stored in context

Production-Grade

  • ✅ Tests pass (exit code 0)
  • ✅ Coverage ≥70% (or project standard)
  • ✅ Build succeeds (exit code 0)
  • ✅ No critical linter errors
  • ✅ Type checker passes
  • ✅ Security scan shows no critical issues

Gold Standard

  • ✅ All production-grade requirements
  • ✅ Coverage ≥80%
  • ✅ No linter warnings
  • ✅ Performance benchmarks within thresholds
  • ✅ Accessibility audit passes (WCAG 2.1 AA)
  • ✅ Integration tests pass
  • ✅ Deployment verification complete


Common Pitfalls

❌ Don't Skip Evidence Collection

Bad:

"I've implemented the login feature. It should work correctly."

Good:

"I've implemented the login feature. Evidence:
- Tests: Exit code 0, 12 tests passed, 0 failed
- Build: Exit code 0, no errors
- Coverage: 89%
Task complete with verification."

❌ Don't Fake Evidence

Bad:

"Tests passed" (without actually running them)

Good:

"Tests passed. Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"

❌ Don't Ignore Failed Evidence

Bad:

"Build failed with exit code 1, but the code looks correct so marking complete."

Good:

"Build failed with exit code 1. Errors:
- TypeError: Cannot read property 'id' of undefined (line 42)
Fixing the error now before marking complete."

❌ Don't Collect Evidence Only Once

Bad:

"Tests passed yesterday, so the code is still good."

Good:

"Re-running tests after today's changes.
New evidence: Exit code 0, 45 tests passed, coverage 87%"

Integration with Other Systems

Context System Integration

Evidence is automatically tracked in shared context:

// Context structure includes:
interface SharedContext {
  quality_evidence?: {
    tests_run: boolean;
    test_exit_code?: number;
    coverage_percent?: number;
    build_success?: boolean;
    linter_errors?: number;
    timestamp: string;
  };
}

Quality Gates Integration

Evidence collection feeds into quality gates:

  • Quality gates check if evidence exists
  • Block task completion if evidence missing
  • Escalate if evidence shows failures

Squad Mode Integration

In parallel execution:

  • Each agent collects evidence independently
  • Studio Coach validates evidence before sync
  • Blocked tasks don't waste parallel cycles

Quick Reference

Evidence Collection Checklist

Before marking task complete:

- [ ] Tests executed
- [ ] Test exit code captured (0 = pass)
- [ ] Build executed (if applicable)
- [ ] Build exit code captured (0 = pass)
- [ ] Code quality checks run (linter, types)
- [ ] Evidence documented with timestamp
- [ ] Evidence added to shared context
- [ ] Evidence summary in completion message

Common Commands by Language/Framework

JavaScript/TypeScript:

npm test                 # Run tests
npm run build           # Build project
npm run lint            # Run ESLint
npm run typecheck       # Run TypeScript compiler

Python:

pytest                  # Run tests
pytest --cov           # Run tests with coverage
ruff check .           # Run linter
mypy .                 # Run type checker

Rust:

cargo test             # Run tests
cargo build            # Build project
cargo clippy           # Run linter

Go:

go test ./...          # Run tests
go build               # Build project
golangci-lint run      # Run linter

Examples

See /skills/evidence-verification/examples/ for:

  • Sample evidence reports
  • Real-world verification scenarios
  • Integration examples

Version History

v1.0.0 - Initial release

  • Core evidence collection templates
  • Verification workflows
  • Quality standards
  • Integration with context system

Remember: Evidence-first development prevents hallucinations, ensures production quality, and builds confidence. When in doubt, collect more evidence, not less.
