Back to list
WILLOSCAR

schema-normalizer

by WILLOSCAR

Research pipelines as semantic execution units: each skill declares inputs/outputs, acceptance criteria, and guardrails. Evidence-first methodology prevents hollow writing through structured intermediate artifacts.

83🍴 10📅 Jan 24, 2026

SKILL.md


name: schema-normalizer description: | Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins. Trigger: schema normalize, jsonl contract, interface drift, join drift, 字段不一致, schema 规范化. Use when: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing. Skip if: you are not using the survey pipelines, or the workspace already has a fresh PASS output/SCHEMA_NORMALIZATION_REPORT.md for the current artifacts. Network: none. Guardrail: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from outline/outline.yml.

Schema Normalizer (NO PROSE)

Purpose: close a common failure mode in skills-first pipelines: schema drift across JSONL artifacts.

When fields are inconsistent (missing ids/titles, mixed citation-key formats), downstream skills start doing best-effort joins and fragile parsing. This skill makes the interface explicit and deterministic.

Inputs

  • outline/outline.yml (source of truth for section/subsection ids + titles)
  • Optional (for citation-key sanity): citations/ref.bib
  • Default JSONL artifacts to normalize (arxiv-survey(-latex) C4 bridge):
    • outline/subsection_briefs.jsonl
    • outline/chapter_briefs.jsonl
    • outline/evidence_bindings.jsonl
    • outline/evidence_drafts.jsonl
    • outline/anchor_sheet.jsonl
  • Optional (run after writer packs are generated):
    • outline/writer_context_packs.jsonl

Outputs

  • output/SCHEMA_NORMALIZATION_REPORT.md (always written; PASS/FAIL + what changed)
  • The processed JSONL files are normalized in place (a .bak.* is created if changes are applied).

What gets normalized

1) IDs + titles (join keys)

For any record with sub_id: "<H2>.<H3>":

  • Ensure section_id exists (derived from the prefix before the dot)
  • Ensure title, section_title exist (filled from outline/outline.yml)

For any record with section_id: "<H2>":

  • Ensure section_title exists (filled from outline/outline.yml)

2) Citation key format (reduce parsing drift)

Within these C2-C4 JSONL artifacts, normalize citation keys so they are raw BibTeX keys (no @ prefix):

  • "citations": ["smith2023", "jones2024"]

Notes:

  • Final prose still uses Markdown citations: [@smith2023].
  • This skill does not add/remove citations; it only normalizes formatting.

When to run

Recommended placement in arxiv-survey(-latex):

  • Run after evidence-draft + anchor-sheet and before writer-context-pack + evidence-selfloop.
  • This ensures outline/evidence_drafts.jsonl and outline/anchor_sheet.jsonl are schema-stable before drafting packs are built.

Failure modes

  • If outline/outline.yml is missing or cannot be parsed, the skill FAILs.
  • If any target JSONL contains invalid JSON lines, the skill reports them and FAILs (do not proceed on corrupted artifacts).

Script (optional)

Quick Start

  • python .codex/skills/schema-normalizer/scripts/run.py --help
  • Normalize the C4 bridge artifacts:
    • python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws>

All Options

  • --workspace <dir>
  • --unit-id <U###>
  • --inputs <semicolon-separated>
  • --outputs <semicolon-separated>
  • --checkpoint <C#>

Examples

  • Normalize the default C4 artifacts (ids/titles + citations format):

    • python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs outline/outline.yml;citations/ref.bib;outline/subsection_briefs.jsonl;outline/chapter_briefs.jsonl;outline/evidence_bindings.jsonl;outline/evidence_drafts.jsonl;outline/anchor_sheet.jsonl --outputs output/SCHEMA_NORMALIZATION_REPORT.md
  • Normalize writer packs too (if you are running this after writer-context-pack):

    • python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs outline/outline.yml;citations/ref.bib;outline/writer_context_packs.jsonl --outputs output/SCHEMA_NORMALIZATION_REPORT.md

Score

Total Score

70/100

Based on repository quality metrics

SKILL.md

SKILL.mdファイルが含まれている

+20
LICENSE

ライセンスが設定されている

0/10
説明文

100文字以上の説明がある

+10
人気

GitHub Stars 100以上

0/15
最近の活動

1ヶ月以内に更新

+10
フォーク

10回以上フォークされている

+5
Issue管理

オープンIssueが50未満

+5
言語

プログラミング言語が設定されている

+5
タグ

1つ以上のタグが設定されている

+5

Reviews

💬

Reviews coming soon