schema-normalizer

Name: schema-normalizer
Rating: 70
Author: WILLOSCAR

Research pipelines as semantic execution units: each skill declares inputs/outputs, acceptance criteria, and guardrails. Evidence-first methodology prevents hollow writing through structured intermediate artifacts.

⭐ 83 🍴 10 📅 Jan 24, 2026

claude claude-code codex gpt pipeline research research-paper research-project

SKILL.md

name: schema-normalizer
description: |
  Normalize cross-skill JSONL interfaces (ids + titles + citation key formats) so downstream skills do not rely on best-effort joins.
  Trigger: schema normalize, jsonl contract, interface drift, join drift, 字段不一致, schema 规范化.
  Use when: you have generated C2-C4 JSONL artifacts (outline/briefs/bindings/packs/anchors) and want deterministic, stable fields before self-loops/writing.
  Skip if: you are not using the survey pipelines, or the workspace already has a fresh PASS `output/SCHEMA_NORMALIZATION_REPORT.md` for the current artifacts.
  Network: none.
  Guardrail: NO PROSE; deterministic transforms only; do not invent evidence/claims; only fill missing ids/titles from `outline/outline.yml`.

Schema Normalizer (NO PROSE)

Purpose: close a common failure mode in skills-first pipelines, namely schema drift across JSONL artifacts.

When fields are inconsistent (missing ids/titles, mixed citation-key formats), downstream skills fall back to best-effort joins and fragile parsing. This skill makes the interface explicit and deterministic.

Inputs

outline/outline.yml (source of truth for section/subsection ids + titles)
Optional (for citation-key sanity): citations/ref.bib
Default JSONL artifacts to normalize (arxiv-survey(-latex) C4 bridge):
- outline/subsection_briefs.jsonl
- outline/chapter_briefs.jsonl
- outline/evidence_bindings.jsonl
- outline/evidence_drafts.jsonl
- outline/anchor_sheet.jsonl
Optional (run after writer packs are generated):
- outline/writer_context_packs.jsonl

Outputs

output/SCHEMA_NORMALIZATION_REPORT.md (always written; PASS/FAIL + what changed)
The processed JSONL files are normalized in place (a .bak.* backup is created if changes are applied).
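
The in-place behaviour described above could look roughly like the sketch below. It is illustrative only, not the skill's actual code: the helper name `write_with_backup` and the caller-supplied `backup_suffix` are hypothetical, since the document only says that a `.bak.*` file is created when changes are applied.

```python
import shutil
from pathlib import Path


def write_with_backup(path, new_text, backup_suffix):
    """Overwrite path with new_text, copying the old file aside first.

    Returns the backup Path, or None when the content is unchanged.
    backup_suffix is a hypothetical, caller-supplied suffix such as
    ".bak.1"; the real naming scheme is not specified in this document.
    """
    path = Path(path)
    old_text = path.read_text(encoding="utf-8")
    if new_text == old_text:
        return None  # no changes applied, so no backup is created
    backup = path.with_name(path.name + backup_suffix)
    shutil.copy2(path, backup)  # keep the pre-normalization copy
    path.write_text(new_text, encoding="utf-8")
    return backup
```

Skipping the backup when nothing changed keeps repeated runs from piling up redundant copies.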

What gets normalized

1) IDs + titles (join keys)

For any record with sub_id: "<H2>.<H3>":

Ensure section_id exists (derived from the prefix before the dot)
Ensure title, section_title exist (filled from outline/outline.yml)

For any record with section_id: "<H2>":

Ensure section_title exists (filled from outline/outline.yml)
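
The join-key rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: the function name `fill_join_keys` and the two lookup dicts are hypothetical, and the outline is assumed to have already been parsed into id-to-title mappings.

```python
def fill_join_keys(record, section_titles, subsection_titles):
    """Return a copy of record with missing ids/titles filled in.

    section_titles and subsection_titles are hypothetical lookups
    (id -> title) assumed to be built from outline/outline.yml.
    Existing non-empty values are never overwritten.
    """
    out = dict(record)
    sub_id = out.get("sub_id")
    if sub_id:
        # section_id is the prefix before the dot, e.g. "3.2" -> "3"
        if not out.get("section_id"):
            out["section_id"] = str(sub_id).split(".", 1)[0]
        if not out.get("title") and sub_id in subsection_titles:
            out["title"] = subsection_titles[sub_id]
    section_id = out.get("section_id")
    if section_id and not out.get("section_title"):
        if section_id in section_titles:
            out["section_title"] = section_titles[section_id]
    return out
```

For example, calling it on `{"sub_id": "3.2"}` with matching outline entries would add `section_id`, `title`, and `section_title` while leaving `sub_id` untouched.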

2) Citation key format (reduce parsing drift)

Within these C2-C4 JSONL artifacts, normalize citation keys so they are raw BibTeX keys (no @ prefix):

"citations": ["smith2023", "jones2024"]

Notes:

Final prose still uses Markdown citations: [@smith2023].
This skill does not add/remove citations; it only normalizes formatting.
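
A minimal sketch of the citation-key rule, assuming the only decoration to strip is a leading `@` (the function name `normalize_citation_keys` is hypothetical, and the real script may handle more cases):

```python
def normalize_citation_keys(keys):
    """Return raw BibTeX keys with any leading "@" removed.

    The list length and order are preserved: keys are reformatted,
    never added or dropped.
    """
    return [str(key).strip().lstrip("@") for key in keys]
```

Because the transform is a one-to-one mapping over the list, it cannot add or remove citations, which matches the note above.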

When to run

Recommended placement in arxiv-survey(-latex):

Run after evidence-draft + anchor-sheet and before writer-context-pack + evidence-selfloop.
This ensures outline/evidence_drafts.jsonl and outline/anchor_sheet.jsonl are schema-stable before drafting packs are built.

Failure modes

If outline/outline.yml is missing or cannot be parsed, the skill FAILs.
If any target JSONL contains invalid JSON lines, the skill reports them and FAILs (do not proceed on corrupted artifacts).
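
The invalid-line check could be sketched like this. It is an assumption-laden illustration rather than the script's actual logic: the name `find_invalid_jsonl_lines` is hypothetical, and treating blank lines as harmless is a choice made here, not something the document states.

```python
import json


def find_invalid_jsonl_lines(text):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored, not errors
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, exc.msg))
    return problems
```

A caller would report every entry in the returned list and FAIL if the list is non-empty, rather than normalizing a partially corrupted file.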

Script (optional)

Quick Start

python .codex/skills/schema-normalizer/scripts/run.py --help
Normalize the C4 bridge artifacts:
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws>

All Options

--workspace <dir>
--unit-id <U###>
--inputs <semicolon-separated>
--outputs <semicolon-separated>
--checkpoint <C#>

Examples

Normalize the default C4 artifacts (ids/titles + citations format):
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs "outline/outline.yml;citations/ref.bib;outline/subsection_briefs.jsonl;outline/chapter_briefs.jsonl;outline/evidence_bindings.jsonl;outline/evidence_drafts.jsonl;outline/anchor_sheet.jsonl" --outputs output/SCHEMA_NORMALIZATION_REPORT.md
Normalize writer packs too (if you are running this after writer-context-pack). Quote the semicolon-separated list so the shell does not treat each `;` as a command separator:
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs "outline/outline.yml;citations/ref.bib;outline/writer_context_packs.jsonl" --outputs output/SCHEMA_NORMALIZATION_REPORT.md

Score

Total Score

70/100

Based on repository quality metrics

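✓ SKILL.md

Includes a SKILL.md file

+20

○ LICENSE

A license is set

0/10

✓ Description

Description is at least 100 characters long

+10

○ Popularity

100 or more GitHub stars

0/15

✓ Recent activity

Updated within the last month

+10

✓ Forks

Forked 10 or more times

✓ Issue management

Fewer than 50 open issues

✓ Language

A programming language is set

✓ Tags

At least one tag is set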
