taxonomy-builder

Name: taxonomy-builder
Rating: 70
Author: WILLOSCAR

by WILLOSCAR

Research pipelines as semantic execution units: each skill declares inputs/outputs, acceptance criteria, and guardrails. Evidence-first methodology prevents hollow writing through structured intermediate artifacts.

⭐ 83🍴 10📅 Jan 24, 2026

claude claude-code codex gpt pipeline research research-paper research-project

View on GitHub Run in Manus

SKILL.md

name: taxonomy-builder description: | Build a 2+ level taxonomy (`outline/taxonomy.yml`) from a core paper set and scope constraints, with short descriptions per node. Trigger: taxonomy, taxonomy builder, 分类, 主题树, taxonomy.yml. Use when: survey/snapshot 的结构阶段（NO PROSE），已有 `papers/core_set.csv`，需要生成可映射且读者友好的主题结构。 Skip if: 已经有批准过且可映射的 taxonomy（不要无意义重构）。 Network: none. Guardrail: 避免泛化占位桶；保持 2+ 层且每节点有具体描述。

Taxonomy Builder

Turn a core paper set into a 2+ level, mappable taxonomy that will drive the outline and paper-to-section mapping.

This is structure, not writing: avoid prose paragraphs and avoid “generic placeholder” buckets.

Role cards (prompt-level guidance)

Taxonomy Architect
- Mission: create a taxonomy that reads like a survey’s core chapters (few, thick buckets).
- Do: choose 3–4 top-level chapters by reader questions and decision-relevant axes.
- Avoid: keyword-only clusters, “Misc/Other”, and too many top-level buckets that would bloat the final ToC.
Mapping Sponsor
- Mission: keep the taxonomy mappable to real papers.
- Do: ensure each leaf can plausibly map to multiple papers (ideally ≥3); keep node names discriminative.
- Avoid: overlapping buckets whose boundaries are not explainable.
Scope Guardian
- Mission: encode what counts as in-scope at the taxonomy level.
- Do: bake boundary cues into descriptions (what belongs here, what does not).
- Avoid: relying on later prose to resolve scope drift.

When to use

You have a papers/core_set.csv and need a stable structure for a survey/snapshot.
You want categories that are meaningful to readers (not just keyword clusters).

When not to use

You already have an approved taxonomy that maps well to your target narrative (don’t churn it).

Inputs

papers/core_set.csv (required)
Optional: papers/papers_dedup.jsonl (to peek at abstracts/metadata)
Optional: DECISIONS.md (scope constraints)

Output

outline/taxonomy.yml

Workflow (heuristic)

Uses: papers/papers_dedup.jsonl, DECISIONS.md.

Skim the core set and cluster by reader-relevant axes, not by surface keywords.
- For LLM agents, common axes: control loop/architecture, tool use, planning & reasoning, memory/RAG, multi-agent coordination, evaluation/benchmarks, safety/security, applications.
Choose top-level nodes that feel like “chapters in a survey”, and keep a paper-like section budget:
- If you want a paper-like PDF with ~6–8 H2 sections total (see ref/agent-surveys/STYLE_REPORT.md), remember the pipeline also adds fixed H2 sections (Introduction / Related Work / Discussion / Conclusion).
- In that case, aim for ~3–4 taxonomy-driven chapters (top-level nodes), not 8–12 tiny buckets.
- Deep surveys can go wider (e.g., 5–6 taxonomy chapters), but expect thinner writing unless you also expand evidence and writing budgets.
For each top-level node, create 2–6 subtopics with clear inclusion cues (what belongs here, what doesn’t).
Write a short description for every node:
- define what the bucket covers
- name 2–5 representative paper IDs (or recognizable lines of work) that belong here
Sanity check:
- leaves aren’t too tiny (ideally ≥3 papers per leaf)
- names are mutually exclusive enough (some overlap is OK, confusion is not)

Quality checklist

outline/taxonomy.yml has ≥2 levels.
Every node has a description with concrete meaning (not “Papers and ideas centered on …” boilerplate).
Leaf nodes look mappable (not overly broad like “Misc/Other”).
Top-level nodes feel like chapters (avoid too many tiny buckets if you target a paper-like 6–8 H2 structure).

Common failure modes (and fixes)

Generic buckets (“Overview/Benchmarks/Open Problems”) → rename to content-based subtopics.
Keyword clustering → reframe as design/evaluation questions a reader would ask.
Too much overlap → tighten inclusion cues; split a bucket by mechanism vs evaluation vs safety.
Too many top-level buckets → merge into fewer, thicker chapters; push fine-grained points into subsection bullets/axes instead of new H2 sections.

Helper script (optional)

Quick Start

python .codex/skills/taxonomy-builder/scripts/run.py --help
python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

All Options

--top-k <n>: number of candidate terms to consider
--min-freq <n>: minimum frequency threshold

Examples

Generate a baseline taxonomy (then optionally refine):
- python .codex/skills/taxonomy-builder/scripts/run.py --workspace <ws> --top-k 100 --min-freq 2

Notes

The script generates a baseline 2-level taxonomy (topic-aware) and never overwrites non-placeholder work.
In pipeline.py --strict it will be blocked only if placeholder markers (TODO/TBD/FIXME/(placeholder)) remain.

Refinement marker (recommended; completion signal)

When you are satisfied with the taxonomy (and after C2 approval if applicable), create:

outline/taxonomy.refined.ok

This is an explicit "I reviewed/refined this" signal:

makes it harder for a scaffold-y taxonomy to silently pass in strict runs
documents that buckets were edited into reader-meaningful, mappable nodes

Troubleshooting

Common Issues

Issue: Quality gate blocks `taxonomy_scaffold`

Symptom:

output/QUALITY_GATE.md reports taxonomy contains TODO/placeholder text.

Causes:

Helper script generated a scaffold, but taxonomy was not rewritten.

Solutions:

Rewrite every node name + description to be domain-meaningful.
Ensure ≥2 levels via children.
Remove generic buckets like “Overview/Benchmarks/Open Problems”.

Issue: Taxonomy has no depth (`children` missing)

Symptom:

Quality gate reports “needs ≥2 levels”.

Causes:

Only top-level nodes were created.

Solutions:

Add 2–6 child nodes per top-level node, each with clear inclusion cues.

Recovery Checklist

outline/taxonomy.yml is valid YAML list.
At least one node has a non-empty children list.
No TODO/(placeholder) remains.

Score

Total Score

70/100

Based on repository quality metrics

✓SKILL.md

SKILL.mdファイルが含まれている

+20

○LICENSE

ライセンスが設定されている

0/10

✓説明文

100文字以上の説明がある

+10

○人気

GitHub Stars 100以上

0/15

✓最近の活動

1ヶ月以内に更新

+10

✓フォーク

10回以上フォークされている

✓Issue管理

オープンIssueが50未満

✓言語

プログラミング言語が設定されている

✓タグ

1つ以上のタグが設定されている

Reviews

💬

Reviews coming soon

taxonomy-builder

SKILL.md

Taxonomy Builder

Role cards (prompt-level guidance)

When to use

When not to use

Inputs

Output

Workflow (heuristic)

Quality checklist

Common failure modes (and fixes)

Helper script (optional)

Quick Start

All Options

Examples

Notes

Refinement marker (recommended; completion signal)

Troubleshooting

Common Issues

Issue: Quality gate blocks `taxonomy_scaffold`

Issue: Taxonomy has no depth (`children` missing)

Recovery Checklist

Score

Reviews

prompt-lookup

skill-lookup

changelog-automation

web-component-design

dbt-transformation-patterns

market-sizing-analysis

taxonomy-builder

SKILL.md

Taxonomy Builder

Role cards (prompt-level guidance)

When to use

When not to use

Inputs

Output

Workflow (heuristic)

Quality checklist

Common failure modes (and fixes)

Helper script (optional)

Quick Start

All Options

Examples

Notes

Refinement marker (recommended; completion signal)

Troubleshooting

Common Issues

Issue: Quality gate blocks taxonomy_scaffold

Issue: Taxonomy has no depth (children missing)

Recovery Checklist

Score

Reviews

Related

Related Skills

prompt-lookup

skill-lookup

changelog-automation

web-component-design

dbt-transformation-patterns

market-sizing-analysis

Issue: Quality gate blocks `taxonomy_scaffold`

Issue: Taxonomy has no depth (`children` missing)