
evidence-selfloop
by WILLOSCAR
Research pipelines as semantic execution units: each skill declares inputs/outputs, acceptance criteria, and guardrails. Evidence-first methodology prevents hollow writing through structured intermediate artifacts.
SKILL.md
name: evidence-selfloop
description: |
Evidence self-loop for surveys: read evidence bindings + evidence packs, then write an actionable upstream TODO plan (which stage/skill to fix) before writing more prose.
Writes output/EVIDENCE_SELFLOOP_TODO.md.
Trigger: evidence self-loop, evidence loop, evidence gaps, binding gaps, blocking_missing, 证据自循环, 证据缺口回路.
Use when: C4 outputs exist (outline/evidence_bindings.jsonl, outline/evidence_drafts.jsonl) but writing looks hollow or C5 is BLOCKED due to thin evidence.
Skip if: you are still pre-C3 (no notes/evidence bank yet), or you want to draft anyway and accept a lower evidence bar.
Network: none.
Guardrail: analysis-only; do not edit evidence/writing artifacts; do not invent facts/citations; only write the TODO report.
Evidence Self-loop (C3/C4 fix → rebind → redraft)
Purpose: make the evidence-first pipeline converge without writing filler prose.
This skill reads the intermediate evidence artifacts (briefs/bindings/packs) and produces an actionable TODO list that answers:
- Which subsections are under-supported?
- Is the problem in mapping/coverage (C2), evidence extraction (C3), or binding/planning (C4)?
- Which skill(s) should be rerun, in what order, to unblock high-quality writing?
Inputs
- outline/subsection_briefs.jsonl
- outline/evidence_bindings.jsonl (expects binding_gaps / binding_rationale if available)
- outline/evidence_drafts.jsonl (expects blocking_missing, comparisons, eval protocol, limitations)
- Optional (improves routing): outline/evidence_binding_report.md, outline/anchor_sheet.jsonl, papers/paper_notes.jsonl, papers/fulltext_index.jsonl, queries.md
Outputs
- output/EVIDENCE_SELFLOOP_TODO.md (report-class; always written)
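As a rough sketch (not part of the shipped script), the input/output contract above can be checked up front; the required and optional paths are copied from the lists above, and the helper name is hypothetical.

```python
from pathlib import Path

REQUIRED = [
    "outline/subsection_briefs.jsonl",
    "outline/evidence_bindings.jsonl",
    "outline/evidence_drafts.jsonl",
]
OPTIONAL = [
    "outline/evidence_binding_report.md",
    "outline/anchor_sheet.jsonl",
    "papers/paper_notes.jsonl",
    "papers/fulltext_index.jsonl",
    "queries.md",
]

def resolve_inputs(workspace: str) -> dict:
    """Report which required/optional artifacts exist under the workspace (hypothetical helper)."""
    ws = Path(workspace)
    return {
        "missing_required": [p for p in REQUIRED if not (ws / p).exists()],
        "available_optional": [p for p in OPTIONAL if (ws / p).exists()],
        "report_path": ws / "output" / "EVIDENCE_SELFLOOP_TODO.md",
    }
```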
Self-loop contract (what “fixing evidence” means)
- Prefer fixing upstream evidence, not writing around gaps.
- If an evidence pack has blocking_missing, treat it as a STOP signal: strengthen notes/fulltext/mapping, then regenerate packs.
- If bindings show binding_gaps, treat it as a ROUTING signal: either enrich the evidence bank for the mapped papers, expand mapping coverage, or adjust required_evidence_fields if unrealistic.
Recommended rerun chain (minimal):
- If C3 evidence is thin: pdf-text-extractor → paper-notes → evidence-binder → evidence-draft → anchor-sheet → writer-context-pack
- If C2 coverage is weak: section-mapper → outline-refiner → (then rerun C3/C4 evidence skills)
Workflow (analysis-only)
- Read queries.md (if present)
  - Use it only as a soft config hint (evidence_mode / draft_profile); do not override the artifact contract.
- Read outline/subsection_briefs.jsonl
  - For each sub_id, capture axes + required_evidence_fields (what evidence types this subsection expects).
- Read outline/evidence_bindings.jsonl
  - For each sub_id, surface binding_rationale and binding_gaps (what the binder could or could not cover from the evidence bank).
- (Optional) Read outline/evidence_binding_report.md
  - Use it as a human-readable summary; treat it as a view of outline/evidence_bindings.jsonl, not a separate truth source.
- Read outline/evidence_drafts.jsonl
  - Surface blocking_missing (STOP signals), and check for missing comparisons / eval protocol / limitations that would force hollow writing.
- (Optional) Read outline/anchor_sheet.jsonl
  - Check whether each subsection has at least a few citation-backed anchors (numbers / evaluation / limitations).
- (Optional) Read papers/paper_notes.jsonl and papers/fulltext_index.jsonl
  - Use these to route fixes: if evidence is abstract-only and missing eval tokens, prefer enriching notes/fulltext before drafting prose.
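A minimal sketch of this reading pass, assuming each JSONL artifact carries one record per line with a sub_id plus the fields named above; the helper names here are hypothetical and not part of the shipped script.

```python
import json
from pathlib import Path

def read_jsonl(path: Path) -> list[dict]:
    """Parse one JSON record per line; a missing file is treated as empty."""
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

def collect_signals(ws: Path) -> dict[str, dict]:
    """Gather per-subsection evidence signals from the C3/C4 artifacts."""
    signals: dict[str, dict] = {}
    for brief in read_jsonl(ws / "outline/subsection_briefs.jsonl"):
        signals[brief["sub_id"]] = {
            "required_evidence_fields": brief.get("required_evidence_fields", []),
            "binding_gaps": [],
            "blocking_missing": [],
        }
    for binding in read_jsonl(ws / "outline/evidence_bindings.jsonl"):
        entry = signals.setdefault(binding["sub_id"], {"binding_gaps": [], "blocking_missing": []})
        entry["binding_gaps"] = binding.get("binding_gaps", [])
    for pack in read_jsonl(ws / "outline/evidence_drafts.jsonl"):
        entry = signals.setdefault(pack["sub_id"], {"binding_gaps": [], "blocking_missing": []})
        entry["blocking_missing"] = pack.get("blocking_missing", [])
    return signals
```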
What the report contains
- Summary counts: subsections with blocking_missing, with binding_gaps, and common failure reasons.
- Per-subsection TODO: the smallest upstream fix path (skills + artifacts) to make the subsection writeable.
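For illustration only (the shipped report layout may differ), the summary counts and per-subsection TODO entries could be rendered from the collected signals roughly like this:

```python
from pathlib import Path

def write_todo_report(ws: Path, signals: dict[str, dict]) -> None:
    """Render summary counts plus a per-subsection fix list (illustrative format only)."""
    blocked = [s for s, v in signals.items() if v.get("blocking_missing")]
    gapped = [s for s, v in signals.items() if v.get("binding_gaps")]
    lines = [
        "# Evidence Self-loop TODO",
        f"- Subsections with blocking_missing: {len(blocked)}",
        f"- Subsections with binding_gaps: {len(gapped)}",
        "",
    ]
    for sub_id, v in signals.items():
        if not v.get("blocking_missing") and not v.get("binding_gaps"):
            continue
        lines.append(f"## {sub_id}")
        for reason in v.get("blocking_missing", []):
            lines.append(f"- STOP: {reason} -> fix C3 notes/fulltext, then regenerate packs")
        for gap in v.get("binding_gaps", []):
            lines.append(f"- ROUTE: {gap} -> enrich evidence bank or expand mapping, then rerun evidence-binder")
    out = ws / "output" / "EVIDENCE_SELFLOOP_TODO.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + "\n")
```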
Status semantics (unblock rules)
This skill is the prewrite router for evidence quality. Treat its Status: line as the unblock contract:
- PASS: no blocking_missing and no binding_gaps -> proceed to C5 writing (but still scan non-blocking writability smells: low comparisons/eval/anchors often predict hollow prose).
- OK: no blocking_missing, but some binding_gaps -> you may draft, but expect weaker specificity; prefer fixing gaps first.
- FAIL: missing inputs OR any blocking_missing -> do not write filler prose; fix upstream and rerun C3/C4.
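A minimal sketch of these unblock rules over the same per-subsection signals; the exact Status: line formatting is an assumption.

```python
def compute_status(signals: dict[str, dict], missing_inputs: list[str]) -> str:
    """Apply the PASS / OK / FAIL unblock contract described above."""
    any_blocking = any(v.get("blocking_missing") for v in signals.values())
    any_gaps = any(v.get("binding_gaps") for v in signals.values())
    if missing_inputs or any_blocking:
        return "FAIL"   # do not write filler prose; fix upstream and rerun C3/C4
    if any_gaps:
        return "OK"     # drafting is possible, but expect weaker specificity
    return "PASS"       # proceed to C5 writing; still scan writability smells
```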
Routing matrix (symptom -> root cause -> upstream fix)
Use this as a semantic routing table (not a script checklist). The goal is to fix the earliest broken intermediate artifact.
| Symptom (where you see it) | Likely root cause | Inspect first | Smallest upstream fix chain |
|---|---|---|---|
| evidence_drafts.blocking_missing: no usable citation keys | mapped papers lack bibkey / bibkeys not in citations/ref.bib | papers/paper_notes.jsonl (bibkey fields), citations/ref.bib | C3 paper-notes (ensure bibkeys) -> C4 citation-verifier -> rerun evidence-binder -> rerun evidence-draft |
| blocking_missing: title-only evidence | retrieval/metadata lacks abstracts (or aggressive filtering) | papers/papers_raw.jsonl abstracts, papers/paper_notes.jsonl evidence_level | C1 literature-engineer (enrich metadata) OR C3 pdf-text-extractor (fulltext) -> rerun paper-notes |
| blocking_missing: no evidence snippets extractable | notes are too thin / evidence bank empty for mapped papers | papers/evidence_bank.jsonl (counts), papers/paper_notes.jsonl | C3 paper-notes (richer extraction; prefer fulltext when possible) -> rerun C4 packs |
| blocking_missing: no concrete evaluation tokens | notes/bank did not extract benchmarks/metrics/budgets | papers/paper_notes.jsonl (metrics/benchmarks fields), outline/anchor_sheet.jsonl | C3 paper-notes (extract eval anchors) -> rerun anchor-sheet + evidence-draft |
| evidence pack comparisons are sparse (signals: comparisons low) | clusters are not contrastable OR mapping coverage too weak | outline/subsection_briefs.jsonl (clusters), outline/mapping.tsv | C2 section-mapper (coverage) OR C3 subsection-briefs (better clusters) -> rerun evidence-draft |
| bindings.binding_gaps mentions benchmarks/metrics/protocol | binder cannot find evaluation-tagged evidence for this subsection | outline/evidence_binding_report.md (tag mix), papers/evidence_bank.jsonl tags | C3 paper-notes (tag/evidence extraction) OR C2 expand mapping for that subsection -> rerun evidence-binder |
| binding_gaps mentions security/threat model/attacks | mapped set lacks security-focused works or notes lack threat-model detail | outline/mapping.tsv, papers/paper_notes.jsonl | C2 expand mapping (+ C1 queries if needed) OR C3 enrich notes -> rerun binder/packs |
| binding report looks mechanically uniform across H3 (same mix, low tag variance) | binder selection too recipe-like OR evidence bank tags too coarse | outline/evidence_binding_report.md (tag mix), evidence bank tags | tighten required_evidence_fields + improve evidence bank tags, then rerun binder; avoid writing around non-specific bindings |
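If you want the matrix in code form, a hypothetical keyword router (a simplification of the table above, not the skill's actual logic) could look like this:

```python
# Hypothetical routing sketch: map a blocking_missing / binding_gaps message
# to a smallest upstream fix chain drawn from the routing matrix above.
ROUTES = [
    ("no usable citation keys", ["paper-notes (ensure bibkeys)", "citation-verifier", "evidence-binder", "evidence-draft"]),
    ("title-only evidence", ["literature-engineer OR pdf-text-extractor", "paper-notes"]),
    ("no evidence snippets", ["paper-notes (richer extraction)", "evidence-binder", "evidence-draft"]),
    ("no concrete evaluation tokens", ["paper-notes (extract eval anchors)", "anchor-sheet", "evidence-draft"]),
    ("benchmarks", ["paper-notes OR section-mapper", "evidence-binder"]),
    ("threat model", ["section-mapper OR paper-notes", "evidence-binder", "evidence-draft"]),
]

def route(symptom: str) -> list[str]:
    """Return the first matching fix chain; fall back to rerunning the C3/C4 evidence skills."""
    text = symptom.lower()
    for keyword, chain in ROUTES:
        if keyword in text:
            return chain
    return ["paper-notes", "evidence-binder", "evidence-draft"]
```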
Interface with the writer self-loop (avoid writing around evidence)
- If writer-selfloop is FAIL due to missing anchors/comparisons and the corresponding writer pack has pack_warnings, stop and run this evidence self-loop: the section is telling you the pack is not writeable.
- Prefer fixing evidence gaps once, upstream, rather than patching every H3 with generic filler.
What this skill does NOT do
- It does not edit papers/*, outline/*, or sections/*.
- It does not invent new facts/citations.
- It does not "relax" quality by changing thresholds; it routes you to the earliest artifact to fix.
Script
Quick Start
python .codex/skills/evidence-selfloop/scripts/run.py --workspace workspaces/<ws>
All Options
- --workspace <dir>
- --unit-id <U###> (optional)
- --inputs <semicolon-separated> (optional override)
- --outputs <semicolon-separated> (optional override; default writes output/EVIDENCE_SELFLOOP_TODO.md)
- --checkpoint <C#> (optional)
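For orientation only, the documented option surface could be wired roughly as below; the argument handling in the shipped scripts/run.py may differ.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI surface (not the shipped parser)."""
    parser = argparse.ArgumentParser(prog="evidence-selfloop")
    parser.add_argument("--workspace", required=True, help="workspace directory, e.g. workspaces/<ws>")
    parser.add_argument("--unit-id", help="optional unit id (U###)")
    parser.add_argument("--inputs", help="optional semicolon-separated input override")
    parser.add_argument("--outputs", default="output/EVIDENCE_SELFLOOP_TODO.md",
                        help="optional semicolon-separated output override")
    parser.add_argument("--checkpoint", help="optional checkpoint id (C#)")
    return parser
```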
Examples
- Generate an evidence TODO list after C4 packs are generated:
python .codex/skills/evidence-selfloop/scripts/run.py --workspace workspaces/<ws>