agent-survey-corpus

Name: agent-survey-corpus
Rating: 70
Author: WILLOSCAR

Research pipelines as semantic execution units: each skill declares inputs/outputs, acceptance criteria, and guardrails. Evidence-first methodology prevents hollow writing through structured intermediate artifacts.

⭐ 83 · 🍴 10 · 📅 Jan 24, 2026

claude claude-code codex gpt pipeline research research-paper research-project

SKILL.md

name: agent-survey-corpus
description: |
  Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style learning.
  Trigger: agent survey corpus, ref corpus, download surveys, 学习综述写法 (learn survey writing style), 下载 survey (download surveys).
  Use when: you want to study how real agent surveys structure sections (6–8 H2), size subsections, and write evidence-backed comparisons.
  Skip if: you cannot download PDFs (no network) or you don't want local PDF files.
  Network: required.
  Guardrail: only download arXiv PDFs; store under `ref/` and keep large files out of git.

Agent Survey Corpus (arXiv PDFs → text extracts)

Goal: create a small, local reference library so you can learn from real agent surveys when refining:

- C2 outline structure (paper-like sectioning)
- C4 tables/claims organization
- C5 writing style and density

This is intentionally not part of the pipeline; it is an optional, repo-level toolkit.

Inputs

ref/agent-surveys/arxiv_ids.txt
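
The ids file is plain text with one arXiv id per line. A minimal sketch of reading it and deriving PDF URLs is shown here. It assumes blank lines, duplicates, and `#` comments should be skipped (the source only specifies one id per line) and uses the public `https://arxiv.org/pdf/<id>` URL pattern. `read_arxiv_ids` and `pdf_url` are hypothetical helper names, not functions taken from run.py, and the ids in the docstrings are placeholders.

```python
from pathlib import Path


def read_arxiv_ids(path):
    """Return arXiv ids from a text file, one id per line, in file order."""
    ids = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        # Assumption: '#' starts a comment; the source does not specify this.
        line = raw.split("#", 1)[0].strip()
        if line and line not in ids:  # skip blank lines and duplicates
            ids.append(line)
    return ids


def pdf_url(arxiv_id):
    """Build the arXiv PDF URL for an id such as '2401.00001' (placeholder)."""
    return f"https://arxiv.org/pdf/{arxiv_id}"
```

Keeping the file order makes reruns predictable, which helps when a download fails partway through the list.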

Outputs

ref/agent-surveys/pdfs/
ref/agent-surveys/text/
ref/agent-surveys/STYLE_REPORT.md (tracked; auto-generated summary)

Workflow

1. Edit ref/agent-surveys/arxiv_ids.txt (one arXiv id per line).
2. Run the downloader to fetch PDFs and extract the first N pages to text.
3. Skim the extracted text under ref/agent-surveys/text/:
   - Look at section counts (H2), subsection granularity (H3), and how the surveys transition between chapters.
   - Identify repeated rhetorical patterns you want the pipeline writer to imitate.
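
The skimming step can be partly automated. The sketch below is a rough heuristic, not part of the skill's script: it assumes headings survive text extraction as their own lines in the form `3 Title` or `3.2 Title`, which is common but not guaranteed for PDF text. `count_headings` is a hypothetical helper name.

```python
import re

# Assumption: a heading is a short line such as "3 Title" or "3.2 Title".
HEADING = re.compile(r"^(\d{1,2})(?:\.(\d{1,2}))?\.?\s+[A-Z][^\n]{0,80}$")


def count_headings(text):
    """Return (section_count, subsection_count) found in extracted text."""
    sections, subsections = set(), set()
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if not match:
            continue
        if match.group(2) is None:
            sections.add(match.group(1))  # top-level, roughly an H2
        else:
            subsections.add((match.group(1), match.group(2)))  # roughly an H3
    return len(sections), len(subsections)
```

Body sentences that happen to start with a number and a capitalized word will be miscounted, so treat the result as a starting point for manual skimming rather than a measurement.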

Script

Quick Start

python .codex/skills/agent-survey-corpus/scripts/run.py --help
python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20

All Options

--workspace <dir> (use . to write into repo root)
--inputs <semicolon-separated> (default: ref/agent-surveys/arxiv_ids.txt)
--max-pages <N> (default: 20)
--sleep <seconds> (default: 1.0)
--overwrite (re-download + re-extract)
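
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that mirrors the documented options and defaults. It is an illustration only and not the actual contents of run.py. Treating `--workspace` as required is an assumption, and `build_parser` and `split_inputs` are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Download arXiv survey PDFs and extract text."
    )
    # Assumption: the real script may provide a default for --workspace.
    parser.add_argument("--workspace", required=True,
                        help="workspace root; use . for the repo root")
    parser.add_argument("--inputs", default="ref/agent-surveys/arxiv_ids.txt",
                        help="semicolon-separated list of id files")
    parser.add_argument("--max-pages", type=int, default=20,
                        help="number of leading pages to extract per PDF")
    parser.add_argument("--sleep", type=float, default=1.0,
                        help="seconds to wait between downloads")
    parser.add_argument("--overwrite", action="store_true",
                        help="re-download and re-extract existing files")
    return parser


def split_inputs(value):
    """Split the semicolon-separated --inputs value into clean paths."""
    return [part.strip() for part in value.split(";") if part.strip()]
```

With this sketch, `build_parser().parse_args(["--workspace", "."])` yields the documented defaults for every other option.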

Examples

Download/extract into repo root ref/:
- python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20
Download/extract into a specific folder (treated as workspace root):
- python .codex/skills/agent-survey-corpus/scripts/run.py --workspace /tmp/surveys --max-pages 30

Troubleshooting

Download fails / timeout: rerun with a larger --sleep, or try fewer ids.
Text extract is empty: the PDF may be scanned; try another survey or increase --max-pages.
Files showing up in git status: PDFs and text extracts are meant to be ignored via .gitignore (ref/**/pdfs/, ref/**/text/); check that these patterns are present.
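
Empty extracts are easy to miss in a larger corpus. The sketch below flags extract files with almost no text so they can be replaced or re-extracted. It assumes extracts are stored as `.txt` files directly under the text directory, and the 200-character threshold is an arbitrary illustrative value. `find_empty_extracts` is a hypothetical helper, not part of the skill's script.

```python
from pathlib import Path


def find_empty_extracts(text_dir, min_chars=200):
    """List extract files whose non-whitespace content is under min_chars."""
    suspicious = []
    # Assumption: extracts are *.txt files directly inside text_dir.
    for path in sorted(Path(text_dir).glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        # Drop all whitespace so files of blank lines count as empty.
        if len("".join(content.split())) < min_chars:
            suspicious.append(path.name)
    return suspicious
```

Calling `find_empty_extracts("ref/agent-surveys/text")` after a run returns the file names worth checking by hand.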

Score

Total Score

70/100

Based on repository quality metrics

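✓ SKILL.md

Includes a SKILL.md file

+20

○ LICENSE

A license is set

0/10

✓ Description

Has a description of 100 characters or more

+10

○ Popularity

100 or more GitHub stars

0/15

○ Recent activity

Updated within the last 3 months

0/10

✓ Forks

Forked 10 or more times

✓ Issue management

Fewer than 50 open issues

✓ Language

A programming language is set

✓ Tags

At least one tag is set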

Related Skills

prompt-lookup

skill-lookup

changelog-automation

web-component-design

dbt-transformation-patterns

market-sizing-analysis