agent-survey-corpus
by WILLOSCAR
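agent-survey-corpus is a practical skill in the "other" category. It strengthens the ability to handle complex tasks and improves work efficiency and the quality of results.
⭐ 83 · 🍴 10 · 📅 January 24, 2026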
SKILL.md
name: agent-survey-corpus
description: |
Download a small corpus of open-access arXiv survey/review PDFs about LLM agents and extract text for style learning.
Trigger: agent survey corpus, ref corpus, download surveys, 学习综述写法, 下载 survey.
Use when: you want to study how real agent surveys structure sections (6–8 H2), size subsections, and write evidence-backed comparisons.
Skip if: you cannot download PDFs (no network) or you don't want local PDF files.
Network: required.
Guardrail: only download arXiv PDFs; store under ref/ and keep large files out of git.
Agent Survey Corpus (arXiv PDFs → text extracts)
Goal: create a small, local reference library so you can learn from real agent surveys when refining:
- C2 outline structure (paper-like sectioning)
- C4 tables/claims organization
- C5 writing style and density
This is intentionally not part of the pipeline; it is an optional, repo-level toolkit.
Inputs
- `ref/agent-surveys/arxiv_ids.txt`
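
A minimal sketch of how the ids file could be read, assuming one arXiv id per line. The names `parse_arxiv_ids` and `pdf_url` are hypothetical, and skipping blank lines, `#` comments, and duplicate ids is an assumption rather than documented behavior of the skill's script. The URL pattern `https://arxiv.org/pdf/<id>` is arXiv's public PDF address.

```python
from pathlib import Path


def parse_arxiv_ids(text: str) -> list[str]:
    """Return unique arXiv ids from the file content, preserving order.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    ids: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in ids:
            ids.append(line)
    return ids


def pdf_url(arxiv_id: str) -> str:
    """Build the arXiv PDF URL for a single id."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


if __name__ == "__main__":
    path = Path("ref/agent-surveys/arxiv_ids.txt")
    if path.exists():
        for arxiv_id in parse_arxiv_ids(path.read_text(encoding="utf-8")):
            print(arxiv_id, pdf_url(arxiv_id))
```

Running the file directly only prints the ids and their URLs; it does not download anything.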
Outputs
- `ref/agent-surveys/pdfs/`
- `ref/agent-surveys/text/`
- `ref/agent-surveys/STYLE_REPORT.md` (tracked; auto-generated summary)
Workflow
- Edit `ref/agent-surveys/arxiv_ids.txt` (one arXiv id per line).
- Run the downloader to fetch PDFs and extract the first N pages to text.
- Skim the extracted text under `ref/agent-surveys/text/`:
  - look at section counts (H2), subsection granularity (H3), and how they transition between chapters.
  - identify repeated rhetorical patterns you want the pipeline writer to imitate.
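
The skim step can be partly automated. The sketch below counts likely section and subsection headings in one extracted text file. It assumes that headings in the extracted text appear as numbered lines such as `3 Methods` (H2) or `3.2 Planning` (H3); that is a heuristic for PDF text, not something the skill's script guarantees, and `count_headings` is a hypothetical helper.

```python
import re

# Assumption: an H2 looks like "3 Methods", an H3 like "3.2 Planning".
H2_RE = re.compile(r"^\d{1,2}\.?\s+[A-Z][^\n]{2,80}$")
H3_RE = re.compile(r"^\d{1,2}\.\d{1,2}\.?\s+[A-Z][^\n]{2,80}$")


def count_headings(text: str) -> dict[str, int]:
    """Count lines that look like numbered H2/H3 headings."""
    h2 = h3 = 0
    for raw in text.splitlines():
        line = raw.strip()
        # Check the more specific pattern first.
        if H3_RE.match(line):
            h3 += 1
        elif H2_RE.match(line):
            h2 += 1
    return {"h2": h2, "h3": h3}
```

Expect false positives (for example, a table row that starts with a number), so treat the counts as a starting point for manual reading.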
Script
Quick Start
- `python .codex/skills/agent-survey-corpus/scripts/run.py --help`
- `python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20`
All Options
- `--workspace <dir>` (use `.` to write into repo root)
- `--inputs <semicolon-separated>` (default: `ref/agent-surveys/arxiv_ids.txt`)
- `--max-pages <N>` (default: 20)
- `--sleep <seconds>` (default: 1.0)
- `--overwrite` (re-download + re-extract)
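
For orientation, here is a sketch of how a command-line interface with these options might be declared with `argparse`. It is not the actual `run.py`. The flag names and defaults come from the list above, while `build_parser`, `split_inputs`, the help strings, and making `--workspace` required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (illustrative, not the real run.py)."""
    parser = argparse.ArgumentParser(
        description="Download arXiv survey PDFs and extract text (sketch)."
    )
    # Assumption: --workspace is required; the source does not say.
    parser.add_argument("--workspace", required=True,
                        help="workspace root; use . for the repo root")
    parser.add_argument("--inputs", default="ref/agent-surveys/arxiv_ids.txt",
                        help="semicolon-separated list of id files")
    parser.add_argument("--max-pages", type=int, default=20)
    parser.add_argument("--sleep", type=float, default=1.0)
    parser.add_argument("--overwrite", action="store_true",
                        help="re-download and re-extract")
    return parser


def split_inputs(value: str) -> list[str]:
    """Split a semicolon-separated value, dropping empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]
```

With this sketch, `build_parser().parse_args(["--workspace", "."])` yields `max_pages=20`, `sleep=1.0`, and `overwrite=False`.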
Examples
- Download/extract into repo root `ref/`:
  `python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20`
- Download/extract into a specific folder (treated as workspace root):
  `python .codex/skills/agent-survey-corpus/scripts/run.py --workspace /tmp/surveys --max-pages 30`
Troubleshooting
- Download fails / timeout: rerun with a larger `--sleep`, or try fewer ids.
- Text extract is empty: the PDF may be scanned; try another survey or increase `--max-pages`.
- Files showing up in git status: PDFs/text should be ignored via `.gitignore` (`ref/**/pdfs/`, `ref/**/text/`); check that these patterns are present.
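
To spot empty extracts without opening every file, a small check like the one below may help. It assumes the extracts are `.txt` files directly under the text directory, and the 200-character threshold is arbitrary; `find_empty_extracts` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path


def find_empty_extracts(text_dir: Path, min_chars: int = 200) -> list[Path]:
    """Return extract files with fewer than min_chars non-whitespace characters."""
    suspects: list[Path] = []
    for path in sorted(text_dir.glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        # Strip all whitespace so page breaks and blank lines do not count.
        if len("".join(content.split())) < min_chars:
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    for path in find_empty_extracts(Path("ref/agent-surveys/text")):
        print(f"possibly scanned or empty: {path}")
```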
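Score
Overall score: 70/100
Rating based on repository quality indicators
- ✓ SKILL.md: a SKILL.md file is included (+20)
- ○ LICENSE: a license is set (0/10)
- ✓ Description: the description is 100 characters or longer (+10)
- ○ Popularity: 100 or more GitHub stars (0/15)
- ✓ Recent activity: updated within the last 3 months (+5)
- ✓ Forks: forked 10 or more times (+5)
- ✓ Issue management: fewer than 50 open issues (+5)
- ✓ Language: a programming language is set (+5)
- ✓ Tags: at least one tag is set (+5)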

