pdf

Name: pdf
Rating: 85
Author: shareAI-lab


pdf is a skill that supports business process automation. Through workflow management and automation, it improves productivity and operational efficiency.

⭐ 14,920 🍴 3,396 📅 January 23, 2026

agent claude-code teaching pdf document-processing pymupdf reportlab pandoc


SKILL.md

name: pdf
description: Process PDF files - extract text, create PDFs, merge documents. Use when user asks to read PDF, create PDF, or work with PDF files.

PDF Processing Skill

You now have expertise in PDF manipulation. Follow these workflows:

Reading PDFs

Option 1: Quick text extraction (preferred)

# Using pdftotext (poppler-utils)
pdftotext input.pdf -  # Output to stdout
pdftotext input.pdf output.txt  # Output to file

# If pdftotext not available, try:
python3 -c "
import fitz  # PyMuPDF
doc = fitz.open('input.pdf')
for page in doc:
    print(page.get_text())
"

Option 2: Page-by-page with metadata

import fitz  # pip install pymupdf

doc = fitz.open("input.pdf")
print(f"Pages: {len(doc)}")
print(f"Metadata: {doc.metadata}")

for i, page in enumerate(doc):
    text = page.get_text()
    print(f"--- Page {i+1} ---")
    print(text)
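
The two reading options above can be combined into one helper that prefers pdftotext and falls back to PyMuPDF. This is a minimal sketch, not part of the original skill: the function names `pdftotext_cmd` and `extract_text` are hypothetical, and it assumes that `pdftotext` (if present) is on the PATH.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, first_page=None, last_page=None):
    """Build a pdftotext command that writes the extracted text to stdout."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert (1-based)
    return cmd + [pdf_path, "-"]        # "-" sends the output to stdout


def extract_text(pdf_path):
    """Return the text of a PDF, preferring pdftotext when it is installed."""
    if shutil.which("pdftotext"):
        result = subprocess.run(
            pdftotext_cmd(pdf_path), capture_output=True, text=True, check=True
        )
        return result.stdout
    # Fallback: PyMuPDF, imported lazily so this module loads without it.
    import fitz
    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

Calling `extract_text("input.pdf")` returns a single string either way, so the caller does not need to know which backend ran.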

Creating PDFs

Option 1: From Markdown (recommended)

# Using pandoc
pandoc input.md -o output.pdf

# With custom styling
pandoc input.md -o output.pdf --pdf-engine=xelatex -V geometry:margin=1in
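
When the conversion is driven from Python rather than the shell, the same pandoc invocations can be assembled as an argument list. The sketch below is illustrative only: `pandoc_cmd` and `markdown_to_pdf` are hypothetical names, and only the flags already shown above are used.

```python
import subprocess


def pandoc_cmd(src, dst, pdf_engine=None, margin=None):
    """Build the pandoc argument list for a Markdown-to-PDF conversion."""
    cmd = ["pandoc", src, "-o", dst]
    if pdf_engine:
        cmd.append(f"--pdf-engine={pdf_engine}")
    if margin:
        cmd += ["-V", f"geometry:margin={margin}"]
    return cmd


def markdown_to_pdf(src, dst, **kwargs):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_cmd(src, dst, **kwargs), check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.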

Option 2: Programmatically

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("output.pdf", pagesize=letter)
c.drawString(100, 750, "Hello, PDF!")
c.save()
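
`drawString` places a single line at fixed coordinates (in points, measured from the bottom-left corner), so longer text has to be positioned line by line and broken across pages by hand. The following is a rough sketch of one way to do that. The helper names `paginate` and `write_pdf`, the 72-point margin, and the 14-point line spacing are assumptions chosen for illustration.

```python
def paginate(lines, page_height=792, margin=72, leading=14):
    """Group lines into pages of (y, text) pairs, laid out top to bottom."""
    pages, current = [], []
    y = page_height - margin
    for text in lines:
        if y < margin:  # no room left above the bottom margin: new page
            pages.append(current)
            current, y = [], page_height - margin
        current.append((y, text))
        y -= leading
    if current:
        pages.append(current)
    return pages


def write_pdf(path, lines, left=72):
    """Draw the lines with ReportLab, starting a new page when one fills up."""
    # Imported lazily so paginate() can be used without ReportLab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    _, height = letter  # letter is 612 x 792 points
    c = canvas.Canvas(path, pagesize=letter)
    for page in paginate(lines, page_height=height):
        for y, text in page:
            c.drawString(left, y, text)
        c.showPage()  # finish the current page
    c.save()
```

With these defaults a letter page holds 47 lines, so `write_pdf("output.pdf", [f"line {i}" for i in range(100)])` would produce three pages.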

Option 3: From HTML

# Using wkhtmltopdf
wkhtmltopdf input.html output.pdf

# Or with Python
python3 -c "
import pdfkit
pdfkit.from_file('input.html', 'output.pdf')
"

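pdfkit is a wrapper around the wkhtmltopdf binary, so the Python route fails if that binary cannot be found. A hedged sketch of checking for it first is shown below. `find_wkhtmltopdf`, `html_to_pdf`, and the option values are illustrative assumptions, not part of the original skill.

```python
import shutil

# Example options passed through to wkhtmltopdf; values are illustrative.
PDF_OPTIONS = {"page-size": "Letter", "encoding": "UTF-8"}


def find_wkhtmltopdf(explicit_path=None):
    """Return an explicitly given binary path, else look it up on the PATH."""
    return explicit_path or shutil.which("wkhtmltopdf")


def html_to_pdf(html_path, pdf_path, wkhtmltopdf_path=None):
    """Convert an HTML file to PDF, failing early if wkhtmltopdf is missing."""
    binary = find_wkhtmltopdf(wkhtmltopdf_path)
    if binary is None:
        raise RuntimeError("wkhtmltopdf not found; install it or pass its path")
    import pdfkit  # imported lazily so the check above works without pdfkit
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_file(html_path, pdf_path, options=PDF_OPTIONS, configuration=config)
```
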
Merging PDFs

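import fitz  # PyMuPDF

result = fitz.open()  # new, empty PDF
for pdf_path in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    with fitz.open(pdf_path) as doc:  # close each source after copying
        result.insert_pdf(doc)  # append all pages of doc
result.save("merged.pdf")
result.close()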
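
A merged file is easier to navigate if each source document gets a bookmark at its first page. The sketch below extends the merge with PyMuPDF's `set_toc`, which takes entries of the form `[level, title, page]` with 1-based page numbers. The helper names `build_toc` and `merge_with_bookmarks`, and the choice of the file stem as the bookmark title, are assumptions for illustration.

```python
from pathlib import Path


def build_toc(entries):
    """Turn (title, page_count) pairs into [level, title, start_page] rows."""
    toc, next_page = [], 1
    for title, page_count in entries:
        toc.append([1, title, next_page])  # level-1 bookmark, 1-based page
        next_page += page_count
    return toc


def merge_with_bookmarks(paths, out_path):
    """Merge PDFs and add one top-level bookmark per source file."""
    import fitz  # PyMuPDF, imported lazily

    result = fitz.open()
    entries = []
    for path in paths:
        with fitz.open(path) as doc:
            result.insert_pdf(doc)
            entries.append((Path(path).stem, len(doc)))
    result.set_toc(build_toc(entries))
    result.save(out_path)
    result.close()
```

For example, `merge_with_bookmarks(["file1.pdf", "file2.pdf", "file3.pdf"], "merged.pdf")` would bookmark the first page of each input.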

Splitting PDFs

import fitz

doc = fitz.open("input.pdf")
for i in range(len(doc)):
    single = fitz.open()
    single.insert_pdf(doc, from_page=i, to_page=i)
    single.save(f"page_{i+1}.pdf")
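
The loop above writes one file per page. When only certain page ranges are needed, the same `insert_pdf(from_page=..., to_page=...)` call can be driven by a range specification such as `"1-3,5"`. This is a sketch under assumptions: `parse_ranges`, `split_by_ranges`, the spec syntax, and the output file naming are all hypothetical.

```python
def parse_ranges(spec, page_count):
    """Parse a 1-based spec like "1-3,5" into zero-based (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range: {part.strip()!r}")
        ranges.append((start - 1, end - 1))  # PyMuPDF pages are zero-based
    return ranges


def split_by_ranges(pdf_path, spec):
    """Write one PDF per requested page range."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        for start, end in parse_ranges(spec, len(doc)):
            part = fitz.open()
            part.insert_pdf(doc, from_page=start, to_page=end)
            part.save(f"pages_{start + 1}-{end + 1}.pdf")
            part.close()
```

Validating the ranges against the page count up front means a bad spec fails before any output file is written.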

Key Libraries

Task	Library	Install
Read/Write/Merge	PyMuPDF	`pip install pymupdf`
Create from scratch	ReportLab	`pip install reportlab`
HTML to PDF	pdfkit	`pip install pdfkit` + wkhtmltopdf
Text extraction	pdftotext	`brew install poppler` / `apt install poppler-utils`
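
The tools in the table can be probed before a workflow is chosen. The following is a minimal sketch using only the standard library. `check_tools` and the two lookup tables are hypothetical names, and the module names assume the usual import names for each package (`fitz` for PyMuPDF).

```python
import importlib.util
import shutil

# Command-line tools and Python packages referenced by the workflows.
CLI_TOOLS = ["pdftotext", "pandoc", "wkhtmltopdf"]
PY_MODULES = {"pymupdf": "fitz", "reportlab": "reportlab", "pdfkit": "pdfkit"}


def check_tools():
    """Return a {name: bool} map of which tools and packages are available."""
    status = {name: shutil.which(name) is not None for name in CLI_TOOLS}
    for package, module in PY_MODULES.items():
        # find_spec locates the module without importing it.
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, available in check_tools().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```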

Best Practices

Always check if tools are installed before using them
Handle encoding issues - PDFs may contain various character encodings
Large PDFs: Process page by page to avoid memory issues
OCR for scanned PDFs: Use pytesseract if text extraction returns empty
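
The last two practices can be combined: process one page at a time, and run OCR only on pages whose text layer comes back empty. This is a sketch under assumptions. The names `needs_ocr`, `page_text_with_ocr`, and `iter_text`, the 20-character threshold, and the 300 dpi render resolution are illustrative choices. pytesseract also requires the Tesseract binary and Pillow to be installed.

```python
def needs_ocr(text, min_chars=20):
    """Treat a page as scanned if its text layer is (almost) empty."""
    return len(text.strip()) < min_chars


def page_text_with_ocr(page, dpi=300):
    """Return the page's text layer, falling back to OCR when it is empty."""
    text = page.get_text()
    if not needs_ocr(text):
        return text
    # Lazy imports: only needed for scanned pages.
    import pytesseract
    from PIL import Image

    pix = page.get_pixmap(dpi=dpi)  # render the page to an RGB bitmap
    image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    return pytesseract.image_to_string(image)


def iter_text(pdf_path):
    """Yield text page by page so the whole document is never held at once."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        for page in doc:
            yield page_text_with_ocr(page)
```

Because `iter_text` is a generator, a caller can write each page's text to disk as it arrives instead of accumulating it in memory.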

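Score

Overall Score

85/100

Rating based on repository quality indicators

✓ SKILL.md

Includes a SKILL.md file

+20

✓ LICENSE

A license is set

+10

○ Description

Has a description of 100 characters or more

0/10

✓ Popularity

1,000 or more GitHub stars

+15

○ Recent activity

Updated within the last 3 months

0/10

✓ Forks

Forked 10 or more times

✓ Issue management

Fewer than 50 open issues

✓ Language

A programming language is set

✓ Tags

One or more tags are set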


Related Skills

orpc-contract-first

component-refactoring

web-design-guidelines

frontend-code-review

frontend-testing

vercel-react-best-practices