pdf-processing

Name: pdf-processing
Rating: 80
Author: davila7

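pdf-processing is a practical skill in the "other" category. It strengthens your ability to handle complex tasks and improves work efficiency and the quality of results.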

⭐ 17,985 🍴 1,638 📅 January 23, 2026

anthropic anthropic-claude claude claude-code pdf text-extraction table-extraction form-filling

SKILL.md

name: PDF Processing
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

PDF Processing

Quick start

Use pdfplumber to extract text from PDFs:

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)

Extracting tables

Extract tables from PDFs with automatic detection:

import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()

    for table in tables:
        for row in table:
            print(row)
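
Each table returned by `extract_tables()` is a list of rows, and each row is a list of cell values in which empty cells appear as `None`. The sketch below shows one way to turn such a table into a list of dictionaries. It assumes the first row is a header row, and `table_to_records` is a hypothetical helper name, not part of pdfplumber.

```python
def table_to_records(table):
    """Convert a table (list of rows) into a list of dicts keyed by the header row.

    Assumes the first row holds the column names; None cells become "".
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cleaned = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cleaned)))
    return records


# Sample data shaped like one entry of page.extract_tables()
sample = [["Name", "Qty"], ["Widget", "3"], ["Gadget", None]]
print(table_to_records(sample))
```

With real data, each `table` from the loop above could be passed to the helper in place of `sample`.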

Extracting all pages

Process multi-page documents efficiently:

import pdfplumber

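with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None or "" for pages without a text layer
    page_texts = [page.extract_text() or "" for page in pdf.pages]

# Join once instead of concatenating strings inside the loop
full_text = "\n\n".join(page_texts)
print(full_text)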

Form filling

For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.

Merging PDFs

Combine multiple PDF files:

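from pypdf import PdfWriter

# PdfMerger is deprecated and has been removed from recent pypdf releases;
# PdfWriter.append() covers the same use case.
writer = PdfWriter()

for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(pdf)

writer.write("merged.pdf")
writer.close()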

Splitting PDFs

Extract specific pages or ranges:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Extract pages 2-5 (reader.pages is zero-indexed, so indices 1-4)
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])

with open("output.pdf", "wb") as output:
    writer.write(output)
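
Because page lists are zero-indexed while people usually count pages from 1, the range arithmetic above is easy to get wrong by one. A minimal sketch of a conversion helper follows; `page_indices` is a hypothetical name introduced here for illustration, not a pypdf function.

```python
def page_indices(first_page, last_page, total_pages):
    """Map an inclusive, 1-based page range to zero-based indices.

    Raises ValueError if the range falls outside the document.
    """
    if not 1 <= first_page <= last_page <= total_pages:
        raise ValueError(
            f"invalid range {first_page}-{last_page} for {total_pages} pages"
        )
    return list(range(first_page - 1, last_page))


print(page_indices(2, 5, total_pages=10))
```

With pypdf, `total_pages` would typically be `len(reader.pages)`, and each returned index can be used as `reader.pages[i]`.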

Available packages

pdfplumber - Text and table extraction (recommended)
pypdf - PDF manipulation, merging, splitting
pdf2image - Convert PDFs to images (requires poppler)
pytesseract - OCR for scanned PDFs (requires tesseract)

Common patterns

Extract and save text:

import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    # Fall back to "" for pages with no extractable text
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)

with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)

Extract tables to CSV:

import pdfplumber
import csv

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()

    with open("output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerows(table)

Error handling

Handle common PDF issues:

import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
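
When a page has no extractable text, one option is to fall back to OCR using the pdf2image and pytesseract packages listed above. The following is a rough sketch, not a definitive implementation: `ocr_first_page` and `text_or_fallback` are hypothetical helper names, and the OCR path assumes poppler and tesseract are installed on the system.

```python
def ocr_first_page(path):
    """OCR the first page of a PDF (needs pdf2image, pytesseract, poppler, tesseract)."""
    # Imported lazily so the rest of the module works without the OCR stack
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, first_page=1, last_page=1)
    return pytesseract.image_to_string(images[0]) if images else ""


def text_or_fallback(native_text, fallback):
    """Return native_text if it has visible content, otherwise call fallback()."""
    if native_text and native_text.strip():
        return native_text
    return fallback()
```

A call might look like `text_or_fallback(page.extract_text(), lambda: ocr_first_page("document.pdf"))`, so OCR only runs when the text layer is missing or blank.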

Performance tips

Process pages in batches for large PDFs
Use multiprocessing for multiple files
Extract only needed pages rather than entire document
Close PDF objects after use
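
The tips above can be sketched roughly as follows. The helper names (`batches`, `extract_pages`, `extract_first_pages`) are hypothetical, and the batch size and worker count are assumptions to tune for your own documents.

```python
from concurrent.futures import ProcessPoolExecutor


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_pages(path, indices):
    """Extract text from selected zero-based pages; the PDF is closed on exit."""
    import pdfplumber  # lazy import: only needed when a PDF is actually opened

    with pdfplumber.open(path) as pdf:
        return [pdf.pages[i].extract_text() or "" for i in indices]


def extract_first_pages(paths, max_workers=None):
    """Extract the first page of each file in parallel worker processes."""
    paths = list(paths)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_pages, paths, [[0]] * len(paths)))
```

For a large file, something like `for chunk in batches(range(page_count), 50): extract_pages(path, chunk)` keeps each pass small. On platforms that spawn worker processes, call `extract_first_pages` from under an `if __name__ == "__main__":` guard.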

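Score

Overall score: 80/100

Based on repository quality indicators:

✓ SKILL.md - A SKILL.md file is included (+20)
✓ LICENSE - A license is set (+10)
○ Description - Has a description of at least 100 characters (0/10)
✓ Popularity - 1,000 or more GitHub stars (+15)
○ Recent activity - Updated within the last 3 months (0/10)
✓ Forks - Forked 10 or more times
○ Issue management - Fewer than 50 open issues (0/5)
✓ Language - A programming language is set
✓ Tags - At least one tag is set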

Related skills

changelog-automation

web-component-design

dbt-transformation-patterns

market-sizing-analysis

on-call-handoff-patterns

architecture-decision-records