pdf-processing-pro
by Crumbgrabber
In the interest of data sovereignty and avoiding vendor lock-in, this is a template repository with all of our favorite prompts, skills, agents, etc., that scrupulously avoids anything that locks you in to one particular vendor. The one exception is "patterns", which come from fabric. These are similar to both a skill and a sub-agent.
⭐ 1🍴 1📅 Dec 28, 2025
SKILL.md
name: pdf-processing-pro
description: Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.
PDF Processing Pro
Production-ready PDF processing guidance with comprehensive error handling and support for complex workflows (forms, tables, OCR, batch operations).
Core patterns (tool-agnostic)
- Text extraction: Use robust libraries (e.g., pdfplumber) and validate output per page.
- Form workflows: Detect fields, validate data against schemas, then fill; revalidate outputs.
- Table extraction: Combine multiple extractors and normalize columns; handle merged cells explicitly.
- OCR for scanned PDFs: Preprocess pages (deskew/denoise) before OCR; store page-level confidence and flag low-confidence regions.
- Batch operations: Process PDFs page-by-page to control memory; log per-file successes/failures with timestamps and error details.
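As one illustration of the "validate output per page" pattern, here is a minimal sketch. The names `PageCheck`, `check_pages`, `extract_page_texts` and the `min_chars` threshold are hypothetical, chosen for this example only. The extraction helper assumes pdfplumber is installed and relies on its `pdfplumber.open()` and `page.extract_text()` calls; the validation step itself is plain Python and works with text from any extractor.

```python
from dataclasses import dataclass


@dataclass
class PageCheck:
    page_number: int   # 1-based page index
    char_count: int    # characters left after stripping whitespace
    ok: bool           # False when the page looks empty or nearly empty


def check_pages(page_texts, min_chars=20):
    """Flag pages whose extracted text is missing or suspiciously short.

    min_chars is an arbitrary illustrative threshold, not a recommended value.
    """
    results = []
    for number, text in enumerate(page_texts, start=1):
        # Extractors may return None or "" for pages without a text layer.
        count = len((text or "").strip())
        results.append(PageCheck(number, count, count >= min_chars))
    return results


def extract_page_texts(path):
    """Return one text entry per page (assumes pdfplumber is installed)."""
    import pdfplumber  # third-party; imported lazily so check_pages works without it

    with pdfplumber.open(path) as pdf:
        return [page.extract_text() for page in pdf.pages]
```

Pages that fail the check are candidates for the OCR path rather than proof of a bad file, since scanned pages usually have no text layer at all.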
Reliability practices
- Wrap PDF operations in try/except; include filename, operation, and stack trace in logs.
- Validate inputs (paths, expected fields, schemas) before processing.
- Use structured logging with timestamps, operation, result, duration.
- Never log sensitive PDF contents; sanitize before sending to external services.
- For large jobs, checkpoint outputs (per-file artifacts) so you can resume after failures.
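The practices above can be sketched with the standard library alone. This is an illustrative outline, not a prescribed implementation: `run_batch`, the `process` callable, and the JSON checkpoint layout are all assumptions made for this example. As a simplification, the checkpoint is a list of completed file paths rather than per-file output artifacts.

```python
import json
import logging
import time
import traceback
from pathlib import Path

logger = logging.getLogger("pdf_batch")


def run_batch(paths, process, checkpoint_file):
    """Run process(path) for each path, skipping files already checkpointed.

    Returns the list of paths that failed in this run.
    """
    checkpoint = Path(checkpoint_file)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    failures = []
    for path in map(str, paths):
        if path in done:
            continue  # finished in an earlier run; resume past it
        started = time.perf_counter()
        record = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
            "file": path,
            "operation": getattr(process, "__name__", "process"),
        }
        try:
            process(path)
            record["result"] = "ok"
            done.add(path)
            # Persist progress after every success so a crash loses little work.
            checkpoint.write_text(json.dumps(sorted(done)))
        except Exception:
            record["result"] = "error"
            record["trace"] = traceback.format_exc()
            failures.append(path)
        record["duration_s"] = round(time.perf_counter() - started, 4)
        # Only metadata is logged; page contents never enter the record.
        logger.info(json.dumps(record))
    return failures
```

Calling `run_batch` again with the same checkpoint file retries only the files that have not yet succeeded.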
Troubleshooting tips
- File not found/invalid: Confirm path/permissions; verify file is not encrypted/corrupted.
- OCR quality issues: Increase DPI during rasterization, deskew, and denoise before OCR.
- Table extraction errors: Try alternate parsers/settings; manually define column boundaries when automated detection fails.
- Memory issues: Stream pages instead of loading whole documents; free resources after each file.
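For the "file not found/invalid" case, a quick pre-flight check can rule out the most common causes before any PDF library is involved. The sketch below is a heuristic under stated assumptions: the `preflight` name and the 1024-byte windows are choices made for this example, it only looks for the `%PDF-` header and the `%%EOF` trailer marker, and it does not detect encryption or deeper corruption.

```python
import os
from pathlib import Path


def preflight(path):
    """Return None if the file passes basic checks, else a short reason string."""
    p = Path(path)
    if not p.is_file():
        return "not found"
    if not os.access(p, os.R_OK):
        return "not readable"
    size = p.stat().st_size
    # Read only the head and tail so large files are never loaded whole.
    with p.open("rb") as fh:
        head = fh.read(1024)
        fh.seek(max(0, size - 1024))
        tail = fh.read()
    if b"%PDF-" not in head:
        return "missing %PDF- header"
    if b"%%EOF" not in tail:
        return "missing %%EOF marker (possibly truncated)"
    return None
```

A file that passes can still fail later, so this complements the try/except handling around the actual PDF operations rather than replacing it.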
Score
Total Score
60/100
Based on repository quality metrics
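✓ SKILL.md
Includes a SKILL.md file
+20
○ LICENSE
A license is set
0/10
✓ Description
Description of 100 or more characters
+10
○ Popularity
100 or more GitHub stars
0/15
✓ Recent activity
Updated within the last 3 months
+5
○ Forks
Forked 10 or more times
0/5
✓ Issue management
Fewer than 50 open issues
+5
○ Language
A programming language is set
0/5
✓ Tags
At least one tag is set
+5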
