ocr

by trpc-group

trpc-agent-go is a powerful Go framework for building intelligent agent systems using large language models (LLMs) and tools.

832 stars, 80 forks, Jan 23, 2026

SKILL.md


name: ocr
description: Extract text from images using Tesseract OCR

OCR Image Text Extraction Skill

Extract text from images using the Tesseract OCR engine.

Capabilities

  • Extract text from image files (PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP)
  • Support for 100+ languages
  • Optional image preprocessing for better accuracy
  • Output in plain text or JSON format with confidence scores

Usage

Basic OCR

python3 scripts/ocr.py <image_file> <output_file>

With Options

# Specify language (default: eng)
python3 scripts/ocr.py image.png text.txt --lang eng

# Chinese text
python3 scripts/ocr.py image.png text.txt --lang chi_sim

# Multiple languages
python3 scripts/ocr.py image.png text.txt --lang eng+chi_sim

# With image preprocessing (improves accuracy)
python3 scripts/ocr.py image.png text.txt --preprocess

# JSON output with confidence scores
python3 scripts/ocr.py image.png output.json --format json
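The last two options can be pictured with a minimal sketch, assuming preprocessing means a grayscale conversion followed by a fixed threshold and that the JSON output carries one confidence value per word. The helper names, the threshold of 128, and the JSON layout are all hypothetical and are not taken from scripts/ocr.py; only `pytesseract.image_to_data`, `pytesseract.Output.DICT`, and the Pillow calls are real library APIs.

```python
import json


def threshold_table(threshold=128):
    """Build a 256-entry lookup table mapping each gray level to black or white."""
    return [255 if level >= threshold else 0 for level in range(256)]


def preprocess(image, threshold=128):
    """Convert a Pillow image to grayscale, then binarize it (assumed behavior)."""
    gray = image.convert("L")  # "L" is Pillow's 8-bit grayscale mode
    return gray.point(threshold_table(threshold))


def words_with_confidence(data):
    """Keep recognized words from an image_to_data dict, with their confidences."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some pytesseract versions return strings here
        if text.strip() and conf >= 0:  # conf is -1 for non-word layout rows
            words.append({"text": text, "confidence": conf})
    return words


def ocr_to_json(image_path, lang="eng", use_preprocess=False):
    """Run Tesseract on one image and return a JSON string (hypothetical schema)."""
    # Imported lazily so the pure helpers above work without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        if use_preprocess:
            image = preprocess(image)
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
    words = words_with_confidence(data)
    result = {"text": " ".join(w["text"] for w in words), "words": words}
    return json.dumps(result, ensure_ascii=False)
```

A fixed threshold is the simplest choice and tends to work on evenly lit scans; photos with uneven lighting may need an adaptive method instead.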

Download and OCR from URL

# OCR from remote image
python3 scripts/ocr_url.py <image_url> <output_file>

# With options
python3 scripts/ocr_url.py https://example.com/image.jpg text.txt --lang eng --preprocess
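The download step might look roughly like the sketch below, which fetches the image with the standard library before handing the local path to OCR. The function names, the 20 MiB size cap, and the restriction to http/https are assumptions made for illustration; the actual behavior of scripts/ocr_url.py is not documented here.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse

MAX_BYTES = 20 * 1024 * 1024  # assumed 20 MiB cap; not taken from the skill


def filename_from_url(url):
    """Check that the URL is http(s) and return the file name in its path."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    return os.path.basename(parsed.path) or "image"


def download_image(url, dest_dir=None, timeout=30):
    """Download the image to a local file and return its path."""
    name = filename_from_url(url)
    dest_dir = dest_dir or tempfile.mkdtemp(prefix="ocr_")
    dest = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read(MAX_BYTES + 1)  # read one extra byte to detect overflow
    if len(payload) > MAX_BYTES:
        raise ValueError("image exceeds the size limit")
    with open(dest, "wb") as handle:
        handle.write(payload)
    return dest
```

Rejecting non-HTTP schemes keeps a URL argument from being used to read local files through `file://`.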

Parameters

  • image_file / image_url (required): Path to local image or image URL
  • output_file (required): Path to output text/JSON file
  • --lang: Language code (e.g., eng, chi_sim, jpn, fra, deu). Default: eng
  • --preprocess: Apply image preprocessing (grayscale, thresholding) for better accuracy
  • --format: Output format (text or json). Default: text
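For reference, the parameters above map naturally onto an `argparse` definition. This is a sketch of a hypothetical parser that mirrors the documented options and defaults; it is not the parser used in scripts/ocr.py, and the `output_format` attribute name is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented parameters (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image using Tesseract OCR"
    )
    parser.add_argument("image_file", help="path to the local image")
    parser.add_argument("output_file", help="path to the output text/JSON file")
    parser.add_argument(
        "--lang", default="eng",
        help="Tesseract language code(s), e.g. eng or eng+chi_sim",
    )
    parser.add_argument(
        "--preprocess", action="store_true",
        help="apply grayscale conversion and thresholding first",
    )
    parser.add_argument(
        "--format", dest="output_format", choices=["text", "json"],
        default="text", help="output format",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["image.png", "text.txt", "--lang", "eng"])
    print(args)
```

Using `choices` makes the parser reject any format other than text or json before OCR starts.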

Common Languages

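| Language              | Code    |
|-----------------------|---------|
| English               | eng     |
| Chinese (Simplified)  | chi_sim |
| Chinese (Traditional) | chi_tra |
| Japanese              | jpn     |
| Korean                | kor     |
| French                | fra     |
| German                | deu     |
| Spanish               | spa     |
| Russian               | rus     |
| Arabic                | ara     |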
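When several of the codes above are combined, they are joined with `+`, as in the `eng+chi_sim` usage example. The helper below is a small sketch of that step with an optional check against the languages actually installed; `build_lang_arg` and its `installed` parameter are hypothetical names introduced for illustration.

```python
def build_lang_arg(codes, installed=None):
    """Join language codes with '+', the separator used for multiple languages."""
    codes = [code.strip() for code in codes if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    if installed is not None:
        # Fail early with a clear message instead of a Tesseract runtime error.
        missing = [code for code in codes if code not in installed]
        if missing:
            raise ValueError(f"language data not installed: {', '.join(missing)}")
    return "+".join(codes)
```

The `installed` set could be filled from `pytesseract.get_languages()`, which reports the languages the local Tesseract installation makes available.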

Supported Image Formats

PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP
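A caller may want to reject unsupported files before invoking OCR. The check below is a minimal sketch based only on the file extension; the function name is hypothetical, and the set contains exactly the formats listed above (the shorter `.tif` spelling is not included because the list does not mention it).

```python
from pathlib import Path

# Extensions derived from the format list above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}


def is_supported_image(path):
    """Return True if the file extension matches a listed image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check is only a quick filter; the file can still fail to open if its contents do not match its name.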

Dependencies

  • Python 3.8+
  • pytesseract
  • Pillow (PIL)
  • tesseract-ocr (system package)

Installation

# Python packages
pip install pytesseract Pillow

# Tesseract OCR engine
sudo apt-get install tesseract-ocr  # Ubuntu/Debian
sudo yum install tesseract           # CentOS/RHEL
brew install tesseract               # macOS
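After installation, a quick check like the following can confirm that each dependency is discoverable. It is a sketch using only the standard library; `check_environment` is a hypothetical helper, and it only tests that the `tesseract` binary is on PATH and that the two Python packages can be located, not that OCR actually works.

```python
import importlib.util
import shutil


def check_environment():
    """Report which of the listed dependencies can be found on this machine."""
    return {
        # System binary installed by apt-get, yum, or brew.
        "tesseract": shutil.which("tesseract") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Pillow is imported under the package name PIL.
        "Pillow": importlib.util.find_spec("PIL") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

Languages other than English usually need extra language data. On Debian and Ubuntu these are separate packages such as `tesseract-ocr-chi-sim`.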

Score

Total Score

90/100

Based on repository quality metrics

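  • SKILL.md: includes a SKILL.md file (+20)
  • LICENSE: a license is set (+10)
  • Description: has a description of 100 or more characters (+10)
  • Popularity: 500 or more GitHub stars (+10)
  • Recent activity: updated within the last month (+10)
  • Forks: forked 10 or more times (+5)
  • Issue management: fewer than 50 open issues (+5)
  • Language: a programming language is set (+5)
  • Tags: at least one tag is set (+5)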
