
processor
by grc-iit
An automated workflow that conducts AI-powered research, extracts and acquires academic papers, converts them into structured markdown, and creates a searchable vector database for RAG applications.
⭐ 0🍴 0📅 Jan 24, 2026
SKILL.md
name: processor
description: Process documents into a RAG database. Use when the user wants to chunk, embed, or index files into a vector database for semantic search.
Document Processing
This skill helps you process documents, codebases, and papers into a searchable RAG (Retrieval-Augmented Generation) database using LanceDB.
Quick Start
# 1. Check that services are running
uv run processor check
# 2. Process files into database
uv run processor process ./input -o ./lancedb
# 3. Verify results
uv run processor stats ./lancedb
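The three Quick Start steps can also be driven from a script. The following is a minimal sketch, assuming `uv` and the `processor` CLI are available on the PATH and that each command reports failure through a non-zero exit code (an assumption; the source does not document exit codes). The helper names `quick_start_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def quick_start_commands(input_dir: str, db_dir: str) -> List[List[str]]:
    """Return the three Quick Start commands as argument lists."""
    return [
        ["uv", "run", "processor", "check"],
        ["uv", "run", "processor", "process", input_dir, "-o", db_dir],
        ["uv", "run", "processor", "stats", db_dir],
    ]


def run_pipeline(
    commands: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run commands in order and stop at the first non-zero exit code.

    Assumption: a non-zero exit code means the step failed.
    """
    for cmd in commands:
        code = runner(cmd)
        if code != 0:
            return code
    return 0
```

Calling `run_pipeline(quick_start_commands("./input", "./lancedb"))` would run check, process, and stats in sequence. The `runner` parameter exists only so the control flow can be exercised without invoking the real CLI.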
Common Use Cases
Process a codebase
uv run processor process ./my-project -o ./code_db --content-type code
Process papers/documents
uv run processor process ./papers -o ./papers_db
Incremental updates (skip unchanged files)
uv run processor process ./input -o ./lancedb --incremental
High-quality embeddings (slower, better retrieval)
uv run processor process ./input -o ./lancedb --text-profile high --code-profile high
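The `--incremental` flag skips unchanged files, but the source does not say how processor detects a change. One common approach is to compare content hashes against a stored manifest. The sketch below illustrates that general idea only and is not processor's implementation; `file_digest`, `changed_files`, and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in blocks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(
    root: Path, manifest: Dict[str, str]
) -> Tuple[List[Path], Dict[str, str]]:
    """Return files whose digest differs from the manifest, plus a fresh manifest.

    The manifest maps a path relative to root to its last seen digest
    (hypothetical format, for illustration).
    """
    updated: Dict[str, str] = {}
    changed: List[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        digest = file_digest(path)
        updated[key] = digest
        if manifest.get(key) != digest:
            changed.append(path)
    return changed, updated
```

On a first run with an empty manifest every file counts as changed. On later runs only new or modified files are returned, which is the behavior an incremental mode needs.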
Embedding Profiles
| Type | Profile | Model | Dimensions | Use Case |
|---|---|---|---|---|
| text | low | Qwen3-Embedding-0.6B | 1024 | Fast, good quality |
| text | medium | Qwen3-Embedding-4B | 2560 | Balanced |
| text | high | Qwen3-Embedding-8B | 4096 | Maximum quality |
| code | low | jina-code-0.5b | 896 | Fast code search |
| code | high | jina-code-1.5b | 1536 | Best code search |
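Vector dimensions affect storage as well as quality. The sketch below gives a rough estimate of raw vector size per profile, assuming vectors are stored as float32 (4 bytes per dimension). That dtype is an assumption, since the source does not state it, and the estimate ignores index and metadata overhead. The model names and dimensions are copied from the table above; `raw_vector_bytes` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (content type, profile) -> (model, dimensions), copied from the table above.
PROFILES: Dict[Tuple[str, str], Tuple[str, int]] = {
    ("text", "low"): ("Qwen3-Embedding-0.6B", 1024),
    ("text", "medium"): ("Qwen3-Embedding-4B", 2560),
    ("text", "high"): ("Qwen3-Embedding-8B", 4096),
    ("code", "low"): ("jina-code-0.5b", 896),
    ("code", "high"): ("jina-code-1.5b", 1536),
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_vector_bytes(content_type: str, profile: str, num_chunks: int) -> int:
    """Estimate bytes needed for the vectors alone (no index, no metadata)."""
    _, dims = PROFILES[(content_type, profile)]
    return num_chunks * dims * BYTES_PER_FLOAT32
```

Under these assumptions, 100,000 text chunks need about 410 MB of raw vectors with the `low` profile and about 1.64 GB with `high`.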
Key Options
| Option | Values | Description |
|---|---|---|
| `--embedder` | ollama, transformers | Embedding backend |
| `--text-profile` | low, medium, high | Text embedding quality |
| `--code-profile` | low, high | Code embedding quality |
| `--table-mode` | separate, unified, both | Table organization |
| `--incremental`/`--full` | - | Skip unchanged files |
| `--content-type` | auto, code, paper, markdown | Force content detection |
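When the `process` command is assembled programmatically, the allowed values in the table can be checked before anything runs. The following is a minimal sketch: the flags and values are taken from the table above, while `build_process_command` and its keyword-argument convention are hypothetical.

```python
from typing import Dict, List, Tuple

# Allowed values copied from the Key Options table.
ALLOWED: Dict[str, Tuple[str, ...]] = {
    "--embedder": ("ollama", "transformers"),
    "--text-profile": ("low", "medium", "high"),
    "--code-profile": ("low", "high"),
    "--table-mode": ("separate", "unified", "both"),
    "--content-type": ("auto", "code", "paper", "markdown"),
}


def build_process_command(
    input_dir: str, db_dir: str, incremental: bool = False, **options: str
) -> List[str]:
    """Build a `processor process` argument list, rejecting unknown values.

    Keyword names map to flags by replacing underscores with hyphens,
    e.g. content_type="code" becomes `--content-type code`.
    """
    cmd = ["uv", "run", "processor", "process", input_dir, "-o", db_dir]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if flag not in ALLOWED:
            raise ValueError(f"unknown option: {flag}")
        if value not in ALLOWED[flag]:
            raise ValueError(f"{flag} must be one of {ALLOWED[flag]}, got {value!r}")
        cmd += [flag, value]
    if incremental:
        cmd.append("--incremental")
    return cmd
```

For example, `build_process_command("./my-project", "./code_db", content_type="code")` reproduces the codebase command from Common Use Cases, while `code_profile="medium"` raises a `ValueError` because the table lists only `low` and `high` for code.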
MCP Server
Start the processor MCP server for programmatic access:
uv run processor-mcp
Configure in Claude Desktop (claude_desktop_config.json):
{
  "mcpServers": {
    "processor": {
      "command": "uv",
      "args": ["run", "processor-mcp"],
      "cwd": "/path/to/processor"
    }
  }
}
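If `claude_desktop_config.json` already lists other MCP servers, the processor entry has to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge. `add_processor_server` is a hypothetical helper, the caller supplies the path to the config file, and the file is assumed to contain valid JSON if it exists.

```python
import json
from pathlib import Path


def add_processor_server(config_path: Path, processor_dir: str) -> dict:
    """Merge the processor entry into the config, keeping other servers.

    Assumes the file, if present, holds a valid JSON object.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["processor"] = {
        "command": "uv",
        "args": ["run", "processor-mcp"],
        "cwd": processor_dir,
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Because the function only sets the `processor` key under `mcpServers`, it can be run repeatedly and leaves existing server entries untouched.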
Available MCP Tools
- process_documents - Process files into LanceDB
- check_services - Check backend availability
- setup_models - Download embedding models
- get_db_stats - Database statistics
- export_db - Export database
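MCP clients invoke tools such as these with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that request message. It assumes `check_services` accepts an empty argument object, since the source does not document any tool's parameters, and `tool_call_message` is a hypothetical helper. A real session also needs the MCP initialization handshake first, which an MCP client library normally handles.

```python
import json
from typing import Any, Dict, Optional

# Tool names copied from the list above.
TOOLS = (
    "process_documents",
    "check_services",
    "setup_models",
    "get_db_stats",
    "export_db",
)


def tool_call_message(
    name: str, arguments: Optional[Dict[str, Any]] = None, request_id: int = 1
) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for one of the tools."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Argument schemas are not documented in the source; empty by default.
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message)
```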
Troubleshooting
"Model not found" error
uv run processor setup # Download required models
Ollama not running
ollama serve # Start Ollama server
Check available models
uv run processor check
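To confirm independently that an Ollama server is reachable, its HTTP endpoint can be probed directly. The following is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434`. It does not replace `uv run processor check`, whose exact checks the source does not describe, and `ollama_is_up` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

If this returns `False`, starting the server with `ollama serve` and rerunning the check is the first thing to try.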
Score
Total Score
65/100
Based on repository quality metrics
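| Criterion | Requirement | Points |
|---|---|---|
| ✓ SKILL.md | Includes a SKILL.md file | +20 |
| ○ LICENSE | A license is set | 0/10 |
| ✓ Description | Has a description of at least 100 characters | +10 |
| ○ Popularity | 100 or more GitHub stars | 0/15 |
| ✓ Recent activity | Updated within the last month | +10 |
| ○ Forks | Forked 10 or more times | 0/5 |
| ✓ Issue management | Fewer than 50 open issues | +5 |
| ✓ Language | A programming language is set | +5 |
| ✓ Tags | At least one tag is set | +5 |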
