
entity-resolution
by costiash
Reference implementation of an AI Agent using Claude Agent SDK. Transforms videos into knowledge graphs via MCP tools: transcription, entity extraction, graph building, natural language querying.
SKILL.md
name: entity-resolution
description: Identifies and merges duplicate entities in Knowledge Graph projects. Use after extraction to consolidate duplicate entities, or when users ask about potential duplicates. Supports automatic merging for high-confidence matches and user confirmation for medium-confidence candidates.
Entity Resolution Skill
Identifies and merges duplicate entities in Knowledge Graph projects.
When to Use
- Proactively after extraction: "I extracted 15 entities. Let me check for potential duplicates..."
- On user request: "Can you check for duplicates?" or "These seem like the same person"
- When graph seems noisy: Multiple similar-looking nodes that may be the same entity
Workflow
1. Scan for Duplicates
Use find_duplicate_entities with project_id
The tool uses multiple signals to detect duplicates:
- String similarity (Jaro-Winkler on labels)
- Alias overlap (Jaccard similarity)
- Type matching (same entity type bonus)
- Graph context (shared neighbors)
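As a rough illustration of how the four signals above could be folded into a single confidence score, here is a minimal sketch. The `Entity` shape, the weights, and the use of `difflib.SequenceMatcher` in place of Jaro-Winkler are all assumptions made for this example; the tool's actual scoring formula is not documented here.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    # Hypothetical entity shape, for illustration only.
    label: str
    type: str
    aliases: set = field(default_factory=set)
    neighbors: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_score(a: Entity, b: Entity) -> float:
    """Combine the four signals using assumed weights that sum to 1.0."""
    # Stand-in for Jaro-Winkler: difflib's ratio is a different string metric.
    name_sim = SequenceMatcher(None, a.label.lower(), b.label.lower()).ratio()
    alias_sim = jaccard({x.lower() for x in a.aliases},
                        {x.lower() for x in b.aliases})
    same_type = 1.0 if a.type == b.type else 0.0
    context_sim = jaccard(a.neighbors, b.neighbors)
    return 0.5 * name_sim + 0.2 * alias_sim + 0.1 * same_type + 0.2 * context_sim
```

Two records with matching labels, aliases, type, and neighbors score close to 1.0, while records that only partly resemble each other by name score well below that.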
2. Apply Confidence-Based Automation
| Confidence | Action |
|---|---|
| >= 0.9 (HIGH) | Auto-merge with merge_entities_tool. Inform user: "I merged X and Y (95% confident they're the same)" |
| 0.7 to < 0.9 (MEDIUM) | Ask user: "I found potential duplicates: X and Y (82% match). Should I merge them?" |
| < 0.7 (LOW) | Mention if relevant: "X and Y might be related but confidence is low (65%)" |
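
The thresholds in the table above can be expressed as a small routing function. This is a sketch only: the thresholds come from the table, but the returned action names are hypothetical labels chosen for this example.

```python
HIGH_THRESHOLD = 0.9    # from the table: >= 0.9 is HIGH
MEDIUM_THRESHOLD = 0.7  # from the table: 0.7 up to 0.9 is MEDIUM


def action_for(confidence: float) -> str:
    """Map a confidence score in [0, 1] to an action label (labels are assumed)."""
    if confidence >= HIGH_THRESHOLD:
        return "auto_merge"           # merge, then inform the user
    if confidence >= MEDIUM_THRESHOLD:
        return "ask_user"             # request confirmation before merging
    return "mention_if_relevant"      # low confidence: do not merge
```

A score of exactly 0.9 is treated as HIGH here, matching the `>= 0.9` row.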
3. Handle User Decisions
- If user approves: approve_merge or merge_entities_tool
- If user rejects: reject_merge
- If user wants more info: compare_entities_semantic for detailed analysis
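The mapping from a user's decision to a tool can be sketched as a lookup table. The decision keys (`"approve"`, `"reject"`, `"more_info"`) are hypothetical; only the tool names come from this skill. For approvals this sketch picks approve_merge, although merge_entities_tool is listed as an alternative.

```python
# Tool names are taken from this skill; the decision keys are assumed.
DECISION_TO_TOOL = {
    "approve": "approve_merge",
    "reject": "reject_merge",
    "more_info": "compare_entities_semantic",
}


def tool_for_decision(decision: str) -> str:
    """Return the tool name for a user decision, or raise on an unknown one."""
    try:
        return DECISION_TO_TOOL[decision]
    except KeyError:
        raise ValueError(f"Unknown decision: {decision!r}") from None
```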
Proactive Triggers
After any extract_to_kg operation, automatically:
- Call find_duplicate_entities
- Process HIGH confidence matches silently (auto-merge)
- Report MEDIUM confidence matches to user
- Mention LOW confidence only if user asks
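The post-extraction routine above might look like the following sketch. The `find_duplicates` and `merge` callables are hypothetical wrappers around find_duplicate_entities and merge_entities_tool, and the candidate keys (`survivor_id`, `merged_id`, `confidence`) are assumed, since the tool's response format is not shown in this skill.

```python
from typing import Callable, Dict, Iterable, List


def after_extraction(
    project_id: str,
    find_duplicates: Callable[[str], Iterable[dict]],  # hypothetical wrapper
    merge: Callable[[str, str, str], None],            # hypothetical wrapper
    user_asked_for_low: bool = False,
) -> Dict[str, List[dict]]:
    """Triage duplicate candidates after an extract_to_kg run."""
    report: Dict[str, List[dict]] = {"merged": [], "needs_approval": [], "low": []}
    for cand in find_duplicates(project_id):
        if cand["confidence"] >= 0.9:
            # HIGH: merge without asking, but record it for the summary.
            merge(project_id, cand["survivor_id"], cand["merged_id"])
            report["merged"].append(cand)
        elif cand["confidence"] >= 0.7:
            # MEDIUM: leave pending and report to the user.
            report["needs_approval"].append(cand)
        elif user_asked_for_low:
            # LOW: only surfaced on request.
            report["low"].append(cand)
    return report
```

Passing the tool wrappers in as arguments keeps the triage logic testable without a running MCP server.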
Example Responses
After extraction:
"I extracted 12 entities and 8 relationships. I noticed 'Elon Musk' and 'Musk' appear to be the same person (94% confidence), so I merged them. I also found 'SpaceX' and 'Space X' might be duplicates (78% confidence) - should I merge these too?"
On duplicate scan:
"I scanned for duplicates and found 3 potential matches:
- 'OpenAI' and 'Open AI' (91% - auto-merged)
- 'Sam Altman' and 'Samuel Altman' (85% - awaiting your approval)
- 'Microsoft' and 'MS' (68% - low confidence, skipped)"
When comparing entities:
"Comparing 'Dr. John Smith' and 'J. Smith':
| Signal | Score |
|---|---|
| Name similarity | 72% |
| Same type (Person) | Yes |
| Shared connections | 3 |

Overall: 78% match. They share connections to MIT and OpenAI. Would you like to merge them?"
Available Tools
| Tool | Description |
|---|---|
| find_duplicate_entities | Scan for duplicates in a project |
| merge_entities_tool | Execute a merge directly (for high confidence) |
| review_pending_merges | See pending candidates awaiting approval |
| approve_merge | Approve a pending candidate |
| reject_merge | Reject a pending candidate |
| compare_entities_semantic | Deep comparison of two specific entities |
Tool Parameters
find_duplicate_entities
{
"project_id": "abc123",
"min_confidence": 0.7
}
merge_entities_tool
{
"project_id": "abc123",
"survivor_id": "node_to_keep",
"merged_id": "node_to_remove"
}
compare_entities_semantic
{
"project_id": "abc123",
"node_a_id": "first_entity_id",
"node_b_id": "second_entity_id"
}
Merge Behavior
When entities are merged:
- Survivor keeps its primary label
- Merged entity's label becomes an alias of survivor
- All aliases transfer to survivor
- All relationships redirect to survivor
- Properties merge (survivor wins on conflict)
- Source IDs combine for provenance tracking
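The merge rules above can be sketched on plain dictionaries. The record layout (`id`, `label`, `aliases`, `properties`, `source_ids`) and the edge tuples are assumptions made for this example, not the project's actual data model.

```python
def merge_entities(survivor: dict, merged: dict) -> dict:
    """Return a new survivor record that follows the merge rules listed above."""
    aliases = set(survivor.get("aliases", [])) | set(merged.get("aliases", []))
    aliases.add(merged["label"])        # merged label becomes an alias
    aliases.discard(survivor["label"])  # assumption: primary label is not its own alias
    # Later keys win in a dict merge, so the survivor's values override conflicts.
    properties = {**merged.get("properties", {}), **survivor.get("properties", {})}
    source_ids = sorted(set(survivor.get("source_ids", []))
                        | set(merged.get("source_ids", [])))
    return {
        "id": survivor["id"],
        "label": survivor["label"],     # survivor keeps its primary label
        "aliases": sorted(aliases),
        "properties": properties,
        "source_ids": source_ids,       # combined for provenance tracking
    }


def redirect_relationships(edges, merged_id, survivor_id):
    """Point every (source, relation, target) edge at the survivor."""
    return [
        (survivor_id if s == merged_id else s, rel,
         survivor_id if t == merged_id else t)
        for s, rel, t in edges
    ]
```

Both functions return new values instead of mutating their inputs, which makes a merge easy to inspect before it is committed.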
Error Handling
| Issue | Response |
|---|---|
| No project selected | "Please select a Knowledge Graph project first" |
| Empty graph | "Your graph doesn't have any entities yet. Extract content first" |
| No duplicates found | "No potential duplicates found above the confidence threshold" |
| Entity not found | "Entity 'X' was not found. It may have been merged or deleted" |
Follow-Up Suggestions Format
After presenting duplicate scan results, offer interactive follow-ups:
### Explore Further
- "Merge Sam Altman and Samuel Altman" - Merge these entities
- "Compare Sam Altman and Samuel Altman" - See detailed similarity analysis
- "Reject the Sam Altman merge" - Keep them as separate entities
- "Show me all pending merges" - Review all candidates
Integration with KG Insights
After merging entities, the graph may reveal new insights:
- "After merging, [Entity] is now connected to 5 more entities"
- "The merge resolved an isolated topic - [Entity] now links to the main graph"
- "Consider running ask_about_graph with question_type: key_entities to see updated rankings"
Best Practices
- Be transparent - Always explain what was merged and why
- Preserve information - Merged labels become aliases, nothing is lost
- Ask when uncertain - Only auto-merge at 90% confidence or higher
- Show evidence - Include signal breakdown for user decisions
- Suggest next steps - Offer to scan again or explore the updated graph
