
entity-resolution

by costiash

Reference implementation of an AI agent built with the Claude Agent SDK. Transforms videos into knowledge graphs via MCP tools: transcription, entity extraction, graph building, and natural language querying.

4 stars · 0 forks · Dec 30, 2025

SKILL.md


---
name: entity-resolution
description: Identifies and merges duplicate entities in Knowledge Graph projects. Use after extraction to consolidate duplicate entities, or when users ask about potential duplicates. Supports automatic merging for high-confidence matches and user confirmation for medium-confidence candidates.
---

Entity Resolution Skill

Identifies and merges duplicate entities in Knowledge Graph projects.

When to Use

  • Proactively after extraction: "I extracted 15 entities. Let me check for potential duplicates..."
  • On user request: "Can you check for duplicates?" or "These seem like the same person"
  • When the graph seems noisy: multiple similar-looking nodes that may be the same entity

Workflow

1. Scan for Duplicates

Call find_duplicate_entities with the project's project_id.

The tool uses multiple signals to detect duplicates:

  • String similarity (Jaro-Winkler on labels)
  • Alias overlap (Jaccard similarity)
  • Type matching (same entity type bonus)
  • Graph context (shared neighbors)
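The snippet below sketches how these four signals could be combined into a single confidence score. It is not the tool's actual implementation: the 0.5/0.2/0.2/0.1 weights, lowercasing the labels, the `Entity` class, and the use of Jaccard similarity over neighbor sets for graph context are all assumptions made for illustration. Jaro-Winkler uses the standard prefix scale of 0.1 and a maximum prefix of 4 characters.

```python
from dataclasses import dataclass, field


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters only "match" if they sit within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    half_transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - half_transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaccard(x: set, y: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


@dataclass
class Entity:  # hypothetical shape, for illustration only
    label: str
    type: str
    aliases: set = field(default_factory=set)
    neighbors: set = field(default_factory=set)  # ids of adjacent nodes


def duplicate_score(a: Entity, b: Entity) -> float:
    """Weighted blend of the four signals (weights are assumptions)."""
    name = jaro_winkler(a.label.lower(), b.label.lower())
    alias = jaccard({s.lower() for s in a.aliases}, {s.lower() for s in b.aliases})
    context = jaccard(a.neighbors, b.neighbors)
    type_bonus = 1.0 if a.type == b.type else 0.0
    return 0.5 * name + 0.2 * alias + 0.2 * context + 0.1 * type_bonus
```

Plain Jaro-Winkler rewards shared prefixes, so a pair like 'Elon Musk' and 'Musk' scores low on the name signal alone; that is one reason to blend it with alias and graph-context evidence.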

2. Apply Confidence-Based Automation

| Confidence | Action |
|---|---|
| >= 0.9 (HIGH) | Auto-merge with merge_entities_tool. Inform the user: "I merged X and Y (95% confident they're the same)" |
| 0.7 to < 0.9 (MEDIUM) | Ask the user: "I found potential duplicates: X and Y (82% match). Should I merge them?" |
| < 0.7 (LOW) | Mention if relevant: "X and Y might be related, but confidence is low (65%)" |
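A minimal sketch of the threshold table as a function, assuming confidence arrives as a float in [0, 1]. The `Tier` enum and the `tier_for` name are hypothetical; only the 0.7 and 0.9 boundaries come from the table (a score of exactly 0.9 is HIGH, exactly 0.7 is MEDIUM).

```python
from enum import Enum


class Tier(Enum):
    HIGH = "auto-merge"
    MEDIUM = "ask user"
    LOW = "mention if relevant"


HIGH_THRESHOLD = 0.9
MEDIUM_THRESHOLD = 0.7


def tier_for(confidence: float) -> Tier:
    """Map a duplicate-match confidence to the action tier from the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= HIGH_THRESHOLD:
        return Tier.HIGH
    if confidence >= MEDIUM_THRESHOLD:
        return Tier.MEDIUM
    return Tier.LOW
```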

3. Handle User Decisions

  • If the user approves: approve_merge (for a pending candidate) or merge_entities_tool (for a direct merge)
  • If the user rejects: reject_merge
  • If the user wants more information: compare_entities_semantic for a detailed analysis
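One way to encode these choices, assuming (from the tool descriptions under Available Tools) that approve_merge applies to candidates already queued as pending, while merge_entities_tool executes a merge directly. The function name and the `candidate_is_pending` flag are hypothetical.

```python
from typing import Literal

Decision = Literal["approve", "reject", "more_info"]


def tool_for_decision(decision: Decision, candidate_is_pending: bool) -> str:
    """Return the tool name for the user's reply to a suggested merge."""
    if decision == "approve":
        # Assumption: queued candidates are approved in place; others merge directly.
        return "approve_merge" if candidate_is_pending else "merge_entities_tool"
    if decision == "reject":
        return "reject_merge"
    if decision == "more_info":
        return "compare_entities_semantic"
    raise ValueError(f"unknown decision: {decision!r}")
```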

Proactive Triggers

After any extract_to_kg operation, automatically:

  1. Call find_duplicate_entities
  2. Process HIGH confidence matches silently (auto-merge)
  3. Report MEDIUM confidence matches to user
  4. Mention LOW confidence only if user asks
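A sketch of this post-extraction routine. `call_tool` is a hypothetical stand-in for however the agent invokes MCP tools, and the `Candidate` record (with the survivor already chosen) is an assumed shape for find_duplicate_entities results; the real response format is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Candidate:  # assumed shape of one find_duplicate_entities result
    survivor_id: str
    merged_id: str
    confidence: float


# Hypothetical: however the agent actually invokes an MCP tool by name.
ToolCaller = Callable[[str, dict], Any]


def after_extraction(project_id: str, call_tool: ToolCaller) -> dict:
    """Run the proactive duplicate check that follows extract_to_kg."""
    # Step 1: scan; anything below 0.7 is LOW and is not surfaced unprompted.
    candidates = call_tool("find_duplicate_entities",
                           {"project_id": project_id, "min_confidence": 0.7})
    report = {"merged": [], "ask_user": []}
    for c in candidates:
        if c.confidence >= 0.9:
            # Step 2: HIGH confidence, merge without asking (but still report it).
            call_tool("merge_entities_tool", {"project_id": project_id,
                                              "survivor_id": c.survivor_id,
                                              "merged_id": c.merged_id})
            report["merged"].append(c)
        elif c.confidence >= 0.7:
            # Step 3: MEDIUM confidence, collect for the user's decision.
            report["ask_user"].append(c)
        # Step 4: LOW confidence is skipped here; mention it only if the user asks.
    return report
```

The returned report maps directly onto a reply like the "After extraction" example: merged pairs are announced, and `ask_user` pairs become questions.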

Example Responses

After extraction:

"I extracted 12 entities and 8 relationships. I noticed 'Elon Musk' and 'Musk' appear to be the same person (94% confidence), so I merged them. I also found 'SpaceX' and 'Space X' might be duplicates (78% confidence) - should I merge these too?"

On duplicate scan:

"I scanned for duplicates and found 3 potential matches:

  1. 'OpenAI' and 'Open AI' (91% - auto-merged)
  2. 'Sam Altman' and 'Samuel Altman' (85% - awaiting your approval)
  3. 'Microsoft' and 'MS' (68% - low confidence, skipped)"

When comparing entities:

"Comparing 'Dr. John Smith' and 'J. Smith':

| Signal | Score |
|---|---|
| Name similarity | 72% |
| Same type (Person) | Yes |
| Shared connections | 3 |

Overall: 78% match. They share connections to MIT and OpenAI. Would you like to merge them?"

Available Tools

| Tool | Description |
|---|---|
| find_duplicate_entities | Scan for duplicates in a project |
| merge_entities_tool | Execute a merge directly (for high confidence) |
| review_pending_merges | See pending candidates awaiting approval |
| approve_merge | Approve a pending candidate |
| reject_merge | Reject a pending candidate |
| compare_entities_semantic | Deep comparison of two specific entities |

Tool Parameters

find_duplicate_entities

```json
{
  "project_id": "abc123",
  "min_confidence": 0.7
}
```

merge_entities_tool

```json
{
  "project_id": "abc123",
  "survivor_id": "node_to_keep",
  "merged_id": "node_to_remove"
}
```

compare_entities_semantic

```json
{
  "project_id": "abc123",
  "node_a_id": "first_entity_id",
  "node_b_id": "second_entity_id"
}
```
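If the agent builds these payloads in code, small typed helpers can catch obvious mistakes before a tool call. This is a sketch: the TypedDict names, the helper functions, and the validation rules (confidence within [0, 1], distinct survivor and merged IDs) are assumptions rather than documented tool behavior, and the 0.7 default simply mirrors the example payload.

```python
import json
from typing import TypedDict


class FindDuplicatesArgs(TypedDict):
    project_id: str
    min_confidence: float


class MergeArgs(TypedDict):
    project_id: str
    survivor_id: str  # node to keep
    merged_id: str    # node to remove


class CompareArgs(TypedDict):
    project_id: str
    node_a_id: str
    node_b_id: str


def find_duplicates_args(project_id: str, min_confidence: float = 0.7) -> FindDuplicatesArgs:
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    return {"project_id": project_id, "min_confidence": min_confidence}


def merge_args(project_id: str, survivor_id: str, merged_id: str) -> MergeArgs:
    if survivor_id == merged_id:
        raise ValueError("survivor_id and merged_id must be different nodes")
    return {"project_id": project_id, "survivor_id": survivor_id, "merged_id": merged_id}


def compare_args(project_id: str, node_a_id: str, node_b_id: str) -> CompareArgs:
    return {"project_id": project_id, "node_a_id": node_a_id, "node_b_id": node_b_id}


# Serialize to the same JSON shape as the parameter examples.
print(json.dumps(merge_args("abc123", "node_to_keep", "node_to_remove"), indent=2))
```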

Merge Behavior

When entities are merged:

  1. Survivor keeps its primary label
  2. Merged entity's label becomes an alias of survivor
  3. All aliases transfer to survivor
  4. All relationships redirect to survivor
  5. Properties merge (survivor wins on conflict)
  6. Source IDs combine for provenance tracking
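The six merge steps can be sketched over a simple in-memory graph. The `Node` and `Edge` classes and `merge_nodes` are hypothetical; dropping the self-loop created when the two nodes were linked to each other, and collapsing identical redirected edges, are extra assumptions that the list above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    aliases: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)
    source_ids: set = field(default_factory=set)  # provenance


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str


def merge_nodes(nodes: dict, edges: list, survivor_id: str, merged_id: str) -> list:
    """Merge merged_id into survivor_id in place; return the redirected edge list."""
    survivor = nodes[survivor_id]
    merged = nodes.pop(merged_id)
    # Steps 1-3: survivor keeps its label; the merged label and aliases become aliases.
    survivor.aliases |= {merged.label} | merged.aliases
    survivor.aliases.discard(survivor.label)
    # Step 5: properties merge, survivor's values win on conflicting keys.
    survivor.properties = {**merged.properties, **survivor.properties}
    # Step 6: combine source IDs for provenance.
    survivor.source_ids |= merged.source_ids
    # Step 4: point every edge at the survivor instead of the merged node.
    redirected = []
    for e in edges:
        src = survivor_id if e.source == merged_id else e.source
        dst = survivor_id if e.target == merged_id else e.target
        if src != dst:  # assumption: drop self-loops created by the merge
            redirected.append(Edge(src, dst, e.relation))
    # Assumption: collapse identical edges while keeping their order.
    return list(dict.fromkeys(redirected))
```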

Error Handling

| Issue | Response |
|---|---|
| No project selected | "Please select a Knowledge Graph project first" |
| Empty graph | "Your graph doesn't have any entities yet. Extract content first" |
| No duplicates found | "No potential duplicates found above the confidence threshold" |
| Entity not found | "Entity 'X' was not found. It may have been merged or deleted" |
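A sketch of how these responses might be kept in one place so every tool path reports them consistently. The issue keys, `ERROR_MESSAGES`, `error_message`, and `scan_precheck` are hypothetical names; only the message texts come from the table.

```python
from typing import Optional

# Message templates from the Error Handling table, keyed by hypothetical issue names.
ERROR_MESSAGES = {
    "no_project": "Please select a Knowledge Graph project first",
    "empty_graph": "Your graph doesn't have any entities yet. Extract content first",
    "no_duplicates": "No potential duplicates found above the confidence threshold",
    "entity_not_found": "Entity '{label}' was not found. It may have been merged or deleted",
}


def error_message(issue: str, **details: str) -> str:
    """Fill in a template, e.g. error_message("entity_not_found", label="X")."""
    return ERROR_MESSAGES[issue].format(**details)


def scan_precheck(project_id: Optional[str], entity_count: int) -> Optional[str]:
    """Return a user-facing message if a duplicate scan cannot run, else None."""
    if not project_id:
        return error_message("no_project")
    if entity_count == 0:
        return error_message("empty_graph")
    return None
```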

Follow-Up Suggestions Format

After presenting duplicate scan results, offer interactive follow-ups:

### Explore Further

- "Merge Sam Altman and Samuel Altman" - Merge these entities
- "Compare Sam Altman and Samuel Altman" - See detailed similarity analysis
- "Reject the Sam Altman merge" - Keep them as separate entities
- "Show me all pending merges" - Review all candidates

Integration with KG Insights

After merging entities, the graph may reveal new insights:

  • "After merging, [Entity] is now connected to 5 more entities"
  • "The merge resolved an isolated topic - [Entity] now links to the main graph"
  • "Consider running ask_about_graph with question_type: key_entities to see updated rankings"
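A sketch of how the "now connected to N more entities" and "isolated topic" observations could be computed, assuming an undirected adjacency map captured before the merge. `merge_insights` and its return keys are hypothetical, and "isolated" is simplified here to "had no neighbors other than the merged node", not a full connected-component check.

```python
def merge_insights(adjacency: dict, survivor_id: str, merged_id: str) -> dict:
    """Compare the survivor's neighborhood before and after absorbing merged_id.

    adjacency maps node id -> set of neighbor ids (undirected, pre-merge).
    """
    # A direct survivor-merged link disappears after the merge, so ignore it.
    before = adjacency.get(survivor_id, set()) - {merged_id}
    after = (before | adjacency.get(merged_id, set())) - {survivor_id, merged_id}
    return {
        "new_neighbors": after - before,
        "was_isolated": not before and bool(after),
    }
```

`len(merge_insights(adj, keep_id, drop_id)["new_neighbors"])` gives the N for the first message; `was_isolated` signals when the second message applies.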

Best Practices

  1. Be transparent - Always explain what was merged and why
  2. Preserve information - Merged labels become aliases, so nothing is lost
  3. Ask when uncertain - Only auto-merge at or above 90% confidence
  4. Show evidence - Include signal breakdown for user decisions
  5. Suggest next steps - Offer to scan again or explore the updated graph

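Score

Total Score: 75/100

Based on repository quality metrics

| Criterion | Requirement | Points |
|---|---|---|
| SKILL.md | Includes a SKILL.md file | +20 |
| LICENSE | A license is set | +10 |
| Description | Has a description of 100+ characters | +10 |
| Popularity | 100+ GitHub Stars | 0/15 |
| Recent activity | Updated within the last 3 months | +5 |
| Forks | Forked 10+ times | 0/5 |
| Issue management | Fewer than 50 open issues | +5 |
| Language | A programming language is set | +5 |
| Tags | At least one tag is set | +5 |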
