check-health

by harehimself

ETL system utilizing the DuxSoup API for programmatic LinkedIn extraction. The project is a data extraction pipeline that automatically retrieves extensive LinkedIn profile data from first-degree connections for network analysis and relationship intelligence applications.

1🍴 0📅 Jan 23, 2026

SKILL.md


name: check-health
description: Run comprehensive data health checks on the database, reporting on data quality issues, dead letters, duplicates, and anomalies. Use when you need to assess overall data quality, identify issues requiring attention, or perform daily data quality monitoring.

Check Health

Purpose: Quickly assess overall data quality and identify issues requiring attention.

Instructions for Claude

When this skill is invoked:

  1. Create and run a Node.js script that connects to MongoDB and performs these checks:

    Basic Stats:

    • Total people count
    • Total visits count
    • Total scans count
    • Total dead letters count
    • Recent observations (last 24h, last 7d)

    Identity Issues:

    • People without canonical_id
    • People with missing Sales Nav ID
    • People with unstable IDs (URL-based _id)
    • Duplicate aliases (same alias.value on multiple people)

    Data Quality:

    • People without observations
    • People with missing critical fields (fullName, currentTitle, etc.)
    • Orphaned observations (not linked to any person)
    • Observations with missing stable identifiers

    Dead Letters:

    • Count by status (pending, replayed, failed_again)
    • Most common error types
    • Oldest pending dead letter
    • Recent failure rate

    Role & Company Issues:

    • People with overlapping role timelines
    • Companies missing canonical IDs
    • Locations missing structured data
  2. Output format:

    DATABASE HEALTH CHECK
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    📊 BASIC STATS
    ✓ People: [count]
    ✓ Visits: [count]
    ✓ Scans: [count]
    ⚠ Dead Letters: [count]
    
    📈 RECENT ACTIVITY
    • Last 24h: [count] observations
    • Last 7d: [count] observations
    
    🔍 IDENTITY ISSUES [priority: high]
    ⚠ Missing canonical_id: [count]
    ⚠ URL-based IDs: [count]
    ⚠ Duplicate aliases: [count]
    
    📋 DATA QUALITY [priority: medium]
    • Missing names: [count]
    • Missing positions: [count]
    • Orphaned observations: [count]
    
    💀 DEAD LETTERS
    • Pending: [count]
    • Failed again: [count]
    • Common errors:
      - [error type]: [count]
      - [error type]: [count]
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    [SUMMARY]
    Overall health: [GOOD/WARNING/CRITICAL]
    Priority actions: [list top 3 issues to fix]
    
  3. Priority levels:

    • CRITICAL: more than 10% of people have identity issues
    • WARNING: more than 5% of people have quality issues, or more than 100 dead letters are pending
    • GOOD: fewer than 5% of people have issues overall
  4. Detailed mode (--detailed flag):

    • Show sample records for each issue type
    • List specific people with problems
    • Provide MongoDB queries to investigate further
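
As a rough illustration of the kind of queries the checks and the detailed mode call for, here is a minimal sketch of a few filters and one aggregation pipeline written as plain objects. The `aliases` array, the `observedAt` field, and the `^https?://` pattern for URL-based IDs are assumptions made for this example; the actual schema may differ.

```javascript
// Hypothetical field names: `aliases`, `observedAt`, and the URL pattern
// are assumptions for illustration, not confirmed by the skill.

// Filter: people with no canonical_id.
// In MongoDB, { field: null } matches both null and missing fields.
const missingCanonicalId = { canonical_id: null };

// Filter: people whose _id looks like a URL (assumed to start with http/https).
const urlBasedId = { _id: { $regex: '^https?://' } };

// Pipeline: alias values that appear on more than one person.
const duplicateAliases = [
  { $unwind: '$aliases' },
  { $group: { _id: '$aliases.value', personIds: { $addToSet: '$_id' } } },
  // 'personIds.1' exists only when the set holds at least two people.
  { $match: { 'personIds.1': { $exists: true } } },
  { $sort: { _id: 1 } },
];

// Filter: observations newer than `hours` before `now`.
function observedSince(hours, now = new Date()) {
  const cutoff = new Date(now.getTime() - hours * 60 * 60 * 1000);
  return { observedAt: { $gte: cutoff } };
}
```

With the official `mongodb` driver, filters like these would typically be passed to `collection.countDocuments(filter)` and pipelines to `collection.aggregate(pipeline).toArray()`.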

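The priority levels in step 3 leave some cases open (for example, an issue rate of exactly 5%). The sketch below is one possible reading, not the skill's definitive rule: CRITICAL is checked first, GOOD requires the combined issue rate to stay under 5% with no more than 100 pending dead letters, and everything else is WARNING. The function name, the input shape, and the choice to sum identity and quality counts into an "overall" rate are all assumptions.

```javascript
// Hypothetical helper: one interpretation of the stated thresholds.
function classifyHealth({ totalPeople, identityIssues, qualityIssues, pendingDeadLetters }) {
  // Percentage of people affected; an empty database counts as 0%.
  const pct = (n) => (totalPeople > 0 ? (n / totalPeople) * 100 : 0);

  // CRITICAL: more than 10% of people have identity issues.
  if (pct(identityIssues) > 10) return 'CRITICAL';

  // GOOD: under 5% issues overall (assumed: identity + quality counts)
  // and the dead-letter backlog is within the stated limit.
  const overall = pct(identityIssues + qualityIssues);
  if (overall < 5 && pendingDeadLetters <= 100) return 'GOOD';

  // Anything in between is reported as WARNING.
  return 'WARNING';
}
```

Keeping this as a pure function makes the thresholds easy to adjust and to unit-test without a database connection.
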
Error Handling

  • If the database connection fails, show a clear error message
  • If any check fails, continue with the others and note the failure
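
The "continue with the others" rule can be sketched as a small runner that isolates each check in its own `try`/`catch`. The `runChecks` name and the shape of the result objects are hypothetical, chosen only for this example.

```javascript
// Hypothetical runner: executes named async checks one by one and
// records a failure instead of aborting the whole health check.
async function runChecks(checks) {
  const results = [];
  for (const [name, check] of Object.entries(checks)) {
    try {
      const value = await check();
      results.push({ name, ok: true, value });
    } catch (err) {
      // Note the failure and move on to the next check.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name, ok: false, error: message });
    }
  }
  return results;
}
```

Failed entries can then be listed in the report next to the successful counts. A connection failure is different: it would normally be caught once around the initial connect call, reported clearly, and end the run, since no check can proceed without a database.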

Score

Total Score

75/100

Based on repository quality metrics

SKILL.md

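Includes a SKILL.md file

+20
LICENSE

A license is set

+10
Description

Has a description of 100 characters or more

+10
Popularity

100 or more GitHub stars

0/15
Recent activity

Updated within the last month

+10
Forks

Forked 10 or more times

0/5
Issue management

Fewer than 50 open issues

+5
Language

A programming language is set

+5
Tags

At least one tag is set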

+5
