Back to list
GOATnote-Inc

phi-detection

by GOATnote-Inc

Open-source medical LLM safety evaluation pipeline with reproducible benchmarks and high-risk clinical failure analysis.

3🍴 1📅 Jan 24, 2026

SKILL.md


name: phi_detection description: > Scan repository for Protected Health Information (PHI) using HIPAA Safe Harbor patterns. Ensures evaluation data remains synthetic-only. version: 1.0.0 author: ScribeGoat2 Team license: MIT safety_level: critical

PHI Detection Skill

Purpose

Ensure no Protected Health Information (PHI) enters the evaluation pipeline. Enforces ScribeGoat2's "synthetic only" data policy for HIPAA compliance.

When to Use

  • Before committing new scenario files
  • CI/CD pre-merge validation
  • Periodic repository audits
  • Before sharing evaluation data externally

Triggers

  • "scan for PHI"
  • "check for protected health information"
  • "validate data is synthetic"
  • "run PHI detection"

Tools

# Full repository scan (CI mode)
python scripts/detect_phi.py --strict

# Scan specific directory
python scripts/detect_phi.py --path bloom_medical_eval/scenarios/

# Show verbose matches
python scripts/detect_phi.py --verbose

Prerequisites

  • Python 3.11+
  • No external dependencies (uses stdlib only)

Input Schema

path:
  type: path
  default: "."
  description: Directory or file to scan
strict:
  type: boolean
  default: false
  description: Fail on warnings (provenance metadata)
verbose:
  type: boolean
  default: false
  description: Show all matched patterns

Output Schema

status: enum           # pass, fail, warning
phi_detected: boolean
matches:
  - file: string
    pattern: string
    severity: enum     # HIGH, MEDIUM, LOW
    examples: [string]
    count: integer
files_scanned: integer
excluded_directories: [string]

PHI Patterns Detected

PatternSeverityExample
SSNHIGH123-45-6789
Medical Record NumberHIGHMRN: 12345678
Full Date of BirthHIGHDOB: 01/15/1985
Phone NumberMEDIUM555-123-4567
Personal EmailMEDIUMjohn.doe@gmail.com
Street AddressMEDIUM123 Main Street
Patient Full NameHIGHPatient: John Smith

Whitelist Patterns

The following patterns are not flagged (legitimate use cases):

  • Example domains (example.com)
  • Fake phone numbers (555-xxxx)
  • Toll-free numbers (800-xxx-xxxx, 888-xxx-xxxx, etc.)
  • Crisis hotlines (988)
  • Medical abbreviations (PT, ST elevation)

Excluded Directories

These directories contain evaluation artifacts and are excluded:

  • results* - Evaluation outputs
  • reports/ - Generated reports
  • experiments/ - Experimental data
  • .private/ - Private test data
  • scripts/ - Source code
  • docs/ - Documentation

Success Criteria

MetricThresholdSeverity
phi_detectedfalse🔴 Critical
high_severity_matches0🔴 Critical
medium_severity_matches0🟠 High

Safety Gates

- gate: no_phi_detected
  metric: phi_detected
  operator: "=="
  threshold: false
  action: block_merge
  severity: critical

- gate: no_high_severity
  metric: high_severity_matches
  operator: "=="
  threshold: 0
  action: block_merge
  severity: critical

HIPAA Safe Harbor Compliance

This skill implements detection of HIPAA's 18 Safe Harbor identifiers:

  1. Names ✅
  2. Geographic data ✅
  3. Dates (except year) ✅
  4. Phone numbers ✅
  5. Fax numbers ✅
  6. Email addresses ✅
  7. Social Security numbers ✅
  8. Medical record numbers ✅
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers
  13. Device identifiers
  14. Web URLs
  15. IP addresses
  16. Biometric identifiers
  17. Full-face photographs
  18. Any other unique identifying number
  • bloom_integrity_verification - Verify data integrity after PHI check
  • crisis_persistence_eval - Requires PHI-clean scenarios

Documentation

Score

Total Score

75/100

Based on repository quality metrics

SKILL.md

SKILL.mdファイルが含まれている

+20
LICENSE

ライセンスが設定されている

+10
説明文

100文字以上の説明がある

+10
人気

GitHub Stars 100以上

0/15
最近の活動

1ヶ月以内に更新

+10
フォーク

10回以上フォークされている

0/5
Issue管理

オープンIssueが50未満

+5
言語

プログラミング言語が設定されている

+5
タグ

1つ以上のタグが設定されている

+5

Reviews

💬

Reviews coming soon