
phi-detection
by GOATnote-Inc
Open-source medical LLM safety evaluation pipeline with reproducible benchmarks and high-risk clinical failure analysis.
⭐ 3 🍴 1 📅 January 24, 2026
SKILL.md
name: phi_detection
description: >
  Scan repository for Protected Health Information (PHI) using HIPAA Safe Harbor patterns.
  Ensures evaluation data remains synthetic-only.
version: 1.0.0
author: ScribeGoat2 Team
license: MIT
safety_level: critical
PHI Detection Skill
Purpose
Ensure no Protected Health Information (PHI) enters the evaluation pipeline. Enforces ScribeGoat2's "synthetic only" data policy for HIPAA compliance.
When to Use
- Before committing new scenario files
- CI/CD pre-merge validation
- Periodic repository audits
- Before sharing evaluation data externally
Triggers
- "scan for PHI"
- "check for protected health information"
- "validate data is synthetic"
- "run PHI detection"
Tools
# Full repository scan (CI mode)
python scripts/detect_phi.py --strict
# Scan specific directory
python scripts/detect_phi.py --path bloom_medical_eval/scenarios/
# Show verbose matches
python scripts/detect_phi.py --verbose
Prerequisites
- Python 3.11+
- No external dependencies (uses stdlib only)
Input Schema
path:
  type: path
  default: "."
  description: Directory or file to scan
strict:
  type: boolean
  default: false
  description: Fail on warnings (provenance metadata)
verbose:
  type: boolean
  default: false
  description: Show all matched patterns
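The input schema maps directly onto the command-line flags shown under Tools. The following is a minimal sketch of such a parser using `argparse`; `build_parser` is a hypothetical name, and the real `scripts/detect_phi.py` may parse its arguments differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the input schema (not the project's actual code)."""
    parser = argparse.ArgumentParser(description="Scan for PHI patterns")
    parser.add_argument("--path", type=Path, default=Path("."),
                        help="Directory or file to scan")
    parser.add_argument("--strict", action="store_true",
                        help="Fail on warnings (provenance metadata)")
    parser.add_argument("--verbose", action="store_true",
                        help="Show all matched patterns")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--path", "bloom_medical_eval/scenarios/", "--strict"])
print(args.path, args.strict, args.verbose)
```

Using `store_true` gives both boolean flags the `false` default that the schema specifies.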
Output Schema
status: enum  # pass, fail, warning
phi_detected: boolean
matches:
  - file: string
    pattern: string
    severity: enum  # HIGH, MEDIUM, LOW
    examples: [string]
    count: integer
files_scanned: integer
excluded_directories: [string]
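For consumers of the scan output, the schema can be modeled with dataclasses. This is a sketch only: the class names `Match` and `ScanResult` are hypothetical, and it assumes `count` is a per-match field rather than a top-level total, which the schema does not make explicit.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Match:
    file: str
    pattern: str
    severity: str                      # "HIGH", "MEDIUM", or "LOW"
    examples: list[str] = field(default_factory=list)
    count: int = 0                     # assumed to be per-match, see lead-in


@dataclass
class ScanResult:
    status: str                        # "pass", "fail", or "warning"
    phi_detected: bool
    matches: list[Match] = field(default_factory=list)
    files_scanned: int = 0
    excluded_directories: list[str] = field(default_factory=list)


# Illustrative values only, not real scan output.
result = ScanResult(status="pass", phi_detected=False, files_scanned=42)
print(json.dumps(asdict(result), indent=2))
```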
PHI Patterns Detected
| Pattern | Severity | Example |
|---|---|---|
| SSN | HIGH | 123-45-6789 |
| Medical Record Number | HIGH | MRN: 12345678 |
| Full Date of Birth | HIGH | DOB: 01/15/1985 |
| Phone Number | MEDIUM | 555-123-4567 |
| Personal Email | MEDIUM | john.doe@gmail.com |
| Street Address | MEDIUM | 123 Main Street |
| Patient Full Name | HIGH | Patient: John Smith |
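To illustrate how rows of the table above could translate into detectors, here is a sketch with regular expressions for four of the patterns. The regexes and the `find_phi` helper are assumptions written for this example; the patterns actually used by `scripts/detect_phi.py` may be stricter or broader.

```python
import re

# Illustrative regexes only; the project's real patterns may differ.
PHI_PATTERNS = {
    "SSN": ("HIGH", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    "Medical Record Number": ("HIGH", re.compile(r"\bMRN:\s*\d{6,10}\b")),
    "Full Date of Birth": ("HIGH", re.compile(r"\bDOB:\s*\d{1,2}/\d{1,2}/\d{4}\b")),
    "Phone Number": ("MEDIUM", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
}


def find_phi(text: str) -> list[tuple[str, str, str]]:
    """Return (pattern name, severity, matched text) for every hit in text."""
    hits = []
    for name, (severity, regex) in PHI_PATTERNS.items():
        for m in regex.finditer(text):
            hits.append((name, severity, m.group(0)))
    return hits


for hit in find_phi("Patient DOB: 01/15/1985, MRN: 12345678"):
    print(hit)
```

Names and street addresses are left out of the sketch because they are much harder to match reliably with a single regular expression.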
Whitelist Patterns
The following patterns are not flagged (legitimate use cases):
- Example domains (example.com)
- Fake phone numbers (555-xxxx)
- Toll-free numbers (800-xxx-xxxx, 888-xxx-xxxx, etc.)
- Crisis hotlines (988)
- Medical abbreviations (PT, ST elevation)
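A whitelist check of this kind could be applied to each candidate match before it is reported. The sketch below is hypothetical: `is_whitelisted` and the constants are illustrative names, the set of toll-free prefixes is one reading of the "etc." above, and medical abbreviations are not handled.

```python
import re

# Hypothetical allowlist rules based on the categories listed above.
ALLOWED_EMAIL_DOMAINS = {"example.com"}
TOLL_FREE_PREFIXES = {"800", "833", "844", "855", "866", "877", "888"}


def is_whitelisted(match_text: str) -> bool:
    """Return True if a candidate match falls under an allowlisted category."""
    text = match_text.strip().lower()
    if "@" in text:
        # Email addresses: allow only reserved example domains.
        return text.rsplit("@", 1)[1] in ALLOWED_EMAIL_DOMAINS
    if text == "988":
        # Crisis hotline number.
        return True
    if re.fullmatch(r"555-\d{4}", text):
        # Seven-digit fake number in the 555 exchange.
        return True
    phone = re.fullmatch(r"(\d{3})-\d{3}-\d{4}", text)
    # Ten-digit numbers pass only when the area code is toll-free.
    return bool(phone) and phone.group(1) in TOLL_FREE_PREFIXES


print(is_whitelisted("jane@example.com"), is_whitelisted("john.doe@gmail.com"))
```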
Excluded Directories
These directories contain evaluation artifacts and are excluded:
- results* - Evaluation outputs
- reports/ - Generated reports
- experiments/ - Experimental data
- .private/ - Private test data
- scripts/ - Source code
- docs/ - Documentation
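One way to honor these exclusions during a scan is to prune matching directories while walking the tree. The sketch below is an assumption about how that could work: `iter_scannable_files` is a hypothetical helper, and it excludes matching directory names at any depth, which may not match the script's actual behavior.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator

# Directory-name patterns taken from the list above; "results*" is a glob.
EXCLUDED_DIR_PATTERNS = ["results*", "reports", "experiments", ".private", "scripts", "docs"]


def iter_scannable_files(root: Path) -> Iterator[Path]:
    """Yield files under root, skipping any directory whose name matches an exclusion."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] prunes the walk in place, so excluded
        # directories are never descended into.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, pat) for pat in EXCLUDED_DIR_PATTERNS)
        ]
        for name in filenames:
            yield Path(dirpath) / name
```

Pruning during the walk avoids reading large result directories at all, rather than filtering their files afterwards.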
Success Criteria
| Metric | Threshold | Severity |
|---|---|---|
| phi_detected | false | 🔴 Critical |
| high_severity_matches | 0 | 🔴 Critical |
| medium_severity_matches | 0 | 🟠 High |
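The three metrics in this table can be derived from the list of matches. A minimal sketch follows, assuming each match record carries a `severity` field as in the output schema, that any match at all sets `phi_detected`, and that the counts are per match record; `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(matches: list[dict]) -> dict:
    """Aggregate match records into the three success-criteria metrics."""
    by_severity = Counter(m["severity"] for m in matches)
    return {
        "phi_detected": bool(matches),                 # assumption: any match counts
        "high_severity_matches": by_severity["HIGH"],
        "medium_severity_matches": by_severity["MEDIUM"],
    }


print(summarize([{"severity": "HIGH"}, {"severity": "MEDIUM"}]))
```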
Safety Gates
- gate: no_phi_detected
  metric: phi_detected
  operator: "=="
  threshold: false
  action: block_merge
  severity: critical
- gate: no_high_severity
  metric: high_severity_matches
  operator: "=="
  threshold: 0
  action: block_merge
  severity: critical
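Gate definitions like these can be evaluated generically by mapping the `operator` string to a comparison function. The sketch below transcribes the two gates into Python; `failed_gates` and the operator table are hypothetical, and how the pipeline actually enforces `block_merge` is not specified here.

```python
import operator

# Gate definitions transcribed from the YAML above.
GATES = [
    {"gate": "no_phi_detected", "metric": "phi_detected", "operator": "==",
     "threshold": False, "action": "block_merge", "severity": "critical"},
    {"gate": "no_high_severity", "metric": "high_severity_matches", "operator": "==",
     "threshold": 0, "action": "block_merge", "severity": "critical"},
]

# Only "==" appears in the gates; the other operators are included as an assumption.
OPERATORS = {"==": operator.eq, "!=": operator.ne, "<=": operator.le, ">=": operator.ge}


def failed_gates(metrics: dict) -> list[str]:
    """Return the names of gates whose condition does not hold for the given metrics."""
    return [
        g["gate"] for g in GATES
        if not OPERATORS[g["operator"]](metrics[g["metric"]], g["threshold"])
    ]


print(failed_gates({"phi_detected": True, "high_severity_matches": 2}))
```

A CI wrapper could then exit with a non-zero status whenever the returned list is non-empty.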
HIPAA Safe Harbor Compliance
This skill targets HIPAA's 18 Safe Harbor identifiers. Identifiers marked ✅ are covered by the current detection patterns:
- Names ✅
- Geographic data ✅
- Dates (except year) ✅
- Phone numbers ✅
- Fax numbers ✅
- Email addresses ✅
- Social Security numbers ✅
- Medical record numbers ✅
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers
- Device identifiers
- Web URLs
- IP addresses
- Biometric identifiers
- Full-face photographs
- Any other unique identifying number
Related Skills
- bloom_integrity_verification - Verify data integrity after PHI check
- crisis_persistence_eval - Requires PHI-clean scenarios
Documentation


