
clinical-research-pitfalls
by hannesill
Give your AI agents clinical intelligence & access to MIMIC-IV, eICU, and more. This project provides them with MCP, Code Execution, and clinical agent skills.
SKILL.md
name: clinical-research-pitfalls
description: Avoid common methodological mistakes in clinical research with MIMIC-IV and eICU databases. Covers immortal time bias, information leakage, selection bias, and other critical pitfalls.
license: Apache-2.0
metadata:
  author: m4-clinical-extraction
  version: "1.0"
  database: both
  category: data-quality
  source: https://github.com/MIT-LCP/mimic-code
  validated: true
Clinical Research Pitfalls
This skill documents common methodological mistakes in ICU database research and how to avoid them. These errors can invalidate study conclusions.
When to Use This Skill
- Designing research studies
- Reviewing analysis plans
- Debugging unexpected results
- Peer review of methods
1. Immortal Time Bias
Definition
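A span of follow-up during which the outcome cannot occur by design, typically because a patient must survive it to be classified as exposed. Counting this time as exposed follow-up makes the exposure look protective.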
Common Mistake
-- WRONG: Patients who "received Drug X during ICU stay"
-- Survival bias: Must survive long enough to receive the drug
SELECT stay_id
FROM mimiciv_derived.antibiotic
WHERE antibiotic LIKE '%vancomycin%';
Correct Approach
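-- CORRECT: Define exposure at a fixed time point (e.g., first 24h)
-- Both groups should also be restricted to patients still alive and
-- in the ICU at 24h, so follow-up starts at the same time zero
SELECT DISTINCT ab.stay_id
FROM mimiciv_derived.antibiotic ab
INNER JOIN mimiciv_icu.icustays ie ON ab.stay_id = ie.stay_id
WHERE LOWER(ab.antibiotic) LIKE '%vancomycin%'
AND ab.starttime >= ie.intime
AND ab.starttime <= DATETIME_ADD(ie.intime, INTERVAL 24 HOUR);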
Key Principle
- Define exposure status at a fixed time point (e.g., ICU admission, 24 hours, 48 hours)
- Time zero should be the same for exposed and unexposed groups
- Consider landmark analysis or time-varying covariates
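The principles above can be sketched as a small landmark analysis in Python. This is a minimal illustration, not a reference implementation: the 24-hour landmark, the dictionary keys (`intime`, `outtime`, `first_drug_time`), and the example stays are all assumptions made for the sketch.

```python
from datetime import datetime, timedelta

LANDMARK = timedelta(hours=24)  # assumed landmark; set this per study protocol


def landmark_cohort(stays, landmark=LANDMARK):
    """Classify exposure at the landmark and drop stays that ended before it.

    Each stay is a dict with hypothetical keys: stay_id, intime, outtime,
    and first_drug_time (None if the drug was never given).
    """
    cohort = []
    for s in stays:
        cutoff = s["intime"] + landmark
        # Stays that ended (death or discharge) before the landmark are
        # excluded from BOTH groups, so neither group has immortal time.
        if s["outtime"] < cutoff:
            continue
        drug = s["first_drug_time"]
        # Drug started after the landmark counts as unexposed at time zero.
        exposed = drug is not None and s["intime"] <= drug <= cutoff
        cohort.append(
            {"stay_id": s["stay_id"], "exposed": exposed, "time_zero": cutoff}
        )
    return cohort


t0 = datetime(2150, 1, 1, 8, 0)
stays = [
    {"stay_id": 1, "intime": t0, "outtime": t0 + timedelta(hours=72),
     "first_drug_time": t0 + timedelta(hours=6)},
    {"stay_id": 2, "intime": t0, "outtime": t0 + timedelta(hours=72),
     "first_drug_time": t0 + timedelta(hours=40)},
    {"stay_id": 3, "intime": t0, "outtime": t0 + timedelta(hours=10),
     "first_drug_time": None},
]
cohort = landmark_cohort(stays)
for row in cohort:
    print(row["stay_id"], row["exposed"])
```

Stay 2 receives the drug at hour 40 and is therefore analyzed as unexposed, while stay 3 ends before the landmark and is dropped from both groups. Follow-up for every remaining stay starts at `time_zero`.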
2. Information Leakage (Future Data)
Definition
Using information that would not be available at the time of prediction/decision.
Common Mistake
-- WRONG: Using diagnosis codes for prediction at admission
-- ICD codes are assigned at discharge!
SELECT hadm_id, icd_code
FROM mimiciv_hosp.diagnoses_icd
WHERE icd_code LIKE 'I21%'; -- MI diagnosis
Correct Approach
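-- CORRECT: Use information recorded at admission (e.g., admission type,
-- chief complaint), or clearly acknowledge this is retrospective phenotyping
-- Note: MIMIC-IV codes emergency admissions as 'EW EMER.' / 'DIRECT EMER.'
SELECT hadm_id
FROM mimiciv_hosp.admissions
WHERE LOWER(admission_type) LIKE '%emer%';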
Common Sources of Leakage
- Diagnosis codes: Assigned at discharge
- Procedure codes: May be coded after completion
- Length of stay: Only known at discharge
- Discharge disposition: Future information
- Labs ordered later: Not available at admission
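One way to guard against these sources is to attach an "available at" timestamp to every candidate feature and filter on it before modeling. The sketch below assumes a hypothetical feature list and a prediction made 24 hours after admission; the feature names and timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def available_features(features, prediction_time):
    """Return the names of features whose values were known at prediction_time.

    `features` is a list of (name, available_at) pairs, where available_at
    is the moment the value was recorded in the source system.
    """
    return [name for name, available_at in features
            if available_at <= prediction_time]


admit = datetime(2150, 1, 1, 8, 0)
discharge = admit + timedelta(days=6)
features = [
    ("first_heart_rate", admit + timedelta(hours=1)),
    ("lactate_day2", admit + timedelta(hours=30)),   # lab ordered later
    ("icd_code", discharge),                         # coded at discharge
    ("length_of_stay", discharge),                   # known only at discharge
]
usable = available_features(features, admit + timedelta(hours=24))
print(usable)  # ['first_heart_rate']
```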
3. Selection Bias
Definition
Systematic differences between study groups due to how subjects were selected.
Common Mistakes
Survivor Bias:
-- WRONG: Selecting patients who have 7-day labs
-- Excludes early deaths and early discharges
-- (chemistry is keyed by hadm_id, so join through icustays)
SELECT DISTINCT ie.stay_id
FROM mimiciv_icu.icustays ie
INNER JOIN mimiciv_derived.chemistry ch ON ie.hadm_id = ch.hadm_id
WHERE ch.charttime >= DATETIME_ADD(ie.intime, INTERVAL 7 DAY);
Data Availability Bias:
-- WRONG: Patients with complete data
-- Complete cases may be systematically different
SELECT *
FROM mimiciv_derived.sofa
WHERE respiration_24hours IS NOT NULL
AND coagulation_24hours IS NOT NULL
AND liver_24hours IS NOT NULL
AND cardiovascular_24hours IS NOT NULL
AND cns_24hours IS NOT NULL
AND renal_24hours IS NOT NULL;
Correct Approach
- Report exclusions explicitly in a CONSORT-style flow diagram
- Analyze whether excluded patients differ
- Consider imputation for missing data
- Use intention-to-treat principles
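The first two points can be sketched in a few lines of Python: apply each exclusion in order, record the remaining count for the flow diagram, and compare the outcome between included and excluded patients. The cohort rows, field names, and criteria below are hypothetical.

```python
def apply_exclusions(cohort, criteria):
    """Apply exclusion criteria in order and record the count left after each."""
    flow = [("Initial cohort", len(cohort))]
    remaining = list(cohort)
    for label, keep in criteria:
        remaining = [row for row in remaining if keep(row)]
        flow.append((label, len(remaining)))
    return remaining, flow


def mortality(rows):
    """Fraction of rows with died == 1."""
    return sum(r["died"] for r in rows) / len(rows)


cohort = [
    {"stay_id": 1, "age": 70, "los_hours": 100, "died": 0},
    {"stay_id": 2, "age": 16, "los_hours": 50, "died": 0},
    {"stay_id": 3, "age": 80, "los_hours": 10, "died": 1},
    {"stay_id": 4, "age": 55, "los_hours": 30, "died": 0},
    {"stay_id": 5, "age": 62, "los_hours": 5, "died": 1},
]
criteria = [
    ("Adults (age >= 18)", lambda r: r["age"] >= 18),
    ("ICU stay >= 24 h", lambda r: r["los_hours"] >= 24),
]
included, flow = apply_exclusions(cohort, criteria)
included_ids = {r["stay_id"] for r in included}
excluded = [r for r in cohort if r["stay_id"] not in included_ids]

for label, n in flow:
    print(f"{label}: n = {n}")
print(f"Mortality, included: {mortality(included):.2f}")
print(f"Mortality, excluded: {mortality(excluded):.2f}")
```

In this toy cohort the length-of-stay criterion removes both deaths, so the included group shows 0% mortality against 67% among the excluded. A gap of that kind is the signal to report and discuss.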
4. Confounding by Indication
Definition
Treatment assignment is associated with prognosis, creating spurious treatment effects.
Example
Sicker patients receive more aggressive treatment, making treatment appear harmful:
-- WRONG: Comparing mortality by vasopressor use
-- Vasopressors given to sicker patients
SELECT
CASE WHEN v.stay_id IS NOT NULL THEN 'Vasopressor' ELSE 'No Vasopressor' END AS treatment,
AVG(a.hospital_expire_flag) AS mortality
FROM mimiciv_icu.icustays ie
-- vasoactive_agent has one row per infusion interval; deduplicate so
-- each stay is counted once
LEFT JOIN (
SELECT DISTINCT stay_id FROM mimiciv_derived.vasoactive_agent
) v ON ie.stay_id = v.stay_id
INNER JOIN mimiciv_hosp.admissions a ON ie.hadm_id = a.hadm_id
GROUP BY 1;
-- This will show higher mortality in vasopressor group (confounding!)
Correct Approaches
- Propensity score matching/weighting
- Instrumental variables
- Regression discontinuity
- Target trial emulation
- Clearly state observational limitations
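To make the problem concrete, the sketch below contrasts a crude risk difference with one standardized over a single measured confounder (illness severity). Direct standardization is a simpler relative of the propensity-based methods listed above, not a substitute for them, and the counts are invented for illustration.

```python
# Hypothetical counts: (severity, treated) -> (patients, deaths)
counts = {
    ("high", True): (80, 32),
    ("high", False): (20, 10),
    ("low", True): (20, 1),
    ("low", False): (80, 8),
}


def crude_risk_difference(counts):
    """Risk in treated minus risk in untreated, ignoring severity."""
    def risk(treated):
        n = sum(c[0] for (_, t), c in counts.items() if t == treated)
        d = sum(c[1] for (_, t), c in counts.items() if t == treated)
        return d / n
    return risk(True) - risk(False)


def standardized_risk_difference(counts):
    """Stratum-specific risk differences weighted by stratum size."""
    total = sum(n for n, _ in counts.values())
    strata = sorted({s for s, _ in counts})
    rd = 0.0
    for s in strata:
        n1, d1 = counts[(s, True)]
        n0, d0 = counts[(s, False)]
        weight = (n1 + n0) / total
        rd += weight * (d1 / n1 - d0 / n0)
    return rd


crude = crude_risk_difference(counts)
adjusted = standardized_risk_difference(counts)
print(f"Crude risk difference:        {crude:+.3f}")
print(f"Standardized risk difference: {adjusted:+.3f}")
```

With these numbers the crude comparison suggests harm (+0.150), while within each severity stratum the treated group does better, giving a standardized difference of -0.075. The sign flips only because treatment is concentrated among sicker patients.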
5. Multiple Comparisons
Definition
Testing many hypotheses inflates the chance of at least one false positive unless the significance threshold is adjusted.
Common Mistake
- Testing 20 lab values without adjustment
- Subgroup analyses without pre-specification
- Feature selection on full dataset
Correct Approach
- Pre-specify primary outcome
- Use Bonferroni or FDR correction
- Hold out test set for final evaluation
- Register analysis plan prospectively
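The two corrections named above are short enough to sketch with the standard library. The p-values are made up for illustration; in practice a vetted statistics package would normally be used rather than hand-rolled functions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Chance of at least one false positive across 20 independent tests at 0.05
fwer = 1 - (1 - 0.05) ** 20
print(f"Family-wise error rate for 20 tests: {fwer:.2f}")

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Unadjusted:", sum(p <= 0.05 for p in p_values))
print("Bonferroni:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

Five of the ten hypothetical tests pass an unadjusted 0.05 threshold, but only one survives Bonferroni and two survive Benjamini-Hochberg.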
6. Time-Related Errors
Aggregation Window Mismatch
-- WRONG: Mixing 24h and 48h windows
SELECT
s.sofa_24hours, -- 24-hour worst
lab.creatinine_max -- first_day_lab uses 24h
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
ON s.stay_id = lab.stay_id
WHERE s.hr = 48; -- SOFA at 48h, but lab is day 1!
Temporal Alignment
-- CORRECT: Align time windows
SELECT
s.sofa_24hours,
lab.creatinine_max
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
ON s.stay_id = lab.stay_id
WHERE s.hr = 24; -- Both at 24 hours
7. Handling Missing Data
Wrong Approaches
- Complete case analysis (biased unless data are missing completely at random)
- Single imputation (underestimates variance)
- Zero imputation for labs (not clinically meaningful)
Better Approaches
- Multiple imputation
- Maximum likelihood estimation
- Sensitivity analyses
- Pattern-mixture models
- Report missingness rates
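Multiple imputation produces one estimate per imputed dataset, and these are combined with Rubin's rules. The sketch below shows only the pooling step, assuming the per-dataset estimates and their variances have already been computed elsewhere; the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their variances using Rubin's rules.

    Returns (pooled_estimate, total_variance), where the total variance is
    the within-imputation variance plus (1 + 1/m) times the
    between-imputation variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from five imputed datasets
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(f"Pooled estimate: {pooled:.2f}, total variance: {total_var:.3f}")
```

The total variance (0.070) exceeds the within-imputation variance (0.040). That extra uncertainty is what single imputation leaves out.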
8. Outcome Definition
Ambiguous Mortality
-- Be specific about which mortality
SELECT
a.hospital_expire_flag, -- In-hospital only
-- vs. death within 30 days of hospital discharge
-- (dod is a DATE, so compare it against a DATE, not a DATETIME)
CASE WHEN p.dod IS NOT NULL
AND p.dod <= DATE_ADD(DATE(a.dischtime), INTERVAL 30 DAY)
THEN 1 ELSE 0 END AS mortality_30d
FROM mimiciv_hosp.admissions a
INNER JOIN mimiciv_hosp.patients p ON a.subject_id = p.subject_id;
Time Zero Definition
- ICU admission? Hospital admission? First abnormal vital?
- Be explicit and consistent
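The choice of time zero changes the outcome label itself, which a few lines of Python can show. The helper name and the dates below are hypothetical.

```python
from datetime import date, timedelta


def died_within(time_zero, dod, days):
    """True if the death date falls within `days` of time_zero.

    `dod` is None when no death is recorded.
    """
    return dod is not None and time_zero <= dod <= time_zero + timedelta(days=days)


admission = date(2150, 1, 1)
discharge = date(2150, 1, 20)
dod = date(2150, 2, 10)

from_admission = died_within(admission, dod, 30)
from_discharge = died_within(discharge, dod, 30)
print("30-day mortality from admission:", from_admission)
print("30-day mortality from discharge:", from_discharge)
```

The same patient counts as a survivor when the clock starts at admission and as a death when it starts at discharge, so the definition has to be stated and applied uniformly.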
Checklist for Study Design
- Time zero clearly defined
- Exposure determined at fixed time point
- No future information used as predictors
- Selection criteria reported with flow diagram
- Missing data handling specified
- Confounders identified and addressed
- Primary outcome pre-specified
- Multiple comparison correction planned
- Sensitivity analyses planned
- External validation considered
References
- Suissa S. "Immortal time bias in observational studies of drug effects." Pharmacoepidemiology and Drug Safety. 2007.
- Hernán MA, Robins JM. "Causal Inference: What If." Chapman & Hall/CRC. 2020.
- Johnson AEW et al. "Machine Learning and Decision Support in Critical Care." Proceedings of the IEEE. 2016.

