hannesill

clinical-research-pitfalls

by hannesill

Give your AI agents clinical intelligence & access to MIMIC-IV, eICU, and more. This project provides them with MCP, Code Execution, and clinical agent skills.

7🍴 3📅 Jan 21, 2026

SKILL.md


name: clinical-research-pitfalls
description: Avoid common methodological mistakes in clinical research with MIMIC-IV and eICU databases. Covers immortal time bias, information leakage, selection bias, and other critical pitfalls.
license: Apache-2.0
metadata:
  author: m4-clinical-extraction
  version: "1.0"
  database: both
  category: data-quality
  source: https://github.com/MIT-LCP/mimic-code
  validated: true

Clinical Research Pitfalls

This skill documents common methodological mistakes in ICU database research and how to avoid them. These errors can invalidate study conclusions.

When to Use This Skill

  • Designing research studies
  • Reviewing analysis plans
  • Debugging unexpected results
  • Peer review of methods

1. Immortal Time Bias

Definition

A period of follow-up during which the outcome cannot occur by design, typically because a patient must survive until the exposure is received in order to be classified as exposed.

Common Mistake

-- WRONG: Patients who "received Drug X during ICU stay"
-- Survival bias: Must survive long enough to receive the drug
SELECT stay_id
FROM mimiciv_derived.antibiotic
WHERE antibiotic LIKE '%vancomycin%';

Correct Approach

-- CORRECT: Define exposure at a fixed time point (e.g., first 24h)
-- Start follow-up at that time point for exposed and unexposed alike
SELECT DISTINCT ab.stay_id
FROM mimiciv_derived.antibiotic ab
INNER JOIN mimiciv_icu.icustays ie ON ab.stay_id = ie.stay_id
WHERE LOWER(ab.antibiotic) LIKE '%vancomycin%'
    AND ab.starttime <= DATETIME_ADD(ie.intime, INTERVAL 24 HOUR);

Key Principle

  • Define exposure status at a fixed time point (e.g., ICU admission, 24 hours, 48 hours)
  • Time zero should be the same for exposed and unexposed groups
  • Consider landmark analysis or time-varying covariates
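The landmark idea can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the 24-hour landmark, the `landmark_cohort` helper, and the toy stays are all assumptions made for the example. Stays that end before the landmark are dropped, exposure is read off at the landmark, and follow-up starts there for everyone.

```python
from datetime import datetime, timedelta

LANDMARK = timedelta(hours=24)  # assumed landmark; set this per study protocol


def landmark_cohort(stays, first_dose):
    """Classify exposure at the landmark (hypothetical helper).

    stays: stay_id -> (intime, end_of_followup)
    first_dose: stay_id -> time of first dose (absent if never treated)
    Returns stay_id -> (exposed, followup_start).
    """
    cohort = {}
    for stay_id, (intime, end) in stays.items():
        landmark_time = intime + LANDMARK
        if end < landmark_time:
            continue  # died or left before the landmark: not eligible
        dose = first_dose.get(stay_id)
        # Doses after the landmark do not count as exposure
        exposed = dose is not None and dose <= landmark_time
        cohort[stay_id] = (exposed, landmark_time)
    return cohort


# Toy data, invented for illustration only
t0 = datetime(2150, 1, 1, 8, 0)
stays = {
    1: (t0, t0 + timedelta(hours=100)),
    2: (t0, t0 + timedelta(hours=10)),   # ends before the landmark
    3: (t0, t0 + timedelta(hours=80)),
    4: (t0, t0 + timedelta(hours=60)),
}
first_dose = {
    1: t0 + timedelta(hours=6),
    3: t0 + timedelta(hours=40),  # treated after the landmark
}
cohort = landmark_cohort(stays, first_dose)
for stay_id, (exposed, start) in sorted(cohort.items()):
    print(stay_id, "exposed" if exposed else "unexposed", "follow-up from", start)
```

Stay 3 is analyzed as unexposed even though it was treated later; reclassifying it as exposed would reintroduce the immortal time between the landmark and the first dose.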

2. Information Leakage (Future Data)

Definition

Using information that would not be available at the time of prediction/decision.

Common Mistake

-- WRONG: Using diagnosis codes for prediction at admission
-- ICD codes are assigned at discharge!
SELECT hadm_id, icd_code
FROM mimiciv_hosp.diagnoses_icd
WHERE icd_code LIKE 'I21%';  -- MI diagnosis

Correct Approach

-- CORRECT: Use only information recorded at admission
-- (e.g., admission type, or chief complaint from ED triage),
-- or clearly acknowledge this is retrospective phenotyping
SELECT hadm_id
FROM mimiciv_hosp.admissions
WHERE LOWER(admission_type) LIKE '%emer%';  -- matches 'EW EMER.' and 'DIRECT EMER.'

Common Sources of Leakage

  • Diagnosis codes: Assigned at discharge
  • Procedure codes: May be coded after completion
  • Length of stay: Only known at discharge
  • Discharge disposition: Future information
  • Labs ordered later: Not available at admission
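One way to guard against these sources is to attach an availability timestamp to every candidate feature and filter on it before modeling. The sketch below assumes a hypothetical `available_at` helper and invented values; in practice the timestamp should be when the value became known (for example, when a lab was resulted), not when the event it describes happened.

```python
from datetime import datetime, timedelta


def available_at(observations, prediction_time):
    """Keep only values recorded at or before prediction_time (hypothetical helper).

    observations: feature name -> (value, time the value became available)
    """
    return {
        name: value
        for name, (value, recorded_at) in observations.items()
        if recorded_at <= prediction_time
    }


# Toy data, invented for illustration only
admit = datetime(2150, 1, 1, 8, 0)
observations = {
    "heart_rate": (112, admit + timedelta(minutes=10)),
    "lactate": (3.4, admit + timedelta(hours=2)),       # resulted after prediction time
    "icd_code": ("I21", admit + timedelta(days=5)),     # assigned at discharge
    "los_days": (5.0, admit + timedelta(days=5)),       # only known at discharge
}

features = available_at(observations, admit + timedelta(hours=1))
print(features)
```

With a prediction time one hour after admission, only the heart rate survives the filter; the diagnosis code and length of stay are dropped because they are discharge-time information.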

3. Selection Bias

Definition

Systematic differences between study groups due to how subjects were selected.

Common Mistakes

Survivor Bias:

-- WRONG: Selecting patients who have 7-day labs
-- Excludes early deaths and early discharges
-- (chemistry is keyed by hadm_id, so join through icustays)
SELECT DISTINCT ie.stay_id
FROM mimiciv_icu.icustays ie
INNER JOIN mimiciv_derived.chemistry c ON ie.hadm_id = c.hadm_id
WHERE c.charttime >= DATETIME_ADD(ie.intime, INTERVAL 7 DAY);

Data Availability Bias:

-- WRONG: Patients with complete data
-- Complete cases may be systematically different
SELECT *
FROM mimiciv_derived.sofa
WHERE respiration_24hours IS NOT NULL
    AND coagulation_24hours IS NOT NULL
    AND liver_24hours IS NOT NULL
    AND cardiovascular_24hours IS NOT NULL
    AND cns_24hours IS NOT NULL
    AND renal_24hours IS NOT NULL;

Correct Approach

  • Report exclusions explicitly in a CONSORT-style flow diagram (STROBE for observational studies)
  • Analyze whether excluded patients differ
  • Consider imputation for missing data
  • Use intention-to-treat principles
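The first two points can be made routine by recording the count removed at every exclusion step and then comparing included and excluded patients. The following is a minimal sketch with a hypothetical `apply_exclusions` helper and invented patients; the criteria themselves are only examples.

```python
def apply_exclusions(patients, criteria):
    """Apply criteria in order, recording how many patients each step removes."""
    flow = [("Initial cohort", 0, len(patients))]
    remaining = list(patients)
    for label, keep in criteria:
        kept = [p for p in remaining if keep(p)]
        flow.append((label, len(remaining) - len(kept), len(kept)))
        remaining = kept
    return remaining, flow


def mortality(group):
    """Fraction of the group that died; NaN for an empty group."""
    return sum(p["died"] for p in group) / len(group) if group else float("nan")


# Toy data, invented for illustration only
patients = [
    {"id": 1, "age": 65, "los_hours": 72, "died": 0},
    {"id": 2, "age": 17, "los_hours": 30, "died": 0},
    {"id": 3, "age": 80, "los_hours": 5, "died": 1},
    {"id": 4, "age": 54, "los_hours": 48, "died": 0},
    {"id": 5, "age": 71, "los_hours": 12, "died": 1},
]
criteria = [
    ("Age >= 18", lambda p: p["age"] >= 18),
    ("ICU stay >= 24h", lambda p: p["los_hours"] >= 24),
]

included, flow = apply_exclusions(patients, criteria)
included_ids = {p["id"] for p in included}
excluded = [p for p in patients if p["id"] not in included_ids]

for label, n_removed, n_left in flow:
    print(f"{label}: removed {n_removed}, remaining {n_left}")
print(f"Mortality included: {mortality(included):.2f}, excluded: {mortality(excluded):.2f}")
```

In this toy cohort the length-of-stay criterion removes both deaths, which is exactly the kind of difference between included and excluded patients that a flow diagram alone would not reveal.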

4. Confounding by Indication

Definition

Treatment assignment is associated with prognosis, creating spurious treatment effects.

Example

Sicker patients receive more aggressive treatment, making treatment appear harmful:

-- WRONG: Comparing mortality by vasopressor use
-- Vasopressors given to sicker patients
SELECT
    CASE WHEN v.stay_id IS NOT NULL THEN 'Vasopressor' ELSE 'No Vasopressor' END AS treatment,
    AVG(a.hospital_expire_flag) AS mortality
FROM mimiciv_icu.icustays ie
-- One row per treated stay, so repeated infusions do not duplicate stays
LEFT JOIN (
    SELECT DISTINCT stay_id FROM mimiciv_derived.vasoactive_agent
) v ON ie.stay_id = v.stay_id
INNER JOIN mimiciv_hosp.admissions a ON ie.hadm_id = a.hadm_id
GROUP BY 1;
-- This will show higher mortality in vasopressor group (confounding!)

Correct Approaches

  • Propensity score matching/weighting
  • Instrumental variables
  • Regression discontinuity
  • Target trial emulation
  • Clearly state observational limitations
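To see why a crude comparison misleads, it can help to work through a small stratified example. The counts below are made up so that treatment has no effect within either severity stratum; the helper names are hypothetical, and standardizing over a single severity variable is only the simplest form of adjustment, not a replacement for the methods listed above.

```python
# (severity stratum, treated) -> (deaths, patients); made-up counts for illustration
counts = {
    ("low", False): (5, 100),
    ("low", True): (1, 20),
    ("high", False): (8, 20),
    ("high", True): (40, 100),
}


def crude_risk_difference(counts):
    """Treated minus untreated mortality, ignoring severity."""
    totals = {True: [0, 0], False: [0, 0]}
    for (_, treated), (deaths, n) in counts.items():
        totals[treated][0] += deaths
        totals[treated][1] += n
    return totals[True][0] / totals[True][1] - totals[False][0] / totals[False][1]


def standardized_risk_difference(counts):
    """Average the within-stratum risk differences, weighted by stratum size."""
    strata = {stratum for stratum, _ in counts}
    total_n = sum(n for _, n in counts.values())
    diff = 0.0
    for stratum in strata:
        d1, n1 = counts[(stratum, True)]
        d0, n0 = counts[(stratum, False)]
        weight = (n1 + n0) / total_n
        diff += weight * (d1 / n1 - d0 / n0)
    return diff


crude = crude_risk_difference(counts)
adjusted = standardized_risk_difference(counts)
print(f"Crude risk difference:        {crude:.3f}")
print(f"Standardized risk difference: {adjusted:.3f}")
```

The crude difference is large only because treated patients are concentrated in the high-severity stratum; once severity is held fixed, the apparent harm disappears. Real data will also contain unmeasured confounders that stratification cannot remove.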

5. Multiple Comparisons

Definition

Testing many hypotheses inflates the false positive rate: with 20 independent tests at α = 0.05, the chance of at least one false positive is about 64%.

Common Mistake

  • Testing 20 lab values without adjustment
  • Subgroup analyses without pre-specification
  • Feature selection on full dataset

Correct Approach

  • Pre-specify primary outcome
  • Use Bonferroni or FDR correction
  • Hold out test set for final evaluation
  • Register analysis plan prospectively
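The two corrections named above are short enough to write out. This is a plain-Python sketch for illustration, with invented p-values; the function names are not from any particular library, and an established statistics package is preferable for real analyses.

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below k
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected


# Toy p-values, invented for illustration only
p_values = [0.2, 0.001, 0.045, 0.012, 0.025]

print(f"FWER for 20 tests: {family_wise_error_rate(20):.3f}")
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the expected share of false discoveries and therefore rejects more hypotheses on the same inputs.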

6. Aggregation Window Mismatch

-- WRONG: Mixing 24h and 48h windows
SELECT
    s.sofa_24hours,     -- worst SOFA over the 24h ending at s.hr
    lab.creatinine_max  -- first_day_lab covers the first ICU day
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
    ON s.stay_id = lab.stay_id
WHERE s.hr = 48;  -- SOFA covers hours 24-48, but the lab is from day 1!

Temporal Alignment

-- CORRECT: Align time windows
SELECT
    s.sofa_24hours,
    lab.creatinine_max
FROM mimiciv_derived.sofa s
INNER JOIN mimiciv_derived.first_day_lab lab
    ON s.stay_id = lab.stay_id
WHERE s.hr = 24;  -- Both at 24 hours

7. Handling Missing Data

Wrong Approaches

  • Complete case analysis (biased unless data are missing completely at random)
  • Single imputation (understates uncertainty, so confidence intervals are too narrow)
  • Zero imputation for labs (not clinically meaningful)

Better Approaches

  • Multiple imputation
  • Maximum likelihood estimation
  • Sensitivity analyses
  • Pattern-mixture models
  • Report missingness rates
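Reporting missingness is the easiest of these to automate, and breaking it down by outcome gives a first hint about whether values are missing at random. The sketch below uses a hypothetical `missingness_rates` helper and invented rows; it only describes the missingness and does not impute anything.

```python
def missingness_rates(rows, columns):
    """Fraction of rows in which each column is missing (None or absent)."""
    if not rows:
        raise ValueError("rows must not be empty")
    return {c: sum(r.get(c) is None for r in rows) / len(rows) for c in columns}


# Toy data, invented for illustration only
rows = [
    {"died": 0, "lactate": 1.2, "creatinine": 0.9},
    {"died": 0, "lactate": None, "creatinine": 1.1},
    {"died": 1, "lactate": 4.8, "creatinine": None},
    {"died": 0, "lactate": None, "creatinine": 0.8},
]
columns = ["lactate", "creatinine"]

overall = missingness_rates(rows, columns)
survivors = missingness_rates([r for r in rows if r["died"] == 0], columns)
non_survivors = missingness_rates([r for r in rows if r["died"] == 1], columns)

print("Overall:      ", overall)
print("Survivors:    ", survivors)
print("Non-survivors:", non_survivors)
```

Here lactate is missing mostly among survivors, which is the pattern one would expect if the test is ordered preferentially for sicker patients; a complete case analysis on lactate would then over-represent them.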

8. Outcome Definition

Ambiguous Mortality

-- Be specific about which mortality
SELECT
    a.hospital_expire_flag,  -- In-hospital only
    -- vs. death within 30 days of discharge (dod is a date, so compare dates)
    CASE WHEN p.dod IS NOT NULL
         AND CAST(p.dod AS DATE) <= DATE_ADD(CAST(a.dischtime AS DATE), INTERVAL 30 DAY)
         THEN 1 ELSE 0 END AS mortality_30d
FROM mimiciv_hosp.admissions a
INNER JOIN mimiciv_hosp.patients p ON a.subject_id = p.subject_id;

Time Zero Definition

  • ICU admission? Hospital admission? First abnormal vital?
  • Be explicit and consistent

Checklist for Study Design

  • Time zero clearly defined
  • Exposure determined at fixed time point
  • No future information used as predictors
  • Selection criteria reported with flow diagram
  • Missing data handling specified
  • Confounders identified and addressed
  • Primary outcome pre-specified
  • Multiple comparison correction planned
  • Sensitivity analyses planned
  • External validation considered

References

  • Suissa S. "Immortal time bias in observational studies of drug effects." Pharmacoepidemiology and Drug Safety. 2007.
  • Hernán MA, Robins JM. "Causal Inference: What If." Chapman & Hall/CRC. 2020.
  • Johnson AEW et al. "Machine Learning and Decision Support in Critical Care." Proceedings of the IEEE. 2016.
