adaptyvbio

protein-qc

by adaptyvbio

Claude Code skills for protein design

70🍴 7📅 Jan 23, 2026

SKILL.md


name: protein-qc
description: >
  Quality control metrics and filtering thresholds for protein design.
  Use this skill when:
  (1) Evaluating design quality for binding, expression, or structure,
  (2) Setting filtering thresholds for pLDDT, ipTM, PAE,
  (3) Checking sequence liabilities (cysteines, deamidation, polybasic clusters),
  (4) Creating multi-stage filtering pipelines,
  (5) Computing PyRosetta interface metrics (dG, SC, dSASA),
  (6) Checking biophysical properties (instability, GRAVY, pI),
  (7) Ranking designs with composite scoring.

  This skill provides research-backed thresholds from binder design competitions and published benchmarks.
license: MIT
category: evaluation
tags: [qc, filtering, metrics, thresholds]

Protein Design Quality Control

Critical Limitation

Individual metrics have weak predictive power for binding. Research shows:

  • Individual structure-prediction metrics (ipTM, PAE) reach a ROC AUC of only ~0.64-0.66, where 0.5 is random
  • Metrics are pre-screening filters, not affinity predictors
  • Composite scoring is essential for meaningful ranking

These thresholds filter out poor designs but do NOT predict binding affinity.

QC Organization

QC is organized by purpose and level:

| Purpose | What it assesses | Key metrics |
|---|---|---|
| Binding | Interface quality, binding geometry | ipTM, PAE, SC, dG, dSASA |
| Expression | Manufacturability, solubility | Instability, GRAVY, pI, cysteines |
| Structural | Fold confidence, consistency | pLDDT, pTM, scRMSD |

Each category has two levels:

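  • Metric-level: Calculated values compared against thresholds (e.g., pLDDT > 0.85; pLDDT is on a 0-1 scale here, so divide 0-100 outputs by 100)
  • Design-level: Pattern/motif detection in the sequence (e.g., odd cysteine count, NG sites)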

Quick Reference: All Thresholds

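| Category | Metric | Standard | Stringent | Source |
|---|---|---|---|---|
| Structural | pLDDT | > 0.85 | > 0.90 | AF2/Chai/Boltz |
| | pTM | > 0.70 | > 0.80 | AF2/Chai/Boltz |
| | scRMSD | < 2.0 Å | < 1.5 Å | Design vs. prediction |
| Binding | ipTM | > 0.50 | > 0.60 | AF2/Chai/Boltz |
| | PAE_interaction | < 12 Å | < 10 Å | AF2/Chai/Boltz |
| | Shape Comp (SC) | > 0.50 | > 0.60 | PyRosetta |
| | interface_dG | < -10 REU | < -15 REU | PyRosetta |
| Expression | Instability | < 40 | < 30 | BioPython |
| | GRAVY | < 0.4 | < 0.2 | BioPython |
| | ESM2 PLL | > 0.0 | > 0.2 | ESM2 |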
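
As an illustration of one metric-level check from the table, GRAVY is the mean Kyte-Doolittle hydropathy of a sequence. The table lists BioPython as the source (its `ProteinAnalysis(seq).gravy()` method); the sketch below reimplements the calculation with the standard library only, so it runs without dependencies. The names `gravy` and `gravy_tier` are hypothetical helpers, not part of this skill.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean hydropathy over all residues (higher = more hydrophobic)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Raises KeyError on non-standard residues such as 'X' or 'B'.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def gravy_tier(sequence, standard=0.4, stringent=0.2):
    """Classify a sequence against the GRAVY thresholds in the table above."""
    value = gravy(sequence)
    if value < stringent:
        return "stringent"
    if value < standard:
        return "standard"
    return "fail"
```

A call such as `gravy_tier(binder_sequence)` returns which threshold level the sequence clears; the same two-tier pattern applies to the other metrics in the table with the comparison direction adjusted.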

Design-Level Checks (Expression)

| Pattern | Risk | Action |
|---|---|---|
| Odd cysteine count | Unpaired disulfides | Redesign |
| NG/NS/NT motifs | Deamidation | Flag/avoid |
| K/R >= 3 consecutive | Proteolysis | Flag |
| Hydrophobic run >= 6 residues | Aggregation | Redesign |

See: references/binding-qc.md, references/expression-qc.md, references/structural-qc.md
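
The four design-level patterns above can be detected with regular expressions. The following is a minimal sketch, not the skill's own implementation: the function name `find_liabilities` is hypothetical, and the set of residues treated as hydrophobic (`AVILMFWY`) is an assumption, since the table does not define it.

```python
import re

# Assumption: residues counted as hydrophobic for the aggregation check.
HYDROPHOBIC = "AVILMFWY"


def find_liabilities(sequence):
    """Scan a protein sequence for the design-level patterns in the table."""
    seq = sequence.upper()
    return {
        # An odd number of cysteines leaves at least one unpaired.
        "odd_cysteines": seq.count("C") % 2 == 1,
        # 0-based positions of N followed by G, S, or T (lookahead keeps
        # overlapping motifs such as "NNG" from being skipped).
        "deamidation_sites": [m.start() for m in re.finditer(r"N(?=[GST])", seq)],
        # Runs of three or more consecutive K/R residues.
        "polybasic_runs": [m.group() for m in re.finditer(r"[KR]{3,}", seq)],
        # Runs of six or more consecutive hydrophobic residues.
        "hydrophobic_runs": [
            m.group() for m in re.finditer(f"[{HYDROPHOBIC}]{{6,}}", seq)
        ],
    }
```

The returned dictionary can be stored per design and used to populate columns such as an even/odd cysteine flag before filtering.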


Sequential Filtering Pipeline

import pandas as pd

designs = pd.read_csv('designs.csv')

# Stage 1: Structural confidence
designs = designs[designs['pLDDT'] > 0.85]

# Stage 2: Self-consistency
designs = designs[designs['scRMSD'] < 2.0]

# Stage 3: Binding quality
# Note: PAE < 10 is the stringent cutoff from the threshold table; use < 12 for the standard level
designs = designs[(designs['ipTM'] > 0.5) & (designs['PAE_interaction'] < 10)]

# Stage 4: Sequence plausibility
designs = designs[designs['esm2_pll_normalized'] > 0.0]

# Stage 5: Expression checks
designs = designs[designs['cysteine_count'] % 2 == 0]  # Design-level: even cysteine count
designs = designs[designs['instability_index'] < 40]   # Metric-level: instability index
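
When a pipeline like the one above removes most designs, it helps to know which stage is responsible. The sketch below applies the same five stages with the same cutoffs and records how many designs survive each one. To stay dependency-free it works on a list of dictionaries (for example, rows from `csv.DictReader` after numeric conversion) rather than a DataFrame; `STAGES` and `run_funnel` are hypothetical names.

```python
# Each stage is (label, predicate). Cutoffs mirror the sequential pipeline.
STAGES = [
    ("structural confidence", lambda d: d["pLDDT"] > 0.85),
    ("self-consistency", lambda d: d["scRMSD"] < 2.0),
    ("binding quality", lambda d: d["ipTM"] > 0.5 and d["PAE_interaction"] < 10),
    ("sequence plausibility", lambda d: d["esm2_pll_normalized"] > 0.0),
    ("expression", lambda d: d["cysteine_count"] % 2 == 0
                             and d["instability_index"] < 40),
]


def run_funnel(designs):
    """Apply the stages in order.

    Returns (survivors, counts), where counts is a list of
    (stage label, number of designs remaining after that stage).
    """
    survivors = list(designs)
    counts = [("input", len(survivors))]
    for label, keep in STAGES:
        survivors = [d for d in survivors if keep(d)]
        counts.append((label, len(survivors)))
    return survivors, counts
```

Printing `counts` after a run shows the stage with the largest drop, which indicates which of the recovery trees later in this document applies.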

Composite Scoring (Required for Ranking)

Individual metrics are too weak to rank designs on their own. Combine them into a weighted composite score:

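def composite_score(row):
    # Weighted sum of five metrics; the weights sum to 1.0
    return (
        0.30 * row['pLDDT'] +
        0.20 * row['ipTM'] +
        # Invert PAE so that higher is better; this term is negative above 20 Å
        0.20 * (1 - row['PAE_interaction'] / 20) +
        0.15 * row['shape_complementarity'] +
        0.15 * row['esm2_pll_normalized']
    )

# Score every design, then keep the 100 highest-scoring ones
designs['score'] = designs.apply(composite_score, axis=1)
top_designs = designs.nlargest(100, 'score')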

For advanced composite scoring, see references/composite-scoring.md.
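
One property of the weighted sum above is that the PAE term, `1 - PAE_interaction / 20`, becomes negative when PAE exceeds 20 Å, and any term outside the 0-1 range can dominate the score. A possible variant, shown here as a sketch under the assumption that every term should contribute between 0 and 1, clamps each term before weighting. The names `WEIGHTS`, `clamp01`, and `composite_score_clamped` are hypothetical; the weights are the ones used above.

```python
# Same weights as the composite score above; they sum to 1.0.
WEIGHTS = {
    "pLDDT": 0.30,
    "ipTM": 0.20,
    "PAE_interaction": 0.20,
    "shape_complementarity": 0.15,
    "esm2_pll_normalized": 0.15,
}


def clamp01(x):
    """Restrict a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def composite_score_clamped(design):
    """Weighted sum in which each term is clamped to [0, 1] first."""
    terms = {
        "pLDDT": design["pLDDT"],
        "ipTM": design["ipTM"],
        # Inverted so that lower PAE gives a higher contribution.
        "PAE_interaction": 1 - design["PAE_interaction"] / 20,
        "shape_complementarity": design["shape_complementarity"],
        "esm2_pll_normalized": design["esm2_pll_normalized"],
    }
    return sum(WEIGHTS[name] * clamp01(value) for name, value in terms.items())
```

With clamping, the score always lies between 0 and 1, so scores from different campaigns are at least on the same scale, although they remain a ranking aid rather than an affinity estimate.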


Tool-Specific Filtering

BindCraft Filter Levels

| Level | Use Case | Stringency |
|---|---|---|
| Default | Standard design | Most stringent |
| Relaxed | Need more designs | Higher failure rate |
| Peptide | Designs < 30 AA | ~5-10x lower success |

BoltzGen Filtering

boltzgen run ... \
  --budget 60 \
  --alpha 0.01 \
  --filter_biased true \
  --refolding_rmsd_threshold 2.0 \
  --additional_filters 'ALA_fraction<0.3'

The alpha value sets the trade-off between quality and diversity in ranking:

  • alpha=0.0: Quality-only ranking
  • alpha=0.01: Default (slight diversity)
  • alpha=1.0: Diversity-only

Design-Level Severity Scoring

For pattern-based checks, use severity scoring:

| Severity Level | Score | Action |
|---|---|---|
| LOW | 0-15 | Proceed |
| MODERATE | 16-35 | Review flagged issues |
| HIGH | 36-60 | Redesign recommended |
| CRITICAL | 61+ | Redesign required |
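
The banding in the table can be expressed as a small lookup. This sketch covers only the mapping from a total score to a level and action; how points are assigned to individual issues is not specified here. `SEVERITY_BANDS` and `severity_level` are hypothetical names, and treating a fractional score such as 15.5 as belonging to the next band up is an assumption, since the table lists integer ranges.

```python
# (inclusive upper bound, level, action), in ascending order.
SEVERITY_BANDS = [
    (15, "LOW", "Proceed"),
    (35, "MODERATE", "Review flagged issues"),
    (60, "HIGH", "Redesign recommended"),
]


def severity_level(score):
    """Map a total severity score to (level, action) using the table's bands."""
    if score < 0:
        raise ValueError("severity score must be non-negative")
    for upper, level, action in SEVERITY_BANDS:
        if score <= upper:
            return level, action
    # Anything above the last band (61+) is critical.
    return "CRITICAL", "Redesign required"
```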

Experimental Correlation

| Metric | AUC | Use |
|---|---|---|
| ipTM | ~0.64 | Pre-screening |
| PAE | ~0.65 | Pre-screening |
| ESM2 PLL | ~0.72 | Best single metric |
| Composite | ~0.75+ | Always use |

Key insight: Metrics work as filters (eliminating failures), not as predictors (ranking successes).
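
For readers who want to measure these values on their own experimental data, ROC AUC can be computed directly from its rank interpretation: the probability that a randomly chosen binder scores higher than a randomly chosen non-binder. The following is a minimal standard-library sketch; `roc_auc` is a hypothetical helper, and the binary binder labels are assumed to come from your own assay results.

```python
def roc_auc(scores, labels):
    """ROC AUC of a metric against binary outcomes.

    scores: metric values, where higher is expected to mean "more likely binder".
    labels: truthy for experimentally confirmed binders, falsy otherwise.
    Ties between a binder and a non-binder count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

For metrics where lower is better, such as PAE, pass the negated values so that the "higher is better" convention holds. A result near 0.5 means the metric does not separate binders from non-binders in that dataset.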


Campaign Health Assessment

Use the fraction of designs that pass the filters to gauge the health of a design campaign:

| Pass Rate | Status | Interpretation |
|---|---|---|
| > 15% | Excellent | Above average, proceed |
| 10-15% | Good | Normal, proceed |
| 5-10% | Marginal | Below average, review issues |
| < 5% | Poor | Significant problems, diagnose |
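
The table translates into a simple classification of the pass rate. In this sketch, `campaign_status` is a hypothetical helper; because the listed ranges share their endpoints, it assumes that exactly 15% and exactly 10% count as Good and exactly 5% counts as Marginal.

```python
def campaign_status(n_passed, n_total):
    """Classify a campaign by the fraction of designs passing all filters."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    rate = n_passed / n_total
    if rate > 0.15:
        return "Excellent"
    if rate >= 0.10:
        return "Good"
    if rate >= 0.05:
        return "Marginal"
    return "Poor"
```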

Failure Recovery Trees

Too Few Pass pLDDT Filter (< 5% with pLDDT > 0.85)

Low pLDDT across campaign
├── Check scRMSD distribution
│   ├── High scRMSD (>2.5Å): Backbone issue
│   │   └── Fix: Regenerate backbones with lower noise_scale (0.5-0.8)
│   └── Low scRMSD but low pLDDT: Disordered regions
│       └── Fix: Check design length, simplify topology
├── Try more sequences per backbone
│   └── modal run modal_proteinmpnn.py --num-seq-per-target 32 --sampling-temp 0.1
├── Use SolubleMPNN instead of ProteinMPNN
│   └── Better for expression-optimized sequences
└── Consider different design tool
    └── BindCraft (integrated design) may work better

Too Few Pass ipTM Filter (< 5% with ipTM > 0.5)

Low ipTM across campaign
├── Review hotspot selection
│   ├── Are hotspots surface-exposed? (SASA > 20Ų)
│   ├── Are hotspots conserved? (check MSA)
│   └── Try 3-6 different hotspot combinations
├── Increase binder length (more contact area)
│   └── Try 80-100 AA instead of 60-80 AA
├── Check interface geometry
│   ├── Is target flat? → Try helical binders
│   └── Is target concave? → Try smaller binders
└── Try all-atom design tool
    └── BoltzGen (all-atom, better packing)

High scRMSD (> 50% with scRMSD > 2.0Å)

Sequences don't specify intended structure
├── ProteinMPNN issue
│   ├── Lower temperature: --sampling-temp 0.1
│   ├── Increase sequences: --num-seq-per-target 32
│   └── Check fixed_positions aren't over-constraining
├── Backbone geometry issue
│   ├── Backbones may be unusual/strained
│   ├── Regenerate with lower noise_scale (0.5-0.8)
│   └── Reduce diffuser.T to 30-40
└── Try different sequence design
    └── ColabDesign (AF2 gradient-based) may work better

Everything Passes But No Experimental Hits

In silico metrics don't predict affinity
├── Generate MORE designs (10x current)
│   └── Computational metrics have high false positive rate
├── Increase diversity
│   ├── Higher ProteinMPNN temperature (0.2-0.3)
│   ├── Different backbone topologies
│   └── Different hotspot combinations
├── Try different design approach
│   ├── BindCraft (different algorithm)
│   ├── ColabDesign (AF2 hallucination)
│   └── BoltzGen (all-atom diffusion)
└── Check if target is druggable
    └── Some targets are inherently difficult

Too Many Designs Pass (> 50%)

Suspiciously high pass rate
├── Check if thresholds are too lenient
│   └── Use stringent thresholds: pLDDT > 0.90, ipTM > 0.60
├── Verify prediction quality
│   ├── Are predictions actually running? Check output files
│   └── Are complexes being predicted, not just monomers?
├── Check for data issues
│   ├── Same sequence being predicted multiple times?
│   └── Wrong FASTA format (missing chain separator)?
└── Apply diversity filter
    └── Cluster at 70% identity, take top per cluster
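
The diversity filter in the last branch (cluster at 70% identity, keep the top design per cluster) can be sketched as a greedy procedure. Real campaigns typically use a dedicated clustering tool such as MMseqs2 or CD-HIT with alignment-based identity; the version below substitutes a rough identity estimate from Python's `difflib`, which is an assumption made to keep the example self-contained. `approx_identity` and `top_per_cluster` are hypothetical names.

```python
from difflib import SequenceMatcher


def approx_identity(a, b):
    """Rough sequence identity: matched characters divided by the longer length.

    This is a stand-in for alignment-based identity, not an equivalent.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


def top_per_cluster(designs, threshold=0.70):
    """Greedy clustering that keeps the best-scoring design of each cluster.

    designs: iterable of (sequence, score) pairs.
    A design becomes a new cluster representative only if its identity to
    every existing representative is below the threshold. Because designs
    are visited from highest to lowest score, each representative is the
    top-scoring member of its cluster.
    """
    representatives = []
    for seq, score in sorted(designs, key=lambda d: d[1], reverse=True):
        if all(approx_identity(seq, rep) < threshold for rep, _ in representatives):
            representatives.append((seq, score))
    return representatives
```

Exact duplicates have an identity of 1.0 and collapse into a single representative, so this step also catches the case where the same sequence was predicted multiple times.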

Diagnostic Commands

Quick Campaign Assessment

import pandas as pd

df = pd.read_csv('designs.csv')

# Pass rates at each stage
print(f"Total designs: {len(df)}")
print(f"pLDDT > 0.85: {(df['pLDDT'] > 0.85).mean():.1%}")
print(f"ipTM > 0.50: {(df['ipTM'] > 0.50).mean():.1%}")
print(f"scRMSD < 2.0: {(df['scRMSD'] < 2.0).mean():.1%}")
print(f"All filters: {((df['pLDDT'] > 0.85) & (df['ipTM'] > 0.5) & (df['scRMSD'] < 2.0)).mean():.1%}")

# Identify the top issue (checked in priority order: pLDDT, ipTM, scRMSD)
if (df['pLDDT'] > 0.85).mean() < 0.1:
    print("ISSUE: Low pLDDT - check backbone or sequence quality")
elif (df['ipTM'] > 0.50).mean() < 0.1:
    print("ISSUE: Low ipTM - check hotspots or interface geometry")
elif (df['scRMSD'] < 2.0).mean() < 0.5:
    print("ISSUE: High scRMSD - sequences don't specify backbone")
else:
    print("No dominant issue detected at these thresholds")
