Back to list
adaptyvbio

chai

by adaptyvbio

Claude Code skills for protein design

70🍴 7📅 Jan 23, 2026

SKILL.md


name: chai description: > Structure prediction using Chai-1, a foundation model for molecular structure. Use this skill when: (1) Predicting protein-protein complex structures, (2) Validating designed binders, (3) Predicting protein-ligand complexes, (4) Using the Chai API for high-throughput prediction, (5) Need an alternative to AlphaFold2.

For QC thresholds, use protein-qc. For AlphaFold2 prediction, use alphafold. For ESM-based analysis, use esm. license: MIT category: design-tools tags: [structure-prediction, validation, foundation-model] biomodals_script: modal_chai1.py

Chai-1 Structure Prediction

Prerequisites

RequirementMinimumRecommended
Python3.10+3.11
CUDA12.0+12.1+
GPU VRAM24GB40GB (A100)
RAM32GB64GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal

cd biomodals
modal run modal_chai1.py \
  --input-faa complex.fasta \
  --out-dir predictions/

GPU: A100 (40GB) | Timeout: 30min default

pip install chai_lab

python -c "
import chai_lab
from chai_lab.chai1 import run_inference

# Run prediction
run_inference(
    fasta_file='complex.fasta',
    output_dir='predictions/',
    num_trunk_recycles=3
)
"

Option 3: Local installation

git clone https://github.com/chaidiscovery/chai-lab.git
cd chai-lab
pip install -e .

chai-lab predict \
  --fasta complex.fasta \
  --output predictions/

FASTA Format

Protein complex

>binder
MKTAYIAKQRQISFVKSHFSRQLE...
>target
MVLSPADKTNVKAAWGKVGAHAGE...

Protein + ligand

>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>ligand|smiles
CCO

Protein + DNA/RNA

>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>dna
ATCGATCGATCG

Key parameters

ParameterDefaultRangeDescription
num_trunk_recycles31-10Recycles (more = better)
num_diffn_timesteps20050-500Diffusion steps
seed0intRandom seed

Output format

predictions/
├── pred.model_idx_0.cif    # Best model (CIF format)
├── pred.model_idx_1.cif    # Second model
├── scores.json             # Confidence scores
├── pae.npy                 # PAE matrix
└── plddt.npy               # pLDDT values

Note: Chai-1 outputs CIF format. Convert to PDB if needed:

from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("pred", "pred.model_idx_0.cif")
io = PDBIO()
io.set_structure(structure)
io.save("pred.model_idx_0.pdb")

Extracting metrics

import numpy as np
import json

# Load scores
with open('predictions/scores.json') as f:
    scores = json.load(f)

plddt = np.load('predictions/plddt.npy')
pae = np.load('predictions/pae.npy')

print(f"pLDDT: {plddt.mean():.3f}")
print(f"pTM: {scores['ptm']:.3f}")
print(f"ipTM: {scores.get('iptm', 'N/A')}")

Use cases

Binder validation

# Predict complex with Chai
chai-lab predict --fasta binder_target.fasta --output val/

# Check ipTM > 0.5
scores = json.load(open('val/scores.json'))
if scores['iptm'] > 0.5:
    print("Design passes validation")

Protein-ligand complex

# FASTA with SMILES
fasta = """
>protein
MKTA...
>ligand|smiles
CCO
"""

# Chai handles both protein and small molecules

Batch prediction

# Multiple sequences
for fasta in sequences/*.fasta; do
    chai-lab predict \
        --fasta "$fasta" \
        --output "predictions/$(basename $fasta .fasta)"
done

Comparison with AF2

AspectChai-1AlphaFold2
MSA requiredNoYes
Small moleculesYesNo
DNA/RNAYesLimited
SpeedFasterSlower
AccuracyComparableReference

Sample output

Successful run

$ chai-lab predict --fasta complex.fasta --output predictions/
[INFO] Loading Chai-1 model...
[INFO] Running inference...
[INFO] Saved 5 models to predictions/

predictions/scores.json:
{
  "ptm": 0.82,
  "iptm": 0.71,
  "ranking_score": 0.76
}

What good output looks like:

  • pTM: > 0.7 (confident global structure)
  • ipTM: > 0.5 (confident interface, > 0.7 for high confidence)
  • CIF files with reasonable atom positions

Decision tree

Should I use Chai?
│
├─ What are you predicting?
│  ├─ Protein-protein complex → Chai ✓ or ColabFold
│  ├─ Protein + small molecule → Chai ✓
│  ├─ Protein + DNA/RNA → Chai ✓
│  └─ Single protein only → Use ESMFold (faster)
│
├─ Need MSA?
│  ├─ No / want speed → Chai ✓
│  └─ Yes / want accuracy → ColabFold
│
└─ Priority?
   ├─ Highest accuracy → ColabFold with MSA
   ├─ Speed / no MSA → Chai ✓
   └─ Ligand binding → Chai ✓

Typical performance

Campaign SizeTime (A100)Cost (Modal)Notes
100 complexes30-60 min~$10Standard validation
500 complexes2-4h~$45Large campaign
1000 complexes5-8h~$90Comprehensive

Per-complex: ~20-40s for typical binder-target complex.


Verify

find predictions -name "*.cif" | wc -l  # Should match input count

Troubleshooting

Low pLDDT: Increase num_trunk_recycles Low ipTM: Check chain order, interface region OOM errors: Use A100-80GB or reduce batch Slow prediction: Reduce num_diffn_timesteps

Error interpretation

ErrorCauseFix
RuntimeError: CUDA out of memoryComplex too largeUse A100-80GB or split prediction
KeyError: 'iptm'Single chain predictedEnsure FASTA has multiple chains
ValueError: invalid SMILESMalformed ligandValidate SMILES with RDKit
torch.cuda.OutOfMemoryErrorGPU exhaustedReduce num_diffn_timesteps to 100

Next: protein-qc for filtering and ranking.

Score

Total Score

60/100

Based on repository quality metrics

SKILL.md

SKILL.mdファイルが含まれている

+20
LICENSE

ライセンスが設定されている

+10
説明文

100文字以上の説明がある

0/10
人気

GitHub Stars 100以上

0/15
最近の活動

1ヶ月以内に更新

+10
フォーク

10回以上フォークされている

0/5
Issue管理

オープンIssueが50未満

+5
言語

プログラミング言語が設定されている

0/5
タグ

1つ以上のタグが設定されている

+5

Reviews

💬

Reviews coming soon