boltzgen

by adaptyvbio

Claude Code skills for protein design

70🍴 7📅 Jan 23, 2026

SKILL.md


name: boltzgen
description: >
  All-atom protein design using BoltzGen diffusion model. Use this skill when:
  (1) Need side-chain aware design from the start,
  (2) Designing around small molecules or ligands,
  (3) Want all-atom diffusion (not just backbone),
  (4) Require precise binding geometries,
  (5) Using YAML-based configuration.

  For backbone-only generation, use rfdiffusion.
  For sequence-only design, use proteinmpnn.
  For structure validation, use boltz.
license: MIT
category: design-tools
tags: [structure-design, sequence-design, diffusion, all-atom, binder]
proteinbase_slug: boltzgen
proteinbase_url: https://proteinbase.com/design-methods/boltzgen
biomodals_script: modal_boltzgen.py

BoltzGen All-Atom Design

Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.10+ | 3.11 |
| CUDA | 12.0+ | 12.1+ |
| GPU VRAM | 24GB | 48GB (L40S) |
| RAM | 32GB | 64GB |

How to run

Option 1: Modal

First time? See the Installation Guide to set up Modal and biomodals.

# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals

# Run BoltzGen (requires YAML config file)
modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 50

# With custom GPU
GPU=L40S modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 100

GPU: L40S (48GB) recommended | Timeout: 120min default

Available protocols: protein-anything, peptide-anything, protein-small_molecule, nanobody-anything, antibody-anything

Option 2: Local installation

git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .

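# Same CLI that the Modal script invokes (see "Sample output")
boltzgen run config.yaml --output output --protocol protein-anything --num_designs 10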

Option 3: Python API

from boltzgen import BoltzGen

model = BoltzGen.load_pretrained()
designs = model.sample(
    target_pdb="target.pdb",
    num_samples=50,
    binder_length=80
)

GPU: L40S (48GB) | Time: ~30-60s per design

Key parameters (CLI)

| Parameter | Default | Description |
|-----------|---------|-------------|
| --input-yaml | required | Path to YAML design specification |
| --protocol | protein-anything | Design protocol |
| --num-designs | 10 | Number of designs to generate |
| --steps | all | Pipeline steps to run (e.g., design inverse_folding) |
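
The flags in the table can be assembled programmatically when launching many runs. The sketch below is one way to do that, not part of biomodals: `build_modal_command` is a hypothetical helper, and passing `--steps` as a single space-separated string is an assumption based on the example in the table.

```python
import shlex

# Protocol names copied from the "Available protocols" list above.
PROTOCOLS = (
    "protein-anything",
    "peptide-anything",
    "protein-small_molecule",
    "nanobody-anything",
    "antibody-anything",
)


def build_modal_command(input_yaml, protocol="protein-anything",
                        num_designs=10, steps=None):
    """Return the argv list for `modal run modal_boltzgen.py ...`."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if num_designs < 1:
        raise ValueError("num_designs must be at least 1")
    argv = [
        "modal", "run", "modal_boltzgen.py",
        "--input-yaml", str(input_yaml),
        "--protocol", protocol,
        "--num-designs", str(num_designs),
    ]
    if steps:
        # Assumption: step names go in one space-separated argument.
        argv += ["--steps", " ".join(steps)]
    return argv


if __name__ == "__main__":
    cmd = build_modal_command("binder_config.yaml", num_designs=50)
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; validating the protocol name up front avoids paying for a GPU container that fails on a typo.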

YAML configuration

BoltzGen uses an entity-based YAML format where you specify designed proteins and target structures as entities.

Important notes:

  • Residue indices use label_seq_id (1-indexed), not author residue numbers
  • File paths are relative to the YAML file location
  • Target files should be in CIF format (PDB also works but CIF preferred)
  • Run boltzgen check config.yaml to verify your specification before running
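
Because binding-site residues are often quoted in author numbering, it can help to translate them to label_seq_id before writing the config. The sketch below assumes the standard mmCIF `_atom_site` columns and the dictionary shape returned by Biopython's `MMCIF2Dict`; `auth_to_label` and `load_columns` are hypothetical helper names, and matching chains on `label_asym_id` is an assumption (pass `auth_asym_id` instead if that is what your config refers to).

```python
def auth_to_label(columns, chain_id,
                  chain_column="_atom_site.label_asym_id"):
    """Map author residue numbers to label_seq_id for one chain.

    `columns` is a dict of mmCIF atom_site columns (lists of strings),
    the shape produced by Bio.PDB.MMCIF2Dict.MMCIF2Dict.
    """
    chains = columns[chain_column]
    auth = columns["_atom_site.auth_seq_id"]
    label = columns["_atom_site.label_seq_id"]
    mapping = {}
    for chain, auth_id, label_id in zip(chains, auth, label):
        # label_seq_id is "." for waters and ligands; skip those rows.
        if chain == chain_id and label_id not in (".", "?"):
            mapping.setdefault(int(auth_id), int(label_id))
    return mapping


def load_columns(path):
    """Read a CIF file with Biopython (imported lazily)."""
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict
    return MMCIF2Dict(path)
```

For example, `auth_to_label(load_columns("target.cif"), "A")[343]` would give the label_seq_id to put in the YAML for author residue 343, assuming that residue exists in chain A.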

Basic Binder Config

entities:
  # Designed protein (variable length 80-140 residues)
  - protein:
      id: B
      sequence: 80..140

  # Target from structure file
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      # Specify binding site residues (optional but recommended)
      binding_types:
        - chain:
            id: A
            binding: 45,67,89

Binder with Specific Binding Site

entities:
  - protein:
      id: G
      sequence: 60..100

  - file:
      path: 5cqg.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 343,344,251
      structure_groups: "all"
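
When screening several targets or binding sites, configs like the two above can be generated rather than hand-edited. This is a minimal sketch that emits the same entity layout as plain text; `binder_yaml` is a hypothetical helper, and it covers only the keys shown in the basic binder config.

```python
def binder_yaml(target_path, target_chain, binder_id, min_len, max_len,
                binding_residues=()):
    """Render a basic binder config in the entity format shown above."""
    if not 1 <= min_len <= max_len:
        raise ValueError("need 1 <= min_len <= max_len")
    if any(r < 1 for r in binding_residues):
        raise ValueError("label_seq_id values are 1-indexed")
    lines = [
        "entities:",
        "  - protein:",
        f"      id: {binder_id}",
        f"      sequence: {min_len}..{max_len}",
        "  - file:",
        f"      path: {target_path}",
        "      include:",
        "        - chain:",
        f"            id: {target_chain}",
    ]
    if binding_residues:
        residues = ",".join(str(r) for r in binding_residues)
        lines += [
            "      binding_types:",
            "        - chain:",
            f"            id: {target_chain}",
            f"            binding: {residues}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(binder_yaml("target.cif", "A", "B", 80, 140, [45, 67, 89]))
```

Write the result next to the target file, since paths in the config are resolved relative to the YAML location, and run `boltzgen check` on it before launching.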

Peptide Design (Cyclic)

entities:
  - protein:
      id: S
      sequence: 10..14C6C3  # With cysteines for disulfide

  - file:
      path: target.cif
      include:
        - chain:
            id: A

constraints:
  - bond:
      atom1: [S, 11, SG]
      atom2: [S, 18, SG]  # Disulfide bond
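
The bond indices have to line up with where the cysteines fall in the sequence string. The sketch below works through that arithmetic under an assumed reading of the notation: `a..b` is a variable-length designed run, a bare number is a fixed-length designed run, and letters are fixed residues. `fixed_residue_positions` is a hypothetical helper, not a BoltzGen function.

```python
import re

# Range (10..14), plain count (6), or fixed residue letters (C).
TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z]+)")


def fixed_residue_positions(spec, choose_min=True):
    """Return {1-based position: residue letter} for fixed residues.

    Variable-length runs are resolved to their minimum length when
    choose_min is True, otherwise to their maximum length.
    """
    pos = 0
    fixed = {}
    for match in TOKEN.finditer(spec):
        low, high, count, letters = match.groups()
        if low is not None:
            pos += int(low) if choose_min else int(high)
        elif count is not None:
            pos += int(count)
        else:
            for letter in letters:
                pos += 1
                fixed[pos] = letter
    return fixed


print(fixed_residue_positions("10..14C6C3"))                    # {11: 'C', 18: 'C'}
print(fixed_residue_positions("10..14C6C3", choose_min=False))  # {15: 'C', 22: 'C'}
```

Under this reading, the indices 11 and 18 in the constraint match the shortest prefix (10 residues). How the indices are interpreted when a longer prefix is sampled is not stated here, so check the result with `boltzgen check` before a full run.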

Design protocols

| Protocol | Use Case |
|----------|----------|
| protein-anything | Design proteins to bind proteins or peptides |
| peptide-anything | Design cyclic peptides to bind proteins |
| protein-small_molecule | Design proteins to bind small molecules |
| nanobody-anything | Design nanobody CDRs |
| antibody-anything | Design antibody CDRs |

Output format

output/
├── sample_0/
│   ├── design.cif         # All-atom structure (CIF format)
│   ├── metrics.json       # Confidence scores
│   └── sequence.fasta     # Sequence
├── sample_1/
│   └── ...
└── summary.csv

Note: BoltzGen outputs CIF format. Convert to PDB if needed:

from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("design", "design.cif")
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")

Sample output

Successful run

$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete

Results saved to: ./out/boltzgen/2501161234/

Output directory structure:

out/boltzgen/2501161234/
├── intermediate_designs/           # Raw diffusion outputs
│   ├── design_0.cif
│   └── design_0.npz
├── intermediate_designs_inverse_folded/
│   ├── refold_cif/                 # Refolded complexes
│   └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
    ├── final_10_designs/           # Top designs
    └── results_overview.pdf        # Summary plots

What good output looks like:

  • Refolding RMSD < 2.0 Å (design folds as predicted)
  • ipTM > 0.5 (confident interface)
  • All designs complete pipeline without errors
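
The two numeric thresholds above can be applied to a metrics table in a few lines. This is a sketch only: the column names `id`, `refold_rmsd`, and `iptm` are hypothetical placeholders, so substitute the actual headers from your metrics CSV.

```python
import csv
import io

RMSD_MAX = 2.0  # Å, refolding RMSD threshold from the list above
IPTM_MIN = 0.5  # ipTM threshold from the list above


def passing_designs(rows, rmsd_key="refold_rmsd", iptm_key="iptm",
                    id_key="id"):
    """Return IDs of designs that meet both thresholds."""
    keep = []
    for row in rows:
        rmsd = float(row[rmsd_key])
        iptm = float(row[iptm_key])
        if rmsd < RMSD_MAX and iptm > IPTM_MIN:
            keep.append(row[id_key])
    return keep


# Made-up rows for illustration; real runs would read a CSV from disk.
example = io.StringIO(
    "id,refold_rmsd,iptm\n"
    "d0,1.2,0.71\n"
    "d1,3.4,0.80\n"
    "d2,1.9,0.42\n"
)
print(passing_designs(csv.DictReader(example)))  # ['d0']
```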

Decision tree

Should I use BoltzGen?
│
├─ What type of design?
│  ├─ All-atom precision needed → BoltzGen ✓
│  ├─ Ligand binding pocket → BoltzGen ✓
│  └─ Standard miniprotein → RFdiffusion (faster)
│
├─ What matters most?
│  ├─ Side-chain packing → BoltzGen ✓
│  ├─ Speed / diversity → RFdiffusion
│  ├─ Highest success rate → BindCraft
│  └─ AF2 optimization → ColabDesign
│
└─ Compute resources?
   ├─ Have L40S/A100 (48GB+) → BoltzGen ✓
   └─ Only A10G (24GB) → Consider RFdiffusion

Typical performance

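| Campaign Size | Time (L40S) | Cost (Modal) | Notes |
|---------------|-------------|--------------|-------|
| 50 designs | 30-45 min | ~$8 | Quick exploration |
| 100 designs | 1-1.5h | ~$15 | Standard campaign |
| 500 designs | 5-8h | ~$70 | Large campaign |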

Per-design: ~30-60s for typical binder.
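
For campaign sizes not listed in the table, a rough budget can be extrapolated from the per-design figure. The sketch below simply multiplies the 30-60s range and reads an approximate cost per design off the table rows; `estimate_minutes` is a hypothetical helper, and the results are ballpark figures, not Modal pricing.

```python
# Approximate costs in USD, copied from the table above.
TABLE_COST = {50: 8, 100: 15, 500: 70}


def estimate_minutes(num_designs, sec_per_design=(30, 60)):
    """Return (low, high) wall-clock minutes for a campaign."""
    low, high = sec_per_design
    return (num_designs * low / 60, num_designs * high / 60)


def cost_per_design():
    """Return {campaign size: approximate USD per design}."""
    return {n: cost / n for n, cost in TABLE_COST.items()}


print(estimate_minutes(50))   # (25.0, 50.0)
print(estimate_minutes(500))  # (250.0, 500.0)
```

The per-design cost implied by the table falls from about $0.16 at 50 designs to about $0.14 at 500, and the table's time ranges sit inside the bounds this estimate gives.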


Verify

find output -name "*.cif" | wc -l  # Should match --num-designs

Troubleshooting

  • Verify config first: Always run boltzgen check config.yaml before running the full pipeline.
  • Slow generation: Use fewer designs for initial testing, then scale up.
  • OOM errors: Use A100-80GB or reduce --num-designs.
  • Wrong binding site: Residue indices use label_seq_id (1-indexed); check them in the Molstar viewer.

Error interpretation

| Error | Cause | Fix |
|-------|-------|-----|
| RuntimeError: CUDA out of memory | Large design or long protein | Use A100-80GB or reduce designs |
| FileNotFoundError: *.cif | Target file not found | File paths are relative to YAML location |
| ValueError: invalid chain | Chain not in target | Verify chain IDs with Molstar or PyMOL |
| modal: command not found | Modal CLI not installed | Run pip install modal && modal setup |
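
When runs are launched from a script, the table can double as a lookup for triaging failed logs. This is a minimal sketch using plain substring matching on the messages listed above; `suggest_fixes` is a hypothetical helper, and real log text may differ in wording.

```python
# (substring to look for, suggested fix), taken from the table above.
HINTS = [
    ("CUDA out of memory", "Use A100-80GB or reduce designs"),
    ("FileNotFoundError", "File paths are relative to YAML location"),
    ("invalid chain", "Verify chain IDs with Molstar or PyMOL"),
    ("modal: command not found", "Run pip install modal && modal setup"),
]


def suggest_fixes(log_text):
    """Return fixes for every known error pattern found in the log."""
    return [fix for pattern, fix in HINTS if pattern in log_text]


if __name__ == "__main__":
    log = "RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB"
    for fix in suggest_fixes(log):
        print(fix)
```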

Next: Validate with boltz or chai, then use protein-qc for filtering.
