
mamba-architecture
by davila7
SKILL.md
name: mamba-architecture
description: State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Model Architecture, Mamba, State Space Models, SSM, Linear Complexity, Long Context, Efficient Inference, Hardware-Aware, Alternative To Transformers]
dependencies: [mamba-ssm, torch, transformers, causal-conv1d]
Mamba - Selective State Space Models
Quick start
Mamba is a state-space model architecture achieving O(n) linear complexity for sequence modeling.
Installation:
# Install causal-conv1d (optional, for efficiency)
pip install "causal-conv1d>=1.4.0"
# Install Mamba
pip install mamba-ssm
# Or both together
pip install "mamba-ssm[causal-conv1d]"
Prerequisites: Linux, NVIDIA GPU, PyTorch 1.12+, CUDA 11.6+
Basic usage (Mamba block):
import torch
from mamba_ssm import Mamba
batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
d_model=dim, # Model dimension
d_state=16, # SSM state dimension
d_conv=4, # Conv1d kernel size
expand=2 # Expansion factor
).to("cuda")
y = model(x) # O(n) complexity!
assert y.shape == x.shape
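Because the block maps (batch, length, dim) to the same shape, it drops into a training loop like any other torch layer. A minimal sketch with toy random data (no real dataset or task assumed):
import torch
from mamba_ssm import Mamba
block = Mamba(d_model=16, d_state=16, d_conv=4, expand=2).to("cuda")
optimizer = torch.optim.AdamW(block.parameters(), lr=1e-3)
x = torch.randn(2, 64, 16, device="cuda")       # toy inputs
target = torch.randn(2, 64, 16, device="cuda")  # toy regression target
loss = torch.nn.functional.mse_loss(block(x), target)
loss.backward()                                  # gradients flow through the selective scan
optimizer.step()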
Common workflows
Workflow 1: Language model with Mamba-2
Complete LM with generation:
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.models.config_mamba import MambaConfig
import torch
# Configure Mamba-2 LM
config = MambaConfig(
d_model=1024, # Hidden dimension
n_layer=24, # Number of layers
vocab_size=50277, # Vocabulary size
ssm_cfg=dict(
layer="Mamba2", # Use Mamba-2
d_state=128, # Larger state for Mamba-2
headdim=64, # Head dimension
ngroups=1 # Number of groups
)
)
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
# Generate text
input_ids = torch.randint(0, 1000, (1, 20), device="cuda", dtype=torch.long)
output = model.generate(
input_ids=input_ids,
max_length=100,
temperature=0.7,
top_p=0.9
)
Workflow 2: Use pretrained Mamba models
Load from HuggingFace:
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
# Load pretrained model
model_name = "state-spaces/mamba-2.8b"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b") # Use compatible tokenizer
model = MambaLMHeadModel.from_pretrained(model_name, device="cuda", dtype=torch.float16)
# Generate
prompt = "The future of AI is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
output_ids = model.generate(
input_ids=input_ids,
max_length=200,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2
)
generated_text = tokenizer.decode(output_ids[0])
print(generated_text)
Available models:
- state-spaces/mamba-130m
- state-spaces/mamba-370m
- state-spaces/mamba-790m
- state-spaces/mamba-1.4b
- state-spaces/mamba-2.8b
Workflow 3: Mamba-1 vs Mamba-2
Mamba-1 (smaller state):
from mamba_ssm import Mamba
model = Mamba(
d_model=256,
d_state=16, # Smaller state dimension
d_conv=4,
expand=2
).to("cuda")
Mamba-2 (multi-head, larger state):
from mamba_ssm import Mamba2
model = Mamba2(
d_model=256,
d_state=128, # Larger state dimension
d_conv=4,
expand=2,
headdim=64, # Head dimension for multi-head
ngroups=1 # Parallel groups
).to("cuda")
Key differences:
- State size: Mamba-1 (d_state=16) vs Mamba-2 (d_state=128)
- Architecture: Mamba-2 has multi-head structure
- Normalization: Mamba-2 uses RMSNorm
- Distributed: Mamba-2 supports tensor parallelism
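A quick way to see the structural difference is to instantiate both blocks side by side and compare them. A sketch (exact parameter counts depend on d_model and expand):
import torch
from mamba_ssm import Mamba, Mamba2
m1 = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).to("cuda")
m2 = Mamba2(d_model=256, d_state=128, d_conv=4, expand=2, headdim=64, ngroups=1).to("cuda")
count = lambda m: sum(p.numel() for p in m.parameters())
print(f"Mamba-1 parameters: {count(m1):,}")    # per-channel scan, small state
print(f"Mamba-2 parameters: {count(m2):,}")    # multi-head structure, 8x larger state
x = torch.randn(2, 64, 256, device="cuda")
assert m1(x).shape == m2(x).shape == x.shape   # both are drop-in sequence layers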
Workflow 4: Benchmark vs Transformers
Generation speed comparison:
# Benchmark Mamba
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "state-spaces/mamba-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
# Benchmark Transformer
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "EleutherAI/pythia-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
Expected results:
- Mamba: 5× faster inference
- Memory: No KV cache needed
- Scaling: Linear with sequence length
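For a quick local check without the benchmark scripts, a simple timing loop around generate() gives a rough tokens-per-second number. A sketch using the smallest pretrained checkpoint (absolute numbers depend on GPU, dtype, and prompt length):
import time
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda", dtype=torch.float16)
input_ids = torch.randint(0, 1000, (1, 20), device="cuda", dtype=torch.long)
model.generate(input_ids=input_ids, max_length=50)   # warmup
torch.cuda.synchronize()
start = time.time()
out = model.generate(input_ids=input_ids, max_length=520)
torch.cuda.synchronize()
new_tokens = out.shape[1] - input_ids.shape[1]
print(f"{new_tokens / (time.time() - start):.1f} tokens/s")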
When to use vs alternatives
Use Mamba when:
- Need long sequences (100K+ tokens)
- Want faster inference than Transformers
- Memory-constrained (no KV cache)
- Building streaming applications
- Linear scaling important
Advantages:
- O(n) complexity: Linear vs quadratic
- 5× faster inference: No attention overhead
- No KV cache: Lower memory usage
- Million-token sequences: Hardware-efficient
- Streaming: Constant memory per token (see the stateful-decoding sketch below)
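For streaming, the block can carry its recurrent state between calls. A minimal sketch assuming the InferenceParams helper from mamba_ssm.utils.generation; layer_idx must be set so the block can key its cached state (check the repo's generation utilities for the exact interface):
import torch
from mamba_ssm import Mamba
from mamba_ssm.utils.generation import InferenceParams
dim = 16
block = Mamba(d_model=dim, d_state=16, d_conv=4, expand=2, layer_idx=0).to("cuda")
state = InferenceParams(max_seqlen=4096, max_batch_size=1)
prompt = torch.randn(1, 32, dim, device="cuda")
y = block(prompt, inference_params=state)         # prefill: fills conv/SSM state
state.seqlen_offset += prompt.shape[1]
for _ in range(8):                                # decode one step at a time
    step = torch.randn(1, 1, dim, device="cuda")  # next-token features
    y = block(step, inference_params=state)       # fixed-size state updated in place
    state.seqlen_offset += 1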
Use alternatives instead:
- Transformers: Need best-in-class performance, have compute
- RWKV: Want RNN+Transformer hybrid
- RetNet: Need retention-based architecture
- Hyena: Want convolution-based approach
Common issues
Issue: CUDA out of memory
Reduce the batch size, shorten sequences, or load weights in fp16/bf16:
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
Note that gradient_checkpointing_enable() is a Hugging Face transformers method; mamba_ssm's MambaLMHeadModel is a plain nn.Module, so for training apply activation checkpointing manually (e.g., torch.utils.checkpoint around individual layers).
Issue: Slow or failing installation
The package compiles CUDA extensions at install time; if the build is slow or fails, pass --no-build-isolation so it can see the PyTorch version already installed in your environment:
pip install mamba-ssm --no-build-isolation
Issue: Missing causal-conv1d
Install separately:
pip install "causal-conv1d>=1.4.0"
Issue: Model not loading from HuggingFace
Use MambaLMHeadModel.from_pretrained (not AutoModel):
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b")
Advanced topics
Selective SSM: See references/selective-ssm.md for the mathematical formulation, state-space equations, and how selectivity enables O(n) complexity; the core recurrence is summarized below.
Mamba-2 architecture: See references/mamba2-details.md for multi-head structure, tensor parallelism, and distributed training setup.
Performance optimization: See references/performance.md for hardware-aware design, CUDA kernels, and memory efficiency techniques.
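For orientation, the discretized selective-SSM recurrence per channel, following the Mamba paper's simplified discretization (the step size Δ_t and the matrices B_t, C_t are computed from the input x_t, which is the "selection" mechanism):
\bar{A}_t = \exp(\Delta_t A), \qquad \bar{B}_t = \Delta_t B_t
h_t = \bar{A}_t \, h_{t-1} + \bar{B}_t \, x_t
y_t = C_t \, h_t
Because each token only updates a fixed-size hidden state h_t, a length-n sequence costs O(n) time with O(1) state memory, which is where the linear scaling and the absence of a KV cache come from.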
Hardware requirements
- GPU: NVIDIA with CUDA 11.6+
- VRAM:
- 130M model: 2GB
- 370M model: 4GB
- 790M model: 8GB
- 1.4B model: 14GB
- 2.8B model: 28GB (FP16)
- Inference: 5× faster than Transformers
- Memory: No KV cache (lower than Transformers)
Performance (vs Transformers):
- Speed: 5× faster inference
- Memory: 50% less (no KV cache)
- Scaling: Linear vs quadratic
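The VRAM figures above are rough guidance; a quick way to verify on your own hardware is to check peak allocation after loading a checkpoint (a sketch; substitute the model size you care about):
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
torch.cuda.reset_peak_memory_stats()
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-1.4b", device="cuda", dtype=torch.float16)
print(f"Peak allocated after load: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")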
Resources
- Paper (Mamba-1): https://arxiv.org/abs/2312.00752 (Dec 2023)
- Paper (Mamba-2): https://arxiv.org/abs/2405.21060 (May 2024)
- GitHub: https://github.com/state-spaces/mamba ⭐ 13,000+
- Models: https://huggingface.co/state-spaces
- Docs: Repository README and wiki
