domain-ml

by rustfs

SKILL.md


---
name: domain-ml
description: "Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理"
---

Machine Learning Domain

Layer 3: Domain Constraints

Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |
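
The reproducibility row comes down to seeding every source of randomness. A minimal sketch, assuming the rand crate (0.8-style API):

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

fn main() {
    // A fixed seed makes shuffling / weight init identical across runs.
    let mut rng = StdRng::seed_from_u64(42);
    let noise: Vec<f32> = (0..4).map(|_| rng.gen_range(-1.0..1.0)).collect();
    println!("{noise:?}"); // same output every run
}
```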

Critical Constraints

Memory Efficiency

RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
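
A minimal illustration with ndarray (chosen here as an assumption; the same idea applies to any tensor type with views): slicing borrows the buffer, and in-place ops avoid allocating a second tensor.

```rust
use ndarray::{Array2, Axis};

fn main() {
    let mut batch = Array2::<f32>::zeros((1024, 768));

    // A view borrows the underlying buffer: no copy, no allocation.
    let row = batch.index_axis(Axis(0), 0);
    let _norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();

    // In-place scaling reuses the existing buffer.
    batch.mapv_inplace(|x| x * 0.5);
}
```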

GPU Utilization

RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading

Model Portability

RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle

Trace Down ↓

From constraints to design (Layer 2):

"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime
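
The polars branch above leans on lazy evaluation: calls build a query plan that only runs at collect(), so the engine can fuse filters and aggregations into a single pass. A minimal sketch, assuming a recent polars with the lazy feature enabled (the lazy API has shifted across 0.x versions):

```rust
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let df = df!(
        "label" => [0i32, 1, 0, 1],
        "score" => [0.1f64, 0.9, 0.2, 0.8],
    )?;

    // Nothing executes until collect(); filter and agg run as one fused pass.
    let out = df
        .lazy()
        .filter(col("score").gt(lit(0.5)))
        .group_by([col("label")])
        .agg([col("score").mean().alias("mean_score")])
        .collect()?;

    println!("{out}");
    Ok(())
}
```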

Use Case → Framework

| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |
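
For the training + inference row, a minimal candle sketch (assuming candle_core; CUDA or Metal backends need the matching feature flags at build time):

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Device::Cpu here; Device::new_cuda(0) when built with the cuda feature.
    let device = Device::Cpu;
    let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;
    let c = a.matmul(&b)?; // shape (2, 4)
    println!("{c}");
    Ok(())
}
```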

Key Crates

| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

Design Patterns

| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | OnceLock<Model> |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |
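
The streaming row in practice: expose the dataset as an iterator so memory stays proportional to one record or batch, never the full corpus. A std-only sketch, assuming a simple comma-separated file of floats:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

// Stream features line by line instead of loading the whole file:
// memory stays O(one record), not O(dataset).
fn stream_features(path: &str) -> std::io::Result<impl Iterator<Item = Vec<f32>>> {
    let reader = BufReader::new(File::open(path)?);
    Ok(reader.lines().filter_map(|line| {
        let line = line.ok()?;
        Some(
            line.split(',')
                .filter_map(|v| v.trim().parse::<f32>().ok())
                .collect(),
        )
    }))
}
```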

Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Alias the verbose plan type once so signatures stay readable.
type OnnxModel =
    SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<OnnxModel> = OnceLock::new();

fn get_model() -> &'static OnnxModel {
    // Loaded once on first use, then shared by every request.
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("load model.onnx")
            .into_optimized()
            .expect("optimize model")
            .into_runnable()
            .expect("make model runnable")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Shape the flat input as a (1, n) batch without an extra copy.
    let n = input.len();
    let tensor: Tensor = tract_ndarray::Array2::from_shape_vec((1, n), input)?.into();
    let result = model.run(tvec!(tensor.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```
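
One caveat worth noting: OnceLock::get_or_init blocks concurrent callers until initialization finishes, so the first request pays the full model-load cost. Calling get_model() once at startup turns that cold start into a deliberate warm-up.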

Code Pattern: Batched Inference

```rust
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model(); // singleton from the pattern above
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack inputs into one (batch, dim) tensor: a single forward
        // pass per chunk amortizes the per-call overhead.
        let dim = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), dim), flat)?.into();

        // Run inference on the whole batch at once.
        let output = model.run(tvec!(tensor.into()))?;

        // Unstack the (batch, out_dim) result into per-input rows.
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect()));
    }

    Ok(results)
}
```

Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |

Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | OnceLock |
| Batch processing | Chunked iteration | chunks() + parallel |
| GPU async | Concurrent loading | tokio::spawn + GPU |
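
The GPU-async row is the least obvious mapping. A hypothetical sketch, assuming tokio plus placeholder preprocess/infer functions, that overlaps CPU-side data preparation with compute (spawn_blocking is used here because preprocessing is CPU-bound):

```rust
use tokio::task;

// Hypothetical stand-ins for CPU-side preparation and device compute.
fn preprocess(raw: Vec<u8>) -> Vec<f32> {
    raw.into_iter().map(|b| b as f32 / 255.0).collect()
}
fn infer(batch: Vec<f32>) -> Vec<f32> {
    batch // placeholder: run the model here
}

async fn pipeline(raw_batches: Vec<Vec<u8>>) -> anyhow::Result<Vec<Vec<f32>>> {
    let mut results = Vec::with_capacity(raw_batches.len());
    let mut iter = raw_batches.into_iter();
    // Kick off preprocessing of the first batch on a blocking thread.
    let mut pending = iter.next().map(|b| task::spawn_blocking(move || preprocess(b)));

    while let Some(handle) = pending {
        let ready = handle.await?;
        // Start preparing the next batch before computing on this one,
        // so the accelerator never waits on the data pipeline.
        pending = iter.next().map(|b| task::spawn_blocking(move || preprocess(b)));
        results.push(infer(ready));
    }
    Ok(results)
}
```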

| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |
