unsloth-kernel-optimization

Name: unsloth-kernel-optimization
Rating: 65
Author: tzervas

Memory-optimized GPU kernels for LLM fine-tuning in Rust (2-5x speedup, 70-80% less VRAM)

⭐ 0 🍴 0 📅 January 25, 2026

cuda gpu machine-learning optimization rust

SKILL.md

name: unsloth-kernel-optimization
description: Optimize and implement GPU kernels using CubeCL for memory-efficient LLM training

Kernel Optimization Skill

When to Use

Invoke when the user asks to:

Implement new GPU kernels with CubeCL
Optimize existing kernel performance
Profile and benchmark kernel execution
Reduce VRAM usage in training operations
Add CPU fallback implementations

Performance Targets

2-5x speedup vs naive implementation
70-80% VRAM reduction vs baseline
50% GPU occupancy
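
The first two targets are ratios, so they can be checked mechanically once timings and memory figures exist. The sketch below is one way to do that, not part of the skill itself: the `Measurement` struct and its field names are hypothetical, and the thresholds are simply the lower bounds of the ranges listed above.

```rust
/// Measurements for one kernel, gathered however your benchmark reports them.
/// The struct and all field names are hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct Measurement {
    pub naive_ms: f64,
    pub optimized_ms: f64,
    pub baseline_vram_bytes: u64,
    pub optimized_vram_bytes: u64,
}

/// Speedup factor: how many times faster the optimized kernel runs.
pub fn speedup(m: &Measurement) -> f64 {
    m.naive_ms / m.optimized_ms
}

/// Fraction of VRAM saved relative to the baseline (0.75 means 75% less).
pub fn vram_reduction(m: &Measurement) -> f64 {
    1.0 - m.optimized_vram_bytes as f64 / m.baseline_vram_bytes as f64
}

/// True when both ratio targets are met at their lower bounds (2x, 70%).
pub fn meets_targets(m: &Measurement) -> bool {
    speedup(m) >= 2.0 && vram_reduction(m) >= 0.70
}
```

Occupancy is left out on purpose: it comes from a profiler rather than from arithmetic on benchmark output.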

Kernel Implementation Workflow

1. Design Phase

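Document the mathematical operation
Work out the memory access pattern (bytes read and written per element, strides, data reuse)
Decide whether the kernel is compute-bound or memory-bound and choose the optimization strategy accordingly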
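
A common way to make the compute-bound versus memory-bound decision is to compare the kernel's arithmetic intensity (FLOPs per byte moved) with the machine balance of the target GPU. The sketch below assumes that roofline-style approach; the function names are illustrative, the cost formulas are idealized, and the peak figures must come from your own hardware's specifications.

```rust
/// Which resource is expected to limit a kernel.
#[derive(Debug, PartialEq)]
pub enum Bound {
    Compute,
    Memory,
}

/// FLOPs performed per byte moved to or from global memory.
pub fn arithmetic_intensity(flops: f64, bytes_moved: f64) -> f64 {
    flops / bytes_moved
}

/// Classify against the machine balance (peak FLOP/s divided by peak bytes/s).
pub fn classify(flops: f64, bytes_moved: f64, peak_flops: f64, peak_bandwidth: f64) -> Bound {
    let machine_balance = peak_flops / peak_bandwidth;
    if arithmetic_intensity(flops, bytes_moved) >= machine_balance {
        Bound::Compute
    } else {
        Bound::Memory
    }
}

/// Elementwise add over n f32 values: 1 FLOP per element, 2 reads + 1 write.
pub fn elementwise_add_cost(n: u64) -> (f64, f64) {
    (n as f64, (3 * 4 * n) as f64)
}

/// Square f32 matmul (n x n): 2*n^3 FLOPs, each matrix touched once (ideal reuse).
pub fn matmul_cost(n: u64) -> (f64, f64) {
    let n = n as f64;
    (2.0 * n * n * n, 3.0 * 4.0 * n * n)
}
```

Memory-bound kernels usually benefit most from fusion and reduced traffic, while compute-bound kernels benefit from tiling and better use of the arithmetic units.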

2. CPU Reference

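fn operation_cpu(input: &Tensor) -> Result<Tensor> {
    // Straightforward, obviously correct implementation.
    // GPU kernel output is validated against this result.
    todo!("CPU reference implementation")
}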
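
As a concrete illustration of a CPU reference and the validation step, the sketch below uses softmax over a plain `&[f32]` slice instead of the project's `Tensor` type. The choice of operation, the helper names `max_abs_diff` and `validate`, and the tolerance-based comparison are all assumptions made for the example.

```rust
/// Numerically stable softmax over one row; serves as the ground truth.
pub fn softmax_cpu(input: &[f32]) -> Vec<f32> {
    // Subtracting the maximum keeps exp() from overflowing on large inputs.
    let max = input.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = input.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Largest elementwise difference between a candidate and the reference.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "outputs must have the same length");
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| (x - y).abs())
        .fold(0.0_f32, f32::max)
}

/// Accept the candidate if it is within `tol` of the reference everywhere.
pub fn validate(candidate: &[f32], reference: &[f32], tol: f32) -> bool {
    max_abs_diff(candidate, reference) <= tol
}
```

A GPU result would be copied back to the host and passed as `candidate`. Reduced-precision kernels (f16/bf16) typically need a looser `tol` than f32 ones.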

3. CubeCL Kernel

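#[cube(launch)]
fn operation_kernel<F: Float>(
    input: &Tensor<F>,
    output: &mut Tensor<F>,
) {
    // One unit (thread) per element; ABSOLUTE_POS is its global linear index.
    let idx = ABSOLUTE_POS;
    // Guard: the launch may contain more units than there are elements.
    if idx < output.len() {
        // GPU implementation, e.g. output[idx] = f(input[idx])
    }
}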
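
The indexing model behind `ABSOLUTE_POS` can be tried out on the CPU without any GPU dependency. The sketch below is an emulation written for illustration, not CubeCL code: it assumes a 1D launch in which the absolute position equals cube position times cube size plus the unit's position inside the cube, and the names `cube_count` and `emulate_launch` are made up for this example.

```rust
/// Number of cubes (thread blocks) needed to cover `n` elements.
pub fn cube_count(n: usize, cube_dim: usize) -> usize {
    (n + cube_dim - 1) / cube_dim
}

/// CPU emulation of a 1D launch: each (cube, unit) pair computes its own
/// absolute position and skips work if that position is out of bounds.
pub fn emulate_launch(input: &[f32], cube_dim: usize, op: impl Fn(f32) -> f32) -> Vec<f32> {
    let mut output = vec![0.0_f32; input.len()];
    for cube_pos in 0..cube_count(input.len(), cube_dim) {
        for unit_pos in 0..cube_dim {
            let absolute_pos = cube_pos * cube_dim + unit_pos;
            // Same guard a real kernel needs: the last cube may be partly idle.
            if absolute_pos < output.len() {
                output[absolute_pos] = op(input[absolute_pos]);
            }
        }
    }
    output
}
```

With 5 elements and a cube size of 4, two cubes are launched and three units in the second cube do nothing, which is why the bounds check matters.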

4. Dispatch

pub fn operation(input: &Tensor) -> Result<Tensor> {
    match input.device() {
        Device::Cuda(_) => operation_cuda(input),
        _ => operation_cpu(input),
    }
}
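
The dispatch pattern can be exercised in isolation with stand-in types. In the sketch below, `Device`, `Backend`, `Tensor`, and the `scale_*` functions are hypothetical stand-ins rather than the project's real types, and the GPU path deliberately returns an error so the CPU fallback is visible. Falling back on a GPU error is an extension of the plain device match shown above, included here only as one possible design.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Device {
    Cpu,
    Cuda(usize),
}

/// Which implementation actually produced the result.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Cpu,
    Cuda,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    pub data: Vec<f32>,
    pub device: Device,
}

/// CPU reference path: always available.
fn scale_cpu(input: &Tensor, factor: f32) -> Result<Tensor, String> {
    Ok(Tensor {
        data: input.data.iter().map(|&v| v * factor).collect(),
        device: Device::Cpu,
    })
}

/// Stand-in for the GPU path; it always fails in this sketch.
fn scale_cuda(_input: &Tensor, _factor: f32) -> Result<Tensor, String> {
    Err("CUDA path not built in this sketch".to_string())
}

/// Route by device, falling back to the CPU path if the GPU path fails.
pub fn scale(input: &Tensor, factor: f32) -> Result<(Tensor, Backend), String> {
    match input.device {
        Device::Cuda(_) => match scale_cuda(input, factor) {
            Ok(out) => Ok((out, Backend::Cuda)),
            Err(_) => scale_cpu(input, factor).map(|out| (out, Backend::Cpu)),
        },
        Device::Cpu => scale_cpu(input, factor).map(|out| (out, Backend::Cpu)),
    }
}
```

Returning the backend alongside the result makes a silent fallback observable in tests and logs, which helps when a benchmark unexpectedly measures the CPU path.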

5. Benchmarking

cargo bench -p unsloth-rs -- kernel_name
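
For quick measurements outside the benchmark suite, a small standard-library timer is often enough. The sketch below is an assumption about how such a helper might look, not the contents of `benches/kernels.rs`: it runs warm-up iterations, times the remaining runs, and reports the median, which is less sensitive to outliers than the mean.

```rust
use std::time::{Duration, Instant};

/// Median of the collected samples (upper median for even counts).
pub fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Run `f` for `warmup` untimed iterations, then time `runs` iterations.
/// For GPU work, `f` must block until the kernel has finished, because
/// kernel launches are typically asynchronous.
pub fn time_median<F: FnMut()>(mut f: F, warmup: usize, runs: usize) -> Duration {
    for _ in 0..warmup {
        f();
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    median(&mut samples)
}
```

Dividing the median of the naive implementation by the median of the optimized one gives the speedup figure to compare against the performance targets.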

Memory Optimization Techniques

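Fused Operations - Combine sequential ops into one kernel so intermediate results never round-trip through global memory
Tiled Algorithms - Load tiles into shared memory and reuse them across threads
Streaming - Process large tensors in fixed-size chunks to cap peak memory
Mixed Precision - Use f16/bf16 where accuracy allows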
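
Fusion and streaming can be illustrated on the CPU, where the memory behavior is easy to see. The sketch below is illustrative only: the scale, bias, ReLU chain and the sum-of-squares reduction are example operations chosen for this purpose, and the function names are hypothetical.

```rust
/// Unfused: three passes and two intermediate buffers the size of the input.
pub fn scale_bias_relu_unfused(x: &[f32], scale: f32, bias: f32) -> Vec<f32> {
    let scaled: Vec<f32> = x.iter().map(|&v| v * scale).collect();
    let biased: Vec<f32> = scaled.iter().map(|&v| v + bias).collect();
    biased.iter().map(|&v| v.max(0.0)).collect()
}

/// Fused: one pass over the input and no intermediate buffers.
pub fn scale_bias_relu_fused(x: &[f32], scale: f32, bias: f32) -> Vec<f32> {
    x.iter().map(|&v| (v * scale + bias).max(0.0)).collect()
}

/// Streaming: scratch space never exceeds `chunk_len` elements,
/// regardless of input size. `chunk_len` must be greater than zero.
pub fn sum_of_squares_chunked(x: &[f32], chunk_len: usize) -> f32 {
    let mut scratch: Vec<f32> = Vec::with_capacity(chunk_len);
    let mut total = 0.0_f32;
    for chunk in x.chunks(chunk_len) {
        scratch.clear();
        scratch.extend(chunk.iter().map(|&v| v * v));
        total += scratch.iter().sum::<f32>();
    }
    total
}
```

On a GPU the same idea applies at kernel granularity: the fused version corresponds to one kernel launch instead of three, and the two intermediate buffers are exactly the VRAM that fusion saves.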

Key Files

src/kernels/ - Kernel implementations
benches/kernels.rs - Performance benchmarks
Global CUDA Instructions

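Score

Overall Score

65/100

Rating based on repository quality metrics

✓ SKILL.md: includes a SKILL.md file (+20)
✓ LICENSE: a license is set (+10)
○ Description: description of 100 characters or more (0/10)
○ Popularity: 100 or more GitHub stars (0/15)
✓ Recent activity: updated within the last month (+10)
○ Forks: forked 10 or more times (0/5)
✓ Issue management: fewer than 50 open issues
✓ Language: a programming language is set
✓ Tags: at least one tag is set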

Related Skills

add-uint-support

skill-writer

docstring

at-dispatch-v2

skill-lookup

prompt-lookup