
---
name: fine-tuning-customization
description: LLM fine-tuning with LoRA, QLoRA, DPO alignment, and synthetic data generation. Efficient training, preference learning, data creation. Use when customizing models for specific domains.
version: 1.0.0
tags: [fine-tuning, lora, qlora, dpo, synthetic-data, rlhf, 2026]
context: fork
agent: llm-integrator
author: OrchestKit
user-invocable: false
---
# Fine-Tuning & Customization
Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.
Unsloth (2026): 7x longer-context RL, FP8 RL on consumer GPUs, rsLoRA support. TRL: OpenEnv integration, vLLM server mode, compatible with transformers 5.0.0+.
## Decision Framework: Fine-Tune or Not?
| Approach | When to Try | When It Works |
|---|---|---|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |
Fine-tune ONLY when (see the checklist sketch after this list):
- Prompt engineering tried and insufficient
- RAG doesn't capture domain nuances
- Specific output format consistently required
- Persona/style must be deeply embedded
- You have ~1000+ high-quality examples
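
This gate can be captured as a lightweight pre-flight check. A minimal sketch; the `FinetuneReadiness` class and the exact combination logic are illustrative, not part of any library:

```python
from dataclasses import dataclass


@dataclass
class FinetuneReadiness:
    """Illustrative pre-flight check mirroring the criteria above."""
    prompt_engineering_insufficient: bool
    rag_misses_domain_nuance: bool
    needs_strict_output_format: bool
    needs_embedded_persona: bool
    num_quality_examples: int

    def should_finetune(self) -> bool:
        has_specialization_need = (
            self.rag_misses_domain_nuance
            or self.needs_strict_output_format
            or self.needs_embedded_persona
        )
        # ~1000+ curated examples is the rough floor from the list above
        return (
            self.prompt_engineering_insufficient
            and has_specialization_need
            and self.num_quality_examples >= 1000
        )
```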
## LoRA vs QLoRA (Unsloth 2026)
| Criteria | LoRA | QLoRA |
|---|---|---|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | ~39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovered accuracy loss) |
| 70B LLaMA | | <48GB VRAM with QLoRA |
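
With Unsloth, the load-time difference between the two paths reduces to the `load_in_4bit` flag; a minimal sketch (pick one of the two load calls):

```python
from unsloth import FastLanguageModel

# QLoRA: base weights quantized to 4-bit, adapters trained on top (lowest VRAM)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    load_in_4bit=True,
)

# LoRA: keep base weights in 16-bit when the model fits in VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    load_in_4bit=False,
)
```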
## Quick Reference: LoRA Training
```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # Rank (16-64 typical)
    lora_alpha=32,      # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```
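
After training, adapters can be saved independently of the base model. A minimal sketch: `save_pretrained` is the standard PEFT path, while `save_pretrained_merged` is an Unsloth-specific helper whose availability depends on your installed version:

```python
# Save only the LoRA adapter weights (small; base model stays untouched)
model.save_pretrained("outputs/lora_adapter")
tokenizer.save_pretrained("outputs/lora_adapter")

# Optional: export merged 16-bit weights for serving
# (Unsloth helper; verify it exists in your Unsloth version)
model.save_pretrained_merged(
    "outputs/merged_16bit", tokenizer, save_method="merged_16bit"
)
```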
## DPO Alignment
```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(
    learning_rate=5e-6,  # Lower for alignment
    beta=0.1,            # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
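
Each row of the preference dataset follows the `{prompt, chosen, rejected}` schema noted above. A minimal sketch using the `datasets` library; the example pair is illustrative:

```python
from datasets import Dataset

preference_dataset = Dataset.from_list([
    {
        "prompt": "Summarize the refund policy in one sentence.",
        "chosen": "Refunds are issued within 14 days of purchase with a valid receipt.",
        "rejected": "We have a refund policy. It covers refunds.",
    },
    # ... more pairs, ranked by human annotators or an LLM judge
])
```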
## Synthetic Data Generation
```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()


async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(
            model="gpt-4o",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           "Include instruction and response.",
            }],
            response_format={"type": "json_object"},
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```
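
A possible next step (illustrative topic and file name) is to persist the synthetic examples as JSONL and load them back as a training dataset:

```python
import asyncio
import json

from datasets import load_dataset


async def build_synthetic_dataset() -> None:
    # Each example is a dict such as {"instruction": ..., "response": ...}
    examples = await generate_synthetic("contract-law Q&A", n=500)
    with open("synthetic_train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")


asyncio.run(build_synthetic_dataset())
dataset = load_dataset("json", data_files="synthetic_train.jsonl", split="train")
```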
## Key Hyperparameters
| Parameter | Recommended | Notes |
|---|---|---|
| Learning rate | 2e-4 | LoRA/QLoRA standard |
| Epochs | 1-3 | More risks overfitting |
| LoRA r | 16-64 | Higher = more capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Ratio of steps |
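
These values map directly onto the trainer's arguments. A minimal sketch using `transformers.TrainingArguments` (newer TRL versions expose the same fields on `SFTConfig`); values are taken from the table above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,             # LoRA/QLoRA standard
    num_train_epochs=2,             # 1-3; more risks overfitting
    per_device_train_batch_size=4,  # 4-8 per device
    warmup_ratio=0.03,              # ~3% of total steps
)

# Pass as `args=training_args` to SFTTrainer
```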
## Anti-Patterns (FORBIDDEN)
```python
# NEVER fine-tune without trying alternatives first
model.fine_tune(data)        # Try prompt engineering & RAG first!

# NEVER use low-quality training data
data = scrape_random_web()   # Garbage in, garbage out

# NEVER skip evaluation
trainer.train()
deploy(model)                # Always evaluate before deploying!

# ALWAYS hold out a separate eval set
train_ds, eval_ds = split(data, test_size=0.1)
trainer = SFTTrainer(..., eval_dataset=eval_ds)
```
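
With the `datasets` library, the hold-out rule above looks like this (a minimal sketch; assumes `dataset` is a `datasets.Dataset`):

```python
# 90/10 split; evaluate on the held-out 10% before any deployment
splits = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
metrics = trainer.evaluate()
```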
## Detailed Documentation
| Resource | Description |
|---|---|
| references/lora-qlora.md | Parameter-efficient fine-tuning |
| references/dpo-alignment.md | Direct Preference Optimization |
| references/synthetic-data.md | Training data generation |
| references/when-to-finetune.md | Decision framework |
## Related Skills
- `llm-evaluation` - Evaluate fine-tuned models
- `embeddings` - When to use embeddings instead
- `rag-retrieval` - When RAG is better than fine-tuning
- `langfuse-observability` - Track training experiments
## Capability Details
### lora-qlora

Keywords: LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank

Solves:
- Fine-tune large models on consumer hardware
- Configure LoRA hyperparameters
- Choose target modules for adapters
### dpo-alignment

Keywords: DPO, RLHF, preference, alignment, human feedback, preference data

Solves:
- Align models to human preferences
- Create preference datasets
- Configure DPO training
### synthetic-data

Keywords: synthetic data, data generation, teacher model, distillation

Solves:
- Generate training data with LLMs
- Implement teacher-student training
- Scale training data quality
### when-to-finetune

Keywords: should I fine-tune, fine-tune decision, customize model

Solves:
- Decide when fine-tuning is appropriate
- Evaluate alternatives to fine-tuning
- Assess data requirements