
---
name: fine-tuning-customization
description: LLM fine-tuning with LoRA, QLoRA, DPO alignment, and synthetic data generation. Efficient training, preference learning, data creation. Use when customizing models for specific domains.
version: 1.0.0
tags: [fine-tuning, lora, qlora, dpo, synthetic-data, rlhf, 2026]
context: fork
agent: llm-integrator
author: OrchestKit
user-invocable: false
---
# Fine-Tuning & Customization
Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.
Unsloth (2026): 7x longer-context RL, FP8 RL on consumer GPUs, rsLoRA support. TRL: OpenEnv integration, vLLM server mode, compatible with transformers 5.0.0+.
## Decision Framework: Fine-Tune or Not?
| Approach | When to Try | When It Works |
|---|---|---|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |
Fine-tune ONLY when (see the checklist sketch after this list):
- Prompt engineering tried and insufficient
- RAG doesn't capture domain nuances
- Specific output format consistently required
- Persona/style must be deeply embedded
- You have ~1000+ high-quality examples
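
This gate can be captured as a lightweight pre-flight check. A minimal sketch; the `FinetuneReadiness` class and the exact combination logic are illustrative, not part of any library:

```python
from dataclasses import dataclass


@dataclass
class FinetuneReadiness:
    """Illustrative pre-flight check mirroring the criteria above."""
    prompt_engineering_insufficient: bool
    rag_misses_domain_nuance: bool
    needs_strict_output_format: bool
    needs_embedded_persona: bool
    num_quality_examples: int

    def should_finetune(self) -> bool:
        has_specialization_need = (
            self.rag_misses_domain_nuance
            or self.needs_strict_output_format
            or self.needs_embedded_persona
        )
        # ~1000+ curated examples is the rough floor from the list above
        return (
            self.prompt_engineering_insufficient
            and has_specialization_need
            and self.num_quality_examples >= 1000
        )
```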
## LoRA vs QLoRA (Unsloth 2026)
| Criteria | LoRA | QLoRA |
|---|---|---|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | ~39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovered accuracy loss) |
| 70B LLaMA | | <48GB VRAM with QLoRA |
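
With Unsloth, the load-time difference between the two paths reduces to the `load_in_4bit` flag; a minimal sketch (pick one of the two load calls):

```python
from unsloth import FastLanguageModel

# QLoRA: base weights quantized to 4-bit, adapters trained on top (lowest VRAM)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    load_in_4bit=True,
)

# LoRA: keep base weights in 16-bit when the model fits in VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    load_in_4bit=False,
)
```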
## Quick Reference: LoRA Training
```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # Rank (16-64 typical)
    lora_alpha=32,      # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```
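
After training, adapters can be saved independently of the base model. A minimal sketch: `save_pretrained` is the standard PEFT path, while `save_pretrained_merged` is an Unsloth-specific helper whose availability depends on your installed version:

```python
# Save only the LoRA adapter weights (small; base model stays untouched)
model.save_pretrained("outputs/lora_adapter")
tokenizer.save_pretrained("outputs/lora_adapter")

# Optional: export merged 16-bit weights for serving
# (Unsloth helper; verify it exists in your Unsloth version)
model.save_pretrained_merged(
    "outputs/merged_16bit", tokenizer, save_method="merged_16bit"
)
```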
## DPO Alignment
```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(
    learning_rate=5e-6,  # Lower for alignment
    beta=0.1,            # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
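
Each row of the preference dataset follows the `{prompt, chosen, rejected}` schema noted above. A minimal sketch using the `datasets` library; the example pair is illustrative:

```python
from datasets import Dataset

preference_dataset = Dataset.from_list([
    {
        "prompt": "Summarize the refund policy in one sentence.",
        "chosen": "Refunds are issued within 14 days of purchase with a valid receipt.",
        "rejected": "We have a refund policy. It covers refunds.",
    },
    # ... more pairs, ranked by human annotators or an LLM judge
])
```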
## Synthetic Data Generation
```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()


async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(
            model="gpt-4o",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           "Include instruction and response.",
            }],
            response_format={"type": "json_object"},
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```
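
A possible next step (illustrative topic and file name) is to persist the synthetic examples as JSONL and load them back as a training dataset:

```python
import asyncio
import json

from datasets import load_dataset


async def build_synthetic_dataset() -> None:
    # Each example is a dict such as {"instruction": ..., "response": ...}
    examples = await generate_synthetic("contract-law Q&A", n=500)
    with open("synthetic_train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")


asyncio.run(build_synthetic_dataset())
dataset = load_dataset("json", data_files="synthetic_train.jsonl", split="train")
```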
## Key Hyperparameters
| Parameter | Recommended | Notes |
|---|---|---|
| Learning rate | 2e-4 | LoRA/QLoRA standard |
| Epochs | 1-3 | More risks overfitting |
| LoRA r | 16-64 | Higher = more capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Ratio of steps |
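
These values map directly onto the trainer's arguments. A minimal sketch using `transformers.TrainingArguments` (newer TRL versions expose the same fields on `SFTConfig`); values are taken from the table above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,             # LoRA/QLoRA standard
    num_train_epochs=2,             # 1-3; more risks overfitting
    per_device_train_batch_size=4,  # 4-8 per device
    warmup_ratio=0.03,              # ~3% of total steps
)

# Pass as `args=training_args` to SFTTrainer
```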
## Anti-Patterns (FORBIDDEN)
```python
# NEVER fine-tune without trying alternatives first
model.fine_tune(data)        # Try prompt engineering & RAG first!

# NEVER use low-quality training data
data = scrape_random_web()   # Garbage in, garbage out

# NEVER skip evaluation
trainer.train()
deploy(model)                # Always evaluate before deploying!

# ALWAYS hold out a separate eval set
train_ds, eval_ds = split(data, test_size=0.1)
trainer = SFTTrainer(..., eval_dataset=eval_ds)
```
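
With the `datasets` library, the hold-out rule above looks like this (a minimal sketch; assumes `dataset` is a `datasets.Dataset`):

```python
# 90/10 split; evaluate on the held-out 10% before any deployment
splits = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
metrics = trainer.evaluate()
```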
## Detailed Documentation
| Resource | Description |
|---|---|
| references/lora-qlora.md | Parameter-efficient fine-tuning |
| references/dpo-alignment.md | Direct Preference Optimization |
| references/synthetic-data.md | Training data generation |
| references/when-to-finetune.md | Decision framework |
## Related Skills
- `llm-evaluation` - Evaluate fine-tuned models
- `embeddings` - When to use embeddings instead
- `rag-retrieval` - When RAG is better than fine-tuning
- `langfuse-observability` - Track training experiments
## Capability Details
### lora-qlora

Keywords: LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank

Solves:
- Fine-tune large models on consumer hardware
- Configure LoRA hyperparameters
- Choose target modules for adapters
### dpo-alignment

Keywords: DPO, RLHF, preference, alignment, human feedback, preference data

Solves:
- Align models to human preferences
- Create preference datasets
- Configure DPO training
### synthetic-data

Keywords: synthetic data, data generation, teacher model, distillation

Solves:
- Generate training data with LLMs
- Implement teacher-student training
- Scale training data quality
### when-to-finetune

Keywords: should I fine-tune, fine-tune decision, customize model

Solves:
- Decide when fine-tuning is appropriate
- Evaluate alternatives to fine-tuning
- Assess data requirements