
skill_id: when-training-rl-agents-use-agentdb-learning
name: agentdb-reinforcement-learning-training
description: "Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production."
version: 1.0.0
category: agentdb
subcategory: machine-learning
trigger_pattern: "when-training-rl-agents"
agents:
  - ml-developer
  - safla-neural
  - performance-benchmarker
complexity: advanced
estimated_duration: 6-10 hours
prerequisites:
  - AgentDB basics
  - Reinforcement learning fundamentals
  - Neural network knowledge
  - Python/TypeScript proficiency
outputs:
  - Trained RL agents
  - Learning plugin modules
  - Performance benchmarks
  - Deployment pipeline
validation_criteria:
  - Training converges successfully
  - Reward curve shows improvement
  - Agent passes validation tasks
  - Benchmarks meet targets
evidence_based_techniques:
  - Self-consistency validation
  - Program-of-thought decomposition
  - Chain-of-verification
  - Multi-agent consensus
metadata:
  author: claude-flow
  created: 2025-10-30
  updated: 2025-10-30
tags:
  - agentdb
  - reinforcement-learning
  - neural-networks
  - ai-training
  - q-learning
AgentDB Reinforcement Learning Training
Overview
Train AI learning plugins with AgentDB's 9 reinforcement learning algorithms, including Decision Transformer, Q-Learning, SARSA, Actor-Critic, and PPO. Build self-learning agents, implement RL training loops, and optimize agent behavior through experience.
When to Use This Skill
Use this skill when you need to:
- Train autonomous agents that learn from experience
- Implement reinforcement learning systems
- Optimize agent behavior through trial and error
- Build self-improving AI systems
- Deploy RL agents in production environments
- Benchmark and compare RL algorithms
Available RL Algorithms
- Q-Learning - Value-based, off-policy
- SARSA - Value-based, on-policy
- Deep Q-Network (DQN) - Deep RL with experience replay
- Actor-Critic - Policy gradient with value baseline
- Proximal Policy Optimization (PPO) - Trust region policy optimization
- Decision Transformer - Offline RL with transformers
- Advantage Actor-Critic (A2C) - Synchronous advantage estimation
- Twin Delayed DDPG (TD3) - Continuous control
- Soft Actor-Critic (SAC) - Maximum entropy RL
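As a rough guide, the choice usually comes down to the action space and whether you are learning online or from logged experience. The sketch below is only a heuristic; the string identifiers mirror those used in the configuration examples later in this skill, and the helper itself is not part of the AgentDB API:
// Illustrative heuristic for choosing an algorithm identifier.
// The string values match the identifiers used later in this skill;
// the helper itself is not part of the AgentDB API.
type ActionSpaceType = 'discrete' | 'continuous';
function suggestAlgorithm(actionSpace: ActionSpaceType, offlineData: boolean): string {
  if (offlineData) {
    return 'decision-transformer'; // offline RL from logged trajectories
  }
  if (actionSpace === 'continuous') {
    return 'sac'; // or 'td3' for deterministic continuous control
  }
  return 'dqn'; // value-based default for discrete action spaces
}
console.log(suggestAlgorithm('discrete', false)); // "dqn" - matches the grid-world example below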
SOP Framework: 5-Phase RL Training Deployment
Phase 1: Initialize Learning Environment (1-2 hours)
Objective: Set up the AgentDB learning infrastructure and configure the environment
Agent: ml-developer
Steps:
- Install AgentDB Learning Module
npm install agentdb-learning@latest
npm install @agentdb/rl-algorithms @agentdb/environments
- Initialize learning database
import { AgentDB, LearningPlugin } from 'agentdb-learning';
const learningDB = new AgentDB({
name: 'rl-training-db',
dimensions: 512, // State embedding dimension
learning: {
enabled: true,
persistExperience: true,
replayBufferSize: 100000
}
});
await learningDB.initialize();
// Create learning plugin
const learningPlugin = new LearningPlugin({
database: learningDB,
algorithms: ['q-learning', 'dqn', 'ppo', 'actor-critic'],
config: {
batchSize: 64,
learningRate: 0.001,
discountFactor: 0.99,
explorationRate: 1.0,
explorationDecay: 0.995
}
});
await learningPlugin.initialize();
- Define environment
import { Environment } from '@agentdb/environments';
const environment = new Environment({
name: 'grid-world',
stateSpace: {
type: 'continuous',
shape: [10, 10],
bounds: [[0, 10], [0, 10]]
},
actionSpace: {
type: 'discrete',
actions: ['up', 'down', 'left', 'right']
},
rewardFunction: (state, action, nextState) => {
// Distance to goal reward
const goalDistance = Math.sqrt(
Math.pow(nextState[0] - 9, 2) +
Math.pow(nextState[1] - 9, 2)
);
return -goalDistance + (goalDistance === 0 ? 100 : 0);
},
terminalCondition: (state) => {
return state[0] === 9 && state[1] === 9; // Reached goal
}
});
await environment.initialize();
- Set up monitoring
const monitor = learningPlugin.createMonitor({
metrics: ['reward', 'loss', 'exploration-rate', 'episode-length'],
logInterval: 100, // Log every 100 episodes
saveCheckpoints: true,
checkpointInterval: 1000
});
monitor.on('episode-complete', (episode) => {
console.log('Episode:', episode.number, 'Reward:', episode.totalReward);
});
Memory Pattern:
await agentDB.memory.store('agentdb/learning/environment', {
name: environment.name,
stateSpace: environment.stateSpace,
actionSpace: environment.actionSpace,
initialized: Date.now()
});
Validation:
- Learning database initialized
- Environment configured and tested
- Monitor capturing metrics
- Configuration stored in memory
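To back the "environment configured and tested" check with something concrete, a short smoke test can reset the environment and take a few random steps. This is a minimal sketch assuming the reset()/step() interface used in the steps above:
// Smoke test: reset the environment and take a few random actions.
// Uses the environment and action names configured in the steps above.
async function smokeTestEnvironment() {
  await environment.reset();
  const actions = ['up', 'down', 'left', 'right'];
  for (let i = 0; i < 10; i++) {
    const action = actions[Math.floor(Math.random() * actions.length)];
    const { reward, done } = await environment.step(action);
    console.log(`step ${i}: action=${action} reward=${reward.toFixed(2)} done=${done}`);
    if (done) break;
  }
}
await smokeTestEnvironment();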
Phase 2: Configure RL Algorithm (1-2 hours)
Objective: Select and configure an RL algorithm for the learning task
Agent: ml-developer
Steps:
- Select algorithm
// Example: Deep Q-Network (DQN)
const dqnAgent = learningPlugin.createAgent({
algorithm: 'dqn',
config: {
networkArchitecture: {
layers: [
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: environment.actionSpace.size, activation: 'linear' }
]
},
learningRate: 0.001,
batchSize: 64,
replayBuffer: {
size: 100000,
prioritized: true,
alpha: 0.6,
beta: 0.4
},
targetNetwork: {
updateFrequency: 1000,
tauSync: 0.001 // Soft update
},
exploration: {
initial: 1.0,
final: 0.01,
decay: 0.995
},
training: {
startAfter: 1000, // Start training after 1000 experiences
updateFrequency: 4
}
}
});
await dqnAgent.initialize();
- Configure hyperparameters
const hyperparameters = {
// Learning parameters
learningRate: 0.001,
discountFactor: 0.99, // Gamma
batchSize: 64,
// Exploration
epsilonStart: 1.0,
epsilonEnd: 0.01,
epsilonDecay: 0.995,
// Experience replay
replayBufferSize: 100000,
minReplaySize: 1000,
prioritizedReplay: true,
// Training
maxEpisodes: 10000,
maxStepsPerEpisode: 1000,
targetUpdateFrequency: 1000,
// Evaluation
evalFrequency: 100,
evalEpisodes: 10
};
dqnAgent.setHyperparameters(hyperparameters);
- Set up experience replay
import { PrioritizedReplayBuffer } from '@agentdb/rl-algorithms';
const replayBuffer = new PrioritizedReplayBuffer({
capacity: 100000,
alpha: 0.6, // Prioritization exponent
beta: 0.4, // Importance sampling
betaIncrement: 0.001,
epsilon: 0.01 // Small constant for stability
});
dqnAgent.setReplayBuffer(replayBuffer);
- Configure training loop
const trainingConfig = {
episodes: 10000,
stepsPerEpisode: 1000,
warmupSteps: 1000,
trainFrequency: 4,
targetUpdateFrequency: 1000,
saveFrequency: 1000,
evalFrequency: 100,
earlyStoppingPatience: 500,
earlyStoppingThreshold: 0.01
};
dqnAgent.setTrainingConfig(trainingConfig);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/algorithm-config', {
algorithm: 'dqn',
hyperparameters: hyperparameters,
trainingConfig: trainingConfig,
configured: Date.now()
});
Validation:
- Algorithm selected and configured
- Hyperparameters validated
- Replay buffer initialized
- Training config set
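The "hyperparameters validated" check can be made concrete with a few range assertions before the long training run starts. A minimal sketch over the hyperparameters object defined above; the bounds are common-sense defaults, not AgentDB requirements:
// Basic sanity checks on the hyperparameters object defined above.
// Fails fast before an expensive training run starts with an invalid config.
function validateHyperparameters(hp: typeof hyperparameters): void {
  if (hp.learningRate <= 0 || hp.learningRate > 1) {
    throw new Error(`learningRate out of range: ${hp.learningRate}`);
  }
  if (hp.discountFactor <= 0 || hp.discountFactor > 1) {
    throw new Error(`discountFactor (gamma) out of range: ${hp.discountFactor}`);
  }
  if (hp.epsilonEnd > hp.epsilonStart) {
    throw new Error('epsilonEnd must not exceed epsilonStart');
  }
  if (hp.minReplaySize > hp.replayBufferSize) {
    throw new Error('minReplaySize must not exceed replayBufferSize');
  }
}
validateHyperparameters(hyperparameters);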
Phase 3: Train Agents (3-4 hours)
Objective: Execute training iterations and optimize agent behavior
Agent: safla-neural
Steps:
- Start training loop
async function trainAgent() {
console.log('Starting RL training...');
const trainingStats = {
episodes: [],
totalReward: [],
episodeLength: [],
loss: [],
explorationRate: []
};
for (let episode = 0; episode < trainingConfig.episodes; episode++) {
let state = await environment.reset();
let episodeReward = 0;
let episodeLength = 0;
let episodeLoss = 0;
for (let step = 0; step < trainingConfig.stepsPerEpisode; step++) {
// Select action
const action = await dqnAgent.selectAction(state, {
explore: true
});
// Execute action
const { nextState, reward, done } = await environment.step(action);
// Store experience
await dqnAgent.storeExperience({
state,
action,
reward,
nextState,
done
});
// Train if enough experiences
if (dqnAgent.canTrain()) {
const loss = await dqnAgent.train();
episodeLoss += loss;
}
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) break;
}
// Update target network
if (episode % trainingConfig.targetUpdateFrequency === 0) {
await dqnAgent.updateTargetNetwork();
}
// Decay exploration
dqnAgent.decayExploration();
// Log progress
trainingStats.episodes.push(episode);
trainingStats.totalReward.push(episodeReward);
trainingStats.episodeLength.push(episodeLength);
trainingStats.loss.push(episodeLoss / episodeLength);
trainingStats.explorationRate.push(dqnAgent.getExplorationRate());
if (episode % 100 === 0) {
console.log(`Episode ${episode}:`, {
reward: episodeReward.toFixed(2),
length: episodeLength,
loss: (episodeLoss / episodeLength).toFixed(4),
epsilon: dqnAgent.getExplorationRate().toFixed(3)
});
}
// Save checkpoint
if (episode % trainingConfig.saveFrequency === 0) {
await dqnAgent.save(`checkpoint-${episode}`);
}
// Evaluate
if (episode % trainingConfig.evalFrequency === 0) {
const evalReward = await evaluateAgent(dqnAgent, environment);
console.log(`Evaluation at episode ${episode}: ${evalReward.toFixed(2)}`);
}
// Early stopping
if (checkEarlyStopping(trainingStats, episode)) {
console.log('Early stopping triggered');
break;
}
}
return trainingStats;
}
const trainingStats = await trainAgent();
- Monitor training progress
monitor.on('training-update', (stats) => {
// Calculate moving averages
const window = 100;
const recentRewards = stats.totalReward.slice(-window);
const avgReward = recentRewards.reduce((a, b) => a + b, 0) / recentRewards.length;
// Store metrics
agentDB.memory.store('agentdb/learning/training-progress', {
episode: stats.episodes[stats.episodes.length - 1],
avgReward: avgReward,
explorationRate: stats.explorationRate[stats.explorationRate.length - 1],
timestamp: Date.now()
});
// Plot learning curve (if visualization enabled)
if (monitor.visualization) {
monitor.plot('reward-curve', stats.episodes, stats.totalReward);
monitor.plot('loss-curve', stats.episodes, stats.loss);
}
});
- Handle convergence
function checkConvergence(stats, windowSize = 100, threshold = 0.01) {
if (stats.totalReward.length < windowSize * 2) {
return false;
}
const recent = stats.totalReward.slice(-windowSize);
const previous = stats.totalReward.slice(-windowSize * 2, -windowSize);
const recentAvg = recent.reduce((a, b) => a + b, 0) / recent.length;
const previousAvg = previous.reduce((a, b) => a + b, 0) / previous.length;
const improvement = (recentAvg - previousAvg) / Math.abs(previousAvg);
return improvement < threshold;
}
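The training loop in the previous step also calls checkEarlyStopping, which is not defined in this skill. A minimal sketch, assuming early stopping should trigger once improvement stays below earlyStoppingThreshold for earlyStoppingPatience consecutive episodes:
// Sketch of the early-stopping helper referenced in the training loop.
// Stops once improvement has stayed below the threshold for `patience` episodes.
let episodesWithoutImprovement = 0;
function checkEarlyStopping(stats, episode, windowSize = 100) {
  // Not enough history yet to compare two windows
  if (episode < windowSize * 2) return false;
  if (!checkConvergence(stats, windowSize, trainingConfig.earlyStoppingThreshold)) {
    episodesWithoutImprovement = 0;
    return false;
  }
  episodesWithoutImprovement += 1;
  return episodesWithoutImprovement >= trainingConfig.earlyStoppingPatience;
}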
- Save trained model
await dqnAgent.save('trained-agent-final', {
includeReplayBuffer: false,
includeOptimizer: false,
metadata: {
trainingStats: trainingStats,
hyperparameters: hyperparameters,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1]
}
});
console.log('Training complete. Model saved.');
Memory Pattern:
await agentDB.memory.store('agentdb/learning/training-results', {
algorithm: 'dqn',
episodes: trainingStats.episodes.length,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1],
converged: checkConvergence(trainingStats),
modelPath: 'trained-agent-final',
timestamp: Date.now()
});
Validation:
- Training completed or converged
- Reward curve shows improvement
- Model saved successfully
- Training stats stored
Phase 4: Validate Performance (1-2 hours)
Objective: Benchmark the trained agent and validate its performance
Agent: performance-benchmarker
Steps:
- Load trained agent
const trainedAgent = await learningPlugin.loadAgent('trained-agent-final');
- Run evaluation episodes
async function evaluateAgent(agent, env, numEpisodes = 100) {
const results = {
rewards: [],
episodeLengths: [],
successRate: 0
};
for (let i = 0; i < numEpisodes; i++) {
let state = await env.reset();
let episodeReward = 0;
let episodeLength = 0;
let success = false;
for (let step = 0; step < 1000; step++) {
const action = await agent.selectAction(state, { explore: false });
const { nextState, reward, done } = await env.step(action);
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) {
success = env.isSuccessful(state);
break;
}
}
results.rewards.push(episodeReward);
results.episodeLengths.push(episodeLength);
if (success) results.successRate += 1;
}
results.successRate /= numEpisodes;
return {
meanReward: results.rewards.reduce((a, b) => a + b, 0) / results.rewards.length,
stdReward: calculateStd(results.rewards),
meanLength: results.episodeLengths.reduce((a, b) => a + b, 0) / results.episodeLengths.length,
successRate: results.successRate,
results: results
};
}
const evalResults = await evaluateAgent(trainedAgent, environment, 100);
console.log('Evaluation results:', evalResults);
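evaluateAgent relies on a calculateStd helper that is not defined in this skill; a minimal sketch of the sample standard deviation it is assumed to compute:
// Sample standard deviation helper used by evaluateAgent above.
function calculateStd(values: number[]): number {
  if (values.length < 2) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance);
}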
- Compare with baseline
// Random policy baseline
const randomAgent = learningPlugin.createAgent({ algorithm: 'random' });
const randomResults = await evaluateAgent(randomAgent, environment, 100);
// Calculate improvement
const improvement = {
rewardImprovement: (evalResults.meanReward - randomResults.meanReward) / Math.abs(randomResults.meanReward),
lengthImprovement: (randomResults.meanLength - evalResults.meanLength) / randomResults.meanLength,
successImprovement: evalResults.successRate - randomResults.successRate
};
console.log('Improvement over random:', improvement);
- Run comprehensive benchmarks
const benchmarks = {
performanceMetrics: {
meanReward: evalResults.meanReward,
stdReward: evalResults.stdReward,
successRate: evalResults.successRate,
meanEpisodeLength: evalResults.meanLength
},
algorithmComparison: {
dqn: evalResults,
random: randomResults,
improvement: improvement
},
inferenceTiming: {
actionSelection: 0,
totalEpisode: 0
}
};
// Measure inference speed
const timingTrials = 1000;
const startTime = performance.now();
for (let i = 0; i < timingTrials; i++) {
const state = await environment.randomState();
await trainedAgent.selectAction(state, { explore: false });
}
const endTime = performance.now();
benchmarks.inferenceTiming.actionSelection = (endTime - startTime) / timingTrials;
await agentDB.memory.store('agentdb/learning/benchmarks', benchmarks);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/validation', {
evaluated: true,
meanReward: evalResults.meanReward,
successRate: evalResults.successRate,
improvement: improvement,
timestamp: Date.now()
});
Validation:
- Evaluation completed (100 episodes)
- Mean reward exceeds threshold
- Success rate acceptable
- Improvement over baseline demonstrated
Phase 5: Deploy Trained Agents (1-2 hours)
Objective: Deploy trained agents to production environment
Agent: ml-developer
Steps:
- Export production model
await trainedAgent.export('production-agent', {
format: 'onnx', // or 'tensorflowjs', 'pytorch'
optimize: true,
quantize: 'int8', // Quantization for faster inference
includeMetadata: true
});
- Create inference API
import express from 'express';
const app = express();
app.use(express.json());
// Load production agent
const productionAgent = await learningPlugin.loadAgent('production-agent');
app.post('/api/predict', async (req, res) => {
try {
const { state } = req.body;
const action = await productionAgent.selectAction(state, {
explore: false,
returnProbabilities: true
});
res.json({
action: action.action,
probabilities: action.probabilities,
confidence: action.confidence
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000, () => {
console.log('RL agent API running on port 3000');
});
- Set up monitoring
import { ProductionMonitor } from '@agentdb/monitoring';
const prodMonitor = new ProductionMonitor({
agent: productionAgent,
metrics: ['inference-latency', 'action-distribution', 'reward-feedback'],
alerting: {
latencyThreshold: 100, // ms
anomalyDetection: true
}
});
await prodMonitor.start();
- Create deployment pipeline
const deploymentPipeline = {
stages: [
{
name: 'validation',
steps: [
'Load trained model',
'Run validation suite',
'Check performance metrics',
'Verify inference speed'
]
},
{
name: 'export',
steps: [
'Export to production format',
'Optimize model',
'Quantize weights',
'Package artifacts'
]
},
{
name: 'deployment',
steps: [
'Deploy to staging',
'Run smoke tests',
'Deploy to production',
'Monitor performance'
]
}
]
};
await agentDB.memory.store('agentdb/learning/deployment-pipeline', deploymentPipeline);
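The deploymentPipeline object above is descriptive rather than executable. If you want to drive it programmatically, one possible sketch walks the stages in order and dispatches each step to a handler you supply; the runner and handler map are hypothetical, not part of AgentDB:
// Hypothetical runner that walks the deployment pipeline defined above.
// Each step name maps to a handler you supply; steps without a handler are only logged.
type StepHandler = () => Promise<void>;
async function runPipeline(
  pipeline: typeof deploymentPipeline,
  handlers: Record<string, StepHandler> = {}
) {
  for (const stage of pipeline.stages) {
    console.log(`Stage: ${stage.name}`);
    for (const step of stage.steps) {
      console.log(`  -> ${step}`);
      const handler = handlers[step];
      if (handler) await handler();
    }
  }
}
await runPipeline(deploymentPipeline, {
  'Run smoke tests': async () => {
    // plug in your own staging smoke tests here
  }
});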
Memory Pattern:
await agentDB.memory.store('agentdb/learning/production', {
deployed: true,
modelPath: 'production-agent',
apiEndpoint: 'http://localhost:3000/api/predict',
monitoring: true,
timestamp: Date.now()
});
Validation:
- Model exported successfully
- API running and responding
- Monitoring active
- Deployment pipeline documented
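The "API running and responding" check can be automated with a single request against the endpoint defined above. A minimal sketch using Node's built-in fetch (Node 18+), assuming the grid-world state format from Phase 1:
// Smoke test for the inference API started above (Node 18+ provides global fetch).
async function smokeTestApi() {
  const response = await fetch('http://localhost:3000/api/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ state: [0, 0] }) // grid-world start state
  });
  if (!response.ok) {
    throw new Error(`API smoke test failed: HTTP ${response.status}`);
  }
  const result = await response.json();
  console.log('API smoke test passed:', result);
}
await smokeTestApi();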
Integration Scripts
Complete Training Script
#!/bin/bash
# train-rl-agent.sh
set -e
echo "AgentDB RL Training Script"
echo "=========================="
# Phase 1: Initialize
echo "Phase 1: Initializing learning environment..."
npm install agentdb-learning @agentdb/rl-algorithms @agentdb/environments
# Phase 2: Configure
echo "Phase 2: Configuring algorithm..."
node config-algorithm.js
# Phase 3: Train
echo "Phase 3: Training agent..."
node train-agent.js
# Phase 4: Validate
echo "Phase 4: Validating performance..."
node evaluate-agent.js
# Phase 5: Deploy
echo "Phase 5: Deploying to production..."
node deploy-agent.js
echo "Training complete!"
Quick Start Script
// quickstart-rl.ts
import { setupRLTraining } from './setup';
async function quickStart() {
console.log('Starting RL training quick setup...');
// Setup
const { learningDB, environment, agent } = await setupRLTraining({
algorithm: 'dqn',
environment: 'grid-world',
episodes: 1000
});
// Train
console.log('Training agent...');
const stats = await agent.train(environment, {
episodes: 1000,
logInterval: 100
});
// Evaluate
console.log('Evaluating agent...');
const results = await agent.evaluate(environment, {
episodes: 100
});
console.log('Results:', results);
// Save
await agent.save('quickstart-agent');
console.log('Quick start complete!');
}
quickStart().catch(console.error);
Evidence-Based Success Criteria
- Training Convergence (Self-Consistency)
  - Reward curve stabilizes
  - Moving average improvement < 1%
  - Agent achieves consistent performance
- Performance Benchmarks (Quantitative)
  - Mean reward exceeds baseline by 50%
  - Success rate > 80%
  - Inference time < 10ms per action
- Algorithm Validation (Chain-of-Verification)
  - Hyperparameters validated
  - Exploration-exploitation balanced
  - Experience replay functioning
- Production Readiness (Multi-Agent Consensus)
  - Model exported successfully
  - API responds within latency threshold
  - Monitoring active and alerting
  - Deployment pipeline documented
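The quantitative criteria above can be checked automatically against the Phase 4 results. A minimal sketch using the evalResults, improvement, and benchmarks objects from that phase; the thresholds mirror the targets listed above:
// Automated check of the quantitative success criteria above,
// using the evalResults, improvement, and benchmarks objects produced in Phase 4.
const successCriteria = [
  { name: 'Mean reward exceeds baseline by 50%', pass: improvement.rewardImprovement >= 0.5 },
  { name: 'Success rate > 80%', pass: evalResults.successRate > 0.8 },
  { name: 'Inference time < 10ms per action', pass: benchmarks.inferenceTiming.actionSelection < 10 }
];
for (const criterion of successCriteria) {
  console.log(`${criterion.pass ? 'PASS' : 'FAIL'}: ${criterion.name}`);
}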
Additional Resources
- AgentDB Learning Documentation: https://agentdb.dev/docs/learning
- RL Algorithms Guide: https://agentdb.dev/docs/rl-algorithms
- Training Best Practices: https://agentdb.dev/docs/training
- Production Deployment: https://agentdb.dev/docs/deployment