
youtube-moderation-prototype
by Foundup
Foundups for the 99% eats the Startups of the 1% transforming failed capitalism. The Foundups Era—described in 2009—is finally arriving. Powered by quantum entangled pArtifacts called 0102. We don’t build with pitch decks. We launch with our 0102. Our recursive pArtifact Red Dogs eat the 1%.
SKILL.md
name: youtube_moderation_prototype description: Moderate YouTube live chat by detecting spam, toxic content, and enforcing rate limits. PROTOTYPE for testing pattern fidelity before deploying to native Qwen/Gemma. version: 1.0_prototype author: 0102_test_harness created: 2025-10-20 agents: [qwen, gemma] primary_agent: gemma trigger_keywords: [moderate, spam, toxic, chat, filter, block] pattern_fidelity_threshold: 0.90 test_status: prototype
YouTube Chat Moderation Skill (PROTOTYPE)
Purpose: Validate this skill pattern with 0102 (Claude Code) before deploying to native Qwen/Gemma environment.
Test Plan: 0102 executes 50 benchmark tasks manually, validates ≥90% success rate.
Task
Moderate incoming YouTube live chat messages to maintain positive community environment.
Instructions (For 0102 Validation)
1. CAPS SPAM CHECK
Rule: IF message.length > 20 AND uppercase_ratio > 0.80 → BLOCK
Expected Pattern: caps_check_executed=True
Validation:
- Count characters in message
- Calculate uppercase_ratio = (uppercase_count / total_length)
- If both conditions met → decision="block", reason="caps_spam"
- Log:
{"pattern": "caps_check_executed", "value": true, "decision": "block"}
Examples:
- ✅ "VISIT MY CHANNEL FOR FREE MONEY!!!!!!" → BLOCK (length=38, uppercase=100%)
- ✅ "CHECK OUT MY STREAM NOW!!!!!" → BLOCK (length=29, uppercase=96%)
- ❌ "Great stream!" → ALLOW (length=13, too short)
- ❌ "This is AMAZING content" → ALLOW (uppercase=38%, below threshold)
2. REPETITION CHECK
Rule: IF message appears in recent_history (last 100 messages) AND count >= 3 → BLOCK
Expected Pattern: repetition_check_executed=True
Validation:
- Search last 100 messages for exact match
- Count occurrences of current message
- If count >= 3 → decision="block", reason="repetition"
- Log:
{"pattern": "repetition_check_executed", "value": true, "matches_found": N}
Examples:
- ✅ "Follow me!" (appears 3 times in history) → BLOCK
- ✅ "Join my Discord" (appears 5 times) → BLOCK
- ❌ "Follow me!" (appears 2 times) → ALLOW (below threshold)
- ❌ "First time saying this" (appears 1 time) → ALLOW
3. RATE LIMIT CHECK
Rule: IF user_message_count_in_last_30_seconds > 5 → WARN or BLOCK
Expected Pattern: rate_limit_applied=True
Validation:
- Filter messages by user_id in last 30 seconds
- Count messages
- If count == 5 → decision="warn", action="send_warning"
- If count > 5 → decision="block", reason="rate_limit"
- Log:
{"pattern": "rate_limit_applied", "value": true, "message_count": N}
Examples:
- ✅ User sends 6 messages in 30s → BLOCK
- ✅ User sends 5 messages in 30s → WARN
- ❌ User sends 4 messages in 30s → ALLOW
- ❌ User sends 6 messages in 60s → ALLOW (outside time window)
4. TOXIC CONTENT CHECK
Rule: IF message contains toxic keywords AND confidence > 0.8 → BLOCK
Expected Pattern: toxicity_check_executed=True
Validation:
- Load toxic_keywords list (see resources/toxic_patterns.json)
- Scan message for exact or fuzzy matches
- Calculate confidence score (exact=1.0, fuzzy varies by similarity)
- If confidence > 0.8 → decision="block", reason="toxic"
- Log:
{"pattern": "toxicity_check_executed", "value": true, "confidence": N, "matched_keywords": [...]}
Examples:
- ✅ "You're a [slur]" (exact match, confidence=1.0) → BLOCK
- ✅ "F*** this stream" (obfuscated match, confidence=0.9) → BLOCK
- ❌ "This is frustrating" (mild negative, confidence=0.3) → ALLOW
- ❌ "Nice job!" (positive, confidence=0.0) → ALLOW
5. LEGITIMATE MESSAGE ROUTING
Rule: IF all_checks_passed → ALLOW AND route_to_banter_response_skill
Expected Pattern: allowed_and_routed=True
Validation:
- Verify all previous checks returned "allow"
- decision="allow"
- next_action="route_to_banter_skill"
- Log:
{"pattern": "allowed_and_routed", "value": true, "next_skill": "banter_response"}
Examples:
- ✅ "Great content!" (passes all checks) → ALLOW + route to banter
- ✅ "What game is this?" (passes all checks) → ALLOW + route to banter
- ❌ "SPAM MESSAGE" (fails caps check) → BLOCK (no routing)
Expected Patterns Summary
When validating with 0102, check that each execution logs these patterns:
{
"execution_id": "exec_001",
"message": "Great stream!",
"patterns": {
"caps_check_executed": true,
"repetition_check_executed": true,
"rate_limit_applied": true,
"toxicity_check_executed": true,
"allowed_and_routed": true
},
"decision": "allow",
"confidence": 1.0
}
Pattern Fidelity Calculation:
fidelity = (patterns_executed / total_patterns_in_skill)
fidelity = 5/5 = 1.00 (100%)
Success Criteria for Prototype:
- ✅ Pattern fidelity ≥ 90% across all test cases
- ✅ Outcome quality ≥ 85% (correct classifications)
- ✅ No false negatives on toxic/spam
- ✅ Low false positive rate (<5%)
Benchmark Test Cases
Test Set 1: Spam Messages (20 cases)
- "VISIT MY CHANNEL!!!!!!" → BLOCK (caps)
- "CHECK THIS OUT NOW" → BLOCK (caps)
- "Follow me!" (x3 repetition) → BLOCK (repetition)
- "Join Discord" (x4 repetition) → BLOCK (repetition)
- User sends 6 rapid messages → BLOCK (rate limit) ... (15 more spam cases)
Test Set 2: Toxic Messages (20 cases)
- Message with racial slur → BLOCK (toxic, confidence=1.0)
- Message with profanity → BLOCK (toxic, confidence=0.95)
- "You suck at this game" → BLOCK (toxic, confidence=0.85) ... (17 more toxic cases)
Test Set 3: Legitimate Messages (60 cases)
- "Great stream!" → ALLOW
- "What game is this?" → ALLOW
- "How long have you been streaming?" → ALLOW ... (57 more legitimate cases)
Test Set 4: Edge Cases (10 cases)
- "This is REALLY cool" → ALLOW (caps below threshold)
- "lol" (x2 repetition) → ALLOW (below repetition threshold)
- Message in non-English → CONTEXT-DEPENDENT ... (7 more edge cases)
Total: 110 test cases
0102 Validation Protocol
Step 1: Review skill instructions
- Understand each rule
- Verify examples make sense
- Identify any ambiguities
Step 2: Execute benchmark tasks (manually)
- Test Set 1: Spam (20 cases)
- Test Set 2: Toxic (20 cases)
- Test Set 3: Legitimate (60 cases)
- Test Set 4: Edge (10 cases)
Step 3: Log pattern fidelity
- For each execution, verify all 5 patterns logged
- Calculate per-execution fidelity (patterns_executed / 5)
- Calculate overall fidelity across all 110 cases
Step 4: Calculate outcome quality
- Count correct classifications
- Outcome quality = (correct / total)
Step 5: Validate thresholds
- Pattern fidelity ≥ 90%?
- Outcome quality ≥ 85%?
- Combined score ≥ 88%?
Step 6: Document failures
- If any test fails, document:
- Which instruction was unclear?
- Why did execution deviate?
- How should instruction be improved?
Step 7: Iterate if needed
- Update SKILL.md based on failures
- Re-run failed test cases
- Achieve ≥90% threshold
Step 8: Approve for deployment
- All thresholds met
- Ready to extract to modules/communication/livechat/skills/
- Pattern validated for native Qwen/Gemma execution
Resources
See examples/ directory for:
spam_examples.json- 20 spam test casestoxic_examples.json- 20 toxic test caseslegitimate_examples.json- 60 legitimate test casesedge_cases.json- 10 edge test cases
See resources/ directory for:
toxic_patterns.json- Toxic keyword databasevalidation_template.json- Logging template
Next Steps After Validation
If prototype succeeds (≥90% fidelity):
- Extract to
modules/communication/livechat/skills/youtube_moderation/ - Add
versions/,metrics/,variations/directories - Create baseline:
versions/v1.0_baseline.md(copy of validated SKILL.md) - Implement WRE loader to inject into Qwen/Gemma prompts
- Run same 110 test cases with Gemma
- Enable pattern fidelity scoring
- Compare: 0102 fidelity vs Gemma fidelity
- If Gemma ≥ 0102 → SUCCESS (skill is portable!)
- If Gemma < 0102 → Qwen generates variations, A/B test, evolve
Status: READY FOR 0102 VALIDATION Next Action: Execute 110 benchmark test cases manually with Claude Code
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
3ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon
