
image-generator
by mjunaidca
Book + RAG ChatAgent + SSO + Compunding RI Intelligence built in 2 days. Robolearn is what AI-driven development with Spec-Driven methodology and Reusable Intelligence makes possible.
SKILL.md
name: image-generator description: This skill should be used when generating images for educational content. It provides multi-turn reasoning partnership methodology with Gemini for professional quality visual generation. version: "1.0.0"
Image Generator Skill
Context & Problem
Visual generation converges toward accepting first output ("looks good enough") and following technical specifications rigidly. This produces generic aesthetics and misses Gemini 3's reasoning capabilities.
This skill provides multi-turn reasoning partnership methodology with professional quality standards.
Core Principles
- Reasoning mode over prediction mode - Creative briefs (Story/Intent/Metaphor) activate reasoning; technical specs don't
- Multi-turn partnership - Teach Gemini your standards through principle-based feedback
- Professional quality gates - Explicit pass/fail criteria (99% spelling, not "check spelling")
- Autonomous agency - Batch mode without permission-asking between visuals
Dimensional Guidance
Input: Professional Creative Briefs (NOT Technical Specs)
Receive from visual-asset-workflow v5.0:
## The Story
[Narrative about what's visualized]
## Emotional Intent
[What it should FEEL like]
## Visual Metaphor
[Universal concept for instant comprehension]
## Subject / Composition / Action / Location / Style / Camera / Lighting
[Official Gemini 3 prompt structure]
## Color Semantics
Blue (#2563eb) = Authority (teaches governance)
Green (#10b981) = Execution (teaches work)
## Typography Hierarchy
Largest: Key insight (information importance drives sizing)
Medium: Supporting components
Smallest: Context
## Pedagogical Reasoning
[Why these choices serve teaching]
DO NOT convert briefs back to pixel specs - Use AS-IS to activate reasoning.
Principle: Creative briefs activate Gemini's reasoning about HOW to achieve intent visually
Workflow: Browser-Based Generation (Playwright MCP)
CRITICAL: Use gemini.google.com ONLY (NOT Google AI Studio, NOT other image generators)
Initialize once:
- Navigate to https://gemini.google.com/ (Playwright MCP)
- User signs in if needed (session persists)
- Select "🍌 Create Image" tool (Nano Banana Pro)
For EACH visual:
- Type creative brief directly into Gemini chat textbox (use condensed format in batch mode - see Token Conservation below)
- Press Enter to submit
- Wait 30-35 seconds for generation
- Click "Download full size image" button when ready
- Wait 3-5 seconds for download completion
- Verify quality IMMEDIATELY (6 gates below)
- If gates fail: Continue in same chat with principle-based feedback (max 3 iterations)
- If gates pass: Copy from
.playwright-mcp/Gemini-Generated-Image-*.pngtorobolearn-interface/static/img/part-{N}/chapter-{NN}/{filename}.png - Embed in lesson (Step 8.5 below)
- Start NEW CHAT for next visual (prevents context contamination)
Principle: New chat per visual prevents cross-contamination; immediate verification catches issues early; immediate embedding prevents orphans
Quality Gates: 5-Gate Professional Standard
ALL must pass before download:
Gate 1: Spelling Accuracy (99% standard)
- ✅ Company names correct (Y-Combinator not Y Combinator)
- ✅ Technical terms correct (Kubernetes not Kubernete)
- ✅ Zero typos in visible text
- ❌ FAIL if even ONE spelling error → Iterate
Gate 2: Layout Precision
- ✅ Proportions match prompt (2×2 grid, not 3×1)
- ✅ Alignment clean (no misaligned elements)
- ✅ Spacing consistent (equal gaps)
- ✅ Hierarchy clear (largest = most important)
- ❌ FAIL if layout deviates → Iterate
Gate 3: Color Accuracy
- ✅ Brand colors match spec (#001f3f not #002050)
- ✅ Semantic colors correct (blue=authority, green=execution)
- ✅ Contrast meets WCAG 4.5:1 (accessibility)
- ❌ FAIL if colors significantly off → Iterate
Gate 4: Typography Hierarchy
- ✅ Largest text = key concept (not decoration)
- ✅ Font sizes create clear hierarchy
- ✅ Readability: A2 min 24px, B1 min 18px, C2 min 14px
- ❌ FAIL if typography doesn't teach through sizing → Iterate
Gate 5: Teaching Effectiveness (<5 sec concept grasp)
- ✅ Target proficiency can grasp concept in <5 seconds
- ✅ Visual adds clarity (not just decoration)
- ✅ Cognitive load reduced vs reading text
- ❌ FAIL if confusing or generic → Iterate
Gate 6: Uniqueness Validation (NEW)
- ✅ Visual comparison: Does NOT match any existing image in same chapter
- ✅ Prompt alignment: Matches creative brief's intent (graph ≠ timeline)
- ✅ Filename verification:
{filename}.prompt.mdexists and visual matches it - ❌ FAIL if duplicate detected → Regenerate with NEW CHAT
- ❌ FAIL if mismatched brief → Regenerate with clarified prompt
Decision:
- ALL 6 gates PASS → Copy to destination (production-ready)
- ANY gate FAILS → Iterate with principle-based feedback (max 3 tries)
Principle: Explicit criteria prevent "good enough" false positives; uniqueness check prevents duplicate rework
Token Conservation Mode
When: Batch mode with >8 visuals OR continuation session
Condensation strategy:
- ✅ KEEP: Story, Emotional Intent, Visual Metaphor, Key Insight
- ✅ KEEP: Color semantics (#2563eb codes), Pedagogical reasoning
- ⚠️ CONDENSE: Long examples → Short labels
- ⚠️ CONDENSE: Verbose descriptions → Bullet points
- ❌ NEVER REMOVE: The narrative elements that activate reasoning
Example condensation:
ORIGINAL (250 tokens):
"Top Layer shows the Coordinator at center top with label 'Orchestrator'
featuring a conductor icon, with the role description 'Strategic oversight,
contract validation', rendered in Gold color (#fbbf24) as a Large hexagon..."
CONDENSED (80 tokens):
"Top Layer - Coordinator: Center top: 'Orchestrator' (conductor icon),
Role: 'Strategic oversight, contract validation', Gold (#fbbf24), Large hexagon."
Success metric: 60-70% token reduction while maintaining 100% first-attempt success rate
Principle: Efficiency without sacrificing reasoning activation
Immediate Embedding Workflow (Step 8.5)
After copying image to destination, BEFORE starting next visual:
-
Determine lesson file:
- Read creative brief's
ChapterandLessonmetadata - Target:
robolearn-interface/docs/[part]/[chapter]/[lesson-file].md
- Read creative brief's
-
Find insertion point:
- Search for concept explanation section related to this visual
- Insert after concept explanation, before practice/exercise
- Follow pedagogical insertion criteria (after learning, before doing)
-
Insert reference:
 -
Verify no code block interruption:
- Grep for triple backticks around insertion
- If inside code block → Find next break point
Why this matters: Completes the work immediately; prevents orphan images
Result: Each visual is generated → validated → placed → verified before moving to next
Principle: Immediate embedding prevents disconnect between generation and integration
Multi-Turn Reasoning Partnership (Three Roles)
Avoid: Accepting first output without evaluation
Prefer: Teaching Gemini your standards through iteration
Iteration Pattern:
Turn 1: Initial Generation
- Paste creative brief, generate
Turn 2: Principle-Based Feedback (if gates fail)
Gate 4 FAILED: Typography hierarchy incorrect
The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).
Pedagogical reasoning: Information importance drives sizing. $3T is
the insight, not the starting value. Visual weight teaches what matters.
Increase '$3T' to dominant size (largest element). Reduce '$100K' to
supporting size. This teaches magnitude through visual hierarchy.
Turn 3: Validation
- Re-check all 5 gates
- If pass → Download
- If fail after 3 tries → Document issue, DEFER (don't block batch)
Principle: You teach Gemini (principle-based feedback), Gemini teaches you (reveals understanding), Co-evolve toward quality
Why it matters: Gemini learns your pedagogical standards across iterations
Batch Mode: Autonomous Agency
Avoid: Permission-asking between visuals
Prefer: Autonomous batch execution
When invoked with: "generate all visuals" or "batch generate"
Execute WITHOUT STOPPING:
For EACH visual in approved list:
A. NEW CHAT (context isolation)
B. Generate image (paste condensed creative brief)
C. Verify quality (6 gates including uniqueness)
D. Iterate if needed (max 3 tries)
E. Download when pass (organized directory)
F. Embed in lesson immediately (no orphans)
G. Log progress ("✅ Generated N/M")
H. IMMEDIATELY next visual (NO STOPPING)
NEVER ask:
- "Would you like me to continue?"
- "Generate just high-priority batch?"
- "Pause here and review?"
Only report summary at END:
BATCH COMPLETE
Total: 18 visuals
✅ Generated: 16 (2K, avg 2-3 iterations)
⚠️ Deferred: 2 (quality issues after 3 tries)
Time: ~45 min
Location: robolearn-interface/static/img/part-{N}/chapter-{NN}/
Principle: Autonomous execution without interruption = efficient batch processing
Proficiency-Complexity Guardrails
From visual-asset-workflow, enforce during generation:
A2 Beginner Limits:
- Max 5-7 elements (overwhelming = failure)
- <5 sec grasp requirement
- Static only (no interactive)
- Max 2×2 grids
- Clear hierarchy
B1 Intermediate:
- Max 7-10 elements
- <10 sec grasp
- Interactive Tier 1 OK
- Max 3×3 grids
C2 Professional:
- No artificial limits
- Dense OK (professionals skim)
- Full interactive architecture
Validation during generation: "Does this visual's complexity match proficiency from creative brief?"
Principle: Complexity mismatch = pedagogical failure
Post-Generation Reflection (After Batch)
AFTER completing batch, analyze systematically:
Success patterns:
- Success rate: {X/Y} production-ready, {N} deferred
- Average iterations: {N} per visual
- Quality gate performance:
- Gate 1 (Spelling): {N} catches
- Gate 2 (Layout): {N} catches
- Gate 3 (Color): {N} catches
- Gate 4 (Typography): {N} catches
- Gate 5 (Teaching): {N} catches
- Gate 6 (Uniqueness): {N} catches (duplicates prevented)
Failure analysis:
- Deferred visuals root causes (layout? spelling? concept?)
- Pattern or random (same issue or isolated?)
- Guardrail gaps (preventable with better principles?)
Improvement opportunities:
- Planning effectiveness (conflicts caught early by visual-asset-workflow?)
- Guardrail sufficiency (Principles 9-12 adequate or gaps?)
- Constitutional compliance (Principle 3 Factual Accuracy, Principle 7 Minimal Content?)
- Next chapter improvements (specific, actionable, pattern-based)
Output: history/visual-assets/reflections/chapter-{NN}-reflection.md
Principle: Systematic reflection → Continuous improvement
Session Interruption & Continuation Protocol
If session ends mid-batch (token limit, context overflow):
Create checkpoint file: history/visual-assets/checkpoints/part-{N}-checkpoint.md
## Batch Status: Part {N}
**Date:** 2025-11-24
**Status:** INTERRUPTED at {X}/{Y} images
### Completed:
- ✅ Image 1: {filename} (2 iterations, embedded in lesson-01.md)
- ✅ Image 2: {filename} (1 iteration, embedded in lesson-02.md)
...
### Remaining:
- ⏳ Image 8: {filename} (not started)
- ⏳ Image 9: {filename} (not started)
...
### Quality Stats (so far):
- Success rate: {X/Y} production-ready
- Avg iterations: {N}
- Gate failures: Gate 1: {n}, Gate 2: {n}...
### Continuation Instructions:
1. Read this checkpoint
2. Start at Image {next}
3. Continue autonomous batch mode
4. Update checkpoint after each image
On continuation:
- Read checkpoint file first
- Resume from last completed image
- Maintain same quality standards
- Update checkpoint incrementally
Principle: Seamless recovery from interruptions maintains momentum
Anti-Patterns
Never:
- Accept first output without 6-gate verification (quality standard violation)
- Ask permission between visuals in batch mode (breaks autonomous agency)
- Convert creative briefs to pixel specs (defeats reasoning activation)
- Generate visuals without creative brief from visual-asset-workflow (missing context)
- Save to flat
visuals/directory (use part/chapter organization) - Skip uniqueness validation (Gate 6 prevents duplicate rework)
Even if it seems reasonable:
- Don't skip Gate 1 because "spelling looks okay" (99% standard requires verification)
- Don't generate 2 images then ask "continue?" (autonomous means autonomous)
- Don't add pixel specifications to creative brief (removes Gemini's judgment)
- Don't skip embedding step "to save time" (creates orphan images requiring later work)
Creative Variance
You tend to accept visuals after 1 iteration even with minor issues. Push for quality:
- Gate failures require iteration (not "close enough")
- Principle-based feedback teaches standards (not vague "make better")
- 3 iteration limit is maximum, not target (aim for 1-2)
- Deferred visuals are OK (don't compromise quality)
Professional content creators iterate. You should too.
Success Indicators
You'll know this skill is working when:
- ✅ All 6 quality gates verified before download (including uniqueness validation)
- ✅ Autonomous batch completion without permission-asking (no interruptions)
- ✅ Principle-based feedback given on iterations (teaching Gemini standards)
- ✅ Creative briefs used AS-IS with token conservation in batch mode
- ✅ Images organized by part/chapter (not flat directory)
- ✅ Images embedded immediately after generation (no orphans)
- ✅ Reflection document created after batch (systematic learning)
- ✅ Checkpoint files created on interruption (seamless continuation)
- ✅ Success rate >85% production-ready (deferred <15%)
- ✅ Zero duplicate images requiring rework
Result: Professional-quality visuals with distinctive aesthetics, generated autonomously with systematic quality control, embedded immediately, and recoverable from interruptions.
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
1ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon


