image-generator

Name: image-generator
Rating: 65
Author: mjunaidca

by mjunaidca

Book + RAG ChatAgent + SSO + Compunding RI Intelligence built in 2 days. Robolearn is what AI-driven development with Spec-Driven methodology and Reusable Intelligence makes possible.

⭐ 7🍴 1📅 Jan 13, 2026

ai-agents ai-native-development better-auth-nextjs chatkit-advanced spec-driven-development sso

View on GitHub Run in Manus

SKILL.md

name: image-generator description: This skill should be used when generating images for educational content. It provides multi-turn reasoning partnership methodology with Gemini for professional quality visual generation. version: "1.0.0"

Image Generator Skill

Context & Problem

Visual generation converges toward accepting first output ("looks good enough") and following technical specifications rigidly. This produces generic aesthetics and misses Gemini 3's reasoning capabilities.

This skill provides multi-turn reasoning partnership methodology with professional quality standards.

Core Principles

Reasoning mode over prediction mode - Creative briefs (Story/Intent/Metaphor) activate reasoning; technical specs don't
Multi-turn partnership - Teach Gemini your standards through principle-based feedback
Professional quality gates - Explicit pass/fail criteria (99% spelling, not "check spelling")
Autonomous agency - Batch mode without permission-asking between visuals

Dimensional Guidance

Input: Professional Creative Briefs (NOT Technical Specs)

Receive from visual-asset-workflow v5.0:

## The Story
[Narrative about what's visualized]

## Emotional Intent
[What it should FEEL like]

## Visual Metaphor
[Universal concept for instant comprehension]

## Subject / Composition / Action / Location / Style / Camera / Lighting
[Official Gemini 3 prompt structure]

## Color Semantics
Blue (#2563eb) = Authority (teaches governance)
Green (#10b981) = Execution (teaches work)

## Typography Hierarchy
Largest: Key insight (information importance drives sizing)
Medium: Supporting components
Smallest: Context

## Pedagogical Reasoning
[Why these choices serve teaching]

DO NOT convert briefs back to pixel specs - Use AS-IS to activate reasoning.

Principle: Creative briefs activate Gemini's reasoning about HOW to achieve intent visually

Workflow: Browser-Based Generation (Playwright MCP)

CRITICAL: Use gemini.google.com ONLY (NOT Google AI Studio, NOT other image generators)

Initialize once:

Navigate to https://gemini.google.com/ (Playwright MCP)
User signs in if needed (session persists)
Select "🍌 Create Image" tool (Nano Banana Pro)

For EACH visual:

Type creative brief directly into Gemini chat textbox (use condensed format in batch mode - see Token Conservation below)
Press Enter to submit
Wait 30-35 seconds for generation
Click "Download full size image" button when ready
Wait 3-5 seconds for download completion
Verify quality IMMEDIATELY (6 gates below)
If gates fail: Continue in same chat with principle-based feedback (max 3 iterations)
If gates pass: Copy from .playwright-mcp/Gemini-Generated-Image-*.png to robolearn-interface/static/img/part-{N}/chapter-{NN}/{filename}.png
Embed in lesson (Step 8.5 below)
Start NEW CHAT for next visual (prevents context contamination)

Principle: New chat per visual prevents cross-contamination; immediate verification catches issues early; immediate embedding prevents orphans

Quality Gates: 5-Gate Professional Standard

ALL must pass before download:

Gate 1: Spelling Accuracy (99% standard)

✅ Company names correct (Y-Combinator not Y Combinator)
✅ Technical terms correct (Kubernetes not Kubernete)
✅ Zero typos in visible text
❌ FAIL if even ONE spelling error → Iterate

Gate 2: Layout Precision

✅ Proportions match prompt (2×2 grid, not 3×1)
✅ Alignment clean (no misaligned elements)
✅ Spacing consistent (equal gaps)
✅ Hierarchy clear (largest = most important)
❌ FAIL if layout deviates → Iterate

Gate 3: Color Accuracy

✅ Brand colors match spec (#001f3f not #002050)
✅ Semantic colors correct (blue=authority, green=execution)
✅ Contrast meets WCAG 4.5:1 (accessibility)
❌ FAIL if colors significantly off → Iterate

Gate 4: Typography Hierarchy

✅ Largest text = key concept (not decoration)
✅ Font sizes create clear hierarchy
✅ Readability: A2 min 24px, B1 min 18px, C2 min 14px
❌ FAIL if typography doesn't teach through sizing → Iterate

Gate 5: Teaching Effectiveness (<5 sec concept grasp)

✅ Target proficiency can grasp concept in <5 seconds
✅ Visual adds clarity (not just decoration)
✅ Cognitive load reduced vs reading text
❌ FAIL if confusing or generic → Iterate

Gate 6: Uniqueness Validation (NEW)

✅ Visual comparison: Does NOT match any existing image in same chapter
✅ Prompt alignment: Matches creative brief's intent (graph ≠ timeline)
✅ Filename verification: {filename}.prompt.md exists and visual matches it
❌ FAIL if duplicate detected → Regenerate with NEW CHAT
❌ FAIL if mismatched brief → Regenerate with clarified prompt

Decision:

ALL 6 gates PASS → Copy to destination (production-ready)
ANY gate FAILS → Iterate with principle-based feedback (max 3 tries)

Principle: Explicit criteria prevent "good enough" false positives; uniqueness check prevents duplicate rework

Token Conservation Mode

When: Batch mode with >8 visuals OR continuation session

Condensation strategy:

✅ KEEP: Story, Emotional Intent, Visual Metaphor, Key Insight
✅ KEEP: Color semantics (#2563eb codes), Pedagogical reasoning
⚠️ CONDENSE: Long examples → Short labels
⚠️ CONDENSE: Verbose descriptions → Bullet points
❌ NEVER REMOVE: The narrative elements that activate reasoning

Example condensation:

ORIGINAL (250 tokens):

"Top Layer shows the Coordinator at center top with label 'Orchestrator'
featuring a conductor icon, with the role description 'Strategic oversight,
contract validation', rendered in Gold color (#fbbf24) as a Large hexagon..."

CONDENSED (80 tokens):

"Top Layer - Coordinator: Center top: 'Orchestrator' (conductor icon),
Role: 'Strategic oversight, contract validation', Gold (#fbbf24), Large hexagon."

Success metric: 60-70% token reduction while maintaining 100% first-attempt success rate

Principle: Efficiency without sacrificing reasoning activation

Immediate Embedding Workflow (Step 8.5)

After copying image to destination, BEFORE starting next visual:

Determine lesson file:
- Read creative brief's Chapter and Lesson metadata
- Target: robolearn-interface/docs/[part]/[chapter]/[lesson-file].md
Find insertion point:
- Search for concept explanation section related to this visual
- Insert after concept explanation, before practice/exercise
- Follow pedagogical insertion criteria (after learning, before doing)

Insert reference:

![{Alt text from creative brief}](/img/part-X/chapter-YY/{filename}.png)

Verify no code block interruption:
- Grep for triple backticks around insertion
- If inside code block → Find next break point

Why this matters: Completes the work immediately; prevents orphan images

Result: Each visual is generated → validated → placed → verified before moving to next

Principle: Immediate embedding prevents disconnect between generation and integration

Multi-Turn Reasoning Partnership (Three Roles)

Avoid: Accepting first output without evaluation

Prefer: Teaching Gemini your standards through iteration

Iteration Pattern:

Turn 1: Initial Generation

Paste creative brief, generate

Turn 2: Principle-Based Feedback (if gates fail)

Gate 4 FAILED: Typography hierarchy incorrect

The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).

Pedagogical reasoning: Information importance drives sizing. $3T is
the insight, not the starting value. Visual weight teaches what matters.

Increase '$3T' to dominant size (largest element). Reduce '$100K' to
supporting size. This teaches magnitude through visual hierarchy.

Turn 3: Validation

Re-check all 5 gates
If pass → Download
If fail after 3 tries → Document issue, DEFER (don't block batch)

Principle: You teach Gemini (principle-based feedback), Gemini teaches you (reveals understanding), Co-evolve toward quality

Why it matters: Gemini learns your pedagogical standards across iterations

Batch Mode: Autonomous Agency

Avoid: Permission-asking between visuals

Prefer: Autonomous batch execution

When invoked with: "generate all visuals" or "batch generate"

Execute WITHOUT STOPPING:

For EACH visual in approved list:
  A. NEW CHAT (context isolation)
  B. Generate image (paste condensed creative brief)
  C. Verify quality (6 gates including uniqueness)
  D. Iterate if needed (max 3 tries)
  E. Download when pass (organized directory)
  F. Embed in lesson immediately (no orphans)
  G. Log progress ("✅ Generated N/M")
  H. IMMEDIATELY next visual (NO STOPPING)

NEVER ask:

"Would you like me to continue?"
"Generate just high-priority batch?"
"Pause here and review?"

Only report summary at END:

BATCH COMPLETE
Total: 18 visuals
✅ Generated: 16 (2K, avg 2-3 iterations)
⚠️ Deferred: 2 (quality issues after 3 tries)
Time: ~45 min
Location: robolearn-interface/static/img/part-{N}/chapter-{NN}/

Principle: Autonomous execution without interruption = efficient batch processing

Proficiency-Complexity Guardrails

From visual-asset-workflow, enforce during generation:

A2 Beginner Limits:

Max 5-7 elements (overwhelming = failure)
<5 sec grasp requirement
Static only (no interactive)
Max 2×2 grids
Clear hierarchy

B1 Intermediate:

Max 7-10 elements
<10 sec grasp
Interactive Tier 1 OK
Max 3×3 grids

C2 Professional:

No artificial limits
Dense OK (professionals skim)
Full interactive architecture

Validation during generation: "Does this visual's complexity match proficiency from creative brief?"

Principle: Complexity mismatch = pedagogical failure

Post-Generation Reflection (After Batch)

AFTER completing batch, analyze systematically:

Success patterns:

Success rate: {X/Y} production-ready, {N} deferred
Average iterations: {N} per visual
Quality gate performance:
- Gate 1 (Spelling): {N} catches
- Gate 2 (Layout): {N} catches
- Gate 3 (Color): {N} catches
- Gate 4 (Typography): {N} catches
- Gate 5 (Teaching): {N} catches
- Gate 6 (Uniqueness): {N} catches (duplicates prevented)

Failure analysis:

Deferred visuals root causes (layout? spelling? concept?)
Pattern or random (same issue or isolated?)
Guardrail gaps (preventable with better principles?)

Improvement opportunities:

Planning effectiveness (conflicts caught early by visual-asset-workflow?)
Guardrail sufficiency (Principles 9-12 adequate or gaps?)
Constitutional compliance (Principle 3 Factual Accuracy, Principle 7 Minimal Content?)
Next chapter improvements (specific, actionable, pattern-based)

Output: history/visual-assets/reflections/chapter-{NN}-reflection.md

Principle: Systematic reflection → Continuous improvement

Session Interruption & Continuation Protocol

If session ends mid-batch (token limit, context overflow):

Create checkpoint file: history/visual-assets/checkpoints/part-{N}-checkpoint.md

## Batch Status: Part {N}
**Date:** 2025-11-24
**Status:** INTERRUPTED at {X}/{Y} images

### Completed:
- ✅ Image 1: {filename} (2 iterations, embedded in lesson-01.md)
- ✅ Image 2: {filename} (1 iteration, embedded in lesson-02.md)
...

### Remaining:
- ⏳ Image 8: {filename} (not started)
- ⏳ Image 9: {filename} (not started)
...

### Quality Stats (so far):
- Success rate: {X/Y} production-ready
- Avg iterations: {N}
- Gate failures: Gate 1: {n}, Gate 2: {n}...

### Continuation Instructions:
1. Read this checkpoint
2. Start at Image {next}
3. Continue autonomous batch mode
4. Update checkpoint after each image

On continuation:

Read checkpoint file first
Resume from last completed image
Maintain same quality standards
Update checkpoint incrementally

Principle: Seamless recovery from interruptions maintains momentum

Anti-Patterns

Never:

Accept first output without 6-gate verification (quality standard violation)
Ask permission between visuals in batch mode (breaks autonomous agency)
Convert creative briefs to pixel specs (defeats reasoning activation)
Generate visuals without creative brief from visual-asset-workflow (missing context)
Save to flat visuals/ directory (use part/chapter organization)
Skip uniqueness validation (Gate 6 prevents duplicate rework)

Even if it seems reasonable:

Don't skip Gate 1 because "spelling looks okay" (99% standard requires verification)
Don't generate 2 images then ask "continue?" (autonomous means autonomous)
Don't add pixel specifications to creative brief (removes Gemini's judgment)
Don't skip embedding step "to save time" (creates orphan images requiring later work)

Creative Variance

You tend to accept visuals after 1 iteration even with minor issues. Push for quality:

Gate failures require iteration (not "close enough")
Principle-based feedback teaches standards (not vague "make better")
3 iteration limit is maximum, not target (aim for 1-2)
Deferred visuals are OK (don't compromise quality)

Professional content creators iterate. You should too.

Success Indicators

You'll know this skill is working when:

✅ All 6 quality gates verified before download (including uniqueness validation)
✅ Autonomous batch completion without permission-asking (no interruptions)
✅ Principle-based feedback given on iterations (teaching Gemini standards)
✅ Creative briefs used AS-IS with token conservation in batch mode
✅ Images organized by part/chapter (not flat directory)
✅ Images embedded immediately after generation (no orphans)
✅ Reflection document created after batch (systematic learning)
✅ Checkpoint files created on interruption (seamless continuation)
✅ Success rate >85% production-ready (deferred <15%)
✅ Zero duplicate images requiring rework

Result: Professional-quality visuals with distinctive aesthetics, generated autonomously with systematic quality control, embedded immediately, and recoverable from interruptions.

Score

Total Score

65/100

Based on repository quality metrics

✓SKILL.md

SKILL.mdファイルが含まれている

+20

○LICENSE

ライセンスが設定されている

0/10

✓説明文

100文字以上の説明がある

+10

○人気

GitHub Stars 100以上

0/15

✓最近の活動

1ヶ月以内に更新

+10

○フォーク

10回以上フォークされている

0/5

✓Issue管理

オープンIssueが50未満

✓言語

プログラミング言語が設定されている

✓タグ

1つ以上のタグが設定されている

Reviews

💬

Reviews coming soon

image-generator

SKILL.md

name: image-generator description: This skill should be used when generating images for educational content. It provides multi-turn reasoning partnership methodology with Gemini for professional quality visual generation. version: "1.0.0"

Image Generator Skill

Context & Problem

Core Principles

Dimensional Guidance

Input: Professional Creative Briefs (NOT Technical Specs)

Workflow: Browser-Based Generation (Playwright MCP)

Quality Gates: 5-Gate Professional Standard

Token Conservation Mode

Immediate Embedding Workflow (Step 8.5)

Multi-Turn Reasoning Partnership (Three Roles)

Batch Mode: Autonomous Agency

Proficiency-Complexity Guardrails

Post-Generation Reflection (After Batch)

Session Interruption & Continuation Protocol

Anti-Patterns

Creative Variance

Success Indicators

Score

Reviews

greeter

pr-creator

code-reviewer

skill-creator

browser-use

git-workflow

image-generator

SKILL.md

name: image-generator description: This skill should be used when generating images for educational content. It provides multi-turn reasoning partnership methodology with Gemini for professional quality visual generation. version: "1.0.0"

Image Generator Skill

Context & Problem

Core Principles

Dimensional Guidance

Input: Professional Creative Briefs (NOT Technical Specs)

Workflow: Browser-Based Generation (Playwright MCP)

Quality Gates: 5-Gate Professional Standard

Token Conservation Mode

Immediate Embedding Workflow (Step 8.5)

Multi-Turn Reasoning Partnership (Three Roles)

Batch Mode: Autonomous Agency

Proficiency-Complexity Guardrails

Post-Generation Reflection (After Batch)

Session Interruption & Continuation Protocol

Anti-Patterns

Creative Variance

Success Indicators

Score

Reviews

Related

Related Skills

greeter

pr-creator

code-reviewer

skill-creator

browser-use

git-workflow