---
name: reflect
description: >-
  CRITICAL learning capture. Extracts HIGH/MED/LOW confidence patterns from
  conversations to prevent repeating mistakes and preserve what works. Use
  PROACTIVELY after user corrections ("no", "wrong"), after praise ("perfect",
  "exactly"), when discovering edge cases, or when skills are heavily used.
  Without reflection, valuable learnings are LOST forever. Acts as continuous
  improvement engine for all skills. Invoke EARLY and OFTEN - every correction
  is a learning opportunity.
license: MIT
model: claude-sonnet-4-5
metadata:
  version: 1.0.0
  timelessness: 8/10
  adr: ADR-007, ADR-017
---

# Reflect Skill

Critical learning capture system that prevents repeating mistakes and preserves successful patterns across sessions.

Analyze the current conversation and propose improvements to skill-based memories based on what worked, what didn't, and edge cases discovered. Every correction is a learning opportunity - invoke proactively to build institutional knowledge.


## Triggers

- `reflect` – explicit request to capture learnings
- `learn from this` – user wants corrections documented
- `improve skill {name}` – target a specific skill memory

Also monitor user phrasing such as "what did we learn?", "what if...", "ensure", or "don't forget"; these phrases should immediately route into the trigger tables below.

### 🔴 HIGH Priority Triggers (Invoke Immediately)

| Trigger | Example | Why Critical |
| --- | --- | --- |
| User correction | "no", "wrong", "not like that", "never do" | Captures mistakes to prevent repetition |
| Chesterton's Fence | "you removed that without understanding" | Documents architectural decisions |
| Immediate fixes | "debug", "root cause", "fix all" | Learns from errors in real time |

### 🟡 MEDIUM Priority Triggers (Invoke After Multiple Signals)

| Trigger | Example | Why Important |
| --- | --- | --- |
| User praise | "perfect", "exactly", "great" | Reinforces successful patterns |
| Tool preferences | "use X instead of Y", "prefer", "rather than" | Builds workflow preferences |
| Edge cases | "what if X happens?", "don't forget", "ensure" | Captures scenarios to handle |
| Questions | Short questions after output | May indicate confusion or gaps |

### 🟢 LOW Priority Triggers (Invoke at Session End)

| Trigger | Example | Why Useful |
| --- | --- | --- |
| Repeated patterns | Frequent use of specific commands/tools | Identifies workflow preferences |
| Session end | After skill-heavy work | Consolidates all session learnings |

### Original Triggers (Still Supported)

| Phrase | Action |
| --- | --- |
| "reflect" | Full analysis of current session |
| "improve skill" | Target specific skill for improvement |
| "learn from this" | Extract learnings from recent interaction |
| "what did we learn" | Summarize accumulated learnings |

### 🚨 Proactive Invocation Reminder

Don't wait for users to ask! Invoke reflect immediately when you detect:

  1. User says "no" → Invoke reflect NOW (captures correction)
  2. User says "perfect" → Invoke reflect NOW (captures success pattern)
  3. User asks "what if" → Invoke reflect NOW (captures edge case)
  4. You used multiple skills → Invoke reflect at END (captures all learnings)
  5. User corrected your output → Invoke reflect IMMEDIATELY (critical learning)

Why this matters: Without proactive reflection, learnings are LOST. The Stop hook captures some patterns, but manual reflection is MORE ACCURATE because you have full conversation context.

Cost: ~30 seconds of analysis. Benefit: Prevents repeating mistakes forever.


## Process

### Phase 1: Identify the Target Skill

Locate the skill-based memory to update:

1. Check Serena memories: Look for files ending with `-observations.md` in `.serena/memories/`
2. Infer from context: Identify which skill(s) were used in the conversation
3. Create if needed: If missing, propose `{skill-name}-observations.md` (skill observations pattern)

Storage Locations:

- Serena MCP (canonical): `.serena/memories/{skill-name}-observations.md` via `mcp__serena__write_memory`
- Contingency (Serena unavailable): Manually edit the same file in Git and note the manual update in the session log for later Serena sync
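
A minimal sketch of the Phase 1 lookup, assuming the `.serena/memories` layout above (the fallback message is hypothetical):

```powershell
# Minimal sketch: find existing skill sidecar memories.
$sidecars = Get-ChildItem -Path '.serena/memories' -Filter '*-observations.md' -ErrorAction SilentlyContinue

if ($sidecars) {
    $sidecars | ForEach-Object { $_.BaseName }   # e.g. github-observations
} else {
    Write-Host 'No sidecars found; propose creating {skill-name}-observations.md'
}
```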

### Phase 2: Analyze the Conversation

Scan the conversation for learning signals with confidence levels:

#### HIGH Confidence: Corrections

User actively steered or corrected output. These are the most valuable signals.

Detection patterns:

- Explicit rejection: "no", "not like that", "that's wrong", "I meant"
- Strong directives: "never do", "always do", "don't ever"
- Immediate requests for changes after generation
- User provided an alternative implementation
- User explicitly corrected output format/structure

Example:

```
User: "No, use the PowerShell skill script instead of raw gh commands"
→ [HIGH] + Add constraint: "Use PowerShell skill scripts, never raw gh commands"
```

#### MEDIUM Confidence: Success Patterns

Output was accepted or praised. Good signals, but they may be context-specific.

Detection patterns:

- Explicit praise: "perfect", "great", "yes", "exactly", "that's it"
- Implicit acceptance: User built on top of output without modification
- User proceeded to next step without corrections
- Output was committed/merged without changes

Example:

```
User: "Perfect, that's exactly what I needed"
→ [MED] + Add preference: "Include example usage in script headers"
```

#### MEDIUM Confidence: Edge Cases

Scenarios the skill didn't anticipate. Opportunities for improvement.

Detection patterns:

- Questions the skill didn't answer
- Workarounds the user had to apply
- Features the user asked for that weren't covered
- Error handling gaps discovered

Example:

```
User: "What if the file doesn't exist?"
→ [MED] ~ Add edge case: "Handle missing file scenario"
```

#### LOW Confidence: Preferences

Patterns accumulated over time. These need more evidence before formalizing.

Detection patterns:

- Repeated choices in similar situations
- Style preferences shown implicitly (formatting, naming)
- Tool/framework preferences
- Workflow preferences

Example:

```
User consistently uses `-Force` flag
→ [LOW] ~ Note for review: "User prefers -Force flag for overwrites"
```

#### Confidence Threshold

Only propose changes when sufficient evidence exists:

| Threshold | Action |
| --- | --- |
| ≥1 HIGH signal | Always propose (user explicitly corrected) |
| ≥2 MED signals | Propose (sufficient pattern) |
| ≥3 LOW signals | Propose (accumulated evidence) |
| 1-2 LOW only | Skip (insufficient evidence); note for next session |
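
To make the thresholds concrete, here is a minimal PowerShell sketch of the gating logic; the example signals are hypothetical stand-ins for what the agent extracts from the conversation:

```powershell
# Minimal sketch: apply the confidence thresholds above.
# The signals below are hypothetical examples.
$signals = @(
    @{ Confidence = 'HIGH'; Finding = 'Use PowerShell skill scripts, never raw gh commands' }
    @{ Confidence = 'LOW';  Finding = 'User prefers -Force flag for overwrites' }
)

$high = @($signals | Where-Object { $_.Confidence -eq 'HIGH' }).Count
$med  = @($signals | Where-Object { $_.Confidence -eq 'MED' }).Count
$low  = @($signals | Where-Object { $_.Confidence -eq 'LOW' }).Count

if ($high -ge 1 -or $med -ge 2 -or $low -ge 3) {
    Write-Host 'Sufficient evidence: present findings for approval (Phase 3)'
} else {
    Write-Host 'Insufficient evidence: note LOW signals for next session'
}
```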

### Phase 3: Propose Learnings

Present findings using WCAG AA accessible colors (4.5:1 contrast ratio):

```
┌─────────────────────────────────────────────────────────────┐
│ SKILL REFLECTION: {skill-name}                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ [HIGH] + Add constraint: "{specific constraint}"            │
│   Source: "{quoted user correction}"                        │
│                                                             │
│ [MED]  + Add preference: "{specific preference}"            │
│   Source: "{evidence from conversation}"                    │
│                                                             │
│ [MED]  + Add edge case: "{scenario}"                        │
│   Source: "{question or workaround}"                        │
│                                                             │
│ [LOW]  ~ Note for review: "{observation}"                   │
│   Source: "{pattern observed}"                              │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│ Apply changes? [Y/n/edit]                                   │
└─────────────────────────────────────────────────────────────┘
```

Color Key (accessible):

- `[HIGH]` - Red/bold: Mandatory corrections (user explicitly said "no")
- `[MED]` - Yellow/amber: Recommended additions
- `[LOW]` - Blue/dim: Notes for later review

User Response Handling:

| Response | Action |
| --- | --- |
| Y (yes) | Proceed to Phase 4 (update memory) |
| n (no) | Abort update; ask "What would you like to change, or was this not useful?" |
| edit | Present each finding individually; allow the user to modify/reject each one |

On rejection (n):

  1. Log that reflection was declined (for future pattern analysis)
  2. Ask user if they want to revise the analysis or skip entirely
  3. If skip, end workflow without memory update

On edit:

  1. Present first finding with options: [keep/modify/remove]
  2. If modify, accept user's revised text
  3. Repeat for each finding
  4. Confirm final list before applying

### Phase 4: Persist Learnings to Memory

ALWAYS show changes before applying.

After user approval:

1. Read the existing memory (if it exists)
2. Append new learnings with a timestamp and session reference
3. Preserve existing content - never remove without explicit request
4. Write to file: `.serena/memories/{skill-name}-observations.md`

Storage Strategy:

1. Serena MCP (canonical):

   ```
   mcp__serena__write_memory(memory_file_name="{name}-observations", memory_content="...")
   ```

2. If Serena is unavailable (contingency):

   ```powershell
   $path = ".serena/memories/{name}-observations.md"
   # -Raw reads the file as a single string; without it Get-Content returns
   # an array of lines and the string concatenation below would misbehave
   $existingContent = Get-Content $path -Raw -ErrorAction SilentlyContinue
   $newContent = $existingContent + "`n" + $newLearnings
   Set-Content $path -Value $newContent
   git add $path
   git commit -m "chore(memory): update {name} skill sidecar learnings"
   ```

   Record the manual edit in the session log so Serena MCP can replay the update when the service is available again.

Memory Format:

```markdown
# Skill Sidecar Learnings: {Skill Name}

**Last Updated**: {ISO date}
**Sessions Analyzed**: {count}

## Constraints (HIGH confidence)

- {constraint 1} (Session {N}, {date})
- {constraint 2} (Session {N}, {date})

## Preferences (MED confidence)

- {preference 1} (Session {N}, {date})
- {preference 2} (Session {N}, {date})

## Edge Cases (MED confidence)

- {edge case 1} (Session {N}, {date})
- {edge case 2} (Session {N}, {date})

## Notes for Review (LOW confidence)

- {note 1} (Session {N}, {date})
- {note 2} (Session {N}, {date})
```
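
For concreteness, a filled-in sidecar might look like this after one session; every value here is invented for illustration:

```markdown
# Skill Sidecar Learnings: GitHub

**Last Updated**: 2026-01-24
**Sessions Analyzed**: 1

## Constraints (HIGH confidence)

- Always use .claude/skills/github/ scripts for PR operations (Session 1, 2026-01-24)

## Notes for Review (LOW confidence)

- User prefers -Force flag for overwrites (Session 1, 2026-01-24)
```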

## Decision Tree

```
User says "reflect" or similar?
│
├─► YES
│   │
│   ├─► Identify skill(s) used in conversation
│   │   │
│   │   └─► Skill identified?
│   │       │
│   │       ├─► YES → Analyze conversation for signals
│   │       │   │
│   │       │   └─► Meets confidence threshold?
│   │       │       │
│   │       │       ├─► YES → Present findings, await approval
│   │       │       │   │
│   │       │       │   ├─► User says Y → Update memory file
│   │       │       │   │   │
│   │       │       │   │   ├─► Serena available? → Use MCP write
│   │       │       │   │   └─► Serena unavailable? → Use Git fallback
│   │       │       │   │
│   │       │       │   ├─► User says n → Ask for feedback
│   │       │       │   │   │
│   │       │       │   │   ├─► User wants revision → Re-analyze
│   │       │       │   │   └─► User skips → End workflow
│   │       │       │   │
│   │       │       │   └─► User says edit → Interactive review
│   │       │       │       │
│   │       │       │       └─► Per-finding [keep/modify/remove]
│   │       │       │
│   │       │       └─► NO → Report "Insufficient evidence. Note for next session."
│   │       │
│   │       └─► NO → Ask user which skill to reflect on
│   │           │
│   │           ├─► User specifies skill → Continue with that skill
│   │           └─► User says "none" → End workflow
│   │
│   └─► Multiple skills?
│       │
│       └─► Analyze each, group findings by skill, present together
│
└─► NO → This skill not invoked
```

## Examples

### Example 1: Correction Detected

Conversation:

```
User: "Create a PR for this change"
Agent: [runs gh pr create directly]
User: "No, use the github skill script!"
```

Analysis:

```
[HIGH] + Add constraint: "Always use .claude/skills/github/ scripts for PR operations"
  Source: User correction - "No, use the github skill script!"
```

### Example 2: Success Pattern

Conversation:

```
User: "Add error handling"
Agent: [adds try/catch with specific error types]
User: "Perfect! That's exactly what I wanted"
```

Analysis:

```
[MED] + Add preference: "Use specific error types in catch blocks, not generic [Exception]"
  Source: User approval after seeing specific error types
```

### Example 3: Edge Case Discovery

Conversation:

```
User: "Run the build"
Agent: [runs build command]
User: "Wait, what if the node_modules folder doesn't exist?"
```

Analysis:

```
[MED] + Add edge case: "Check for node_modules existence before build"
  Source: User question about missing dependencies
```

## Use Cases

### 1. Code Review Skills

Capture learnings about code review patterns:

- Style guide rules: User corrections on formatting, naming, structure
- Security patterns: Security vulnerabilities caught, OWASP patterns enforced
- Severity levels: When issues are P0 vs P1 vs P2
- False positives: Patterns that look like issues but aren't

Example memory: `.serena/memories/code-review-observations.md`

### 2. API Design Skills

Track API design decisions:

- Naming conventions: REST endpoint patterns, verb choices
- Error formats: HTTP status codes, error response structure
- Auth patterns: OAuth, JWT, API key patterns
- Versioning style: URL versioning, header versioning

Example memory: `.serena/memories/api-design-observations.md`

### 3. Testing Skills

Remember testing preferences:

- Coverage targets: Minimum % required, critical paths
- Mocking patterns: When to mock vs integration test
- Assertion styles: Preferred assertion libraries, patterns
- Test naming: Convention for test method names

Example memory: `.serena/memories/testing-observations.md`

### 4. Documentation Skills

Learn documentation patterns:

- Structure/format: Section order, heading levels
- Code examples: Real vs pseudo-code, language choice
- Tone preferences: Formal vs casual, active vs passive voice
- Diagram styles: Mermaid vs ASCII, detail level

Example memory: `.serena/memories/documentation-observations.md`


## Anti-Patterns

| Avoid | Why | Instead |
| --- | --- | --- |
| Applying without showing | User loses visibility | Always preview changes |
| Overwriting existing learnings | Loses history | Append with timestamps |
| Generic observations | Not actionable | Be specific and contextual |
| Ignoring LOW confidence | Loses valuable patterns | Track for future validation |
| Creating memory for one-off | Noise | Wait for repeated patterns |

## Integration

### With Session Protocol

Run reflection at session end as part of the retrospective:

```markdown
## Session End Checklist
- [ ] Complete session log
- [ ] Run skill reflection (if skills were used)
- [ ] Update Serena memory
- [ ] Commit changes
```

### With Memory Skill

Skill memories integrate with the memory system:

```
# Search skill sidecar learnings
pwsh .claude/skills/memory/scripts/Search-Memory.ps1 -Query "github-observations constraints"

# Read specific skill sidecar
Read .serena/memories/github-observations.md
```

### With Serena

If Serena MCP is available:

```
mcp__serena__read_memory(memory_file_name="github-observations")
mcp__serena__write_memory(memory_file_name="github-observations", memory_content="...")
```

## Verification

| Action | Verification |
| --- | --- |
| Analysis complete | Signals categorized by confidence |
| User approved | Explicit Y or approval statement |
| Memory updated | File written to `.serena/memories/` |
| Changes preserved | Existing content not lost |
| Commit ready | Changes staged, message drafted |
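
A minimal post-update check might look like the following sketch; the path and the before-count are hypothetical, and the shrinking-file test is only a rough heuristic for "existing content not lost":

```powershell
# Minimal sketch: verify the sidecar was appended to, not truncated.
$path = '.serena/memories/github-observations.md'   # hypothetical path
$linesBefore = 42   # would be captured before the update in a real run

if (-not (Test-Path $path)) {
    Write-Error "Memory not updated: $path is missing"
} elseif ((Get-Content $path).Count -lt $linesBefore) {
    Write-Error 'Existing content may have been lost (file shrank)'
} else {
    Write-Host 'Sidecar updated and prior content preserved'
}
```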

## Design Decisions

### Agent Sidecar Naming: `{skill-name}-observations.md`

Decision: Skill memories follow the ADR-007 sidecar pattern (e.g., `github-observations.md`).

Rationale:

- ADR-007 Alignment: Reuses the agent sidecar convention instead of inventing a parallel structure
- ADR-017 Compliance: Keeps the {domain}-{description} format while making "skill-sidecar" explicit
- Discovery: Sidecars are now referenced in memory-index.md, preventing orphaned learnings
- Single Canonical Store: Serena MCP and Git both write to the same file path, eliminating dual-governance ambiguity

Migration: Rename legacy `skill-{name}.md` files to `{skill}-observations.md` and update index references.

Serena vs Forgetful Roles

  • Serena MCP remains the canonical record. Every learning is persisted to the {skill}-observations.md file.
  • Forgetful is optional and used for semantic lookup only. When storing supporting context, tag the entry with skill-{name} and reference the Serena sidecar instead of duplicating the content.

Relationship to curating-memories

  • curating-memories = general-purpose maintenance of any memory artifact (linking, pruning, marking obsolete).
  • reflect = targeted retrospective that feeds those artifacts with new learnings.
  • When a sidecar accumulates conflicting guidance, route the file to curating-memories for cleanup.

Session Protocol Integration

  • Add "Run skill reflection if ≥3 distinct skills used" to the Session End checklist.
  • Document any manual sidecar edits (when Serena MCP is unavailable) in the session log before completion.
  • Invoke reflect immediately after the Stop hook highlights high-confidence learnings so the session log and sidecar stay in sync.

## Extension Points

1. Curating memories – route conflicting or stale learnings to curating-memories for consolidation.
2. Memory skill – use the memory skill for search/recall before proposing redundant learnings.
3. Forgetful – optionally mirror high-confidence learnings into Forgetful with skill-{name} tags for semantic recall.
4. Session log fixer – after reflection, ensure the session log captures the learning summary via session-log-fixer.

## Related Skills

| Skill | Relationship |
| --- | --- |
| memory | Skill memories are part of Tier 1 |
| using-forgetful-memory | Alternative storage for skill learnings |
| curating-memories | For maintaining/pruning skill memories |
| retrospective | Full session retrospective (reflect is the mini version) |

## Commit Convention

When committing skill observation updates:

```
chore(memory): update {skill-name} skill sidecar learnings (session {N})

- Added {count} constraints (HIGH confidence)
- Added {count} preferences (MED confidence)
- Added {count} edge cases (MED confidence)
- Added {count} notes (LOW confidence)

Session: {session-id}
```
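
As a rough illustration, the message could be assembled like this; the skill name, counts, and session id are hypothetical placeholders:

```powershell
# Minimal sketch: build the commit message from finding counts.
# All values are hypothetical placeholders.
$skill   = 'github'
$session = 42
$counts  = @{ Constraints = 1; Preferences = 2; EdgeCases = 1; Notes = 1 }

$message = @"
chore(memory): update $skill skill sidecar learnings (session $session)

- Added $($counts.Constraints) constraints (HIGH confidence)
- Added $($counts.Preferences) preferences (MED confidence)
- Added $($counts.EdgeCases) edge cases (MED confidence)
- Added $($counts.Notes) notes (LOW confidence)

Session: $session
"@

git commit -m $message
```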
