
# agentv-prompt-optimizer
by EntityProcess
Lightweight AI agent evaluation and optimization framework
⭐ 6 · 🍴 0 · 📅 Jan 22, 2026
## SKILL.md

name: agentv-prompt-optimizer
description: Iteratively optimize prompt files against AgentV evaluation datasets by analyzing failures and refining instructions.
# AgentV Prompt Optimizer

## Input Variables

- `eval-path`: Path or glob pattern to the AgentV evaluation file(s) to optimize against.
- `optimization-log-path` (optional): Path where optimization progress should be logged.

## Workflow
- **Initialize**
  - Verify `<eval-path>` (file or glob) targets the correct system.
  - Identify Prompt Files:
    - Infer prompt files from the eval file content (look for `file:` references in `input_messages` that match these patterns).
    - Recursively check referenced prompt files for other prompt references (dependencies).
    - If multiple prompts are found, consider ALL of them as candidates for optimization. (A discovery sketch follows this step.)
  - Identify Optimization Log:
    - If `<optimization-log-path>` is provided, use it.
    - If not, create a new one in the parent directory of the eval files: `optimization-[timestamp].md`.
  - Read the content of the identified prompt file(s).
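  A minimal discovery sketch in Python; the eval-file format and the exact `file:` reference syntax are assumptions inferred from this skill's wording, not AgentV's documented schema:

  ```python
  import re
  from pathlib import Path

  FILE_REF = re.compile(r"file:(?P<path>[\w./-]+)")  # assumed reference syntax

  def find_prompt_files(eval_path: Path) -> set[Path]:
      """Collect prompts referenced by the eval file, then follow nested
      file: references inside those prompts (their dependencies)."""
      found: set[Path] = set()
      queue = [eval_path]
      while queue:
          current = queue.pop()
          for match in FILE_REF.finditer(current.read_text(encoding="utf-8")):
              ref = (current.parent / match.group("path")).resolve()
              if ref.is_file() and ref not in found:
                  found.add(ref)
                  queue.append(ref)  # recurse into the referenced prompt
      return found
  ```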
- **Optimization Loop (Max 10 iterations)**
  - Execute (The Generator): Run `agentv eval <eval-path>`.
    - Targeted Run: If iterating on specific stubborn failures, use `--eval-id <case_id>` to run only the relevant eval cases (see the invocation sketch below).
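    A thin invocation sketch; the `agentv eval` command and `--eval-id` flag come from this skill, while the wrapper itself is hypothetical:

    ```python
    import subprocess

    def run_eval(eval_path: str, eval_id: str | None = None) -> str:
        """Run the full suite, or one stubborn case via --eval-id."""
        cmd = ["agentv", "eval", eval_path]
        if eval_id:
            cmd += ["--eval-id", eval_id]  # targeted re-run of one case
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.stdout  # console output names the results file
    ```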
  - Analyze (The Reflector):
    - Locate the results file path from the console output (e.g., `.agentv/results/eval_...jsonl`).
    - Orchestrate Subagent: Use `runSubagent` to analyze the results.
      - Task: Read the results file, calculate the pass rate, and perform root cause analysis (see the scoring sketch after this step).
      - Output: Return a structured analysis including:
        - Score: Current pass rate.
        - Root Cause: Why failures occurred (e.g., "Ambiguous definition", "Hallucination").
        - Insight: Key learning or pattern identified from the failures.
        - Strategy: High-level plan to fix the prompt (e.g., "Clarify section X", "Add negative constraint").
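    A minimal scoring sketch, assuming each JSONL record carries `eval_id`, `passed`, and `error` fields; these field names are guesses, not documented AgentV output:

    ```python
    import json
    from pathlib import Path

    def summarize_results(results_path: Path) -> dict:
        """Pass rate plus the failing cases, collected for root cause analysis."""
        records = [json.loads(line)
                   for line in results_path.read_text(encoding="utf-8").splitlines()
                   if line.strip()]
        failures = [r for r in records if not r.get("passed")]
        score = (len(records) - len(failures)) / len(records) if records else 0.0
        return {"score": score,
                "failures": [(r.get("eval_id"), r.get("error")) for r in failures]}
    ```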
  - Decide:
    - If 100% pass: STOP and report success.
    - If Score decreased: Revert the last change and try a different approach.
    - If No improvement (2x): STOP and report stagnation. (The decision rules are sketched below.)
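    The three decision rules, encoded as an illustrative function (`history` holds prior pass rates, newest last; the outer loop additionally caps at 10 iterations):

    ```python
    def decide(score: float, history: list[float]) -> str:
        """Map the current pass rate onto the loop's stop conditions."""
        if score >= 1.0:
            return "stop: success"
        if history and score < history[-1]:
            return "revert: score decreased, try a different approach"
        if len(history) >= 2 and score <= min(history[-2:]):
            return "stop: stagnation (no improvement twice)"
        return "continue"
    ```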
  - Refine (The Curator):
    - Orchestrate Subagent: Use `runSubagent` to apply the fix.
      - Task: Read the relevant prompt file(s), apply the Strategy from the Reflector, and generate the log entry.
      - Output: The Log Entry describing the specific operation performed:

        ```markdown
        ### Iteration [N]
        - **Operation**: [ADD / UPDATE / DELETE]
        - **Target**: [Section Name]
        - **Change**: [Specific text added/modified]
        - **Trigger**: [Specific failing test case or error pattern]
        - **Rationale**: [From Reflector: Root Cause]
        - **Score**: [From Reflector: Current Pass Rate]
        - **Insight**: [From Reflector: Key Learning]
        ```
    - Strategy: Treat the prompt as a structured set of rules. Execute atomic operations:
      - ADD: Insert a new rule if a constraint was missed.
      - UPDATE: Refine an existing rule to be clearer or more general.
        - Clarify: Make ambiguous instructions specific.
        - Generalize: Refactor specific fixes into high-level principles (First Principles).
      - DELETE: Remove obsolete, redundant, or harmful rules.
        - Prune: If a general rule covers specific cases, delete the specific ones.
      - Negative Constraint: If hallucinating, explicitly state what NOT to do. Prefer generalized prohibitions over specific forbidden tokens where possible.
    - Safety Check: Ensure new rules don't contradict existing ones (unless intended).
    - Constraint: Avoid rewriting large sections. Make surgical, additive changes to preserve existing behavior.
  - Log Result:
    - Append the Log Entry returned by the Curator to the optimization log file.
- **Completion**
  - Report the final score.
  - Summarize the key changes made to the prompt.
  - Finalize Optimization Log: Add a summary header to the optimization log file indicating session completion and the final score (a finalization sketch follows).
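  An illustrative finalizer; only the requirement of a summary header carrying the final score comes from this skill, the header format itself is invented:

  ```python
  from pathlib import Path

  def finalize_log(log_path: Path, final_score: float, iterations: int) -> None:
      """Prepend a session summary header to the optimization log."""
      body = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
      header = (f"# Optimization Session Complete\n"
                f"- Final score: {final_score:.0%}\n"
                f"- Iterations: {iterations}\n\n")
      log_path.write_text(header + body, encoding="utf-8")
  ```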
## Guidelines
- Generalization First: Prefer broad, principle-based guidelines over specific examples or "hotfixes". Only use specific rules if generalized instructions fail to achieve the desired score.
- Simplicity ("Less is More"): Avoid overfitting to the test set. If a specific rule doesn't significantly improve the score compared to a general one, choose the general one.
- Structure: Maintain existing Markdown headers/sections.
- Progressive Disclosure: If the prompt grows too large (>200 lines), consider moving specialized logic into a separate file or skill.
- Quality Criteria: Ensure the prompt defines a clear persona, specific task, and measurable success criteria.
## Score

Total Score: **65/100** (based on repository quality metrics)

| Criterion | Requirement | Points |
| --- | --- | --- |
| ✓ SKILL.md | A SKILL.md file is included | +20 |
| ✓ LICENSE | A license is set | +10 |
| ○ Description | Has a description of 100+ characters | 0/10 |
| ○ Popularity | 100+ GitHub stars | 0/15 |
| ✓ Recent activity | Updated within the last month | +10 |
| ○ Forks | Forked 10+ times | 0/5 |
| ✓ Issue management | Fewer than 50 open issues | +5 |
| ✓ Language | A programming language is set | +5 |
| ✓ Tags | At least one tag is set | +5 |
