
shipyard
by SchoolAI
Verify AI agent work with collaborative review and proof-of-work artifacts
SKILL.md
```yaml
name: shipyard
description: |
  Shipyard is your agent management hub - the central interface where humans manage AI agents. Plans, artifacts, feedback, and all human-agent communication happen here.

  IMPORTANT: When Shipyard MCP tools are available (mcp__shipyard__* or mcp__plugin_shipyard_shipyard__*), ALWAYS load this skill first. The MCP provides tools; this skill teaches the workflow for using them effectively together.

  Communication principle: When working in Shipyard, use request_user_input for ALL human interaction - not your platform's built-in question tools. The human is in the browser viewing your plan; that's where they expect to talk to you.

  Use when tasks need human review, screenshot/video evidence, audit trails, or collaborative review. Invoke when the user says "create a task", "I need proof of this", "track my work", "make this reviewable", or needs accountability for implementation work.
```
Shipyard: Your Agent Management Hub
MCP + Skill = Full Power: If you see `mcp__shipyard__*` tools available, you have the Shipyard MCP. This skill teaches you how to use those tools effectively. Always use both together for the best workflow.
Shipyard is the central interface where humans manage AI agents. It turns invisible agent work into reviewable, verifiable tasks with screenshots, videos, and test results as proof.
Key principle: ALL human-agent communication happens through Shipyard. When you need to ask a question, get a decision, or request clarification - use request_user_input. The human is already in the browser viewing your plan. That's where they expect to interact with you.
CRITICAL: When to Use Shipyard
⚠️ MANDATORY USAGE RULE:
If Shipyard is available, you MUST use it for:
- Creating implementation plans
- Tracking work that needs human review
- Documenting proof of work (screenshots, videos, test results)
- ANY user request involving "plan", "track", "verify", or "prove"
DO NOT:
- Create plans manually in chat or as markdown files
- Write implementation docs yourself when Shipyard is available
- Suggest alternatives to Shipyard for trackable work
- Overthink whether to use it - WHEN IN DOUBT, USE SHIPYARD
Decision Tree:
```
Need to create/track/verify work?
              │
              ▼
     Shipyard available?
        │         │
       YES        NO
        │         │
        ▼         ▼
     USE IT   Manual approach
      NOW     (tell user why)
```
Why use Shipyard?
- Accountability - Prove you did the work with artifacts
- Human-in-the-loop - Reviewers can approve, request changes, or leave feedback
- Audit trail - Every task has a permanent record with timestamps
- Collaboration - Real-time sync between agent and reviewers via browser
MCP Integration
This skill complements the Shipyard MCP server. The MCP provides tools; this skill teaches you how to use them effectively.
MCP tools available:
| Tool | Purpose |
|---|---|
| `request_user_input` | THE primary communication channel - ask questions, get decisions, request clarification |
| `execute_code` | Run TypeScript that calls Shipyard APIs (recommended for multi-step operations) |
| `create_plan` | Start a new verified task |
| `add_artifact` | Upload proof (screenshot, video, test results) |
| `read_plan` | Check status and reviewer feedback |
| `link_pr` | Connect a GitHub PR to the task |
Communication principle: ALWAYS use request_user_input instead of your platform's built-in question tools (AskUserQuestion, Cursor prompts, etc.). The human is viewing your plan in the browser - that's where they expect to see your questions.
Preferred approach: Use execute_code to chain multiple API calls in one step, reducing round-trips.
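For example, a single execute_code call can create a plan and immediately attach proof, instead of two separate tool round-trips. A minimal sketch, assuming `createPlan` and `addArtifact` are exposed as camelCase helpers inside execute_code (mirroring how `requestUserInput` is):

```typescript
// Sketch: chain plan creation and artifact upload in one execute_code call.
const plan = await createPlan({
  title: "Fix flaky checkout test",
  content: `
## Deliverables
- [ ] Test results showing 3 consecutive green runs {#deliverable}
`
});

// Upload proof against the deliverable returned by createPlan.
await addArtifact({
  planId: plan.planId,
  sessionToken: plan.sessionToken,
  type: 'test_results',
  filename: 'test-run.html',
  source: 'base64',
  content: Buffer.from('<html>...</html>').toString('base64'),
  deliverableId: plan.deliverables[0].id
});
```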
Quick Start
- Create task with deliverables (provable outcomes)
- Do the work and capture artifacts as you go
- Upload artifacts linked to deliverables
- Auto-complete when all deliverables have proof
Deliverable Format Guidelines
HTML is the primary format for artifacts. Use HTML for 90% of deliverables - it's self-contained, richly formatted, searchable, and works everywhere.
3-Tier Format Hierarchy
| Tier | Format | Use For | Examples |
|---|---|---|---|
| 1 | HTML (primary) | Test results, reviews, terminal output, reports | Unit tests, code reviews, build logs, lint output |
| 2 | Image | Actual UI screenshots only | App interface, visual bugs, design mockups |
| 3 | Video | Complex flows requiring browser automation | Multi-step user journeys, animations, interactions |
When to Use Each Format
Use HTML when:
- ✅ Terminal output (test results, build logs, linting)
- ✅ Code reviews or security audits
- ✅ Structured reports or analysis
- ✅ Any text-based output you'd normally copy-paste
- ✅ Screenshots with annotations or context
- ✅ Coverage reports, profiling data, metrics
Use Images when:
- 📸 Showing actual application UI (buttons, forms, layouts)
- 📸 Visual bugs or design issues
- 📸 Before/after comparisons
- 📸 Design mockups or prototypes
Use Video when:
- 🎥 Demonstrating multi-step user flows
- 🎥 Showing animations or transitions
- 🎥 Browser automation proof (Playwright/Puppeteer)
- 🎥 Complex interactions that images can't capture
Decision Tree
```
Is this terminal/CLI output? ──► YES ──► HTML (dark terminal theme)
        │
        NO
        │
Is this a code review/audit? ──► YES ──► HTML (light professional theme)
        │
        NO
        │
Is this test/coverage data? ──► YES ──► HTML (syntax-highlighted)
        │
        NO
        │
Does it require browser automation? ──► YES ──► Video
        │
        NO
        │
Is it showing actual UI? ──► YES ──► Screenshot (Image)
```
HTML Examples
Test Results:
```typescript
const html = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  body {
    font-family: 'SF Mono', Monaco, monospace;
    background: #1e1e1e;
    color: #d4d4d4;
    padding: 20px;
  }
  .pass { color: #22c55e; }
  .pass::before { content: "✔ "; }
</style>
</head>
<body>
<h1>Test Results - PASS</h1>
<div class="test-case">
  <span class="pass">validates email addresses</span>
</div>
</body>
</html>`;

await addArtifact({
  planId,
  sessionToken,
  type: 'test_results',
  filename: 'test-results.html',
  source: 'base64',
  content: Buffer.from(html).toString('base64'),
  deliverableId: deliverables[0].id
});
```
Code Review:
```typescript
const review = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  body {
    font-family: -apple-system, sans-serif;
    max-width: 1000px;
    margin: 0 auto;
    padding: 40px;
    background: #ffffff;
  }
  .verdict.pass {
    background: #d1fae5;
    border: 2px solid #10b981;
    padding: 20px;
  }
  .issue.critical {
    border-left: 4px solid #dc2626;
    background: #fef2f2;
    padding: 16px;
    margin: 16px 0;
  }
</style>
</head>
<body>
<h1>Code Review: Authentication Module</h1>
<div class="verdict pass">✓ APPROVED</div>
<!-- Risk tables, findings, recommendations -->
</body>
</html>`;
```
See examples/html-artifacts.md for complete working templates.
Base64 Image Embedding
Embed screenshots directly in HTML for self-contained artifacts:
```typescript
import { readFileSync } from 'node:fs';

const imageBuffer = readFileSync('/tmp/screenshot.png');
const base64Image = imageBuffer.toString('base64');

const html = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  .screenshot {
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    overflow: hidden;
  }
  .screenshot img { width: 100%; display: block; }
</style>
</head>
<body>
<h1>Login Page Implementation</h1>
<div class="screenshot">
  <img src="data:image/png;base64,${base64Image}"
       alt="Login page with validation">
</div>
</body>
</html>`;

await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'login-demo.html',
  source: 'base64',
  content: Buffer.from(html).toString('base64'),
  deliverableId: deliverables[0].id
});
```
Why HTML is Primary
- Self-contained - Inline CSS, no external dependencies
- Rich formatting - Colors, structure, syntax highlighting
- Searchable - Text content is indexable
- Universal - Works in any browser
- Version control friendly - Text diffs work
- Portable - Single file, no special viewers needed
HTML Best Practices
- ✅ Inline all CSS in `<style>` tags
- ✅ Embed images as base64 data URIs
- ✅ Use semantic HTML (h1, h2, table, etc.)
- ✅ Include proper `<meta charset="UTF-8">`
- ✅ Keep files under 5MB for fast loading
- ❌ Never link external stylesheets or scripts
- ❌ Don't use CDNs or remote resources
- ❌ Avoid JavaScript (static HTML only)
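To keep artifacts consistent, you might wrap these rules in a small helper. A minimal sketch (the function name and skeleton are illustrative, not part of Shipyard's API):

```typescript
// Illustrative helper (not a Shipyard API): wraps a body fragment in a
// self-contained HTML document following the practices above.
function wrapArtifactHtml(title: string, bodyHtml: string): string {
  return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  /* All CSS inlined - no external stylesheets, scripts, or CDNs */
  body { font-family: -apple-system, sans-serif; padding: 40px; }
</style>
</head>
<body>
<h1>${title}</h1>
${bodyHtml}
</body>
</html>`;
}

// Usage: base64-encode the result before passing it to addArtifact.
const html = wrapArtifactHtml('Lint Report', '<pre>0 errors, 2 warnings</pre>');
const content = Buffer.from(html).toString('base64');
```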
Deliverables: Provable Outcomes
Deliverables are outcomes you prove with artifacts. Mark them with `{#deliverable}`.
Good (provable):
- Screenshot of working login page
- Video showing drag-and-drop feature
- Test results showing 100% pass rate
Bad (not provable):
- Implement authentication (too vague)
- Refactor code (no artifact)
- Add error handling (internal)
Workflow Example
```typescript
// Step 1: Create task with deliverables
const plan = await createPlan({
  title: "Add user profile page",
  content: `
## Deliverables
- [ ] Screenshot of profile page with avatar {#deliverable}
- [ ] Screenshot of edit form validation {#deliverable}

## Implementation
1. Create /profile route
2. Add avatar upload component
3. Build edit form with validation
`
});

const { planId, sessionToken, deliverables, monitoringScript } = plan;
// deliverables = [{ id: "del_xxx", text: "Screenshot of profile page with avatar" }, ...]
// monitoringScript = bash script to poll for approval (for non-hook agents)

// For non-hook agents (Cursor, Devin, etc.): run the monitoring script in the
// background to wait for human approval before proceeding:
// bash <(echo "$monitoringScript") &

// Step 2: Implement the feature (your actual work happens here)

// Step 3: Upload proof
await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'profile-page.png',
  source: 'file',
  filePath: '/tmp/screenshots/profile.png',
  deliverableId: deliverables[0].id
});

const result = await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'validation-errors.png',
  source: 'file',
  filePath: '/tmp/screenshots/validation.png',
  deliverableId: deliverables[1].id
});

// Step 4: Auto-complete triggers when all deliverables have artifacts
if (result.allDeliverablesComplete) {
  return { done: true, proof: result.snapshotUrl };
}
```
Human-Agent Communication
request_user_input is THE primary way to talk to humans during active work.
The human is already in the browser viewing your plan. When you need to ask a question, get a decision, or request clarification - that's where they expect to see it. Don't scatter conversations across different interfaces.
Why Use request_user_input
- Context: The human sees your question alongside the plan, artifacts, and comments
- History: All exchanges are logged in the plan's activity feed
- Continuity: The conversation stays attached to the work
- Flexibility: 8 input types, multi-question forms, "Other" escape hatch
Replace Platform Tools
| Platform | DON'T Use | Use Instead |
|---|---|---|
| Claude Code | AskUserQuestion | request_user_input |
| Cursor | Built-in prompts | request_user_input |
| Windsurf | Native dialogs | request_user_input |
| Claude Desktop | Chat questions | request_user_input |
Example
```typescript
const result = await requestUserInput({
  message: "Which database should we use?",
  type: "choice",
  options: ["PostgreSQL", "SQLite", "MongoDB"],
  timeout: 600 // 10 minutes
});

if (result.success) {
  console.log("User chose:", result.response);
}
```
Note: The MCP tool is named `request_user_input` (snake_case). Inside `execute_code`, it's available as `requestUserInput()` (camelCase).
Input Types (8 total)
| Type | Use For | Example |
|---|---|---|
| `text` | Single-line input | API keys, names |
| `multiline` | Multi-line text | Bug descriptions |
| `choice` | Select from options | Framework choice (auto-adds "Other") |
| `confirm` | Yes/No decisions | Deploy to production? |
| `number` | Numeric input | Port number (with min/max) |
| `email` | Email validation | Contact address |
| `date` | Date picker | Deadline (with range) |
| `rating` | Scale rating | Rate approach 1-5 |
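For instance, a bounded number question and a confirm. A hedged sketch: the `min`/`max` option names are assumed from the table above, not confirmed signatures:

```typescript
// Assumed option names (min/max) based on the table above.
const port = await requestUserInput({
  message: "Which port should the dev server use?",
  type: "number",
  min: 1024,
  max: 65535,
  timeout: 600
});

const deploy = await requestUserInput({
  message: "Deploy to production?",
  type: "confirm"
});
```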
Multi-Question Forms
Ask multiple questions at once:
```typescript
const result = await requestUserInput({
  questions: [
    { message: "Project name?", type: "text" },
    { message: "Framework?", type: "choice", options: ["React", "Vue", "Angular"] },
    { message: "Include TypeScript?", type: "confirm" }
  ],
  timeout: 600
});
```
Handling Reviewer Feedback
Check for comments and change requests:
```typescript
const status = await readPlan(planId, sessionToken, {
  includeAnnotations: true
});

if (status.status === "changes_requested") {
  // Read status.content for inline comments
  // Make changes, upload new artifacts
}
```
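While awaiting review, you can poll until the status changes. A minimal sketch using the `readPlan` signature from the example above; the `"approved"` status value and the polling interval are assumptions, not documented constants:

```typescript
// Hedged sketch: poll readPlan until the reviewer acts.
// "approved" is an assumed status value; "changes_requested" is documented above.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

let status = await readPlan(planId, sessionToken, { includeAnnotations: true });
while (status.status !== "approved" && status.status !== "changes_requested") {
  await sleep(30_000); // check every 30 seconds
  status = await readPlan(planId, sessionToken, { includeAnnotations: true });
}

if (status.status === "changes_requested") {
  // Address inline comments, then upload fresh artifacts.
}
```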
Artifact Types
| Type | Use For | Examples |
|---|---|---|
| `screenshot` | UI changes, visual proof | .png, .jpg |
| `video` | Complex flows, interactions | .mp4, .webm |
| `test_results` | Test output, coverage | .json, .txt |
| `diff` | Code changes | .diff, .patch |
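Uploading a diff works the same way as the screenshot uploads shown earlier; only the `type` and file differ (the filename here is illustrative):

```typescript
// Attach a code diff as proof, mirroring the addArtifact calls above.
await addArtifact({
  planId,
  sessionToken,
  type: 'diff',
  filename: 'auth-refactor.patch',
  source: 'file',
  filePath: '/tmp/auth-refactor.patch',
  deliverableId: deliverables[0].id
});
```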
Video Recording
Video recording uses the Playwriter MCP for browser capture and Shipyard for uploading proof-of-work artifacts. This is ideal for demonstrating complex user interactions, multi-step flows, or animated UI behavior.
Workflow (4 steps):
- Start recording - Playwriter begins capturing browser frames via CDP
- Perform interactions - Execute the actions you want to demonstrate
- Stop capture - Playwriter stops the CDP screencast and saves frames to disk
- Encode and upload - Shipyard's bundled FFmpeg encodes the frames to MP4, then uploads via `addArtifact`
Configuration:
| Option | Range | Default | Description |
|---|---|---|---|
| `fps` | 4-8 | 6 | Frames per second (lower = smaller file) |
| `quality` | 60-90 | 80 | JPEG quality (higher = better quality, larger file) |
Note: FFmpeg is bundled with Shipyard (via @ffmpeg-installer, auto-downloaded on pnpm install). No manual installation required.
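Once FFmpeg has encoded the frames, the upload mirrors the other `addArtifact` calls in this document (the output path and filename here are illustrative; the Playwriter capture calls themselves are covered in the linked example):

```typescript
// Step 4 only: upload the encoded MP4 produced by the recording workflow.
await addArtifact({
  planId,
  sessionToken,
  type: 'video',
  filename: 'checkout-flow.mp4',
  source: 'file',
  filePath: '/tmp/recordings/checkout-flow.mp4',
  deliverableId: deliverables[0].id
});
```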
See examples/video-recording.md for complete code examples.
Tips
- Plan deliverables first - Decide what proves success before coding
- Capture during work - Take screenshots as you implement, not after
- Be specific - "Login page with error state" beats "Screenshot"
- Link every artifact - Always set `deliverableId` for auto-completion
- Check feedback - Poll `readPlan` when awaiting review
When NOT to Use
- Quick answers or research (no artifacts to capture)
- Internal refactoring with no visible output
- Tasks where proof adds no value
- Exploration or debugging sessions
Troubleshooting
Browser doesn't open: Check that the MCP server is running and `SHIPYARD_WEB_URL` is set.
Upload fails: Verify that the file path exists and that `GITHUB_TOKEN` has repo write access.
No auto-complete: Ensure every deliverable has an artifact with a matching `deliverableId`.