
shipyard
by SchoolAI
Verify AI agent work with collaborative review and proof-of-work artifacts
SKILL.md
```yaml
name: shipyard
description: |
  Shipyard is your agent management hub - the central interface where humans manage AI agents. Plans, artifacts, feedback, and all human-agent communication happen here.

  IMPORTANT: When Shipyard MCP tools are available (mcp__shipyard__* or mcp__plugin_shipyard_shipyard__*), ALWAYS load this skill first. The MCP provides tools; this skill teaches the workflow for using them effectively together.

  Communication principle: When working in Shipyard, use request_user_input for ALL human interaction - not your platform's built-in question tools. The human is in the browser viewing your plan; that's where they expect to talk to you.

  Use when tasks need human review, screenshot/video evidence, audit trails, or collaborative review. Invoke when the user says "create a task", "I need proof of this", "track my work", "make this reviewable", or needs accountability for implementation work.
```
Shipyard: Your Agent Management Hub
MCP + Skill = Full Power: If you see `mcp__shipyard__*` tools available, you have the Shipyard MCP. This skill teaches you how to use those tools effectively. Always use both together for the best workflow.
Shipyard is the central interface where humans manage AI agents. It turns invisible agent work into reviewable, verifiable tasks with screenshots, videos, and test results as proof.
Key principle: ALL human-agent communication happens through Shipyard. When you need to ask a question, get a decision, or request clarification - use request_user_input. The human is already in the browser viewing your plan. That's where they expect to interact with you.
CRITICAL: When to Use Shipyard
⚠️ MANDATORY USAGE RULE:
If Shipyard is available, you MUST use it for:
- Creating implementation plans
- Tracking work that needs human review
- Documenting proof of work (screenshots, videos, test results)
- ANY user request involving "plan", "track", "verify", or "prove"
DO NOT:
- Create plans manually in chat or as markdown files
- Write implementation docs yourself when Shipyard is available
- Suggest alternatives to Shipyard for trackable work
- Overthink whether to use it - WHEN IN DOUBT, USE SHIPYARD
Decision Tree:
```
Need to create/track/verify work?
              │
              ▼
     Shipyard available?
        │         │
       YES        NO
        │         │
        ▼         ▼
     USE IT   Manual approach
      NOW     (tell user why)
```
Why use Shipyard?
- Accountability - Prove you did the work with artifacts
- Human-in-the-loop - Reviewers can approve, request changes, or leave feedback
- Audit trail - Every task has a permanent record with timestamps
- Collaboration - Real-time sync between agent and reviewers via browser
MCP Integration
This skill complements the Shipyard MCP server. The MCP provides tools; this skill teaches you how to use them effectively.
MCP tools available:
| Tool | Purpose |
|---|---|
| `request_user_input` | THE primary communication channel - ask questions, get decisions, request clarification |
| `execute_code` | Run TypeScript that calls Shipyard APIs (recommended for multi-step operations) |
| `create_plan` | Start a new verified task |
| `add_artifact` | Upload proof (screenshot, video, test results) |
| `read_plan` | Check status and reviewer feedback |
| `link_pr` | Connect a GitHub PR to the task |
Communication principle: ALWAYS use request_user_input instead of your platform's built-in question tools (AskUserQuestion, Cursor prompts, etc.). The human is viewing your plan in the browser - that's where they expect to see your questions.
Preferred approach: Use execute_code to chain multiple API calls in one step, reducing round-trips.
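For example, a single execute_code call can create a plan and immediately attach proof, instead of two separate tool round-trips. A minimal sketch, assuming `createPlan` and `addArtifact` are exposed as camelCase helpers inside execute_code (mirroring how `requestUserInput` is):

```typescript
// Sketch: chain plan creation and artifact upload in one execute_code call.
const plan = await createPlan({
  title: "Fix flaky checkout test",
  content: `
## Deliverables
- [ ] Test results showing 3 consecutive green runs {#deliverable}
`
});

// Upload proof against the deliverable returned by createPlan.
await addArtifact({
  planId: plan.planId,
  sessionToken: plan.sessionToken,
  type: 'test_results',
  filename: 'test-run.html',
  source: 'base64',
  content: Buffer.from('<html>...</html>').toString('base64'),
  deliverableId: plan.deliverables[0].id
});
```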
Quick Start
- Create task with deliverables (provable outcomes)
- Do the work and capture artifacts as you go
- Upload artifacts linked to deliverables
- Auto-complete when all deliverables have proof
Deliverable Format Guidelines
HTML is the primary format for artifacts. Use HTML for 90% of deliverables - it's self-contained, richly formatted, searchable, and works everywhere.
3-Tier Format Hierarchy
| Tier | Format | Use For | Examples |
|---|---|---|---|
| 1 | HTML (primary) | Test results, reviews, terminal output, reports | Unit tests, code reviews, build logs, lint output |
| 2 | Image | Actual UI screenshots only | App interface, visual bugs, design mockups |
| 3 | Video | Complex flows requiring browser automation | Multi-step user journeys, animations, interactions |
When to Use Each Format
Use HTML when:
- ✅ Terminal output (test results, build logs, linting)
- ✅ Code reviews or security audits
- ✅ Structured reports or analysis
- ✅ Any text-based output you'd normally copy-paste
- ✅ Screenshots with annotations or context
- ✅ Coverage reports, profiling data, metrics
Use Images when:
- 📸 Showing actual application UI (buttons, forms, layouts)
- 📸 Visual bugs or design issues
- 📸 Before/after comparisons
- 📸 Design mockups or prototypes
Use Video when:
- 🎥 Demonstrating multi-step user flows
- 🎥 Showing animations or transitions
- 🎥 Browser automation proof (Playwright/Puppeteer)
- 🎥 Complex interactions that images can't capture
Decision Tree
```
Is this terminal/CLI output? ──► YES ──► HTML (dark terminal theme)
        │
        NO
        │
Is this a code review/audit? ──► YES ──► HTML (light professional theme)
        │
        NO
        │
Is this test/coverage data? ──► YES ──► HTML (syntax-highlighted)
        │
        NO
        │
Does it require browser automation? ──► YES ──► Video
        │
        NO
        │
Is it showing actual UI? ──► YES ──► Screenshot (Image)
```
HTML Examples
Test Results:
```typescript
const html = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  body {
    font-family: 'SF Mono', Monaco, monospace;
    background: #1e1e1e;
    color: #d4d4d4;
    padding: 20px;
  }
  .pass { color: #22c55e; }
  .pass::before { content: "✔ "; }
</style>
</head>
<body>
<h1>Test Results - PASS</h1>
<div class="test-case">
  <span class="pass">validates email addresses</span>
</div>
</body>
</html>`;

await addArtifact({
  planId,
  sessionToken,
  type: 'test_results',
  filename: 'test-results.html',
  source: 'base64',
  content: Buffer.from(html).toString('base64'),
  deliverableId: deliverables[0].id
});
```
Code Review:
```typescript
const review = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  body {
    font-family: -apple-system, sans-serif;
    max-width: 1000px;
    margin: 0 auto;
    padding: 40px;
    background: #ffffff;
  }
  .verdict.pass {
    background: #d1fae5;
    border: 2px solid #10b981;
    padding: 20px;
  }
  .issue.critical {
    border-left: 4px solid #dc2626;
    background: #fef2f2;
    padding: 16px;
    margin: 16px 0;
  }
</style>
</head>
<body>
<h1>Code Review: Authentication Module</h1>
<div class="verdict pass">✓ APPROVED</div>
<!-- Risk tables, findings, recommendations -->
</body>
</html>`;
```
See examples/html-artifacts.md for complete working templates.
Base64 Image Embedding
Embed screenshots directly in HTML for self-contained artifacts:
```typescript
import { readFileSync } from 'node:fs';

const imageBuffer = readFileSync('/tmp/screenshot.png');
const base64Image = imageBuffer.toString('base64');

const html = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  .screenshot {
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    overflow: hidden;
  }
  .screenshot img { width: 100%; display: block; }
</style>
</head>
<body>
<h1>Login Page Implementation</h1>
<div class="screenshot">
  <img src="data:image/png;base64,${base64Image}"
       alt="Login page with validation">
</div>
</body>
</html>`;

await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'login-demo.html',
  source: 'base64',
  content: Buffer.from(html).toString('base64'),
  deliverableId: deliverables[0].id
});
```
Why HTML is Primary
- Self-contained - Inline CSS, no external dependencies
- Rich formatting - Colors, structure, syntax highlighting
- Searchable - Text content is indexable
- Universal - Works in any browser
- Version control friendly - Text diffs work
- Portable - Single file, no special viewers needed
HTML Best Practices
- ✅ Inline all CSS in `<style>` tags
- ✅ Embed images as base64 data URIs
- ✅ Use semantic HTML (h1, h2, table, etc.)
- ✅ Include proper `<meta charset="UTF-8">`
- ✅ Keep files under 5MB for fast loading
- ❌ Never link external stylesheets or scripts
- ❌ Don't use CDNs or remote resources
- ❌ Avoid JavaScript (static HTML only)
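To keep artifacts consistent, you might wrap these rules in a small helper. A minimal sketch (the function name and skeleton are illustrative, not part of Shipyard's API):

```typescript
// Illustrative helper (not a Shipyard API): wraps a body fragment in a
// self-contained HTML document following the practices above.
function wrapArtifactHtml(title: string, bodyHtml: string): string {
  return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  /* All CSS inlined - no external stylesheets, scripts, or CDNs */
  body { font-family: -apple-system, sans-serif; padding: 40px; }
</style>
</head>
<body>
<h1>${title}</h1>
${bodyHtml}
</body>
</html>`;
}

// Usage: base64-encode the result before passing it to addArtifact.
const html = wrapArtifactHtml('Lint Report', '<pre>0 errors, 2 warnings</pre>');
const content = Buffer.from(html).toString('base64');
```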
Deliverables: Provable Outcomes
Deliverables are outcomes you prove with artifacts. Mark them with `{#deliverable}`.
Good (provable):
- Screenshot of working login page
- Video showing drag-and-drop feature
- Test results showing 100% pass rate
Bad (not provable):
- Implement authentication (too vague)
- Refactor code (no artifact)
- Add error handling (internal)
Workflow Example
```typescript
// Step 1: Create task with deliverables
const plan = await createPlan({
  title: "Add user profile page",
  content: `
## Deliverables
- [ ] Screenshot of profile page with avatar {#deliverable}
- [ ] Screenshot of edit form validation {#deliverable}

## Implementation
1. Create /profile route
2. Add avatar upload component
3. Build edit form with validation
`
});

const { planId, sessionToken, deliverables, monitoringScript } = plan;
// deliverables = [{ id: "del_xxx", text: "Screenshot of profile page with avatar" }, ...]
// monitoringScript = bash script to poll for approval (for non-hook agents)

// For non-hook agents (Cursor, Devin, etc.): run the monitoring script in the
// background to wait for human approval before proceeding:
// bash <(echo "$monitoringScript") &

// Step 2: Implement the feature (your actual work happens here)

// Step 3: Upload proof
await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'profile-page.png',
  source: 'file',
  filePath: '/tmp/screenshots/profile.png',
  deliverableId: deliverables[0].id
});

const result = await addArtifact({
  planId,
  sessionToken,
  type: 'screenshot',
  filename: 'validation-errors.png',
  source: 'file',
  filePath: '/tmp/screenshots/validation.png',
  deliverableId: deliverables[1].id
});

// Step 4: Auto-complete triggers when all deliverables have artifacts
if (result.allDeliverablesComplete) {
  return { done: true, proof: result.snapshotUrl };
}
```
Human-Agent Communication
request_user_input is THE primary way to talk to humans during active work.
The human is already in the browser viewing your plan. When you need to ask a question, get a decision, or request clarification - that's where they expect to see it. Don't scatter conversations across different interfaces.
Why Use request_user_input
- Context: The human sees your question alongside the plan, artifacts, and comments
- History: All exchanges are logged in the plan's activity feed
- Continuity: The conversation stays attached to the work
- Flexibility: 8 input types, multi-question forms, "Other" escape hatch
Replace Platform Tools
| Platform | DON'T Use | Use Instead |
|---|---|---|
| Claude Code | AskUserQuestion | request_user_input |
| Cursor | Built-in prompts | request_user_input |
| Windsurf | Native dialogs | request_user_input |
| Claude Desktop | Chat questions | request_user_input |
Example
```typescript
const result = await requestUserInput({
  message: "Which database should we use?",
  type: "choice",
  options: ["PostgreSQL", "SQLite", "MongoDB"],
  timeout: 600 // 10 minutes
});

if (result.success) {
  console.log("User chose:", result.response);
}
```
Note: The MCP tool is named `request_user_input` (snake_case). Inside `execute_code`, it's available as `requestUserInput()` (camelCase).
Input Types (8 total)
| Type | Use For | Example |
|---|---|---|
| `text` | Single-line input | API keys, names |
| `multiline` | Multi-line text | Bug descriptions |
| `choice` | Select from options | Framework choice (auto-adds "Other") |
| `confirm` | Yes/No decisions | Deploy to production? |
| `number` | Numeric input | Port number (with min/max) |
| `email` | Email validation | Contact address |
| `date` | Date picker | Deadline (with range) |
| `rating` | Scale rating | Rate approach 1-5 |
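For instance, a bounded number question and a confirm. A hedged sketch: the `min`/`max` option names are assumed from the table above, not confirmed signatures:

```typescript
// Assumed option names (min/max) based on the table above.
const port = await requestUserInput({
  message: "Which port should the dev server use?",
  type: "number",
  min: 1024,
  max: 65535,
  timeout: 600
});

const deploy = await requestUserInput({
  message: "Deploy to production?",
  type: "confirm"
});
```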
Multi-Question Forms
Ask multiple questions at once:
```typescript
const result = await requestUserInput({
  questions: [
    { message: "Project name?", type: "text" },
    { message: "Framework?", type: "choice", options: ["React", "Vue", "Angular"] },
    { message: "Include TypeScript?", type: "confirm" }
  ],
  timeout: 600
});
```
Handling Reviewer Feedback
Check for comments and change requests:
```typescript
const status = await readPlan(planId, sessionToken, {
  includeAnnotations: true
});

if (status.status === "changes_requested") {
  // Read status.content for inline comments
  // Make changes, upload new artifacts
}
```
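While awaiting review, you can poll until the status changes. A minimal sketch using the `readPlan` signature from the example above; the `"approved"` status value and the polling interval are assumptions, not documented constants:

```typescript
// Hedged sketch: poll readPlan until the reviewer acts.
// "approved" is an assumed status value; "changes_requested" is documented above.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

let status = await readPlan(planId, sessionToken, { includeAnnotations: true });
while (status.status !== "approved" && status.status !== "changes_requested") {
  await sleep(30_000); // check every 30 seconds
  status = await readPlan(planId, sessionToken, { includeAnnotations: true });
}

if (status.status === "changes_requested") {
  // Address inline comments, then upload fresh artifacts.
}
```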
Artifact Types
| Type | Use For | Examples |
|---|---|---|
| `screenshot` | UI changes, visual proof | .png, .jpg |
| `video` | Complex flows, interactions | .mp4, .webm |
| `test_results` | Test output, coverage | .json, .txt |
| `diff` | Code changes | .diff, .patch |
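Uploading a diff works the same way as the screenshot uploads shown earlier; only the `type` and file differ (the filename here is illustrative):

```typescript
// Attach a code diff as proof, mirroring the addArtifact calls above.
await addArtifact({
  planId,
  sessionToken,
  type: 'diff',
  filename: 'auth-refactor.patch',
  source: 'file',
  filePath: '/tmp/auth-refactor.patch',
  deliverableId: deliverables[0].id
});
```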
Video Recording
Video recording uses the Playwriter MCP for browser capture and Shipyard for uploading proof-of-work artifacts. This is ideal for demonstrating complex user interactions, multi-step flows, or animated UI behavior.
Workflow (4 steps):
- Start recording - Playwriter begins capturing browser frames via CDP
- Perform interactions - Execute the actions you want to demonstrate
- Stop capture - Playwriter stops the CDP screencast and saves frames to disk
- Encode and upload - Shipyard's bundled FFmpeg encodes the frames to MP4, then uploads via `addArtifact`
Configuration:
| Option | Range | Default | Description |
|---|---|---|---|
| `fps` | 4-8 | 6 | Frames per second (lower = smaller file) |
| `quality` | 60-90 | 80 | JPEG quality (higher = better quality, larger file) |
Note: FFmpeg is bundled with Shipyard (via @ffmpeg-installer, auto-downloaded on pnpm install). No manual installation required.
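Once FFmpeg has encoded the frames, the upload mirrors the other `addArtifact` calls in this document (the output path and filename here are illustrative; the Playwriter capture calls themselves are covered in the linked example):

```typescript
// Step 4 only: upload the encoded MP4 produced by the recording workflow.
await addArtifact({
  planId,
  sessionToken,
  type: 'video',
  filename: 'checkout-flow.mp4',
  source: 'file',
  filePath: '/tmp/recordings/checkout-flow.mp4',
  deliverableId: deliverables[0].id
});
```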
See examples/video-recording.md for complete code examples.
Tips
- Plan deliverables first - Decide what proves success before coding
- Capture during work - Take screenshots as you implement, not after
- Be specific - "Login page with error state" beats "Screenshot"
- Link every artifact - Always set `deliverableId` for auto-completion
- Check feedback - Poll `readPlan` when awaiting review
When NOT to Use
- Quick answers or research (no artifacts to capture)
- Internal refactoring with no visible output
- Tasks where proof adds no value
- Exploration or debugging sessions
Troubleshooting
Browser doesn't open: Check that the MCP server is running and `SHIPYARD_WEB_URL` is set.
Upload fails: Verify that the file path exists and that `GITHUB_TOKEN` has repo write access.
No auto-complete: Ensure every deliverable has an artifact with a matching `deliverableId`.