
capture-screenshots
by maxberko
Automate product documentation with Claude Code - screenshots, docs, and announcements
SKILL.md
name: capture-screenshots description: Capture product screenshots using Claude's Computer Use API. Provides visual authentication (no session expiration!), intelligent content verification, and reliable screenshot capture. Screenshots are saved to configured directory for later upload to Pylon CDN.
Capture Screenshots
Automate screenshot capture for product features using Claude's Computer Use API.
Purpose
Capture consistent, high-quality screenshots of product features for use in documentation. Computer Use provides:
- Visual Authentication: Claude logs in by seeing the screen, eliminating session expiration issues
- Intelligent Content Verification: Claude waits for pages to fully load and verifies content is visible
- Self-Adapting: No CSS selectors to maintain - Claude finds elements visually
- Reliable: 95%+ success rate vs 60-70% with traditional automation
All screenshots are captured at a consistent viewport size (configured in config.yaml).
Prerequisites
- Computer Use configured: Anthropic API key and product credentials in
.env - Product is accessible: The application must be running and accessible at configured URL
- Dependencies installed:
pip install anthropic pillow pyautogui - macOS only: Accessibility permissions granted for pyautogui
See Computer Use Setup Guide for detailed configuration.
Input
The user will provide:
- Feature name: e.g., "Dashboards", "Workflow Automation", "Integration Hub"
- Category: Which documentation category (features, integrations, getting-started)
- URLs/pages to capture: Specific pages or views to screenshot
- Optional context: Specific elements to capture, interactions needed
Process
Step 1: Research Feature UI
Use the Task tool to explore the codebase and understand:
-
Routes and URLs: Where is this feature accessed?
- Look for route definitions, page components
- Check navigation menus and links
- Find the base URL path
-
UI Structure: What does the interface look like?
- Main views and layouts
- Key components and sections
- Interactive elements (buttons, forms, modals)
-
Visual Landmarks (for Computer Use):
- Page titles and headers
- Distinctive UI elements
- Section labels
- Button text
Note: Computer Use doesn't require CSS selectors. Claude navigates visually, so focus on understanding what the user will see, not DOM structure.
Example exploration:
Task: Explore
Find all routes and components for the [feature name] feature.
Look for page components, route definitions, and main UI views.
Step 2: Plan Screenshot Capture
Based on your research, create a screenshot plan:
screenshot_plan = [
{
'name': '[feature]-overview',
'url': '/[feature-path]',
'wait_for': '.[main-container-class]', # CSS selector (will be converted to visual description)
'wait_time': 2000 # Additional wait time in milliseconds
},
{
'name': '[feature]-detail',
'url': '/[feature-path]/detail',
'selector': '.[specific-section]', # Optional: capture specific element
'wait_for': '.[section-class]',
'wait_time': 1500
}
]
Note: With Computer Use, CSS selectors like .main-container-class are automatically converted to visual descriptions like "element with class 'main-container-class'". Claude then finds the element visually on screen.
Naming convention:
- Use kebab-case:
feature-name-view.png - Be descriptive:
dashboards-overview.png,dashboards-metrics-panel.png - Keep it consistent:
[feature]-[view].png
Step 3: Create Capture Script
Create a Python script in scripts/screenshot/ for this feature:
#!/usr/bin/env python3
"""
Capture screenshots for [Feature Name]
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
from screenshot.factory import create_capturer_from_plan # Uses Computer Use by default
import config as cfg
def capture_[feature]_screenshots():
base_url = cfg.get_product_url()
screenshot_plan = [
# Add your screenshot plan here
]
print(f"📸 Capturing {len(screenshot_plan)} screenshots for [Feature Name]...")
create_capturer_from_plan(screenshot_plan, base_url) # Uses Computer Use automatically
print("✅ Screenshots captured successfully!")
if __name__ == '__main__':
capture_[feature]_screenshots()
Step 4: Execute Screenshot Capture
Run the capture script:
cd /path/to/max-doc-ai
python3 scripts/screenshot/[feature]_capture.py
What happens (with Computer Use):
- Visual authentication: Claude logs in by seeing and interacting with the login page
- Intelligent navigation: Claude navigates to each URL and waits for content to load
- Content verification: Claude verifies the correct content is visible before capturing
- Screenshot capture: High-quality screenshots are saved to the configured output directory
- No session expiration: Fresh authentication each time ensures reliability
Expected duration:
- First capture (with auth): 30-60 seconds
- Additional captures: 5-10 seconds each
Cost: ~$0.02 per screenshot with Claude Sonnet 4.5
Step 5: Verify Screenshots
Check the output directory (from config.yaml):
ls -lh output/screenshots/
Verify:
- ✅ All expected screenshots are present
- ✅ File sizes are reasonable (not too small = failed capture)
- ✅ Filenames follow naming convention
- ✅ Screenshots show the correct content (open and review visually)
Step 6: Document Screenshot Metadata
Create a summary of captured screenshots:
## Screenshots Captured for [Feature Name]
**Total:** [X] screenshots
**Location:** `output/screenshots/`
### Screenshot List:
1. **[feature]-overview.png**
- Description: Main overview of [feature]
- Size: [width]x[height]
- Shows: [what is visible]
2. **[feature]-detail.png**
- Description: Detailed view of [specific section]
- Size: [width]x[height]
- Shows: [what is visible]
[... continue for all screenshots ...]
### Next Steps:
1. Upload screenshots to Pylon CDN:
```bash
python3 scripts/pylon/upload.py --image output/screenshots/[feature]-overview.png
-
Use CloudFront URLs in documentation
-
Sync documentation to Pylon knowledge base
## Configuration
Screenshots are configured in `config.yaml`:
```yaml
screenshots:
viewport_width: 1280 # Display width (≤1280 recommended)
viewport_height: 800 # Display height (≤800 recommended)
format: "png"
quality: 90
model: "claude-sonnet-4-5"
max_iterations: 50
auth:
enabled: true
type: "sso" # or "username_password"
login_url: "${PRODUCT_URL}/login"
username: "${SCREENSHOT_USER}"
password: "${SCREENSHOT_PASS}"
sso_provider: "google"
output:
screenshots_dir: "./output/screenshots"
Viewport Size: Keep ≤1280x800 for optimal Computer Use coordinate accuracy. Consistent dimensions ensure professional-looking documentation.
Troubleshooting
Authentication Failures
Symptom: Screenshots show login page or authentication errors
Solution:
- Verify credentials in
.envare correct:SCREENSHOT_USERandSCREENSHOT_PASS - Check login URL is correct in
config.yaml - Ensure SSO provider is configured correctly
- Try increasing
max_iterationsin config (allows more time for auth) - Check Computer Use agent output for specific error messages
With Computer Use, session expiration is no longer an issue! Each capture session performs fresh authentication.
Page Not Loading
Symptom: Screenshots are blank or show loading state
Solution:
- Increase
wait_timein screenshot plan - Computer Use naturally waits for content - if still failing, the page may have issues
- Check if URL is correct
- Verify product is accessible and running
- Review Computer Use agent output to see what Claude saw
Content Not Captured Correctly
Symptom: Wrong element captured or content missing
Solution:
- CSS selectors are converted to visual descriptions - Claude finds elements by appearance
- Make selectors more descriptive (
.dashboard-headerbetter than.container-1) - Provide visual context in comments for complex elements
- Consider using full-page screenshots if specific element capture fails
Screenshots Too Small/Large
Symptom: Screenshot dimensions are wrong
Solution:
- Check viewport settings in config.yaml (must match display resolution)
- For macOS, ensure viewport matches your screen resolution or scaling factor
- For full-page screenshots, use
full_page=Truein screenshot plan - For specific elements, use
selectorparameter
Best Practices
- Consistent Naming: Use descriptive, kebab-case names
- Natural Wait Times: Computer Use waits intelligently - only add explicit
wait_timefor animations or slow transitions - Descriptive Selectors: Use meaningful class names that describe the element's purpose
- Visual Verification: Always review screenshots manually (Computer Use is reliable but not perfect)
- Clean State: Ensure pages are in clean, representative state (no dev tools, no personal data)
- Batch Captures: Capture all screenshots for a feature in one session to share authentication cost
- Monitor Costs: Track API usage at console.anthropic.com
- Accessibility: Include
alttext when uploading to Pylon
Advanced Techniques
Custom Interactions
For complex scenarios requiring user interactions:
from screenshot.factory import create_capturer
with create_capturer() as capturer:
capturer.navigate('https://app.example.com/feature')
# Click to open modal
capturer.click('.open-modal-button')
capturer.wait(1000)
# Capture modal
capturer.capture('feature-modal', selector='.modal-container')
# Scroll to specific section
capturer.scroll_to('.metrics-section')
capturer.wait(500)
# Capture after scroll
capturer.capture('feature-metrics')
Note: With Computer Use, these interactions are handled visually by Claude. The click() and scroll_to() methods translate to natural language prompts that Claude executes by seeing the screen.
Multiple States
Capture different states of the same view:
from screenshot.factory import create_capturer
with create_capturer() as capturer:
# Empty state
capturer.navigate('/dashboards?empty=true')
capturer.capture('dashboards-empty-state')
# With data
capturer.navigate('/dashboards')
capturer.capture('dashboards-with-data')
Note: Complex state manipulation (like triggering loading states) requires Claude to interact with the UI naturally. If you need specific states, consider using URL parameters or asking Claude to perform the necessary actions via prompts.
Output
After successful execution:
📸 Capturing 2 screenshots for [Feature Name]...
🌐 Starting Computer Use session...
Viewport: 1280x800
Model: claude-sonnet-4-5
🔐 Authenticating...
✅ Authentication complete
📍 Navigating to: https://app.example.com/feature
✅ Page loaded
📸 Capturing: feature-overview.png
✅ Saved: output/screenshots/feature-overview.png
📍 Navigating to: https://app.example.com/feature/detail
✅ Page loaded
📸 Capturing: feature-detail.png
✅ Saved: output/screenshots/feature-detail.png
✅ Session closed
✅ Screenshots captured successfully!
Integration with Release Workflow
This skill is typically invoked as the first step in the release workflow:
- ✅ capture-screenshots ← You are here
- Upload to Pylon CDN (sync-docs skill)
- Create documentation with screenshots (update-product-doc skill)
- Sync documentation (sync-docs skill)
- Create announcements (create-changelog skill)
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
1ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon
