Who-Visions

computeruse

by Who-Visions

🌙 AI-powered creative intelligence system for worldbuilding, content generation & web research | Powered by Gemini 3 & Vertex AI | FastAPI server with smart routing, memory systems & Notion integration

1🍴 0📅 Jan 24, 2026

SKILL.md


---
name: computeruse
version: 1.0.0
description: Gemini Computer Use - Browser automation with AI vision
---

Computer Use Skill (Gemini Browser Automation)

Enables Rhea to see and control a browser using the Gemini 2.5 Computer Use model. The model analyzes screenshots of the page and generates the mouse and keyboard actions needed to reach a goal.

Features

  • Visual Understanding: AI sees the screen via screenshots
  • Browser Control: Click, type, scroll, navigate
  • Multi-step Tasks: Chains actions to complete complex goals
  • Safety Checks: Built-in confirmation for risky actions
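
The features above amount to an observe-act loop: capture a screenshot, ask the model for the next action, execute it, and repeat. The following is a minimal sketch of that loop, not the skill's actual implementation. `Action`, `run_loop`, and the three callables are hypothetical names introduced for illustration, and the model is replaced by a scripted stand-in so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step proposed by the model (hypothetical structure)."""
    name: str                                 # e.g. "navigate", "click_at", or "done"
    args: dict = field(default_factory=dict)


def run_loop(
    goal: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> dict:
    """Repeat screenshot -> model -> action until the model says "done" or max_steps is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = propose_action(goal, screenshot, history)
        if action.name == "done":
            return {"final_answer": action.args.get("answer"), "steps": history}
        execute(action)
        history.append(action)
    # Step budget exhausted without the model finishing the task.
    return {"final_answer": None, "steps": history}


# Scripted stand-in for the model, so the sketch runs without a browser or API key.
scripted = iter([
    Action("navigate", {"url": "https://example.com"}),
    Action("click_at", {"x": 500, "y": 300}),
    Action("done", {"answer": "Example Domain"}),
])
executed: List[Action] = []

result = run_loop(
    goal="Open example.com and report the page title",
    take_screenshot=lambda: b"",                        # placeholder screenshot bytes
    propose_action=lambda goal, shot, hist: next(scripted),
    execute=executed.append,                            # record instead of driving a browser
)
print(result["final_answer"])
```

In a real run, `take_screenshot` and `execute` would be backed by Playwright and `propose_action` by a call to the Computer Use model. The `max_steps` bound keeps a confused model from looping indefinitely.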

Capabilities

| Action | Description |
|---|---|
| run_task | Execute a multi-step browser task |
| click_at | Click at coordinates |
| type_text_at | Type text at coordinates |
| navigate | Go to a URL |
| scroll | Scroll the page |
| take_screenshot | Capture current state |

Requirements

pip install google-genai playwright
playwright install chromium

Usage Examples

Run a Web Research Task

from rhea_noir.skills.computeruse.actions import skill as cu

result = cu.run_task(
    goal="Search for 'best AI frameworks 2026' on Google and list the top 3 results",
    start_url="https://www.google.com",
    max_steps=10
)
print(result["final_answer"])

Automate Form Filling

result = cu.run_task(
    goal="Fill out the contact form with name 'Dave', email 'dave@example.com', and submit",
    start_url="https://example.com/contact"
)

Web Scraping with Context

result = cu.run_task(
    goal="Go to Amazon and find the price of Sony WH-1000XM5 headphones",
    start_url="https://www.amazon.com"
)

Safety Features

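The model includes built-in safety checks:

  • require_confirmation: The user must approve risky actions before they run
  • Excluded actions: Specific UI actions can be blocked if needed

Both are configured through run_task:

result = cu.run_task(
    goal="...",
    excluded_actions=["drag_and_drop"],   # never perform drag-and-drop
    require_human_confirmation=True       # ask the user before risky actions
)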
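
To make the two checks concrete, here is a small sketch of how a gate in front of the action executor might behave. It is an illustration under stated assumptions, not the skill's code: `gate_action`, the `risky` flag, and the `confirm` callback are hypothetical, and how the skill actually decides that an action is risky is not documented here.

```python
from typing import Callable, Iterable


def gate_action(
    name: str,
    risky: bool,
    excluded: Iterable[str] = (),
    require_human_confirmation: bool = False,
    confirm: Callable[[str], bool] = lambda prompt: False,
) -> str:
    """Return "blocked", "denied", or "allowed" for a proposed UI action."""
    # Excluded actions are rejected outright and never reach the user.
    if name in set(excluded):
        return "blocked"
    # Risky actions need an explicit "yes" when confirmation is enabled.
    if risky and require_human_confirmation:
        return "allowed" if confirm(f"Allow '{name}'?") else "denied"
    return "allowed"


# Drag-and-drop is excluded, so it is blocked regardless of confirmation.
print(gate_action("drag_and_drop", risky=False, excluded=["drag_and_drop"]))

# A risky click is allowed only when the (hypothetical) confirm callback approves.
print(gate_action("click_at", risky=True, require_human_confirmation=True,
                  confirm=lambda prompt: True))
```

In an interactive session, `confirm` could wrap `input()`. The default callback refuses, so a missing handler fails closed rather than letting a risky action through.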

Supported UI Actions

| Action | Description |
|---|---|
| open_web_browser | Open browser |
| navigate | Go to URL |
| click_at | Click at (x, y) |
| type_text_at | Type text at (x, y) |
| scroll_document | Scroll up/down/left/right |
| scroll_at | Scroll at specific location |
| hover_at | Hover mouse at (x, y) |
| key_combination | Press keys (e.g., "Control+C") |
| go_back | Browser back |
| go_forward | Browser forward |
| wait_5_seconds | Wait for page load |
| drag_and_drop | Drag element |
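
As a rough illustration of how some of these actions could be mapped onto a Playwright `page`, the sketch below dispatches on the action name. Several things here are assumptions rather than documented behavior of this skill: that the model returns coordinates on a 0-999 grid that must be scaled to the viewport, the argument keys (`x`, `y`, `text`, `keys`, `url`), the default viewport size, and the helper names `denormalize` and `apply_action`. Only a subset of the table is covered.

```python
def denormalize(value: int, size: int) -> int:
    """Scale a coordinate from an assumed 0-999 grid to a pixel axis of `size` pixels."""
    return value * size // 1000


def apply_action(page, name: str, args: dict, width: int = 1440, height: int = 900) -> None:
    """Run one UI action against a Playwright-style sync `page` object."""
    if name in ("click_at", "hover_at", "type_text_at"):
        x = denormalize(args["x"], width)
        y = denormalize(args["y"], height)

    if name == "navigate":
        page.goto(args["url"])
    elif name == "click_at":
        page.mouse.click(x, y)
    elif name == "hover_at":
        page.mouse.move(x, y)
    elif name == "type_text_at":
        page.mouse.click(x, y)             # focus the field first
        page.keyboard.type(args["text"])
    elif name == "key_combination":
        page.keyboard.press(args["keys"])  # e.g. "Control+C"
    elif name == "go_back":
        page.go_back()
    elif name == "go_forward":
        page.go_forward()
    elif name == "wait_5_seconds":
        page.wait_for_timeout(5000)        # milliseconds
    else:
        raise ValueError(f"Unsupported action in this sketch: {name}")
```

With Playwright installed, `page` would come from `sync_playwright()` after launching Chromium and opening a new page at the same viewport size. Raising on unknown names keeps unhandled actions visible instead of silently skipping them.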

Model

Uses gemini-2.5-computer-use-preview-10-2025, a model specialized for browser control.

> [!CAUTION]
> Computer Use is a Preview feature. Supervise it closely during important tasks.

