
view-results
by METR
Running UK AISI's Inspect in the Cloud
SKILL.md
name: view-results
description: View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.
View Hawk Eval Results
When the user wants to analyze evaluation results, use these hawk CLI commands:
1. List Eval Sets
You can list all eval sets if the user does not know the eval set ID:
hawk list eval-sets
Shows: eval set ID, creation date, creator.
You can increase the number of results returned with --limit N.
hawk list eval-sets --limit 50
Or you can search for a specific eval set with --search QUERY.
hawk list eval-sets --search pico
2. List Evaluations
With an eval set ID, you can list all evaluations in the eval-set:
hawk list evals [EVAL_SET_ID]
Shows: task name, model, status (success/error/cancelled), and sample counts.
3. List Samples
You can also list individual samples and their scores:
hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N]
4. Download Transcript
To get the full conversation for a specific sample:
hawk transcript <UUID>
The transcript includes full conversation with tool calls, scores, and metadata.
For even more detail, fetch the raw data with --raw:
hawk transcript <UUID> --raw
Batch Transcript Download
You can also download all transcripts for an entire eval set:
# Fetch all samples in an eval set
hawk transcripts <EVAL_SET_ID>
# Write to individual files in a directory
hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts
# Limit number of samples
hawk transcripts <EVAL_SET_ID> --limit 10
# Raw JSON output (one JSON per line to stdout, or .json files with --output-dir)
hawk transcripts <EVAL_SET_ID> --raw
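Because --raw emits one JSON object per line, the output pipes cleanly into standard line-oriented tools. A minimal sketch (the count_samples helper is ours, not part of hawk, and the call is guarded so it is a no-op where hawk is not installed):

```shell
#!/bin/sh
# Count records in a JSONL stream (one transcript per line with --raw).
count_samples() {
  grep -c ''  # counts lines on stdin
}

# Placeholder eval set ID; replace with a real one from `hawk list eval-sets`.
EVAL_SET_ID="${EVAL_SET_ID:-<your-eval-set-id>}"

# Guarded: skip the call entirely when hawk is not on PATH.
if command -v hawk >/dev/null 2>&1; then
  hawk transcripts "$EVAL_SET_ID" --raw | count_samples
fi
```

The same pipeline shape works with jq or awk if you want to pull specific fields out of each record.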
Workflow
1. Run hawk list eval-sets to see available eval sets
2a. Run hawk list evals <EVAL_SET_ID> to see available evaluations
2b. Or run hawk list samples <EVAL_SET_ID> to find samples of interest
3a. Run hawk transcript <UUID> to get full details on a single sample
3b. Or run hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts to download all transcripts
4. Read and analyze the transcript(s) to understand the agent's behavior
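The steps above can be collected into a single helper. This is a sketch, not part of hawk itself: analyze_eval_set is our own name, and the flags used are only those documented in the sections above.

```shell
#!/bin/sh
# Sketch: run the workflow steps for one eval set.
analyze_eval_set() {
  eval_set_id="$1"
  hawk list evals "$eval_set_id"                              # evaluations in the set
  hawk list samples "$eval_set_id" --limit 10                 # a few samples of interest
  hawk transcripts "$eval_set_id" --output-dir ./transcripts  # download all transcripts
  # Then read the files in ./transcripts to analyze the agent's behavior.
}
```

Usage: find an ID with `hawk list eval-sets`, then run `analyze_eval_set <EVAL_SET_ID>`.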
API Environments
Production (https://api.inspect-ai.internal.metr.org) is used by default. Set HAWK_API_URL only when targeting non-production environments:
| Environment | URL |
|---|---|
| Staging | https://api.inspect-ai.staging.metr-dev.org |
| Dev1 | https://api.inspect-ai.dev1.staging.metr-dev.org |
| Dev2 | https://api.inspect-ai.dev2.staging.metr-dev.org |
| Dev3 | https://api.inspect-ai.dev3.staging.metr-dev.org |
| Dev4 | https://api.inspect-ai.dev4.staging.metr-dev.org |
Example:
HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval-sets
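For a longer session against a non-production environment, you can export the variable once instead of prefixing every command. A sketch using the staging URL from the table above (the hawk call is guarded so the block is a no-op where hawk is not installed):

```shell
#!/bin/sh
# Point every subsequent hawk command in this shell at staging.
export HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org

if command -v hawk >/dev/null 2>&1; then
  hawk list eval-sets
fi
```

Unset the variable (or start a new shell) to return to production.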


