METR

view-results

by METR

Running UK AISI's Inspect in the Cloud

12🍴 5📅 Jan 23, 2026

SKILL.md


name: view-results
description: View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.

View Hawk Eval Results

When the user wants to analyze evaluation results, use these hawk CLI commands:

1. List Eval Sets

If the user does not know the eval set ID, list all eval sets:

hawk list eval-sets

Shows: eval set ID, creation date, creator.

To return more results, raise the limit with --limit N:

hawk list eval-sets --limit 50

To search for a specific eval set, use --search QUERY:

hawk list eval-sets --search pico

2. List Evaluations

With an eval set ID, you can list all evaluations in that eval set:

hawk list evals [EVAL_SET_ID]

Shows: task name, model, status (success/error/cancelled), and sample counts.

3. List Samples

With an eval set ID, you can also list individual samples and their scores:

hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N]
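The optional arguments in the usage line above can be sketched as a small shell wrapper. This is a minimal sketch, not part of hawk: the function name `list_samples`, its positional argument order, and the `HAWK` override variable are hypothetical conveniences. Only the subcommand and the `--eval` and `--limit` flags come from the usage line.

```shell
# Assumption: the hawk CLI is on PATH. HAWK is a hypothetical override
# (not a hawk feature) that makes the wrapper easy to dry-run.
HAWK="${HAWK:-hawk}"

# Hypothetical wrapper: list_samples EVAL_SET_ID [EVAL_FILE] [LIMIT]
# Adds --eval and --limit only when a non-empty value is supplied.
list_samples() {
  if [ $# -lt 1 ]; then
    echo "usage: list_samples EVAL_SET_ID [EVAL_FILE] [LIMIT]" >&2
    return 2
  fi
  eval_set_id="$1"
  eval_file="${2:-}"
  limit="${3:-}"

  # Rebuild the positional parameters as the hawk argument list.
  set -- list samples "$eval_set_id"
  if [ -n "$eval_file" ]; then
    set -- "$@" --eval "$eval_file"
  fi
  if [ -n "$limit" ]; then
    set -- "$@" --limit "$limit"
  fi

  "$HAWK" "$@"
}
```

Setting `HAWK=echo` before calling `list_samples` prints the arguments that would be passed instead of contacting the API, which is a cheap way to check the command line first.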

4. Download Transcript

To get the full conversation for a specific sample:

hawk transcript <UUID>

The transcript includes the full conversation with tool calls, scores, and metadata.

For more detail, fetch the raw data with --raw:

hawk transcript <UUID> --raw

Batch Transcript Download

You can also download all transcripts for an entire eval set:

# Fetch all samples in an eval set
hawk transcripts <EVAL_SET_ID>

# Write to individual files in a directory
hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts

# Limit number of samples
hawk transcripts <EVAL_SET_ID> --limit 10

# Raw JSON output (one JSON per line to stdout, or .json files with --output-dir)
hawk transcripts <EVAL_SET_ID> --raw
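After a batch download, a quick sanity check is to count the files that were written. The sketch below assumes, as the last comment above suggests, that combining `--raw` with `--output-dir` writes `.json` files into the directory. The file naming scheme is not specified here, so the code only counts files. The function names and the `HAWK` override variable are hypothetical.

```shell
# Assumption: the hawk CLI is on PATH. HAWK is a hypothetical override
# (not a hawk feature) used to substitute a stand-in command.
HAWK="${HAWK:-hawk}"

# Count the .json files under a directory (hypothetical helper).
count_json_files() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

# Hypothetical wrapper: download raw transcripts for an eval set into
# OUTPUT_DIR, then print how many .json files the directory contains.
download_raw_transcripts() {
  if [ $# -lt 2 ]; then
    echo "usage: download_raw_transcripts EVAL_SET_ID OUTPUT_DIR" >&2
    return 2
  fi
  mkdir -p "$2" || return 1
  "$HAWK" transcripts "$1" --raw --output-dir "$2" || return 1
  count_json_files "$2"
}

# Usage (placeholder ID, as in the commands above):
# download_raw_transcripts <EVAL_SET_ID> ./transcripts
```

A count of zero after a successful run usually means the eval set ID or the target environment is worth double-checking before analyzing anything.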

Workflow

  1. Run hawk list eval-sets to see available eval sets
  2a. Run hawk list evals <EVAL_SET_ID> to see available evaluations
  2b. Or run hawk list samples <EVAL_SET_ID> to find samples of interest
  3a. Run hawk transcript <UUID> to get full details on a single sample
  3b. Or run hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts to download all transcripts
  4. Read and analyze the transcript(s) to understand the agent's behavior
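For a known eval set ID, steps 2a, 2b, and 3b of the workflow can be chained in one shell function. This is a sketch under stated assumptions: `inspect_eval_set` and the `HAWK` override variable are hypothetical names, and the function uses only the subcommands and the `--output-dir` flag shown in this document. Reading and analyzing the transcripts is still a manual step.

```shell
# Assumption: the hawk CLI is on PATH. HAWK is a hypothetical override
# (not a hawk feature) that allows a dry run with HAWK=echo.
HAWK="${HAWK:-hawk}"

# Hypothetical helper: inspect_eval_set EVAL_SET_ID [OUTPUT_DIR]
# Lists evaluations and samples, then downloads every transcript.
inspect_eval_set() {
  if [ $# -lt 1 ]; then
    echo "usage: inspect_eval_set EVAL_SET_ID [OUTPUT_DIR]" >&2
    return 2
  fi
  "$HAWK" list evals "$1" || return 1      # step 2a: evaluations in the set
  "$HAWK" list samples "$1" || return 1    # step 2b: samples and scores
  # step 3b: one file per transcript; defaults to ./transcripts
  "$HAWK" transcripts "$1" --output-dir "${2:-./transcripts}"
}

# Usage (placeholder ID, as in the commands above):
# inspect_eval_set <EVAL_SET_ID> ./transcripts
```

Stopping at the first failing command keeps a wrong eval set ID from producing an empty transcripts directory that looks like a valid result.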

API Environments

Production (https://api.inspect-ai.internal.metr.org) is used by default. Set HAWK_API_URL only when targeting non-production environments:

| Environment | URL |
| --- | --- |
| Staging | https://api.inspect-ai.staging.metr-dev.org |
| Dev1 | https://api.inspect-ai.dev1.staging.metr-dev.org |
| Dev2 | https://api.inspect-ai.dev2.staging.metr-dev.org |
| Dev3 | https://api.inspect-ai.dev3.staging.metr-dev.org |
| Dev4 | https://api.inspect-ai.dev4.staging.metr-dev.org |

Example:

HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval-sets
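To avoid retyping URLs, the environments listed above can be mapped from short names with a shell function. This is a minimal sketch: `hawk_env_url` and the lowercase environment names are hypothetical, while the URLs are copied from this section.

```shell
# Hypothetical helper (not part of hawk): print the API URL for a short,
# lowercase environment name. URLs are the ones listed in this section.
hawk_env_url() {
  case "$1" in
    production) echo "https://api.inspect-ai.internal.metr.org" ;;
    staging)    echo "https://api.inspect-ai.staging.metr-dev.org" ;;
    dev[1-4])   echo "https://api.inspect-ai.$1.staging.metr-dev.org" ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Usage: target Dev2 for a single command, leaving the default untouched.
# HAWK_API_URL="$(hawk_env_url dev2)" hawk list eval-sets
```

Setting the variable inline for one command, as in the usage comment, keeps later commands in the same shell pointed at production.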

Score

Total Score

60/100

Based on repository quality metrics

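| Criterion | Requirement | Points |
| --- | --- | --- |
| SKILL.md | Includes a SKILL.md file | +20 |
| LICENSE | A license is set | +10 |
| Description | Description of at least 100 characters | 0/10 |
| Popularity | At least 100 GitHub stars | 0/15 |
| Recent activity | Updated within the last month | +10 |
| Forks | Forked at least 10 times | 0/5 |
| Issue management | Fewer than 50 open issues | 0/5 |
| Language | A programming language is set | +5 |
| Tags | At least one tag is set | +5 |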
