
stata-mcp
by tmonk
A lightweight Model Context Protocol (MCP) server for Stata. Execute commands, inspect data, retrieve stored results (r()/e()), and view graphs in your chat interface. Built for economists who want to integrate LLM assistance into their Stata workflow.
SKILL.md
name: stata-mcp
description: Run or debug Stata workflows through the local io.github.tmonk/mcp-stata server. Use when users mention Stata commands, .do files, r()/e() results, dataset inspection, Stata graph exports, or data browsing with sorting/filtering.
Stata MCP Skill
Instructions
- Ensure the stata MCP server is registered (see the project README for config) and request it if it is not already active.
- When the user asks for Stata work:
  - Use run_command for ad-hoc syntax (trace=True for call stacks, raw=True for plain output).
  - Use load_data before analyses that require datasets.
  - Use get_data, describe, codebook, or get_variable_list to inspect data.
  - Use run_do_file for provided .do scripts.
  - Use export_graph / export_graphs_all for visualization requests.
  - Use get_help when the user wants Stata documentation.
  - Use get_stored_results to return r()/e() scalars/macros after commands for validation.
  - Use read_log to tail or retrieve output from long-running commands.
  - Use get_ui_channel to obtain a localhost HTTP endpoint for high-volume data browsing.
- Surface rc/stderr info back to the user, referencing r()/e() codes.
- If Stata isn't auto-discovered, remind the user to set STATA_PATH (examples in README).
Tool quick reference
Command Execution
- run_command(code, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None): Run Stata syntax.
  - code: The Stata command(s) to execute.
  - echo: Include the command itself in output (default: True).
  - as_json: Return a JSON envelope with rc/stdout/stderr/error (default: True).
  - trace: Enable set trace on for deeper error diagnostics (default: False).
  - raw: Return plain stdout/error message instead of JSON (default: False).
  - max_output_lines: Truncate output to this many lines (default: None, no truncation).
  - Note: Always writes output to a temporary log file and emits a notifications/logMessage with {"event":"log_path","path":"..."} so the client can tail it locally.
- run_do_file(path, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None): Execute .do files.
  - path: Path to the .do file.
  - echo: Include commands in output (default: True).
  - as_json: Return a JSON envelope (default: True).
  - trace: Enable trace mode for debugging (default: False).
  - raw: Return plain output instead of JSON (default: False).
  - max_output_lines: Truncate output to this many lines (default: None).
  - Note: Always writes output to a temporary log file and emits incremental notifications/progress when the client provides a progress token/callback.
- read_log(path, offset=0, max_bytes=65536): Read a slice of a previously provided log file.
  - path: Path to the log file (from notifications/logMessage).
  - offset: Byte offset to start reading from (default: 0).
  - max_bytes: Maximum bytes to read (default: 65536).
  - Returns JSON: path, offset, next_offset, data.
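The offset/next_offset contract of read_log suggests a simple polling loop. The sketch below emulates that contract against a local file, so it runs without the server. The helper names read_slice and tail_log are hypothetical. It also assumes next_offset equals offset plus the number of bytes actually read; the reference above implies this but does not state it.

```python
def read_slice(path: str, offset: int = 0, max_bytes: int = 65536) -> dict:
    """Hypothetical local stand-in for read_log that returns the same four fields."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read(max_bytes)
    return {
        "path": path,
        "offset": offset,
        # Assumption: the next read starts right after the bytes just returned.
        "next_offset": offset + len(chunk),
        # A multi-byte character split across two slices is replaced, not decoded.
        "data": chunk.decode("utf-8", errors="replace"),
    }


def tail_log(path: str, max_bytes: int = 65536) -> str:
    """Read a log from the start, following next_offset until a slice comes back empty."""
    offset = 0
    parts = []
    while True:
        result = read_slice(path, offset=offset, max_bytes=max_bytes)
        if not result["data"]:
            break  # nothing new yet; a live client would sleep and poll again
        parts.append(result["data"])
        offset = result["next_offset"]
    return "".join(parts)
```

Against the real server, the same loop would call read_log with the path taken from the log_path notification.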
Data Loading & Inspection
- load_data(source, clear=True, as_json=True, raw=False, max_output_lines=None): Load data using sysuse/webuse/use heuristics.
  - source: Dataset name, URL, or file path (e.g., "auto", "webuse nlsw88", "/path/to/file.dta").
  - clear: Append ", clear" to replace existing data (default: True).
  - as_json: Return a JSON envelope (default: True).
  - raw: Return plain output (default: False).
  - max_output_lines: Truncate output to this many lines (default: None).
  - Note: After loading, use the UI channel for advanced filtering/sorting at scale.
- get_data(start=0, count=50): Retrieve a slice of the active dataset as JSON.
  - start: Zero-based index of the first observation (default: 0).
  - count: Number of observations to retrieve (default: 50, max: 500).
  - Note: For advanced sorting/filtering at scale, use the UI channel endpoints (see get_ui_channel()).
- describe(): Return variable descriptions, storage types, and labels.
- get_variable_list(): Return a JSON list of all variables with names, labels, and types.
- codebook(variable, as_json=True, trace=False, raw=False, max_output_lines=None): Return a codebook/summary for a specific variable.
  - variable: Variable name to describe.
  - as_json: Return a JSON envelope (default: True).
  - trace: Enable trace mode (default: False).
  - raw: Return plain output (default: False).
  - max_output_lines: Truncate output to this many lines (default: None).
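load_data is described only as using "sysuse/webuse/use heuristics". The sketch below shows one plausible way to map a source string onto a Stata command. The rules are an assumption for illustration, not the server's actual logic: an explicit command prefix is kept, URLs and .dta paths go through use, and anything else goes through sysuse. The function name build_load_command is hypothetical.

```python
def build_load_command(source: str, clear: bool = True) -> str:
    """Hypothetical mapping from a load_data source string to Stata syntax."""
    src = source.strip()
    suffix = ", clear" if clear else ""
    # An explicit command prefix such as "webuse nlsw88" is passed through unchanged.
    for prefix in ("webuse ", "sysuse ", "use "):
        if src.startswith(prefix):
            return src + suffix
    # URLs and file paths are loaded with use, quoted in case they contain spaces.
    is_url = src.startswith(("http://", "https://"))
    is_path = src.endswith(".dta") or "/" in src or "\\" in src
    if is_url or is_path:
        return f'use "{src}"{suffix}'
    # Bare names such as "auto" are treated as example datasets shipped with Stata.
    return f"sysuse {src}{suffix}"
```

Under these assumed rules, "auto" becomes sysuse auto, clear, and "/path/to/file.dta" becomes use "/path/to/file.dta", clear.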
Graph Management
- list_graphs(): List all graphs in Stata's memory, with the active graph marked.
  - Note: Graphs are automatically cached during command execution for instant exports.
- export_graph(graph_name=None, format="pdf"): Export a stored graph to a file.
  - graph_name: Name of the graph to export (from list_graphs); if None, exports the active graph.
  - format: Output format, "pdf" (default) or "png". Use "png" to view plots directly.
- export_graphs_all(): Export all graphs in memory and return their file paths.
Help & Results
- get_help(topic, plain_text=False): Return Stata help text.
  - topic: Command or help topic (e.g., "regress", "graph").
  - plain_text: Return plain text instead of Markdown (default: False).
- get_stored_results(): Return the current r() and e() results as JSON after a command.
UI Data Browser
- get_ui_channel(): Return a short-lived localhost HTTP endpoint and bearer token for the UI-only data browser.
  - Returns JSON with baseUrl, token, expiresAt, and capabilities.
  - Intended for the VS Code extension UI to browse data at high volume (paging, filtering, sorting) without sending large payloads over MCP.
  - Loopback only (binds to 127.0.0.1); requires bearer auth.
  - Key endpoints (all require the Authorization: Bearer <token> header):
    - GET /v1/dataset: Dataset identity and state
    - GET /v1/vars: Variable metadata
    - POST /v1/page: Page data with optional sorting (sortBy parameter)
    - POST /v1/arrow: Binary Arrow IPC stream
    - POST /v1/views: Create filtered view
    - POST /v1/views/:viewId/page: Page within filtered view (supports sorting)
    - POST /v1/views/:viewId/arrow: Arrow stream from filtered view
    - DELETE /v1/views/:viewId: Delete view
    - POST /v1/filters/validate: Validate filter expression
  - Sorting: Use a sortBy array in page requests (e.g., ["price"] for ascending, ["-price"] for descending, ["foreign", "-price"] for multi-level).
  - Filtering: Filter expressions use Python boolean operators (==, !=, <, >, and, or); Stata-style & and | are also accepted.
  - Server limits: maxLimit=500, maxVars=32767, maxChars=500, maxRequestBytes=1000000, maxArrowLimit=1000000.
  - Dataset tracking: datasetId is used for cache invalidation; changing the dataset invalidates view handles.
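The sketch below makes the sortBy convention concrete. It applies the same key syntax to an in-memory list of rows: a leading "-" means descending, and earlier keys take precedence. It is a local illustration of the semantics described above, not the server's implementation. The apply_sort_by name is hypothetical, and it ignores Stata missing values.

```python
from typing import Any, Dict, List


def apply_sort_by(rows: List[Dict[str, Any]], sort_by: List[str]) -> List[Dict[str, Any]]:
    """Sort rows by sortBy keys such as ["foreign", "-price"]."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the first
    # key last yields a multi-level ordering; reverse=True keeps stability too.
    for key in reversed(sort_by):
        descending = key.startswith("-")
        name = key[1:] if descending else key
        result.sort(key=lambda row: row[name], reverse=descending)
    return result
```

Sorting by ["foreign", "-price"] groups rows by foreign and orders each group from most to least expensive.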
Cancellation
- Clients may cancel an in-flight request by sending the MCP notification notifications/cancelled with params.requestId set to the original tool call ID.
- Pass a _meta.progressToken when invoking the tool if you want progress updates (optional).
- Cancellation is best-effort and depends on Stata surfacing BreakError.
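The cancellation and progress fields above are ordinary MCP JSON-RPC messages. The sketch below builds two of them: a tools/call request carrying a progress token, and the matching notifications/cancelled notification. The request ID, token value, and reason text are placeholders, and how the messages are sent depends on the client's transport.

```python
import json
from typing import Optional


def tool_call(request_id: int, name: str, arguments: dict,
              progress_token: Optional[str] = None) -> dict:
    """Build a JSON-RPC tools/call request, optionally asking for progress updates."""
    params: dict = {"name": name, "arguments": arguments}
    if progress_token is not None:
        params["_meta"] = {"progressToken": progress_token}
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call", "params": params}


def cancel(request_id: int, reason: str = "user requested") -> dict:
    """Build the notification asking the server to stop request_id (best-effort)."""
    # Notifications carry no "id" field because no response is expected.
    return {
        "jsonrpc": "2.0",
        "method": "notifications/cancelled",
        "params": {"requestId": request_id, "reason": reason},
    }


call = tool_call(7, "run_do_file", {"path": "/path/to/analysis.do"}, progress_token="do-7")
stop = cancel(call["id"])
print(json.dumps(stop))
```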
Error Reporting
- All tools executing Stata commands support JSON envelopes (as_json=true) containing:
  - rc: Return code from r()/c(rc)
  - stdout: Standard output
  - stderr: Standard error (captures "red text")
  - message: Error message
  - line: Line number (when Stata reports it)
  - command: The command that was executed
  - log_path: Path to log file for streaming (when applicable)
  - snippet: Excerpt of error output
- Stata-specific error codes (r(XXX)) are parsed and preserved.
- Use trace=true to enable set trace on for detailed program-defined error diagnostics.
- Set the MCP_STATA_LOGLEVEL environment variable (e.g., DEBUG, INFO) to control server logging.
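A client can act on the envelope with a few lines of code. The sketch below makes three assumptions: the envelope arrives as a JSON string with the fields listed above, rc is 0 on success, and an absent rc can be recovered from the "r(NNN);" line that Stata prints after a failing command. The check_envelope name and the sample envelopes are hypothetical.

```python
import json
import re

# Stata ends a failing command's output with a line such as "r(111);".
RC_PATTERN = re.compile(r"\br\((\d+)\);")


def check_envelope(raw: str) -> dict:
    """Parse a run_command/run_do_file JSON envelope and summarize any error."""
    env = json.loads(raw)
    rc = env.get("rc")
    if rc is None:
        # Fall back to the last r(NNN); line found in stderr/stdout, if any.
        text = "\n".join(filter(None, [env.get("stderr"), env.get("stdout")]))
        codes = RC_PATTERN.findall(text)
        rc = int(codes[-1]) if codes else 0
    return {
        "ok": rc == 0,
        "rc": rc,
        "message": env.get("message") or env.get("snippet") or "",
        "line": env.get("line"),
        "log_path": env.get("log_path"),
    }
```

A nonzero rc such as 111 ("variable not found") can then be reported to the user together with the message and line fields.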
MCP Resources
The server exposes these resources for MCP clients:
- stata://data/summary → summarize
- stata://data/metadata → describe
- stata://graphs/list → graph list
- stata://variables/list → variable list
- stata://results/stored → stored r()/e() results
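MCP clients read these URIs with the standard resources/read method. The sketch below only builds the JSON-RPC request and checks the URI against the list above; sending it is up to the client and transport. The helper name read_resource_request is hypothetical.

```python
import json

# Resource URIs exposed by the server, per the list above.
STATA_RESOURCES = {
    "stata://data/summary": "summarize",
    "stata://data/metadata": "describe",
    "stata://graphs/list": "graph list",
    "stata://variables/list": "variable list",
    "stata://results/stored": "stored r()/e() results",
}


def read_resource_request(request_id: int, uri: str) -> str:
    """Serialize a resources/read request for one of the Stata resources."""
    if uri not in STATA_RESOURCES:
        raise ValueError(f"unknown Stata resource: {uri}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }
    return json.dumps(message)
```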
Graph review workflow
- Call list_graphs() to see available plots and identify the active graph.
- Use export_graphs_all() to fetch file paths for every graph; view them directly in the client.
- For a single plot, call export_graph(graph_name="GraphName", format="png") to get a viewable file.
- Compare the rendered PNGs to the user's spec (titles, axis labels, legends, colors, filters); state whether the graph matches and what to change.
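Before comparing a plot against the user's spec, it can help to confirm that each exported path really is a PNG. The sketch below checks the eight-byte PNG signature and reads the image size from the IHDR chunk. It assumes the export step has already written the file locally, and the function name png_size is hypothetical.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) if path is a PNG file, else None."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: signature (8 bytes), IHDR length (4), "IHDR" (4), width (4), height (4).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

A None result usually means the graph was exported with the default "pdf" format and should be re-exported with format="png".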
Examples
Run a regression
# Load sample data and run regression
load_data("auto")
run_command("regress price mpg")
get_stored_results() # Retrieve coefficients and statistics
Export a histogram
# Create and export a graph
run_command("histogram price")
list_graphs() # Confirm graph exists
export_graph(graph_name="Graph", format="png") # Export for viewing
Debug a do-file
run_do_file("/path/to/analysis.do", trace=True)
Inspect data structure
load_data("nlsw88", clear=True)
describe()
get_variable_list()
codebook("wage")
get_data(start=0, count=10)
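Because get_data caps count at 500, reading more observations than that takes several calls. The sketch below computes the (start, count) pairs that cover a dataset of n observations exactly once. It assumes the observation total is already known, for example from describe(). The plan_pages helper is hypothetical.

```python
from typing import Iterator, Tuple

MAX_COUNT = 500  # documented cap on get_data's count argument


def plan_pages(n_obs: int, count: int = MAX_COUNT) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that cover observations 0..n_obs-1 exactly once."""
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    for start in range(0, n_obs, count):
        yield start, min(count, n_obs - start)


# Each pair maps onto one call, e.g. get_data(start=500, count=500).
print(list(plan_pages(1200)))  # [(0, 500), (500, 500), (1000, 200)]
```

For anything beyond a few pages, the UI channel described above is the intended route.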
Read log output from long-running command
# After run_command emits a log_path notification
read_log("/tmp/stata_log_abc123.log", offset=0)
# Continue from the next_offset returned by the previous call (4096 is illustrative)
read_log("/tmp/stata_log_abc123.log", offset=4096)
Advanced data browsing with sorting and filtering
# Get UI channel for high-volume data operations
get_ui_channel() # Returns baseUrl, token, expiresAt
# Example UI channel usage (requires HTTP client):
# POST {baseUrl}/v1/page with Authorization: Bearer {token}
# Body: {"datasetId":"...","offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}
# Create filtered view for price < 5000
# POST {baseUrl}/v1/views
# Body: {"datasetId":"...","frame":"default","filterExpr":"price < 5000"}
# Page through filtered view with sorting
# POST {baseUrl}/v1/views/{viewId}/page
# Body: {"offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}
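The HTTP calls above work with any HTTP client. The sketch below uses Python's standard urllib to build, but not send, the POST /v1/page request. The baseUrl, token, and datasetId values are placeholders for what get_ui_channel() and GET /v1/dataset return. The limit check mirrors the documented maxLimit=500, and page_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import List, Optional

MAX_LIMIT = 500  # documented server limit (maxLimit)


def page_request(base_url: str, token: str, dataset_id: str, offset: int = 0,
                 limit: int = 50, variables: Optional[List[str]] = None,
                 sort_by: Optional[List[str]] = None) -> urllib.request.Request:
    """Build an authenticated POST /v1/page request for the UI channel."""
    if limit > MAX_LIMIT:
        raise ValueError(f"limit {limit} exceeds maxLimit={MAX_LIMIT}")
    body = {"datasetId": dataset_id, "offset": offset, "limit": limit}
    if variables:
        body["vars"] = variables
    if sort_by:
        body["sortBy"] = sort_by
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/page",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; real ones come from get_ui_channel() and GET /v1/dataset.
req = page_request("http://127.0.0.1:8765", "TOKEN", "DATASET_ID",
                   variables=["price", "mpg"], sort_by=["-price"])
# To send it: urllib.request.urlopen(req), then json.load() the response.
```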

