
langfuse
by avivsinai
A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observability
SKILL.md
name: langfuse version: 1.0.1 description: Debug AI traces, find exceptions, analyze sessions, and manage prompts via Langfuse MCP. Also handles MCP setup and configuration. metadata: short-description: Langfuse observability via MCP compatibility: claude-code, codex-cli
Langfuse Skill
Debug your AI systems through Langfuse observability.
Triggers: langfuse, traces, debug AI, find exceptions, set up langfuse, what went wrong, why is it slow, datasets, evaluation sets
Setup
Step 1: Get credentials from https://cloud.langfuse.com → Settings → API Keys
If self-hosted, use your instance URL for LANGFUSE_HOST and create keys there.
Step 2: Install MCP (pick one):
# Claude Code (project-scoped, shared via .mcp.json)
claude mcp add \
--scope project \
--env LANGFUSE_PUBLIC_KEY=pk-... \
--env LANGFUSE_SECRET_KEY=sk-... \
--env LANGFUSE_HOST=https://cloud.langfuse.com \
langfuse -- uvx --python 3.11 langfuse-mcp
# Codex CLI (user-scoped, stored in ~/.codex/config.toml)
codex mcp add langfuse \
--env LANGFUSE_PUBLIC_KEY=pk-... \
--env LANGFUSE_SECRET_KEY=sk-... \
--env LANGFUSE_HOST=https://cloud.langfuse.com \
-- uvx --python 3.11 langfuse-mcp
Step 3: Restart CLI, verify with /mcp (Claude) or codex mcp list (Codex)
Step 4: Test: fetch_traces(age=60)
Read-Only Mode
For safer observability without risk of modifying prompts or datasets, enable read-only mode:
# CLI flag
langfuse-mcp --read-only
# Or environment variable
LANGFUSE_MCP_READ_ONLY=true
This disables write tools: create_text_prompt, create_chat_prompt, update_prompt_labels, create_dataset, create_dataset_item, delete_dataset_item.
For manual .mcp.json setup or troubleshooting, see references/setup.md.
Playbooks
"Where are the errors?"
find_exceptions(age=1440, group_by="file")
→ Shows error counts by file. Pick the worst offender.
find_exceptions_in_file(filepath="src/ai/chat.py", age=1440)
→ Lists specific exceptions. Grab a trace_id.
get_exception_details(trace_id="...")
→ Full stacktrace and context.
"What happened in this interaction?"
fetch_traces(age=60, user_id="...")
→ Find the trace. Note the trace_id.
If you don't know the user_id, start with:
fetch_traces(age=60)
fetch_trace(trace_id="...", include_observations=true)
→ See all LLM calls in the trace.
fetch_observation(observation_id="...")
→ Inspect a specific generation's input/output.
"Why is it slow?"
fetch_observations(age=60, type="GENERATION")
→ Find recent LLM calls. Look for high latency.
fetch_observation(observation_id="...")
→ Check token counts, model, timing.
"What's this user experiencing?"
get_user_sessions(user_id="...", age=1440)
→ List their sessions.
get_session_details(session_id="...")
→ See all traces in the session.
"Manage datasets"
list_datasets()
→ See all datasets.
get_dataset(name="evaluation-set-v1")
→ Get dataset details.
list_dataset_items(dataset_name="evaluation-set-v1", page=1, limit=10)
→ Browse items in the dataset.
create_dataset(name="qa-test-cases", description="QA evaluation set")
→ Create a new dataset.
create_dataset_item(
dataset_name="qa-test-cases",
input={"question": "What is 2+2?"},
expected_output={"answer": "4"}
)
→ Add test cases.
"Manage prompts"
list_prompts()
→ See all prompts with labels.
get_prompt(name="...", label="production")
→ Fetch current production version.
create_text_prompt(name="...", prompt="...", labels=["staging"])
→ Create new version in staging.
update_prompt_labels(name="...", version=N, labels=["production"])
→ Promote to production. (Rollback = re-apply label to older version)
Quick Reference
| Task | Tool |
|---|---|
| List traces | fetch_traces(age=N) |
| Get trace details | fetch_trace(trace_id="...", include_observations=true) |
| List LLM calls | fetch_observations(age=N, type="GENERATION") |
| Get observation | fetch_observation(observation_id="...") |
| Error count | get_error_count(age=N) |
| Find exceptions | find_exceptions(age=N, group_by="file") |
| List sessions | fetch_sessions(age=N) |
| User sessions | get_user_sessions(user_id="...", age=N) |
| List prompts | list_prompts() |
| Get prompt | get_prompt(name="...", label="production") |
| List datasets | list_datasets() |
| Get dataset | get_dataset(name="...") |
| List dataset items | list_dataset_items(dataset_name="...", limit=N) |
age = minutes to look back (max 10080 = 7 days)
References
references/tool-reference.md— Full parameter docs, filter semantics, response schemasreferences/setup.md— Manual setup, troubleshooting, advanced configuration
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
1ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon

