llm-safety-patterns

by yonatangross

The Complete AI Development Toolkit for Claude Code — 159 skills, 34 agents, 20 commands, 144 hooks. Production-ready patterns for FastAPI, React 19, LangGraph, security, and testing.


SKILL.md


name: llm-safety-patterns
description: Security patterns for LLM integrations including prompt injection defense and hallucination prevention. Use when implementing context separation, validating LLM outputs, or protecting against prompt injection attacks.
tags: [ai, safety, guardrails, security, llm]
context: fork
agent: security-auditor
version: 1.0.0
author: OrchestKit
user-invocable: false

LLM Safety Patterns

The Core Principle

Identifiers flow AROUND the LLM, not THROUGH it. The LLM sees only content. Attribution happens deterministically.

Why This Matters

When identifiers appear in prompts, bad things happen:

  1. Hallucination: LLM invents IDs that don't exist
  2. Confusion: LLM mixes up which ID belongs where
  3. Injection: Attacker manipulates IDs via prompt injection
  4. Leakage: IDs appear in logs, caches, traces
  5. Cross-tenant: LLM could reference other users' data

The Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   SYSTEM CONTEXT (flows around LLM)                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │ user_id │ tenant_id │ analysis_id │ trace_id │ permissions     │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│        │                                                       │        │
│        │                                                       │        │
│        ▼                                                       ▼        │
│   ┌─────────┐                                           ┌─────────┐    │
│   │ PRE-LLM │       ┌─────────────────────┐            │POST-LLM │    │
│   │ FILTER  │──────▶│        LLM          │───────────▶│ATTRIBUTE│    │
│   │         │       │                     │            │         │    │
│   │ Returns │       │ Sees ONLY:          │            │ Adds:   │    │
│   │ CONTENT │       │ - content text      │            │ - IDs   │    │
│   │ (no IDs)│       │ - context text      │            │ - refs  │    │
│   └─────────┘       │ (NO IDs!)           │            └─────────┘    │
│                     └─────────────────────┘                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
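The examples below pass a request-scoped context object around the LLM and capture retrieval references before the call. A minimal sketch of the shapes those examples assume (field names follow the diagram and code in this skill; the real OrchestKit types may differ):

from dataclasses import dataclass, field
from uuid import UUID

@dataclass(frozen=True)
class RequestContext:
    """System context that flows around the LLM, never through it."""
    user_id: UUID
    tenant_id: UUID
    resource_id: UUID      # e.g. the analysis being produced
    trace_id: str
    permissions: frozenset[str] = frozenset()

@dataclass
class SourceRefs:
    """Deterministic source references captured before the LLM call."""
    document_ids: list[UUID] = field(default_factory=list)
    chunk_ids: list[UUID] = field(default_factory=list)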

What NEVER Goes in Prompts

OrchestKit Forbidden Parameters

| Parameter | Type | Why Forbidden |
|---|---|---|
| user_id | UUID | Can be hallucinated, enables cross-user access |
| tenant_id | UUID | Critical for multi-tenant isolation |
| analysis_id | UUID | Job tracking, not for LLM |
| document_id | UUID | Source tracking, not for LLM |
| artifact_id | UUID | Output tracking, not for LLM |
| chunk_id | UUID | RAG reference, not for LLM |
| session_id | str | Auth context, not for LLM |
| trace_id | str | Observability, not for LLM |
| Any UUID | UUID | Pattern: [0-9a-f]{8}-... |

Detection Pattern

import re

FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'tenant[_-]?id',
    r'analysis[_-]?id',
    r'document[_-]?id',
    r'artifact[_-]?id',
    r'chunk[_-]?id',
    r'session[_-]?id',
    r'trace[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    """Check for forbidden patterns in prompt"""
    violations = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            violations.append(pattern)
    return violations
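
Example usage of the audit (illustrative strings only; the UUID and user reference are made up):

violations = audit_prompt(
    "Summarize the notes for user_id 42 "
    "(document 3f2a8c1e-9b4d-4e6a-8c7f-1a2b3c4d5e6f)"
)
# -> ['user[_-]?id', '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}']

assert audit_prompt("Summarize the attached meeting notes.") == []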

The Three-Phase Pattern

Phase 1: Pre-LLM (Filter & Extract)

async def prepare_for_llm(
    query: str,
    ctx: RequestContext,
) -> tuple[str, list[str], SourceRefs]:
    """
    Filter data and extract content for LLM.
    Returns: (content, context_texts, source_references)
    """
    # 1. Retrieve with tenant filter
    documents = await semantic_search(
        query_embedding=embed(query),
        ctx=ctx,  # Filters by tenant_id, user_id
    )

    # 2. Save references for attribution
    source_refs = SourceRefs(
        document_ids=[d.id for d in documents],
        chunk_ids=[c.id for d in documents for c in d.chunks],  # assumes each document exposes its matched chunks
    )

    # 3. Extract content only (no IDs)
    content_texts = [d.content for d in documents]

    return query, content_texts, source_refs

Phase 2: LLM Call (Content Only)

def build_prompt(content: str, context_texts: list[str]) -> str:
    """
    Build prompt with ONLY content, no identifiers.
    """
    prompt = f"""
    Analyze the following content and provide insights.

    CONTENT:
    {content}

    RELEVANT CONTEXT:
    {chr(10).join(f"- {text}" for text in context_texts)}

    Provide analysis covering:
    1. Key concepts
    2. Prerequisites
    3. Learning objectives
    """

    # AUDIT: Verify no IDs leaked
    violations = audit_prompt(prompt)
    if violations:
        raise SecurityError(f"IDs leaked to prompt: {violations}")

    return prompt

async def call_llm(prompt: str) -> dict:
    """LLM only sees content, never IDs"""
    response = await llm.generate(prompt)
    return parse_response(response)

Phase 3: Post-LLM (Attribute)

from datetime import datetime, timezone
from uuid import uuid4

async def save_with_attribution(
    llm_output: dict,
    ctx: RequestContext,
    source_refs: SourceRefs,
) -> Analysis:
    """
    Attach context and references to LLM output.
    Attribution is deterministic, not LLM-generated.
    """
    return await Analysis.create(
        # Generated
        id=uuid4(),

        # From RequestContext (system-provided)
        user_id=ctx.user_id,
        tenant_id=ctx.tenant_id,
        analysis_id=ctx.resource_id,
        trace_id=ctx.trace_id,

        # From Pre-LLM refs (deterministic)
        source_document_ids=source_refs.document_ids,
        source_chunk_ids=source_refs.chunk_ids,

        # From LLM (content only)
        content=llm_output["analysis"],
        key_concepts=llm_output["key_concepts"],
        difficulty=llm_output["difficulty"],

        # Metadata
        created_at=datetime.now(timezone.utc),
        model_used=MODEL_NAME,
    )

Output Validation

After LLM returns, validate:

  1. Schema: Response matches expected structure
  2. Guardrails: No toxic/harmful content
  3. Grounding: Claims are supported by provided context
  4. No IDs: LLM didn't hallucinate any IDs

from pydantic import ValidationError  # AnalysisOutput is a project-defined Pydantic model

async def validate_output(
    llm_output: dict,
    context_texts: list[str],
) -> ValidationResult:
    """Validate LLM output before use"""

    # 1. Schema validation
    try:
        parsed = AnalysisOutput.model_validate(llm_output)
    except ValidationError as e:
        return ValidationResult(valid=False, reason=f"Schema error: {e}")

    # 2. Guardrails
    if await contains_toxic_content(parsed.content):
        return ValidationResult(valid=False, reason="Toxic content detected")

    # 3. Grounding check
    if not is_grounded(parsed.content, context_texts):
        return ValidationResult(valid=False, reason="Ungrounded claims")

    # 4. No hallucinated IDs
    if contains_uuid_pattern(parsed.content):
        return ValidationResult(valid=False, reason="Hallucinated IDs")

    return ValidationResult(valid=True)
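
The guardrail helpers are left abstract above. A minimal sketch of the two deterministic checks, with a naive token-overlap heuristic standing in for a real grounding/entailment scorer (`contains_toxic_content` would typically call a moderation model and is not shown):

import re

UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
    re.IGNORECASE,
)

def contains_uuid_pattern(text: str) -> bool:
    """True if the output contains anything UUID-shaped."""
    return bool(UUID_RE.search(text))

def is_grounded(output_text: str, context_texts: list[str], threshold: float = 0.3) -> bool:
    """Naive lexical grounding check: enough of the output's vocabulary
    must appear in the supplied context. Placeholder for a real scorer."""
    output_tokens = set(re.findall(r"[a-z0-9']+", output_text.lower()))
    if not output_tokens:
        return False
    context_tokens = set(re.findall(r"[a-z0-9']+", " ".join(context_texts).lower()))
    overlap = len(output_tokens & context_tokens) / len(output_tokens)
    return overlap >= threshold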

Integration Points in OrchestKit

Content Analysis Workflow

backend/app/workflows/
├── agents/
│   ├── execution.py        # Add context separation
│   └── prompts/            # Audit all prompts
└── tasks/
    └── generate_artifact.py  # Add attribution

Services

backend/app/services/
├── embeddings/            # Pre-LLM filtering
└── analysis/              # Post-LLM attribution

Checklist Before Any LLM Call

  • RequestContext available
  • Data filtered by tenant_id and user_id
  • Content extracted without IDs
  • Source references saved
  • Prompt passes audit (no forbidden patterns)
  • Output validated before use
  • Attribution uses context, not LLM output
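
Wired together, the checklist above roughly corresponds to this orchestration sketch (function names reuse the examples in this skill; error types, retries, and logging are project-specific and omitted):

async def analyze_with_llm(query: str, ctx: RequestContext) -> Analysis:
    """Filter -> prompt with content only -> validate -> attribute deterministically."""
    # Phase 1: tenant-scoped retrieval; IDs captured outside the prompt
    content, context_texts, source_refs = await prepare_for_llm(query, ctx)

    # Phase 2: the prompt is audited for forbidden patterns inside build_prompt
    prompt = build_prompt(content, context_texts)
    llm_output = await call_llm(prompt)

    # Validate before persisting anything
    result = await validate_output(llm_output, context_texts)
    if not result.valid:
        raise ValueError(f"LLM output rejected: {result.reason}")  # substitute the project's error type

    # Phase 3: attribution comes from RequestContext and SourceRefs, never from the LLM
    return await save_with_attribution(llm_output, ctx, source_refs)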

Related Skills

  • input-validation - Input sanitization patterns that complement LLM safety
  • rag-retrieval - RAG pipeline patterns requiring tenant-scoped retrieval
  • llm-evaluation - Output quality assessment including hallucination detection
  • security-scanning - Automated security scanning for LLM integrations

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| ID handling | Flow around LLM, never through | Prevents hallucination, injection, and cross-tenant leakage |
| Output validation | Schema + guardrails + grounding | Defense-in-depth for LLM outputs |
| Attribution approach | Deterministic post-LLM | System context provides IDs, not LLM |
| Prompt auditing | Regex pattern matching | Fast detection of forbidden identifiers |

Version: 1.0.0 (December 2025)

Capability Details

context-separation

Keywords: context separation, prompt context, id in prompt, parameterized
Solves:

  • How do I prevent IDs from leaking into prompts?
  • How do I separate system context from prompt content?
  • What should never appear in LLM prompts?

pre-llm-filtering

Keywords: pre-llm, rag filter, data filter, tenant filter
Solves:

  • How do I filter data before sending to LLM?
  • How do I ensure tenant isolation in RAG?
  • How do I scope retrieval to current user?

post-llm-attribution

Keywords: attribution, source tracking, provenance, citation
Solves:

  • How do I track which sources the LLM used?
  • How do I attribute results correctly?
  • How do I avoid LLM-generated IDs?

output-guardrails

Keywords: guardrail, output validation, hallucination, toxicity
Solves:

  • How do I validate LLM output?
  • How do I detect hallucinations?
  • How do I prevent toxic content generation?

prompt-audit

Keywords: prompt audit, prompt security, prompt injection
Solves:

  • How do I verify no IDs leaked to prompts?
  • How do I audit prompts for security?
  • How do I prevent prompt injection?
