semgrep

by trailofbits

Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows

1,725🍴 140📅 Jan 23, 2026

SKILL.md


name: semgrep
description: Run Semgrep static analysis for fast security scanning and pattern matching. Use when asked to scan code with Semgrep, write custom YAML rules, find vulnerabilities quickly, use taint mode, or set up Semgrep in CI/CD pipelines.
allowed-tools:

  • Bash
  • Read
  • Glob
  • Grep

Semgrep Static Analysis

When to Use Semgrep

Ideal scenarios:

  • Quick security scans (minutes, not hours)
  • Pattern-based bug detection
  • Enforcing coding standards and best practices
  • Finding known vulnerability patterns
  • Single-file analysis without complex data flow
  • First-pass analysis before deeper tools

Consider CodeQL instead when:

  • Need interprocedural taint tracking across files
  • Complex data flow analysis required
  • Analyzing custom proprietary frameworks

When NOT to Use

Do NOT use this skill for:

  • Complex interprocedural data flow analysis (use CodeQL instead)
  • Binary analysis or compiled code without source
  • Custom deep semantic analysis requiring AST/CFG traversal
  • Tracking taint across many function boundaries

Installation

# pip
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config auto /src

# Update
python3 -m pip install --upgrade semgrep

Core Workflow

1. Quick Scan

semgrep --config auto .                    # Auto-detect rules
semgrep --config p/default --metrics=off . # Disable telemetry for proprietary code (--config auto needs metrics enabled)

2. Use Rulesets

semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple

| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/r2c-security-audit` | r2c security audit rules |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |

3. Output Formats

semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF
semgrep --config p/security-audit --json -o results.json .     # JSON
semgrep --config p/security-audit --dataflow-traces .          # Show data flow
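
The JSON report is easy to post-process. The sketch below assumes the layout Semgrep writes with `--json`: a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. The `summarize` helper and the sample data are hypothetical and only illustrate the shape.

```python
import json
from collections import Counter


def summarize(report: dict) -> Counter:
    """Count findings per (severity, rule id) in a Semgrep JSON report."""
    counts = Counter()
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[(severity, finding["check_id"])] += 1
    return counts


# Hypothetical sample; in practice, load results.json with json.load().
sample = json.loads("""
{"results": [
  {"check_id": "rules.sql-injection", "path": "app.py",
   "start": {"line": 12}, "extra": {"severity": "ERROR"}},
  {"check_id": "rules.sql-injection", "path": "db.py",
   "start": {"line": 40}, "extra": {"severity": "ERROR"}},
  {"check_id": "rules.hardcoded-password", "path": "settings.py",
   "start": {"line": 3}, "extra": {"severity": "WARNING"}}
]}
""")

for (severity, rule_id), count in sorted(summarize(sample).items()):
    print(f"{severity:8} {rule_id:30} {count}")
```

Grouping by rule id first makes it easier to spot a single noisy rule before triaging individual findings.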

4. Scan Specific Paths

semgrep --config p/python app.py           # Single file
semgrep --config p/javascript src/         # Directory
semgrep --config auto --include='**/test/**' .  # Restrict the scan to paths matching the glob

Writing Custom Rules

Basic Structure

rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"

Pattern Syntax

| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |

Pattern Operators

| Operator | Description |
|---|---|
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `pattern-regex` | Regex matching |
| `metavariable-regex` | Regex on captured value |
| `metavariable-comparison` | Compare values |
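
One way to reason about how these operators combine is as set operations over matched code ranges. The sketch below is a simplified mental model, not Semgrep's implementation: a match is reduced to a hypothetical `(start_line, end_line)` pair, and columns and metavariable bindings are ignored.

```python
# A match is modeled as (start_line, end_line). Real Semgrep also compares
# columns and metavariable bindings, which this sketch leaves out.
Range = tuple[int, int]


def pattern_either(*groups: set[Range]) -> set[Range]:
    """OR: keep a range if any sub-pattern produced it."""
    return set().union(*groups)


def pattern_not(matches: set[Range], excluded: set[Range]) -> set[Range]:
    """Drop ranges that the negated pattern also matched."""
    return matches - excluded


def pattern_inside(matches: set[Range], contexts: set[Range]) -> set[Range]:
    """Keep ranges fully contained in at least one context range."""
    return {
        m for m in matches
        if any(c[0] <= m[0] and m[1] <= c[1] for c in contexts)
    }


def pattern_not_inside(matches: set[Range], contexts: set[Range]) -> set[Range]:
    """Keep ranges that are not contained in any context range."""
    return matches - pattern_inside(matches, contexts)


# Hypothetical match data for one file.
cursor_calls = {(3, 3), (9, 9)}   # cursor.execute(...) calls
db_calls = {(14, 14)}             # db.execute(...) calls
parameterized = {(9, 9)}          # calls that pass a parameter tuple
test_functions = {(12, 16)}       # bodies of test helpers

# A `patterns` list is an AND: each step narrows the previous result.
candidates = pattern_either(cursor_calls, db_calls)
candidates = pattern_not(candidates, parameterized)
findings = pattern_not_inside(candidates, test_functions)
print(sorted(findings))  # [(3, 3)]
```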

Combining Patterns

rules:
  - id: sql-injection
    languages: [python]
    message: "Potential SQL injection"
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: cursor.execute($QUERY)
          - pattern: db.execute($QUERY)
      - pattern-not: cursor.execute("...", (...))
      - metavariable-regex:
          metavariable: $QUERY
          regex: .*\+.*|.*\.format\(.*|.*%.*
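
The `metavariable-regex` filter in this rule is a text heuristic, so it is worth checking what it accepts and rejects. The sketch below applies the same regex with Python's `re.match`, assuming Semgrep tests the source text bound to `$QUERY` with similar left-anchored semantics. The sample expressions are hypothetical.

```python
import re

# Regex copied from the rule above.
QUERY_REGEX = re.compile(r".*\+.*|.*\.format\(.*|.*%.*")


def flags(query_expr: str) -> bool:
    """Return True if the regex would accept this $QUERY source text."""
    return QUERY_REGEX.match(query_expr) is not None


# Hypothetical $QUERY bindings, written as they would appear in source code.
samples = {
    "concatenation": '"SELECT * FROM users WHERE id = " + user_id',
    "str.format": '"SELECT * FROM users WHERE id = {}".format(user_id)',
    "percent formatting": '"SELECT * FROM users WHERE id = %s" % user_id',
    "constant query": '"SELECT * FROM users"',
    "constant with LIKE wildcard": "\"SELECT * FROM users WHERE name LIKE 'a%'\"",
    "f-string": 'f"SELECT * FROM users WHERE id = {user_id}"',
}

for label, expr in samples.items():
    print(f"{label:28} flagged={flags(expr)}")
```

Under these assumptions the regex accepts a constant query that merely contains a `%` wildcard (a likely false positive) and rejects the f-string (a likely false negative), which is one reason to prefer taint mode for injection rules.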

Taint Mode (Data Flow)

Simple pattern matching finds obvious cases:

# Pattern `os.system($CMD)` catches this:
os.system(user_input)  # Found

But misses indirect flows:

# Same pattern misses this:
cmd = user_input
processed = cmd.strip()
os.system(processed)  # Missed - no direct match

Taint mode tracks data through assignments and transformations:

  • Source: Where untrusted data enters (user_input)
  • Propagators: How it flows (cmd = ..., processed = ...)
  • Sanitizers: What makes it safe (shlex.quote())
  • Sink: Where it becomes dangerous (os.system())

rules:
  - id: command-injection
    languages: [python]
    message: "User input flows to command execution"
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: os.system($SINK)
      - pattern: subprocess.call($SINK, shell=True)
      - pattern: subprocess.run($SINK, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
      - pattern: int(...)
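
The source, propagator, sanitizer, and sink roles can be illustrated with a toy tracker built on Python's `ast` module. This is a teaching sketch, not how Semgrep works internally: it only handles straight-line, top-level statements, and the names `get_user_input`, `quote`, and `system` are assumptions chosen for the example.

```python
import ast

SOURCES = {"get_user_input"}   # hypothetical function returning untrusted data
SANITIZERS = {"quote"}         # e.g. shlex.quote(...)
SINKS = {"system"}             # e.g. os.system(...)


def call_name(call: ast.Call):
    """Return the rightmost name of the callee, e.g. 'system' for os.system()."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def is_tainted(expr: ast.expr, tainted: set[str]) -> bool:
    """Decide whether an expression carries taint given the tainted variables."""
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    if isinstance(expr, ast.Call):
        name = call_name(expr)
        if name in SOURCES:
            return True
        if name in SANITIZERS:
            return False
        # Propagate through positional arguments and the method receiver.
        parts = list(expr.args)
        if isinstance(expr.func, ast.Attribute):
            parts.append(expr.func.value)
        return any(is_tainted(p, tainted) for p in parts)
    # Any other expression is tainted if one of its sub-expressions is.
    return any(
        is_tainted(child, tainted)
        for child in ast.iter_child_nodes(expr)
        if isinstance(child, ast.expr)
    )


def find_tainted_sinks(source_code: str) -> list[int]:
    """Return line numbers where tainted data reaches a sink."""
    tainted: set[str] = set()
    findings = []
    for stmt in ast.parse(source_code).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            target = stmt.targets[0].id
            if is_tainted(stmt.value, tainted):
                tainted.add(target)
            else:
                tainted.discard(target)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if call_name(call) in SINKS and any(
                    is_tainted(arg, tainted) for arg in call.args):
                findings.append(stmt.lineno)
    return findings


EXAMPLE = """\
cmd = get_user_input()
processed = cmd.strip()
os.system(processed)
safe = shlex.quote(cmd)
os.system(safe)
"""

print(find_tainted_sinks(EXAMPLE))  # [3]
```

Line 3 is reported because taint flows from the source through two assignments. Line 5 is not, because the sanitizer call clears the taint before the value reaches the sink.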

Full Rule with Metadata

rules:
  - id: flask-sql-injection
    languages: [python]
    message: "SQL injection: user input flows to query without parameterization"
    severity: ERROR
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
      - pattern: request.json
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
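    # A `fix:` key is deliberately omitted: parameterizing a query means
    # restructuring the SQL string, which a template rewrite cannot do safely.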

Testing Rules

Test File Format

# rules/flask-sql-injection.py (same base name as the rule file, e.g. rules/flask-sql-injection.yaml)
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

semgrep --test rules/
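
Each `ruleid:` or `ok:` comment applies to the line directly below it. The sketch below shows that mapping by extracting the annotations from the test file above. The `expected_results` helper is hypothetical and only mirrors the convention; `semgrep --test` does the real comparison.

```python
import re

# Matches "# ruleid: <rule-id>" and "# ok: <rule-id>" comments.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_results(test_source: str) -> dict[str, list[tuple[int, str]]]:
    """Map each annotation kind to (line of the annotated code, rule id)."""
    expected: dict[str, list[tuple[int, str]]] = {"ruleid": [], "ok": []}
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            # The annotation describes the line that follows it.
            expected[kind].append((lineno + 1, rule_id))
    return expected


TEST_FILE = """\
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: flask-sql-injection
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
"""

print(expected_results(TEST_FILE))
```

A rule passes when it reports every `ruleid` line and none of the `ok` lines, so both kinds of annotation are needed to catch false negatives and false positives.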

CI/CD Integration (GitHub Actions)

name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # Monthly

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Required for diff-aware scanning

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits

Configuration

.semgrepignore

tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/

Suppress False Positives

password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
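
The two forms differ in scope: a bare `nosemgrep` silences every rule on that line, while `nosemgrep: <rule-id>` silences only the named rule. The sketch below is a simplified model of that distinction for same-line comments. The `is_suppressed` helper is hypothetical, and Semgrep's own matching of rule ids may be more nuanced.

```python
import re

# "nosemgrep" optionally followed by ": id" or ": id1, id2".
NOSEMGREP = re.compile(
    r"nosemgrep(?::\s*(?P<ids>[\w.\-]+(?:\s*,\s*[\w.\-]+)*))?"
)


def is_suppressed(line: str, rule_id: str) -> bool:
    """Return True if a same-line nosemgrep comment covers rule_id."""
    match = NOSEMGREP.search(line)
    if match is None:
        return False
    ids = match.group("ids")
    if ids is None:
        return True  # bare nosemgrep: every rule is suppressed
    return rule_id in {item.strip() for item in ids.split(",")}


scoped = 'password = get_from_vault()  # nosemgrep: hardcoded-password'
blanket = 'dangerous_but_safe()  # nosemgrep'

print(is_suppressed(scoped, "hardcoded-password"))  # True
print(is_suppressed(scoped, "sql-injection"))       # False
print(is_suppressed(blanket, "sql-injection"))      # True
```

Preferring the scoped form keeps a suppression from hiding findings that later rules would raise on the same line.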

Performance

semgrep --config rules/ --time .    # Check rule performance
ulimit -n 4096                       # Increase file descriptors for large codebases

Path Filtering in Rules

rules:
  - id: my-rule
    paths:
      include: [src/]
      exclude: [src/generated/]

Third-Party Rules

pip install semgrep-rules-manager
semgrep-rules-manager --dir ~/semgrep-rules download
semgrep -f ~/semgrep-rules .

Rationalizations to Reject

| Shortcut | Why It's Wrong |
|---|---|
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; it can't track complex data flow across functions |
| "I wrote a rule, so we're covered" | Rules need testing with `semgrep --test`; false negatives are silent |
| "Taint mode catches injection" | Only if you defined all sources, sinks, AND sanitizers correctly |
| "Pro rules are comprehensive" | Pro rules are good but not exhaustive; supplement with custom rules for your codebase |
| "Too many findings = noisy tool" | High finding count often means real problems; tune rules, don't disable them |
