semgrep

by trailofbits

Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows

Stars: 1,725 · Forks: 140 · Updated: Jan 23, 2026

SKILL.md


name: semgrep
type: tool
description: >
  Semgrep is a fast static analysis tool for finding bugs and enforcing code standards.
  Use when scanning code for security issues or integrating into CI/CD pipelines.

Semgrep

Semgrep is a fast static analysis tool for finding low-complexity bugs and locating specific code patterns. It is easy to use, does not require building the code, ships with many built-in rules, and makes custom rules simple to write, so it is usually the first tool to run on an audited codebase. It also integrates into CI/CD pipelines, which makes it a good choice for enforcing code quality over time.

Key benefits:

  • Prevents reintroduction of known bugs and security vulnerabilities
  • Enables large-scale code refactoring, such as upgrading deprecated APIs
  • Easily added to CI/CD pipelines
  • Custom Semgrep rules mimic the semantics of actual code
  • Allows for secure scanning without sharing code with third parties
  • Scanning usually takes minutes (not hours/days)
  • Easy to use and accessible for both developers and security professionals

When to Use

Use Semgrep when:

  • Looking for bugs with easy-to-identify patterns
  • Analyzing single files (intraprocedural analysis)
  • Detecting systemic bugs (multiple instances across codebase)
  • Enforcing secure defaults and code standards
  • Performing rapid initial security assessment
  • Scanning code without building it first

Consider alternatives when:

  • Multiple files are required for analysis → Consider Semgrep Pro Engine or CodeQL
  • Complex flow analysis is needed → Consider CodeQL
  • Advanced taint tracking across files → Consider CodeQL or Semgrep Pro
  • Custom in-house framework analysis → May need specialized tooling

Quick Reference

| Task | Command |
|------|---------|
| Scan with auto-detection | semgrep --config auto |
| Scan with specific ruleset | semgrep --config="p/trailofbits" |
| Scan with custom rules | semgrep -f /path/to/rules |
| Output to SARIF format | semgrep -c p/default --sarif --output scan.sarif |
| Test custom rules | semgrep --test |
| Disable metrics | semgrep --metrics=off --config=auto |
| Filter by severity | semgrep --config=auto --severity ERROR |
| Show dataflow traces | semgrep --dataflow-traces -f rule.yml |

Installation

Prerequisites

  • Python 3.7 or later (for pip installation)
  • macOS, Linux, or Windows
  • Homebrew (optional, for macOS/Linux)

Install Steps

Via Python Package Installer:

python3 -m pip install semgrep

Via Homebrew (macOS/Linux):

brew install semgrep

Via Docker:

docker pull returntocorp/semgrep

Keeping Semgrep Updated

# Check current version
semgrep --version

# Update via pip
python3 -m pip install --upgrade semgrep

# Update via Homebrew
brew upgrade semgrep

Verification

semgrep --version

Core Workflow

Step 1: Initial Scan

Start with an auto-configuration scan to evaluate Semgrep's effectiveness:

semgrep --config auto

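Important: Auto mode submits metrics online. To disable metrics (some Semgrep versions reject --config auto when metrics are off; in that case, use a specific ruleset instead):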

export SEMGREP_SEND_METRICS=off
# OR
semgrep --metrics=off --config auto

Step 2: Select Targeted Rulesets

Use the Semgrep Registry to select rulesets:

# Security-focused rulesets
semgrep --config="p/trailofbits"
semgrep --config="p/cwe-top-25"
semgrep --config="p/owasp-top-ten"

# Language-specific
semgrep --config="p/javascript"

# Multiple rulesets
semgrep --config="p/trailofbits" --config="p/r2c-security-audit"
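Because each ruleset is just another --config flag, a scan command can be assembled programmatically when the same rulesets are reused across projects. The sketch below is a minimal Python helper, not part of Semgrep itself: build_scan_command and run_scan are hypothetical names, and the only flags used are ones already shown in this document (--config, --metrics=off, --severity).

```python
import shlex
import shutil
import subprocess


def build_scan_command(rulesets, target=".", metrics_off=True, severity=None):
    """Assemble a semgrep argv list from registry rulesets or rule paths."""
    cmd = ["semgrep"]
    for ruleset in rulesets:
        # One --config flag per ruleset, as in the multi-ruleset example
        cmd += ["--config", ruleset]
    if metrics_off:
        cmd.append("--metrics=off")
    if severity:
        cmd += ["--severity", severity]
    cmd.append(target)
    return cmd


def run_scan(cmd):
    """Run the command, but only if the semgrep binary is on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("semgrep is not installed or not on PATH")
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


cmd = build_scan_command(["p/trailofbits", "p/r2c-security-audit"], severity="ERROR")
print(shlex.join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems when ruleset names or target paths contain spaces.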

Step 3: Review and Triage Results

Filter results by severity:

semgrep --config=auto --severity ERROR

Use output formats for easier analysis:

# SARIF for VS Code SARIF Explorer
semgrep -c p/default --sarif --output scan.sarif

# JSON for automation
semgrep -c p/default --json --output scan.json
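As a rough illustration of what "JSON for automation" can look like, the sketch below summarizes findings by severity and location. The embedded sample is a hypothetical, heavily trimmed report; it assumes the commonly seen layout of Semgrep's JSON output (a top-level results list whose entries carry check_id, path, start, and extra.severity), so check the field names against the output of your Semgrep version.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down sample of a scan.json report
sample_output = """
{
  "results": [
    {
      "check_id": "requests-verify-false",
      "path": "app/client.py",
      "start": {"line": 12, "col": 5},
      "end": {"line": 12, "col": 40},
      "extra": {
        "severity": "WARNING",
        "message": "requests.get with verify=False disables SSL verification"
      }
    },
    {
      "check_id": "sql-injection",
      "path": "app/db.py",
      "start": {"line": 30, "col": 9},
      "end": {"line": 30, "col": 30},
      "extra": {
        "severity": "ERROR",
        "message": "Potential SQL injection with unsanitized user input"
      }
    }
  ],
  "errors": []
}
"""


def summarize(report):
    """Return a severity histogram and a list of 'path:line rule-id' strings."""
    results = report.get("results", [])
    by_severity = Counter(r["extra"]["severity"] for r in results)
    locations = [f'{r["path"]}:{r["start"]["line"]} {r["check_id"]}' for r in results]
    return by_severity, locations


report = json.loads(sample_output)
by_severity, locations = summarize(report)
for location in locations:
    print(location)
print(dict(by_severity))
```

For a real scan, replace json.loads(sample_output) with json.load on the opened scan.json file.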

Step 4: Configure Ignored Files

Create a .semgrepignore file to exclude paths:

# Ignore specific files/directories
path/to/ignore/file.ext
path_to_ignore/

# Ignore by extension
*.ext

# Include .gitignore patterns
:include .gitignore

Note: By default, Semgrep skips /tests, /test, and /vendor folders.

How to Customize

Writing Custom Rules

Semgrep rules are YAML files with pattern-matching syntax. Basic structure:

rules:
  - id: rule-id
    languages: [go]
    message: Some message
    severity: ERROR # INFO / WARNING / ERROR
    pattern: test(...)

Running Custom Rules

# Single file
semgrep --config custom_rule.yaml

# Directory of rules
semgrep --config path/to/rules/

Key Syntax Reference

| Syntax/Operator | Description | Example |
|-----------------|-------------|---------|
| ... | Match zero or more arguments/statements | func(..., arg=value, ...) |
| $X, $VAR | Metavariable (captures and tracks values) | $FUNC($INPUT) |
| <... ...> | Deep expression operator (nested matching) | if <... user.is_admin() ...>: |
| pattern-inside | Match only within context | Pattern inside a loop |
| pattern-not | Exclude specific patterns | Negative matching |
| pattern-either | Logical OR (any pattern matches) | Multiple alternatives |
| patterns | Logical AND (all patterns match) | Combined conditions |
| metavariable-pattern | Nested metavariable constraints | Constrain captured values |
| metavariable-comparison | Compare metavariable values | $X > 1337 |

Example: Detecting Insecure Request Verification

rules:
  - id: requests-verify-false
    languages: [python]
    message: requests.get with verify=False disables SSL verification
    severity: WARNING
    pattern: requests.get(..., verify=False, ...)

Example: Taint Mode for SQL Injection

rules:
  - id: sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    message: Potential SQL injection with unsanitized user input
    languages: [python]
    severity: ERROR
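To make the source, sink, and sanitizer roles concrete, here is a small Python target file that the taint rule above would be expected to exercise. It is a sketch under assumptions: the request object is a hypothetical stand-in for a web framework's request, the in-memory SQLite table exists only for the demo, and the ruleid/ok comments mark where a finding is and is not expected rather than showing recorded Semgrep output.

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical stand-in for a web framework's request object
request = SimpleNamespace(args={"id": "1"})

# Throwaway in-memory database so the example runs on its own
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'alice')")


def lookup_unsafe():
    user_id = request.args.get("id")  # taint source
    query = "SELECT name FROM users WHERE id = " + user_id
    # ruleid: sql-injection
    cursor.execute(query)  # taint sink receives tainted data
    return cursor.fetchall()


def lookup_sanitized():
    user_id = int(request.args.get("id"))  # int(...) is listed as a sanitizer
    query = "SELECT name FROM users WHERE id = " + str(user_id)
    # ok: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


print(lookup_unsafe())
print(lookup_sanitized())
```

Both functions return the same rows for benign input. The difference the rule is meant to surface is that only the first one lets attacker-controlled text reach cursor.execute unchanged.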

Testing Custom Rules

Create test files with annotations:

# ruleid: requests-verify-false
requests.get(url, verify=False)

# ok: requests-verify-false
requests.get(url, verify=True)

Run tests:

semgrep --test ./path/to/rules/
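The annotation convention can be modeled in a few lines: each ruleid or ok comment states an expectation about the code line that follows it. The sketch below is a simplified illustration of that idea, not Semgrep's actual test runner; expected_results is a hypothetical helper and only handles the two annotation kinds shown above.

```python
import re

# Matches "# ruleid: <rule-id>" and "# ok: <rule-id>" comments
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")

test_source = """\
# ruleid: requests-verify-false
requests.get(url, verify=False)

# ok: requests-verify-false
requests.get(url, verify=True)
"""


def expected_results(source):
    """Map each annotation to the 1-based line number of the code line after it."""
    expectations = []
    for index, line in enumerate(source.splitlines()):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            # index is 0-based, so the following line is index + 2 in 1-based terms
            expectations.append((kind, rule_id, index + 2))
    return expectations


for kind, rule_id, lineno in expected_results(test_source):
    verdict = "must match" if kind == "ruleid" else "must not match"
    print(f"line {lineno}: {rule_id} {verdict}")
```

A test run passes when the rule reports a finding on every ruleid line and on none of the ok lines.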

For autofix testing, create .fixed files (e.g., test.py → test.fixed.py):

semgrep --test
# Output: 1/1: ✓ All tests passed
#         1/1: ✓ All fix tests passed

Configuration

Configuration File

Semgrep doesn't require a central config file. Configuration is done via:

  • Command-line flags
  • Environment variables
  • .semgrepignore for path exclusions

Ignore Patterns

Create .semgrepignore in the repository root:

# Ignore directories
tests/
vendor/
node_modules/

# Ignore file types
*.min.js
*.generated.go

# Include .gitignore patterns
:include .gitignore

Suppressing False Positives

Add inline comments to suppress specific findings:

# nosemgrep: rule-id
risky_function()

Best practices:

  • Specify the exact rule ID (not generic # nosemgrep)
  • Explain why the rule is disabled
  • Report false positives to improve rules
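The first best practice is easy to check mechanically. The sketch below flags suppression comments that omit a rule ID; generic_suppressions is a hypothetical helper, and the regular expression is a deliberately simple approximation of the comment format shown above.

```python
import re

# Matches "nosemgrep", optionally followed by ": <rule-id>"
NOSEMGREP = re.compile(r"nosemgrep(?::\s*(?P<rule>[\w.-]+))?")

source = """\
risky_function()  # nosemgrep
# nosemgrep: requests-verify-false
requests.get(url, verify=False)
"""


def generic_suppressions(text):
    """Return 1-based line numbers where nosemgrep appears without a rule ID."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match and match.group("rule") is None:
            hits.append(lineno)
    return hits


print(generic_suppressions(source))
```

A check like this could run in code review or CI to keep blanket suppressions from accumulating.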

Metadata in Custom Rules

Include metadata for better context:

rules:
  - id: example-rule
    metadata:
      cwe: "CWE-89"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln
    # ... rest of rule
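A metadata policy is easier to keep when it is checked automatically. The sketch below reports rules that lack any of the fields from the example above. It is an illustration under assumptions: REQUIRED_METADATA is a hypothetical team policy rather than a Semgrep requirement, and the rules are written as a Python dict so the block has no third-party dependencies (in practice they would be parsed from the YAML rule file).

```python
# Hypothetical policy: the metadata fields used in the example rule above
REQUIRED_METADATA = ("cwe", "confidence", "likelihood", "impact", "subcategory")

# Stand-in for a parsed rule file; "bare-rule" is a made-up incomplete rule
rules_doc = {
    "rules": [
        {
            "id": "example-rule",
            "metadata": {
                "cwe": "CWE-89",
                "confidence": "HIGH",
                "likelihood": "MEDIUM",
                "impact": "HIGH",
                "subcategory": "vuln",
            },
        },
        {"id": "bare-rule", "metadata": {"cwe": "CWE-79"}},
    ]
}


def missing_metadata(doc, required=REQUIRED_METADATA):
    """Return {rule_id: [missing keys]} for rules with incomplete metadata."""
    report = {}
    for rule in doc.get("rules", []):
        metadata = rule.get("metadata") or {}
        missing = [key for key in required if key not in metadata]
        if missing:
            report[rule["id"]] = missing
    return report


print(missing_metadata(rules_doc))
```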

Advanced Usage

Tips and Tricks

| Tip | Why It Helps |
|-----|--------------|
| Use --time flag | Identifies slow rules and files for optimization |
| Limit ellipsis usage | Reduces false positives and improves performance |
| Use pattern-inside for context | Creates clearer, more focused findings |
| Enable autocomplete | Speeds up command-line workflow |
| Use focus-metavariable | Highlights specific code locations in output |

Scanning Non-Standard Extensions

Force language interpretation for unusual file extensions:

semgrep --config=/path/to/config --lang python --scan-unknown-extensions /path/to/file.xyz

Dataflow Tracing

Use --dataflow-traces to understand how values flow to findings:

semgrep --dataflow-traces -f taint_rule.yml test.py

Example output:

Taint comes from:
  test.py
    2┆ data = get_user_input()

This is how taint reaches the sink:
  test.py
    3┆ return output(data)

Polyglot File Scanning

Scan embedded languages (e.g., JavaScript in HTML):

rules:
  - id: eval-in-html
    languages: [html]
    message: eval in JavaScript
    patterns:
      - pattern: <script ...>$Y</script>
      - metavariable-pattern:
          metavariable: $Y
          language: javascript
          patterns:
            - pattern: eval(...)
    severity: WARNING

Constant Propagation

Match instances where metavariables hold specific values:

rules:
  - id: high-value-check
    languages: [python]
    message: $X is higher than 1337
    patterns:
      - pattern: function($X)
      - metavariable-comparison:
          metavariable: $X
          comparison: $X > 1337
    severity: WARNING

Autofix Feature

Add automatic fixes to rules:

rules:
  - id: ioutil-readdir-deprecated
    languages: [golang]
    message: ioutil.ReadDir is deprecated. Use os.ReadDir instead.
    severity: WARNING
    pattern: ioutil.ReadDir($X)
    fix: os.ReadDir($X)

Preview fixes without applying:

semgrep -f rule.yaml --dryrun --autofix

Apply fixes:

semgrep -f rule.yaml --autofix

Performance Optimization

Analyze performance:

semgrep --config=auto --time

Optimize rules:

  1. Use paths to narrow file scope
  2. Minimize ellipsis usage
  3. Use pattern-inside to establish context first
  4. Remove unnecessary metavariables

Managing Third-Party Rules

Use semgrep-rules-manager to collect third-party rules:

pip install semgrep-rules-manager
mkdir -p $HOME/custom-semgrep-rules
semgrep-rules-manager --dir $HOME/custom-semgrep-rules download
semgrep -f $HOME/custom-semgrep-rules

CI/CD Integration

GitHub Actions

Recommended Approach

  1. Full scan on main branch with broad rulesets (scheduled)
  2. Diff-aware scanning for pull requests with focused rules
  3. Block PRs with unresolved findings (once mature)

Example Workflow

name: Semgrep
on:
  pull_request: {}
  push:
    branches: ["master", "main"]
  schedule:
    - cron: '0 0 1 * *' # Monthly

jobs:
  semgrep-schedule:
    if: ((github.event_name == 'schedule' || github.event_name == 'push' || github.event.pull_request.merged == true)
        && github.actor != 'dependabot[bot]')
    name: Semgrep default scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - name: Checkout main repository
        uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: p/default

  semgrep-pr:
    if: (github.event_name == 'pull_request' && github.actor != 'dependabot[bot]')
    name: Semgrep PR scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: >
            p/cwe-top-25
            p/owasp-top-ten
            p/r2c-security-audit
            p/trailofbits

Adding Custom Rules in CI

Rules in the same repository:

env:
  SEMGREP_RULES: p/default custom-semgrep-rules-dir/

Rules in a private repository:

env:
  SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
steps:
  - name: Checkout main repository
    uses: actions/checkout@v4
  - name: Checkout private custom Semgrep rules
    uses: actions/checkout@v4
    with:
      repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      token: ${{ secrets.SEMGREP_RULES_TOKEN }}
      path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
  - run: semgrep ci
    env:
      SEMGREP_RULES: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}

Testing Rules in CI

name: Test Semgrep rules

on: [push, pull_request]

jobs:
  semgrep-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
          cache: "pip"
      - run: python -m pip install -r requirements.txt
      - run: semgrep --test --test-ignore-todo ./path/to/rules/

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---------|----------------|------------------|
| Using --config auto on private code | Sends metadata to Semgrep servers | Use --metrics=off or specific rulesets |
| Forgetting .semgrepignore | Scans directories that should be excluded, such as /vendor | Create .semgrepignore file |
| Not testing rules with false positives | Rules generate noise | Add # ok: test cases |
| Using generic # nosemgrep | Makes code review harder | Use # nosemgrep: rule-id with explanation |
| Overusing ellipsis ... | Degrades performance and accuracy | Use specific patterns when possible |
| Not including metadata in rules | Makes triage difficult | Add CWE, confidence, impact fields |

Limitations

  • Single-file analysis: Cannot track data flow across files without Semgrep Pro Engine
  • No build required: Cannot analyze compiled code or resolve dynamic dependencies
  • Pattern-based: May miss vulnerabilities requiring deep semantic understanding
  • Limited taint tracking: Complex taint analysis is still evolving
  • Custom frameworks: In-house proprietary frameworks may not be well-supported

Related Skills

| Skill | When to Use Together |
|-------|----------------------|
| codeql | For cross-file taint tracking and complex data flow analysis |
| sarif-parsing | For processing Semgrep SARIF output in pipelines |
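As a small illustration of processing SARIF in a pipeline, the sketch below extracts rule ID, file, line, and message from a SARIF 2.1.0 document such as the scan.sarif file produced earlier. The embedded sample is hypothetical and trimmed to the fields used; sarif_findings is an illustrative helper, not part of any existing skill or library.

```python
import json

# Hypothetical, minimal SARIF 2.1.0 document with a single result
sample_sarif = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "Semgrep"}},
      "results": [
        {
          "ruleId": "requests-verify-false",
          "level": "warning",
          "message": {"text": "requests.get with verify=False disables SSL verification"},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "app/client.py"},
                "region": {"startLine": 12}
              }
            }
          ]
        }
      ]
    }
  ]
}
""")


def sarif_findings(sarif):
    """Yield (rule_id, file, line, message) for every result in every run."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            physical = result["locations"][0]["physicalLocation"]
            yield (
                result["ruleId"],
                physical["artifactLocation"]["uri"],
                physical["region"]["startLine"],
                result["message"]["text"],
            )


findings = list(sarif_findings(sample_sarif))
for rule_id, path, line, message in findings:
    print(f"{path}:{line} [{rule_id}] {message}")
```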

Resources

Key External Resources

Trail of Bits public Semgrep rules: Community-contributed Semgrep rules for security audits, with contribution guidelines and quality standards.

Semgrep Registry: Official registry of Semgrep rules, searchable by language, framework, and security category.

Semgrep Playground: Interactive online tool for writing and testing Semgrep rules. Use "simple mode" for easy pattern combination.

Learn Semgrep Syntax: Comprehensive guide on Semgrep rule-writing fundamentals.

Trail of Bits Blog, "How to introduce Semgrep to your organization": Seven-step plan for organizational adoption of Semgrep, including pilot testing, evangelization, and CI/CD integration.

Trail of Bits Blog, "Discovering goroutine leaks with Semgrep": Real-world example of writing custom rules to detect Go-specific issues.
