
semgrep
by trailofbits
Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows
SKILL.md
name: semgrep
type: tool
description: >
  Semgrep is a fast static analysis tool for finding bugs and enforcing code standards. Use when scanning code for security issues or integrating into CI/CD pipelines.
Semgrep
Semgrep is a highly efficient static analysis tool for finding low-complexity bugs and locating specific code patterns. Because of its ease of use, no need to build the code, multiple built-in rules, and convenient creation of custom rules, it is usually the first tool to run on an audited codebase. Furthermore, Semgrep's integration into the CI/CD pipeline makes it a good choice for ensuring code quality.
Key benefits:
- Prevents re-entry of known bugs and security vulnerabilities
- Enables large-scale code refactoring, such as upgrading deprecated APIs
- Easily added to CI/CD pipelines
- Custom Semgrep rules are written as patterns that resemble the code they match
- Allows for secure scanning without sharing code with third parties
- Scanning usually takes minutes (not hours/days)
- Easy to use and accessible for both developers and security professionals
When to Use
Use Semgrep when:
- Looking for bugs with easy-to-identify patterns
- Analyzing single files (intraprocedural analysis)
- Detecting systemic bugs (multiple instances across codebase)
- Enforcing secure defaults and code standards
- Performing rapid initial security assessment
- Scanning code without building it first
Consider alternatives when:
- Multiple files are required for analysis → Consider Semgrep Pro Engine or CodeQL
- Complex flow analysis is needed → Consider CodeQL
- Advanced taint tracking across files → Consider CodeQL or Semgrep Pro
- Custom in-house framework analysis → May need specialized tooling
Quick Reference
| Task | Command |
|---|---|
| Scan with auto-detection | semgrep --config auto |
| Scan with specific ruleset | semgrep --config="p/trailofbits" |
| Scan with custom rules | semgrep -f /path/to/rules |
| Output to SARIF format | semgrep -c p/default --sarif --output scan.sarif |
| Test custom rules | semgrep --test |
| Disable metrics | semgrep --metrics=off --config=auto |
| Filter by severity | semgrep --config=auto --severity ERROR |
| Show dataflow traces | semgrep --dataflow-traces -f rule.yml |
Installation
Prerequisites
- Python 3.7 or later (for pip installation)
- macOS, Linux, or Windows
- Homebrew (optional, for macOS/Linux)
Install Steps
Via Python Package Installer:
python3 -m pip install semgrep
Via Homebrew (macOS/Linux):
brew install semgrep
Via Docker:
docker pull returntocorp/semgrep
Keeping Semgrep Updated
# Check current version
semgrep --version
# Update via pip
python3 -m pip install --upgrade semgrep
# Update via Homebrew
brew upgrade semgrep
Verification
semgrep --version
Core Workflow
Step 1: Initial Scan
Start with an auto-configuration scan to evaluate Semgrep's effectiveness:
semgrep --config auto
Important: Auto mode submits metrics online. To disable:
export SEMGREP_SEND_METRICS=off
# OR
semgrep --metrics=off --config auto
Step 2: Select Targeted Rulesets
Use the Semgrep Registry to select rulesets:
# Security-focused rulesets
semgrep --config="p/trailofbits"
semgrep --config="p/cwe-top-25"
semgrep --config="p/owasp-top-ten"
# Language-specific
semgrep --config="p/javascript"
# Multiple rulesets
semgrep --config="p/trailofbits" --config="p/r2c-security-audit"
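When scans are driven from a script rather than typed by hand, the ruleset selection above can be assembled programmatically. The following is a minimal sketch, not part of Semgrep itself: `build_scan_command` and `run_scan` are hypothetical helper names, and the only flags used are the ones already shown in this document (`--config`, `--metrics=off`).

```python
import subprocess
from typing import List, Sequence


def build_scan_command(configs: Sequence[str], target: str = ".",
                       send_metrics: bool = False) -> List[str]:
    """Assemble a semgrep invocation with one --config flag per ruleset."""
    cmd = ["semgrep"]
    for config in configs:
        cmd += ["--config", config]
    if not send_metrics:
        # Same flag as in the Quick Reference table.
        cmd.append("--metrics=off")
    cmd.append(target)
    return cmd


def run_scan(configs: Sequence[str], target: str = ".") -> int:
    """Run the scan and return the exit code (assumes semgrep is on PATH)."""
    completed = subprocess.run(build_scan_command(configs, target), check=False)
    return completed.returncode


# Print the command instead of running it, so the sketch works without semgrep.
print(" ".join(build_scan_command(["p/trailofbits", "p/r2c-security-audit"], "src")))
```

Passing the command as a list avoids shell quoting issues with ruleset names and target paths.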
Step 3: Review and Triage Results
Filter results by severity:
semgrep --config=auto --severity ERROR
Use output formats for easier analysis:
# SARIF for VS Code SARIF Explorer
semgrep -c p/default --sarif --output scan.sarif
# JSON for automation
semgrep -c p/default --json --output scan.json
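For triage, the JSON report can be summarized with a few lines of Python. This is a sketch under one assumption: that each entry in the report's `results` list carries `check_id`, `path`, `start.line`, and `extra.severity`, which is the shape Semgrep's JSON output commonly has. The `sample` data below is hand-written for illustration and is not real scan output.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize(report: Dict[str, Any]) -> Counter:
    """Count findings per severity level."""
    return Counter(
        result.get("extra", {}).get("severity", "UNKNOWN")
        for result in report.get("results", [])
    )


def locations(report: Dict[str, Any], severity: str = "ERROR") -> List[Tuple[str, int, str]]:
    """Return (path, line, rule id) for every finding at the given severity."""
    return [
        (result["path"], result["start"]["line"], result["check_id"])
        for result in report.get("results", [])
        if result.get("extra", {}).get("severity") == severity
    ]


# Hand-written sample in the assumed shape. For a real report, use:
#   with open("scan.json", encoding="utf-8") as fh: report = json.load(fh)
sample = json.loads("""
{"results": [
  {"check_id": "sql-injection", "path": "app.py", "start": {"line": 12},
   "extra": {"severity": "ERROR"}},
  {"check_id": "requests-verify-false", "path": "client.py", "start": {"line": 4},
   "extra": {"severity": "WARNING"}}
]}
""")

print(summarize(sample))
print(locations(sample))
```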
Step 4: Configure Ignored Files
Create a .semgrepignore file to exclude paths:
# Ignore specific files/directories
path/to/ignore/file.ext
path_to_ignore/
# Ignore by extension
*.ext
# Include .gitignore patterns
:include .gitignore
Note: By default, Semgrep skips /tests, /test, and /vendor folders.
How to Customize
Writing Custom Rules
Semgrep rules are YAML files with pattern-matching syntax. Basic structure:
rules:
  - id: rule-id
    languages: [go]
    message: Some message
    severity: ERROR # INFO / WARNING / ERROR
    pattern: test(...)
Running Custom Rules
# Single file
semgrep --config custom_rule.yaml
# Directory of rules
semgrep --config path/to/rules/
Key Syntax Reference
| Syntax/Operator | Description | Example |
|---|---|---|
| ... | Match zero or more arguments/statements | func(..., arg=value, ...) |
| $X, $VAR | Metavariable (captures and tracks values) | $FUNC($INPUT) |
| <... ...> | Deep expression operator (nested matching) | if <... user.is_admin() ...>: |
| pattern-inside | Match only within context | Pattern inside a loop |
| pattern-not | Exclude specific patterns | Negative matching |
| pattern-either | Logical OR (any pattern matches) | Multiple alternatives |
| patterns | Logical AND (all patterns match) | Combined conditions |
| metavariable-pattern | Nested metavariable constraints | Constrain captured values |
| metavariable-comparison | Compare metavariable values | $X > 1337 |
Example: Detecting Insecure Request Verification
rules:
  - id: requests-verify-false
    languages: [python]
    message: requests.get with verify=False disables SSL verification
    severity: WARNING
    pattern: requests.get(..., verify=False, ...)
Example: Taint Mode for SQL Injection
rules:
  - id: sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    message: Potential SQL injection with unsanitized user input
    languages: [python]
    severity: ERROR
Testing Custom Rules
Create test files with annotations:
# ruleid: requests-verify-false
requests.get(url, verify=False)
# ok: requests-verify-false
requests.get(url, verify=True)
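The same annotation style applies to taint-mode rules. Below is a sketch of a test target for the `sql-injection` rule shown earlier. The `request` object is a hypothetical stand-in for a Flask-style request so the file runs without a web framework, and the `ruleid`/`ok` annotations state what the rule would be expected to report, not verified Semgrep output.

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical stand-in for a framework request object (assumption for this sketch).
request = SimpleNamespace(args={"name": "x' OR '1'='1", "id": "1"})

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_by_name_unsafe():
    # Source: user input flows into the query string unchanged.
    name = request.args.get("name")
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    # ruleid: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


def find_by_id_sanitized():
    # int(...) is listed under pattern-sanitizers, so taint should stop here.
    user_id = int(request.args.get("id"))
    query = "SELECT name FROM users WHERE id = " + str(user_id)
    # ok: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


# The injected condition makes the unsafe query return every row.
print(find_by_name_unsafe())
print(find_by_id_sanitized())
```

Running the file directly shows why the first function is worth flagging: the crafted `name` value turns the `WHERE` clause into a condition that is always true.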
Run tests:
semgrep --test ./path/to/rules/
For autofix testing, create .fixed files (e.g., test.py → test.fixed.py):
semgrep --test
# Output: 1/1: ✓ All tests passed
# 1/1: ✓ All fix tests passed
Configuration
Configuration File
Semgrep doesn't require a central config file. Configuration is done via:
- Command-line flags
- Environment variables
- .semgrepignore for path exclusions
Ignore Patterns
Create .semgrepignore in the repository root:
# Ignore directories
tests/
vendor/
node_modules/
# Ignore file types
*.min.js
*.generated.go
# Include .gitignore patterns
:include .gitignore
Suppressing False Positives
Add inline comments to suppress specific findings:
# nosemgrep: rule-id
risky_function()
Best practices:
- Specify the exact rule ID (not generic # nosemgrep)
- Explain why the rule is disabled
- Report false positives to improve rules
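The practices above might look like this in a Python file. This is an illustrative sketch: the rule ID `hypothetical-insecure-hash-md5` is made up for the example and should be replaced with the ID of the rule that actually fired.

```python
import hashlib


def cache_key(payload: bytes) -> str:
    """Derive a short, stable key for an in-memory cache."""
    # MD5 is used only as a non-security cache key here; collisions are harmless.
    # nosemgrep: hypothetical-insecure-hash-md5
    return hashlib.md5(payload).hexdigest()


print(cache_key(b"abc"))
```

Placing the justification next to the suppression comment lets a reviewer judge the exception without leaving the diff.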
Metadata in Custom Rules
Include metadata for better context:
rules:
  - id: example-rule
    metadata:
      cwe: "CWE-89"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln
    # ... rest of rule
Advanced Usage
Tips and Tricks
| Tip | Why It Helps |
|---|---|
| Use --time flag | Identifies slow rules and files for optimization |
| Limit ellipsis usage | Reduces false positives and improves performance |
| Use pattern-inside for context | Creates clearer, more focused findings |
| Enable autocomplete | Speeds up command-line workflow |
| Use focus-metavariable | Highlights specific code locations in output |
Scanning Non-Standard Extensions
Force language interpretation for unusual file extensions:
semgrep --config=/path/to/config --lang python --scan-unknown-extensions /path/to/file.xyz
Dataflow Tracing
Use --dataflow-traces to understand how values flow to findings:
semgrep --dataflow-traces -f taint_rule.yml test.py
Example output:
Taint comes from:
test.py
2┆ data = get_user_input()
This is how taint reaches the sink:
test.py
3┆ return output(data)
Polyglot File Scanning
Scan embedded languages (e.g., JavaScript in HTML):
rules:
  - id: eval-in-html
    languages: [html]
    message: eval in JavaScript
    patterns:
      - pattern: <script ...>$Y</script>
      - metavariable-pattern:
          metavariable: $Y
          language: javascript
          patterns:
            - pattern: eval(...)
    severity: WARNING
Constant Propagation
Match instances where metavariables hold specific values:
rules:
  - id: high-value-check
    languages: [python]
    message: $X is higher than 1337
    patterns:
      - pattern: function($X)
      - metavariable-comparison:
          metavariable: $X
          comparison: $X > 1337
    severity: WARNING
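A target file for this rule could look like the sketch below. The comments describe what the rule would be expected to report under the assumption that constant propagation resolves a local variable to its assigned value; they are not verified Semgrep output. The `function` definition is a stand-in so the file runs on its own.

```python
def function(value: int) -> int:
    """Stand-in for the call targeted by the rule's function($X) pattern."""
    return value


def literal_argument() -> int:
    # Expected to match: the literal 2000 satisfies $X > 1337.
    return function(2000)


def propagated_constant() -> int:
    threshold = 2000
    # Expected to match if `threshold` is resolved to the constant 2000.
    return function(threshold)


def small_value() -> int:
    # Not expected to match: 42 fails the comparison.
    return function(42)


print(literal_argument(), propagated_constant(), small_value())
```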
Autofix Feature
Add automatic fixes to rules:
rules:
  - id: ioutil-readdir-deprecated
    languages: [golang]
    message: ioutil.ReadDir is deprecated. Use os.ReadDir instead.
    severity: WARNING
    pattern: ioutil.ReadDir($X)
    fix: os.ReadDir($X)
Preview fixes without applying:
semgrep -f rule.yaml --dryrun --autofix
Apply fixes:
semgrep -f rule.yaml --autofix
Performance Optimization
Analyze performance:
semgrep --config=auto --time
Optimize rules:
- Use paths to narrow file scope
- Minimize ellipsis usage
- Use pattern-inside to establish context first
- Remove unnecessary metavariables
Managing Third-Party Rules
Use semgrep-rules-manager to collect third-party rules:
pip install semgrep-rules-manager
mkdir -p $HOME/custom-semgrep-rules
semgrep-rules-manager --dir $HOME/custom-semgrep-rules download
semgrep -f $HOME/custom-semgrep-rules
CI/CD Integration
GitHub Actions
Recommended Approach
- Full scan on main branch with broad rulesets (scheduled)
- Diff-aware scanning for pull requests with focused rules
- Block PRs with unresolved findings (once mature)
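Where a team wants its own blocking policy on top of a JSON report (for example, failing a job only on `ERROR` findings while the rollout matures), a small gate script is one option. This is a sketch, not a Semgrep feature: `blocking_findings` and `gate` are hypothetical names, and the report shape (`results[].check_id`, `path`, `start.line`, `extra.severity`) is an assumption about Semgrep's JSON output.

```python
import json
from typing import Any, Dict, Iterable, List


def blocking_findings(report: Dict[str, Any],
                      blocking: Iterable[str] = ("ERROR",)) -> List[str]:
    """Return one 'path:line rule-id' string per finding at a blocking severity."""
    levels = set(blocking)
    lines = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        if severity in levels:
            lines.append(
                f'{result["path"]}:{result["start"]["line"]} {result["check_id"]}'
            )
    return lines


def gate(report_path: str, blocking: Iterable[str] = ("ERROR",)) -> int:
    """Print blocking findings and return a process exit code (1 means block)."""
    with open(report_path, encoding="utf-8") as handle:
        report = json.load(handle)
    findings = blocking_findings(report, blocking)
    for line in findings:
        print(line)
    return 1 if findings else 0
```

A CI step could call `sys.exit(gate("scan.json"))` after the scan so that the job fails only when a blocking finding is present.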
Example Workflow
name: Semgrep
on:
  pull_request: {}
  push:
    branches: ["master", "main"]
  schedule:
    - cron: '0 0 1 * *' # Monthly
jobs:
  semgrep-schedule:
    if: ((github.event_name == 'schedule' || github.event_name == 'push' || github.event.pull_request.merged == true)
      && github.actor != 'dependabot[bot]')
    name: Semgrep default scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - name: Checkout main repository
        uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: p/default
  semgrep-pr:
    if: (github.event_name == 'pull_request' && github.actor != 'dependabot[bot]')
    name: Semgrep PR scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: >
            p/cwe-top-25
            p/owasp-top-ten
            p/r2c-security-audit
            p/trailofbits
Adding Custom Rules in CI
Rules in same repository:
env:
  SEMGREP_RULES: p/default custom-semgrep-rules-dir/
Rules in private repository:
env:
  SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
steps:
  - name: Checkout main repository
    uses: actions/checkout@v4
  - name: Checkout private custom Semgrep rules
    uses: actions/checkout@v4
    with:
      repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      token: ${{ secrets.SEMGREP_RULES_TOKEN }}
      path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
  - run: semgrep ci
    env:
      SEMGREP_RULES: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
Testing Rules in CI
name: Test Semgrep rules
on: [push, pull_request]
jobs:
  semgrep-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
          cache: "pip"
      - run: python -m pip install -r requirements.txt
      - run: semgrep --test --test-ignore-todo ./path/to/rules/
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Using --config auto on private code | Sends metadata to Semgrep servers | Use --metrics=off or specific rulesets |
| Forgetting .semgrepignore | Scans excluded directories like /vendor | Create .semgrepignore file |
| Not testing rules with false positives | Rules generate noise | Add # ok: test cases |
| Using generic # nosemgrep | Makes code review harder | Use # nosemgrep: rule-id with explanation |
| Overusing ellipsis ... | Degrades performance and accuracy | Use specific patterns when possible |
| Not including metadata in rules | Makes triage difficult | Add CWE, confidence, impact fields |
Limitations
- Single-file analysis: Cannot track data flow across files without Semgrep Pro Engine
- No build required: Cannot analyze compiled code or resolve dynamic dependencies
- Pattern-based: May miss vulnerabilities requiring deep semantic understanding
- Limited taint tracking: Complex taint analysis is still evolving
- Custom frameworks: In-house proprietary frameworks may not be well-supported
Related Skills
| Skill | When to Use Together |
|---|---|
| codeql | For cross-file taint tracking and complex data flow analysis |
| sarif-parsing | For processing Semgrep SARIF output in pipelines |
Resources
Key External Resources
- Trail of Bits public Semgrep rules: Community-contributed Semgrep rules for security audits, with contribution guidelines and quality standards.
- Semgrep Registry: Official registry of Semgrep rules, searchable by language, framework, and security category.
- Semgrep Playground: Interactive online tool for writing and testing Semgrep rules. Use "simple mode" for easy pattern combination.
- Learn Semgrep Syntax: Comprehensive guide on Semgrep rule-writing fundamentals.
- Trail of Bits Blog, "How to introduce Semgrep to your organization": Seven-step plan for organizational adoption of Semgrep, including pilot testing, evangelization, and CI/CD integration.
- Trail of Bits Blog, "Discovering goroutine leaks with Semgrep": Real-world example of writing custom rules to detect Go-specific issues.
Video Resources
- Introduction to Semgrep - Trail of Bits Webinar
- Detect complex code patterns using semantic grep
- Semgrep part 1 - Embrace Secure Defaults, Block Anti-patterns and more
- Semgrep Weekly Wednesday Office Hours: Modifying Rules to Reduce False Positives
- Raining CVEs On WordPress Plugins With Semgrep | Nullcon Goa 2022
