March 23, 2026, 5:30 p.m.

Semgrep: Write Custom Static Security Scanning Rules

Static security scanning is a cornerstone of modern DevSecOps, but off‑the‑shelf tools often miss the nuances of your own codebase. That’s where Semgrep shines: it lets you write custom, language‑aware rules that surface exactly the patterns you care about, without the noise of generic scanners.

Getting Started with Semgrep

First things first: install Semgrep. The CLI ships as a Python package, so python3 -m pip install semgrep (or brew install semgrep on macOS) gets you up and running. If you prefer containers, use docker pull semgrep/semgrep; the image was previously published as returntocorp/semgrep. Semgrep does not scaffold a project for you, so you create the rule files yourself.

A common layout is a single .semgrep.yml file at the repository root, or a .semgrep/ directory holding one YAML file per rule. Point the CLI at either one with semgrep --config .semgrep.yml (or --config .semgrep/). Keep the rules version‑controlled so the entire team shares the same scanning baseline.

The Anatomy of a Semgrep Rule

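A Semgrep rule is an entry in a YAML list under a top-level rules: key. Each rule needs an id, a message, a severity, a languages list, and at least one pattern operator (pattern, patterns, pattern-either, or pattern-regex). An optional metadata block carries extra information such as category, confidence, or CWE mappings. Patterns are written as ordinary code in the target language, with metavariables standing in for arbitrary sub-expressions, and Semgrep matches them against the parsed abstract syntax tree (AST) rather than against raw text.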

Here’s a minimal example that flags the use of eval in Python:

rules:
  - id: python-eval-detected
    patterns:
      - pattern: eval($EXPR)
    message: "Avoid using eval(); it can lead to code injection."
    severity: ERROR
    languages: [python]
    metadata:
      category: security
      confidence: high

Notice the $EXPR metavariable – it captures any expression passed to eval. Semgrep then reports the exact location, making remediation straightforward.
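To see why matching on the syntax tree beats a plain text search, here is a toy Python analogue of the eval($EXPR) pattern built on the standard ast module. It only illustrates the idea and is not how Semgrep is implemented. It walks the parsed tree, ignores string literals that merely contain the text eval(, and reports each line along with the source text that $EXPR would bind to. The function name find_eval_calls is hypothetical.

```python
import ast
from typing import List, Tuple


def find_eval_calls(source: str) -> List[Tuple[int, str]]:
    """Toy analogue of the pattern eval($EXPR): (line, text bound to $EXPR)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
            and len(node.args) == 1  # eval($EXPR): exactly one argument
            and not node.keywords
        ):
            # get_source_segment recovers the original text of the argument
            hits.append((node.lineno, ast.get_source_segment(source, node.args[0])))
    return sorted(hits)


code = '''
x = eval(  user_input )   # unusual spacing still matches
y = "eval(not_a_call)"    # a string literal, not a call: no match
z = eval(
    data["expr"]
)
'''
print(find_eval_calls(code))  # [(2, 'user_input'), (4, 'data["expr"]')]
```

Because the match happens on tree nodes, reformatting the call across several lines does not hide it, and text inside a string literal never produces a false positive.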

Pattern Syntax Deep Dive

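Semgrep patterns combine three building blocks:

Literal code – snippets matched structurally, so whitespace and comments don't matter.
Metavariables – placeholders like $VAR that match any expression, identifier, or other syntax node and bind it for reuse later in the rule.
The ellipsis operator – ... matches any sequence of arguments, statements, or other elements, as in foo(...).

For genuine regular expressions over the raw text, use the separate pattern-regex operator.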

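Combining these building blocks with operators such as patterns (all must match), pattern-either (any may match), pattern-not, and pattern-inside or pattern-not-inside lets you express sophisticated checks. For instance, pairing pattern: requests.get(...) with a pattern-not-inside that describes a try/except block flags every requests.get call that is not wrapped in one.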
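Semgrep expresses "not wrapped in try/except" with pattern-not-inside. The Python sketch below approximates that logic with the ast module. It is a toy illustration, not Semgrep's exact semantics: it reports requests.get calls that are not inside the body of a try statement with at least one except handler. Calls inside an except block are reported too, on the assumption that a retry there is unguarded. The source is only parsed, never executed, so requests need not be installed; the function name is hypothetical.

```python
import ast
from typing import List


def unguarded_requests_get(source: str) -> List[int]:
    """Lines of requests.get(...) calls outside the body of a try/except."""
    unguarded: List[int] = []

    def visit(node: ast.AST, guarded: bool) -> None:
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "requests"
            and not guarded
        ):
            unguarded.append(node.lineno)
        if isinstance(node, ast.Try):
            # Only the try body counts as guarded, and only if an except exists.
            for child in node.body:
                visit(child, guarded or bool(node.handlers))
            # Handlers, else and finally blocks keep the outer guard state.
            for child in node.handlers + node.orelse + node.finalbody:
                visit(child, guarded)
            return
        for child in ast.iter_child_nodes(node):
            visit(child, guarded)

    visit(ast.parse(source), False)
    return sorted(unguarded)


code = '''
import requests
r1 = requests.get(url)
try:
    r2 = requests.get(url, timeout=5)
except requests.RequestException:
    r3 = requests.get(fallback_url)
'''
print(unguarded_requests_get(code))  # [3, 7]
```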

A Real‑World Rule: Detect Hard‑Coded Secrets

Hard‑coded credentials are a classic security blunder. The rule below flags string literals assigned to a variable that look like an AWS access key ID: the prefix AKIA followed by 16 uppercase letters or digits, 20 characters in total. A metavariable-regex filter applies the regular expression to the string captured by $SECRET.

rules:
  - id: python-aws-access-key-id
    patterns:
      - pattern: $VAR = "$SECRET"
      # metavariable-regex is its own list item and names the metavariable it filters
      - metavariable-regex:
          metavariable: $SECRET
          regex: '^AKIA[0-9A-Z]{16}$'
    message: "Potential AWS access key ID hard‑coded in source."
    severity: WARNING
    languages: [python]
    metadata:
      category: secret-management
      confidence: medium

When Semgrep encounters my_key = "AKIAIOSFODNN7EXAMPLE" (the placeholder key ID used in AWS documentation), it raises a warning that points developers directly to the offending line.
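Before wiring a regular expression into metavariable-regex, it can help to sanity-check it in isolation. This Python snippet applies the same expression with re.fullmatch, which plays the role of the rule's ^ and $ anchors. The sample strings are illustrative, and AKIAIOSFODNN7EXAMPLE is the placeholder key ID from AWS documentation.

```python
import re

# Same expression as the rule: "AKIA" followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def looks_like_access_key_id(value: str) -> bool:
    # fullmatch requires the whole string to match, like ^...$ in the rule
    return AWS_ACCESS_KEY_ID.fullmatch(value) is not None


for value in ("AKIAIOSFODNN7EXAMPLE", "AKIA1234567890ABCD", "akiaiosfodnn7example"):
    # True (20 chars), False (only 18 chars), False (lowercase)
    print(value, looks_like_access_key_id(value))
```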

Pro tip: Pair this rule with a pre‑commit hook so the scan runs before every commit, catching secrets early in the development cycle.

Extending the Secret Detection Rule

You can broaden detection to other secret types, such as PEM‑encoded private keys or JWT signing secrets. Because every item under patterns must match, stacking several metavariable-regex entries on the same metavariable narrows the rule instead of broadening it. Either combine the alternatives into one regex with |, or create separate rules for clarity. Set each rule's severity according to the risk its secret type poses.
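One way to prototype a combined expression is to join the alternatives with | and name each branch, as in this Python sketch. The PEM header pattern and the group names are assumptions chosen for illustration. In a Semgrep rule you would keep only the alternation in the regex string; the named groups exist here just to report which branch matched.

```python
import re
from typing import Optional

# One alternation, one named branch per secret type (names are illustrative).
SECRET_PATTERNS = re.compile(
    r"(?P<aws_access_key_id>AKIA[0-9A-Z]{16})"
    r"|(?P<pem_private_key>-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----)"
)


def classify_secret(text: str) -> Optional[str]:
    """Name of the first secret type found in text, or None."""
    match = SECRET_PATTERNS.search(text)
    # lastgroup is the name of the branch that matched
    return match.lastgroup if match else None


print(classify_secret('key = "AKIAIOSFODNN7EXAMPLE"'))     # aws_access_key_id
print(classify_secret("-----BEGIN RSA PRIVATE KEY-----"))  # pem_private_key
print(classify_secret("nothing to see here"))              # None
```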

Advanced Patterns: Taint Tracking with Dataflow

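Simple pattern matching is powerful, but many vulnerabilities arise from data flowing from an untrusted source to a sensitive sink. Semgrep's taint mode traces this movement through assignments and expressions within a function. Following it across functions and files requires the Pro engine in Semgrep's commercial offering.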

Consider a Flask endpoint that reads a query parameter and passes it directly to os.system. The following rule flags such insecure dataflow:

rules:
  - id: flask-os-system-taint
    # mode: taint replaces patterns with sources and sinks
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: os.system(...)
    message: "User‑controlled input reaches os.system(); potential command injection."
    severity: ERROR
    languages: [python]
    metadata:
      category: injection
      confidence: high

Setting mode: taint switches the rule from syntactic matching to dataflow analysis. The pattern-sources section marks the value returned by request.args.get(...) as tainted, and pattern-sinks marks os.system(...) as the dangerous destination. If tainted data can reach the sink, directly or through intermediate variables and string operations, the rule fires. An optional pattern-sanitizers section lists functions that make the data safe along the way.
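To make the source-to-sink idea concrete, here is a deliberately tiny Python model of taint propagation over straight-line code. It is not Semgrep's engine. Statements are pre-digested into (target, call, args) records, and the SOURCES, SINKS, and SANITIZERS sets are hypothetical stand-ins for the rule's pattern-sources, pattern-sinks, and an assumed pattern-sanitizers entry for shlex.quote.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stmt:
    """One straight-line statement: target = call(args...)."""
    target: Optional[str]  # assigned variable, or None for a bare call
    call: str              # dotted name of the called function
    args: List[str] = field(default_factory=list)  # variable names passed in


# Hypothetical configuration mirroring the rule, plus an assumed sanitizer.
SOURCES = {"request.args.get"}
SINKS = {"os.system"}
SANITIZERS = {"shlex.quote"}


def find_tainted_sinks(stmts: List[Stmt]) -> List[int]:
    """Return indexes of sink calls that receive tainted data."""
    tainted = set()
    findings = []
    for i, stmt in enumerate(stmts):
        arg_tainted = any(arg in tainted for arg in stmt.args)
        if stmt.call in SINKS and arg_tainted:
            findings.append(i)
        if stmt.target is None:
            continue
        if stmt.call in SOURCES:
            tainted.add(stmt.target)      # source: the result is untrusted
        elif stmt.call in SANITIZERS or not arg_tainted:
            tainted.discard(stmt.target)  # sanitized or clean reassignment
        else:
            tainted.add(stmt.target)      # taint propagates through the call


    return findings


program = [
    Stmt("host", "request.args.get"),      # 0: source
    Stmt("cmd", "concat", ["host"]),       # 1: cmd = "ping " + host
    Stmt(None, "os.system", ["cmd"]),      # 2: tainted sink -> finding
    Stmt("safe", "shlex.quote", ["host"]), # 3: sanitized copy
    Stmt(None, "os.system", ["safe"]),     # 4: clean sink -> no finding
]
print(find_tainted_sinks(program))  # [2]
```

Real taint analysis works on the program's control-flow graph rather than a flat list, but the three ingredients (sources that introduce taint, propagation through operations, sinks that report) are the same.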

Pro tip: When iterating on taint rules, add the --dataflow-traces flag to see, for each finding, the path tainted data takes from source to sink.

Integrating Semgrep into CI/CD Pipelines

Running Semgrep locally is great for developers, but you’ll get the most security ROI when it’s baked into your CI pipeline. Most CI providers (GitHub Actions, GitLab CI, Jenkins) support a simple step that pulls the Docker image and executes the scan.

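Here’s a concise GitHub Actions workflow that runs Semgrep on every push to main and fails the build when an ERROR-severity rule produces a finding:

name: Semgrep Scan
on:
  push:
    branches: [main]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    # Run every step inside the official Semgrep image
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        # --severity ERROR runs only ERROR rules; --error exits non-zero on findings
        run: semgrep scan --config .semgrep.yml --severity ERROR --error

The returntocorp/semgrep-action used in older tutorials is deprecated, so calling the CLI directly is the more durable choice, and the same command works unchanged on self‑hosted runners. To upload results to the Semgrep AppSec Platform (formerly Semgrep Cloud) for dashboards and trend analysis, store an API token as a SEMGREP_APP_TOKEN secret, expose it to the step through env, and run semgrep ci instead of semgrep scan.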

Fail‑Fast vs. Fail‑Soft Strategies

Fail‑fast: Treat every finding as a build blocker, whatever its severity. Ideal for high‑risk projects where security is non‑negotiable.
Fail‑soft: Block only on ERROR findings, while surfacing WARNING and INFO results in the job log or PR comments. This reduces friction for new teams adopting Semgrep.

Adjust the severity levels in your rules to align with your organization’s risk tolerance.
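Whichever strategy you pick, the gate itself boils down to "which severities block?". Here is a minimal Python sketch, assuming a report produced with semgrep scan --json in which each entry in results carries its severity under extra.severity. The gate function and the synthetic report are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def gate(report: Dict, blocking: Set[str]) -> Tuple[int, Counter]:
    """Return (exit code, findings per severity) for a semgrep --json report."""
    counts = Counter(r["extra"]["severity"] for r in report.get("results", []))
    exit_code = 1 if any(counts[sev] for sev in blocking) else 0
    return exit_code, counts


# Synthetic report keeping only the fields the gate reads.
report = {"results": [
    {"check_id": "python-eval-detected", "extra": {"severity": "ERROR"}},
    {"check_id": "python-logging-format", "extra": {"severity": "INFO"}},
]}
info_only = {"results": [report["results"][1]]}

print(gate(report, {"ERROR", "WARNING", "INFO"})[0])  # 1: every finding blocks
print(gate(info_only, {"ERROR"})[0])                  # 0: INFO is reported, not blocking
```

In a pipeline you might write the report with semgrep scan --config .semgrep.yml --json --output semgrep.json, load it with json.load, and pass the exit code to sys.exit. Keeping the blocking set as data makes moving from fail‑soft to fail‑fast a one-line change.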

Customizing Rule Metadata for Better Reporting

Metadata isn’t just decorative; it powers downstream tools like security dashboards and compliance reports. Include fields such as cwe, owasp, and references to map findings to industry standards.

Below is a rule that flags insecure deserialization in Java, enriched with CWE and OWASP references:

rules:
  - id: java-insecure-deserialization
    patterns:
      - pattern: |
          ObjectInputStream $OIS = new ObjectInputStream($STREAM);
          ...
          $OIS.readObject();
    message: "Insecure deserialization can lead to remote code execution."
    severity: ERROR
    languages: [java]
    metadata:
      category: deserialization
      cwe: "CWE-502"
      # Deserialization falls under A08 in the 2021 OWASP Top 10
      owasp: "A08:2021 - Software and Data Integrity Failures"
      references:
        - "https://owasp.org/Top10/A08_2021-Software_and_Data_Integrity_Failures/"
        - "https://cwe.mitre.org/data/definitions/502.html"

When this rule fires, the generated SARIF or JSON report will contain the CWE ID, making it trivial to feed into compliance tools.
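As an example of downstream use, this Python sketch groups findings from a semgrep scan --json report by their cwe metadata. It assumes each result carries path, start.line, and extra.metadata, and it accepts cwe either as a string (as in the rule above) or as a list. The report contents are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def findings_by_cwe(report: Dict) -> Dict[str, List[str]]:
    """Group semgrep --json results as {cwe: ["path:line", ...]}."""
    grouped = defaultdict(list)
    for result in report.get("results", []):
        cwe = result["extra"].get("metadata", {}).get("cwe", "unmapped")
        # A rule may store one CWE as a string or several as a list.
        for entry in (cwe if isinstance(cwe, list) else [cwe]):
            grouped[entry].append(f'{result["path"]}:{result["start"]["line"]}')
    return dict(grouped)


report = {"results": [
    {"path": "src/Loader.java", "start": {"line": 42},
     "extra": {"metadata": {"cwe": "CWE-502"}}},
    {"path": "app.py", "start": {"line": 7},
     "extra": {"metadata": {"cwe": ["CWE-95"]}}},
    {"path": "app.py", "start": {"line": 9}, "extra": {}},
]}
print(findings_by_cwe(report))
# {'CWE-502': ['src/Loader.java:42'], 'CWE-95': ['app.py:7'], 'unmapped': ['app.py:9']}
```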

Testing Your Rules with semgrep --test

Before committing a rule to the repo, validate it against annotated test code. Semgrep's --test flag looks for a target file with the same base name as each rule file (for example, python-eval.yaml next to python-eval.py) and reads comment annotations in it: # ruleid: <rule-id> marks the next line as one that must match, and # ok: <rule-id> marks the next line as one that must not.

Here’s the earlier eval rule saved as rules/python-eval.yaml:

rules:
  - id: python-eval-detected
    patterns:
      - pattern: eval($EXPR)
    message: "Avoid using eval(); it can lead to code injection."
    severity: ERROR
    languages: [python]
    metadata:
      category: security

and its test file, rules/python-eval.py:

# ruleid: python-eval-detected
result = eval(user_input)

# ok: python-eval-detected
safe_result = int(user_input)

Run semgrep --test rules/ and Semgrep will report any mismatch between the annotations and the actual findings, giving you confidence that the rule behaves as expected.
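Semgrep's --test runner compares findings against annotation comments in a test file stored next to the rule. Conceptually, the comparison is simple: lines annotated with # ruleid: <id> must be reported, and anything else that is reported counts as a false positive. The Python sketch below models that comparison in simplified form. In this model, # ok: lines only document intent, and the function names are hypothetical rather than part of Semgrep.

```python
import re
from typing import List, Set

# "# ruleid: <id>" marks the NEXT line as one that must be reported.
RULEID = re.compile(r"#\s*ruleid:\s*([\w.-]+)")


def expected_lines(test_source: str, rule_id: str) -> Set[int]:
    """1-based line numbers that the rule is expected to flag."""
    expected = set()
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        m = RULEID.search(line)
        if m and m.group(1) == rule_id:
            expected.add(lineno + 1)
    return expected


def compare(expected: Set[int], reported: Set[int]) -> List[str]:
    """Describe mismatches; an empty list means the test passes."""
    problems = [f"missed expected match on line {n}" for n in sorted(expected - reported)]
    problems += [f"unexpected match on line {n}" for n in sorted(reported - expected)]
    return problems


test_file = """\
# ruleid: python-eval-detected
result = eval(user_input)

# ok: python-eval-detected
safe_result = int(user_input)
"""
expected = expected_lines(test_file, "python-eval-detected")
print(expected)                   # {2}
print(compare(expected, {2, 5}))  # ['unexpected match on line 5']
```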

Pro tip: Keep test cases alongside each rule. This makes future modifications safe and encourages a test‑driven approach to rule development.

Scaling Custom Rules Across Languages

One of Semgrep’s strengths is its multi‑language support. A single rule can cover several languages whose syntax is close enough for the same pattern to parse, simply by listing them under languages. For example, the following rule detects innerHTML assignment, a common XSS vector, in both JavaScript and TypeScript:

rules:
  - id: unsafe-innerhtml
    patterns:
      - pattern: $OBJ.innerHTML = $VALUE
    message: "Assigning to innerHTML can lead to XSS."
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: xss

For Python, an analogous rule might target code that disables template auto‑escaping, such as jinja2.Environment(autoescape=False), or code that marks user input as safe with markupsafe.Markup. By maintaining a consistent naming convention (unsafe-*), you can build a library of reusable rules that span your entire tech stack.
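In Python, the equivalent sink is often HTML assembled without escaping. The standard-library sketch below contrasts a hypothetical unescaped template with one that passes user input through html.escape, which is the kind of difference an unsafe-* Python rule would key on. string.Template is used only because it performs no escaping of its own; the template and function names are illustrative.

```python
import html
from string import Template

# Hypothetical row template; string.Template does no HTML escaping itself.
ROW = Template("<td>$name</td>")


def render_unsafe(name: str) -> str:
    # User input lands in the markup verbatim: the pattern a rule would flag.
    return ROW.substitute(name=name)


def render_escaped(name: str) -> str:
    # html.escape converts &, <, > and (by default) both quote characters.
    return ROW.substitute(name=html.escape(name))


payload = "<img src=x onerror=alert(1)>"
print(render_unsafe(payload))   # <td><img src=x onerror=alert(1)></td>
print(render_escaped(payload))  # <td>&lt;img src=x onerror=alert(1)&gt;</td>
```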

Performance Considerations and Rule Optimization

Large codebases can cause Semgrep scans to take several minutes, especially with complex dataflow rules. To keep scans fast:

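Scope rules with the paths field, which takes include and exclude lists, to limit scanning to relevant directories.
Use pattern-either wisely; each branch adds overhead.
Skip vendored, generated, and third‑party code with a .semgrepignore file, and in pull‑request pipelines pass --baseline-commit so Semgrep focuses on what changed since that commit.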

Here’s an example that restricts a rule to the src/ folder:

rules:
  - id: python-logging-format
    patterns:
      - pattern: logger.info($MSG)
    message: "Consider using structured logging instead of plain strings."
    severity: INFO
    languages: [python]
    # paths filters target files through include and exclude lists
    paths:
      include:
        - "src/**/*.py"

By narrowing the search space, you reduce CPU usage and keep developer feedback snappy.

Community Rules and Sharing Your Own

Semgrep hosts a public rule registry where thousands of community‑contributed rules live. You can import them directly via semgrep --config p/ci or any of the pre‑built packs (p/security-audit, p/r2c, etc.). When you create a valuable rule, consider publishing it to the registry to help the broader community.

Publishing is straightforward: fork the semgrep/semgrep-rules repository on GitHub, add your rule’s YAML file and its annotated test file under the directory for the relevant language, and open a pull request. Once it is merged, the rule is published to the Registry, where anyone can run it with --config.

Pro tip: Include a license field (e.g., MIT) in your rule metadata to avoid legal ambiguity when others reuse your rule.

Conclusion

Custom Semgrep rules empower teams to codify their unique security policies directly into the development workflow. By mastering pattern syntax, dataflow tracking, and CI integration, you can catch high‑impact bugs before they ship. Remember to write tests for every rule, keep metadata rich for compliance, and share valuable patterns with the community. With these practices in place, Semgrep becomes not just a scanner, but a living security policy that evolves alongside your code.
