Skip to content

fix(security): resolve OWASP scanner bypass via argument poisoning#89

Open
SHAURYASANYAL3 wants to merge 7 commits into
sreerevanth:mainfrom
SHAURYASANYAL3:fix/owasp-argument-bypass
Open

fix(security): resolve OWASP scanner bypass via argument poisoning#89
SHAURYASANYAL3 wants to merge 7 commits into
sreerevanth:mainfrom
SHAURYASANYAL3:fix/owasp-argument-bypass

Conversation

@SHAURYASANYAL3
Copy link
Copy Markdown
Contributor

@SHAURYASANYAL3 SHAURYASANYAL3 commented May 29, 2026

Summary

This PR fixes a security gap in OwaspScanner where tool call arguments were not included in heuristic analysis. As a result, malicious payloads embedded within nested JSON structures could bypass safety checks that were only inspecting raw command strings.

What Changed

Recursive Argument Flattening

  • Added _flatten_values() to OwaspScanner.
  • Recursively traverses dictionaries and lists to extract string content from nested tool arguments.

Expanded Scan Coverage

  • Updated _blob_of() to include flattened argument values in the generated scan payload.
  • Ensures security heuristics evaluate both command content and tool arguments.

Improved Threat Detection

The scanner can now detect:

  • A01: Prompt Injection
  • A04: Unsafe Code Execution
  • A05: Data Exfiltration

even when malicious content is embedded deep within structured tool arguments rather than exposed in the top-level command text.

Validation

Added the following test cases in tests/test_owasp_security.py:

  1. test_owasp_detects_exfil_in_poisoned_arguments
  2. test_owasp_detects_prompt_injection_in_arguments
  3. test_owasp_detects_nested_unsafe_code

All three tests pass with this change and fail against the previous implementation, confirming the vulnerability and validating the fix.

Summary by CodeRabbit

  • New Features

    • Claude Code adapter with CLI observation, real-time trace streaming, and safety enforcement.
    • LangChain adapter with callback integration and advanced configuration.
  • Documentation

    • Recommend editable dev-mode install for local development.
    • Added detailed adapter guides and a universal one-liner in getting-started.
    • Updated API description to clarify trace ingestion, sessions, safety, and dashboard updates.
  • Bug Fixes / Tests

    • Enhanced security scanner to detect threats in nested/cyclic tool arguments.
    • Added OWASP/security regression tests covering nested/cycle cases.
  • Chores

    • New PR test workflow to run tests, linting, and post results.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Enhances OWASP scanning to inspect nested tool-call arguments, adds tests for cycle-safe flattening and OWASP vector detection, introduces Claude Code and LangChain adapter docs, updates README/getting-started and CONTRIBUTING setup, expands FastAPI metadata, and adds a PR test GitHub Actions workflow.

Changes

OWASP Security Enhancement and Adapter Documentation

Layer / File(s) Summary
OWASP Nested Argument Scanning
agentwatch/security/owasp.py
Introduces _flatten_values helper that recursively extracts string-like content from nested data structures and updates _blob_of to include flattened tool_call.arguments in security scanning.
OWASP Vector Detection Tests
tests/test_owasp_security.py
Adds test helper _tool_event and three pytest cases that validate detection of data exfiltration, prompt injection, and unsafe code execution when those threats are embedded in tool call arguments.
Cycle Handling & _flatten_values Tests
tests/test_safety.py
Adds tests ensuring OwaspScanner.scan and _flatten_values handle self-referential/cyclical data without raising RecursionError and skip cycles while returning flattened values; also updates related test imports.
Adapter Documentation Pages
docs/adapters/claude-code.md, docs/adapters/langchain.md
New documentation for Claude Code adapter (CLI usage, reasoning trace parsing, safety enforcement, WebSocket streaming) and LangChain adapter (installation, callback handler usage, features, event bus configuration).
Documentation Links and Getting Started
docs/getting-started.md, README.md
Adds universal wrapper example (watch() usage) in getting-started and updates README Supported Frameworks section with links to new adapter guides.
Setup Instructions and API Metadata
CONTRIBUTING.md, agentwatch/api/server.py
Replaces requirements.txt install with editable mode (python -m pip install -e ".[dev]") in development setup and expands FastAPI application description metadata.
PR Tests Workflow
.github/workflows/test-on-pr.yml
Adds a GitHub Actions workflow to run tests with coverage, run ruff, and post/update a PR comment with results; workflow fails if tests fail.
Test lint suppressions and import tweaks
tests/load/locustfile.py, tests/test_models.py
Adds noqa suppressions to load tests and refactors test imports/noqa annotations in model tests.

Sequence Diagram(s)

sequenceDiagram
  participant AgentEvent
  participant OwaspScanner
  participant ScanRules
  AgentEvent->>OwaspScanner: submit tool_call (raw_command + arguments)
  OwaspScanner->>OwaspScanner: _flatten_values(arguments)
  OwaspScanner->>ScanRules: evaluate scan blob (raw_command + flattened values)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • sreerevanth/AgentWatch#69: Both PRs modify OWASP scanning behavior by changing agentwatch/security/owasp.py's _blob_of handling of tool-call arguments.
  • sreerevanth/AgentWatch#29: Overlapping README adapter documentation and getting-started changes.

Suggested reviewers

  • sreerevanth

Poem

🐰 I nibble nested args with nimble paws,

Flattening secrets so scanners spot the flaws.
Tests hop round cycles, careful and spry,
Docs bloom for adapters, neat and dry,
CI hums—keep watch beneath moonlit sky.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main security fix: resolving an OWASP scanner bypass vulnerability caused by argument poisoning. It accurately reflects the primary change throughout the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/security/owasp.py`:
- Around line 144-157: The _flatten_values method can infinite-recurse on cyclic
structures; modify it to track seen object ids and skip already-visited
containers: add an optional parameter (e.g., seen: set[int] | None = None)
inside _flatten_values, initialize seen if None, check id(data) and return [] if
already in seen, add id(data) to seen before recursing into
dicts/lists/tuples/sets, and pass seen through all recursive calls so
self-referential payloads are ignored and RecursionError is avoided.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 058502d4-aac3-4c0e-9d00-8f2d8653fd1a

📥 Commits

Reviewing files that changed from the base of the PR and between c0e464b and f14a4e4.

📒 Files selected for processing (8)
  • CONTRIBUTING.md
  • README.md
  • agentwatch/api/server.py
  • agentwatch/security/owasp.py
  • docs/adapters/claude-code.md
  • docs/adapters/langchain.md
  • docs/getting-started.md
  • tests/test_owasp_security.py

Comment thread agentwatch/security/owasp.py Outdated
Adds issues: write permission required to post comments on pull requests.
@sreerevanth sreerevanth self-requested a review May 30, 2026 15:13
Copy link
Copy Markdown
Owner

@sreerevanth sreerevanth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch on the OWASP argument-poisoning bypass and the added test coverage.

Before merge, please address one blocker:

_flatten_values() performs recursive traversal without cycle detection. Self-referential dict/list structures can trigger infinite recursion and raise RecursionError, causing scans to fail unexpectedly.

Suggested approach:

Track visited object IDs during traversal.
Skip already-visited containers.
Preserve current behavior for normal nested payloads.

Once cycle protection is added, this looks ready for final review.

@SHAURYASANYAL3 SHAURYASANYAL3 force-pushed the fix/owasp-argument-bypass branch from 85a2073 to 312848c Compare May 30, 2026 15:53
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/test-on-pr.yml (1)

19-19: ⚡ Quick win

Pin third-party actions to commit SHAs.

actions/checkout@v4, actions/setup-python@v5, and actions/github-script@v7 are pinned to mutable tags. Static analysis flags this against the repo's pinning policy, and pinning to a full commit SHA is especially important here because this workflow runs with write permissions. Pin each uses: to a SHA (optionally with the version in a trailing comment).

Also applies to: 27-27, 51-51

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-on-pr.yml at line 19, Replace mutable action tags
with immutable commit SHAs for every "uses:" entry shown (e.g., replace
actions/checkout@v4, actions/setup-python@v5, and actions/github-script@v7) so
the workflow uses exact commit SHAs; keep the human-readable version as an
optional trailing comment if desired. Locate the three "uses:" occurrences in
the workflow file (the ones currently referencing actions/checkout,
actions/setup-python, and actions/github-script) and update each to the
corresponding full commit SHA from the action's repository release commit,
ensuring the SHA is the exact full-length commit hash.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/test-on-pr.yml:
- Around line 4-5: The workflow uses the privileged pull_request_target trigger
while checking out and running PR-controlled code (e.g., actions/checkout of the
PR head, pip install -e ".[dev]", pytest, ruff), which allows token
exfiltration; fix it by splitting into two workflows: 1) a "PR Tests" workflow
triggered on pull_request (not pull_request_target) with permissions.contents:
read that checks out the PR, runs pip install -e ".[dev]", pytest and ruff, and
uploads results as an artifact; and 2) a privileged "PR Test Comment" workflow
triggered by workflow_run for the "PR Tests" workflow that never checks out PR
code, has permissions.pull-requests: write and issues: write, downloads the
uploaded artifact and posts the comment—ensure actions/checkout is removed from
the privileged workflow and artifacts are used to pass results.
- Around line 18-24: The checkout step uses actions/checkout@v4 with the default
credential persistence, which leaves GITHUB_TOKEN in .git/config; update the
Checkout step (actions/checkout@v4) to set persist-credentials: false to prevent
persisting credentials for this PR-test job since the job does not push and
should not expose GITHUB_TOKEN. Ensure the changed Checkout configuration is
applied where ref: ${{ github.event.pull_request.head.sha }} is used.

In `@tests/test_safety.py`:
- Around line 368-385: The test uses the typing symbol Any in annotations inside
test_flatten_values_cycle_detection (and other tests) but there is no import;
add the missing import "from typing import Any" to the test module so
annotations in test_flatten_values_cycle_detection, and any uses of Any
elsewhere, resolve and CI (F821) no longer fails; locate the
tests/test_safety.py file and insert the import near other typing imports or at
top of the file.

---

Nitpick comments:
In @.github/workflows/test-on-pr.yml:
- Line 19: Replace mutable action tags with immutable commit SHAs for every
"uses:" entry shown (e.g., replace actions/checkout@v4, actions/setup-python@v5,
and actions/github-script@v7) so the workflow uses exact commit SHAs; keep the
human-readable version as an optional trailing comment if desired. Locate the
three "uses:" occurrences in the workflow file (the ones currently referencing
actions/checkout, actions/setup-python, and actions/github-script) and update
each to the corresponding full commit SHA from the action's repository release
commit, ensuring the SHA is the exact full-length commit hash.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d8bb7c4-0ebb-480c-8e90-61e351c5d90f

📥 Commits

Reviewing files that changed from the base of the PR and between f14a4e4 and 312848c.

📒 Files selected for processing (3)
  • .github/workflows/test-on-pr.yml
  • agentwatch/security/owasp.py
  • tests/test_safety.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • agentwatch/security/owasp.py

Comment on lines +4 to 5
pull_request_target:
branches: [main]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Critical: pull_request_target that checks out and executes untrusted PR code enables token theft / RCE.

This workflow combines the privileged pull_request_target trigger (which runs with the base repo's write-capable GITHUB_TOKEN and issues: write / pull-requests: write) with checking out the PR head (Line 21) and then executing arbitrary PR-controlled code via pip install -e ".[dev]" (Line 33), pytest (Lines 39-42, including any conftest.py/test code in the PR), and ruff config from the PR.

Any external contributor can modify those files to run arbitrary commands during the job and exfiltrate the token or abuse the write permissions. The inline comment on Lines 22-24 documents the risk but does not mitigate it. This directly undermines the security intent of this PR.

The safe pattern is to split into two workflows: run untrusted code under the unprivileged pull_request trigger (no secrets/write token), and post the comment from a separate workflow_run workflow (or an actions/upload-artifact + download) that never checks out PR code.

🔒 Recommended two-workflow structure
# test-on-pr.yml — runs untrusted code, no write token
name: PR Tests
on:
  pull_request:
    branches: [main]
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<sha>
      - uses: actions/setup-python@<sha>
        with: { python-version: "3.12", cache: pip }
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v --tb=short --cov=agentwatch --cov-report=json
        continue-on-error: true
        id: pytest
      - run: ruff check .
        continue-on-error: true
      # upload coverage.json + outcomes as an artifact for the commenting workflow
# pr-comment.yml — privileged, never checks out PR code
name: PR Test Comment
on:
  workflow_run:
    workflows: ["PR Tests"]
    types: [completed]
permissions:
  pull-requests: write
  issues: write
# download the artifact, then post/update the comment via github-script
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-on-pr.yml around lines 4 - 5, The workflow uses the
privileged pull_request_target trigger while checking out and running
PR-controlled code (e.g., actions/checkout of the PR head, pip install -e
".[dev]", pytest, ruff), which allows token exfiltration; fix it by splitting
into two workflows: 1) a "PR Tests" workflow triggered on pull_request (not
pull_request_target) with permissions.contents: read that checks out the PR,
runs pip install -e ".[dev]", pytest and ruff, and uploads results as an
artifact; and 2) a privileged "PR Test Comment" workflow triggered by
workflow_run for the "PR Tests" workflow that never checks out PR code, has
permissions.pull-requests: write and issues: write, downloads the uploaded
artifact and posts the comment—ensure actions/checkout is removed from the
privileged workflow and artifacts are used to pass results.

Comment on lines 18 to +24
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
# Note: pull_request_target runs in the context of the base repo.
# Checking out the PR head is necessary to test the changes,
# but use caution as this runs with a write-capable GITHUB_TOKEN.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable credential persistence on checkout.

Since this job runs untrusted PR code under a privileged trigger, the default persist-credentials: true leaves the GITHUB_TOKEN in .git/config where the executed code can read and exfiltrate it. Set persist-credentials: false (you don't push from this job). This hardens the checkout even after restructuring the trigger.

🛡️ Proposed fix
       - name: Checkout
         uses: actions/checkout@v4
         with:
           ref: ${{ github.event.pull_request.head.sha }}
+          persist-credentials: false
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 18-24: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[error] 19-19: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-on-pr.yml around lines 18 - 24, The checkout step
uses actions/checkout@v4 with the default credential persistence, which leaves
GITHUB_TOKEN in .git/config; update the Checkout step (actions/checkout@v4) to
set persist-credentials: false to prevent persisting credentials for this
PR-test job since the job does not push and should not expose GITHUB_TOKEN.
Ensure the changed Checkout configuration is applied where ref: ${{
github.event.pull_request.head.sha }} is used.

Comment thread tests/test_safety.py
Copy link
Copy Markdown
Owner

@sreerevanth sreerevanth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes Requested

Thanks for addressing the original blocker. The cycle-detection fix in _flatten_values() resolves the recursion concern and the security regression tests are appreciated.

Before merge, two additional issues need to be addressed:

  1. Critical workflow security issue

The workflow now uses pull_request_target while checking out and executing PR-controlled code (pip install, pytest, etc.). This introduces a privileged-code-execution path and is a higher-severity security concern than the OWASP scanner issue being fixed.

Please revert to a safe workflow design (e.g. unprivileged pull_request testing plus a separate reporting workflow) or otherwise eliminate execution of untrusted PR code under a privileged token context.

  1. Resolve merge conflicts

GitHub currently reports an unresolved conflict in:

  • docs/adapters/langchain.md

Please rebase on the latest main, resolve conflicts, and push the updated branch.

Once the workflow security issue and merge conflict are resolved, this can be reviewed again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants