feat: implement Causal Blast Radius Estimator for proactive impact analysis #91

Open
SHAURYASANYAL3 wants to merge 2 commits into sreerevanth:main from SHAURYASANYAL3:feat/blast-radius-estimator

@SHAURYASANYAL3 SHAURYASANYAL3 commented May 29, 2026

Summary

This PR introduces the Causal Blast Radius Estimator (SAF-010), enhancing AgentWatch's safety analysis by estimating the potential impact of an action rather than relying solely on static pattern matching.

The new estimator evaluates contextual risk signals across SQL and filesystem operations and automatically escalates actions that could affect critical resources or large amounts of data.

What Changed

Blast Radius Estimation Engine

  • Refactored agentwatch/core/blast_radius.py to implement causal impact heuristics.

  • Added support for:

    • SQL risk assessment (e.g., missing WHERE clauses, access to critical tables).
    • Filesystem risk assessment (e.g., critical paths, wildcard operations, broad file targeting).
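
The SQL checks listed above can be sketched roughly as follows. This is a minimal illustration assuming regex-based checks; the function name `assess_sql`, the `CRITICAL_TABLES` set, and the returned dict shape are hypothetical and are not taken from agentwatch/core/blast_radius.py.

```python
import re

# Hypothetical set of critical tables; the PR's actual list is not shown here.
CRITICAL_TABLES = {"users", "payments"}


def assess_sql(statement: str) -> dict:
    """Return illustrative risk signals for a single SQL statement."""
    sql = statement.strip().rstrip(";")
    # Only DELETE and UPDATE are treated as writes that need a WHERE clause.
    is_write = re.match(r"(?i)^\s*(delete|update)\b", sql) is not None
    has_where = re.search(r"(?i)\bwhere\b", sql) is not None
    # Collect table names that follow FROM, UPDATE, or INTO.
    tables = {
        name.lower()
        for name in re.findall(r"(?i)\b(?:from|update|into)\s+([A-Za-z_][\w.]*)", sql)
    }
    return {
        "missing_where": is_write and not has_where,
        "critical_tables": sorted(tables & CRITICAL_TABLES),
    }
```

A keyword search like this will misread a WHERE that appears only inside a string literal or subquery, so a real implementation would probably need proper SQL parsing for anything beyond coarse triage.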

Safety Engine Integration

  • Integrated blast radius estimation into the main safety evaluation flow in agentwatch/core/safety.py.
  • High-impact actions are now automatically classified with an ESCALATED status.

Schema Enhancements

  • Extended SafetyCheckData in agentwatch/core/schema.py.
  • Stores blast radius metadata for downstream analysis and dashboard visualization.
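
One way the schema extension could look, as a sketch only. The four `BlastRadius` field names and the optional `blast_radius` field follow the walkthrough later in this thread; the types, defaults, the `score` and `requires_approval` fields, and the choice to store a plain dict are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class BlastRadius:
    score: int = 0  # assumed 0-100 scale
    affected_row_count: Optional[int] = None
    affected_file_count: Optional[int] = None
    is_critical_resource: bool = False
    explanation: str = ""

    def to_dict(self) -> dict[str, Any]:
        # Serialize every field so dashboards can read the metadata.
        return asdict(self)


@dataclass
class SafetyCheckData:
    requires_approval: bool = False
    # Optional so existing callers that never set it keep working.
    blast_radius: Optional[dict[str, Any]] = None
```

Keeping the new field optional with a `None` default is what lets older code construct `SafetyCheckData` without knowing about blast radius at all.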

Validation

Added tests/test_blast_radius_causal.py covering:

  1. Escalation of SQL DELETE operations without a WHERE clause.
  2. Detection of access to critical database tables.
  3. Identification of production resource tags.
  4. Automatic escalation behavior within the SafetyEngine.

All 28 safety-related tests pass with this change.

Summary by CodeRabbit

  • New Features

    • Enhanced impact analysis detects unguarded DB deletions, critical-table changes, destructive filesystem operations, and marks critical resources with affected counts and an explanatory summary.
    • Safety checks now include blast-radius analysis and can automatically escalate high-impact tool calls to require approval.
  • Tests

    • Added tests covering impact detection scenarios and escalation behavior.

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 81a5d8e1-c3dd-4e5c-a104-16c098c1322c

📥 Commits

Reviewing files that changed from the base of the PR and between da1b995 and 0fb7d8f.

📒 Files selected for processing (1)
  • agentwatch/core/blast_radius.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • agentwatch/core/blast_radius.py

📝 Walkthrough

The PR adds causal impact analysis to the safety engine by introducing a refactored BlastRadiusEstimator that detects dangerous database operations (missing WHERE clauses, UPDATE/DELETE patterns, critical table modifications), destructive filesystem commands, and production resource access, then integrates this estimator into SafetyEngine to escalate policy decisions when high-impact operations are detected.

Changes

Causal Blast Radius Estimation and Safety Escalation

| Layer / File(s) | Summary |
|---|---|
| BlastRadius and SafetyCheckData schema extension (agentwatch/core/blast_radius.py, agentwatch/core/schema.py) | BlastRadius gains affected_row_count, affected_file_count, is_critical_resource, and explanation fields; to_dict() serializes them. SafetyCheckData adds optional blast_radius field for carrying impact metadata. |
| BlastRadiusEstimator refactoring and pattern matching (agentwatch/core/blast_radius.py) | Estimator converted from @dataclass to explicit __init__. Pattern list adds UPDATE ... SET detector. Estimate flow deduplicates downstream_services and aggregates pattern scores while preserving reversibility flags. |
| Causal heuristics and impact scoring pipeline (agentwatch/core/blast_radius.py) | Post-pattern analysis runs database heuristics (missing WHERE detection, critical table identification), filesystem heuristics (critical path and wildcard deletion checks), criticality evaluation, score normalization based on irreversibility and is_critical_resource, and explanation generation. |
| SafetyEngine blast-radius estimation and escalation (agentwatch/core/safety.py) | Constructor accepts optional BlastRadiusEstimator parameter. check_event computes blast-radius estimate during tool-call checks and attaches result to SafetyCheckData. "Causal Override" step escalates to requires_approval=true and appends "ESCALATED" reason if blast-radius requires approval, even when prior policy evaluation did not. |
| Blast radius estimation and safety escalation tests (tests/test_blast_radius_causal.py) | Helper constructs tool-call events. Unit tests validate BlastRadiusEstimator detects missing WHERE, critical tables (users), critical filesystem paths (/etc), and production-tagged resources. Integration test confirms SafetyEngine escalates to requires_approval and triggers approval callback when deletion targets critical table. |
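
The pattern-aggregation step in the summary above (deduplicating downstream_services and combining scores while preserving reversibility) might look something like this. Everything here is a hedged sketch: the `Pattern` class, the regexes, and all scores except the 95 cited for rm -rf in the review below are hypothetical, and taking the maximum score is an assumed aggregation rule.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    regex: str
    score: int
    reversible: bool
    services: tuple[str, ...] = ()


# Hypothetical pattern table, not the PR's actual list.
PATTERNS = [
    Pattern(r"(?i)\bdelete\s+from\b", 80, False, ("database",)),
    Pattern(r"(?i)\bupdate\b.+\bset\b", 60, True, ("database",)),
    Pattern(r"\brm\s+-rf\b", 95, False, ("filesystem",)),
]


def aggregate(raw: str) -> dict:
    score, reversible, services = 0, True, []
    for pattern in PATTERNS:
        if re.search(pattern.regex, raw):
            score = max(score, pattern.score)  # assumption: highest score wins
            # One irreversible match makes the whole action irreversible.
            reversible = reversible and pattern.reversible
            services.extend(pattern.services)
    return {
        "score": score,
        "reversible": reversible,
        # dict.fromkeys drops duplicates while keeping first-seen order.
        "downstream_services": list(dict.fromkeys(services)),
    }
```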

Sequence Diagram

sequenceDiagram
  participant Engine as SafetyEngine
  participant Estimator as BlastRadiusEstimator
  participant Policy as Policy Evaluator
  participant Data as SafetyCheckData
  
  Engine->>Estimator: estimate(event)
  Estimator->>Estimator: pattern match & heuristics
  Estimator-->>Engine: BlastRadius{score, is_critical_resource, explanation}
  
  Engine->>Data: attach blast_radius
  Engine->>Policy: evaluate policies & DSL
  Policy-->>Engine: requires_approval, reasons
  
  Engine->>Engine: check Causal Override
  alt blast_radius.requires_approval && !already_approved
    Engine->>Engine: force requires_approval=true
    Engine->>Engine: append ESCALATED reason
  end
  
  Engine-->>Data: finalized SafetyCheckData
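
The "Causal Override" branch in the diagram can be read as the following sketch. The `Decision` class and the `apply_causal_override` function are hypothetical names used only for illustration; the actual logic lives in check_event in agentwatch/core/safety.py and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    requires_approval: bool = False
    reasons: list[str] = field(default_factory=list)


def apply_causal_override(
    decision: Decision, blast_requires_approval: bool, explanation: str
) -> Decision:
    """Escalate only when the policy step did not already require approval."""
    if blast_requires_approval and not decision.requires_approval:
        decision.requires_approval = True
        decision.reasons.append(f"ESCALATED: {explanation}")
    return decision
```

Because the override only ever turns approval on, a policy decision that already requires approval passes through unchanged, which matches the alt branch condition in the diagram.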

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 With whiskers raised and nose held high,
We trace the blast from earth to sky—
No WHERE? Critical table? Paths that burn?
The Estimator makes the engine turn!
Causal chains now guard the gate,
Approval saves us from our fate.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly and specifically describes the main feature being implemented: the Causal Blast Radius Estimator for proactive impact analysis, which aligns perfectly with the primary objective of adding causal/heuristic-based impact estimation to AgentWatch. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
agentwatch/core/blast_radius.py (1)

137-141: 💤 Low value

Consider using regex for critical path detection.

The exact substring match f"rm -rf {path}" in raw won't catch variations such as rm -rf  /etc (double space) or rm -r -f /etc. These would still get score 95 from the base pattern (above the approval threshold), so this isn't a bypass, just a minor gap in the 95→100 elevation.

Proposed regex-based approach
-        for path in critical_paths:
-            if f"rm -rf {path}" in raw or f"rm -rf {path}/" in raw:
+        for path in critical_paths:
+            # Match rm with -r and -f flags (in any order/format) followed by the critical path
+            escaped_path = re.escape(path)
+            if re.search(rf"\brm\s+(-[rf]+\s+)*-[rf]+\s+{escaped_path}(?:/|$|\s)", raw):
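
As a standalone illustration of the regex idea above (not the PR's code), the sketch below captures the flag groups first and then checks that both a recursive and a force flag are present, so that rm -r alone does not count. The function name `rm_rf_targets` is hypothetical, and long options such as --recursive are deliberately not handled.

```python
import re


def rm_rf_targets(raw: str, path: str) -> bool:
    """True if raw contains an rm call with both -r and -f aimed at path."""
    # One or more short-flag groups, any whitespace, then the escaped path.
    # The lookahead stops /etc from matching inside /etcetera.
    pattern = rf"\brm\s+((?:-[a-zA-Z]+\s+)+){re.escape(path)}(?=/|\s|$)"
    match = re.search(pattern, raw)
    if match is None:
        return False
    flags = set(re.sub(r"[\s-]", "", match.group(1)))
    recursive = "r" in flags or "R" in flags
    force = "f" in flags
    return recursive and force
```

For example, `rm_rf_targets("rm -r -f /etc/", "/etc")` is true, while `rm_rf_targets("rm -rf /etcetera", "/etc")` is false.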
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agentwatch/core/blast_radius.py` around lines 137 - 141, Replace the brittle
substring checks in the loop over critical_paths with a regex-based match
against raw that allows variable whitespace and flag order (e.g., multiple
spaces, flags like -r and -f in either order, combined flags), and ensure you
escape the path and allow an optional trailing slash; when the regex matches,
set radius.is_critical_resource = True, radius.score = max(radius.score, 100),
and radius.affected_file_count = 50000 as before. Use the existing
critical_paths list, the raw input string, and the radius object
(radius.is_critical_resource, radius.score, radius.affected_file_count) so the
logic location remains the same. Ensure you use a single regex search per path
and avoid changing other scoring logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 182e5179-c90d-4a98-b1ea-1c23a2751db6

📥 Commits

Reviewing files that changed from the base of the PR and between 25781e1 and da1b995.

📒 Files selected for processing (4)
  • agentwatch/core/blast_radius.py
  • agentwatch/core/safety.py
  • agentwatch/core/schema.py
  • tests/test_blast_radius_causal.py

@sreerevanth sreerevanth left a comment

This is a valuable feature and the implementation direction looks reasonable. However, because it changes core safety-engine decision making and can automatically escalate actions, additional regression coverage is required before merge.

Please add tests demonstrating:

  • Safe SQL operations are not incorrectly escalated.
  • Normal filesystem operations are not incorrectly escalated.
  • Existing safety-engine behavior remains unchanged for low-risk actions.
  • Backward compatibility of current safety checks.

Once regression coverage is expanded, this should be ready for another review.
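
The requested negative cases could take roughly this shape. To stay self-contained, the sketch tests a hypothetical stand-in function, `requires_escalation`, rather than the real BlastRadiusEstimator or SafetyEngine; in the PR the same cases would be run against those classes through the existing test helper.

```python
import re


def requires_escalation(raw: str) -> bool:
    """Hypothetical stand-in for the estimator's approval decision."""
    # DELETE or UPDATE at the start with no WHERE anywhere after it.
    unguarded_write = (
        re.search(r"(?i)^\s*(delete|update)\b(?!.*\bwhere\b)", raw) is not None
    )
    recursive_delete = re.search(r"\brm\s+-rf\b", raw) is not None
    return unguarded_write or recursive_delete


# Low-risk actions that must pass through without escalation.
SAFE_ACTIONS = [
    "SELECT id FROM orders WHERE status = 'open'",
    "UPDATE orders SET status = 'done' WHERE id = 7",
    "ls -la /tmp",
    "cat README.md",
]


def test_safe_actions_are_not_escalated():
    for raw in SAFE_ACTIONS:
        assert not requires_escalation(raw), raw


def test_risky_action_still_escalates():
    assert requires_escalation("DELETE FROM users")
```

Pairing each negative case with at least one positive control guards against a regression test suite that passes only because escalation stopped firing altogether.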
