fix(alerting): make PagerDuty routing_key configurable via AlertingConfig#108
fix(alerting): make PagerDuty routing_key configurable via AlertingConfig#108anshul23102 wants to merge 1 commit into
Conversation
…nfig The PagerDuty routing_key was hardcoded as the literal string 'agentwatch'. PagerDuty's Events API v2 requires routing_key to be the 32-character integration key obtained from a specific PagerDuty service. The hardcoded value caused every PagerDuty delivery to fail with HTTP 400, meaning critical security events never reached on-call engineers. Changes: - Add pagerduty_routing_key: str | None = None field to AlertingConfig - Replace hardcoded 'agentwatch' with self._config.pagerduty_routing_key (falls back to empty string when not configured) Closes sreerevanth#99
📝 WalkthroughWalkthroughThe PR makes PagerDuty alert routing configurable by replacing a hardcoded routing key with a configurable field in ChangesPagerDuty Routing Key Configuration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agentwatch/alerting/engine.py`:
- Line 80: The code currently builds the PagerDuty payload with "routing_key":
self._config.pagerduty_routing_key or "" which still triggers HTTP 400s; update
alert_event() to validate self._config.pagerduty_routing_key before calling
_post(): if pagerduty_webhook_url is present but pagerduty_routing_key is None
or empty, skip calling _post(), emit a clear warning via the logger indicating
the missing/invalid PagerDuty routing key, and return early (do not attempt
delivery); keep the existing _post() error handling for genuine delivery errors.
- Line 20: The PagerDuty routing key is never wired into AlertingConfig so
AlertingConfig.pagerduty_routing_key stays None/empty; update the code that
constructs AlertingConfig (e.g., in agentwatch/api/server.py where
AlertingConfig is instantiated) to pass os.getenv("PAGERDUTY_KEY") into the
pagerduty_routing_key parameter so AlertingConfig.pagerduty_routing_key (the
attribute defined in agentwatch/alerting/engine.py) receives the actual env
value and the PagerDuty payload uses a non-empty routing_key when available.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d4ad50f2-da78-4594-8ffb-a7006bb5806f
📒 Files selected for processing (1)
agentwatch/alerting/engine.py
| class AlertingConfig: | ||
| slack_webhook_url: str | None = None | ||
| pagerduty_webhook_url: str | None = None | ||
| pagerduty_routing_key: str | None = None |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for PAGERDUTY_KEY environment variable usage
rg -n "PAGERDUTY_KEY" --type=py
# Search for pagerduty_routing_key parameter usage
rg -n "pagerduty_routing_key\s*=" --type=py -C3Repository: sreerevanth/AgentWatch
Length of output: 48
Fix PagerDuty routing key wiring (PAGERDUTY_KEY)
agentwatch/alerting/engine.pysends the PagerDuty payloadrouting_keyasself._config.pagerduty_routing_key or "", butPAGERDUTY_KEYis never referenced/wired anywhere in the Python codebase, sopagerduty_routing_keyremainsNone/empty.- This means PagerDuty can receive an empty routing key even when
pagerduty_webhook_urlis set.
🔌 Suggested fix
In agentwatch/api/server.py, pass the routing key into AlertingConfig:
AlertingConfig(
slack_webhook_url=os.getenv("SLACK_WEBHOOK_URL"),
pagerduty_webhook_url=os.getenv("PAGERDUTY_WEBHOOK_URL"),
pagerduty_routing_key=os.getenv("PAGERDUTY_KEY"),
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/alerting/engine.py` at line 20, The PagerDuty routing key is never
wired into AlertingConfig so AlertingConfig.pagerduty_routing_key stays
None/empty; update the code that constructs AlertingConfig (e.g., in
agentwatch/api/server.py where AlertingConfig is instantiated) to pass
os.getenv("PAGERDUTY_KEY") into the pagerduty_routing_key parameter so
AlertingConfig.pagerduty_routing_key (the attribute defined in
agentwatch/alerting/engine.py) receives the actual env value and the PagerDuty
payload uses a non-empty routing_key when available.
| }, | ||
| "pagerduty": { | ||
| "routing_key": "agentwatch", | ||
| "routing_key": self._config.pagerduty_routing_key or "", |
There was a problem hiding this comment.
Major: Empty string fallback still causes HTTP 400 errors.
The fallback to an empty string ("") when pagerduty_routing_key is not configured will still result in HTTP 400 errors from PagerDuty, as the Events API v2 requires a valid 32-character integration key. While the existing error handling in _post() (lines 96-104) will catch the exception and log a warning, operators won't understand that the failure is due to a missing routing key rather than a network or service issue.
Consider adding validation in alert_event() to skip PagerDuty delivery (with a clear warning) when pagerduty_webhook_url is set but pagerduty_routing_key is empty or None, rather than attempting delivery that will always fail.
🛡️ Proposed fix to validate routing_key before delivery
risk_level = event.safety.risk_level if event.safety else RiskLevel.SAFE
if self._config.pagerduty_webhook_url and self._should_page(risk_level):
+ if not self._config.pagerduty_routing_key:
+ logger.warning(
+ "PagerDuty webhook configured but routing_key is missing. "
+ "Set PAGERDUTY_KEY environment variable."
+ )
+ else:
- sent["pagerduty"] = await self._post(
- self._config.pagerduty_webhook_url,
- payload["pagerduty"],
- )
+ sent["pagerduty"] = await self._post(
+ self._config.pagerduty_webhook_url,
+ payload["pagerduty"],
+ )
return sent🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/alerting/engine.py` at line 80, The code currently builds the
PagerDuty payload with "routing_key": self._config.pagerduty_routing_key or ""
which still triggers HTTP 400s; update alert_event() to validate
self._config.pagerduty_routing_key before calling _post(): if
pagerduty_webhook_url is present but pagerduty_routing_key is None or empty,
skip calling _post(), emit a clear warning via the logger indicating the
missing/invalid PagerDuty routing key, and return early (do not attempt
delivery); keep the existing _post() error handling for genuine delivery errors.
|
Hi @sreerevanth, a gentle follow-up on this PR. It has been 1 day since filing. All checks are passing and there are no conflicts. Please review when you get a chance. Happy to make any adjustments. |
Description
_build_payloadhardcoded"routing_key": "agentwatch"in the PagerDuty payload. PagerDuty's Events API v2 requiresrouting_keyto be the 32-character integration key obtained from the specific PagerDuty service. The hardcoded value caused every PagerDuty delivery to fail with HTTP 400, meaning critical security events never reached on-call engineers, even whenpagerduty_webhook_urlwas correctly configured.Related Issue
Closes #99
Type of Change
Root Cause
routing_keywas set to the literal string"agentwatch"instead of being sourced from configuration. Every user or deployment has a different PagerDuty integration key. Hardcoding any specific value makes all deliveries fail for all users.Changes Made
agentwatch/alerting/engine.pypagerduty_routing_key: str | None = Nonefield toAlertingConfigagentwatch/alerting/engine.py"routing_key": "agentwatch"with"routing_key": self._config.pagerduty_routing_key or ""in_build_payloadScreenshots or Demo
Not applicable (no UI change).
Testing Done
ruff checkandruff format --checkpass with no issues.To verify:
AlertingConfig(pagerduty_routing_key="your-32-char-key", pagerduty_webhook_url="https://events.pagerduty.com/v2/enqueue").AlertingEngine(config).alert_event(high_risk_event).routing_keyand creates an incident.Checklist
NSSoC Label Request
@sreerevanth could you please add the appropriate NSoC 26' label to this PR? It helps with tracking and scoring under NSoC '26. Thank you!
Summary by CodeRabbit