fix: resolve open bugs #456-#460 across scheduler/retry pipelines (#462)
- #459 (low): `waitForActiveChecks` poll uses `trackTimeout` so the 100ms recursive timer is cancelled by `stopScheduler()`, eliminating the leaked handles that surfaced during graceful shutdown.
- #460 (low): webhook-URL backfill now skips only the corrupted field (`url` OR `secret`) instead of the whole row, so a malformed `url` no longer blocks encryption of a valid `whsec_` secret (and vice versa).
- #457 (med): if any Drizzle-referenced schema (monitor health cols, `pending_retry_at`, `notification_queue` cols, `automation_subscriptions`) fails to provision, install a global `/api` 503 `SCHEMA_NOT_READY` gate so clients get a clear signal instead of silent 500s per route.
- #458 (med): webhook retries are now atomically claimed via `UPDATE ... SET status='processing', claimed_at=NOW() WHERE id=? AND attempt=? RETURNING`, so replicas under Replit autoscale never double-deliver. A new `claimed_at` column enables stalled-row recovery (rows stuck >5min re-queued to 'pending'). The same infrastructure is generalised via `storage.claimDeliveryForRetry` / `recoverStalledDeliveries`.
- #456 (high): automation (Zapier REST Hook) deliveries now classify transient (5xx/429/408/network) vs persistent (4xx) failures. On transient failure we enqueue a `delivery_log` row and a new automation retry cron re-attempts up to 3 times with cumulative backoff. Only persistent failures count toward `consecutive_failures` / auto-deactivation, so a flaky upstream can't silently kill a subscription after one dropped event.

https://claude.ai/code/session_016C3aevHcGMqJL7MaEHdqN9
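The #458 claim semantics can be made concrete without a database. A hedged TypeScript sketch (illustrative, not the PR's code): an in-memory `Map` stands in for the `delivery_log` row, and a compare-and-set stands in for the single `UPDATE ... RETURNING` statement, so only one replica's claim can succeed.

```typescript
// Hedged model of the #458 atomic claim. The Map stands in for delivery_log;
// the compare-and-set below stands in for:
//   UPDATE ... SET status='processing', claimed_at=NOW()
//   WHERE id=? AND status='pending' AND attempt=? RETURNING *
type DeliveryRow = {
  id: number;
  attempt: number;
  status: "pending" | "processing";
  claimedAt: Date | null;
};

const rows = new Map<number, DeliveryRow>();
rows.set(1, { id: 1, attempt: 0, status: "pending", claimedAt: null });

function claimDeliveryForRetry(id: number, expectedAttempt: number): DeliveryRow | null {
  const row = rows.get(id);
  // The WHERE clause: only a pending row at the expected attempt can be claimed.
  if (!row || row.status !== "pending" || row.attempt !== expectedAttempt) return null;
  row.status = "processing";
  row.claimedAt = new Date();
  return row;
}

const replicaA = claimDeliveryForRetry(1, 0); // wins the claim
const replicaB = claimDeliveryForRetry(1, 0); // row is already 'processing', gets null
console.log(replicaA !== null, replicaB === null); // true true
```

Under real concurrency the database serializes the `UPDATE`, which is what makes the claim safe across autoscale replicas; the in-memory version only illustrates the state machine.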
Caution: Review failed — pull request was closed or merged during review.

📝 Walkthrough

Adds delivery-retry and recovery for automation/webhook channels, including a nullable `claimed_at` column on `delivery_log`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant Storage
    participant Automation as Automation API
    participant DB as Database
    Scheduler->>Storage: recoverStalledDeliveries("automation", cutoff)
    Storage->>DB: UPDATE delivery_log SET status='pending', claimed_at=NULL WHERE status='processing' AND claimed_at < cutoff
    DB-->>Storage: recovered count
    Scheduler->>Storage: getPendingAutomationRetries()
    Storage->>DB: SELECT * FROM delivery_log WHERE channel='automation' AND status='pending' AND attempt < max
    DB-->>Storage: pending rows
    Storage-->>Scheduler: pending entries
    Scheduler->>Storage: claimDeliveryForRetry(id, expectedAttempt, "automation")
    Storage->>DB: UPDATE ... SET status='processing', claimed_at=NOW() WHERE id=? AND status='pending' AND attempt=?
    DB-->>Storage: claimed entry or null
    alt claimed
        Scheduler->>Automation: performAutomationDelivery(hookUrl, payload)
        Automation-->>Scheduler: {kind: success|transient|persistent, statusCode?, error?}
        alt success
            Scheduler->>Storage: updateDeliveryLog(id, {status:'success', deliveredAt:NOW(), claimedAt:null})
            Scheduler->>Storage: touchAndResetAutomationSubscription(...)
        else transient
            Scheduler->>Storage: addDeliveryLog({...status:'pending', attempt:1, response:{transient:true,...}})
            Scheduler->>Storage: updateDeliveryLog(id, {status:'failed' or other, claimedAt:null})
        else persistent
            Scheduler->>Storage: handleDeliveryFailure(...) (increment failures/deactivate)
            Scheduler->>Storage: updateDeliveryLog(id, {claimedAt:null})
        end
    else not claimed
        Note over Scheduler: skip (another replica claimed it)
    end
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
shared/schema.ts (1)
327-331: 🧹 Nitpick | 🔵 Trivial

Index `claimed_at` for stalled delivery recovery.

The retry recovery path filters by `channel`, `status`, and stale `claimedAt` every minute, but the current index uses `createdAt` instead. Add a matching index here and in `ensureChannelTables()` before `delivery_log` grows enough for recovery scans to become expensive.

Suggested index addition:

```diff
 }, (table) => ({
   monitorCreatedIdx: index("delivery_log_monitor_created_idx").on(table.monitorId, table.createdAt),
   channelStatusAttemptIdx: index("delivery_log_channel_status_attempt_idx").on(table.channel, table.status, table.createdAt, table.attempt),
+  channelStatusClaimedIdx: index("delivery_log_channel_status_claimed_idx").on(table.channel, table.status, table.claimedAt),
 }));
```

Also mirror this in `server/services/ensureTables.ts`:

```diff
 await db.execute(sql`CREATE INDEX IF NOT EXISTS delivery_log_channel_status_attempt_idx ON delivery_log(channel, status, created_at, attempt)`);
+await db.execute(sql`CREATE INDEX IF NOT EXISTS delivery_log_channel_status_claimed_idx ON delivery_log(channel, status, claimed_at)`);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@shared/schema.ts` around lines 327 - 331, The delivery retry recovery queries filter by channel, status and stale claimedAt but the schema only indexes createdAt; add a new index on the delivery_log table covering (channel, status, claimedAt) so recovery scans use the index (in the schema where claimedAt is defined alongside monitorCreatedIdx and channelStatusAttemptIdx create e.g. a channelStatusClaimedIdx index on table.channel, table.status, table.claimedAt), and make the same change inside the ensureChannelTables()/ensureTables.ts logic that creates/ensures delivery_log so the physical DB gets the identical index before the table grows large.

server/storage.ts (1)
10-76: 🛠️ Refactor suggestion | 🟠 Major

Add the new retry primitives to `IStorage`.

The new retry APIs only exist on `DatabaseStorage`. Any test double or alternate implementation typed as `IStorage` will miss the scheduler's new contract. Add the new delivery-log methods to the interface so storage remains a complete boundary.

Proposed interface additions:

```diff
   cleanupStaleAutomationSubscriptions(olderThan: Date): Promise<number>;
   getRecentChangesForMonitors(monitorIds: number[], limit: number): Promise<MonitorChange[]>;
+
+  // Delivery log retry primitives
+  updateDeliveryLog(
+    id: number,
+    updates: Partial<Pick<DeliveryLogEntry, "status" | "attempt" | "response" | "deliveredAt" | "claimedAt">>,
+  ): Promise<void>;
+  getPendingWebhookRetries(): Promise<DeliveryLogEntry[]>;
+  claimDeliveryForRetry(id: number, expectedAttempt: number, channel: string): Promise<DeliveryLogEntry | null>;
+  claimWebhookDeliveryForRetry(id: number, expectedAttempt: number): Promise<DeliveryLogEntry | null>;
+  recoverStalledDeliveries(channel: string, olderThan: Date): Promise<number>;
+  recoverStalledWebhookDeliveries(olderThan: Date): Promise<number>;
+  getPendingAutomationRetries(): Promise<DeliveryLogEntry[]>;
+  getAutomationSubscriptionById(id: number): Promise<AutomationSubscription | undefined>;
 }
```

As per coding guidelines: “Never put database queries or Drizzle ORM calls directly in route handlers — all database access must go through methods on the `IStorage` interface implemented in `server/storage.ts`.”

Also applies to: 443-524
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/storage.ts` around lines 10 - 76, The IStorage interface is missing the new retry/delivery-log primitives implemented on DatabaseStorage; add the same method signatures to IStorage so all storage implementations conform to the scheduler contract. Specifically, in the IStorage declaration add the delivery-retry methods (for example) createDeliveryLog(monitorId: number, payload: string, nextAttemptAt: Date): Promise<DeliveryLog>, getDeliveryLog(id: number): Promise<DeliveryLog | undefined>, listPendingDeliveryLogs(limit: number): Promise<DeliveryLog[]>, incrementDeliveryAttempt(id: number, nextAttemptAt: Date): Promise<number>, markDeliveryDelivered(id: number): Promise<void>, and deleteDeliveryLog(id: number): Promise<void> (or the exact method names/signatures used on DatabaseStorage) so DatabaseStorage and any test doubles satisfy the new scheduler contract.

server/services/scheduler.ts (1)
73-85: ⚠️ Potential issue | 🟠 Major

Resolve active-check waiters when stopping the scheduler.

Line 80 now tracks the shutdown polling timer, but `stopScheduler()` clears tracked timers at lines 584-585. If `activeChecks > 0`, the next poll is cancelled and the `waitForActiveChecks()` promise never reaches either the active-check condition or its timeout. That can hang shutdown despite the documented timeout guarantee.

Proposed fix:

```diff
 const cronTasks: ReturnType<typeof cron.schedule>[] = [];
 const pendingTimeouts = new Set<ReturnType<typeof setTimeout>>();
+const activeCheckWaiters = new Set<() => void>();
@@
 export function waitForActiveChecks(timeoutMs: number): Promise<void> {
   return new Promise((resolve) => {
+    let settled = false;
+    const finish = () => {
+      if (settled) return;
+      settled = true;
+      activeCheckWaiters.delete(finish);
+      resolve();
+    };
+    activeCheckWaiters.add(finish);
     const start = Date.now();
     const check = () => {
       if (activeChecks === 0 || Date.now() - start > timeoutMs) {
-        resolve();
+        finish();
       } else {
         trackTimeout(check, 100);
       }
@@
   pendingTimeouts.forEach((handle) => {
     clearTimeout(handle);
   });
   pendingTimeouts.clear();
+  for (const finish of activeCheckWaiters) finish();
+  activeCheckWaiters.clear();
   retryBackoff.clear();
```

Also applies to: 579-585
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/services/scheduler.ts` around lines 73 - 85, The waitForActiveChecks Promise can hang when stopScheduler clears tracked timers; to fix, add a small shutdown-waiter mechanism: in waitForActiveChecks (function waitForActiveChecks) push the Promise's resolve into a global array (e.g., shutdownWaiters) in addition to scheduling poll timers with trackTimeout, and when stopScheduler runs (function stopScheduler) iterate shutdownWaiters and call each resolver before/while clearing tracked timers so any outstanding waiters are unblocked; ensure you remove the resolver from shutdownWaiters when the promise resolves normally and keep using activeChecks, trackTimeout, and the existing timeout logic unchanged.
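The waiter mechanism the comment describes can be sketched end-to-end. A minimal, self-contained TypeScript model; the names `activeChecks`, `trackTimeout`, and `stopScheduler` follow the review, but the bodies here are illustrative, not the project's actual implementation:

```typescript
// Illustrative model of the shutdown-waiter fix: waiters register their resolver
// so stopScheduler() can unblock them after cancelling the poll timers.
let activeChecks = 1;
const pendingTimeouts = new Set<ReturnType<typeof setTimeout>>();
const activeCheckWaiters = new Set<() => void>();

function trackTimeout(fn: () => void, ms: number): void {
  const handle = setTimeout(() => {
    pendingTimeouts.delete(handle);
    fn();
  }, ms);
  pendingTimeouts.add(handle);
}

function waitForActiveChecks(timeoutMs: number): Promise<void> {
  return new Promise((resolve) => {
    let settled = false;
    const finish = () => {
      if (settled) return;
      settled = true;
      activeCheckWaiters.delete(finish);
      resolve();
    };
    activeCheckWaiters.add(finish); // registered so shutdown can unblock us
    const start = Date.now();
    const check = () => {
      if (activeChecks === 0 || Date.now() - start > timeoutMs) finish();
      else trackTimeout(check, 100);
    };
    check();
  });
}

function stopScheduler(): void {
  pendingTimeouts.forEach((handle) => clearTimeout(handle));
  pendingTimeouts.clear();
  // Without this loop, a waiter whose poll timer was just cancelled would hang forever.
  for (const finish of activeCheckWaiters) finish();
  activeCheckWaiters.clear();
}

// activeChecks is still 1, so the waiter is polling; stopScheduler() must resolve it.
const done = waitForActiveChecks(60_000).then(() => "resolved");
stopScheduler();
done.then((msg) => console.log(msg)); // prints "resolved"
```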
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/routes.ts`:
- Around line 79-91: The ensureChannelTables() function currently swallows
errors during its bootstrap (so an ALTER that fails for delivery_log.claimed_at
can leave the API live but broken); change ensureChannelTables() to return a
boolean: true on success and false from its catch block (or rethrow and let
caller handle), then in the routes initialization replace the current
fire-and-forget call with const channelsReady = await ensureChannelTables(); and
if (!channelsReady) push "channel tables" into criticalSchemaFailures so the
readiness gate will block traffic until the schema fix is applied; reference
ensureChannelTables(), delivery_log.claimed_at, and criticalSchemaFailures when
making the changes.
In `@server/services/automationDelivery.test.ts`:
- Around line 145-173: Add tests in server/services/automationDelivery.test.ts
that mirror the existing "queues durable retry" cases for 500 and network errors
but use HTTP 429 and 408 responses: call deliverToAutomationSubscriptions with
mockGetActiveAutomationSubscriptions returning a subscription and
mockSsrfSafeFetch resolved with { ok: false, status: 429 } and then with status:
408; in each case assert mockAddDeliveryLog was called with an objectContaining
channel: "automation", status: "pending", response containing transient: true
(and include subscriptionId/platform when applicable) and assert
mockIncrementAutomationSubscriptionFailures and
mockTouchAndResetAutomationSubscription were not called. Use the same helper
functions (makeSub, makeMonitor, makeChange) and matchers as the existing tests
to keep style consistent.
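The transient/persistent split those tests exercise can be captured by a small classifier. A hedged sketch: `classifyStatus` is a hypothetical helper name, and it shows only the status-code half (network errors, i.e. fetch rejections, would map to transient in a separate catch path):

```typescript
// Hedged sketch of the #456 failure classification by HTTP status code.
type Outcome =
  | { kind: "success"; statusCode: number }
  | { kind: "transient"; statusCode: number; error: string }
  | { kind: "persistent"; statusCode: number; error: string };

function classifyStatus(statusCode: number): Outcome {
  if (statusCode >= 200 && statusCode < 300) return { kind: "success", statusCode };
  // Retryable: server errors, rate limiting (429), request timeout (408).
  if (statusCode >= 500 || statusCode === 429 || statusCode === 408) {
    return { kind: "transient", statusCode, error: `HTTP ${statusCode}` };
  }
  // Remaining 4xx (bad hook URL, gone subscription, ...) should not be retried.
  return { kind: "persistent", statusCode, error: `HTTP ${statusCode}` };
}

console.log(classifyStatus(503).kind); // transient
console.log(classifyStatus(429).kind); // transient
console.log(classifyStatus(404).kind); // persistent
```

Only the persistent branch would feed `consecutive_failures`; the transient branch enqueues a `delivery_log` retry row instead.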
In `@server/services/automationDelivery.ts`:
- Around line 71-116: The code currently uses Promise.allSettled on the array
deliveries (created around performAutomationDelivery, storage.addDeliveryLog and
handleDeliveryFailure) but ignores rejected results, allowing failures from
addDeliveryLog fallback or handleDeliveryFailure to be swallowed; change the
completion handling to inspect the results: await Promise.allSettled(deliveries)
then iterate the settled results and for any result.status === "rejected" log a
clear error (including the subscription id, monitor.id and the rejection reason)
and rethrow or return a non-zero failure (e.g., throw the first rejection) so
the caller sees the failure; alternatively use await Promise.all(deliveries) if
you want immediate bubbling of the first rejection—ensure logging uses the same
contextual fields passed to storage.addDeliveryLog/handleDeliveryFailure
(subscriptionId/sub.id, monitorId/monitor.id, platform/sub.platform) so failures
aren’t silent.
In `@server/services/scheduler.test.ts`:
- Around line 36-43: The test currently mocks automation retry hooks but never
exercises the automation retry loop because getPendingAutomationRetries returns
an empty array; update the test to return one or more pending retry records from
getPendingAutomationRetries and add assertions that claimDeliveryForRetry is
invoked with channel="automation", that when claimDeliveryForRetry returns null
the delivery is skipped, that when getAutomationSubscriptionById returns
undefined the code clears claimedAt on the pending retry record, and that a
successful delivery path invokes the delivery finalizer (the function your code
uses to finalize deliveries); also add focused assertions for edge cases (empty
arrays, null/undefined fields), error paths (invalid IDs, failed claim), and a
security/ownership case to satisfy the guidelines.
In `@server/services/scheduler.ts`:
- Around line 478-490: The delivery row is being marked failed via
withDbRetry(storage.updateDeliveryLog(..., entry.id)) before calling
finalizeAutomationRetry(sub.id,...), which can leave the row irrecoverable if
finalization fails; change this so the terminal delivery-log update and the
subscription failure/deactivation side effects are executed atomically in one
storage transaction (e.g., add a new storage method like
finalizeDeliveryAndSubscriptionFailure that accepts entry.id, sub.id,
outcome/platform/monitor and performs updateDeliveryLog plus the
finalizeAutomationRetry work inside a single DB transaction, or call both
updates inside an existing storage.transaction/withDbRetry wrapper) so that
either both the delivery row and subscription counters are committed or none
are.
---
Outside diff comments:
In `@server/services/scheduler.ts`:
- Around line 73-85: The waitForActiveChecks Promise can hang when stopScheduler
clears tracked timers; to fix, add a small shutdown-waiter mechanism: in
waitForActiveChecks (function waitForActiveChecks) push the Promise's resolve
into a global array (e.g., shutdownWaiters) in addition to scheduling poll
timers with trackTimeout, and when stopScheduler runs (function stopScheduler)
iterate shutdownWaiters and call each resolver before/while clearing tracked
timers so any outstanding waiters are unblocked; ensure you remove the resolver
from shutdownWaiters when the promise resolves normally and keep using
activeChecks, trackTimeout, and the existing timeout logic unchanged.
In `@server/storage.ts`:
- Around line 10-76: The IStorage interface is missing the new
retry/delivery-log primitives implemented on DatabaseStorage; add the same
method signatures to IStorage so all storage implementations conform to the
scheduler contract. Specifically, in the IStorage declaration add the
delivery-retry methods (for example) createDeliveryLog(monitorId: number,
payload: string, nextAttemptAt: Date): Promise<DeliveryLog>, getDeliveryLog(id:
number): Promise<DeliveryLog | undefined>, listPendingDeliveryLogs(limit:
number): Promise<DeliveryLog[]>, incrementDeliveryAttempt(id: number,
nextAttemptAt: Date): Promise<number>, markDeliveryDelivered(id: number):
Promise<void>, and deleteDeliveryLog(id: number): Promise<void> (or the exact
method names/signatures used on DatabaseStorage) so DatabaseStorage and any test
doubles satisfy the new scheduler contract.
In `@shared/schema.ts`:
- Around line 327-331: The delivery retry recovery queries filter by channel,
status and stale claimedAt but the schema only indexes createdAt; add a new
index on the delivery_log table covering (channel, status, claimedAt) so
recovery scans use the index (in the schema where claimedAt is defined alongside
monitorCreatedIdx and channelStatusAttemptIdx create e.g. a
channelStatusClaimedIdx index on table.channel, table.status, table.claimedAt),
and make the same change inside the ensureChannelTables()/ensureTables.ts logic
that creates/ensures delivery_log so the physical DB gets the identical index
before the table grows large.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6ab6ad94-bba6-42ea-8f7f-f1f2676307d0
📒 Files selected for processing (10)
- server/routes.conditions.test.ts
- server/routes.ts
- server/services/automationDelivery.test.ts
- server/services/automationDelivery.ts
- server/services/ensureTables.test.ts
- server/services/ensureTables.ts
- server/services/scheduler.test.ts
- server/services/scheduler.ts
- server/storage.ts
- shared/schema.ts
```ts
const apiKeysReady = await ensureApiKeysTable();
await ensureChannelTables();
await ensureMonitorChangesIndexes();
await ensureCampaignPartialIndexes();
const notificationQueueReady = await ensureNotificationQueueColumns();
if (!notificationQueueReady) {
  console.error("CRITICAL: notification_queue columns missing — notification cron queries will fail");
  criticalSchemaFailures.push("notification_queue columns");
}
await ensureTagTables();
const automationSubsReady = await ensureAutomationSubscriptionsTable();
if (!automationSubsReady) {
  console.error("CRITICAL: automation_subscriptions table missing — Zapier endpoints and automation delivery will fail");
  criticalSchemaFailures.push("automation_subscriptions table");
}
```
Gate traffic when delivery_log.claimed_at bootstrap fails.
The new schema-not-ready gate checks boolean ensure calls, but ensureChannelTables() still swallows failures. If the new claimed_at ALTER fails, retry/delivery-log code can hit column claimed_at does not exist while /api stays live.
Suggested readiness wiring:

```diff
 const apiKeysReady = await ensureApiKeysTable();
-await ensureChannelTables();
+const channelTablesReady = await ensureChannelTables();
+if (!channelTablesReady) {
+  criticalSchemaFailures.push("notification channel / delivery_log columns");
+}
 await ensureMonitorChangesIndexes();
```

And change `ensureChannelTables()` to return `true` on success and `false` in its catch block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/routes.ts` around lines 79 - 91, The ensureChannelTables() function
currently swallows errors during its bootstrap (so an ALTER that fails for
delivery_log.claimed_at can leave the API live but broken); change
ensureChannelTables() to return a boolean: true on success and false from its
catch block (or rethrow and let caller handle), then in the routes
initialization replace the current fire-and-forget call with const channelsReady
= await ensureChannelTables(); and if (!channelsReady) push "channel tables"
into criticalSchemaFailures so the readiness gate will block traffic until the
schema fix is applied; reference ensureChannelTables(), delivery_log.claimed_at,
and criticalSchemaFailures when making the changes.
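The readiness gate this prompt wires up can be reduced to a pure predicate that an `/api` middleware would consult. A hedged sketch; `schemaGate` is a hypothetical helper name:

```typescript
// Hedged sketch: the SCHEMA_NOT_READY gate as a pure function over the list of
// critical bootstrap failures collected during startup.
type GateResult =
  | { status: 200 }
  | { status: 503; body: { code: "SCHEMA_NOT_READY"; missing: string[] } };

function schemaGate(criticalSchemaFailures: string[]): GateResult {
  if (criticalSchemaFailures.length > 0) {
    // One clear 503 for every /api route instead of scattered per-route 500s.
    return { status: 503, body: { code: "SCHEMA_NOT_READY", missing: criticalSchemaFailures } };
  }
  return { status: 200 };
}

console.log(schemaGate([]).status); // 200
console.log(schemaGate(["channel tables"]).status); // 503
```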
```ts
const deliveries = subscriptions.map(async (sub) => {
  try {
    // Automation hook URLs (e.g. Zapier) are unguessable bearer tokens.
    // No HMAC signature header — the hookUrl itself authenticates the request.
    // Webhook channels use X-FTC-Signature-256 with a per-channel secret.
    const response = await ssrfSafeFetch(sub.hookUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "User-Agent": "FetchTheChange-Zapier/1.0",
      },
      body,
      signal: AbortSignal.timeout(5000),
    const outcome = await performAutomationDelivery(sub.hookUrl, body);

    if (outcome.kind === "success") {
      console.log(`[Automation] Delivered successfully (monitorId=${monitor.id}, platform=${sub.platform}, subId=${sub.id}, status=${outcome.statusCode})`);
      storage.touchAndResetAutomationSubscription(sub.id).catch((err) => {
        console.warn(`[Automation] Failed to reset failure counter for subscription ${sub.id}`, err);
      });
      return;
    }

    if (response.ok) {
      console.log(`[Automation] Delivered successfully (monitorId=${monitor.id}, platform=${sub.platform}, subId=${sub.id}, status=${response.status})`);
      // Atomic touch + failure counter reset in a single UPDATE
      storage.touchAndResetAutomationSubscription(sub.id).catch((err) => {
        console.warn(`[Automation] Failed to reset failure counter for subscription ${sub.id}`, err);
    if (outcome.kind === "transient") {
      // Enqueue a durable retry — cron will re-attempt up to 3 times total.
      try {
        await storage.addDeliveryLog({
          monitorId: monitor.id,
          changeId: change.id,
          channel: "automation",
          status: "pending",
          attempt: 1,
          response: {
            subscriptionId: sub.id,
            platform: sub.platform,
            error: outcome.error,
            transient: true,
          },
        });
        console.warn(`[scheduler] Automation delivery transient failure — queued for retry for monitor "${monitor.name}"`, {
          subscriptionId: sub.id,
          monitorId: monitor.id,
          platform: sub.platform,
          error: outcome.error,
        });
    } else {
      await handleDeliveryFailure(sub.id, monitor, sub.platform, `HTTP ${response.status}`);
      } catch (logErr) {
        // If we can't enqueue the retry, fall back to the old behavior to avoid losing all signal.
        console.error(`[Automation] Failed to enqueue retry delivery_log row — falling back to failure counter bump`, logErr);
        await handleDeliveryFailure(sub.id, monitor, sub.platform, outcome.error);
      }
  } catch (err) {
    // Sanitize error to avoid leaking hookUrl secrets in logs
    const rawMsg = err instanceof Error ? err.message : String(err);
    const safeError = rawMsg.replace(/https?:\/\/[^\s)]+/g, "[redacted-url]");
    await handleDeliveryFailure(sub.id, monitor, sub.platform, safeError);
    return;
  }

    // Persistent (4xx) — no retry; bump failure counter as before.
    await handleDeliveryFailure(sub.id, monitor, sub.platform, outcome.error);
});

await Promise.allSettled(deliveries);
```
Log rejected delivery tasks instead of swallowing them.
Promise.allSettled() prevents rethrowing, but the results are ignored. Any rejection from addDeliveryLog() fallback or handleDeliveryFailure() disappears, so failure counters/deactivation can fail silently.
Proposed fix:

```diff
-  await Promise.allSettled(deliveries);
+  const results = await Promise.allSettled(deliveries);
+  for (const [index, result] of results.entries()) {
+    if (result.status === "rejected") {
+      const sub = subscriptions[index];
+      console.error("[Automation] Delivery handling failed", {
+        subscriptionId: sub?.id,
+        monitorId: monitor.id,
+        platform: sub?.platform,
+        errorMessage: result.reason instanceof Error ? result.reason.message : String(result.reason),
+      });
+    }
+  }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/services/automationDelivery.ts` around lines 71 - 116, The code
currently uses Promise.allSettled on the array deliveries (created around
performAutomationDelivery, storage.addDeliveryLog and handleDeliveryFailure) but
ignores rejected results, allowing failures from addDeliveryLog fallback or
handleDeliveryFailure to be swallowed; change the completion handling to inspect
the results: await Promise.allSettled(deliveries) then iterate the settled
results and for any result.status === "rejected" log a clear error (including
the subscription id, monitor.id and the rejection reason) and rethrow or return
a non-zero failure (e.g., throw the first rejection) so the caller sees the
failure; alternatively use await Promise.all(deliveries) if you want immediate
bubbling of the first rejection—ensure logging uses the same contextual fields
passed to storage.addDeliveryLog/handleDeliveryFailure (subscriptionId/sub.id,
monitorId/monitor.id, platform/sub.platform) so failures aren’t silent.
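The settled-results inspection this prompt asks for is easy to verify in isolation. A minimal runnable TypeScript model, independent of the project's storage layer:

```typescript
// Minimal model of the fix: inspect settled results instead of dropping them.
const deliveries: Promise<string>[] = [
  Promise.resolve("delivered"),
  Promise.reject(new Error("handleDeliveryFailure blew up")),
];

const results = await Promise.allSettled(deliveries);
const failures = results.filter(
  (r): r is PromiseRejectedResult => r.status === "rejected",
);
for (const f of failures) {
  // In the real code this log line would carry subscriptionId / monitorId / platform.
  console.error("[Automation] Delivery handling failed", {
    errorMessage: f.reason instanceof Error ? f.reason.message : String(f.reason),
  });
}
console.log(failures.length); // 1
```

`Promise.allSettled` never rejects, so without this loop a rejection from the fallback path would vanish silently.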
```ts
claimDeliveryForRetry: vi.fn().mockImplementation(async (id: number, attempt: number, channel: string) => ({
  id, attempt, channel, status: "processing", createdAt: new Date(),
  monitorId: id, changeId: id, response: null, deliveredAt: null, claimedAt: new Date(),
})),
recoverStalledWebhookDeliveries: vi.fn().mockResolvedValue(0),
recoverStalledDeliveries: vi.fn().mockResolvedValue(0),
getPendingAutomationRetries: vi.fn().mockResolvedValue([]),
getAutomationSubscriptionById: vi.fn().mockResolvedValue(undefined),
```
Add assertions for the automation retry cron path.
These mocks wire the new automation retry dependencies, but the default getPendingAutomationRetries: [] means no test exercises the loop. Add coverage for: due retry is claimed with channel="automation", claim miss skips delivery, missing subscriptionId/subscription clears claimedAt, and successful delivery calls the finalizer.
As per coding guidelines, Tests must cover edge cases and error paths - include assertions for edge cases (empty inputs, boundary values, null/undefined), error paths (invalid input, unauthorized access, not found), and security-relevant scenarios (ownership violations, SSRF attempts).
Also applies to: 68-70
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/services/scheduler.test.ts` around lines 36 - 43, The test currently
mocks automation retry hooks but never exercises the automation retry loop
because getPendingAutomationRetries returns an empty array; update the test to
return one or more pending retry records from getPendingAutomationRetries and
add assertions that claimDeliveryForRetry is invoked with channel="automation",
that when claimDeliveryForRetry returns null the delivery is skipped, that when
getAutomationSubscriptionById returns undefined the code clears claimedAt on the
pending retry record, and that a successful delivery path invokes the delivery
finalizer (the function your code uses to finalize deliveries); also add focused
assertions for edge cases (empty arrays, null/undefined fields), error paths
(invalid IDs, failed claim), and a security/ownership case to satisfy the
guidelines.
```ts
} else if (outcome.kind === "persistent" || nextAttempt >= 3) {
  await withDbRetry(() => storage.updateDeliveryLog(entry.id, {
    status: "failed",
    attempt: nextAttempt,
    claimedAt: null,
    response: {
      subscriptionId: sub.id,
      platform,
      error: outcome.error,
      transient: outcome.kind === "transient",
    } as Record<string, unknown>,
  }));
  await finalizeAutomationRetry(sub.id, monitor, platform, outcome);
```
Make terminal retry finalization atomic with the subscription failure update.
This marks the delivery row failed and clears claimedAt before finalizeAutomationRetry() increments/deactivates the subscription. If that finalization storage call fails, the row is no longer recoverable and the subscription failure counter stays stale. Commit the delivery-log terminal status and subscription failure/deactivation side effects in one storage transaction, or leave the row recoverable until finalization succeeds.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/services/scheduler.ts` around lines 478 - 490, The delivery row is
being marked failed via withDbRetry(storage.updateDeliveryLog(..., entry.id))
before calling finalizeAutomationRetry(sub.id,...), which can leave the row
irrecoverable if finalization fails; change this so the terminal delivery-log
update and the subscription failure/deactivation side effects are executed
atomically in one storage transaction (e.g., add a new storage method like
finalizeDeliveryAndSubscriptionFailure that accepts entry.id, sub.id,
outcome/platform/monitor and performs updateDeliveryLog plus the
finalizeAutomationRetry work inside a single DB transaction, or call both
updates inside an existing storage.transaction/withDbRetry wrapper) so that
either both the delivery row and subscription counters are committed or none
are.
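The atomicity argument can be demonstrated without a database. A hedged in-memory model: mutate a copy and commit only at a single point, so a failed finalization leaves the delivery row in its recoverable state. `finalizeDeliveryAndSubscriptionFailure` is the hypothetical method name the comment proposes:

```typescript
// Hedged model of the atomicity concern: a copied object plus a single commit
// point stands in for a DB transaction wrapping both side effects.
type State = { deliveryStatus: "processing" | "failed"; failureCount: number };

function finalizeDeliveryAndSubscriptionFailure(state: State, finalizeThrows: boolean): State {
  const next: State = { ...state, deliveryStatus: "failed" }; // work on a copy
  if (finalizeThrows) throw new Error("finalization failed: nothing committed");
  next.failureCount += 1; // subscription failure bump
  return next; // commit: both effects land together
}

let current: State = { deliveryStatus: "processing", failureCount: 0 };
try {
  current = finalizeDeliveryAndSubscriptionFailure(current, true);
} catch {
  // current is unchanged: the delivery row stays 'processing' and recoverable
}
console.log(current.deliveryStatus, current.failureCount); // processing 0

current = finalizeDeliveryAndSubscriptionFailure(current, false);
console.log(current.deliveryStatus, current.failureCount); // failed 1
```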
- scheduler.ts: waitForActiveChecks now tracks its resolver in activeCheckWaiters; stopScheduler() resolves pending waiters after cancelling the poll timer so shutdown cannot hang.
- automationDelivery.ts: inspect Promise.allSettled results and log rejected delivery tasks with subscription / monitor context instead of swallowing them (failures from the addDeliveryLog fallback or handleDeliveryFailure are no longer silent).
- ensureTables.ts: ensureChannelTables() now returns a boolean and its result is added to criticalSchemaFailures in registerRoutes, so a failed `ALTER TABLE delivery_log ADD COLUMN claimed_at` triggers the /api SCHEMA_NOT_READY gate instead of leaving the API live with broken retry queries.
- ensureTables.ts + shared/schema.ts: new delivery_log_channel_status_claimed_idx (channel, status, claimed_at) so the per-tick stalled-delivery recovery UPDATE is index-backed.
- storage.ts: added the new delivery-log retry primitives (addDeliveryLog, updateDeliveryLog, claimDeliveryForRetry, recoverStalledDeliveries, getPendingAutomationRetries, getAutomationSubscriptionById, and deprecated webhook-specific wrappers) to the IStorage interface so test doubles and alternate implementations conform to the scheduler contract.
- automationDelivery.test.ts: added parametrised 408 / 429 coverage to prove those statuses take the transient-retry path and do not bump consecutive_failures.

https://claude.ai/code/session_016C3aevHcGMqJL7MaEHdqN9
…atch
Phase 5 skeptic discoveries addressed:
browserlessTracker (H1/H2): Same direct-insert pattern resendTracker had —
raw db.insert(errorLogs).values({level:"info",...}) bypassed the dedup
upsert. After the 6h cooldown elapsed on an unresolved alert row, the
next INSERT would hit the partial unique index and throw, silently
swallowing the alert (no email sent during the threshold event).
Converted to ErrorLogger.info, matching the resendTracker fix.
scraper.ts transient console.warn paths (M1/M2): Two paths on this
branch use console.warn for transient failures. console.warn bypasses
ErrorLogger.sanitizeString/sanitizeContext entirely:
- Line ~1460 (save-failure retry): was logging extractedValue/previousValue
which contain user-scraped page content — could include API tokens,
PII, or internal URLs. Moved those fields to the non-transient
ErrorLogger.error branch only; transient stdout log now contains
only triage metadata. Test updated to assert the redaction.
- Line ~1590 (outer-catch transient): was interpolating raw error.message
into the stdout line. undici/fetch errors echo the full requested URL
(userinfo creds, ?api_key=... query). Now uses the classified
logContext only; the ErrorLogger.error branch keeps the full message
since it runs through the sanitizer.
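The kind of URL redaction the sanitizer performs can be sketched like this. This is a hypothetical helper, not the project's actual sanitizeString; it assumes the goal stated above, keeping only scheme and host so userinfo credentials and query-string secrets never reach stdout.

```typescript
// Hypothetical sketch of URL redaction of the kind ErrorLogger's
// sanitizer applies; not the project's actual sanitizeString.
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

function redactUrls(message: string): string {
  return message.replace(URL_PATTERN, (raw) => {
    try {
      const u = new URL(raw);
      // Keep only scheme + host: this drops userinfo credentials,
      // the path, and query-string secrets like ?api_key=...
      return `${u.protocol}//${u.hostname}`;
    } catch {
      return "<invalid-url>";
    }
  });
}
```

Routing the raw error.message through a pass like this (or logging only classified metadata, as the fix does) closes the leak without losing the host needed for triage.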
Count endpoint / UI dropdown mismatch (M4): The nav badge's count
endpoint (GET /api/admin/error-logs/count) counted all unresolved rows
regardless of level, while the UI filter dropdown only offers
"error"/"info" after the warning-level narrowing. Legacy warning rows
inflated the badge but were unreachable from the filter — admin would
see "7 unresolved" then find 4 after clicking through. Narrowed count
query with inArray(level, ["error","info"]) so badge and filter agree.
Browserless-outage stdout flooding (M5): The
"Browserless service unavailable" console.warn used to dedup at DB
level via ErrorLogger.warning's partial-index upsert. After the
warning-level removal, every monitor check during an outage emits a
fresh stdout line — N monitors x M replicas. Added a module-level
per-process throttle (60s cooldown per message key) for the two
infra-wide warnings ("service unavailable" and "circuit breaker open").
Exported _resetInfraWarnThrottleForTests hook; scraper.test.ts
beforeEach resets it so tests observe the first emit deterministically.
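The per-process throttle can be sketched as below. The 60s cooldown matches the description above; the function and variable names are illustrative, not the module's actual exports.

```typescript
// Sketch of a per-process warn throttle: one emit per message key
// per cooldown window. Names are illustrative.
const WARN_COOLDOWN_MS = 60_000;
const lastWarnAt = new Map<string, number>();

function throttledWarn(key: string, message: string, now = Date.now()): boolean {
  const last = lastWarnAt.get(key);
  if (last !== undefined && now - last < WARN_COOLDOWN_MS) {
    return false; // suppressed: still inside the cooldown window
  }
  lastWarnAt.set(key, now);
  console.warn(message);
  return true;
}

// Test hook in the spirit of _resetInfraWarnThrottleForTests:
// clears state so a test's first emit is deterministic.
function resetThrottleForTests(): void {
  lastWarnAt.clear();
}
```

Because the map is module-level, the throttle is per-process: each replica still emits once per window, which bounds the outage flood to M lines per minute instead of N×M.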
Verified #462 fixes survived the revert: SCHEMA_NOT_READY gate,
activeCheckWaiters, claimed_at / claimDeliveryForRetry /
recoverStalledDeliveries all present.
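The claimDeliveryForRetry / recoverStalledDeliveries primitives verified above can be sketched against an in-memory stand-in for delivery_log. The real versions are single SQL `UPDATE ... RETURNING` statements; this sketch only illustrates the optimistic-concurrency rule and the stalled-row cutoff, with assumed field names.

```typescript
// In-memory stand-in for delivery_log rows (field names assumed).
interface DeliveryRow {
  id: number;
  channel: "webhook" | "automation";
  status: "pending" | "processing" | "success" | "failed";
  attempt: number;
  claimedAt: number | null; // epoch ms
}

const rows: DeliveryRow[] = [
  { id: 1, channel: "webhook", status: "pending", attempt: 1, claimedAt: null },
];

// Atomic claim: only one caller can flip pending -> processing for a
// given (id, attempt); a second replica finds no matching row.
function claimDeliveryForRetry(
  id: number,
  expectedAttempt: number,
  channel: string,
  now = Date.now(),
): DeliveryRow | null {
  const row = rows.find(
    (r) => r.id === id && r.channel === channel && r.status === "pending" && r.attempt === expectedAttempt,
  );
  if (!row) return null;
  row.status = "processing";
  row.claimedAt = now;
  return row;
}

// Crashed-replica recovery: rows stuck in processing past the cutoff
// go back to pending so a later tick can pick them up.
function recoverStalledDeliveries(channel: string, olderThanMs: number, now = Date.now()): number {
  let recovered = 0;
  for (const r of rows) {
    if (r.channel === channel && r.status === "processing" && r.claimedAt !== null && now - r.claimedAt > olderThanMs) {
      r.status = "pending";
      r.claimedAt = null;
      recovered++;
    }
  }
  return recovered;
}
```

In SQL the same guarantee comes from the WHERE clause of a single UPDATE, so no explicit row locking is needed across replicas.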
Deferred to Phase 6 bug report (pre-existing / not regressions):
- safeHostname "unknown" fallback ambiguity
- errorLogs.level has no DB CHECK constraint
- Dual warning/info rows during deploy (operator migration note for
PR body: legacy resend_alert_* warning rows should be resolved or
deleted at deploy to avoid apparent duplicate alerts)
All 2475 tests pass; npm run check clean.
…ise (#469)

* Revert "Remove admin error logging system (#461)"

This reverts commit 1593135.

* Remove warning-level from ErrorLogger; keep error/info in Event Log

User intent on #461 was to silence the Browserless warning-noise flood only; reverting #461 brought the Event Log back but also re-enabled warning-level writes that produced the original noise. This drops ErrorLogger.warning() entirely and rewrites every call site to console.warn with the same `[source] message` prefix, so:
- Transient/noisy warnings still surface in server logs for ops triage
- The admin Event Log only shows error- and info-level events
- Dedup/aggregation still applies to the remaining error+info rows

Test mocks updated to spy on console.warn instead of ErrorLogger.warning.

* test: regression guards for ErrorLogger.warning() removal

Adds two assertions to logger.test.ts:
- ErrorLogger.warning is undefined (prevents accidental reintroduction)
- ErrorLogger.info routes only to console.log, never console.warn/.error

* security: redact user URLs in error_logs context; drop warning-level API filter

Phase 2 security findings from /magicwand review:

H1/H2 (high): Replace `url: monitor.url` with `hostname: safeHostname(monitor.url)` at every log site on the branch. Monitor URLs are user-supplied and can carry basic-auth credentials (`https://user:pw@host`) or query-string secrets (`?api_key=…`) that the ErrorLogger's sanitizeString regex list does not catch. This re-applies PR #461's CodeRabbit hardening on top of the Event Log revert. Sites patched:
- server/services/scraper.ts (5 call sites — error, console.warn, circuit-breaker)
- server/services/scheduler.ts (1 — scheduled-check failed)
- server/services/email.ts (1 — notification email failed)

Also drops `rawBrowserlessMsg` from the site-specific console.warn at scraper.ts:1283 — upstream errors can echo back the requested URL with credentials. classifiedReason already captures enough triage detail. Updates the stale comment claiming logger sanitization applies.

M1 (medium): API `level` filter in GET /api/admin/error-logs still accepted `"warning"` after commit 3541155 removed the warning-level write path. Narrows the allowlist (and batch-delete Zod enum) to ["error", "info"] and removes the "Warnings" dropdown option in AdminErrors.tsx.

safeHostname helper inlined in email.ts to avoid pulling the full monitorValidation→storage→schema chain into email.test.ts's drizzle-orm mock.

All 2475 tests pass; npm run check clean.

* refactor: consolidate safeHostname; stop direct warning-level writes

Phase 3 architecture findings:

resendTracker: two direct `db.insert(errorLogs).values({level:"warning",...})` sites bypassed ErrorLogger entirely — producing new warning rows despite the branch's stated "no more warning writes" invariant. Threshold alerts (80/90/95% Resend usage) are legitimately triage-worthy, so route them through ErrorLogger.info instead (preserves admin visibility, sanitization, and dedup-upsert). The in-DB 6-hour cooldown SELECT is retained for rate-limiting.

safeHostname consolidation: extracted the hostname helper to dependency-free server/utils/urlUtils.ts. scraper.ts and email.ts dropped their local copies; scheduler.ts and monitorValidation.ts updated to import from the new location (monitorValidation re-exports for back-compat with any external callers).

Doc and test cleanup:
- shared/schema.ts:93 level comment now flags 'warning' as historical-only
- Removed twelve dead `warning: vi.fn()` mocks from route test files — they masked that the ErrorLogger API no longer exposes .warning()

All 2475 tests pass; npm run check clean.

* chore: code-review cleanup — direct safeHostname imports, valid test fixture

Phase 4 code-review minor findings:
- adminErrorsUtils.test.ts: swap stale `levelFilter: "warning"` fixture for "info" so the test mirrors real UI traffic (the dropdown no longer offers "warning" and the server Zod schema would 400 it).
- monitorValidation.ts: drop the `safeHostname` re-export; routes.ts, routes/extension.ts, routes/v1.ts now import directly from ../utils/urlUtils. extension.test.ts mock moved accordingly. One import path per symbol.

All 2475 tests pass; npm run check clean.

* harden: skeptic-phase fixes to console.warn sanitization + badge mismatch

(Commit body identical to the "Phase 5 skeptic discoveries addressed" message quoted earlier in this thread.)

* fix: safeHostname returns <invalid-url> instead of 'unknown' on parse failure

Phase 6 bug-reporter reclassified the Phase-5 "unknown" fallback as in-scope (urlUtils.ts is new code on this branch). A monitor URL of http://unknown/path would collide with the fallback, making "unknown" ambiguous in triage. Swap to `<invalid-url>` — angle brackets are not valid in a real hostname, so the sentinel is guaranteed distinguishable from parsed hostnames.

* test: add urlUtils.test.ts covering safeHostname contract

New-code coverage gap noted during review: server/utils/urlUtils.ts was added on this branch but had no direct test. Asserts:
- well-formed URL → hostname
- userinfo credentials stripped from output
- query-string secrets not reflected in hostname
- parse failures return the `<invalid-url>` sentinel (guard against regression of the previous "unknown" fallback)
- real host named `unknown` round-trips cleanly (collision guard)

---------

Co-authored-by: Claude <noreply@anthropic.com>
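A minimal implementation satisfying the safeHostname contract above might look like the following. This is a sketch; the actual server/utils/urlUtils.ts may differ in detail.

```typescript
// Sketch of the safeHostname contract described above; the actual
// server/utils/urlUtils.ts implementation may differ.
function safeHostname(rawUrl: string): string {
  try {
    // URL.hostname never includes userinfo credentials or the query
    // string, so secrets in either cannot leak into log context.
    return new URL(rawUrl).hostname;
  } catch {
    // Angle brackets cannot appear in a real hostname, so this
    // sentinel is unambiguous in triage logs.
    return "<invalid-url>";
  }
}
```

The sentinel choice matters: a monitor whose real host is literally `unknown` still round-trips to `unknown`, while only unparseable input produces `<invalid-url>`.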
Summary
Fixes the five open bugs triaged during /magicwand Phase 6: one high-severity data-loss bug in the Zapier/automation delivery path, two medium-severity multi-replica / boot-hardening bugs in the webhook retry cron and the `ensure*` startup sequence, and two low-severity hygiene bugs in scheduler shutdown and webhook-URL backfill encryption. All fixes are in `server/services/*`, `server/storage.ts`, `server/routes.ts`, and `shared/schema.ts`; `npm run check`, `npm run test` (2334/2334), and `npm run build` all pass.

Changes
#456 (high) — Durable retry queue for automation (Zapier REST Hook) deliveries
server/services/automationDelivery.ts: classify delivery failures as transient (5xx / 429 / 408 / network) vs persistent (4xx). Transient failures enqueue a `delivery_log` row (`channel='automation'`, `status='pending'`, `response={subscriptionId, platform, error, transient:true}`) and do not bump `consecutive_failures`. Persistent 4xx keeps the old behaviour (bump counter, potentially auto-deactivate). Adds exported `performAutomationDelivery` + `finalizeAutomationRetry` so the retry cron reuses the exact request/classification logic.

server/services/scheduler.ts: new `*/1 * * * *` automation retry cron mirroring the webhook cron — atomic claim, cumulative backoff (5s / 35s / 155s), 3 attempts, stalled-row recovery. Caps at 10 retries per tick.

server/storage.ts: new `getPendingAutomationRetries`, `getAutomationSubscriptionById` (with hookUrl decryption).

server/services/automationDelivery.test.ts: replaced the two "increments failures on 500 / network" tests with new ones asserting transient failures queue a retry without bumping the counter; persistent 4xx still bumps it.

#457 (medium) — 503 SCHEMA_NOT_READY gate on critical ensure* failures

server/routes.ts: `ensureMonitorHealthColumns`, `ensureMonitorPendingRetryColumn`, `ensureNotificationQueueColumns`, and `ensureAutomationSubscriptionsTable` each contribute to a `criticalSchemaFailures` array. If any fail, log `CRITICAL:` and install a global `app.use("/api", ...)` handler that returns `503 { code: "SCHEMA_NOT_READY" }` for every API route until the server restarts. Avoids the silent per-route 500 storm that the original comment described as "health alerts will be disabled" but was actually a full outage. Chosen over `process.exit(1)` because it doesn't churn every route-test file's `ensureTables` mocks.

#458 (medium) — Atomic cross-replica claim for webhook retries
shared/schema.ts: new `claimed_at` column on `delivery_log` and expanded `status` doc comment (pending/processing/success/failed).

server/services/ensureTables.ts: `ALTER TABLE delivery_log ADD COLUMN IF NOT EXISTS claimed_at TIMESTAMP` (idempotent, added to `ensureChannelTables`).

server/storage.ts: new `claimDeliveryForRetry(id, expectedAttempt, channel)`, an atomic UPDATE using `attempt` as an optimistic concurrency key — if another replica already claimed the row, zero rows match and we return `null`. Companion `recoverStalledDeliveries(channel, olderThan)` re-queues rows stuck in `processing` for >5 minutes (crashed-replica recovery). Legacy `claimWebhookDeliveryForRetry` / `recoverStalledWebhookDeliveries` kept as thin deprecated wrappers.

server/services/scheduler.ts: the webhook retry cron now (1) runs stalled-row recovery at the start of each tick, (2) atomically claims each row before delivering (skips if already claimed), (3) clears `claimed_at` in every final-state update. This prevents duplicate deliveries (the same `X-FTC-Signature-256` + payload sent from two replicas at the same cron tick).

#459 (low) — Cancellable shutdown poll
server/services/scheduler.ts: the `waitForActiveChecks` poll now goes through `trackTimeout` so `stopScheduler()` cancels the recursive 100 ms `setTimeout`, matching every other timer in the file.

#460 (low) — Per-field skip in webhook-URL backfill encryption
server/services/ensureTables.ts: `backfillNotificationChannelWebhookUrls` no longer `continue`s the whole row when one of `config.url` / `config.secret` fails validation. Instead, it sets per-field `urlSkipped` / `secretSkipped` flags so a broken URL doesn't block encryption of a valid `whsec_…` secret (and vice versa).

Test wiring
server/services/ensureTables.test.ts: expect 9 execute() calls (added `ALTER TABLE delivery_log ADD COLUMN claimed_at`); added `claimed_at` to the `delivery_log` column parity list.

server/services/scheduler.test.ts: mock the new storage methods (`claimDeliveryForRetry`, `recoverStalledDeliveries`, `getPendingAutomationRetries`, `getAutomationSubscriptionById`) + mock the `automationDelivery` module. Added a microtask-flush in the "skips webhook cron when previous iteration is still running" test so the assertion runs after the new recovery-sweep step completes and the cron is actually parked on the hanging `getPendingWebhookRetries`.

server/routes.conditions.test.ts: the `ensureMonitorHealthColumns` mock now returns `true` so the 503 gate isn't installed in this mock app.

How to test
Type check + unit tests
npm run check
npm run test

All 2334 tests should pass.
#459 — shutdown poll cancellation
1. `curl` an endpoint that triggers a long-running monitor check so `activeChecks > 0`.
2. `kill -SIGTERM <pid>` and confirm the process exits within `SHUTDOWN_TIMEOUT_MS` without "unhandled error after resolution" warnings in the log.

#460 — per-field skip
1. `INSERT INTO notification_channels (monitor_id, channel, config) VALUES (<id>, 'webhook', '{"url":"not-a-url","secret":"whsec_abc123plaintext","events":[]}'::jsonb);`
2. On boot, the backfill logs `Skipping URL encryption for notification channel <id>: …`.
3. `SELECT config FROM notification_channels WHERE id = <id>;` — `secret` should now be encrypted (no longer the plaintext `whsec_abc123plaintext`); `url` remains plaintext.

#457 — SCHEMA_NOT_READY gate
1. `ALTER TABLE monitors DROP COLUMN health_alert_sent_at;` and revoke `ALTER` privilege on `monitors` from the app role.
2. On boot, the log shows `CRITICAL: required schema missing — monitor health columns. …`.
3. `curl -i http://localhost:5000/api/monitors` — expect `503 {"message":"Schema not ready: …","code":"SCHEMA_NOT_READY"}` instead of a 500.

#458 — cross-replica webhook retry claim
1. `INSERT INTO delivery_log (monitor_id, change_id, channel, status, attempt, created_at) VALUES (<id>, <id>, 'webhook', 'pending', 1, NOW() - INTERVAL '10 seconds');`
2. With two replicas running, one should log `[Webhook] Delivered …`; the other should see 0 rows returned from `claimDeliveryForRetry` and skip. `delivery_log` shows a single final status transition (success/pending/failed), no double-writes.
3. Stalled-row recovery: `UPDATE delivery_log SET status='processing', claimed_at=NOW() - INTERVAL '6 minutes' WHERE id=<id>;`. The next cron tick should log `Recovered 1 stalled delivery row(s)` and re-process.

#456 — automation retry queue
1. `POST /api/zapier/v1/subscribe` pointing to a test endpoint that returns HTTP 503.
2. Trigger a change event (e.g. `POST /api/monitors/:id/check`). Expect `Automation delivery transient failure — queued for retry …` and a new `delivery_log` row with `channel='automation'`, `status='pending'`, `response.transient=true`.
3. `consecutive_failures` is still 0 (no auto-deactivation from one flaky event).
4. Switch the test endpoint to return 200: the retry cron logs `[Automation] Delivered successfully` and the delivery_log row transitions to `status='success'`.
5. Point the subscription at an endpoint returning a 4xx instead: expect `consecutive_failures` to increment and no retry row in `delivery_log`.

https://claude.ai/code/session_016C3aevHcGMqJL7MaEHdqN9
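The #456 transient/persistent split and retry schedule can be sketched as follows. The per-attempt waits of 5s, 30s, and 120s are an assumption chosen to reproduce the stated cumulative 5s / 35s / 155s backoff; the function names are illustrative, not the module's actual exports.

```typescript
// Sketch of the failure classification and retry schedule. The
// per-attempt waits (5s, 30s, 120s) are an assumption that yields
// the cumulative 5s / 35s / 155s backoff described above.
const RETRY_DELAYS_MS = [5_000, 30_000, 120_000];
const MAX_ATTEMPTS = RETRY_DELAYS_MS.length;

// null status = network-level failure (DNS, timeout, connection reset).
function isTransientFailure(httpStatus: number | null): boolean {
  if (httpStatus === null) return true;
  if (httpStatus === 408 || httpStatus === 429) return true;
  return httpStatus >= 500; // all 5xx retryable; other 4xx persistent
}

// Cumulative delay from the original enqueue time for a zero-based
// attempt index; null once the 3 attempts are exhausted.
function nextRetryDelayMs(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return RETRY_DELAYS_MS.slice(0, attempt + 1).reduce((a, b) => a + b, 0);
}
```

Only the persistent branch touches `consecutive_failures`, which is what keeps one flaky upstream event from deactivating a subscription.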