fix(ingest): clear agent_event by primary id so index drift can't crash re-ingest#141

Merged
Necmttn merged 1 commit into main from fix/agent-event-index-clear on Jun 7, 2026
Necmttn (Owner) commented Jun 7, 2026

Problem

ax update / ax ingest crashed mid-pipeline with:

Database index `agent_event_session_seq` already contains
  [agent_session:codex__019df150…, 181],
  with record `agent_event:…__compacted_181__…`

Root cause

The per-session clear ran DELETE agent_event WHERE agent_session = X. SurrealDB plans that statement through agent_event_session_seq, the UNIQUE index on (agent_session, seq). A long-lived DB had drifted: it held genuine duplicate (agent_session, seq) rows that the index had let in. The index-driven DELETE silently skipped those drifted rows, yet their ghost index entries still blocked the fresh (agent_session, seq) INSERT, crashing the next ingest.

Verified live:

  • DELETE … WHERE agent_session=X reduced one session from 190 to 10 rows (10 ghost rows survived, each with the correct agent_session link).
  • SELECT … WHERE agent_session=X and DELETE agent_event:<id> (by primary id) both saw/removed every row.
  • Scope: 18,936 duplicate (session, seq) groups, ~19,089 excess rows, 27 codex sessions.
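
For readers who want to reproduce this kind of scope count, here is a minimal sketch in TypeScript. It assumes the rows have already been fetched into memory (for example via the SELECT that was observed to see every row); the `DupRow` shape and the `countDuplicates` name are hypothetical and not taken from the repository.

```typescript
// Hypothetical row shape: only the fields covered by the UNIQUE index.
interface DupRow {
  id: string;
  agent_session: string;
  seq: number;
}

interface DupStats {
  groups: number;     // number of (agent_session, seq) pairs that occur more than once
  excessRows: number; // rows beyond the first in each duplicated pair
  sessions: number;   // distinct sessions containing at least one duplicated pair
}

function countDuplicates(rows: DupRow[]): DupStats {
  // Count occurrences of each (agent_session, seq) pair.
  const counts = new Map<string, number>();
  for (const row of rows) {
    const key = JSON.stringify([row.agent_session, row.seq]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  let groups = 0;
  let excessRows = 0;
  const sessions = new Set<string>();
  for (const [key, n] of counts) {
    if (n > 1) {
      groups += 1;
      excessRows += n - 1;
      sessions.add((JSON.parse(key) as [string, number])[0]);
    }
  }
  return { groups, excessRows, sessions: sessions.size };
}
```

Under these assumptions, the three returned fields correspond to the three figures in the scope line: duplicate groups, excess rows, and affected sessions.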

Fix

Delete via id IN (SELECT VALUE id FROM agent_event WHERE agent_session = X). The inner SELECT reliably enumerates every row, and the outer DELETE removes them by primary id without ever consulting the corruptible secondary index. This also self-heals a corrupt session on its next re-ingest.
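
A rough sketch of what such a statement builder and its regression guard could look like. The function names, the `$session` parameter, and the exact statement layout are assumptions for illustration; the PR does not show the real builder code.

```typescript
// Hypothetical builder: returns the SurrealQL text plus its bindings.
// Binding the session as $session avoids hand-escaping record ids.
interface Statement {
  sql: string;
  vars: Record<string, string>;
}

function buildClearSessionEvents(sessionId: string): Statement {
  return {
    // The inner SELECT enumerates the rows; the outer DELETE targets them by id.
    sql:
      "DELETE agent_event WHERE id IN " +
      "(SELECT VALUE id FROM agent_event WHERE agent_session = $session);",
    vars: { session: sessionId },
  };
}

// Matches the old form, where DELETE filters directly on agent_session.
const BARE_SESSION_DELETE = /DELETE\s+agent_event\s+WHERE\s+agent_session\s*=/i;

// Guard a test could use to reject a regression to the bare-WHERE delete.
function isBareSessionDelete(sql: string): boolean {
  return BARE_SESSION_DELETE.test(sql);
}
```

A builder test in this style would assert that `isBareSessionDelete(buildClearSessionEvents(id).sql)` is false, so any future change back to filtering the DELETE directly on agent_session fails the suite.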

Also adds scripts/repair-agent-event-index.ts (with a unit-tested planSessionDedup) to globally dedupe and rebuild the index for an already-corrupt DB (--dry-run supported), plus a strengthened builder test that forbids regressing to the bare-WHERE delete.
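
The signature and keep-policy of planSessionDedup are not shown in this PR, so the following is only a sketch of how a pure, unit-testable dedup planner might work. It assumes the row with the lexicographically smallest id is kept for each seq, which is an illustrative choice and may not match the real script; the name `planSessionDedupSketch` and the row shape are hypothetical.

```typescript
// Hypothetical shape for one session's events.
interface SessionEventRow {
  id: string;
  seq: number;
}

interface DedupPlan {
  keep: string[];   // one id per distinct seq
  remove: string[]; // every other id sharing a seq with a kept row
}

function planSessionDedupSketch(rows: SessionEventRow[]): DedupPlan {
  // Sort a copy by id so the plan does not depend on query order.
  const ordered = [...rows].sort((a, b) =>
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
  );

  const keptBySeq = new Map<number, string>();
  const remove: string[] = [];
  for (const row of ordered) {
    if (keptBySeq.has(row.seq)) {
      remove.push(row.id); // duplicate seq: schedule for deletion by primary id
    } else {
      keptBySeq.set(row.seq, row.id); // first id seen for this seq is kept
    }
  }
  return { keep: [...keptBySeq.values()], remove };
}
```

Keeping the planner free of database calls is what makes a --dry-run mode cheap: the script can print the `remove` list without executing any DELETE.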

Verification

  • Local DB repaired (19,089 rows removed, index rebuilt UNIQUE), then a full ax ingest ran end-to-end with EXIT=0.
  • Re-ran the exact failing file's 925-statement batch: applies cleanly.
  • bun test provider-events + repair script: 9 pass.
  • Repair --dry-run against repaired DB: 0 duplicates.

🤖 Generated with Claude Code

…sh re-ingest

The per-session clear ran `DELETE agent_event WHERE agent_session = X`, which
SurrealDB plans through the `agent_event_session_seq` (agent_session, seq)
UNIQUE index. A long-lived DB can accumulate stale/ghost index entries (seen
across a SurrealDB version change + prior partial ingests). When that happens
the index-driven DELETE silently skips the drifted rows, but their entries
still block the fresh `(agent_session, seq)` INSERT, so the next ingest crashes:

  Database index `agent_event_session_seq` already contains
    [agent_session:codex__…, 181], with record `agent_event:…__compacted_181__…`

Observed live: 27 codex sessions held ~19k duplicate `(agent_session, seq)`
rows the index let in; a bare WHERE-delete removed only a subset (190 -> 10 on
one session) while a SELECT and a delete-by-id both saw/removed every row.

Fix: delete via `id IN (SELECT VALUE id FROM agent_event WHERE agent_session = X)`
so the inner SELECT reliably enumerates every row and the outer DELETE removes
them by PRIMARY id, never consulting the corruptible secondary index. This also
self-heals a corrupt session on its next re-ingest.

Adds scripts/repair-agent-event-index.ts (+ unit-tested `planSessionDedup`) to
globally dedupe + rebuild the index for an already-corrupt DB, and strengthens
the builder test to forbid regressing to the bare-WHERE delete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying ax with Cloudflare Pages

Latest commit: 30c406c
Status: ✅  Deploy successful!
Preview URL: https://c1d11d7b.ax-62d.pages.dev
Branch Preview URL: https://fix-agent-event-index-clear.ax-62d.pages.dev

@Necmttn Necmttn merged commit 1090c5f into main Jun 7, 2026
2 checks passed
@Necmttn Necmttn deleted the fix/agent-event-index-clear branch June 7, 2026 10:53