Fix HTTP-continuation hang: externalize table-function scan state as a cursor#1

Merged
rustyconover merged 1 commit into main from fix/http-continuation-cursor on Jun 24, 2026

@rustyconover (Contributor)

The bug

The structure table functions (`tables`, `words`, `pages`) were declared as
`TableFunctionGenerator[Args]`, and their `process(params, state: None, out)` did
`out.emit(...ALL rows...); out.finish()` in a single tick.

Over the stateless HTTP transport the framework wire-serializes the per-scan
state after every tick and resumes by deserializing it, emitting at most one
producer batch per response. A position-less state: None generator therefore
restarts from row 0 on every HTTP resume and loops forever once the output
exceeds one batch. words (hundreds–thousands of rows/PDF) and tables (one
row/cell) are genuinely unbounded, so this is a real hang on the http leg.
subprocess/unix hide it by keeping state in-process.
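
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `old_process`, `stateless_http_scan`, `BATCH_ROWS`, and `max_responses` are hypothetical names invented for illustration, and the loop is a simplification of the transport, not the framework's actual code.

```python
BATCH_ROWS = 64                 # assumed batch size, for illustration only
ALL_ROWS = list(range(200))     # stands in for a 200-row scan result


def old_process(state):
    """Position-less generator: state is always None, so it always starts at row 0."""
    assert state is None
    return [ALL_ROWS[i:i + BATCH_ROWS] for i in range(0, len(ALL_ROWS), BATCH_ROWS)]


def stateless_http_scan(max_responses=5):
    """Toy stateless transport: every resume is a fresh call with the same state."""
    delivered = []
    state = None  # nothing to serialize, so nothing records the position
    for _ in range(max_responses):
        batches = old_process(state)
        delivered.append(batches[0])   # at most one producer batch per response
        if len(batches) == 1:          # output fit in one batch: scan can finish
            return delivered, True
    return delivered, False            # guard tripped: the scan never advanced


delivered, finished = stateless_http_scan()
print(finished, [batch[0] for batch in delivered])
```

Every delivered batch starts at row 0 again, so without the `max_responses` guard this loop would not terminate once the output exceeds one batch.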

The fix

Convert all six functions to TableFunctionGenerator[Args, ScanState], mirroring
vgi-search's ScanState pattern:

  • ROWS_PER_TICK = 64 + ScanState(ArrowSerializableDataclass) with
    started / offset / rows_ipc (all plainly serializable), plus
    result_to_ipc / ipc_to_table / _stream_slice helpers.
  • initial_state() -> ScanState(); _emit_* refactored into _build_* that
    return the full RecordBatch. process() materializes the full batch into
    rows_ipc on the first tick, then emits a bounded ROWS_PER_TICK slice from
    offset, advancing it and finishing when drained. NULL/empty-source early
    finish paths stay; rows/schema are byte-identical to before.
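
A minimal sketch of the cursor idea, assuming a plain dataclass in place of `ArrowSerializableDataclass` and JSON bytes in place of Arrow IPC so that it runs with the standard library alone. The field names follow the description above; `build_rows` and the `process` signature here are hypothetical stand-ins, not the code in this PR.

```python
import json
from dataclasses import dataclass

ROWS_PER_TICK = 64


@dataclass
class ScanState:
    started: bool = False
    offset: int = 0
    rows_ipc: bytes = b""   # Arrow IPC in the PR; JSON bytes in this sketch


def build_rows():
    """Hypothetical stand-in for a _build_* helper that returns the full result."""
    return [f"word{i}" for i in range(200)]


def process(state):
    """Run one tick: return (chunk, finished) and advance the cursor in place."""
    if not state.started:
        # First tick: materialize the full result once and keep it in the state.
        state.rows_ipc = json.dumps(build_rows()).encode()
        state.started = True
    rows = json.loads(state.rows_ipc)
    chunk = rows[state.offset:state.offset + ROWS_PER_TICK]
    state.offset += len(chunk)
    return chunk, state.offset >= len(rows)


state = ScanState()
chunks = []
finished = False
while not finished:
    chunk, finished = process(state)
    chunks.append(chunk)
print([len(c) for c in chunks])
```

Because the position and the materialized rows both live in the state, a resume that only has the deserialized state can continue from `offset` instead of restarting from row 0.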

Validation (fail-old / pass-new)

  • tests/harness.invoke_table_function(..., serialize_state=True) round-trips the
    state through serialize_to_bytes/deserialize_from_bytes between every tick
    (1000-tick guard) — mimics the HTTP wire.
  • TestScanStateRoundTrip / TestCursorSurvivesContinuation assert identical
    rows/order, no dupes, termination, and bounded chunks (>= 2 batches, each
    <= ROWS_PER_TICK). This fails on the old emit-all code (one 200-row batch)
    and passes on the cursor code.
  • New manywords.pdf fixture (200 words > ROWS_PER_TICK) + structure.test
    paging case (count = 200, ordered head, distinct = 200) — over http this
    only terminates if the cursor works.
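
The round-trip check described in the bullets above can be sketched roughly as follows. This is an illustration, not the harness in `tests/`: `Cursor`, `serialize`, `deserialize`, `tick`, and `invoke_with_round_trip` are hypothetical names, and JSON stands in for `serialize_to_bytes` / `deserialize_from_bytes`.

```python
import json
from dataclasses import dataclass, asdict

ROWS_PER_TICK = 64
MAX_TICKS = 1000   # guard so a non-advancing cursor fails instead of hanging


@dataclass
class Cursor:
    offset: int = 0


def serialize(state):
    return json.dumps(asdict(state)).encode()


def deserialize(raw):
    return Cursor(**json.loads(raw))


def tick(state, source):
    """Emit one bounded slice and advance the cursor."""
    chunk = source[state.offset:state.offset + ROWS_PER_TICK]
    state.offset += len(chunk)
    return chunk, state.offset >= len(source)


def invoke_with_round_trip(source):
    """Drive the scan, forcing the state through bytes before every tick."""
    state = Cursor()
    batches = []
    for _ in range(MAX_TICKS):
        state = deserialize(serialize(state))   # mimics the stateless HTTP wire
        chunk, finished = tick(state, source)
        batches.append(chunk)
        if finished:
            return batches
    raise RuntimeError("scan did not terminate within MAX_TICKS")


source = list(range(200))
batches = invoke_with_round_trip(source)
flat = [row for batch in batches for row in batch]
print(len(batches), len(flat))
```

The assertions in the tests map onto this shape: `flat == source` covers identical rows and order, `len(set(flat)) == len(flat)` covers duplicates, and the batch count and sizes cover the bounded-chunk requirement.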

All three transports (subprocess/http/unix) pass locally. CLAUDE.md documents
the cursor and why.

🤖 Generated with Claude Code

…a cursor

The structure table functions (tables/words/pages) were
TableFunctionGenerator[Args] with `process(params, state: None, out)` that did
`out.emit(...ALL rows...); out.finish()` in a single tick. Over the stateless
HTTP transport the framework wire-serializes the per-scan state after each tick
and resumes by deserializing it, emitting at most one producer batch per
response — so a position-less `state: None` generator restarts from row 0 on
every HTTP resume and loops forever once the output exceeds one batch. `words`
(hundreds–thousands of rows/PDF) and `tables` (one row/cell) are genuinely
unbounded, so this is a real hang on the http leg. subprocess/unix hide it by
keeping state in-process.

Convert all six functions to TableFunctionGenerator[Args, ScanState], mirroring
vgi-search's ScanState pattern:
- Add ROWS_PER_TICK = 64 and ScanState(ArrowSerializableDataclass) with
  started/offset/rows_ipc (all plainly serializable), plus result_to_ipc /
  ipc_to_table / _stream_slice helpers.
- Add initial_state() -> ScanState(); refactor _emit_* into _build_* that return
  the full RecordBatch. process() materializes the full batch into rows_ipc on
  the first tick, then emits a bounded ROWS_PER_TICK slice from offset, advancing
  offset and finishing when drained. NULL/empty-source early finish paths stay;
  rows/schema are byte-identical to before.

Validation:
- tests/harness.invoke_table_function gains serialize_state=True, round-tripping
  the state through serialize_to_bytes/deserialize_from_bytes between every tick
  (1000-tick guard) — mimicking the HTTP wire.
- TestScanStateRoundTrip / TestCursorSurvivesContinuation assert identical
  rows/order, no dupes, termination, and bounded chunks (>= 2 batches each
  <= ROWS_PER_TICK — fails on old emit-all code, which emits one 200-row batch).
- New manywords.pdf fixture (200 words > ROWS_PER_TICK) + structure.test paging
  case (count = 200, ordered head, distinct = 200) — over http this only
  terminates if the cursor works.

All three transports (subprocess/http/unix) pass locally; CLAUDE.md documents
the cursor and why.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rustyconover merged commit e9bb6ee into main on Jun 24, 2026
12 checks passed
@rustyconover deleted the fix/http-continuation-cursor branch on June 24, 2026 03:13