exp 137: long-text cell-size scaling audit #112

Open: danReynolds wants to merge 2 commits into main from exp-137-long-text-cell-scaling


danReynolds (Owner) commented May 12, 2026

Hypothesis

After exp 110 wired in the 8-byte FNV chunked loop and measured -76% on the 4KB long-text unchanged-fanout benchmark, the long-text-stream-hashing direction still needed a workload sweep beyond 4KB cells. Without that sweep, we could not tell whether long-cell wall time continues to be dominated by byte-stream hashing or whether SQLite text fetch, page-cache behavior, allocation, GC, or isolate transfer takes over at larger cells.
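The PR does not show the exp 110 chunked loop itself, so as a hedged point of reference, here is the standard byte-at-a-time FNV-1a (64-bit) that a chunked variant would typically be compared against. `fnv1a64` and `fnv1a64Text` are hypothetical names, and hashing a TEXT cell via its UTF-8 bytes is an assumption about how the stream hash consumes cells.

```dart
import 'dart:convert';

// Standard FNV-1a 64-bit parameters. Hex literals above 2^63 are accepted
// on the Dart VM and wrap to the corresponding negative 64-bit int.
const int fnvOffsetBasis64 = 0xcbf29ce484222325;
const int fnvPrime64 = 0x100000001b3;

/// Byte-at-a-time FNV-1a over [bytes] (reference loop, not exp 110's
/// 8-byte chunked variant). Relies on Dart VM 64-bit ints wrapping on
/// overflow; this does not hold when compiled to JavaScript.
int fnv1a64(List<int> bytes) {
  var hash = fnvOffsetBasis64;
  for (final b in bytes) {
    hash ^= b & 0xff; // fold in one byte
    hash *= fnvPrime64; // wraps modulo 2^64 on the VM
  }
  return hash;
}

/// Hypothetical convenience wrapper: hashes a TEXT cell's UTF-8 encoding.
int fnv1a64Text(String text) => fnv1a64(utf8.encode(text));
```

The per-byte loop does one XOR and one 64-bit multiply per byte; a chunked variant amortizes loop overhead over several bytes, which is why per-byte cost only shows up cleanly once cells are large enough to hide per-call overhead.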

This PR ships the missing measurement. The expected reading was that the 4KB shape would sit above the per-byte band because per-iteration overhead is still meaningful there, while 16KB+ cells should converge toward the actual hash-loop throughput band if hashing remains the dominant cost.

Approach

Adds benchmark/profile/long_text_scaling_audit.dart, a profile harness that mirrors exp 110's unchanged-fanout shape: 8 unchanged streams x 256 rows x ASCII TEXT, plus one barrier stream. It sweeps [4KB, 16KB, 32KB, 64KB, 128KB] cells, with 3 warmups and 30 timed iterations per size.
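A minimal sketch of the warmup/timed loop and the summary statistics, assuming a nearest-rank percentile definition (the harness's actual definition is not shown in the PR). `timeIterations`, `percentile`, and `summarize` are hypothetical names, and the loop is synchronous for brevity, whereas the real harness awaits stream emissions.

```dart
/// Nearest-rank percentile of [samples] for p in (0, 100].
/// Assumption: the real harness's percentile definition may differ.
double percentile(List<double> samples, double p) {
  if (samples.isEmpty) {
    throw ArgumentError('percentile of an empty sample list');
  }
  final sorted = [...samples]..sort();
  final n = sorted.length;
  // Multiply p * n first so common cases (e.g. p = 90, n = 30) stay exact.
  var rank = (p * n / 100).ceil();
  if (rank < 1) rank = 1;
  if (rank > n) rank = n;
  return sorted[rank - 1];
}

/// Runs [iteration] [warmups] times untimed, then [timed] more times,
/// returning each timed iteration's wall time in milliseconds.
List<double> timeIterations(
  void Function() iteration, {
  int warmups = 3,
  int timed = 30,
}) {
  for (var i = 0; i < warmups; i++) {
    iteration();
  }
  final samples = <double>[];
  final stopwatch = Stopwatch();
  for (var i = 0; i < timed; i++) {
    stopwatch
      ..reset()
      ..start();
    iteration();
    stopwatch.stop();
    samples.add(stopwatch.elapsedMicroseconds / 1000.0);
  }
  return samples;
}

/// Summarizes samples into the columns the results tables report.
Map<String, double> summarize(List<double> samples) => {
      'median_ms': percentile(samples, 50),
      'p90_ms': percentile(samples, 90),
      'p99_ms': percentile(samples, 99),
    };
```

With 30 samples, nearest-rank p99 is simply the slowest iteration, so the p99 column is effectively a per-pass maximum; that is one reason it moves more between passes than the median does.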

The harness uses a fixed barrier row (id = 999999), chosen outside every unchanged stream's id < 256 predicate, and UPDATEs that row each iteration. The barrier stream is SELECT id, body FROM long_items WHERE id = ?, so the barrier result stays exactly one row and the unchanged streams stay exactly 256 rows. That keeps hashed_bytes_per_iter = cell_bytes * (8 * 256 + 1) constant within each size. The unchanged streams must not emit, and the harness asserts this on every iteration.
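To make the constant-payload argument concrete, here is a hedged in-memory toy model with no SQLite and no resqlite APIs. `ToyStream`, `runFanout`, and the "rehash the result set and emit only if the hash changed" rule are hypothetical stand-ins for the real stream machinery. It shows that with a fixed barrier row outside the id < 256 predicate, bytes hashed per iteration stay at cell_bytes * (8 * 256 + 1) and the unchanged streams never emit.

```dart
/// Hypothetical in-memory stand-in for long_items: id -> ASCII body.
typedef ToyTable = Map<int, String>;

/// A toy stream: a row predicate plus the hash of its last emitted result.
/// The hash-and-compare emit rule is an assumption, not resqlite's code.
class ToyStream {
  ToyStream(this.name, this.predicate);

  final String name;
  final bool Function(int id) predicate;
  int? lastHash;
  int emits = 0;

  /// Re-runs the "query", emitting only if the result hash changed.
  /// Returns the number of body bytes hashed on this pass.
  int refresh(ToyTable table) {
    final ids = table.keys.where(predicate).toList()..sort();
    var bytes = 0;
    var hash = 17;
    for (final id in ids) {
      final body = table[id]!;
      bytes += body.length; // ASCII, so one byte per code unit
      hash = Object.hash(hash, id, body);
    }
    if (hash != lastHash) {
      lastHash = hash;
      emits++;
    }
    return bytes;
  }
}

/// Runs [iterations] fixed-row barrier UPDATEs and returns the bytes hashed
/// per iteration across 8 unchanged streams (id < 256) plus the barrier.
List<int> runFanout({required int cellBytes, int iterations = 3}) {
  const barrierId = 999999; // outside every unchanged stream's predicate
  final ToyTable table = <int, String>{
    for (var id = 0; id < 256; id++) id: 'x' * cellBytes,
    barrierId: 'b' * cellBytes,
  };
  final unchanged = [
    for (var s = 0; s < 8; s++) ToyStream('unchanged$s', (id) => id < 256),
  ];
  final barrier = ToyStream('barrier', (id) => id == barrierId);
  final streams = [...unchanged, barrier];
  for (final s in streams) {
    s.refresh(table); // initial emission
  }
  final perIteration = <int>[];
  for (var i = 0; i < iterations; i++) {
    // Models UPDATE ... WHERE id = 999999: same length, different bytes.
    table[barrierId] = (i.isEven ? 'y' : 'z') * cellBytes;
    final before = [for (final s in unchanged) s.emits];
    var bytes = 0;
    for (final s in streams) {
      bytes += s.refresh(table);
    }
    for (var k = 0; k < unchanged.length; k++) {
      if (unchanged[k].emits != before[k]) {
        throw StateError('${unchanged[k].name} emitted on an unrelated write');
      }
    }
    perIteration.add(bytes);
  }
  return perIteration;
}
```

Under the earlier INSERT-driven barrier, the barrier's result set would grow by one row per iteration, so the returned list would increase monotonically instead of staying flat.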

Full details are in experiments/137-long-text-cell-scaling.md; aggregate output is in benchmark/profile/results/exp-137-long-text-scaling-aggregate.md.

Results

Three repeated passes (a, b, c). Medians are listed per pass; the p90 and p99 columns give the lowest-to-highest value across the three passes.

Per-iteration wall:

| cell size | median_ms (a / b / c) | p90_ms (range across passes) | p99_ms (range across passes) |
|---|---|---|---|
| 4KB | 1.35 / 1.29 / 1.35 | 1.83–2.15 | 2.39–2.66 |
| 16KB | 2.46 / 2.49 / 2.42 | 2.89–3.41 | 3.68–4.29 |
| 32KB | 5.28 / 5.03 / 5.24 | 5.60–6.11 | 5.85–8.76 |
| 64KB | 9.21 / 8.92 / 9.11 | 9.35–10.88 | 15.19–18.73 |
| 128KB | 17.40 / 18.85 / 17.31 | 18.78–27.93 | 21.98–32.93 |

Per-byte cost:

| cell size | hashed_bytes_per_iter | ns_per_byte (from median, a / b / c) |
|---|---:|---|
| 4KB | 8,392,704 | 0.160 / 0.154 / 0.161 |
| 16KB | 33,570,816 | 0.073 / 0.074 / 0.072 |
| 32KB | 67,141,632 | 0.079 / 0.075 / 0.078 |
| 64KB | 134,283,264 | 0.069 / 0.066 / 0.068 |
| 128KB | 268,566,528 | 0.065 / 0.070 / 0.064 |
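The per-byte table follows from two pieces of arithmetic, sketched below with hypothetical helper names: the hashed-bytes formula, and median wall time divided by those bytes. The last helper inverts ns/byte into implied throughput (bytes per ns equals GB/s at 1e9 bytes/s). Only the pass-a medians from the wall-time table are used as inputs.

```dart
/// hashed_bytes_per_iter = cell_bytes * (8 * 256 + 1): 8 unchanged streams
/// of 256 rows each, plus the single barrier row. Hypothetical helper.
int hashedBytesPerIter(
  int cellBytes, {
  int streams = 8,
  int rowsPerStream = 256,
  int barrierRows = 1,
}) =>
    cellBytes * (streams * rowsPerStream + barrierRows);

/// ns_per_byte = median wall time (ms converted to ns) / hashed bytes.
double nsPerByte(double medianMs, int hashedBytes) =>
    medianMs * 1e6 / hashedBytes;

/// Implied throughput in GB/s: the inverse of ns per byte.
double impliedGbPerSecond(double costNsPerByte) => 1 / costNsPerByte;

/// Pass-a medians from the wall-time table, keyed by cell size in bytes.
const Map<int, double> medianMsPassA = {
  4096: 1.35,
  16384: 2.46,
  32768: 5.28,
  65536: 9.21,
  131072: 17.40,
};
```

With these inputs the 4KB row comes out near 0.161 ns/byte and the 128KB row near 0.065 ns/byte, matching the table to within the rounding of the displayed medians; the 0.065 and 0.080 ends of the band invert to roughly 15.4 and 12.5 GB/s.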

Headline reading: from 16KB up, wall time scales roughly linearly with hashed bytes, and the per-byte cost settles into a stable band of roughly 0.065–0.080 ns/byte (0.064–0.079 across all 16KB+ medians). Hashing remains the dominant cost on long-cell unchanged-fanout workloads at 16KB and above. The 4KB row sits about 2x above that band because per-iteration overhead is comparable to the hashing work at that size.
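One hedged way to separate fixed per-iteration overhead from per-byte cost is an ordinary least-squares fit of wall_ms = intercept + slope * bytes over the 16KB+ medians. The PR does not report a fitted model; `fitWallVsBytes` is a hypothetical analysis helper, and the inputs below are the pass-a medians from the wall-time table.

```dart
/// Ordinary least-squares fit of wallMs = interceptMs + slope * bytes.
/// Hypothetical analysis helper; returns the slope in ns/byte.
({double interceptMs, double slopeNsPerByte}) fitWallVsBytes(
  List<int> bytes,
  List<double> wallMs,
) {
  final n = bytes.length;
  if (n < 2 || wallMs.length != n) {
    throw ArgumentError('need at least two paired (bytes, wallMs) points');
  }
  final meanX = bytes.reduce((a, b) => a + b) / n;
  final meanY = wallMs.reduce((a, b) => a + b) / n;
  var sxx = 0.0;
  var sxy = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = bytes[i] - meanX;
    sxx += dx * dx;
    sxy += dx * (wallMs[i] - meanY);
  }
  final slopeMsPerByte = sxy / sxx;
  return (
    interceptMs: meanY - slopeMsPerByte * meanX,
    slopeNsPerByte: slopeMsPerByte * 1e6, // ms -> ns
  );
}
```

Fed the 16KB, 32KB, 64KB, and 128KB pass-a medians, this gives a slope of about 0.062 ns/byte and an intercept of roughly 0.7 ms. An intercept of that size would account for roughly half of the ~1.3 ms 4KB iteration, which is consistent with the reading that the 4KB shape is overhead-bound.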

The 64KB and 128KB rows have wider p99/min-to-max spread, likely from the harness's per-iteration String allocation crossing Dart VM old-generation heap-region thresholds. The medians still sit cleanly inside the same per-byte band, so this does not change the linear-scaling verdict.

Outcome

In Review - measurement.

This closes the signals.json blockedOnMeasurement entry "long-payload streaming workload at sizes beyond exp 110's 4KB cells" and the matching broader long-payload openCandidate. It adds two follow-up candidates: a wider FNV unroll / SIMD probe gated on a real ≥16KB workload, and a BLOB-shape companion sweep to confirm TEXT/BLOB symmetry.

Future hash-loop variants should compare against the exp 137 16KB+ band, not exp 110's 4KB benchmark. The current 4KB release-suite shape is per-iteration-overhead-bound and should not move proportionally to a per-byte hash improvement.
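Why the 4KB shape should not move proportionally can be put as a rough Amdahl-style estimate. This is a sketch under stated assumptions: hashing costs a fixed per-byte amount taken from the 16KB+ band (0.07 ns/byte is an assumed mid-band value), and nothing else in the iteration changes when the hash gets faster. `projectedWallReduction` is a hypothetical helper.

```dart
import 'dart:math' as math;

/// Rough Amdahl-style estimate of the fraction of per-iteration wall time a
/// faster hash loop would remove, assuming only hashing gets faster.
/// [hashNsPerByte] is an assumed hashing cost, e.g. from the 16KB+ band.
double projectedWallReduction({
  required double wallMs,
  required int hashedBytes,
  required double hashNsPerByte,
  required double hashSpeedup,
}) {
  final hashMs = hashedBytes * hashNsPerByte / 1e6;
  // Cap at 1.0: at large cells the band estimate can slightly exceed wall.
  final hashShare = math.min(1.0, hashMs / wallMs);
  return hashShare * (1 - 1 / hashSpeedup);
}
```

With a 2x faster hash, the 4KB pass-a iteration (1.35 ms, 8,392,704 hashed bytes) would shrink by only about 22% under these assumptions, while a 128KB iteration would approach the full 50%.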

Test plan

  • dart analyze - same 83 pre-existing warnings on main, no new issues from the audit harness
  • dart test test/stream_test.dart test/query_decoder_test.dart - 27 stream tests plus decoder tests pass
  • dart run benchmark/check_experiment_signals.dart passes
  • dart run benchmark/check_generated_data.dart passes after regenerating docs/experiments/history.json
  • dart run -DRESQLITE_PROFILE=true benchmark/profile/long_text_scaling_audit.dart --markdown ran 3x with stable median bands

Sweeps the exp 110 unchanged-fanout shape across [4KB, 16KB, 32KB,
64KB, 128KB] cells. Wall scales linearly with bytes from 16KB up;
per-byte cost converges to a stable 0.12–0.19 ns/byte band on the
existing 8-byte FNV chunked loop. The 4KB release shape sits ~2x
above the band because per-iteration overhead dominates at that
size — a faster hash variant would barely move it.

Closes the long-text-stream-hashing direction's blockedOnMeasurement
gate and replaces the broader-payload openCandidate with a wider
FNV / SIMD probe candidate gated on a real ≥16KB workload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 11:20

Copilot AI left a comment


Pull request overview

Adds experiment 137 measurement artifacts to audit how long-text stream hashing wall time scales with increasing TEXT cell sizes, and updates the experiment tracking/registry to reflect the new measurement and follow-up candidates.

Changes:

  • Introduces a new profile-mode harness to sweep long-text cell sizes (4KB→128KB) using the exp 110 unchanged-fanout shape.
  • Adds experiment 137 documentation + aggregate results markdown, and records the experiment in signals.json and the experiments index.
  • Regenerates docs/experiments/history.json to include the new experiment entry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
|---|---|
| experiments/signals.json | Updates the long-text hashing direction state and adds the exp 137 experiment signals entry. |
| experiments/README.md | Adds exp 137 to the “In Review” experiment index table. |
| experiments/137-long-text-cell-scaling.md | New writeup documenting the exp 137 hypothesis/approach/results and conclusions. |
| docs/experiments/history.json | Regenerated history to include exp 137. |
| benchmark/profile/results/exp-137-long-text-scaling-aggregate.md | New aggregate markdown report emitted by the exp 137 harness. |
| benchmark/profile/long_text_scaling_audit.dart | New profile harness that runs the scaling sweep and emits the aggregate report. |


Comment threads:

  • benchmark/profile/long_text_scaling_audit.dart (outdated)
  • benchmark/profile/long_text_scaling_audit.dart (outdated)
  • benchmark/profile/long_text_scaling_audit.dart
  • benchmark/profile/results/exp-137-long-text-scaling-aggregate.md (outdated)
  • experiments/137-long-text-cell-scaling.md (outdated)
  • docs/experiments/history.json (outdated)

Switch the long-text scaling audit harness from an INSERT-driven
barrier (whose `SELECT id, body FROM long_items ORDER BY id` projection
grows by one row per iteration) to a fixed-row UPDATE-driven barrier
at `id = 999999`, picked outside every unchanged stream's
`id < 256` predicate. The barrier stream becomes
`SELECT id, body FROM long_items WHERE id = ?` so its result stays
at exactly one row across every iteration; the unchanged streams
stay at exactly 256 rows. Per-iteration hashed payload is now
constant within each cell size, so `ns_per_byte` is no longer
biased toward later (heavier) iterations.

Also fixes a tempdir leak: `Database.open` is now inside the outer
`try` so the `await tempDir.delete(recursive: true)` in the
`finally` always runs even if open throws.

Re-ran the audit for three passes; the corrected per-byte band sits at
0.065 – 0.080 ns/byte from 16KB up (~13–15 GB/s implied per-stream
throughput). The qualitative verdict is unchanged: linear scaling
with bytes from 16KB up, 4KB shape sits ~2x above the band because
per-iteration overhead dominates. signals.json, the experiment
writeup, the aggregate markdown, and the regenerated
docs/experiments/history.json are all updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
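The tempdir-leak fix above is an ordering rule: create the directory, then put everything that can throw, including the database open, inside the try whose finally deletes it. A minimal sketch follows, with `withTempDir` and `openAndRun` as hypothetical names; the real harness calls `Database.open`, whose signature is not shown here, and uses the async `dart:io` equivalents rather than the sync calls used below for brevity.

```dart
import 'dart:io';

/// Hypothetical helper illustrating the fix: the temp dir is created first,
/// and everything that can throw (including opening the database) runs
/// inside the try, so the finally always deletes the directory.
/// Sync for brevity; the harness uses `await tempDir.delete(recursive: true)`.
T withTempDir<T>(T Function(Directory tempDir) openAndRun) {
  final tempDir = Directory.systemTemp.createTempSync('long_text_audit_');
  try {
    // e.g. open the database under tempDir.path, then run the sweep.
    return openAndRun(tempDir);
  } finally {
    tempDir.deleteSync(recursive: true);
  }
}
```

If the open call sat before the try, a throw there would skip the finally and leave the directory behind, which is the leak the commit describes.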