exp 137: long-text cell-size scaling audit by danReynolds · Pull Request #112 · danReynolds/resqlite

danReynolds · 2026-05-12T11:20:02Z

Hypothesis

After exp 110 wired in the 8-byte FNV chunked loop and measured -76% on the 4KB long-text unchanged-fanout benchmark, the long-text-stream-hashing direction still needed a workload sweep beyond 4KB cells. Without that sweep, we could not tell whether long-cell wall time continues to be dominated by byte-stream hashing or whether SQLite text fetch, page-cache behavior, allocation, GC, or isolate transfer takes over at larger cells.

This PR ships the missing measurement. The expected reading was that the 4KB shape would sit above the per-byte band because per-iteration overhead is still meaningful there, while 16KB+ cells should converge toward the actual hash-loop throughput band if hashing remains the dominant cost.

Approach

Adds benchmark/profile/long_text_scaling_audit.dart, a profile harness that mirrors exp 110's unchanged-fanout shape: 8 unchanged streams x 256 rows x ASCII TEXT, plus one barrier stream. It sweeps [4KB, 16KB, 32KB, 64KB, 128KB] cells, with 3 warmups and 30 timed iterations per size.
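The warmup-then-timed loop described above can be sketched as follows. This is an illustration in Python, not the Dart harness itself: the names `time_iterations` and `percentile` are hypothetical, and the nearest-rank percentile is an assumption, since the PR does not say how the harness computes p90/p99.

```python
import statistics
import time


def percentile(sorted_samples, pct):
    """Nearest-rank percentile over an ascending list (assumed method)."""
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(sorted_samples) + 99) // 100)
    return sorted_samples[rank - 1]


def time_iterations(run_once, warmups=3, timed=30):
    """Run `run_once` untimed `warmups` times, then time `timed` iterations."""
    for _ in range(warmups):
        run_once()  # warmup iterations are discarded

    samples_ms = []
    for _ in range(timed):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    return {
        "median_ms": statistics.median(samples_ms),
        "p90_ms": percentile(samples_ms, 90),
        "p99_ms": percentile(samples_ms, 99),
    }
```

With 30 timed samples, nearest-rank p99 is simply the slowest iteration, which is one reason tail values can swing between passes more than medians do.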

The harness uses a fixed out-of-range barrier row (id = 999999) and UPDATEs that row each iteration. The barrier stream is SELECT id, body FROM long_items WHERE id = ?, so the barrier result stays exactly one row and the unchanged streams stay exactly 256 rows. That keeps hashed_bytes_per_iter = cell_bytes * (8 * 256 + 1) constant within each size. The unchanged streams must not emit; the harness asserts that on every iteration.
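As a quick check of the payload formula, the sketch below evaluates `cell_bytes * (8 * 256 + 1)` for each swept size. It assumes "KB" means 1024 bytes, which is consistent with the `hashed_bytes_per_iter` values reported in the results; the constant and function names are illustrative, not taken from the harness.

```python
UNCHANGED_STREAMS = 8   # streams that must not emit
ROWS_PER_STREAM = 256   # rows re-hashed per unchanged stream
BARRIER_ROWS = 1        # the single fixed barrier row (id = 999999)


def hashed_bytes_per_iter(cell_bytes):
    """Bytes hashed per iteration for a given TEXT cell size."""
    rows = UNCHANGED_STREAMS * ROWS_PER_STREAM + BARRIER_ROWS
    return cell_bytes * rows


for kb in (4, 16, 32, 64, 128):
    # Assumption: 1 KB = 1024 bytes.
    print(f"{kb}KB: {hashed_bytes_per_iter(kb * 1024):,} bytes/iter")
```

Because the barrier result is pinned to one row, the row count (2049) is the same on every iteration, so the per-byte figure does not drift as the run progresses.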

Full details are in experiments/137-long-text-cell-scaling.md; aggregate output is in benchmark/profile/results/exp-137-long-text-scaling-aggregate.md.

Results

Three repeated passes (a/b/c). Medians are listed per pass; p90 and p99 are shown as the range across the three passes.

Per-iteration wall:

cell size	median_ms (a/b/c)	p90_ms	p99_ms
4KB	1.35 / 1.29 / 1.35	1.83 - 2.15	2.39 - 2.66
16KB	2.46 / 2.49 / 2.42	2.89 - 3.41	3.68 - 4.29
32KB	5.28 / 5.03 / 5.24	5.60 - 6.11	5.85 - 8.76
64KB	9.21 / 8.92 / 9.11	9.35 - 10.88	15.19 - 18.73
128KB	17.40 / 18.85 / 17.31	18.78 - 27.93	21.98 - 32.93

Per-byte cost:

cell size	hashed_bytes_per_iter	ns_per_byte (median, a/b/c)
4KB	8,392,704	0.160 / 0.154 / 0.161
16KB	33,570,816	0.073 / 0.074 / 0.072
32KB	67,141,632	0.079 / 0.075 / 0.078
64KB	134,283,264	0.069 / 0.066 / 0.068
128KB	268,566,528	0.065 / 0.070 / 0.064

Headline reading: wall scales linearly with bytes from 16KB up, and the 16KB+ per-byte cost converges to a stable 0.065 - 0.080 ns/byte band. Hashing remains the dominant cost on long-cell unchanged-fanout workloads at meaningful cell sizes. The 4KB row sits about 2x above the larger-size per-byte band because per-iteration overhead is comparable to the hashing work at that size.
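The per-byte column can be re-derived from the two tables as `median_ms * 1e6 / hashed_bytes_per_iter`. The sketch below does that for all three passes and also converts each figure to an implied throughput (1 byte/ns equals 1 GB/s in decimal units). The numbers are copied from the tables; the helper names are illustrative.

```python
# hashed_bytes_per_iter and per-pass median_ms, keyed by cell size in KB.
HASHED_BYTES = {
    4: 8_392_704,
    16: 33_570_816,
    32: 67_141_632,
    64: 134_283_264,
    128: 268_566_528,
}
MEDIAN_MS = {
    4: (1.35, 1.29, 1.35),
    16: (2.46, 2.49, 2.42),
    32: (5.28, 5.03, 5.24),
    64: (9.21, 8.92, 9.11),
    128: (17.40, 18.85, 17.31),
}


def ns_per_byte(median_ms, hashed_bytes):
    """Wall time per hashed byte, in nanoseconds."""
    return median_ms * 1e6 / hashed_bytes


def implied_gb_per_s(ns_per_byte_value):
    """Aggregate throughput implied by a per-byte cost (decimal GB/s)."""
    return 1.0 / ns_per_byte_value


for kb, medians in MEDIAN_MS.items():
    costs = [ns_per_byte(ms, HASHED_BYTES[kb]) for ms in medians]
    cells = " / ".join(f"{c:.3f}" for c in costs)
    rates = " / ".join(f"{implied_gb_per_s(c):.1f}" for c in costs)
    print(f"{kb}KB: {cells} ns/byte, {rates} GB/s")
```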

The 64KB and 128KB rows have wider p99/min-to-max spread, likely from the harness's per-iteration String allocation crossing Dart VM old-generation heap-region thresholds. The medians still sit cleanly inside the same per-byte band, so this does not change the linear-scaling verdict.

Outcome

In Review - measurement.

This closes the blockedOnMeasurement entry "long-payload streaming workload at sizes beyond exp 110's 4KB cells" and the matching broader long-payload openCandidate in signals.json. It adds two follow-up candidates: a wider FNV unroll / SIMD probe gated on a real >=16KB workload, and a BLOB-shape companion sweep to confirm TEXT/BLOB symmetry.

Future hash-loop variants should compare against the exp 137 16KB+ band, not exp 110's 4KB benchmark. The current 4KB release-suite shape is per-iteration-overhead-bound and should not move proportionally to a per-byte hash improvement.
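A rough Amdahl-style estimate illustrates why the 4KB shape would not move proportionally. The sketch assumes, as a simplification not stated in the PR, that the 16KB+ band is pure hashing cost and that the same per-byte hashing cost applies at 4KB; the 0.0725 ns/byte midpoint and the function name are illustrative choices.

```python
def wall_after_hash_speedup(wall_ms, hash_fraction, speedup):
    """Estimated wall time if only the hashing share gets `speedup`x faster."""
    return wall_ms * ((1.0 - hash_fraction) + hash_fraction / speedup)


OBSERVED_4KB_NS_PER_BYTE = 0.160   # 4KB row, pass a
BAND_MID_NS_PER_BYTE = 0.0725      # assumed midpoint of the 0.065 - 0.080 band

# Assumption: everything above the band at 4KB is per-iteration overhead.
hash_fraction_4kb = BAND_MID_NS_PER_BYTE / OBSERVED_4KB_NS_PER_BYTE

for label, wall_ms, fraction in (
    ("4KB", 1.35, hash_fraction_4kb),
    ("128KB", 17.40, 1.0),  # upper bound: treat the whole wall as hashing
):
    faster = wall_after_hash_speedup(wall_ms, fraction, speedup=2.0)
    change = (faster - wall_ms) / wall_ms * 100.0
    print(f"{label}: {wall_ms:.2f} ms -> {faster:.2f} ms ({change:.0f}%)")
```

Under these assumptions a hypothetical 2x faster hash loop would cut the 4KB wall by well under half, while the large-cell rows would see close to the full gain, which is why the 16KB+ band is the more sensitive comparison target.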

Test plan

dart analyze - same 83 pre-existing warnings on main, no new issues from the audit harness
dart test test/stream_test.dart test/query_decoder_test.dart - 27 stream tests plus decoder tests pass
dart run benchmark/check_experiment_signals.dart passes
dart run benchmark/check_generated_data.dart passes after regenerating docs/experiments/history.json
dart run -DRESQLITE_PROFILE=true benchmark/profile/long_text_scaling_audit.dart --markdown ran 3x with stable median bands

Sweeps the exp 110 unchanged-fanout shape across [4KB, 16KB, 32KB, 64KB, 128KB] cells. Wall scales linearly with bytes from 16KB up; per-byte cost converges to a stable 0.12–0.19 ns/byte band on the existing 8-byte FNV chunked loop. The 4KB release shape sits ~2x above the band because per-iteration overhead dominates at that size, so a faster hash variant would barely move it.

Closes the long-text-stream-hashing direction's blockedOnMeasurement gate and replaces the broader-payload openCandidate with a wider FNV / SIMD probe candidate gated on a real ≥16KB workload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds experiment 137 measurement artifacts to audit how long-text stream hashing wall time scales with increasing TEXT cell sizes, and updates the experiment tracking/registry to reflect the new measurement and follow-up candidates.

Changes:

Introduces a new profile-mode harness to sweep long-text cell sizes (4KB→128KB) using the exp 110 unchanged-fanout shape.
Adds experiment 137 documentation + aggregate results markdown, and records the experiment in signals.json and the experiments index.
Regenerates docs/experiments/history.json to include the new experiment entry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

File	Description
experiments/signals.json	Updates the long-text hashing direction state and adds the exp 137 experiment signals entry.
experiments/README.md	Adds exp 137 to the “In Review” experiment index table.
experiments/137-long-text-cell-scaling.md	New writeup documenting the exp 137 hypothesis/approach/results and conclusions.
docs/experiments/history.json	Regenerated history to include exp 137.
benchmark/profile/results/exp-137-long-text-scaling-aggregate.md	New aggregate markdown report emitted by the exp 137 harness.
benchmark/profile/long_text_scaling_audit.dart	New profile harness that runs the scaling sweep and emits the aggregate report.

Switch the long-text scaling audit harness from an INSERT-driven barrier (whose `SELECT id, body FROM long_items ORDER BY id` projection grows by one row per iteration) to a fixed-row UPDATE-driven barrier at `id = 999999`, picked outside every unchanged stream's `id < 256` predicate. The barrier stream becomes `SELECT id, body FROM long_items WHERE id = ?` so its result stays at exactly one row across every iteration; the unchanged streams stay at exactly 256 rows. Per-iteration hashed payload is now constant within each cell size, so `ns_per_byte` is no longer biased toward later (heavier) iterations.

Also fixes a tempdir leak: `Database.open` is now inside the outer `try` so the `await tempDir.delete(recursive: true)` in the `finally` always runs even if open throws.

Re-ran the audit three passes; the corrected per-byte band sits at 0.065 – 0.080 ns/byte from 16KB up (~13–15 GB/s implied per-stream throughput). The qualitative verdict is unchanged: linear scaling with bytes from 16KB up, 4KB shape sits ~2x above the band because per-iteration overhead dominates.

signals.json, the experiment writeup, the aggregate markdown, and the regenerated docs/experiments/history.json are all updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 12, 2026 11:20

Copilot started reviewing on behalf of danReynolds May 12, 2026 11:20

Copilot AI reviewed May 12, 2026

danReynolds added the codex and codex-automation labels May 13, 2026

danReynolds mentioned this pull request May 14, 2026 in Exp 127: writer-isolate dispatch wall audit #108 (closed, 7 tasks)

danReynolds wants to merge 2 commits into main from exp-137-long-text-cell-scaling
