exp 137: long-text cell-size scaling audit #112
Open
danReynolds wants to merge 2 commits into
Sweeps the exp 110 unchanged-fanout shape across [4KB, 16KB, 32KB, 64KB, 128KB] cells. Wall time scales linearly with bytes from 16KB up, and per-byte cost converges to a stable 0.12–0.19 ns/byte band on the existing 8-byte FNV chunked loop. The 4KB release shape sits ~2x above the band because per-iteration overhead dominates at that size, so a faster hash variant would barely move it. Closes the `long-text-stream-hashing` direction's `blockedOnMeasurement` gate and replaces the broader-payload `openCandidate` with a wider FNV / SIMD probe candidate gated on a real ≥16KB workload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
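The PR does not show the hash loop itself. As a reading aid, here is a minimal sketch of what an 8-byte chunked FNV-style loop can look like, assuming the Dart VM (native) with its wrapping 64-bit `int` arithmetic. `chunkedFnv64`, `fnvOffsetBasis` and `fnvPrime` are hypothetical names. The constants are the standard FNV-1a 64-bit ones, but folding a whole 8-byte word per step means the output is not byte-for-byte FNV-1a, and the actual exp 110 loop may differ.

```dart
import 'dart:typed_data';

// Standard FNV-1a 64-bit constants. On the Dart VM, hex literals above
// 2^63 are stored as two's-complement negatives, and int multiplication
// wraps modulo 2^64.
const int fnvOffsetBasis = 0xcbf29ce484222325;
const int fnvPrime = 0x100000001b3;

/// Hypothetical 8-byte chunked FNV-style hash (not the harness's code).
int chunkedFnv64(Uint8List bytes) {
  final view = ByteData.sublistView(bytes);
  var hash = fnvOffsetBasis;
  var i = 0;
  // Main loop: fold one little-endian 64-bit word per step
  // (getUint64 is VM-only; it is unsupported when compiled to JS).
  for (; i + 8 <= bytes.length; i += 8) {
    hash ^= view.getUint64(i, Endian.little);
    hash *= fnvPrime;
  }
  // Tail: the remaining 0-7 bytes, one FNV-1a step per byte.
  for (; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash *= fnvPrime;
  }
  return hash;
}
```

Each 8-byte step is one XOR and one multiply, so on large cells the loop cost tracks `bytes / 8`, while on small cells any fixed per-iteration work outside the loop is a larger share of wall time.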
Pull request overview
Adds experiment 137 measurement artifacts to audit how long-text stream hashing wall time scales with increasing TEXT cell sizes, and updates the experiment tracking/registry to reflect the new measurement and follow-up candidates.
Changes:
- Introduces a new profile-mode harness to sweep long-text cell sizes (4KB→128KB) using the exp 110 unchanged-fanout shape.
- Adds experiment 137 documentation + aggregate results markdown, and records the experiment in `signals.json` and the experiments index.
- Regenerates `docs/experiments/history.json` to include the new experiment entry.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| experiments/signals.json | Updates the long-text hashing direction state and adds the exp 137 experiment signals entry. |
| experiments/README.md | Adds exp 137 to the “In Review” experiment index table. |
| experiments/137-long-text-cell-scaling.md | New writeup documenting the exp 137 hypothesis/approach/results and conclusions. |
| docs/experiments/history.json | Regenerated history to include exp 137. |
| benchmark/profile/results/exp-137-long-text-scaling-aggregate.md | New aggregate markdown report emitted by the exp 137 harness. |
| benchmark/profile/long_text_scaling_audit.dart | New profile harness that runs the scaling sweep and emits the aggregate report. |
Switch the long-text scaling audit harness from an INSERT-driven barrier (whose `SELECT id, body FROM long_items ORDER BY id` projection grows by one row per iteration) to a fixed-row UPDATE-driven barrier at `id = 999999`, picked outside every unchanged stream's `id < 256` predicate. The barrier stream becomes `SELECT id, body FROM long_items WHERE id = ?`, so its result stays at exactly one row across every iteration, and the unchanged streams stay at exactly 256 rows. Per-iteration hashed payload is now constant within each cell size, so `ns_per_byte` is no longer biased toward later (heavier) iterations.

Also fixes a tempdir leak: `Database.open` is now inside the outer `try`, so the `await tempDir.delete(recursive: true)` in the `finally` always runs, even if open throws.

Re-ran the audit for three passes; the corrected per-byte band sits at 0.065–0.080 ns/byte from 16KB up (~13–15 GB/s implied per-stream throughput). The qualitative verdict is unchanged: linear scaling with bytes from 16KB up, and the 4KB shape sits ~2x above the band because per-iteration overhead dominates. `signals.json`, the experiment writeup, the aggregate markdown, and the regenerated `docs/experiments/history.json` are all updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
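The tempdir fix follows the usual Dart `try`/`finally` shape: everything that can throw, including the open, runs inside the `try`. A minimal sketch using `dart:io`; `withTempDir` is a hypothetical helper, not code from the harness, and the comment only marks where `Database.open` would be called.

```dart
import 'dart:io';

/// Hypothetical helper: creates a temp directory, runs [body] inside the
/// `try`, and always deletes the directory in `finally`, even if [body]
/// (for example, the database open) throws.
Future<T> withTempDir<T>(Future<T> Function(Directory dir) body) async {
  final tempDir = await Directory.systemTemp.createTemp('long_text_audit_');
  try {
    // In the harness, Database.open(...) and the sweep would run here.
    return await body(tempDir);
  } finally {
    await tempDir.delete(recursive: true);
  }
}
```

In the pre-fix layout the open ran before the `try`, so a throw there skipped the `finally` and left the directory behind.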
Hypothesis
After exp 110 wired in the 8-byte FNV chunked loop and measured -76% on the 4KB long-text unchanged-fanout benchmark, the `long-text-stream-hashing` direction still needed a workload sweep beyond 4KB cells. Without that sweep, we could not tell whether long-cell wall time continues to be dominated by byte-stream hashing or whether SQLite text fetch, page-cache behavior, allocation, GC, or isolate transfer takes over at larger cells.

This PR ships the missing measurement. The expected reading was that the 4KB shape would sit above the per-byte band because per-iteration overhead is still meaningful there, while 16KB+ cells should converge toward the actual hash-loop throughput band if hashing remains the dominant cost.
Approach
Adds `benchmark/profile/long_text_scaling_audit.dart`, a profile harness that mirrors exp 110's unchanged-fanout shape: 8 unchanged streams x 256 rows x ASCII TEXT, plus one barrier stream. It sweeps `[4KB, 16KB, 32KB, 64KB, 128KB]` cells, with 3 warmups and 30 timed iterations per size.

The harness uses a fixed out-of-range barrier row (`id = 999999`) and UPDATEs that row each iteration. The barrier stream is `SELECT id, body FROM long_items WHERE id = ?`, so the barrier result stays exactly one row and the unchanged streams stay exactly 256 rows. That keeps `hashed_bytes_per_iter = cell_bytes * (8 * 256 + 1)` constant within each size. The unchanged streams must not emit; the harness asserts that on every iteration.

Full details are in `experiments/137-long-text-cell-scaling.md`; aggregate output is in `benchmark/profile/results/exp-137-long-text-scaling-aggregate.md`.

Results
Three repeated passes; values bracket the per-run band.
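The per-byte figures follow directly from the constant per-iteration payload defined in the approach. A minimal Dart sketch of that arithmetic, assuming 1KB = 1024 bytes, decimal GB, and counting only the TEXT body bytes as the formula does; `rowsHashedPerIter`, `hashedBytesPerIter`, `nsPerByte` and `impliedGBPerSecond` are hypothetical names, not the harness's API.

```dart
/// Rows hashed per iteration: 8 unchanged streams x 256 rows,
/// plus the single barrier row.
const int unchangedStreams = 8;
const int rowsPerStream = 256;
const int rowsHashedPerIter = unchangedStreams * rowsPerStream + 1; // 2049

/// hashed_bytes_per_iter = cell_bytes * (8 * 256 + 1)
int hashedBytesPerIter(int cellBytes) => cellBytes * rowsHashedPerIter;

/// Per-byte cost from one iteration's wall time in nanoseconds.
double nsPerByte(double wallNs, int cellBytes) =>
    wallNs / hashedBytesPerIter(cellBytes);

/// 1 byte/ns = 1e9 bytes/s = 1 GB/s (decimal), so throughput in GB/s
/// is the reciprocal of the per-byte cost in ns.
double impliedGBPerSecond(double nsPerByte) => 1.0 / nsPerByte;
```

For example, a 128KB cell gives 268,566,528 hashed bytes per iteration, and the 0.065–0.080 ns/byte band corresponds to about 12.5–15.4 GB/s, in line with the ~13–15 GB/s quoted in the second commit message.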
Per-iteration wall:
Per-byte cost:
Headline reading: wall time scales linearly with bytes from 16KB up, and the 16KB+ per-byte cost converges to a stable 0.065–0.080 ns/byte band. Hashing remains the dominant cost on long-cell unchanged-fanout workloads at 16KB and above. The 4KB row sits about 2x above the larger-size per-byte band because per-iteration overhead is comparable to the hashing work at that size.
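A simple affine cost model makes the 4KB reading concrete. This is an interpretive sketch, not a fit the harness performs: it assumes each iteration costs a fixed overhead $o$ plus $c$ ns per hashed byte, with $S$ the cell size in bytes and $B(S)$ the hashed bytes per iteration.

```math
\begin{aligned}
B(S) &= S\,(8 \cdot 256 + 1) = 2049\,S \\
W(S) &= o + c\,B(S) \\
\frac{W(S)}{B(S)} &= c + \frac{o}{B(S)}
\end{aligned}
```

If the 4KB per-byte cost is about $2c$, then $o \approx c\,B(4096) = 8{,}392{,}704\,c$, roughly 0.55–0.67 ms per iteration for $c$ between 0.065 and 0.080 ns/byte. The overhead term shrinks as $1/S$: it adds $c/4$ at 16KB and $c/32$ at 128KB, so under this model the larger sizes approach $c$.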
The 64KB and 128KB rows have wider p99/min-to-max spread, likely from the harness's per-iteration String allocation crossing Dart VM old-generation heap-region thresholds. The medians still sit cleanly inside the same per-byte band, so this does not change the linear-scaling verdict.
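The spread-versus-median distinction is easier to see with the summary statistics written out. A minimal sketch, assuming nearest-rank percentiles (the PR does not say which percentile definition the harness uses); `WallSummary` and `summarize` are hypothetical names.

```dart
/// Summary statistics for one cell size's timed iterations.
class WallSummary {
  final double min, median, p99, max;
  const WallSummary(this.min, this.median, this.p99, this.max);
}

/// Hypothetical helper: min / median / nearest-rank p99 / max.
WallSummary summarize(List<double> samplesNs) {
  if (samplesNs.isEmpty) {
    throw ArgumentError('need at least one sample');
  }
  final sorted = [...samplesNs]..sort();
  final n = sorted.length;
  // Median: the middle element, or the mean of the two middle elements.
  final median = n.isOdd
      ? sorted[n ~/ 2]
      : (sorted[n ~/ 2 - 1] + sorted[n ~/ 2]) / 2;
  // Nearest-rank p99: the ceil(0.99 * n)-th smallest sample (1-based),
  // computed with integer arithmetic to avoid float rounding.
  final rank = (99 * n + 99) ~/ 100;
  return WallSummary(sorted.first, median, sorted[rank - 1], sorted.last);
}
```

Under that definition, with the harness's 30 timed iterations p99 is simply the slowest sample, so one slow iteration (for example a GC pause) sets it, while moving the median takes about half of the samples. That is why a wide p99 or min-to-max spread can coexist with a median that stays in the band.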
Outcome
In Review - measurement.
This closes the "long-payload streaming workload at sizes beyond exp 110's 4KB cells" `blockedOnMeasurement` entry and the matching broader long-payload `openCandidate` in `signals.json`. It adds two follow-up candidates: a wider FNV unroll / SIMD probe gated on a real >=16KB workload, and a BLOB-shape companion sweep to confirm TEXT/BLOB symmetry.

Future hash-loop variants should compare against the exp 137 16KB+ band, not exp 110's 4KB benchmark. The current 4KB release-suite shape is per-iteration-overhead-bound and should not move proportionally to a per-byte hash improvement.
Test plan
- `dart analyze`: same 83 pre-existing warnings on `main`, no new issues from the audit harness
- `dart test test/stream_test.dart test/query_decoder_test.dart`: 27 stream tests plus decoder tests pass
- `dart run benchmark/check_experiment_signals.dart` passes
- `dart run benchmark/check_generated_data.dart` passes after regenerating `docs/experiments/history.json`
- `dart run -DRESQLITE_PROFILE=true benchmark/profile/long_text_scaling_audit.dart --markdown` ran 3x with stable median bands