Exp 138: blob-heavy batch parameter shape audit #114

Open: danReynolds wants to merge 1 commit into main from exp-138-blob-param-shape-audit

@danReynolds
Owner

Hypothesis

After exp 125 and exp 126 extracted per-string utf8.encode and the temporary Uint8List it allocates from the wide-batch parameter encoder, signals.json#parameter-encoding-and-binding still listed:

Are there remaining blob-heavy parameter shapes where encoding, not SQLite stepping, dominates?

Static reading of _allocatePackedBatchParams says no — Uint8List values are written directly to the inline parameter buffer with view.setRange(...) — but the focused harness shipped only a fixed 4-byte blob mix, which masks any blob-specific cost. This PR closes the question by direct measurement.
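
The difference between the two paths can be sketched as follows. This is a minimal illustration, not the body of `_allocatePackedBatchParams`: `writeBlob`, `writeTextViaTemp`, and the flat `Uint8List` standing in for the inline parameter buffer are hypothetical names, and the real buffer layout is not shown in this PR.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Blob path: the caller's bytes are copied once, straight into the
/// destination view. `view` is a hypothetical stand-in for the inline
/// parameter buffer.
int writeBlob(Uint8List view, int offset, Uint8List blob) {
  view.setRange(offset, offset + blob.length, blob);
  return offset + blob.length;
}

/// Text path in the shape the hypothesis describes for the period before
/// exp 125 / 126: utf8.encode allocates a temporary byte list, which is
/// then copied into the destination.
int writeTextViaTemp(Uint8List view, int offset, String text) {
  final encoded = utf8.encode(text); // temporary allocation
  view.setRange(offset, offset + encoded.length, encoded);
  return offset + encoded.length;
}

void main() {
  final view = Uint8List(16);
  var offset = writeBlob(view, 0, Uint8List.fromList([1, 2, 3, 4]));
  offset = writeTextViaTemp(view, offset, 'hi');
  print(view.sublist(0, offset)); // [1, 2, 3, 4, 104, 105]
}
```

The blob path has no intermediate buffer to eliminate, which is why the static reading predicts no removable cost.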

Approach

  • Extend benchmark/experiments/batch_param_flatten.dart with --cell-mode=mixed|blob and --blob-size=N. blob mode switches every column to BLOB and every value to a deterministic per-cell Uint8List (content varies by row+col so SQLite cannot collapse identical values).
  • Sweep blob sizes [4, 64, 256, 1024, 4096] bytes across {100, 1000, 10000} rows × {2, 8, 20} params, 30 iterations per shape (15 for the 4 KB / 10k × 20 case).
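
A deterministic per-cell generator of the kind described above might look like the sketch below. The mixing scheme (a 32-bit linear congruential step seeded from row and column) and the name `cellBlob` are assumptions for illustration; the harness's actual generator may differ.

```dart
import 'dart:typed_data';

/// Hypothetical per-cell blob generator: same (row, col, size) always
/// yields the same bytes, and different cells yield different bytes.
Uint8List cellBlob(int row, int col, int size) {
  final out = Uint8List(size);
  // Seed is unique per cell as long as row and col are both below 65536.
  var state = ((row << 16) ^ col) & 0xFFFFFFFF;
  for (var i = 0; i < size; i++) {
    if (i % 4 == 0) {
      // One LCG step per 4 output bytes (Numerical Recipes constants).
      state = (state * 1664525 + 1013904223) & 0xFFFFFFFF;
    }
    out[i] = (state >> (8 * (i % 4))) & 0xFF;
  }
  return out;
}

/// Element-wise comparison; Dart's == on lists compares identity.
bool sameBytes(Uint8List a, Uint8List b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

void main() {
  print(sameBytes(cellBlob(1, 2, 64), cellBlob(1, 2, 64))); // true
  print(sameBytes(cellBlob(1, 2, 64), cellBlob(2, 1, 64))); // false
}
```

Because the first four bytes carry the full 32-bit state after one bijective step, even the 4 B shape gets distinct content per cell under this scheme.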

Mixed-mode behaviour is unchanged, so exp 113 / 125 / 126 baselines stay calibrated.

Results

Full table in benchmark/profile/results/exp-138-blob-param-shape-aggregate.md.

10k × 20 wide shape:

| variant | total payload | p50 (ms) | throughput |
| --- | --- | --- | --- |
| mixed (anchor) | ~1 MB | 12.297 | |
| blob 4 B | 0.8 MB | 17.055 | 47 MB/s |
| blob 64 B | 12.8 MB | 93.170 | 137 MB/s |
| blob 256 B | 51.2 MB | 240.479 | 213 MB/s |
| blob 1024 B | 204.8 MB | 1335.526 | 153 MB/s |
| ASCII text (exp 125) | ~2.4 MB | 12.760 | 188 MB/s |
| Unicode text (exp 126) | ~4.4 MB | 18.988 | 232 MB/s |
| Emoji text (exp 126) | ~5.2 MB | 17.458 | 298 MB/s |

Per-byte throughput on every row not dominated by per-row overhead lands in a 137–298 MB/s SQLite-stepping band: 137–213 MB/s for the 64 B to 1024 B blob rows, against 188–298 MB/s for the exp 125 / 126 wide-text rows. Tiny-blob shapes (4 B) sit at 47 MB/s because per-row overhead dominates total wall time, the same regime as mixed mode. There is no allocation-removal signature: throughput stays inside the band across 64 B+ blob sizes.
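
The throughput column can be reproduced from the other two with a few lines of arithmetic. The sketch below assumes 1 MB = 10^6 bytes, which matches the payload figures in the table; `throughputMBps` is a hypothetical helper, not part of the harness.

```dart
/// Payload is rows x params x blob size; throughput is payload over p50.
double throughputMBps({
  required int rows,
  required int params,
  required int blobBytes,
  required double p50Ms,
}) {
  final payloadMB = rows * params * blobBytes / 1e6; // assumes 1 MB = 10^6 B
  return payloadMB / (p50Ms / 1000);
}

void main() {
  // p50 values (ms) for the 10k x 20 blob rows, copied from the table.
  const p50ByBlobSize = {4: 17.055, 64: 93.170, 256: 240.479, 1024: 1335.526};
  p50ByBlobSize.forEach((size, p50) {
    final mbps = throughputMBps(
      rows: 10000,
      params: 20,
      blobBytes: size,
      p50Ms: p50,
    );
    print('blob $size B: ${mbps.toStringAsFixed(0)} MB/s');
  });
}
```

Run against the table's p50 values, this yields 47, 137, 213, and 153 MB/s for the four blob rows.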

Outcome

In Review — measurement. The blob path has no removable Dart-side encoding cost. _allocatePackedBatchParams already writes the caller's Uint8List directly into the inline parameter buffer; further savings would require changing SQLite-side work (bind_blob, page writes, WAL frames), which is out of scope for this direction.

signals.json#parameter-encoding-and-binding.openQuestions drops the blob-encoding entry. The harness mode is the durable contribution: future blob-shape hypotheses can be re-evaluated without re-deriving the workload. Reject blob-encoder allocation removal experiments unless a workload surfaces a blob-specific cost (page pinning, mass copy for JOINs, page-cache invalidation) absent from text shapes.

Test plan

  • dart analyze --fatal-infos on edited files (batch_param_flatten.dart, signals.json, README.md, doc + aggregate) — clean.
  • dart test test/database_test.dart — 47/47 pass (no library code touched).
  • dart run benchmark/check_experiment_signals.dart — signal map valid.
  • dart run benchmark/check_generated_data.dart: history.json and devices.json current after regenerate.
  • dart run benchmark/experiments/batch_param_flatten.dart --cell-mode=blob --blob-size=N for N ∈ {4, 64, 256, 1024, 4096} produced the numbers in the aggregate.
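
For re-running the blob sweep, the per-size invocations can be generated rather than typed by hand. This sketch uses only the two flags named in this PR; any flags for rows, params, or iteration count are omitted because their names are not shown here, and `blobSweepArgs` is a hypothetical helper.

```dart
/// Builds one `dart` argument list per blob size.
List<List<String>> blobSweepArgs(List<int> blobSizes) => [
      for (final size in blobSizes)
        [
          'run',
          'benchmark/experiments/batch_param_flatten.dart',
          '--cell-mode=blob',
          '--blob-size=$size',
        ],
    ];

void main() {
  for (final args in blobSweepArgs([4, 64, 256, 1024, 4096])) {
    // Prints each command instead of executing it; replace with
    // Process.run('dart', args) from dart:io to launch the harness.
    print('dart ${args.join(' ')}');
  }
}
```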

🤖 Generated with Claude Code

Closes the blob-encoding openQuestion in
signals.json#parameter-encoding-and-binding by direct measurement.

Extends `batch_param_flatten.dart` with `--cell-mode=blob --blob-size=N`
so the focused harness can sweep BLOB-only wide batches in addition to
the historical 25/25/25/25 TEXT/INT/REAL/BLOB mix. A 4 / 64 / 256 /
1024 / 4096 byte sweep across 100/1000/10000 rows x 2/8/20 params at
30 iterations per shape (15 for the 4096 B / 10k x 20 case) lands all
non-overhead-dominated rows in a 137-298 MB/s per-byte throughput band,
identical to exp 125 / 126's wide-text shapes.

The signature is SQLite stepping, not removable Dart-side allocation:
`_allocatePackedBatchParams` already writes the caller-supplied
`Uint8List` directly into the inline parameter buffer via
`view.setRange(...)`. There is no analogue of exp 125 / 126's
`utf8.encode` temporary buffer to remove on the blob path.

The harness mode is the durable contribution. Future blob-shape
hypotheses can be evaluated directly against this band; do not retry
blob-encoder allocation removal without a workload that surfaces a
blob-specific cost (page pinning, mass copy for JOINs, page-cache
invalidation) absent from text shapes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 11:35
Contributor

Copilot AI left a comment

Pull request overview

This is a measurement-only experiment (EXP-138) closing the parameter-encoding-and-binding direction's open question about whether wide blob-heavy batches have removable Dart-side encoding cost. The benchmark harness batch_param_flatten.dart gains --cell-mode=blob and --blob-size=N knobs, and a sweep across 4 B–4096 B blobs shows per-byte throughput converging to the same SQLite-stepping band as exp 125/126's wide-text shapes — confirming _allocatePackedBatchParams already writes the caller's Uint8List directly via view.setRange(...) with nothing left to remove.

Changes:

  • Add --cell-mode=mixed|blob and --blob-size=N options to batch_param_flatten.dart, with a deterministic non-collapsible blob generator; mixed-mode default preserves prior calibration.
  • Add experiment writeup, aggregate results table, README entry, and signals.json updates (close the blob open question, add exp 138 to history; move 116 to archive).
  • Regenerate docs/experiments/history.json with the new entry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| benchmark/experiments/batch_param_flatten.dart | Adds blob cell-mode and configurable blob size; mixed mode unchanged. |
| experiments/138-blob-param-shape-audit.md | New experiment writeup with hypothesis, sweep results, and decision. |
| benchmark/profile/results/exp-138-blob-param-shape-aggregate.md | Aggregate p50/throughput tables for the blob sweep. |
| experiments/signals.json | Closes blob-encoding openQuestion; updates currentRead; adds 138 entry; archives 116. |
| experiments/README.md | Adds the EXP-138 row to the in-review list. |
| docs/experiments/history.json | Regenerated to include EXP-138. |
