Fix: stop consuming fuzzer bytes for single-choice .values / .pointers on arrays #8
N3ur0sis wants to merge 1 commit into
…ains When .values or .pointers has at most one choice, the index is fixed: skip the data[off] read and the off += 1 increment in the sampler, and charge 0 bytes in neededBytesFromGlobals. Add the tests/big_pool regression test (see README).
Pull request overview
This PR fixes a seed-size / input-consumption inefficiency in the C sampler generation by avoiding consuming a fuzzer byte when a .values or .pointers domain has only a single allowed choice, and adds an integration test that would previously inflate ABSOLUTION_GLOBALS_SIZE for large arrays.
Changes:
- Update seed byte accounting so .values/.pointers domains with len <= 1 contribute 0 consumed bytes.
- Update C sampler emission to skip data[off] reads and off += 1 when there is only one possible .values/.pointers choice.
- Add a regression test (tests/big_pool/) covering a large array with a single-value .values domain plus an additional global to validate the prefix stays small.
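The accounting rule in the list above can be sketched in C as follows. This is only an illustration under stated assumptions: the real logic lives in neededBytesFromGlobals in src/seed.zig, and the function name and parameters below are hypothetical. It also assumes that a domain with more than one choice costs exactly one selector byte per array element, which is what the PR description implies.

```c
#include <stddef.h>

/* Hypothetical mirror of the byte accounting; not the project's Zig code.
 * choice_count: number of entries in the .values / .pointers domain.
 * elem_count:   number of array elements sampled from that domain. */
static size_t needed_bytes_for_field(size_t choice_count, size_t elem_count)
{
    /* With zero or one choice the index is fixed, so no input is consumed. */
    if (choice_count <= 1) {
        return 0;
    }
    /* Assumption: otherwise one selector byte is read per element. */
    return elem_count;
}
```

Under these assumptions, a 16000-element array constrained to a single value contributes 0 bytes, while the same array with two allowed values would contribute 16000.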
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/seed.zig | Makes neededBytesFromGlobals return 0 bytes for single/empty .values and .pointers domains to match sampler behavior. |
| src/cgen/emit.zig | Adjusts emitted sampler code to avoid consuming input bytes when the domain has only one choice. |
| tests/big_pool/target.c | Adds a large global array and a secondary global used by the regression test. |
| tests/big_pool/target.c.in | Provides an invariant constraining the large array to a single allowed value. |
| tests/big_pool/target.c.zon | Adds the golden expected .zon output for the new test. |
| tests/big_pool/README.md | Documents the regression scenario the test is intended to catch. |
I am wondering if our current emission strategy is just wrong and should be reworked. The current behavior is to use values for every element of an array. Wdyt?
I tried an alternative approach here. I've added a field type "whole_field" to keep the current behavior, which can be useful in some cases.
Closing in favor of #10
For a field with .values or .pointers, the sampler always read one byte from the fuzzer and did off += 1 to pick an index, even when there is only one allowed choice (data[off] % 1 is always 0).
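A minimal sketch of the intended sampler behavior, written by hand rather than copied from the generated code: the function name, parameters, and the early return for an empty domain are assumptions made for illustration. It also assumes the caller guarantees that data holds enough bytes for every read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler sketch; names are illustrative, not generated output.
 * Fills dst[0..n) from `choices` and returns the updated input offset.
 * Assumption: `data` has at least `n` readable bytes starting at `off`
 * whenever choice_count > 1. */
static size_t sample_from_values(uint8_t *dst, size_t n,
                                 const uint8_t *choices, size_t choice_count,
                                 const uint8_t *data, size_t off)
{
    /* Assumption: an empty domain leaves dst untouched and reads nothing. */
    if (choice_count == 0) {
        return off;
    }
    for (size_t i = 0; i < n; i++) {
        size_t idx = 0;
        if (choice_count > 1) {
            /* Several choices: spend one input byte to pick an index. */
            idx = data[off] % choice_count;
            off += 1;
        }
        /* Single choice: idx stays 0 and no input byte is consumed. */
        dst[i] = choices[idx];
    }
    return off;
}
```

With a single-entry domain the returned offset equals the offset passed in, so the array adds nothing to the fuzzer prefix regardless of its length.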
For a large global array (e.g. uint8_t big_pool[16000] with .values = .{"\x00"}), the generated C still loops over every element, so the fuzzer prefix grew by 16 000 useless bytes instead of 0. That inflated ABSOLUTION_GLOBALS_SIZE and the seed size, and wasted mutation budget before the harness saw useful data (e.g. other globals).
Before (generated sample_invariant excerpt)
ABSOLUTION_GLOBALS_SIZE was 1600.
After
See tests/big_pool/