Skip to content

kaizen: Closure visited gen#547

Open
sayrer wants to merge 4 commits into
timbray:mainfrom
sayrer:closure-visited-gen
Open

kaizen: Closure visited gen#547
sayrer wants to merge 4 commits into
timbray:mainfrom
sayrer:closure-visited-gen

Conversation

@sayrer

@sayrer sayrer commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

speeds up build time by 1.77x

sayrer and others added 3 commits June 5, 2026 18:50
Build-cost lever timbray#2 (the map term), de-risked first per plan. The epsilon
closure's two map[*faState]uint64 visited sets (closureBuffers.states and
.walkVisited) grew to O(N) entries during a build; their pointer-keyed lookups
cache-missed at scale and were ~31% of build CPU (mapaccess1_fast64 +
mapassign). Replace them with two generation stamps stored on faState itself
(walkVisitedGen, closureVisitedGen), compared against a globally monotonic
atomic counter (closureGenCounter) so generations are never reused across the
pooled buffers or concurrent builds — the reason the per-buffer maps existed.

Result: build 15.2s -> 7.7s at 10k shellstyle patterns (~2x), tail 271 -> 135ms.
Exponent unchanged (still O(N^2) at slope ~1.0 — that is the spin-back epsilon
fan-out, addressed separately by Plan C); this halves the constant. No matching
regression (ExactString/SingleShellstyle/CityLots flat); full suite green;
race-clean.

Cost: +16 bytes per faState (128 -> 144) of build-only state made permanent —
the same memory-vs-speed trade nfa.go:358 once resolved the other way. Could be
halved to +8 with uint32 stamps (wrap-safe for ~1e9 closures) if the memory
matters. Size goldens (TestMcNfaSizes, TestQuaminaMemoryCost, TestPP) updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	epsi_closure.go
research-main.go writes regenerable CSVs into research/; they are build
artifacts, not source, and should not be tracked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayrer sayrer force-pushed the closure-visited-gen branch from beb2900 to 159d53f Compare June 17, 2026 04:26
@sayrer

sayrer commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

The first commit is from a few weeks ago, before I had fixed the ssh key on my laptop, that's why this isn't verified.

@timbray timbray left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to take this one for a while. The epsilon-closure code has grown a major code smell, very opaque, so much deduping. I need to spend time taking a close look and walkthrough, I have this hunch that we're missing something.

@sayrer

sayrer commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

All good. My hunch was that the code is complicated because the problem is difficult. But if it can be cleaner, that would be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants