Skip to content

fix: keep generated IDs free of the store path delimiter (fixes the broadcast "flake")#39

Open
danReynolds wants to merge 1 commit into
mainfrom
claude/fix-broadcast-test-flake
Open

fix: keep generated IDs free of the store path delimiter (fixes the broadcast "flake")#39
danReynolds wants to merge 1 commit into
mainfrom
claude/fix-broadcast-test-flake

Conversation

@danReynolds
Copy link
Copy Markdown
Owner

@danReynolds danReynolds commented May 26, 2026

Summary

The intermittent test/core failure ("105 passed, 1 failed, a different test each run") turned out to be a real library bug, not a test-timing flake.

generateId drew from an alphabet that included _, so a random id could contain __ — the delimiter the store uses to split paths into segments. Generated ids are used as path segments (observable value keys, auto-generated document ids), so a __ inside an id split it into extra segments and misplaced the value deeper in the store than reads/invalidations look:

  • Observable value never invalidated → stale emissions. An observable's cached value is stored in observerValueStore at its _observerId ("${path}__${generateId()}"). Writes/clears invalidate it via delete(path, recursive: false). If the random id contained __, the value was stored below path, so the non-recursive delete never reached it and the observable emitted a stale (often empty) value. This is the "flake" — it fired whenever a random observer id happened to contain __ (~7.5% of core runs at --concurrency=1, higher under parallel load), hence a different test each run.
  • Auto-id documents invisible to their collection. A document created without an explicit id got generateId(); a __ in it wrote the doc below its collection node, so collection.get() / queries didn't see it. Same root cause.

I confirmed the mechanism by forcing an extra __ segment into the observer id — loon_test.dart went from passing to 17 deterministic failures.

Fix

Restrict generated ids to alphanumeric [0-9A-Za-z] via uniform rejection sampling, so an id can never contain the __ delimiter. This fixes it at the source for every path-segment use. Ids stay URL-safe and ~125 bits.

Verification

  • Regression test: generateId never contains __ across 100k samples (deterministic), plus alphanumeric/length checks.
  • Stress: core suite 45/45 green — 15/15 under parallel load (previously ~12-20% flake) and 30/30 at --concurrency=1 (previously ~7.5%).

Note

My first attempt on this branch was a tearDown broadcast-drain (cross-test-bleed theory); the data disproved it (3/40 still failed), so it's been reverted in favour of this root-cause fix. Once this lands on main, the recurring CI re-kicks across the other PRs should stop.

Copilot AI review requested due to automatic review settings May 26, 2026 21:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an intermittent test/core flake caused by cross-test leakage of Loon’s pending zero-delay broadcast timer after Loon.clearAll() runs in tearDown, which could cause the next test’s writes to piggyback on the prior test’s scheduled broadcast.

Changes:

  • Update the core test suite tearDown to await an extra event-loop turn after Loon.clearAll() so the scheduled broadcast is drained before the next test begins.
  • Add an explanatory comment documenting why the extra await is required for test isolation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

generateId drew from an alphabet that included '_', so a random ID could
contain '__' — the delimiter the store uses to split paths into segments.
Because generated IDs are used as path segments (observer value keys,
auto-generated document IDs), a '__' in an ID split it into extra
segments and misplaced the value deeper in the store than reads and
invalidations expect:

  - An observable's cached value was stored below the path the broadcast
    manager invalidates (clearAll/writes call delete(path, recursive:
    false)), so it was never invalidated and the observer emitted a stale
    value. This is the intermittent "broadcast test flake": it fired
    whenever a random observer ID happened to contain '__' (~7.5% of core
    suite runs at --concurrency=1, more under parallel load). Forcing an
    extra segment into the observer ID turned it into 17 deterministic
    failures.
  - An auto-id document whose ID contained '__' was written below its
    collection node and was invisible to the collection's query.

Restrict generated IDs to alphanumeric [0-9A-Za-z] via uniform rejection
sampling, so an ID can never contain the delimiter. Adds a regression
test asserting generateId never produces '__' (100k samples).

Verified: core suite 45/45 green across parallel and --concurrency=1
stress runs that previously flaked ~7.5-20%.
@danReynolds danReynolds force-pushed the claude/fix-broadcast-test-flake branch from 626eb9f to 9fc001e Compare May 26, 2026 21:47
@danReynolds danReynolds changed the title test: drain pending broadcast in tearDown to fix cross-test flake fix: keep generated IDs free of the store path delimiter (fixes the broadcast "flake") May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants