bleep-bsp: tighten cancellation, signaling, and lifecycle #598

Merged: oyvindberg merged 1 commit into master from analyze-bsp on May 27, 2026

@oyvindberg
Owner

Summary

Production-hardening pass over bleep-bsp concurrency. Focused on places where toolchain code, JVM threads, and cats-effect met awkwardly: callback-thread unsafeRunSync, races between non-atomic ops, fiber leaks on cancel, blanket IO.uncancelable blocking shutdown, contention bottlenecks.

CancellationToken stays as the toolchain ABI; the bridges to and from cats-effect (CE) are now CE-safe (Dispatcher on the way in, CopyOnWriteArrayList + .background on the way out). Everything CE-side speaks Deferred[IO, KillReason] + fiber cancellation.

Cancellation hygiene

  • Outcome.fromCancellationToken routes through a lazy process-wide Dispatcher.parallel (no unsafeRunSync from callback threads, which was illegal when the cancel originated on a CE compute fiber).
  • Outcome.bridgeKillSignal callback list → CopyOnWriteArrayList (no deadlock if a callback re-enters onCancel).
  • TaskDag.executor drops the IO.uncancelable around the test handler — withRecovery already produces a structured Killed result on fiber cancel; the blanket uncancelable could pin server shutdown indefinitely.
  • withDeadClientDetection only uncancelables the kill-signal completion; processEvent stays cancelable so a wedged client can't pin the consumer fiber.
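
To illustrate the CopyOnWriteArrayList point above, here is a minimal sketch in plain Java. `SketchToken` is a hypothetical stand-in, not bleep's CancellationToken, and the run-once wrapper is an assumption added for the example. It shows why a callback can re-enter `onCancel` while the cancel path is still iterating: iteration runs over a snapshot and holds no lock.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cancellation token; not bleep's CancellationToken.
final class SketchToken {
    // Iterators see a snapshot, so add() during iteration neither blocks nor throws.
    private final List<Runnable> callbacks = new CopyOnWriteArrayList<>();
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    void onCancel(Runnable callback) {
        // Run-once wrapper (an assumption of this sketch): a callback registered
        // concurrently with cancel() may be reached from both paths.
        AtomicBoolean ran = new AtomicBoolean(false);
        Runnable once = () -> {
            if (ran.compareAndSet(false, true)) callback.run();
        };
        callbacks.add(once);
        // Late registration: the token is already cancelled, so fire immediately.
        if (cancelled.get()) once.run();
    }

    void cancel() {
        // Only the first cancel() notifies; no lock is held while callbacks run.
        if (cancelled.compareAndSet(false, true)) {
            for (Runnable callback : callbacks) callback.run();
        }
    }

    static int demo() {
        SketchToken token = new SketchToken();
        AtomicInteger fired = new AtomicInteger();
        token.onCancel(() -> {
            fired.incrementAndGet();
            // Re-entrant registration from inside a cancel callback.
            token.onCancel(fired::incrementAndGet);
        });
        token.cancel();
        token.cancel(); // second cancel is a no-op
        return fired.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a list guarded by a non-reentrant lock, the nested `onCancel` call would block on the lock the cancel path already holds; with a plain `ArrayList` it would throw `ConcurrentModificationException`.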

DAG executor

  • Rotating-Deferred wakeup replaced with Queue.bounded(1) + tryOffer: coalescing wakeups, no Ref-swap race, eliminates the spurious "DAG deadlock" false-positive on very-fast no-op task bursts.
  • ParallelProjectCompiler spawns compiles under a Supervisor — cancelling the surrounding fiber now actually cancels in-flight compiles instead of orphaning them with Zinc still writing class files.
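
The coalescing behaviour of a capacity-1 queue can be sketched with `java.util.concurrent` instead of the cats-effect `Queue.bounded(1)` the executor uses. `CoalescingWakeup` and its method names are hypothetical; `offer` plays the role of `tryOffer`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical analog of the wakeup channel; not the TaskDag implementation.
final class CoalescingWakeup {
    private static final Object TOKEN = new Object();
    // Capacity 1: at most one pending wakeup is ever stored.
    private final BlockingQueue<Object> slot = new ArrayBlockingQueue<>(1);

    // Non-blocking. If a wakeup is already pending, offer() returns false and
    // the extra signal is dropped on purpose, coalescing into the pending one.
    void signal() {
        slot.offer(TOKEN);
    }

    // Blocks until a wakeup is pending, then consumes it.
    void await() throws InterruptedException {
        slot.take();
    }

    // Non-blocking variant used by the demo below.
    boolean tryConsume() {
        return slot.poll() != null;
    }

    static int demo() {
        CoalescingWakeup wakeup = new CoalescingWakeup();
        // A burst of very fast task completions signals many times...
        for (int i = 0; i < 1000; i++) wakeup.signal();
        // ...but the consumer observes a single pending wakeup.
        int observed = 0;
        while (wakeup.tryConsume()) observed++;
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the signal is stored rather than delivered to whoever happens to be waiting at that instant, a wakeup sent just before the consumer starts waiting is not lost, which is the property a swap-based rotating signal has to work harder to guarantee.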

ProjectLock

  • Files.createDirectories moved outside the per-project lock (idempotent syscall, no contention).
  • JVM monitor → fair ReentrantLock so an exclusive waiter can't be starved by a continuous stream of shared acquirers.
  • File-handle close moved outside the lock on release (Windows / NFS slow paths).
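
A rough sketch of the shape these three changes suggest, under stated assumptions: `ProjectLockSketch`, its map of handles, and its method names are hypothetical, and the sketch does not model shared versus exclusive acquisition or OS-level file locking. It only shows a fair `ReentrantLock` with directory creation and channel close kept outside the critical section.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not bleep's ProjectLock.
final class ProjectLockSketch {
    // true = fair: waiting threads acquire in arrival order.
    private final ReentrantLock lock = new ReentrantLock(true);
    private final Map<String, FileChannel> handles = new HashMap<>();

    void acquire(String project, Path lockFile) throws IOException {
        // Idempotent filesystem work happens before taking the lock.
        Files.createDirectories(lockFile.toAbsolutePath().getParent());
        FileChannel channel = FileChannel.open(
            lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileChannel replaced;
        lock.lock();
        try {
            replaced = handles.put(project, channel); // only bookkeeping under the lock
        } finally {
            lock.unlock();
        }
        if (replaced != null) replaced.close();
    }

    void release(String project) throws IOException {
        FileChannel channel;
        lock.lock();
        try {
            channel = handles.remove(project);
        } finally {
            lock.unlock();
        }
        // Potentially slow close happens after the lock is released.
        if (channel != null) channel.close();
    }

    int openHandles() {
        lock.lock();
        try {
            return handles.size();
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }

    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("project-lock-sketch");
            Path lockFile = dir.resolve("nested").resolve("a.lock");
            ProjectLockSketch locks = new ProjectLockSketch();
            locks.acquire("a", lockFile);
            boolean held = locks.openHandles() == 1 && Files.exists(lockFile);
            locks.release("a");
            return held && locks.openHandles() == 0 && locks.isFair();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```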

Diagnostics + metrics

  • BspDiagnosticTracker is per-build-operation (handleCompile / handleTest each have their own). startCycle atomically rotates current vs previous via a single AtomicReference — no more snapshot-then-clear race against concurrent recordDiagnostic.
  • BspMetrics gains a dedicated writer thread + bounded queue: producers offer non-blocking, the writer drains in batches with a single flush per batch. Saturated-queue events surface as dropped_events in the shutdown summary.
  • ZincBridge tracks abandoned ECJ compile threads (named per-compile, daemon-marked) so jstack + abandonedEcjThreadsSnapshot give operator visibility into the known leak when CompilationProgress cancellation is unavailable. Same pattern for native-compiler threads in Outcome.runInFreshThread.
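
The single-AtomicReference rotation can be sketched as follows. `DiagnosticTrackerSketch` and its `State` holder are hypothetical and use plain strings for diagnostics; the point is that `startCycle` and `recordDiagnostic` both go through one compare-and-set on the same reference, so there is no window between "snapshot" and "clear".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not bleep's BspDiagnosticTracker.
final class DiagnosticTrackerSketch {
    // Immutable pair, replaced as a whole on every update.
    private static final class State {
        final List<String> current;
        final List<String> previous;

        State(List<String> current, List<String> previous) {
            this.current = current;
            this.previous = previous;
        }
    }

    private final AtomicReference<State> state =
        new AtomicReference<>(new State(List.of(), List.of()));

    void recordDiagnostic(String diagnostic) {
        // The update function is side-effect free, so CAS retries are harmless.
        state.updateAndGet(s -> {
            List<String> next = new ArrayList<>(s.current);
            next.add(diagnostic);
            return new State(Collections.unmodifiableList(next), s.previous);
        });
    }

    // Atomically moves current to previous and starts an empty current.
    // Returns the diagnostics of the cycle that just ended.
    List<String> startCycle() {
        State old = state.getAndUpdate(s -> new State(List.of(), s.current));
        return old.current;
    }

    List<String> current() {
        return state.get().current;
    }

    List<String> previous() {
        return state.get().previous;
    }

    static boolean demo() {
        DiagnosticTrackerSketch tracker = new DiagnosticTrackerSketch();
        tracker.recordDiagnostic("a");
        tracker.recordDiagnostic("b");
        List<String> finished = tracker.startCycle();
        tracker.recordDiagnostic("c");
        return finished.equals(List.of("a", "b"))
            && tracker.previous().equals(List.of("a", "b"))
            && tracker.current().equals(List.of("c"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A diagnostic recorded concurrently with `startCycle` lands in exactly one of the two cycles, depending on which compare-and-set wins; it cannot be copied into the snapshot and then also cleared.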

Sharing + keying

  • SharedWorkspaceState.unregister no longer removes the empty inner map: the value-based remove(K, V) raced with concurrent register and could drop a just-registered operation.
  • SourceGenRunner.scriptSemaphores keyed on (scriptProject, mainClass) so two unrelated script projects with the same main-class name don't serialize.
  • cachedExternalTestRunnerJars keyed per-resolver instead of process-wide; multi-tenant servers with different resolver configs no longer share the wrong jars.
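
The keying change for script semaphores can be illustrated with a composite map key. This is a sketch only: `ScriptSemaphoresSketch`, `Key`, and `forScript` are hypothetical names, and it uses `java.util.concurrent.Semaphore` in place of whatever semaphore type SourceGenRunner actually holds.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch; not bleep's SourceGenRunner.
final class ScriptSemaphoresSketch {
    // Composite key: the main-class name alone is not unique across projects.
    static final class Key {
        final String scriptProject;
        final String mainClass;

        Key(String scriptProject, String mainClass) {
            this.scriptProject = scriptProject;
            this.mainClass = mainClass;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Key)) return false;
            Key that = (Key) other;
            return scriptProject.equals(that.scriptProject)
                && mainClass.equals(that.mainClass);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scriptProject, mainClass);
        }
    }

    private final ConcurrentHashMap<Key, Semaphore> semaphores = new ConcurrentHashMap<>();

    Semaphore forScript(String scriptProject, String mainClass) {
        // computeIfAbsent creates at most one semaphore per key.
        return semaphores.computeIfAbsent(
            new Key(scriptProject, mainClass), k -> new Semaphore(1));
    }

    static boolean demo() {
        ScriptSemaphoresSketch scripts = new ScriptSemaphoresSketch();
        Semaphore a = scripts.forScript("project-a", "scripts.Gen");
        Semaphore b = scripts.forScript("project-b", "scripts.Gen");
        Semaphore aAgain = scripts.forScript("project-a", "scripts.Gen");
        if (!a.tryAcquire()) return false;
        boolean unrelatedIsFree = b.tryAcquire();      // different project, not serialized
        boolean sameIsBlocked = !aAgain.tryAcquire();  // same script, still serialized
        return a == aAgain && a != b && unrelatedIsFree && sameIsBlocked;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```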

Misc

  • TestRunner.captureThreadDump is raced against a 5s timeout so a wedged JVM can't pin the timeout-handling path.
  • JsonRpcTransport.close() synchronizes against concurrent reads/writes.
  • BspServerDaemon's server-wide compile Semaphore is FIFO-fair.
  • InProcessBspServer.serverExited returns the real exit code via CompletableFuture instead of IO.never; daemon thread reports failures.
  • Deleted unused CompilationCoordinator (-393 LOC).
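
One way to bound a possibly wedged capture, as in the captureThreadDump item above, is to race it against a timeout. The sketch below is an assumption about the general technique, not the TestRunner code: `BoundedCapture`, `captureOrFallback`, and the fallback string are hypothetical, and the capture itself is passed in as a `Supplier`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch; not bleep's TestRunner.
final class BoundedCapture {
    static String captureOrFallback(
            Supplier<String> capture, long timeout, TimeUnit unit, String fallback) {
        CompletableFuture<String> result = new CompletableFuture<>();
        // Daemon thread: if the capture never returns, it cannot keep the JVM alive.
        Thread worker = new Thread(() -> {
            try {
                result.complete(capture.get());
            } catch (Throwable t) {
                result.completeExceptionally(t);
            }
        }, "bounded-capture");
        worker.setDaemon(true);
        worker.start();
        // Whichever completes first wins: the capture or the timeout fallback.
        return result.completeOnTimeout(fallback, timeout, unit).join();
    }

    static boolean demo() {
        String fast = captureOrFallback(() -> "dump", 1, TimeUnit.SECONDS, "<timed out>");
        String wedged = captureOrFallback(() -> {
            try {
                Thread.sleep(60_000); // simulates a capture that hangs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "late";
        }, 200, TimeUnit.MILLISECONDS, "<timed out>");
        return fast.equals("dump") && wedged.equals("<timed out>");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller gets control back after the timeout even though the worker thread may still be stuck; the abandoned thread is named so it is recognisable in a later dump.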

Net: −163 lines.

Test plan

  • bleep compile bleep-bsp clean
  • bleep test bleep-bsp-tests — 552 passed, 0 failed
  • Spot-check Metals + bleep-cli driving the server through a real workspace to confirm the ProjectLock + diagnostic changes don't regress IDE flows
  • Run a long-lived daemon under sustained compile/test mix to confirm BspMetrics writer thread doesn't drop events under normal load (queue cap 200k)
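
For the last check, it may help to see how drops become countable in a bounded, non-blocking producer path. The sketch below is hypothetical (`MetricsWriterSketch` and its members are illustrative names, and the capacity of 3 is only for the demo, not the 200k cap): producers `offer` and count failures, and the writer drains whatever is queued and flushes once per batch. In a real writer, `drainBatch` would run in a loop on the dedicated thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not bleep's BspMetrics.
final class MetricsWriterSketch {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    // Touched only by the writer side in this sketch.
    private final List<String> written = new ArrayList<>();
    private int flushes = 0;

    MetricsWriterSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer path: never blocks; a full queue increments the drop counter.
    void record(String event) {
        if (!queue.offer(event)) dropped.incrementAndGet();
    }

    // Writer path: take everything currently queued, then flush once.
    int drainBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            written.addAll(batch);
            flushes++; // stands in for a single flush of the output stream
        }
        return batch.size();
    }

    long droppedEvents() {
        return dropped.get();
    }

    int flushes() {
        return flushes;
    }

    static long[] demo() {
        MetricsWriterSketch metrics = new MetricsWriterSketch(3);
        for (int i = 0; i < 5; i++) metrics.record("event-" + i);
        int drained = metrics.drainBatch();
        return new long[] { drained, metrics.droppedEvents(), metrics.flushes() };
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println(r[0] + " drained, " + r[1] + " dropped, " + r[2] + " flush(es)");
    }
}
```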

🤖 Generated with Claude Code

Production-hardening pass over the BSP server's concurrency primitives.

Cancellation hygiene
- Outcome.fromCancellationToken now routes Deferred.complete through a
  process-wide Dispatcher instead of unsafeRunSync on the callback thread
  (legal regardless of whether the cancel originates on a CE compute fiber
  or a foreign JVM thread).
- Outcome.bridgeKillSignal switches callback list to CopyOnWriteArrayList
  so a re-entrant onCancel can't deadlock the cancel path.
- TaskDag.executor stops wrapping the test handler in IO.uncancelable —
  withRecovery already produces a structured Killed TaskResult on fiber
  cancel; the blanket uncancelable could pin server shutdown indefinitely.
- MultiWorkspaceBspServer.withDeadClientDetection only uncancelables the
  killSignal completion, leaving processEvent cancelable so a wedged client
  can no longer pin the consumer fiber.

DAG executor
- Replace the rotating-Deferred wakeup pattern with Queue.bounded(1) +
  tryOffer: coalescing semantics, no Ref swap race, eliminates the spurious
  "deadlock" false-positive on very fast no-op task bursts.
- ParallelProjectCompiler spawns compiles under a Supervisor so cancelling
  the surrounding fiber actually cancels in-flight compiles instead of
  orphaning them with Zinc still writing class files.

ProjectLock
- Move Files.createDirectories outside the per-project lock (idempotent
  syscall has no business serializing).
- JVM monitor → fair ReentrantLock so an exclusive waiter can't be starved
  by a continuous stream of shared acquirers.
- File-handle close moved outside the lock on release (closing a
  FileChannel under load can hit slow paths on Windows / NFS).

Diagnostics + metrics
- BspDiagnosticTracker scoped per-build-operation (handleCompile /
  handleTest each have their own). startCycle atomically rotates current
  vs previous via a single AtomicReference; no more snapshot-then-clear
  race against concurrent recordDiagnostic.
- BspMetrics gains a dedicated writer thread + bounded queue: producer
  threads enqueue non-blocking, the writer drains in batches with a single
  flush per batch instead of fsync-per-event under the global write lock.
  Saturated-queue events surface as dropped_events in the summary.
- ZincBridge tracks abandoned ECJ compile threads (named per-compile,
  daemon-marked) so jstack + abandonedEcjThreadsSnapshot give operator
  visibility into the known leak when CompilationProgress cancellation is
  unavailable. Same pattern in Outcome.runInFreshThread for native-compiler
  threads that ignore Thread.interrupt.

Locking + sharing
- SharedWorkspaceState.unregister stopped removing the empty inner map —
  the value-based remove(K, V) check raced with concurrent register and
  could drop a just-registered operation.
- SourceGenRunner.scriptSemaphores keyed on (scriptProject, mainClass) so
  two unrelated script projects sharing a main-class name no longer
  serialize against each other.
- cachedExternalTestRunnerJars keyed per-resolver instead of process-wide;
  multi-tenant servers with different resolver configs no longer share the
  wrong jars.

Misc
- TestRunner.captureThreadDump bounded by 5s so a wedged JVM can't pin the
  timeout-handling path.
- JsonRpcTransport.close() synchronizes against concurrent reads/writes.
- BspServerDaemon's server-wide compile Semaphore is now FIFO-fair.
- InProcessBspServer.serverExited returns the real exit code via
  CompletableFuture instead of IO.never; daemon thread reports failures.
- Deleted unused CompilationCoordinator (393 LOC).

All 552 bleep-bsp tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oyvindberg oyvindberg merged commit 0974da1 into master May 27, 2026
9 checks passed
@oyvindberg oyvindberg deleted the analyze-bsp branch May 27, 2026 00:21