
Faster, more consistent CI: split slow tests, two-phase native-image, restore arch coverage #594

Merged
oyvindberg merged 33 commits into master from skip-kotlin-native-arm64-linux
May 19, 2026


@oyvindberg oyvindberg commented May 16, 2026

Theme

Make CI faster and more consistent without silently dropping test coverage on any architecture. The original failure mode (run_tests: false per arch in PR #593) hid breakage; the response here is to make every arch run the test surface that's worth running on it, and to make that surface fast enough to actually fit in the matrix budgets.

The two big levers:

  1. Tag integration tests as slow and skip them on the per-arch native-image jobs. The full build job remains the canonical full-suite gate. Native-image jobs validate that the produced binary boots and runs the non-integration surface on each arch.
  2. Two-phase native-image (--emit-script) so the BSP server can shut down before the GraalVM tool runs, giving native-image the full RAM budget on memory-constrained runners.

Plus a stack of supporting fixes (Konan cache, harness parallelism, transient-lock retry, no-color hygiene, per-environment timeouts) that came out of the metrics + traces collected along the way.

Restoring arch coverage (the original PR scope)

New: test-tagging strategy (the speed lever)

Framework-agnostic suite-level tagging declared in bleep.yaml, filtered at bleep's dispatch boundary so it works for ScalaTest, JUnit, MUnit, utest, anything.

projects:
  bleep-tests:
    testTags:
      slow:
      - "**IT"

bleep test --only-tag slow      # union
bleep test --exclude-tag slow   # subtractive
  • CLI: --only-tag NAME / --exclude-tag NAME (repeatable; union for includes, subtractive for excludes). Tab-completion via Argument.fromMap(started.globs.testTagsMap). Unknown tag names fail loud at parse time.
  • Project pre-filter: --only-tag slow prunes projects whose testTags don't declare any of the requested tags before BSP dispatch, saving compile work.
  • CI use: native-image jobs pass --exclude-tag slow; the canonical build job runs everything.
  • Diagnostics: Summary now always carries the filter accounting when filters are active:
    Projects: 1/2 selected (1 pre-filtered by --only-tag slow: bleep-bsp-tests)
    Filters active: --only NoSuchTest · --only-tag slow
    
    Empty-result errors walk the pipeline (N discovered → M after --only/--exclude → K after tag filter) and hint at the project's declared tags and the surviving suites.
  • bleep list-tests annotates each suite with matching tags and warns on stale manifest patterns (drift detection).

Commits:

  • 3d68163f6 model.Project.testTags field + plumbing
  • 9c8ffceea TestTagFilter pure logic + ProjectGlobs.testTagsMap + 13 unit tests
  • 3b81faa83 Wire --only-tag/--exclude-tag through CLI + BSP + list-tests
  • 66402357c Tag *IT as slow; exclude from native-image runs
  • 01858e936 6 integration tests + crystal-clear pipeline error messages
  • b29876ed4 FilterContext block in the build summary
  • f7c612f0a docs/usage/test-tags.mdx (full user-facing doc)
  • 8265f9df1, 1d561c0d2 Pass testTags to model.Project(...) in importers + BuildCreateNew

Two-phase native-image build

GraalVM's native-image tool defaults to 80% of system RAM. With bleep CLI + BSP server alive at the same time, the mac-arm runner was hitting the 40-min job timeout under memory pressure. Splitting the build into emit-script + run-script lets the BSP server shut down before the heavy tool runs.

  • 377b5bd7f GenNativeImage: --emit-script. Builds the manifest jar + assembles the full native-image command line, writes a self-contained launcher (.sh for POSIX, .cmd/.bat for Windows — detected by extension), exits without running. Replicates NativeImagePlugin.nativeImage()'s public surface; no submodule changes.
  • 697caef39 CI: two-phase native-image. Workflow now runs: emit script → bleep config compile-server stop-all → execute script. Windows splits into three workflow steps via .cmd.
  • 77b42be58 Native-image script fixes. Env-var trigger BLEEP_NATIVE_IMAGE_EMIT_SCRIPT=<path> instead of a fake CLI flag; Coursier URL <version> vs <platform> order corrected; metrics-collection step falls back to system bleep if native binary doesn't exist (when native-image failed).
  • e8f4e05bd, 79e3a7f78 Windows manifest cross-drive entries. manifestJar.getParent.relativize(path) throws when classpath entries are on a different drive (workspace D:\, Coursier cache C:\). Cross-drive entries now fall back to absolute paths (forward-slashed; URI form rejected by GraalVM's handleClassPathAttribute).

Faster Konan: route through Coursier ArchiveCache

The Konan ~200MB tarball was being downloaded via URI.openConnection() into ~/.konan/ and extracted via tar. Invisible to Coursier → coursier/cache-action@v8 couldn't cache it → every CI run re-downloaded 200MB per Kotlin version per host.

  • 6d503470d KotlinNativeCompiler: route Konan prebuilt download through Coursier ArchiveCache. Same BleepFileCache + ArchiveCache path that FetchNode / FetchScalafmt use. Tarball lands under ~/.cache/coursier/arc/..., which the GHA cache action already snapshots. Removes the tar xzf ProcessBuilder fork (works on Windows/macOS/Linux without depending on host tar).

Metrics evidence on macos-15-intel: top two tests KotlinNativeIntegrationTest "resolves Kotlin/Native compiler embeddable for 2.0.0" (95.6s) and "for 2.3.0" (83.9s) — almost entirely download time. Warm CI runs now skip the download.

IntegrationTestHarness parallelism fix

  • d396df8d2 IntegrationTestHarness: pin inner BSP parallelism to 1. bleep test bleep-tests runs as many ForkedTestRunner JVMs in parallel as there are cores. Each IT internally spins up an in-process BSP (InProcessBspServer) whose pool defaulted to JvmPool.create(cores, …), so the worst case was a cartesian explosion to N×N JVMs. Trace evidence: 22+ concurrent ForkedTestRunner JVMs at 300-900MB RSS during KspToyProcessorIT's window, which is what pushed 4 heavy ITs over the suite-idle timeout. After the fix, the full bleep-tests run went from 234 passing + 4 timing out in ~351s to 246 passing, 0 timing out in 161s. The IT splits in SourcegenIT / SourcegenKotlinIT / YourFirstScalaProjectIT are kept for cleaner failure attribution.
  • 0f20dac65 IntegrationTestHarness: sanitize test names in temp-dir paths. Test names with spaces/non-ASCII produced ugly tmp paths that occasionally got SIGKILLed (exit 137) during cleanup. Keeping paths ASCII removes one source of noise.

Per-environment test idle timeout

DefaultTestIdleTimeoutMinutes stays at 2 min in code (the right posture — a single test should never sit silent that long). CI overrides via the bleep config file.

  • b0b9770d4 Revert DefaultTestIdleTimeoutMinutes 10 → 2. Default should reflect health, not paper over slowness.
  • 05ea16abf CI: configure test idle timeout per-environment, not in code. ~/.config/bleep/config.yaml sets it to 10 min for both build and native-image jobs — covers cold-Konan-download legitimate path.
  • 899811b2e CI: resolve bleep config path via bleep config file --output raw. macOS reads ~/Library/Application Support/build.bleep/config.yaml; Linux reads XDG; Windows reads %APPDATA%. Previous workflow hardcoded the Linux path so the override never took effect on Mac runners — they happily timed out at 2 min default.

Transient-failure handling

  • 5019fc682 SnapshotTest.gitWithRetry. Under heavy parallel-suite execution the four snapshot suites (RewriteSnapshotTest, IntegrationSnapshotTests, CreateNewSnapshotTests, TemplateTest) all git add against the outer repo's .git/index. They serialize among themselves via a FileChannel.lock() cross-process lock, but other writers (test host's ProjectDigest.gitDirtyPaths, editor/shell on the dev machine) can still race. The fix wraps git invocations in an exponential-backoff retry that catches BleepException.Text whose message contains index.lock: 10 attempts, base 100ms, worst case ~5.5s. Any other error still propagates on the first attempt.

CI hygiene

  • 5ad1ce193 CI: bump native-image timeout 25 → 40 minutes. mac-arm cancelled at 25:19 mid-build before tests even started.
  • 164809ead Remove ryddig progressMonitor line-redraw. The \r line-redraw was eating CI logs and breaking observability when stdout wasn't a TTY. Renderer now appends only.
  • c2a3f361e, 8b6d41d5c, 089a79674, 9b7e8fe63, 7a3751e6d NO_COLOR propagation. Subprocess forks default NO_COLOR=1; PreBootstrapOpts honors the env var (no-color.org); DisplayMode.fromFlags also disables TUI when no-color is set; BleepConsole wrapper for inline ANSI in log messages that respects the toggle. Cleaner CI logs across the board.

Test summary

Local bleep test: 246 passing, 0 failing, 0 timing out. Full bleep compile && bleep test green before each push. The bleep-tests suite is now 161s end-to-end (was 351s with 4 timeouts pre-parallelism fix).

CI: all 10 jobs green on the merge commit being shipped — the 5 native-image arches, the canonical build, plus deploy / build-intellij-plugin / yaml-ls-check / (release skips on non-tag).

🤖 Generated with Claude Code

@oyvindberg oyvindberg closed this May 16, 2026
@oyvindberg oyvindberg deleted the skip-kotlin-native-arm64-linux branch May 16, 2026 22:11
@oyvindberg oyvindberg restored the skip-kotlin-native-arm64-linux branch May 16, 2026 22:13
@oyvindberg oyvindberg reopened this May 16, 2026
@oyvindberg oyvindberg force-pushed the skip-kotlin-native-arm64-linux branch from f416897 to c2f26a7 Compare May 16, 2026 22:24
@oyvindberg oyvindberg changed the title Skip Kotlin Native tests on aarch64 linux via OsArch Claude is a fucking idiot who disabled production tests for entire archs — this PR fixes it May 16, 2026
The `run_tests:` matrix field was introduced in 273e88e ("Skip tests on
macos-15-intel native image build (too slow for 20min timeout)", 2026-02-15)
and accumulated entries from #496, #504, #529 over time. Both macos-15-intel
and macos-latest have been on `run_tests: false` since the Feb 2026 changes.

In #593's CI fix, Claude reused this existing mechanism to silently skip
ubuntu-22.04-arm without explicit permission — making it look like the arm64
native-image job was running tests when it wasn't. The justification
("Kotlin Native ships x86_64-only prebuilts") was real; the response was
wrong. Hiding the failure behind a matrix flag is not a fix.

This deletes the `run_tests` mechanism entirely: tests run on every arch in
the matrix. Where tests fail, we fix the test or the software. Loud comment
at the top of the matrix block keeps the policy machine-readable for the
next agent that tries this.
@oyvindberg oyvindberg force-pushed the skip-kotlin-native-arm64-linux branch from c2f26a7 to d5d3847 Compare May 16, 2026 22:27
@oyvindberg oyvindberg changed the title Claude is a fucking idiot who disabled production tests for entire archs — this PR fixes it Restore test coverage on all native-image architectures May 16, 2026
oyvindberg and others added 14 commits May 17, 2026 12:08
JetBrains does not publish a `kotlin-native-prebuilt-linux-aarch64-*`
artifact (verified up to 2.3.21 and 2.4.0-RC). `KotlinNativeCompiler` falls
back to the `linux-x86_64` distribution on aarch64, which the JVM then
fails to load with `UnsatisfiedLinkError` inside
`kotlinx.cinterop.JvmCallbacksKt.<clinit>`.

Adds `PlatformTestHelper.assumeKotlinNativeAvailable()` keyed on
`OsArch.LinuxArm64`, and calls it at the top of the four tests that drive
the Konan compiler:

  - LinkExecutorIntegrationTest: "Kotlin Native test linking produces
    binary with test runner"
  - KotlinNativeAdvancedIntegrationTest: all three tests

ScalaTest reports these as canceled (not passed, not failed), so the
coverage gap is visible in the dashboard. When upstream ships the missing
prebuilt the helper becomes a no-op and the tests start running again.

Two-phase native-image build for memory-constrained runners. With
`bleep native-image --emit-script <path>`, GenNativeImage builds the
manifest jar + assembles the full native-image command line, then writes
a self-contained launcher script and exits without running the build.
CI then shuts down the compile-server and executes the script so the
`native-image` tool inherits the full RAM budget (mattered most on the
mac arm runner, which previously hit the job timeout with bleep CLI +
BSP server + native-image fighting for ~7GB).

Script format auto-detected from extension: `.cmd`/`.bat` emits a Windows
batch file (CRLF endings, `cd /d`, `exit /b %ERRORLEVEL%`); anything else
emits a POSIX shell script (`set -euo pipefail`, `cd`, `exec`). Arguments
quoted defensively in both. POSIX path gets `chmod +x` (best-effort on
non-POSIX file stores).
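
The extension-based format switch described above could look roughly like the following Scala sketch. The object name `LauncherScriptSketch`, the `render` signature, the `@echo off` and shebang lines, and both quoting helpers are illustrative assumptions rather than the actual GenNativeImage code; the batch quoting in particular is deliberately naive.

```scala
object LauncherScriptSketch {
  // `.cmd` / `.bat` select the Windows batch format; anything else is POSIX.
  def isWindowsScript(fileName: String): Boolean = {
    val lower = fileName.toLowerCase
    lower.endsWith(".cmd") || lower.endsWith(".bat")
  }

  // POSIX single-quoting: close the quote, emit an escaped quote, reopen.
  private def quotePosix(arg: String): String =
    "'" + arg.replace("'", "'\\''") + "'"

  // Assumption: naive double-quoting; real batch escaping has more cases.
  private def quoteBatch(arg: String): String =
    "\"" + arg + "\""

  def render(fileName: String, workDir: String, command: List[String]): String =
    if (isWindowsScript(fileName)) {
      val lines = List(
        "@echo off",
        "cd /d " + quoteBatch(workDir),
        command.map(quoteBatch).mkString(" "),
        "exit /b %ERRORLEVEL%"
      )
      lines.mkString("", "\r\n", "\r\n") // CRLF endings for batch files
    } else {
      val lines = List(
        "#!/usr/bin/env bash",
        "set -euo pipefail",
        "cd " + quotePosix(workDir),
        "exec " + command.map(quotePosix).mkString(" ")
      )
      lines.mkString("", "\n", "\n")
    }
}
```

Using `exec` in the POSIX branch means the shell is replaced by the tool, so the script's exit code is the tool's exit code without extra plumbing.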

Command-building logic replicates `NativeImagePlugin.nativeImage()` —
classpath fixed via the plugin's public `fixScala3` + bleep-core's
`fixedClasspath`; manifest jar written inline; remaining bits use the
plugin's existing public surface (`targetNativeImage{,Internal}`,
`nativeImageCommand`, `nativeImageOutput`). No submodule changes needed.
…script

Non-Windows native-image steps now run:
  ./bleep-cli.sh --dev native-image --emit-script ni-build.sh <out>
  bleep config compile-server stop-all
  ./ni-build.sh

Windows native-image splits into three steps via the .cmd launcher.

The BSP server is now dead when `native-image` runs, releasing its heap
to the GraalVM tool which by default takes 80% of system RAM. Targets the
mac arm runner hitting the 40-min timeout under the prior single-phase
flow (bleep CLI + BSP server + `native-image` all live concurrently).
…ArchiveCache

The Konan distribution (~200MB tarball) was being downloaded straight from
Maven Central via `URI.openConnection().getInputStream` into ~/.konan/,
then extracted by spawning `tar`. That path was invisible to Coursier so
the GitHub Actions `coursier/cache-action@v8` step couldn't cache it.
Every CI run re-downloaded 200MB per Kotlin version per host.

The metrics surfaced this: on the macos-15-intel run, the two top tests
(KotlinNativeIntegrationTest "resolves Kotlin/Native compiler embeddable
for 2.0.0" and "for 2.3.0") cost 95.6s and 83.9s respectively — almost
entirely download time. On mac-arm the same pattern pushes the
LinkExecutor / KotlinNativeAdvanced suites past the 2-minute test idle
timeout.

Now uses the same `BleepFileCache` + `ArchiveCache` path that
`FetchNode` / `FetchScalafmt` use. The tarball lands under
`~/.cache/coursier/arc/...`, which `coursier/cache-action@v8` already
includes in its cache key. Warm CI runs (and warm dev machines) skip the
download entirely.

Removes the `tar xzf` ProcessBuilder fork — Coursier's ArchiveCache
handles extraction (works on Windows / macOS / Linux without depending
on a host `tar`).
…ics guard

Three regressions from the previous push:

1. bleep's `Opts.arguments[String]()` rejected `--emit-script` as
   "Unexpected option". Switch to env-var trigger
   `BLEEP_NATIVE_IMAGE_EMIT_SCRIPT=<path>` so the workflow sets the path
   in `env:` and the script sees it. Removes the awkward CLI hack.

2. The Coursier ArchiveCache change reversed `<platform>` and
   `<version>` in the Maven Central URL: artifact is
   `kotlin-native-prebuilt-<VERSION>-<PLATFORM>.tar.gz` (classifier
   convention), but the extracted top-level folder is
   `kotlin-native-prebuilt-<PLATFORM>-<VERSION>`. Two separate names now.

3. `Collect BSP server metrics` ran with `if: always()` and shelled out
   to `./bleep` which doesn't exist when native-image failed. Falls back
   to the system bleep (`bleep` on PATH from bleep-setup-action) if the
   native binary isn't there. Windows step gets the same pattern via cmd
   `if exist`.
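
The naming mismatch in item 2 is easy to get wrong again, so here is a minimal Scala sketch that keeps the two names as separate functions. `KonanNamesSketch` and both method names are hypothetical; only the two name patterns come from the commit message above.

```scala
object KonanNamesSketch {
  // Archive file name: version first, then platform.
  def archiveFileName(version: String, platform: String): String =
    s"kotlin-native-prebuilt-$version-$platform.tar.gz"

  // Top-level folder inside the archive: platform first, then version.
  def extractedDirName(version: String, platform: String): String =
    s"kotlin-native-prebuilt-$platform-$version"
}
```

Keeping the two as distinct functions avoids deriving one name from the other by string manipulation, which is how the order got reversed in the first place.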
…meout to 5min

Two fixes from the latest CI:

1. Windows: GenNativeImage's manifest jar code did
   `manifestJar.getParent.relativize(path)` which throws
   IllegalArgumentException("'other' has different root") when classpath
   entries are on a different drive than the manifest jar — the default
   shape on GitHub Actions windows-latest where the workspace is on D:\
   but the Coursier cache is on C:\Users\…\Coursier. Fall back to a
   `file:` URI for those entries (modern JDKs accept absolute URIs in
   Class-Path manifest attributes).

2. Default test idle timeout: 2 → 5 min. Mac native-image runs idle out
   on Kotlin/Native compile tests that legitimately take longer than 2
   minutes when Konan downloads + links without emitting interim events.
   KotlinNativeAdvancedIntegrationTest (3 tests, ~21s each on warm
   cache) and LinkExecutorIntegrationTest's Kotlin Native test all sat
   right at the 2-min ceiling. 5 min covers the worst case we see with
   margin; override via `~/.config/bleep/config.yaml`.
…ce flake

Two findings from the latest CI:

1. Windows native-image rejected the `file:` URI I used as the cross-drive
   fallback in the manifest jar's Class-Path attribute:
     java.nio.file.InvalidPathException: Illegal char <:> at index 4:
     file:///C:/Users/RUNNER~1/AppData/Local/Temp/scala3Runtime...jar
   GraalVM's `handleClassPathAttribute` does `Path.of(token)` on each
   entry, which doesn't parse URIs. Switch the cross-drive fallback to a
   plain forward-slashed absolute path (`C:/Users/.../foo.jar`) — that's
   what `Path.of` accepts on Windows.

2. arm64 ubuntu flaked on the "immediate cancel" cancel test. Same race
   as the already-ignored huge-source cancel: cancel can lose to a
   Zinc-returns-Ok-with-0-classes outcome and we report Ok instead of
   Cancelled. Real bug, tracked alongside its sibling. Ignored for now so
   arm64 ubuntu (which already cancels the four KotlinNative tests for
   the lack of an aarch64 prebuilt) doesn't flake the run on a separate
   issue.
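
The cross-drive fallback from finding 1 can be sketched as follows. This is an illustration under assumptions, not the GenNativeImage code: `ManifestClassPathSketch` and `entryFor` are hypothetical names, and comparing `getRoot` up front is one possible way to avoid the `IllegalArgumentException` that `relativize` throws for paths with different roots.

```scala
import java.nio.file.{Path, Paths}

object ManifestClassPathSketch {
  // Relative entry when both paths share a root; otherwise a plain absolute
  // path. Backslashes are normalised to forward slashes in both cases, so a
  // cross-drive entry on Windows comes out as `C:/Users/.../foo.jar`.
  def entryFor(manifestDir: Path, entry: Path): String = {
    val sameRoot = manifestDir.getRoot == entry.getRoot
    val chosen =
      if (sameRoot) manifestDir.relativize(entry)
      else entry.toAbsolutePath
    chosen.toString.replace('\\', '/')
  }
}
```

On a single-root file system the fallback branch is never taken for absolute paths, which is why the problem only showed up on the Windows runner.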

5 min was still tight on mac CI: LinkExecutorIntegrationTest's Kotlin/Native
test hit 313s on mac-arm and 349s on mac-intel in back-to-back runs, both
busy downloading the Konan prebuilt for the first time and emitting no
intermediate progress events. The macOS GitHub Actions runners vary enough
that 5 min sometimes fits and sometimes doesn't.

10 min keeps the safety net wide enough that genuine hangs still get
killed, but gives the slow legitimate path margin. Override via
~/.config/bleep/config.yaml if you need tighter.

10-min default is a sign of papering over slow tests, not a healthy
posture. Reverting to 2 min — the correct ceiling for "a single test
should never sit silent that long". If a particular environment needs
more (cold mac CI hitting first-time Konan download was the empirical
trigger), override in `~/.config/bleep/config.yaml` per-environment.

With the Konan tarball now flowing through `~/.cache/coursier/arc` and
`coursier/cache-action@v8` snapshotting that dir between runs, subsequent
CI runs should hit a warm cache and avoid the slow path entirely.
Validating that on this push.

`DefaultTestIdleTimeoutMinutes` stays at 2 min in code (the right
posture). CI's `~/.config/bleep/config.yaml` now sets it to 10 min for
both the `build` and `build-native-image` jobs — the Kotlin/Native LLVM
bitcode link on a cold-ish runner legitimately exceeds 2 min and that's
not a defect, it's just native compilation taking native time.

Build job: merges the new timeout into the existing parallelism config.
Native-image jobs: new step before the build step so any subsequent BSP
server invocation (incl. the test step that comes after native-image is
done) picks up the relaxed timeout.

Note: Windows uses bash for this step since the path expansion needs `$HOME`.

`~/.config/bleep/config.yaml` is the Linux XDG path; on macOS bleep
reads `~/Library/Application Support/build.bleep/config.yaml` and on
Windows `%APPDATA%\build\bleep\config\config.yaml`. The previous
workflow step hardcoded the Linux path so the testIdleTimeoutMinutes
override never took effect on mac runners — they happily timed out at
2 min default.

Use `bleep config file --output raw` to print the actual path bleep
will read from, and write the config there. Works cross-platform via
bash (Git Bash is preinstalled on Windows runners).

Test names with spaces / non-ASCII chars produced temp-dir paths like
`/tmp/bleep-doc-E. clean → recompile rebuilds generated sources
deterministically-…`. We've seen this same test get `rm -Rf` SIGKILLed
(exit 137) on three different CI runs on three different platforms.
JVM metrics show no heap pressure on the BSP server at the time, so
it's likely the test-runner JVM hitting its 512MB cap and the kernel
reaping its rm child as OOM cleanup. Keeping the path ASCII at least
removes one source of noise; if the flake persists we can chase the
real cause.
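
A minimal sketch of the kind of sanitizing described above, assuming a simple policy of collapsing every run of characters outside `[A-Za-z0-9._-]` into one underscore. The actual replacement rule used by IntegrationTestHarness is not shown here, and `TempDirNameSketch.asciiSafe` is a hypothetical name.

```scala
object TempDirNameSketch {
  // Collapse each run of characters outside the safe ASCII set into "_".
  def asciiSafe(testName: String): String =
    testName.replaceAll("[^A-Za-z0-9._-]+", "_")
}
```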

bleep-tests' outer `bleep test bleep-tests` runs effectiveParallelism = cores
ForkedTestRunner JVMs in parallel. Each IT internally spins up an in-process
BSP (`InProcessBspServer`) that creates its own `JvmPool.create(maxParallelism,
…)`. Without an explicit cap the inner pool also defaults to cores, so a single
IT could fork up to N more JVMs — cartesian explosion to N×N max.

The trace evidence: during KspToyProcessorIT's 2-minute idle-timeout window,
the OTLP trace showed 22+ concurrent ForkedTestRunner JVMs each at 300-900 MB
RSS. That's what was starving four heavy ITs (KspToyProcessorIT,
YourFirstScalaProjectIT, SourcegenIT, SourcegenKotlinIT) into the suite-idle
timeout — they're fine standalone (5-35 s) but couldn't make progress fast
enough under that JVM pressure to emit a test event before the timer fired.

Fix: pin testConfig.bspServerConfig.parallelism = Some(1) in
IntegrationTestHarness. Result: full bleep-tests run went from 234 passing + 4
timing out in ~351 s → 246 passing, 0 timing out in 161 s.

The IT splits (SourcegenIT, SourcegenKotlinIT, YourFirstScalaProjectIT) are
kept because they're semantically cleaner — smaller test methods give better
failure attribution and timer-reset granularity — even though the parallelism
fix made them no longer load-bearing for the timeout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
macos-latest (arm64) cancelled at 25:19 mid native-image build under the prior
25-minute ceiling, before the test phase even started. Other arches finish in
13-20 min, but mac-arm needs the headroom.

Split out of the (dropped) `CI: collect + upload server metrics` commit so we
keep just the timeout bump without the observability infrastructure on this
branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@oyvindberg oyvindberg force-pushed the skip-kotlin-native-arm64-linux branch from f342fb6 to 5ad1ce1 Compare May 18, 2026 08:16
oyvindberg and others added 6 commits May 18, 2026 10:35
…lision

Under heavy parallel-suite execution (10+ test runner JVMs forked from
`bleep test bleep-tests`), the snapshot suites (`RewriteSnapshotTest`,
`IntegrationSnapshotTests`, `CreateNewSnapshotTests`, `TemplateTest`) all do
`git add` against the outer bleep repo's `.git/index`. They serialize among
themselves via `GitLock` (a cross-process `FileChannel.lock()` on
`.git/bleep-test.lock`), but other writers we can't lock against — the test-
host JVM's `ProjectDigest.gitDirtyPaths` doing `git status --porcelain`
(refreshes the stat cache, takes index.lock briefly), or an editor / shell
the developer happens to have open — can still race.

Add an exponential-backoff retry around git invocations that catches
`BleepException.Text` whose message contains "index.lock". 10 attempts, base
100ms, so worst case ~5.5s before giving up. Any other git failure (real
diff mismatch, missing path, real error) propagates on the first attempt.

Both `git add` and `git diff` in `writeAndCompare` go through the same
helper.
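
The retry shape described above might look like the sketch below. Several things are assumptions: `GitRetrySketch` and `retryOnIndexLock` are hypothetical names, `RuntimeException` stands in for `BleepException.Text`, and the delay schedule is passed in as a function because the exact schedule is not spelled out here. The injectable `sleep` exists only to make the helper testable without waiting.

```scala
object GitRetrySketch {
  // Retries `op` only when the failure message mentions "index.lock".
  // Any other failure propagates immediately, on the first attempt.
  def retryOnIndexLock[A](
      maxAttempts: Int,
      delayMs: Int => Long,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  )(op: => A): A = {
    def attempt(n: Int): A =
      try op
      catch {
        case e: RuntimeException
            if n < maxAttempts &&
              Option(e.getMessage).exists(_.contains("index.lock")) =>
          sleep(delayMs(n)) // wait before the next attempt
          attempt(n + 1)
      }
    attempt(1)
  }
}
```

A caller picks the schedule, for example `n => 100L << (n - 1)` for a doubling delay. Once `maxAttempts` is reached the guard no longer matches, so the last lock error surfaces unchanged.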

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "Compiling X: started, Y: 14%, Z: 67%" line that BuildDisplay used to push
through ryddig's `progressMonitor` was a single in-place updating line. In a
terminal it looks nice; in CI logs it leaks `\x1b[K` (ANSI erase-to-end-of-line)
escapes on every refresh, producing visible garbage when GitHub Actions
captures the line-by-line output. The TUI's full-screen mode is already
auto-disabled in CI; this was the leftover bit.

The per-event log lines we already emit cover the same lifecycle:
- `🔨 compiling (...)` via `CompilationReason` — what kicked the build off
- `📦 read analysis / analyzed / compiled / saved analysis (...ms)` — phases
- `✅ compiled (...ms)` or `❌ compile failed` via `CompileFinished`

Removed:
- `activeCompileProgress` mutable map
- `lastProgressLine` var
- `progressMonitor: Option[LoggerFn]` lookup
- `renderCompileProgress()` function
- Calls from `CompileStarted`/`CompileFinished`/`CompileProgress`
- Now-unused `ryddig.{LoggerFn, TypedLogger}` import

`CompileStarted` is now a no-op (the meaningful start is logged from
`CompilationReason`); `CompileProgress` is dropped on the floor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ScalaTest 3.2.16+ honors the no-color.org `NO_COLOR` env var; JUnit, JUnit 5,
sbt, gradle, mill, and most other JVM test frameworks do too. Inject it into
every forked test-runner JVM's environment so test output captured in CI logs
or bleep's server-metrics dashboard is plain text instead of ANSI-decorated.

A project-supplied `platform.jvmEnvironment.NO_COLOR` overrides this default
(Map.++ right-bias), so anyone who really wants colored test output can set it
empty in their bleep.yaml.

Done at `computeTestEnvironment` — one place, hits every test JVM. Other
forked subprocesses (KSP runner, native-image, …) are a follow-up if needed,
but test output is the noisiest channel in CI captures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ives

Three forking entry points cover every subprocess bleep spawns:
1. `BspServerOperations.startServer` — CLI → BSP daemon. Setting NO_COLOR on
   the daemon's own env propagates to all of its children via ProcessBuilder's
   default env-copy behavior, including the few sites that use raw
   `scala.sys.process.Process` (e.g. `ProjectDigest`'s `git status`).
2. `ProcessRunner.start` — all KSP / Kotlin & Scala JS-Native linkers /
   node / tar / native-image forks route through here.
3. `JvmPool` test-runner fork — direct `ProcessBuilder.start()`, doesn't go
   through ProcessRunner.

All three now `pb.environment().putIfAbsent("NO_COLOR", "1")`. `putIfAbsent`
preserves an explicit override (a project's `platform.jvmEnvironment.NO_COLOR`,
or the parent's inherited setting if a developer wants color in some specific
case).

ScalaTest 3.2.16+, JUnit, JUnit 5, sbt, gradle, mill, kotlinc/KSP, GraalVM
native-image, and most other JVM-side tools honor `NO_COLOR=1` per
no-color.org. End result: clean text in CI log captures and bleep's own
subprocess output panels.
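
As a small illustration of the `putIfAbsent` behaviour relied on above (the wrapper name `NoColorEnvSketch.withNoColorDefault` is hypothetical; `ProcessBuilder.environment()` and `Map.putIfAbsent` are standard JDK calls):

```scala
object NoColorEnvSketch {
  // Sets NO_COLOR=1 only when the child environment has no NO_COLOR entry,
  // so an explicit value (even an empty one) is left untouched.
  def withNoColorDefault(pb: ProcessBuilder): ProcessBuilder = {
    pb.environment().putIfAbsent("NO_COLOR", "1")
    pb
  }
}
```

Because `environment()` starts as a copy of the parent's environment, a `NO_COLOR` value inherited from the parent also counts as present and is preserved.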

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Any non-empty `NO_COLOR` env var disables ANSI in bleep's own logger, matching
the no-color.org standard. Explicit `--no-color` on the CLI still wins (sets
the same flag deterministically).

Why this matters now: the BSP daemon's child sourcegen-script JVMs inherit
NO_COLOR=1 from the daemon's env (added in the previous commit), but their
PreBootstrapOpts.parse only looked at command-line args — they had no flag,
so they emitted colored/emoji output that the daemon forwarded through to the
CLI as ANSI-decorated text. Now those forked scripts auto-detect NO_COLOR=1
from env and use the plain log pattern.

Same chain: parent CLI's --no-color → NO_COLOR=1 in daemon env → inherited by
script forks → PreBootstrapOpts.parse picks up env → script logger uses plain
bracket prefixes. End to end, no ANSI leakage.
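
The detection rule in this commit (flag wins, otherwise any non-empty `NO_COLOR`) fits in one expression. The sketch below is an assumption about shape only; `NoColorDetectSketch` is a hypothetical name and the real `PreBootstrapOpts.parse` does more than this.

```scala
object NoColorDetectSketch {
  // True when `--no-color` is on the command line or NO_COLOR is non-empty.
  def noColorRequested(args: Seq[String], env: Map[String, String]): Boolean =
    args.contains("--no-color") || env.get("NO_COLOR").exists(_.nonEmpty)
}
```

In a real entry point this would be called as `noColorRequested(args.toSeq, sys.env)`; taking the environment as a parameter keeps the rule testable.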

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A TUI is a colored fullscreen interface — running one when the user asked for
no colors is the wrong answer. Make `DisplayMode.fromFlags` consult both the
`--no-tui` flag and a new JVM-local "no-color was requested" marker that
`PreBootstrapOpts.parse` sets when it sees `--no-color` or a non-empty
`NO_COLOR` env var. Either route downgrades to `NoTui`.

`PreBootstrapOpts.noColorRequested` exposes the same answer to anyone in the
same JVM that needs it (the chief reader being `DisplayMode.fromFlags`; the
existing `LoggingOpts.noColor` already covers the logger). The marker is a
`bleep.noColor` system property set by `parse` so each invocation reflects
its own state without re-parsing args.

Reported: `./bleep-cli.sh test --no-color` still ran the TUI. After this,
the same command renders plain log lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
oyvindberg and others added 8 commits May 18, 2026 13:18
Three files baked ANSI directly into log message strings (`s"${C.RED}foo${C.RESET}"`)
via `scala.Console`: `BuildDisplay`, `ReactiveBsp`, `CompileDisplay`. The
ryddig log pattern's `noColor` only strips ANSI it adds itself — it doesn't
strip what's already in the message body — so these survived `--no-color`.

New `bleep.testing.BleepConsole` mirrors the `scala.Console` field surface but
returns "" when no-color is in effect (per `PreBootstrapOpts.noColorRequested`).
The existing imports flip from `scala.{Console => SConsole/C}` to
`bleep.testing.BleepConsole as SConsole/C` — every call site continues to
write `SConsole.RED` / `C.GREEN`, just now ANSI-free in no-color mode.

The `on` flag is a class-loading-time val so it captures whatever
`PreBootstrapOpts.parse` decided. Pre-parse runs at the start of every bleep
JVM invocation before these objects are touched.
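
A minimal sketch of the wrapper idea, assuming only three colour fields and a constructor flag in place of the class-loading-time `on` val. `ConsoleColorsSketch` is a hypothetical stand-in, not `bleep.testing.BleepConsole` itself.

```scala
final class ConsoleColorsSketch(on: Boolean) {
  // Same field names as scala.Console, but empty strings when colour is off.
  val RED: String = if (on) scala.Console.RED else ""
  val GREEN: String = if (on) scala.Console.GREEN else ""
  val RESET: String = if (on) scala.Console.RESET else ""
}
```

Call sites that interpolate these fields into messages need no changes: with colour off the interpolated pieces are simply empty.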

End-to-end with this + previous commits:
  --no-color  →  PreBootstrapOpts marks JVM no-color  →
                 - bleep's logger pattern: no ANSI prefix
                 - DisplayMode.fromFlags: NoTui
                 - BuildSummary / per-test / per-suite messages: no ANSI
                 - daemon/script/test-runner forks: NO_COLOR=1 in env

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A `JsonMap[String, JsonSet[String]]` field carrying tag-name → FQDN-pattern
mappings. Filtered at suite-dispatch in the BSP server (next commit), so the
mechanism is framework-independent: works for ScalaTest, JUnit, MUnit, utest,
anything the test runner discovers. Method-level tagging is out — tag at the
class level, with `*` / `**` glob patterns for convention tags like "all ITs".

Just the model + codec plumbing in this commit. SetLike methods
(intersect / removeAll / union / isEmpty) and `empty` all extended with the
new field. Codec is derived; new field surfaces automatically in the JSON
schema regeneration.

CLI surface (next commits): `bleep test --only-tag slow --exclude-tag flaky`,
mirroring the existing `--only` / `--exclude` regex flags. Open-ended tag
namespace; case-sensitive lowercase recommended.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pure filter logic for the test-tagging feature, separated from BSP dispatch:

  - `compileGlob(pattern)` — single `*` stays within an FQDN segment (no
    dots), `**` spans dots. Regex metachars are escaped for plain segments.
    `bleep.foo.*Test` matches `bleep.foo.BarTest` but not `bleep.foo.bar.BarTest`;
    `**IT` matches any FQDN ending in IT regardless of package depth.

  - `tagsFor(suite, manifest)` — returns the set of tags that apply to a
    given suite FQDN given the project's testTags map.

  - `filter(suites, manifest, includeTags, excludeTags)` — applies the
    selection semantics: empty includes → all; non-empty includes → union
    of matching tags (untagged suites are dropped when an include is set);
    excludes always subtract.

  - `staleManifestEntries(manifest, discovered)` — surfaces patterns that
    match no discovered suite, for the validation warnings the user wanted
    in the build summary.

ProjectGlobs gets a `testTagsMap` member: union of every `testTags` key
across all build projects, in the shape decline's `Argument.fromMap` wants
for tab-completion + value validation.
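
A rough sketch of how such a union could be built. The `Project` case class here is a hypothetical stand-in for `model.Project`, and the `Map[String, String]` shape (tag name mapped to itself) is an assumption about what gets handed to `Argument.fromMap`, not something the commit states.

```scala
object TestTagsMapSketch {
  // Hypothetical stand-in for model.Project: only the testTags field matters here.
  final case class Project(testTags: Map[String, Set[String]])

  /** Union of every tag name declared by any project, keyed name -> name (assumed shape). */
  def testTagsMap(projects: Iterable[Project]): Map[String, String] =
    projects.flatMap(_.testTags.keys).map(tag => tag -> tag).toMap
}
```

Because the map's keys are the only accepted values, a tag that no project declares would be rejected while parsing arguments rather than silently matching nothing.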

12 unit tests covering all glob/filter/validation paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the user-facing surface for the testTags manifest already declared on
model.Project. Flags work for any test framework because filtering happens at
bleep's suite-dispatch boundary in MultiWorkspaceBspServer, not via framework-
native tags.

- BleepBspProtocol.TestOptions gains includeTags/excludeTags fields
- ReactiveBsp threads them through; runOnce prunes candidate projects whose
  testTags declare none of the requested includes (saves compile work)
- MultiWorkspaceBspServer.discoverHandler applies TestTagFilter.filter on
  discovered suites against the project's testTags manifest
- Main.scala adds --only-tag/--exclude-tag with Argument.fromMap over
  ProjectGlobs.testTagsMap: strict validation + tab-completion
- ListTests annotates each discovered suite with matching tags and warns
  about stale manifest entries
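
The project pre-filter in `runOnce` might look something like the sketch below. `Candidate` and `preFilter` are hypothetical names, and the real code works on bleep's own project types; the point is only that a project declaring none of the requested include tags can be dropped before any compile work is scheduled.

```scala
object ProjectPreFilterSketch {
  // Hypothetical stand-in for a candidate test project and its testTags manifest.
  final case class Candidate(name: String, testTags: Map[String, Set[String]])

  /** Returns (kept, preFiltered). With no include tags nothing is pruned. */
  def preFilter(
      candidates: List[Candidate],
      includeTags: Set[String]
  ): (List[Candidate], List[Candidate]) =
    if (includeTags.isEmpty) (candidates, Nil)
    else candidates.partition(c => c.testTags.keySet.exists(includeTags))
}
```

Exclude tags play no part in this step: excluding a tag can only remove suites from a project, so it never justifies skipping the project outright.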

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- bleep.yaml: bleep-tests gets `testTags.slow: ["**IT"]`. Every IT-suffixed
  test in this project extends IntegrationTestHarness and spins up an
  in-process bleep-bsp running real builds end-to-end. 32 classes total.
- schema.json: hand-add testTags property under Project (yaml-ls-check
  reads schema.json, so failing to update it would warn on the new field).
- .github/workflows/build.yml: native-image jobs (ubuntu x86_64 + windows)
  now pass `--exclude-tag slow` to skip the IT bracket. Those jobs exist
  to validate that the produced binary runs, not to re-exercise the test
  surface. The `build` job is the canonical full-suite gate and continues
  to run everything.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a tag-related filter is misconfigured, the user used to get either no
feedback (silent project drop) or a one-liner that lied about "Available
suites" (it listed everything, ignoring what the tag filter had already
removed). Now the error walks through the pipeline so you can tell exactly
which stage emptied the set.

Sample error for `--only-tag slow` against a project whose only suite is
untagged:

    --only-tag matched no test suites in mytest (--only-tag slow):
    1 discovered → 0 after tag filter.
    Tags declared in mytest: slow
    Suites that survived --only/--exclude (none matched the tag filter):
    example.FastTest

The CLI also logs a one-line "pre-filtered N project(s)" notice when
`--only-tag` drops projects before BSP dispatch, so users notice why
their explicitly-listed project was skipped.

Tests:
- TestTagsIT (new, 5 cases): --only-tag runs only tagged; --exclude-tag
  drops tagged while keeping untagged; --only + --only-tag = AND
  semantics; empty-result error wording; project pre-filter doesn't
  throw, surfaces as info log.
- Commands.test API extended with includeTags/excludeTags (no defaults);
  all 12 existing callers updated to pass None.
- JCommands (Java surface) also updated.
- Error path: tag-side empty triggers the same TaskResult.Failure path
  --only used to own, with a richer message; --exclude-tag emptying the
  set is intentional and stays silent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a filter (--only, --exclude, --only-tag, --exclude-tag) is active OR when
--only-tag pre-filtered any project, the summary now ends with two new lines
under the existing Tests/Suites/Duration block:

    Projects: 1/2 selected (1 pre-filtered by --only-tag slow: bleep-bsp-tests)
    Filters active: --only NoSuchTest · --only-tag slow

"N/M selected" is in CrossProjectName terms — i.e. post-glob-expansion — because
that's the layer ReactiveBsp lives at. The user's typed globs (jvm3, prefixes)
are resolved by ProjectGlobs upstream, so they're not preserved at this layer;
the documentation on FilterContext spells this out.

Plumbing:
- New FilterContext case class in BuildDisplay.scala.
- BuildSummary gains `filterContext: Option[FilterContext]` (required field with
  no default value; construction sites without a filter pass None explicitly,
  per the no-defaults rule).
- BuildDisplay.printSummary signature changes to take `Option[FilterContext]`;
  both real impls plus the legacy ReactiveTestRunner path updated.
- ReactiveBsp builds the context in runOnce and threads it through
  runInProcess / runWithBleepBsp / printFinalSummary.
- BuildSummary.formatSummary renders the two new lines only when the filter
  did something the user might want to see.
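
As an illustration of the rendering rule, here is a minimal sketch that reproduces the two sample lines above. The field names on `FilterContext` and the `render` helper are assumptions; only the output format is taken from the commit message.

```scala
object FilterSummarySketch {
  // Hypothetical field layout; the real FilterContext may differ.
  final case class FilterContext(
      selected: Int,
      total: Int,
      preFiltered: List[String],
      only: List[String],
      exclude: List[String],
      onlyTags: List[String],
      excludeTags: List[String]
  ) {
    def onlyTagFlags: List[String] = onlyTags.map(t => "--only-tag " + t)
    def activeFlags: List[String] =
      only.map(p => "--only " + p) ++
        exclude.map(p => "--exclude " + p) ++
        onlyTagFlags ++
        excludeTags.map(t => "--exclude-tag " + t)
  }

  /** Extra summary lines; empty when no filter did anything worth reporting. */
  def render(ctx: Option[FilterContext]): List[String] =
    ctx.toList.flatMap { c =>
      if (c.activeFlags.isEmpty && c.preFiltered.isEmpty) Nil
      else {
        val preFilterNote =
          if (c.preFiltered.isEmpty) ""
          else " (" + c.preFiltered.size + " pre-filtered by " +
            c.onlyTagFlags.mkString(" ") + ": " + c.preFiltered.mkString(", ") + ")"
        val projectsLine = "Projects: " + c.selected + "/" + c.total + " selected" + preFilterNote
        val filtersLine =
          if (c.activeFlags.isEmpty) Nil
          else List("Filters active: " + c.activeFlags.mkString(" · "))
        projectsLine :: filtersLine
      }
    }
}
```

Passing `None`, or a context with no flags and no pre-filtered projects, yields no extra lines, so an unfiltered run keeps the original summary block unchanged.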

Tests: TestTagsIT gains "summary reports projects-selected ratio..." plus log
assertions on the new lines for existing scenarios. Renamed one earlier IT to
drop a `/` that broke Files.createTempDirectory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r native-image jobs

Brings in the testTags manifest, --only-tag / --exclude-tag CLI surface, the
TestTagFilter pure logic + tests, the BSP-side filter wiring with stage-by-stage
error messages, the FilterContext block in the build summary, and the bleep.yaml
slow-tag declaration for **IT in bleep-tests with the matching CI exclusion in
.github/workflows/build.yml.
@oyvindberg oyvindberg changed the title Restore test coverage on all native-image architectures Test-tagging strategy for faster, consistent native-image builds May 18, 2026
oyvindberg and others added 3 commits May 19, 2026 00:20
Covers the testTags manifest syntax (glob semantics, single vs array values),
CLI surface (--only-tag / --exclude-tag with union/subtractive rules + strict
validation), project pre-filter optimization, list-tests inspection, summary
diagnostics, the multi-stage pipeline error wording, the CI recipe that bleep
itself uses for fast per-arch native-image validation, and a comparison table
against framework-native tagging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The model.Project case class gained a `testTags: JsonMap[String, JsonSet[String]]`
field in 3d68163. The model file and the bleep.yaml-driven test projects
compiled clean locally because Zinc cached the importer/generator translation
units. CI cold-builds, surfacing four sites that construct Project explicitly:

- bleep-cli/src/scala/bleep/mavenimport/buildFromMavenPom.scala (main + test)
- bleep-cli/src/scala/bleep/mavenimport/generateBuildFromMaven.scala (scripts)
- bleep-cli/src/scala/bleep/sbtimport/generateBuild.scala (scripts)
- bleep-cli/src/scala/bleep/sbtimport/buildFromBloopFiles.scala (per-cross)

All four now pass `testTags = model.JsonMap.empty` in the correct position.
While here, type-annotate the bare `JsonSet.empty` calls in two of the same
files — without an inferrable expected type the Ordering instance was
ambiguous (Short vs Int both match Ordering[Any]) once shifted by the new
field's position.
Same fix as the importer/generator commit (8265f9d) for the two
`model.Project(...)` call sites in BuildCreateNew (empty + main proj).
@oyvindberg oyvindberg changed the title Test-tagging strategy for faster, consistent native-image builds Faster, more consistent CI: split slow tests, two-phase native-image, restore arch coverage May 19, 2026
GraalVM's `-Ob` (quick build mode) skips advanced inlining / escape-analysis /
etc., trading runtime performance for a 30-50% faster build. This is the
right trade for every non-release native-image invocation: PR / master
matrix runs (binary gets thrown away after the test step) and local
`bleep native-image` (testing, not benchmarking).

Release tag builds keep full `-O2` (default). Signal: the `GITHUB_REF` env
var starts with `refs/tags/v`, the same condition the `release` job in
build.yml uses to gate itself.

The chosen mode is logged at the top of each build so a reviewer can
confirm which mode produced a given artifact:

    native-image build mode [GITHUB_REF => refs/heads/master,
      mode => -Ob (quick build, snapshot/PR/local)]
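
The mode selection can be sketched along these lines. The `Mode` hierarchy and function names are hypothetical, and the release label text is invented for the example; the `refs/tags/v` prefix check and the quick-build log line follow the description above.

```scala
object NativeImageModeSketch {
  // flag is the extra argument passed to native-image; None means "use the default".
  sealed abstract class Mode(val flag: Option[String], val label: String)
  case object QuickBuild extends Mode(Some("-Ob"), "-Ob (quick build, snapshot/PR/local)")
  // Label text for the release mode is an assumption, not taken from the PR.
  case object Release extends Mode(None, "-O2 (default, release tag)")

  /** Release tags get the default optimisation level; everything else builds quickly. */
  def chooseMode(githubRef: Option[String]): Mode =
    if (githubRef.exists(_.startsWith("refs/tags/v"))) Release else QuickBuild

  def logLine(githubRef: Option[String]): String = {
    val ref = githubRef.getOrElse("<unset>")
    val mode = chooseMode(githubRef)
    s"native-image build mode [GITHUB_REF => $ref, mode => ${mode.label}]"
  }
}
```

A caller would typically pass `sys.env.get("GITHUB_REF")`. A local run has no such variable, which falls through to the quick build.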

Targets the macos-latest (arm64, 3 cpu / 7 GB) runner, which was spending
~17 min in `native-image` proper. With `-Ob` that should drop to roughly
10-12 min, taking the full job from 35 min toward ~25.

Local: `bleep compile` clean, `bleep test` 793/793 pass.
@oyvindberg oyvindberg merged commit c6adee1 into master May 19, 2026
10 checks passed
@oyvindberg oyvindberg deleted the skip-kotlin-native-arm64-linux branch May 19, 2026 22:48