Fix VM startup timeout by pinning BoxLite runtime to Cargo.lock revision #8

Open
vaayne wants to merge 2 commits into boxlite-ai:main from vaayne:fix/pin-boxlite-runtime-version

vaayne commented Mar 5, 2026

Summary

Fixes #7: boxes stuck in "creating" status with the error "Timeout waiting for guest ready (30s)".

Root Cause

The release workflow and install-local.sh clone BoxLite at HEAD (latest main) to build the runtime binaries (boxlite-shim, boxlite-guest, dylibs), but the boxrun server binary is compiled against the boxlite version pinned in Cargo.lock.

When BoxLite's InstanceSpec struct changes between these two versions, the server produces JSON config that the shim cannot deserialize:

  • Server (compiled against boxlite cc236c4/v0.5.10): serializes InstanceSpec with parent_pid field, without exit_file/stderr_file
  • Shim (built from latest main): expects exit_file and stderr_file, doesn't expect parent_pid
  • Result: the shim crashes immediately with missing field "exit_file", and the server only sees a 30s timeout

Fix

Extract the boxlite git revision from Cargo.lock and git checkout that exact commit before building the runtime. This ensures the server and shim always use the same InstanceSpec schema.

Changed files:

  • .github/workflows/release.yml — pin BoxLite checkout in CI release build
  • scripts/install-local.sh — pin BoxLite checkout in local install script
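
The pinning step described above can be sketched roughly as follows. This is an illustration, not the PR's actual diff: the function names `boxlite_rev_from_lock` and `pin_boxlite_checkout` and the clone directory argument are hypothetical, and the extraction assumes the `source = "git+<url>#<commit>"` line appears within two lines of the `name = "boxlite"` line in Cargo.lock.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are illustrative assumptions.

# Print the git revision that Cargo.lock pins for the "boxlite" package.
# Uses the same grep/sed pipeline as the test plan: take the two lines
# after the name line and capture the hex string between '#' and '"'.
boxlite_rev_from_lock() {
    lockfile=${1:-Cargo.lock}
    grep -A2 'name = "boxlite"' "$lockfile" \
        | sed -n 's/.*#\([0-9a-f]*\)".*/\1/p'
}

# Check out the pinned revision inside an existing BoxLite clone.
# Returns non-zero instead of silently building from HEAD when no
# revision can be extracted.
pin_boxlite_checkout() {
    lockfile=$1
    clone_dir=$2
    rev=$(boxlite_rev_from_lock "$lockfile")
    if [ -z "$rev" ]; then
        echo "error: no boxlite git revision found in $lockfile" >&2
        return 1
    fi
    git -C "$clone_dir" checkout --quiet "$rev"
}
```

A call might look like `pin_boxlite_checkout Cargo.lock ./boxlite`, where `./boxlite` stands for wherever the script cloned BoxLite. If that clone is shallow, the pinned commit may not be present and would need to be fetched before the checkout.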

Test plan

  • Verify grep -A2 'name = "boxlite"' Cargo.lock | sed -n 's/.*#\([0-9a-f]*\)".*/\1/p' extracts the correct commit hash
  • Build release with the pinned workflow and confirm boxrun shell ubuntu works
  • Confirmed v0.2.0 (which had matching versions) works correctly on macOS 15

vaayne added 2 commits March 5, 2026 09:21
The release workflow and install script clone BoxLite at HEAD to build
the runtime (shim, guest, dylibs), but the boxrun server binary is
compiled against the boxlite version pinned in Cargo.lock. When
BoxLite's InstanceSpec struct changes between these versions, the
server produces JSON config that the shim cannot deserialize, causing
"missing field" errors and VM startup timeouts.

This was the root cause of boxlite-ai#7: the server (compiled against boxlite
cc236c4/v0.5.10) serialized InstanceSpec with `parent_pid` but without
`exit_file`/`stderr_file`, while the shim (built from latest main)
expected the opposite set of fields.

Fix: extract the boxlite git revision from Cargo.lock and checkout
that exact commit before building the runtime, ensuring the server
and shim always agree on the InstanceSpec schema.

The boxrun binary links against runtime dylibs (libkrun, libgvproxy)
via @rpath. The release workflow adds the rpath with install_name_tool,
but the local install script was missing this step, causing a dyld
"no LC_RPATH's found" error on macOS.
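
The missing rpath step from the second commit might look something like the sketch below. The helper name `ensure_rpath` and both of its arguments are hypothetical; the rpath value the release workflow actually uses is not shown in this PR text, so it is left as a parameter. The sketch relies only on `otool -l` (to list load commands) and `install_name_tool -add_rpath`, both part of the macOS developer tools.

```shell
#!/bin/sh
# Sketch only: the helper name and its arguments are illustrative assumptions.

# Add an LC_RPATH entry to a Mach-O binary unless it is already present.
#   $1: path to the binary
#   $2: rpath directory to add
ensure_rpath() {
    binary=$1
    rpath=$2
    # `otool -l` prints each LC_RPATH load command followed by a
    # "path <dir> (offset N)" line; match the directory literally.
    if otool -l "$binary" | grep -A2 'cmd LC_RPATH' \
        | grep -qF "path $rpath (offset"; then
        return 0
    fi
    install_name_tool -add_rpath "$rpath" "$binary"
}
```

Checking first keeps the step safe to re-run, since `install_name_tool -add_rpath` reports an error when the same rpath is already present in the binary.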
Development

Successfully merging this pull request may close these issues.

[bug] Box stuck in 'creating' status on Mac OS Tahoe 26