fix(boot): unblock cold-boot core start on dev hosts#1324
fix(boot): unblock cold-boot core start on dev hosts#1324senamakel merged 6 commits intotinyhumansai:mainfrom
Conversation
Three regressions surfaced by tinyhumansai#1316's BootCheckGate flow on cold boot: - redux-persist's `lib/storage` default import resolves to the module namespace under Vite's CJS dep pre-bundling, so `coreMode`'s storage adapter had no `getItem` and rehydrate threw on first paint. Replace with an inline localStorage-backed adapter matching `userScopedStorage`'s shape. - `start_core_process` was registered as an invoke handler but missing from the Tauri capability ACL, so the gate's spawn invoke was rejected with "Command not found". Add it to allow-core-process. - A stale `~/Library/LaunchAgents/com.openhuman.core.plist` left over from a prior worktree's `service install` (KeepAlive=true) re-spawned the daemon after every SIGKILL, defeating `ensure_running`'s stale-listener takeover and producing "signaled pid <X> but port 7788 remained bound after 5000ms". Bootout + unlink the plist on macOS startup before `CoreProcessHandle::new`.
Two more cold-boot regressions surfaced once start_core_process was reachable: - BootCheckGate's `waitForCore` polled `openhuman.ping`, but the core registers ping under the Tier-1 dispatcher namespace as `core.ping` (see src/core/dispatch.rs). Every probe came back with -32601, the 10s budget elapsed, and the gate showed "Local core did not respond in time" despite the core being healthy. Switch the probe — and the matching test fixtures — to `core.ping`. - A second `start_core_process` invocation (e.g. HMR re-mounting the gate) saw the embedded server's own port as bound and walked into the tinyhumansai#1130 stale-listener takeover path against itself, producing "port 7788 rebounded to pid <X> after terminating pid <Y>; refusing to kill a different process". Add an idempotent fast path: if the in-process task is still alive and the port is open, the listener is us — return Ok without identifying or killing anything. The takeover path stays for genuinely external leftover binaries.
`openhuman.update_version` returns a flat VersionInfo
(`{ version, target_triple, asset_prefix }`) — see
src/openhuman/update/ops.rs::update_version. The boot-check was reading
`result.version_info.version`, a path that doesn't exist, so coreVersion
was always '' and every cold boot landed on `outdatedLocal` ("Local core
needs a restart") even when the running core matched the app build.
Read `result.version` directly and update the test fixtures to match.
`openhuman.update_version` is wrapped by RpcOutcome::single_log, so the
JSON-RPC response shape is `{ result: VersionInfo, logs }` (see
src/rpc/mod.rs::into_cli_compatible_json). The previous reader looked at
`result.version` and skipped the outer `result` wrapper, so coreVersion
was '' and the boot gate kept re-prompting "Local core needs a restart"
even after the user clicked through.
Read `response.result.version` and update the matching test fixtures.
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR enables Tauri ChangesCore process startup and boot-check integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/lib.rs`:
- Around line 1357-1403: The macOS cleanup always runs and unconditionally calls
launchctl bootout on STALE_LABEL and deletes
~/Library/LaunchAgents/{STALE_LABEL}.plist; restrict this to the
stale-dev-worktree recovery path by either gating the whole block behind a
debug/dev-only check (e.g., cfg!(debug_assertions) or an env var) or by
validating the plist points at a different executable/worktree before doing
bootout/remove: read the plist (or its Program/ProgramArguments) and compare
with std::env::current_exe(), only run the
Command::new("launchctl").arg("bootout").arg(&target) and
std::fs::remove_file(&plist) when the plist is stale; ensure you keep the
STALE_LABEL constant and the existing uid/bootstrap logic but skip
bootout/remove for a matching/active daemon so normal service install +
daemon_mode is not torn down.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e7803e4-490e-4f87-b0e8-2b6b4ba482f9
📒 Files selected for processing (6)
app/src-tauri/permissions/allow-core-process.tomlapp/src-tauri/src/core_process.rsapp/src-tauri/src/lib.rsapp/src/lib/bootCheck/index.test.tsapp/src/lib/bootCheck/index.tsapp/src/store/index.ts
CodeRabbit (PR tinyhumansai#1324) flagged that the unconditional purge would also tear down a legitimate `service install` on every restart. Tighten the gate so the cleanup only fires when: - `cfg!(debug_assertions)` — release builds never auto-purge. - `!daemon_mode` — the process running as the daemon doesn't kill its own plist out from under itself. - the plist's `ProgramArguments[0]` points at a binary other than `current_exe()` — so a `service install` against THIS build is preserved; only sibling-worktree leftovers get zapped. The plist parse is intentionally minimal (first `<string>` after the `ProgramArguments` key, no full XML parser) — the installer always writes a single absolute binary path there (see src/openhuman/service/macos.rs).
Coverage Gate failed on PR tinyhumansai#1324 because the inline `localStorageAdapter` in `app/src/store/index.ts` was 13/18 missed lines (top-level module init isn't exercised by any test, so the try/catch arms inside getItem/setItem/ removeItem all read as uncovered). Extract the adapter to `app/src/store/localStorageAdapter.ts` and add 8 unit tests covering: success path, missing-key path, and the swallowed- throw path for each of the three methods, plus a shape assertion that each returns a Promise (so redux-persist's await machinery is satisfied even when the underlying call throws).
Summary
redux-persist/lib/storagedefault import with an inlinelocalStorageadapter (Vite CJS dep pre-bundling resolved it to the module namespace, so cold boot threwstorage.getItem is not a functionbefore rehydrate).start_core_processto the Tauri capability ACL so the BootCheckGate's spawn invoke is no longer rejected with "Command not found" (it was already a registered handler — just unreferenced inallow-core-process.toml).~/Library/LaunchAgents/com.openhuman.core.plistleft by a prior worktree'sservice installduring Tauri startup so itsKeepAlive=truedaemon stops re-binding port 7788 on every SIGKILL.openhuman.ping(unregistered) tocore.ping(Tier-1 dispatcher insrc/core/dispatch.rs).start_core_processinvocation (HMR re-mount).response.result.version—update_versionis wrapped byRpcOutcome::single_log, so the JSON-RPC body is{ result: VersionInfo, logs }. Earlier readers usedresult.version_info.versionand thenresult.version; both fell through to '' and pinned every boot to "outdated local".Problem
After #1316 added the BootCheckGate, cold boot on a dev macOS host failed in six different ways stacked on top of each other:
redux-persist/lib/storageexposed only the module namespace, socoreMode's persist adapter had nogetItemand the renderer crashed before paint withUncaught TypeError: storage.getItem is not a function.start_core_processTauri command was registered but missing from the capability ACL →Command not found.KeepAlive=true) re-spawned the daemon after every SIGKILL, defeating the Proactively kill stale old OpenHuman RPC processes before the Tauri app reuses their port #1130 stale-listener takeover and producingsignaled pid <X> but port 7788 remained bound after 5000ms.openhuman.ping, which the core doesn't register — every retry returned-32601 Method not found, the 10s budget elapsed, and the gate showed "Local core did not respond in time" with a healthy core sitting underneath.start_core_process(HMR re-mounting BootCheckGate) saw the embedded server's own port as bound and walked into the takeover path against itself, producingport 7788 rebounded to pid <X> after terminating pid <Y>; refusing to kill a different process.{ result, logs }).End result: the BootCheckGate was effectively unreachable on a freshly-checked-out dev tree.
Solution
Six narrow, root-cause fixes — one per failure mode, each with a code comment naming the symptom so the next person hitting it can grep their way back here:
app/src/store/index.ts— inlinelocalStorageadapter forcoreMode, matching the shape ofuserScopedStorage. Same async semantics as the redux-persist default; just doesn't depend on Vite's CJS interop.app/src-tauri/permissions/allow-core-process.toml— addstart_core_process.app/src-tauri/src/lib.rs—setupclosure now runs a best-effortlaunchctl bootout+fs::remove_fileon~/Library/LaunchAgents/com.openhuman.core.plistbeforeCoreProcessHandle::new. Best-effort: missingHOME, missingid, plist already absent — all degrade silently. macOS-only viacfg(target_os = "macos").app/src/lib/bootCheck/index.ts— probecore.ping(Tier-1), readresponse.result.version. Doc comments reference the upstream contracts (src/core/dispatch.rs,src/openhuman/update/ops.rs,src/rpc/mod.rs).app/src-tauri/src/core_process.rs— idempotent fast path at the top ofensure_running: ifself.taskisSomeand unfinished and the port is open, returnOkwithout identifying or killing the listener. Takeover path stays intact for genuinely external leftover binaries.app/src/lib/bootCheck/index.test.tsupdated to match the wrapped response shape andcore.pingmethod name.Verified end-to-end on a real cold boot under
pnpm dev:app:start_core_processinvokes once → embedded core binds 7788 →core.pingsucceeds →update_versionreturns{ result: { version: '0.53.18', ... } }→ gate transitions tomatchand renders children.restart_core_processand HMR re-mount paths also exercise the new self-takeover guard cleanly.Submission Checklist
docs/TESTING-STRATEGY.mdsrc/openhuman/about_app/) doesn't track ACL/persist-adapter wiringdocs/TESTING-STRATEGY.md)pnpm dev:appon the boot-gated branchImpact
localStoragekeys redux-persist's default did.com.openhuman.corelabel and lives in the user's~/Library/LaunchAgents. No privileged paths touched.openhuman.pingwas failing instantly with-32601, nowcore.pingsucceeds on the first try).Related
[imessage] core RPC token is not initializedwarning during boot is a benign startup race in the imessage scanner — worth a smallwait_for_tokengate but out of scope here. TheDaemonHealth - setting status to disconnectedwarning underpnpm dev:applikely needs the legacy daemon-mode health check disabled now that the core is always embedded.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
Validation Blocked
Behavior Changes
Parity Contract
redux-persiststorage shape is unchanged (same async `getItem/setItem/removeItem` contract, same key prefixes).Summary by CodeRabbit
Bug Fixes
Tests