fix: harden control-plane health diagnostics#71
fix: harden control-plane health diagnostics#71RaviTharuma wants to merge 2 commits intoAVIDS2:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68800cc2ff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (process.platform === 'linux') { | ||
| return execSync(`tr '\0' ' ' < /proc/${pid}/cmdline`, { encoding: 'utf-8' }).trim(); | ||
| } | ||
| return execSync(`ps -p ${pid} -o command=`, { encoding: 'utf-8' }).trim(); |
There was a problem hiding this comment.
Replace Unix-only process lookup for non-Linux platforms
The non-Linux branch runs ps -p ... -o command= to identify whether the PID belongs to memorix serve-http, but on Windows this command is not available under the default cmd.exe shell. In that environment isMemorixBackgroundProcess will always hit the catch path and return false, so background status can mislabel a real unhealthy Memorix PID as “reused” and delete background.json, leaving the running control plane unmanaged.
Useful? React with 👍 / 👎.
| const timer = setTimeout(() => controller.abort(), timeoutMs); | ||
| const res = await fetch(`http://127.0.0.1:${port}${CONTROL_PLANE_HEALTH_PATH}`, { | ||
| signal: controller.signal, | ||
| }); | ||
| clearTimeout(timer); |
There was a problem hiding this comment.
Clear health-check abort timer on failure paths
The timeout is only cleared after a successful fetch, so when the request fails fast (for example, ECONNREFUSED) the function returns from catch but leaves a pending timer alive until timeoutMs elapses. Since doctor now uses this shared helper for its probes, failed checks can keep the CLI process running for an extra 2–3 seconds per probe even though the network failure happened immediately.
Useful? React with 👍 / 👎.
Summary
memorix doctorto probe the live HTTP control-plane endpoint instead of/healthmemorix background statusfrom misclassifying a realmemorix serve-httpPID as an unrelated reused processVerification
npm test -- --run tests/cli/control-plane-shared.test.tsnpm test -- --run tests/integration/serve-http.test.tsnpm run buildnpm run lintLinked issue