Summary
Three related issues: a socket path mismatch caused all 30 forager bees to crash-loop, and two reporting gaps meant neither hum doctor nor hum bee --list gave a clear signal.
Issue 1: thrum socket path inconsistency
humd plist sets HUM_THRUM_SOCK=/tmp/hum-501/hum/thrum.sock (the XDG_RUNTIME_DIR/hum/thrum.sock convention).
All bee plists (generated by hum bee enter / hive install) set HUM_THRUM_SOCK=/tmp/hum-501/thrum.sock (the old XDG_RUNTIME_DIR/thrum.sock path).
When humd crashes and restarts, it creates a new socket at its own path (…/hum/thrum.sock). Every bee still points at the old path → "connection refused" on every connect attempt → all bees crash-loop with exit code 1.
Observed: after humd died overnight, all 30 bees showed exit code 1 in launchctl list. None could connect until the humd plist was manually edited to match the bee plist path, then humd was restarted.
Workaround applied: changed humd plist from /tmp/hum-501/hum/thrum.sock → /tmp/hum-501/thrum.sock to match the bee plists. Restarted humd. All bees connected within seconds.
Root fix needed: either (a) hum bee enter / hive install should write the same socket path that humd actually binds, or (b) when humd restarts it should bind at the path the bee plists already reference. The two plists should be kept in sync by the same codepath that generates them.
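One way to read option (a) is a single helper that both plist generators call, so the two paths cannot drift apart. The sketch below is only an illustration: the function names are hypothetical and not taken from the hum codebase, and it assumes the newer XDG_RUNTIME_DIR/hum/thrum.sock convention is the one to keep.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical single source of truth for the thrum socket location.
/// Assumption: the newer `<runtime dir>/hum/thrum.sock` layout is the one to keep.
pub fn thrum_sock_path(runtime_dir: &Path) -> PathBuf {
    runtime_dir.join("hum").join("thrum.sock")
}

/// Renders the EnvironmentVariables key/value pair that both the humd plist
/// generator and the bee plist generator would embed.
/// Note: no XML escaping is done here, so this assumes a path without
/// characters such as `&` or `<`.
pub fn thrum_sock_plist_entry(runtime_dir: &Path) -> String {
    format!(
        "<key>HUM_THRUM_SOCK</key>\n<string>{}</string>",
        thrum_sock_path(runtime_dir).display()
    )
}
```

With this shape, changing the convention means editing one function, and regenerating either plist picks up the same value.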
Issue 2: hum doctor false positive on stale socket
Expected: hum doctor detects that the thrum socket is not accepting connections.
Actual: hum doctor reports thrum sock: /tmp/hum-501/thrum.sock ✓ present even when the socket file is stale (humd crashed, nothing is listening). The check only tests Path::exists(), not connectivity.
Evidence: during the crash-loop period, hum doctor output showed ✓ for the socket. Connecting to the socket via any client gave "connection refused (os error 61)". nc -z -U /tmp/hum-501/thrum.sock returned exit 1.
Fix: the doctor check should attempt a brief connect to the socket (e.g. write a byte, expect a breath back within 1s). If the connect fails or times out, report the socket as broken rather than ✓ present.
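A minimal sketch of a connectivity-based check, using only the Rust standard library. The enum and function names are hypothetical. It covers the connect step only: the breath exchange is left out because the thrum wire protocol is not described in this report.

```rust
use std::io::ErrorKind;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical result type for the doctor's socket check.
#[derive(Debug, PartialEq, Eq)]
pub enum SockHealth {
    /// No file at the path.
    Missing,
    /// The socket file exists but nothing is listening (e.g. humd crashed).
    Stale,
    /// A listener accepted the connection.
    Listening,
}

/// Distinguishes a live socket from a stale one by actually connecting,
/// instead of only testing whether the path exists.
pub fn check_thrum_sock(path: &Path) -> std::io::Result<SockHealth> {
    match UnixStream::connect(path) {
        Ok(_stream) => Ok(SockHealth::Listening),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(SockHealth::Missing),
        Err(e) if e.kind() == ErrorKind::ConnectionRefused => Ok(SockHealth::Stale),
        // Anything else (permissions, not a socket, ...) is passed to the caller.
        Err(e) => Err(e),
    }
}
```

A fuller check could call `set_read_timeout(Some(Duration::from_secs(1)))` on the returned stream, write the probe, and treat a timed-out read as broken as well.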
Issue 3: hum bee --list does not surface crash-loop state
hum bee --list reported all 30 bees as in nest (service running) throughout the crash-loop period. The actual launchctl state was exit code 1, null PID for most bees. hum doctor did surface the exit codes, but only in the detailed [bees] section which is easy to miss.
Suggestion: hum bee --list should show a warning indicator (e.g. ⚠ crash-looping (exit 1)) for any bee whose launchctl state shows a non-zero exit code and null PID.
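The suggested rule can be sketched as a small classifier over rows of launchctl list output. This assumes the usual three-column layout (PID, Status, Label) with "-" standing in for a null PID. The type names, function names, and the "not running" wording are hypothetical.

```rust
/// Hypothetical per-bee state derived from one `launchctl list` row.
#[derive(Debug, PartialEq, Eq)]
pub enum BeeState {
    Running { pid: u32 },
    CrashLooping { exit_code: i32 },
    Idle,
}

/// Parses one row of `launchctl list` (assumed columns: PID, Status, Label).
/// Returns None for the header row or any row that does not parse.
pub fn classify_launchctl_row(row: &str) -> Option<(String, BeeState)> {
    let mut cols = row.split_whitespace();
    let pid = cols.next()?;
    let status: i32 = cols.next()?.parse().ok()?;
    let label = cols.next()?.to_string();
    let state = match pid.parse::<u32>() {
        Ok(pid) => BeeState::Running { pid },
        // Null PID plus a non-zero last exit status: the case from the suggestion.
        Err(_) if status != 0 => BeeState::CrashLooping { exit_code: status },
        Err(_) => BeeState::Idle,
    };
    Some((label, state))
}

/// Text that `hum bee --list` could print for each state.
pub fn list_status(state: &BeeState) -> String {
    match state {
        BeeState::Running { .. } => "in nest (service running)".to_string(),
        BeeState::CrashLooping { exit_code } => {
            format!("⚠ crash-looping (exit {})", exit_code)
        }
        BeeState::Idle => "not running".to_string(),
    }
}
```

A null PID with a non-zero status only shows that the last run failed. Telling a genuine loop from a one-off failure would need extra evidence, such as repeated restarts.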
Reproduction
- Start all bees normally (they're running, all connected).
- Kill humd (pkill humd) and wait for launchctl to restart it.
- Observe: if humd plist and bee plists have different HUM_THRUM_SOCK paths, all bees crash-loop immediately.
- Run hum doctor: socket shows ✓ present.
- Run hum bee --list: bees show "in nest (service running)".
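The precondition in the third step (humd and bee plists disagreeing on HUM_THRUM_SOCK) can be confirmed before killing humd. The sketch below is a rough illustration only. It assumes XML-format plists in which the HUM_THRUM_SOCK key is immediately followed by its string value, and the helper names are hypothetical.

```rust
/// Extracts the HUM_THRUM_SOCK value from launchd plist XML text.
/// Assumption: the first <string> element after the key holds its value.
pub fn thrum_sock_from_plist(xml: &str) -> Option<&str> {
    let after_key = xml.split_once("<key>HUM_THRUM_SOCK</key>")?.1;
    let after_open = after_key.split_once("<string>")?.1;
    let (value, _) = after_open.split_once("</string>")?;
    Some(value.trim())
}

/// Returns Some(true) when both plists name the same socket path,
/// Some(false) when they differ, and None when either lacks the key.
pub fn socket_paths_match(humd_plist: &str, bee_plist: &str) -> Option<bool> {
    Some(thrum_sock_from_plist(humd_plist)? == thrum_sock_from_plist(bee_plist)?)
}
```

A `Some(false)` here for any bee plist means the crash loop described above should reproduce on the next humd restart.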
Environment
- hum CLI 0.31.16
- humd 0.31.16 (thrum 0.7.0)
- macOS aarch64
- 30 forager bees (daman swarm)