Skip to content

Non-interactive runs should be fire-and-forget by default, add -f for blocking+streaming #197

@dpup

Description

@dpup

Context

The fix/interactive-only branch (#196) made non-interactive runs block until exit to fix a P0 bug where manager.Close() canceled monitorContainerExit before it could capture logs. But blocking with no visible output is the worst of both worlds — the user stares at a silent terminal while logs are only accessible from a second terminal via moat logs.

Root cause

The real bug isn't that non-interactive runs exit too fast — it's that manager.Close() cancels the context passed to monitorContainerExit, killing it before it can capture logs and update state. The fix should target that, not change the UX.

Proposed design

Non-interactive (default): fire-and-forget

  • moat run, moat claude -p "..." start the container and return immediately
  • Output: Started <name> (<id>). Use 'moat logs <id> -f' to follow output.
  • Logs, state, and cleanup handled by monitorContainerExit (which must survive Close())

Non-interactive + follow (-f): blocking with streamed logs

  • moat run -f, moat claude -p "..." -f block and stream container logs to stdout
  • Ctrl+C stops the run
  • On completion: shows exit status

Interactive (-i): blocking with terminal (already works)

  • No changes needed

Implementation

1. Fix monitorContainerExit lifecycle (fixes the P0)

The monitor goroutine should not use the manager's cancellable context for WaitContainer. Options:

Option A: Use context.Background() for the monitor

  • monitorContainerExit gets context.Background() instead of the manager's ctx
  • Close() sets a flag/closes a channel that the monitor checks after WaitContainer returns
  • Pro: Simple. Con: No way to cancel a stuck WaitContainer on shutdown.

Option B: Separate cancel for monitors

  • Manager tracks monitor goroutines via a sync.WaitGroup
  • Close() waits for monitors (with timeout) before canceling context
  • Pro: Clean shutdown. Con: More plumbing.

Option C: Close() waits for exitCh

  • Before canceling context, Close() waits on each run's exitCh with a short timeout
  • If the container is already exiting, this lets the monitor finish
  • Pro: Minimal change. Con: Adds latency to Close() for running containers.

Recommend Option A — the monitor should be fire-and-forget from the manager's perspective. A stuck WaitContainer is already handled by Docker/container runtime timeouts.

2. Revert non-interactive to fire-and-forget

  • Remove manager.Wait() + signal handling from exec.go non-interactive path
  • Restore "started in background" messages with log/stop hints
  • Keep the monitorContainerExit state/log fixes from this branch

3. Add -f / --follow flag

  • New flag on moat run and provider commands
  • When set: call manager.Wait() in parallel with manager.FollowLogs() streaming to stdout
  • Signal handler for Ctrl+C → manager.Stop()
  • On completion: print exit status

4. Update docs

  • Non-interactive default described as fire-and-forget
  • Document -f flag
  • Keep interactive docs as-is

What stays from #196

All the other fixes in the branch are solid and should merge:

  • Remove moat attach, --detach, escape-detach
  • ProviderRunner extraction
  • Interactive metadata updates (state/stopped_at)
  • Docker log flush wait (ContainerLogsAll)
  • Escape-stop state handling
  • daemonClient race fixes
  • close(exitCh) ordering fix

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions