Skip to content

fix: don't kill sibling sessions when sessions start runs without a TTY#43

Open
andreakiro wants to merge 1 commit into
mainfrom
fix/parallel-sessions-start-race
Open

fix: don't kill sibling sessions when sessions start runs without a TTY#43
andreakiro wants to merge 1 commit into
mainfrom
fix/parallel-sessions-start-race

Conversation

@andreakiro
Copy link
Copy Markdown
Member

Summary

Fixes a correctness bug where notte sessions start invoked from a non-TTY context (subprocess, CI, parallel script) silently kills whatever session ID happens to be in ~/.notte/cli/current_session. In parallel workloads, workers race to overwrite that shared file and then stop each other's live sessions, surfacing later as NotteApiSessionTimeoutError: Session ... has been closed on unrelated page calls.

Root cause is three behaviors combined:

  • ~/.notte/cli/current_session is global, file-based, unsynchronized.
  • runSessionsStart auto-stops whatever ID it finds there, with no ownership check.
  • confirmReplaceSession reads stdin, gets EOF, swallows the error, and treats empty input as the [Y/n] default → YES.

What's in this PR

  1. confirm.goconfirmReplaceSession, confirmReplaceAgent, and ConfirmStop now return false when stdin is not a terminal (unless --yes was passed). Matches the convention in apt, git, gh of refusing destructive defaults in non-interactive mode. The *WithIO helpers are unchanged so existing tests keep their interactive contract.
  2. sessions.go / agents.gorunSessionsStart and runAgentsStart skip writing current_session / current_agent / current_session_expiry / current_viewer_url when stdin is not a terminal. This stops scripted invocations from clobbering each other's state and from mutating state the user's interactive shell depends on. Scripted callers should pass --session-id or NOTTE_SESSION_ID explicitly.

isInteractiveStdin is indirected through a package-level var so tests can flip it. golang.org/x/term is already a dependency (used by internal/update).

Behavior changes

Context Before After
Interactive TTY, existing session, [Enter] stops existing, starts new (unchanged) same
Interactive TTY, existing session, n starts new without stopping (unchanged) same
Non-TTY (script), existing session EOF → "yes" → stops sibling's session does not stop; starts new in parallel
Non-TTY (script), --yes stops, starts new same
Non-TTY (script), new session writes shared current_session (race) does not write
Interactive TTY, new session writes current_session (unchanged) same

Scripts that relied on the auto-stop behavior (unlikely; it was a footgun) need to pass --yes explicitly. The README/CHANGELOG should mention this in the next release notes.

Test plan

  • go test ./... passes locally
  • New confirm_test.go tests cover non-TTY default for replace session/agent and stop, plus --yes override
  • New sessions_test.go tests verify (a) non-TTY start does not call /sessions/{id}/stop on a pre-populated current_session, and (b) non-TTY start does not write current_session
  • Existing TestSessionsStart_SetsCurrentSession / _SavesExpiry / _AutoClearsExpiredSession / _DoesNotAutoClearNonExpiredSession / TestAgentsStart_SetsCurrentAgent now opt into withInteractiveStdinForTest (they were always testing the interactive contract; go test runs without a TTY by default)
  • Manual repro from the PRD on a built binary:
    notte sessions start &
    notte sessions start &
    notte sessions start &
    wait
    NOTTE_API_KEY=... notte sessions list --only-active   # expect 3 active sessions
    

Follow-ups (not in this PR)

  • Scope current_session by API-key hash so different accounts on the same machine never collide interactively. Suggested in the PRD as additional hardening on top of (A) + skip-write-if-not-TTY.
  • Consider a release-note line so users who were silently relying on the auto-stop behavior know to pass -y.

When `notte sessions start` is invoked from a subprocess, CI, or
parallel script, stdin is EOF. The replace-existing-session prompt
treats EOF as "Enter" and defaults to YES, so the CLI silently issues
DELETE /sessions/{id}/stop on whatever ID happens to be in the shared
~/.notte/cli/current_session file - which may belong to a sibling
worker's live session.

Two changes:

1. confirmReplaceSession, confirmReplaceAgent and ConfirmStop now
   return false when stdin is not a terminal (unless --yes was passed).
   Matches the convention used by apt, git, gh.

2. runSessionsStart and runAgentsStart skip writing the global
   current_session / current_agent / current_session_expiry /
   current_viewer_url files when stdin is not a terminal. This stops
   scripted invocations from mutating state the interactive shell
   relies on, and removes the race where two parallel workers
   overwrite each other's current_session.

Scripted callers should thread the session ID explicitly via
--session-id or NOTTE_SESSION_ID; this PR makes that the only safe
option for non-interactive use.
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR fixes a correctness bug where notte sessions start invoked in non-TTY contexts (CI, parallel scripts) would silently kill sibling sessions via an EOF-as-yes confirmation and a shared ~/.notte/cli/current_session file. The fix is sound for the core race: confirmReplaceSession and confirmReplaceAgent now return false in non-TTY, and the shared state files are no longer written outside of interactive sessions.

  • confirm.go: Adds isInteractiveStdin (overridable for tests) and gates confirmReplaceSession, confirmReplaceAgent, and ConfirmStop on TTY detection; --yes bypasses the check in all cases.
  • sessions.go / agents.go: Writing current_session, current_agent, expiry, and viewer-URL files is now gated on isInteractiveStdin(), eliminating the write-race between parallel workers.
  • Tests: New unit tests cover non-TTY defaults and --yes override; existing interactive-contract tests are correctly annotated with withInteractiveStdinForTest.

Confidence Score: 4/5

The start-command race fix is correct and well-tested, but the same non-TTY guard applied to ConfirmStop causes explicit notte sessions stop/notte agents stop commands in CI scripts to silently exit 0 without stopping the targeted resource.

The core fix (preventing accidental sibling-session kills on start) is correct. However, ConfirmStop — used by the explicit stop subcommand, not the replacement flow — is also guarded, so CI cleanup scripts that call notte sessions stop $ID without --yes now silently no-op with exit 0, potentially leaking active sessions.

internal/cmd/confirm.go — specifically the ConfirmStop non-TTY guard and how it interacts with runSessionStop and runAgentStop

Important Files Changed

Filename Overview
internal/cmd/confirm.go Adds isInteractiveStdin var and non-TTY guards to confirmReplaceSession, confirmReplaceAgent, and ConfirmStop — correctly fixes the auto-replace bug, but ConfirmStop guard also silently blocks explicit stop commands in non-TTY
internal/cmd/sessions.go Skips writing current_session/expiry/viewer_url in non-TTY; correct fix for the race condition root cause
internal/cmd/agents.go Mirrors sessions.go fix: skips writing current_agent file in non-TTY mode
internal/cmd/confirm_test.go Good coverage of non-TTY defaults and --yes override for all three confirm functions; defines withNonInteractiveStdin helper
internal/cmd/sessions_test.go Adds two targeted regression tests for the core bug fix; correctly adds withInteractiveStdinForTest to existing tests that test the interactive contract
internal/cmd/agents_test.go Adds withInteractiveStdinForTest to TestAgentsStart_SetsCurrentAgent to correctly model the interactive-only write behavior

Fix All in Claude Code

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
internal/cmd/confirm.go:114-122
**Explicit `stop` commands silently no-op with exit 0 in non-TTY**

The non-TTY guard added to `ConfirmStop` means `notte sessions stop <id>` and `notte agents stop <id>` called from a CI script without `--yes` now print "Cancelled." and return exit 0 without stopping anything. Before this PR, EOF on stdin caused the `[Y/n]` default to resolve to `true`, so explicit stops worked as expected in scripts.

The `confirmReplaceSession`/`confirmReplaceAgent` guards are correct — those are the functions on the start-command path that caused sibling sessions to be killed. But `ConfirmStop` is called from the explicit `stop` subcommand, where the user has already identified the target ID. A CI cleanup step like `notte sessions stop $SESSION_ID` silently leaks the session, and a caller checking only exit code gets no signal that the stop was skipped.

### Issue 2 of 2
internal/cmd/confirm_test.go:185-186
**Duplicate non-TTY helper with inconsistent name**

`withNonInteractiveStdin` here and `withNonInteractiveStdinForTest` in `sessions_test.go` are identical functions in the same package. Consolidating to one name avoids confusion about which helper to use when adding future tests.

```suggestion
// withNonInteractiveStdinForTest temporarily overrides isInteractiveStdin to report false.
func withNonInteractiveStdinForTest(t *testing.T) {
```

Reviews (2): Last reviewed commit: "fix: don't kill sibling sessions when se..." | Re-trigger Greptile

Comment thread internal/cmd/confirm.go
Comment on lines 114 to 122
func ConfirmStop(resource, id string) (bool, error) {
if skipConfirmation {
return true, nil
}
if !isInteractiveStdin() {
return false, nil
}
return ConfirmStopWithIO(os.Stdin, os.Stderr, resource, id)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Explicit stop commands silently no-op with exit 0 in non-TTY

The non-TTY guard added to ConfirmStop means notte sessions stop <id> and notte agents stop <id> called from a CI script without --yes now print "Cancelled." and return exit 0 without stopping anything. Before this PR, EOF on stdin caused the [Y/n] default to resolve to true, so explicit stops worked as expected in scripts.

The confirmReplaceSession/confirmReplaceAgent guards are correct — those are the functions on the start-command path that caused sibling sessions to be killed. But ConfirmStop is called from the explicit stop subcommand, where the user has already identified the target ID. A CI cleanup step like notte sessions stop $SESSION_ID silently leaks the session, and a caller checking only exit code gets no signal that the stop was skipped.

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/cmd/confirm.go
Line: 114-122

Comment:
**Explicit `stop` commands silently no-op with exit 0 in non-TTY**

The non-TTY guard added to `ConfirmStop` means `notte sessions stop <id>` and `notte agents stop <id>` called from a CI script without `--yes` now print "Cancelled." and return exit 0 without stopping anything. Before this PR, EOF on stdin caused the `[Y/n]` default to resolve to `true`, so explicit stops worked as expected in scripts.

The `confirmReplaceSession`/`confirmReplaceAgent` guards are correct — those are the functions on the start-command path that caused sibling sessions to be killed. But `ConfirmStop` is called from the explicit `stop` subcommand, where the user has already identified the target ID. A CI cleanup step like `notte sessions stop $SESSION_ID` silently leaks the session, and a caller checking only exit code gets no signal that the stop was skipped.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant