
Fix operator session recovery: timeout, attempt persistence, interrupt logging #12

Open
CMander02 wants to merge 1 commit into main from ziyan-todo-3

@CMander02
Collaborator

Summary

  • Fix _looks_like_resume_failure operator precedence bug (and/or without parens)
  • Add stage_timeout (default 4h) with threading.Timer to prevent infinite hangs
  • Persist attempt_no across resumes via operator_state/<stage>.attempt_count.txt
  • Log KeyboardInterrupt events to logs_raw with elapsed time
  • Add --stage-timeout CLI parameter
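The precedence issue in the first bullet can be reproduced in isolation. This is a minimal sketch, not the actual body of `_looks_like_resume_failure`, which is not shown in this description; the `exit_code` parameter and the two marker strings are hypothetical stand-ins. In Python, `and` binds tighter than `or`, so an unparenthesized mix groups differently from what the author likely intended.

```python
def looks_like_resume_failure_buggy(exit_code: int, output: str) -> bool:
    # Hypothetical conditions. Without parentheses Python parses this as:
    #   (exit_code != 0 and "session not found" in output) or ("cannot resume" in output)
    # so the second marker matches even when the process exited cleanly.
    return exit_code != 0 and "session not found" in output or "cannot resume" in output


def looks_like_resume_failure_fixed(exit_code: int, output: str) -> bool:
    # Explicit parentheses: a non-zero exit code is required for either marker.
    return exit_code != 0 and (
        "session not found" in output or "cannot resume" in output
    )
```

The two versions only disagree when the exit code is zero and the second marker appears in the output, which is why this kind of bug can go unnoticed until a specific run hits that combination.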

Context

Example run 20260330_101222 showed Stage 05 stuck in a loop of 7+ resumes spanning two days. Root causes: no stage timeout, an attempt counter that reset on every resume, and a logic bug in resume failure detection.
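As a rough illustration of the timeout fix, the sketch below bounds a stage with `threading.Timer`. It assumes the stage runs as a child process and that the timeout is expressed in seconds; `run_with_stage_timeout` and `DEFAULT_STAGE_TIMEOUT_S` are hypothetical names, and the actual wiring in this PR may differ.

```python
import subprocess
import threading

# Assumption: the 4h default from the summary, expressed in seconds.
DEFAULT_STAGE_TIMEOUT_S = 4 * 60 * 60


def run_with_stage_timeout(cmd, stage_timeout=DEFAULT_STAGE_TIMEOUT_S):
    """Run cmd and kill it if it outlives stage_timeout seconds.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    timed_out = threading.Event()

    def _on_timeout():
        # Runs on the timer thread once the deadline passes.
        timed_out.set()
        proc.kill()

    timer = threading.Timer(stage_timeout, _on_timeout)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    try:
        returncode = proc.wait()
    finally:
        # Cancel on every exit path so a finished stage never gets killed late.
        timer.cancel()
    return returncode, timed_out.is_set()
```

Cancelling the timer in `finally` matters: without it, a stage that finishes or is interrupted early would leave a live timer behind.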

Test plan

  • 13 new tests in tests/test_operator_recovery.py — all passing
  • Manual: --fake-operator run → abort → resume → verify attempt_no continues
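The behavior checked by the manual step (attempt_no continuing after abort and resume) can be sketched as a small file-backed counter. The path layout `operator_state/<stage>.attempt_count.txt` comes from the summary; the `run_dir` parameter and the function names are assumptions made for this example.

```python
from pathlib import Path


def attempt_count_path(run_dir: Path, stage: str) -> Path:
    # Layout taken from the PR summary: operator_state/<stage>.attempt_count.txt
    return Path(run_dir) / "operator_state" / f"{stage}.attempt_count.txt"


def load_attempt_no(run_dir: Path, stage: str) -> int:
    """Return the last persisted attempt number, or 0 if none is recorded."""
    try:
        return int(attempt_count_path(run_dir, stage).read_text().strip())
    except (FileNotFoundError, ValueError):
        # A missing or corrupt file is treated as "no attempts yet".
        return 0


def next_attempt_no(run_dir: Path, stage: str) -> int:
    """Increment the persisted counter and return the new attempt number."""
    attempt_no = load_attempt_no(run_dir, stage) + 1
    path = attempt_count_path(run_dir, stage)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{attempt_no}\n")
    return attempt_no
```

Because the counter lives on disk rather than in process memory, a resumed process picks up where the aborted one stopped, which is what lets a retry limit take effect across resumes.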

🤖 Generated with Claude Code

…t logging

- Add stage_timeout (default 4h) with threading.Timer to prevent infinite hangs
- Fix _looks_like_resume_failure operator precedence bug (and/or without parens)
- Log KeyboardInterrupt events to logs_raw with elapsed time
- Persist attempt_no across resumes via operator_state/<stage>.attempt_count.txt
- Add --stage-timeout CLI parameter
- Add .gitignore entries for docs/ and examples/
- 13 new tests covering all fixes
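For the interrupt-logging item, one possible shape is shown below. It is a sketch under assumptions: `logs_raw` is treated as a directory, and the file name `interrupts.jsonl`, the record fields, and the wrapper function are all hypothetical rather than taken from the change itself.

```python
import json
import time
from pathlib import Path


def run_stage_logging_interrupts(stage: str, stage_fn, logs_raw: Path):
    """Call stage_fn(); on Ctrl-C, append a record with elapsed time, then re-raise."""
    started = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    try:
        return stage_fn()
    except KeyboardInterrupt:
        elapsed_s = time.monotonic() - started
        logs_raw.mkdir(parents=True, exist_ok=True)
        record = {
            "event": "keyboard_interrupt",
            "stage": stage,
            "elapsed_s": round(elapsed_s, 3),
        }
        # Hypothetical file name; one JSON object per line.
        with (logs_raw / "interrupts.jsonl").open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        raise  # keep normal Ctrl-C semantics for the caller
```

Re-raising after logging keeps the abort visible to the caller while still leaving a trace of when and how long into the stage it happened.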

Addresses TODO 3 (V3-1 through V3-4) from vulnerability analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>