Skip to content

fix(session): prevent resume data loss and add /resume slash command#3225

Closed
Mibayy wants to merge 1 commit intoNousResearch:mainfrom
Mibayy:fix/session-resume-data-loss-3123
Closed

fix(session): prevent resume data loss and add /resume slash command#3225
Mibayy wants to merge 1 commit intoNousResearch:mainfrom
Mibayy:fix/session-resume-data-loss-3123

Conversation

@Mibayy
Copy link
Copy Markdown
Contributor

@Mibayy Mibayy commented Mar 26, 2026

Summary

Closes #3123

Three bugs in the session resume flow, all cascading from the same root cause.


Bug 1 — Sessions recorded with zero messages (root cause)

AIAgent.__init__ calls create_session() for the session ID. In the CLI flow, __init__ already called create_session() earlier, so the INSERT hits a UNIQUE constraint. The exception handler did:

self._session_db = None  # prevent silent data loss on every subsequent flush

This is self-defeating: it caused data loss by making every subsequent append_message() call silently no-op. The session row existed in the DB with zero messages. --resume found the row and printed "has no messages. Starting fresh.".

Fix: Change INSERTINSERT OR IGNORE in SessionDB.create_session(). Re-registering an existing session is now a safe no-op. Also remove self._session_db = None — it made the situation worse, not better.


Bug 2 — --resume overwrites session JSON with truncated history (data loss)

This is what the new comment reports. Sequence:

  1. Session X has 200 messages in JSONL + session_X.json, but 0 in SQLite (due to bug 1)
  2. User runs hermes --resume X
  3. _init_agent loads 0 messages from SQLite, sets conversation_history = []
  4. User sends a message → agent runs → _save_session_log([user, assistant]) writes a 2-message JSON file → overwrites the 200-message session_X.json
  5. All history is gone

Fix: _save_session_log now reads the existing file before writing. If the file already has more messages than the current batch, the write is skipped. The full history is preserved until the in-memory messages list catches up through normal turns.


Bug 3 — /resume is an unknown command

The commands registry (hermes_cli/commands.py) lists resume without cli_only=True, implying CLI availability — but process_command() had no handler for it. Typing /resume <id> produced "Unknown command: /resume".

Fix: Add elif canonical == "resume": to process_command(). The handler:

  • Resolves the target by title or session ID (reuses _resolve_session_by_name_or_id)
  • Ends the current session cleanly in the DB
  • Switches self.session_id and loads conversation_history from SQLite
  • Re-opens the target session (clears ended_at)
  • Syncs the cached AIAgent instance if one is already initialised (session_id, _last_flushed_db_idx, system prompt cache)

Files changed

File Change
hermes_state.py INSERT OR IGNORE in create_session
run_agent.py Remove self._session_db = None; add truncation guard in _save_session_log
cli.py Add /resume handler in process_command

Tests

1661 gateway + session + run_agent tests pass, 21 skipped, 0 failures.

Three related bugs all rooted in the same session registration flow.

Bug 1 — session written to DB with zero messages (NousResearch#3123 root cause):
  AIAgent.__init__ calls create_session() for the session_id.  In the
  CLI flow, the CLI already called create_session() earlier, so the
  INSERT hits a UNIQUE constraint.  The exception handler set
  self._session_db = None, silently dropping every subsequent
  append_message call.  The session row existed in the DB but had no
  messages, so --resume found it and printed 'has no messages'.

  Fix: change INSERT to INSERT OR IGNORE in SessionDB.create_session().
  Re-registering an existing session is now a safe no-op.  Also remove
  the self._session_db = None fallback — it caused more harm than the
  original failure it was guarding against.

Bug 2 — --resume overwrites session JSON with truncated history:
  When --resume is used on a session with zero SQLite messages (due to
  bug 1), the CLI loads empty history.  The first turn then calls
  _save_session_log(messages=[user+assistant]) which atomically writes
  a 2-message session JSON file — overwriting the pre-existing file
  that had the full conversation history.  The user loses all data.

  Fix: _save_session_log now reads the existing file before writing.
  If the file has more messages than the current batch, the write is
  skipped.  The full history is preserved until the in-memory messages
  list catches up.

Bug 3 — /resume is an unknown command:
  The commands registry listed 'resume' but process_command() had no
  handler for it, so /resume <id> produced 'Unknown command'.

  Fix: add elif canonical == 'resume': to process_command().  The
  handler resolves the target by title or ID, ends the current session,
  switches session_id + conversation_history, re-opens the target
  session in the DB, and syncs the cached agent if one exists.

Closes NousResearch#3123
teknium1 added a commit that referenced this pull request Mar 27, 2026
…reopen_session API

Three improvements salvaged from PR #3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes #3123.
teknium1 added a commit that referenced this pull request Mar 27, 2026
…reopen_session API (#3315)

Three improvements salvaged from PR #3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes #3123.
@teknium1
Copy link
Copy Markdown
Contributor

Merged via PR #3315. Salvaged the session log truncation guard, the /resume CLI handler (with a proper reopen_session() API instead of raw SQL), and added the missing SessionDB method. Bug 1 (INSERT OR IGNORE) was already fixed on main so that piece was skipped. Thanks for the thorough analysis of the cascading failures!

@teknium1 teknium1 closed this Mar 27, 2026
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
…reopen_session API (NousResearch#3315)

Three improvements salvaged from PR NousResearch#3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes NousResearch#3123.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…reopen_session API (NousResearch#3315)

Three improvements salvaged from PR NousResearch#3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes NousResearch#3123.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Session --resume not working

2 participants