v0.8.66: Adopt Moraine as CodeWhale's memory backend (ingestion adapter + MCP recall) #3495

@Hmbown

Goal

Adopt Moraine (https://github.com/eric-tramel/moraine, Apache-2.0) as CodeWhale's long-term
agent-memory backend: Moraine ingests CodeWhale's persisted sessions losslessly and exposes them as
searchable MCP recall tools (search, search_conversations, list_sessions, get_session,
open). This supersedes the in-repo push/inject memory path (crates/tui/src/memory.rs, which
prepends a <user_memory> block into the system prompt) in favor of pull/recall via Moraine MCP.

Local-first, no prompt injection (MCP is read-only retrieval). CodeWhale keeps persisting sessions;
Moraine indexes them. Supersedes community memory PRs #3381 (@pkeging, memory tags) and
#2933 (@cy2311, hippocampal v2 — glossary/namespaces/rollback/auto-inject) — their ideas inform
the recall layer but the implementation is Moraine-backed, not in-repo.

This is scoped for v0.8.66, not 0.8.65.

Research already done (design is concrete)

Moraine fork: Hmbown/moraine (upstream eric-tramel/moraine, on main @ v0.6.2).

Key finding — the session_json ingest format already fits CodeWhale

crates/moraine-ingest-core/src/dispatch.rs::process_session_json_file reads a whole-file JSON
snapshot and extracts session_doc["messages"], which is exactly CodeWhale's shape
(~/.codewhale/sessions/{id}.json: { schema_version, metadata, messages[], artifacts, system_prompt }).
It builds a synthetic session_meta record plus one session_message record per message, checkpoints by
message index (incremental ingest), then dispatches each record to the harness adapter's normalize().
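
To make the "checkpoint by message index" idea concrete, here is a minimal sketch. It is not Moraine's actual code: Checkpoint and pending_messages are hypothetical names, and the "snapshot shrank, so ignore it" branch is an assumption based on the guard mentioned under Known risks.

```rust
/// Hypothetical per-file checkpoint: how many messages of this session file
/// have already been ingested. Not Moraine's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Checkpoint {
    pub messages_ingested: usize,
}

/// Given the full `messages` array of a freshly re-read snapshot, return only
/// the messages that still need ingesting, plus the checkpoint to store next.
///
/// Assumption: if the snapshot has fewer messages than the checkpoint
/// ("shrank"), nothing is ingested and the checkpoint is left unchanged.
pub fn pending_messages<'a, T>(
    messages: &'a [T],
    checkpoint: Checkpoint,
) -> (&'a [T], Checkpoint) {
    if messages.len() < checkpoint.messages_ingested {
        // Snapshot shrank relative to what was already ingested: ignore it.
        return (&messages[..0], checkpoint);
    }
    // Everything from the checkpoint index onward is new.
    let pending = &messages[checkpoint.messages_ingested..];
    let next = Checkpoint {
        messages_ingested: messages.len(),
    };
    (pending, next)
}
```

Because the checkpoint is per file, a new session with fewer messages than an older session would start from its own zero checkpoint in this sketch; whether the real guard behaves the same way is what the risk item below asks to confirm.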

The fork is already half-wired

crates/moraine-config/src/lib.rs::default_sources() already contains the CodeWhale entry:
harness: "codewhale", glob: "~/.codewhale/sessions/*.json", format: SOURCE_FORMAT_SESSION_JSON.

Work items (Moraine fork)

  1. Add adapter crates/moraine-ingest-core/src/sources/codewhale.rs implementing IngestSource
    (harness()="codewhale", format()=SessionJson), mirroring the Hermes session handlers
    (normalize_hermes_session_meta / normalize_hermes_session_message at hermes.rs:791/891).
    Handle CodeWhale content-part types text / thinking / tool_use / tool_result; roles
    user / assistant.
  2. Register in sources/mod.rs: pub(crate) mod codewhale; + .register(&codewhale::CODEWHALE)
    in registry(), and add "codewhale" to moraine-config::KNOWN_INGEST_HARNESSES
    (the registry_matches_config_known_harnesses test enforces these agree).
  3. Generalize the synthetic builders build_session_meta_record/build_session_message_record
    (dispatch.rs:883/950) to read CodeWhale's nested identity with Hermes-safe fallbacks:
    session_id ← session_doc.session_id OR metadata.id; model ← session_doc.model OR metadata.model;
    session_start ← session_start OR metadata.created_at; last_updated ← last_updated OR metadata.updated_at.
    Hermes files have no metadata object, so Hermes ingestion is unaffected. compose_hermes_model is a
    no-op for CodeWhale (no base_url).
  4. Golden fixture + test mirroring tests/hermes_session_fixture.rs, using a redacted real
    CodeWhale session snapshot.
  5. Upstream the adapter to eric-tramel/moraine (PR) so it ships in a Moraine release; keep the
    fork in sync.
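
The fallback rules in item 3 can be sketched as below. This is only an illustration under stated assumptions: SessionDoc, Metadata, SessionIdentity, and resolve_identity are hypothetical names, the real builders work on JSON values rather than typed structs, and timestamps are kept as opaque strings here.

```rust
/// Hypothetical typed view of CodeWhale's nested `metadata` object.
#[derive(Debug, Default, Clone)]
pub struct Metadata {
    pub id: Option<String>,
    pub model: Option<String>,
    pub created_at: Option<String>,
    pub updated_at: Option<String>,
}

/// Hypothetical typed view of a session snapshot. Hermes files carry the
/// top-level fields and no `metadata`; CodeWhale files carry `metadata`.
#[derive(Debug, Default, Clone)]
pub struct SessionDoc {
    pub session_id: Option<String>,
    pub model: Option<String>,
    pub session_start: Option<String>,
    pub last_updated: Option<String>,
    pub metadata: Option<Metadata>,
}

/// Identity fields resolved for the synthetic session_meta record.
#[derive(Debug, PartialEq, Eq)]
pub struct SessionIdentity {
    pub session_id: Option<String>,
    pub model: Option<String>,
    pub session_start: Option<String>,
    pub last_updated: Option<String>,
}

/// Prefer the top-level value; fall back to the nested one if it is absent.
fn pick(top: &Option<String>, nested: Option<&Option<String>>) -> Option<String> {
    top.clone().or_else(|| nested.and_then(|v| v.clone()))
}

/// Apply the four fallbacks: top-level field first, then `metadata.*`.
pub fn resolve_identity(doc: &SessionDoc) -> SessionIdentity {
    let meta = doc.metadata.as_ref();
    SessionIdentity {
        session_id: pick(&doc.session_id, meta.map(|m| &m.id)),
        model: pick(&doc.model, meta.map(|m| &m.model)),
        session_start: pick(&doc.session_start, meta.map(|m| &m.created_at)),
        last_updated: pick(&doc.last_updated, meta.map(|m| &m.updated_at)),
    }
}
```

Since the top-level field always wins, a Hermes-shaped document resolves exactly as before, which is the "Hermes-safe" property item 3 asks for.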

Work items (CodeWhale side)

  1. Wire moraine-mcp as a recall tool source (stdio moraine mcp server) via CodeWhale's MCP
    config so agents gain the recall tools.
  2. Deprecate the memory.rs push/inject path behind a config gate (call sites
    crates/tui/src/core/engine.rs:801 and :2996 via crate::memory::compose_block; rendered in
    prompts.rs:1209; already off by default via MemoryConfig.memory_enabled=false) in favor of
    Moraine pull/recall. Update docs and config comments.
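
A rough sketch of what the config gate in item 2 amounts to. Only the MemoryConfig.memory_enabled flag, its false default, and the <user_memory> block come from this issue; the struct layout and compose_system_prompt are hypothetical and are not the actual compose_block signature.

```rust
/// Assumed shape of the gate: a single flag that defaults to `false`
/// (bool's Default), matching "already off by default".
#[derive(Debug, Clone, Copy, Default)]
pub struct MemoryConfig {
    pub memory_enabled: bool,
}

/// Hypothetical stand-in for the push/inject path: prepend a <user_memory>
/// block to the system prompt only when the gate is on and memory is
/// non-empty; otherwise return the base prompt untouched.
pub fn compose_system_prompt(base: &str, memory: Option<&str>, cfg: MemoryConfig) -> String {
    match memory {
        Some(text) if cfg.memory_enabled && !text.trim().is_empty() => {
            format!("<user_memory>\n{}\n</user_memory>\n\n{}", text.trim(), base)
        }
        // Gate off (the default) or nothing to inject: no prompt mutation.
        _ => base.to_string(),
    }
}
```

With the gate off, the prompt is never modified and recall happens only when the agent calls a Moraine MCP tool.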

Validation

moraine up ingests ~/.codewhale/sessions; MCP list_sessions/search return CodeWhale sessions
with correct session id / model / timestamps; end-to-end recall in a CodeWhale agent turn.

Known risks / gaps

  • CodeWhale session files are whole-file snapshots rewritten on every save; verify the session_json
    "shrank → ignore" guard (dispatch.rs:758) doesn't drop data when a new session has fewer messages
    than a prior one (different file, so likely fine — confirm).
  • CodeWhale messages carry no per-message timestamps (only session created_at/updated_at);
    events need synthetic monotonic ts from session start + message index (like hermes_event_dt).
  • tool_result content-part exact shape needs confirmation (Anthropic-style
    {type, tool_use_id, content}) — handle defensively.
  • Cost/tokens live in metadata.cost / metadata.total_tokens — map into Moraine token accounting if
    desired (non-blocking for recall).
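
The synthetic-timestamp item above could look roughly like this. It is a sketch, not a description of hermes_event_dt: the function names, the epoch-millisecond representation, and the 1 ms step per message are all assumptions chosen for illustration.

```rust
/// Assumed spacing between consecutive synthetic events, in milliseconds.
pub const STEP_MS: i64 = 1;

/// Derive an event timestamp (epoch milliseconds) from the session start and
/// the message's index, so later messages always sort after earlier ones.
pub fn synthetic_event_ts_ms(session_start_ms: i64, message_index: usize) -> i64 {
    // Saturating arithmetic avoids overflow panics on absurd inputs.
    let offset = (message_index as i64).saturating_mul(STEP_MS);
    session_start_ms.saturating_add(offset)
}

/// Timestamps for every message in a session with `message_count` messages.
pub fn synthetic_timeline(session_start_ms: i64, message_count: usize) -> Vec<i64> {
    (0..message_count)
        .map(|i| synthetic_event_ts_ms(session_start_ms, i))
        .collect()
}
```

Because the value depends only on session start and index, re-ingesting the same snapshot yields the same timestamps, which keeps incremental ingest stable.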

Metadata

Assignees

No one assigned

    Labels

    documentation (Improvements or additions to documentation)
    enhancement (New feature or request)
    external-memory (External memory, context substrate, and long-running agent state)
    v0.8.66 (Targeting v0.8.66)

    Projects

    Status
    Backlog
