Agent tracer missing context for effective debugging #45

@G9000

Problem

While debugging agent memory issues (#41-#44), we found that the tracer lacked critical context that would have sped up diagnosis significantly.

Current state

The tracer captures message history (with previews), tool call arguments and return values, per-step timing, error flags, and a done event with model/provider info. This is a good foundation.

Missing context

1. Tool schemas not captured

The trace lists the allowedTools names but not their JSON schemas. When diagnosing #42 (GPT-4o skipping the thinking kwarg), we couldn't tell from the trace alone whether thinking was actually marked required in the schema.

Proposed: Add a tool_schemas event on the first step with the full JSON schema for each tool.
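
A minimal sketch of what such an event could look like. Everything here is hypothetical: the `ToolDefinition` and `ToolSchemasEvent` shapes, the function names, and the assumption that the first step is numbered 0 are not taken from the current tracer.

```typescript
// Hypothetical shapes; none of these names come from the existing tracer.
type JsonSchema = Record<string, unknown>;

interface ToolDefinition {
  name: string;
  parameters: JsonSchema; // JSON schema for the tool's arguments
}

interface ToolSchemasEvent {
  type: "tool_schemas";
  step: number;
  tools: ToolDefinition[];
}

// Emit the event only on the first step (assumed here to be step 0).
function buildToolSchemasEvent(step: number, tools: ToolDefinition[]): ToolSchemasEvent | null {
  if (step !== 0) return null;
  // Deep-copy via JSON so later changes to the tool registry cannot alter the trace.
  const copy = JSON.parse(JSON.stringify(tools)) as ToolDefinition[];
  return { type: "tool_schemas", step, tools: copy };
}

// Answers the #42-style question from the trace alone: is `arg` marked required?
function isArgRequired(event: ToolSchemasEvent, toolName: string, arg: string): boolean {
  return event.tools.some((tool) => {
    const required = tool.parameters["required"];
    return tool.name === toolName && Array.isArray(required) && required.indexOf(arg) !== -1;
  });
}
```

With the schema in the trace, a check like `isArgRequired(event, "save_to_memory", "thinking")` could be run offline against a saved trace instead of against the live tool registry.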

2. TTFT always null

ttftMs is null on every step across all traces examined. The streaming adapter likely isn't emitting the first-token timestamp.

Proposed: Fix TTFT capture in the streaming adapter.
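
One possible shape for the fix, as a sketch only. `createTtftRecorder` and its methods are hypothetical names, and the sketch assumes the streaming adapter has a hook where each text chunk arrives. The clock is injectable so the logic can be tested without real timing.

```typescript
// Hypothetical recorder; the real adapter's hooks may look different.
interface TtftRecorder {
  start(): void;                // call when the request is sent
  onChunk(text: string): void;  // call for every streamed chunk
  ttftMs(): number | null;      // null until a non-empty chunk has arrived
}

function createTtftRecorder(now: () => number = () => Date.now()): TtftRecorder {
  let startedAt: number | null = null;
  let firstTokenAt: number | null = null;
  return {
    start() {
      startedAt = now();
      firstTokenAt = null;
    },
    onChunk(text) {
      // Only the first non-empty chunk counts as the first token.
      if (startedAt !== null && firstTokenAt === null && text.length > 0) {
        firstTokenAt = now();
      }
    },
    ttftMs() {
      return startedAt !== null && firstTokenAt !== null ? firstTokenAt - startedAt : null;
    },
  };
}
```

Ignoring empty chunks is a design choice in this sketch: some providers may send an empty or metadata-only chunk first, which would otherwise produce a misleadingly low TTFT.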

3. No memory state snapshot

There's no visibility into what the agent's core memory (persona, human blocks) looks like at the start of a turn. We had to infer that human memory was empty from recall_memory returning nothing.

Proposed: Add a memory_state event at turn start with a snapshot of core memory blocks.
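
A rough sketch of the event, assuming core memory can be read as a list of labeled text blocks. The `CoreMemoryBlock` shape, the `isEmpty` flag, and the preview length are assumptions for illustration, not the existing data model.

```typescript
// Hypothetical shapes for illustration only.
interface CoreMemoryBlock {
  label: string; // e.g. "persona" or "human"
  value: string;
}

interface MemoryBlockSnapshot {
  label: string;
  chars: number;    // full length of the block
  isEmpty: boolean; // true when the block has no non-whitespace content
  preview: string;  // truncated copy, in line with the message previews
}

interface MemoryStateEvent {
  type: "memory_state";
  turnId: string;
  blocks: MemoryBlockSnapshot[];
}

function buildMemoryStateEvent(
  turnId: string,
  blocks: CoreMemoryBlock[],
  previewChars = 200,
): MemoryStateEvent {
  return {
    type: "memory_state",
    turnId,
    blocks: blocks.map((block) => ({
      label: block.label,
      chars: block.value.length,
      isEmpty: block.value.trim().length === 0,
      preview: block.value.slice(0, previewChars),
    })),
  };
}
```

An explicit `isEmpty` flag would have made the empty human block visible directly in the trace, rather than something to infer from recall_memory output.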

4. Tool success/failure not semantically tagged

save_to_memory returned isError: false with message "Could not promote note — not found or duplicate". The tracer has no way to distinguish tool-level success from logical failure, making automated failure detection impossible.

Proposed: Add a tool_succeeded field derived from response content, or require tools to return structured success/failure signals.
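
Both options can be combined, roughly as below. This is a sketch under assumptions: the optional `ok` field is a hypothetical structured signal that tools do not return today, and the failure patterns are illustrative guesses that would need tuning against real tool output.

```typescript
// `isError` and `message` mirror what the trace shows today;
// `ok` is a hypothetical structured signal a tool could opt into.
interface ToolResult {
  isError: boolean;
  message: string;
  ok?: boolean;
}

type SuccessSource = "transport" | "structured" | "heuristic";

interface ToolOutcome {
  tool_succeeded: boolean;
  source: SuccessSource; // records how the verdict was reached
}

// Illustrative patterns only; a text heuristic like this is inherently fragile.
const FAILURE_PATTERNS: RegExp[] = [/\bcould not\b/i, /\bnot found\b/i, /\bfailed\b/i];

function classifyToolResult(result: ToolResult): ToolOutcome {
  // 1. A hard error always means failure.
  if (result.isError) return { tool_succeeded: false, source: "transport" };
  // 2. Prefer the structured signal when the tool provides one.
  if (typeof result.ok === "boolean") return { tool_succeeded: result.ok, source: "structured" };
  // 3. Fall back to scanning the response text.
  const looksFailed = FAILURE_PATTERNS.some((pattern) => pattern.test(result.message));
  return { tool_succeeded: !looksFailed, source: "heuristic" };
}
```

Recording `source` alongside the verdict lets automated failure detection treat heuristic results with less confidence than structured ones.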

5. Reasoning not captured

reasoningCaptured: false on every step. We can't see why the model chose save_to_memory over update_human_memory three times in a row.

Proposed: Capture reasoning/thinking content when available from the model response.
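
As a sketch, assuming the adapter already normalizes provider responses into typed parts. The `ResponsePart` union is a hypothetical internal format, not any provider's actual API, and `captureReasoning` is an illustrative name.

```typescript
// Hypothetical normalized response format; real provider payloads differ.
type ResponsePart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool_call"; name: string };

interface ReasoningTrace {
  reasoningCaptured: boolean;
  reasoning: string | null;
}

function captureReasoning(parts: ResponsePart[]): ReasoningTrace {
  const chunks: string[] = [];
  for (const part of parts) {
    // Keep only non-empty reasoning parts, in their original order.
    if (part.type === "reasoning" && part.text.length > 0) {
      chunks.push(part.text);
    }
  }
  return chunks.length > 0
    ? { reasoningCaptured: true, reasoning: chunks.join("\n") }
    : { reasoningCaptured: false, reasoning: null };
}
```

With this shape, `reasoningCaptured: false` would mean the model returned no reasoning content for that step, rather than the tracer never looking for it.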

6. No search internals for recall_memory

When recall_memory returns empty, we don't know which search paths were attempted (hybrid/embedding, keyword, episode scan) or if any threw silent exceptions.

Proposed: Include search path details in the tool return or as a separate diagnostic event.
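
A sketch of the diagnostic side, with loud assumptions: the searcher interface, the path names as string literals, and the `PathDiagnostic` shape are all hypothetical, and the searchers are synchronous here for brevity even though the real ones are probably async.

```typescript
// Hypothetical types; the path names follow the list above.
type SearchPath = "hybrid" | "keyword" | "episode_scan";

interface Searcher {
  path: SearchPath;
  run: (query: string) => string[];
}

interface PathDiagnostic {
  path: SearchPath;
  hits: number;
  durationMs: number;
  error: string | null; // populated instead of swallowing the exception
}

interface RecallResult {
  results: string[];
  diagnostics: PathDiagnostic[];
}

function recallWithDiagnostics(query: string, searchers: Searcher[]): RecallResult {
  const results: string[] = [];
  const diagnostics: PathDiagnostic[] = [];
  for (const { path, run } of searchers) {
    const startedAt = Date.now();
    try {
      const hits = run(query);
      results.push(...hits);
      diagnostics.push({ path, hits: hits.length, durationMs: Date.now() - startedAt, error: null });
    } catch (err) {
      // A failing path is recorded and the remaining paths still run.
      const message = err instanceof Error ? err.message : String(err);
      diagnostics.push({ path, hits: 0, durationMs: Date.now() - startedAt, error: message });
    }
  }
  return { results, diagnostics };
}
```

An empty `results` array then comes with a per-path record, so "nothing matched" can be told apart from "a path threw and was skipped".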
