Distinguish human-typed prompts from tool results in Claude history #68

Open
tony wants to merge 4 commits into master from agentgrep-human-typed-prompts
@tony tony commented Jun 14, 2026


Summary

  • Fix the Claude Code adapter so prompt search no longer treats tool results and subagent output as human-typed prompts. Claude records tool_result blocks, <local-command-stdout>, the [Request interrupted] marker, and subagent transcripts as type=user messages, so a user-role turn is not reliably a typed prompt.
  • Add claude_event_is_human_authored(), which classifies a Claude project JSONL event by structure (tool_result/tool_use blocks, the toolUseResult marker, isSidechain, non-user event types, and machine-authored string content).
  • Thread the verdict through parse_claude_project_file into build_search_record(human_typed=...), which sets metadata["human_typed"]=False only for non-human turns.
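The structural checks listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the event layout (`message.content`), the prefix list in `MACHINE_PREFIXES`, and the helper `_is_machine_text` are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any

# Assumed prefixes of machine-authored string content; the real list may differ.
MACHINE_PREFIXES = ("<local-command-stdout>", "[Request interrupted")
NON_HUMAN_BLOCK_TYPES = {"tool_result", "tool_use"}


def _is_machine_text(text: str) -> bool:
    """Hypothetical helper: True when text starts with a known machine marker."""
    return text.lstrip().startswith(MACHINE_PREFIXES)


def claude_event_is_human_authored(event: dict[str, Any]) -> bool:
    """Return True only when the event looks like a prompt the user typed."""
    # Non-user event types (assistant, system, ...) are never typed prompts.
    if event.get("type") != "user":
        return False
    # Subagent transcripts and tool-result carriers are recorded as user turns.
    if event.get("isSidechain") or "toolUseResult" in event:
        return False
    message = event.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if isinstance(content, str):
        return not _is_machine_text(content)
    if isinstance(content, list):
        saw_text = False
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") in NON_HUMAN_BLOCK_TYPES:
                return False
            if block.get("type") == "text":
                if _is_machine_text(str(block.get("text", ""))):
                    return False
                saw_text = True
        # A block list with no text blocks carries nothing the user typed.
        return saw_text
    return False
```

Classifying by structure first and by string prefix only as a last step keeps the check independent of what the tool output happens to contain.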

This is additive: a normal record's metadata stays empty, so existing consumers and tests are unchanged — the new signal is opt-in for anything (search ranking, analytics, the in-progress insights work) that wants to filter to the user's actual asks.
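To illustrate the additive behavior, here is a small sketch under stated assumptions: the `SearchRecord` shape, the simplified `build_search_record` signature, and the `typed_prompts` consumer are hypothetical stand-ins, and only the `human_typed` keyword and the `metadata["human_typed"]=False` tag come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SearchRecord:
    """Hypothetical stand-in for the project's record type."""

    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_search_record(text: str, *, human_typed: bool = True) -> SearchRecord:
    """Tag only non-human turns so ordinary records keep empty metadata."""
    record = SearchRecord(text=text)
    if not human_typed:
        record.metadata["human_typed"] = False
    return record


def typed_prompts(records: list[SearchRecord]) -> list[SearchRecord]:
    """Opt-in consumer: a missing tag is read as 'typed by the user'."""
    return [r for r in records if r.metadata.get("human_typed", True)]


records = [
    build_search_record("rename the helper"),
    build_search_record("On branch master ...", human_typed=False),
]
print([r.text for r in typed_prompts(records)])
```

Because the tag is written only in the negative case, a consumer that never looks at `metadata` sees the same records as before.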

On a real history sample, 397 of 436 user-role records (about 91%) were tool or sidechain noise rather than typed prompts; the tag makes the two separable.

Test plan

  • tests/test_claude_human_typed.py — parametrized cases for string prompts, text-block prompts, tool_result/tool_use blocks, isSidechain, assistant events, <local-command-stdout>, and the list-form [Request interrupted] marker, plus the additive-metadata behavior of build_search_record.
  • ruff check, ruff format, ty check clean.
  • Existing Claude parser tests pass (the change is additive).

tony added a commit that referenced this pull request Jun 14, 2026
why: PR #68 separated Claude's typed prompts from tool results, but the
same flattening happens in Codex (function_call_output) and Grok
(tool_use/tool_result), and the human/tool tag was only reachable from
the graph engine. Tagging every adapter that carries inline tool-output
and exposing it as a `human:` query field lets `agentgrep
search`/`grep` and the MCP server filter the user's real asks from tool
noise.

what:
- Add codex_event_is_human_authored() and tag Codex response_item
  records; function_call/function_call_output/reasoning are non-human.
- Tag Grok tool_use/tool_result turns with metadata["human_typed"]=False.
- Register a record-layer `human` enum field (true/false) and dispatch
  it in the query compiler: human:true keeps untagged turns, human:false
  selects tool/agent output.
- Cover the Codex detector, Grok tagging, and the human: predicate.
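A rough sketch of the Codex detector described in this commit message. The `payload` key and the rule that only `role == "user"` messages count as human-authored are assumptions for illustration; the source confirms only that function_call, function_call_output, and reasoning items are non-human.

```python
from __future__ import annotations

from typing import Any

# Payload types the commit message names as non-human.
NON_HUMAN_PAYLOAD_TYPES = {"function_call", "function_call_output", "reasoning"}


def codex_event_is_human_authored(event: dict[str, Any]) -> bool:
    """Return True when a Codex response_item looks like a typed user message."""
    if event.get("type") != "response_item":
        return False
    payload = event.get("payload")  # assumed location of the item body
    if not isinstance(payload, dict):
        return False
    if payload.get("type") in NON_HUMAN_PAYLOAD_TYPES:
        return False
    # Assumption: a typed prompt is a message item carrying the user role.
    return payload.get("type") == "message" and payload.get("role") == "user"
```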
@tony tony force-pushed the agentgrep-human-typed-prompts branch from 025f037 to 22f79f3 Compare June 14, 2026 20:32
tony added a commit that referenced this pull request Jun 14, 2026
…ted transformers LLM defaults

why: The insights ladder ended at the L5 narrative summary over a single
gated Gemma model. Users had no way to see which prompts they repeat,
which past conversations resemble the one in front of them, or which
workflows are worth saving as Skills — and the only GPU summary path
required an HF token plus an accepted license. This adds the `graph`
enrichment level and makes the local-LLM backend work out of the box.

what:
- Add the `graph` level: a prompt/reply/conversation similarity network
  (sentence-transformers or model2vec embeddings, optional HDBSCAN
  archetype clustering, sqlite-vec or LanceDB IVF-PQ vector store) that
  surfaces recurring asks, forgotten-but-similar conversations, and mined
  workflows, persisted incrementally in a content-hash-keyed graph store.
- Draft reusable Skills (SKILL.md) from mined workflows, print-by-default,
  also exposed over an `insights_skills` MCP tool.
- Add a transformers/CUDA LLM backend with a non-gated default chain that
  needs no HF token: Phi-4-mini (4-bit, native phi3), SmolLM2-1.7B (fp16),
  Granite-3.3-2b (4-bit), tried in order until one loads. gemma-3-1b-it
  stays curated but gated and is no longer the default. 4-bit weights load
  through bitsandbytes NF4 behind the insights-llm-transformers-quant extra.
- Add optional conversation-summary vectors: each conversation is embedded
  by a cached LLM one-line summary instead of a prompt mean, sharpening
  forgotten-but-similar.
- Bundle the #68 human-typed prompt detection (human: query field,
  Claude/Codex authored-turn tagging) so the branch hand-tests
  self-contained; it overlaps with that PR and should be reconciled at
  merge time.
tony added a commit that referenced this pull request Jun 22, 2026
@tony tony force-pushed the agentgrep-human-typed-prompts branch from 22f79f3 to ca905ec Compare June 22, 2026 11:41
tony added a commit that referenced this pull request Jun 22, 2026
tony added 4 commits June 27, 2026 12:32
why: Claude Code records tool results and subagent output as type=user
messages, so prompt search/grep — and any consumer that treats a user-role
turn as a typed prompt — sees pasted command output, git-status dumps, and
subagent reports as if the user had typed them.

what:
- Add claude_event_is_human_authored(): classify a Claude project JSONL
  event by structure. tool_result/tool_use content blocks, the
  toolUseResult marker, isSidechain transcripts, non-user event types, and
  machine-authored string content (slash-command stdout/caveat, the
  interrupt marker) are not human-authored.
- Thread the verdict through parse_claude_project_file into
  build_search_record(human_typed=...), which records
  metadata["human_typed"]=False only for non-human turns. The change is
  additive: a normal record's metadata stays empty, so existing consumers
  and tests are unaffected.
why: PR #68 separated Claude's typed prompts from tool results, but the
same flattening happens in Codex (function_call_output) and Grok
(tool_use/tool_result), and the human/tool tag was only reachable from
the graph engine. Tagging every adapter that carries inline tool-output
and exposing it as a `human:` query field lets `agentgrep
search`/`grep` and the MCP server filter the user's real asks from tool
noise.

what:
- Add codex_event_is_human_authored() and tag Codex response_item
  records; function_call/function_call_output/reasoning are non-human.
- Tag Grok tool_use/tool_result turns with metadata["human_typed"]=False.
- Register a record-layer `human` enum field (true/false) and dispatch
  it in the query compiler: human:true keeps untagged turns, human:false
  selects tool/agent output.
- Cover the Codex detector, Grok tagging, and the human: predicate.
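The `human:` semantics described here (human:true keeps untagged turns, human:false selects tool/agent output) can be sketched as a small predicate factory. The name `compile_human_predicate` and the plain-dict metadata are assumptions for illustration, not the query compiler's real interface.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


def compile_human_predicate(value: str) -> Callable[[dict[str, Any]], bool]:
    """Build a metadata filter for a human:<value> query term."""
    if value not in ("true", "false"):
        raise ValueError(f"human: expects true or false, got {value!r}")
    want_human = value == "true"

    def predicate(metadata: dict[str, Any]) -> bool:
        # Untagged records count as human-typed, matching the additive tagging.
        return bool(metadata.get("human_typed", True)) == want_human

    return predicate


rows = [{}, {"human_typed": False}]
keep_human = compile_human_predicate("true")
keep_tools = compile_human_predicate("false")
print([keep_human(m) for m in rows], [keep_tools(m) for m in rows])
```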
…eyed agents

why: Only Claude, Codex (JSONL), and Grok distinguished user-typed prompts
from tool/assistant output flattened into the prompt stream. The remaining
adapters left every turn untagged, so the human: query field and any
downstream cleaning saw tool output as if the user had typed it. The Codex
*legacy* rollout path was a gap too: it parsed the same response_item shapes
as the JSONL path but never tagged them.

what:
- Add the shared candidate_is_human_typed() helper (role in USER_ROLES) and
  wire it into the role-keyed parsers: Gemini (chat + legacy), Cursor-CLI
  transcripts, Cursor-IDE state.vscdb, and Pi sessions.
- Tag OpenCode message rows by their joined role (non-user -> human_typed=False).
- Tag the Codex legacy rollout path with codex_event_is_human_authored, closing
  the gap vs the JSONL path.
- Antigravity (protobuf / prompt-only stores) carries no role marker to filter
  on, so it stays untagged by design.
- Cover the shared helper and the Cursor-CLI assistant path with tests.
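A minimal sketch of the shared role check, assuming a particular membership for `USER_ROLES` and a hypothetical `tag_metadata` wrapper; the source states only that the helper tests `role in USER_ROLES`.

```python
from __future__ import annotations

from typing import Any

# Assumed membership; the project's actual USER_ROLES may differ.
USER_ROLES = frozenset({"user", "human"})


def candidate_is_human_typed(role: str | None) -> bool:
    """True when the turn's role marks it as typed by the user."""
    return role in USER_ROLES


def tag_metadata(role: str | None) -> dict[str, Any]:
    """Hypothetical wrapper: produce the metadata a role-keyed parser would attach."""
    if role is None:
        # No role marker to filter on (the Antigravity case): stay untagged.
        return {}
    return {} if candidate_is_human_typed(role) else {"human_typed": False}
```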
why: The human: distinction was queryable from the CLI but not over MCP — the
search tool had no way to keep user-typed prompts or isolate tool output.

what:
- Add an optional human ("true"|"false") parameter to the MCP search tool and
  SearchRequestModel; "true" keeps user-typed turns, "false" keeps tool/agent
  output, omitted keeps both.
- Thread it through the page cursor so paginated human-filtered searches stay
  consistent.
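One way to keep a paginated, human-filtered search consistent is to carry the filter inside the page cursor, as sketched below. The `SearchRequest` fields, the base64-encoded JSON cursor, and both function names are assumptions for illustration; the PR's actual SearchRequestModel and cursor format are not shown in this thread.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical request shape; human=None means keep both kinds of turn."""

    query: str
    human: Optional[Literal["true", "false"]] = None
    offset: int = 0


def encode_cursor(request: SearchRequest, next_offset: int) -> str:
    """Pack the filter and the next offset into an opaque cursor string."""
    payload = {"query": request.query, "human": request.human, "offset": next_offset}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> SearchRequest:
    """Rebuild the request for the next page from the cursor alone."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return SearchRequest(
        query=payload["query"], human=payload["human"], offset=payload["offset"]
    )


first = SearchRequest(query="refactor", human="true")
second = decode_cursor(encode_cursor(first, next_offset=50))
print(second)
```

Since the second page is rebuilt from the cursor, a client cannot accidentally drop or flip the filter between pages.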
tony added a commit that referenced this pull request Jun 27, 2026
@tony tony force-pushed the agentgrep-human-typed-prompts branch from ca905ec to ea37d76 Compare June 27, 2026 18:47