Distinguish human-typed prompts from tool results in Claude history by tony · Pull Request #68 · tony/agentgrep

tony · 2026-06-14T13:06:48Z

Summary

- Fix the Claude Code adapter so prompt search no longer treats tool results and subagent output as human-typed prompts. Claude records tool_result blocks, <local-command-stdout>, the [Request interrupted] marker, and subagent transcripts as type=user messages, so a user-role turn is not reliably a typed prompt.
- Add claude_event_is_human_authored(), which classifies a Claude project JSONL event by structure (tool_result/tool_use blocks, the toolUseResult marker, isSidechain, non-user event types, and machine-authored string content).
- Thread the verdict through parse_claude_project_file into build_search_record(human_typed=...), which sets metadata["human_typed"]=False only for non-human turns.
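
The structural checks listed above could be sketched roughly as follows. This is not the PR's implementation: the event layout (a `message` dict holding `content`), the marker prefixes, and the helper `_is_machine_text` are assumptions made for illustration.

```python
from typing import Any

# Assumed prefixes for machine-authored string content; the real list may differ.
MACHINE_PREFIXES = ("<local-command-stdout>", "[Request interrupted")
NON_HUMAN_BLOCK_TYPES = {"tool_result", "tool_use"}


def _is_machine_text(text: str) -> bool:
    """Hypothetical helper: True for text that Claude Code generated itself."""
    return text.lstrip().startswith(MACHINE_PREFIXES)


def claude_event_is_human_authored(event: dict[str, Any]) -> bool:
    """Return True only when the event looks like a prompt the user typed."""
    if event.get("type") != "user":
        return False  # non-user event types
    if event.get("isSidechain") or "toolUseResult" in event:
        return False  # subagent transcript or tool-result marker
    message = event.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if isinstance(content, str):
        return bool(content.strip()) and not _is_machine_text(content)
    if isinstance(content, list):
        texts = []
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") in NON_HUMAN_BLOCK_TYPES:
                return False  # any tool block makes the whole turn non-human
            if block.get("type") == "text":
                texts.append(str(block.get("text", "")))
        return bool(texts) and not any(_is_machine_text(t) for t in texts)
    return False
```

Classifying by structure rather than by role is the point: every rejected case above still carries type=user in the history file.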

This is additive: a normal record's metadata stays empty, so existing consumers and tests are unchanged — the new signal is opt-in for anything (search ranking, analytics, the in-progress insights work) that wants to filter to the user's actual asks.
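
A minimal sketch of the additive behavior, assuming a simplified record type: `SearchRecord` and the one-argument `build_search_record` signature here are hypothetical stand-ins, and `is_human_typed` is an illustrative consumer-side helper, not a function from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class SearchRecord:  # hypothetical stand-in for the project's record type
    text: str
    metadata: dict = field(default_factory=dict)


def build_search_record(text: str, *, human_typed: bool = True) -> SearchRecord:
    """Simplified signature; the real function takes more arguments."""
    record = SearchRecord(text=text)
    if not human_typed:
        # Only the negative verdict is stored, so ordinary records keep empty metadata.
        record.metadata["human_typed"] = False
    return record


def is_human_typed(record: SearchRecord) -> bool:
    """Opt-in consumer check: a missing key counts as human-typed."""
    return record.metadata.get("human_typed", True) is not False


records = [
    build_search_record("why is the build failing?"),
    build_search_record("On branch master\nnothing to commit", human_typed=False),
]
typed_prompts = [r.text for r in records if is_human_typed(r)]
```

Because absence of the key means "human", consumers that never look at the tag see exactly the records they saw before.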

On a real history sample, 397 of 436 user-role records were tool/sidechain noise rather than typed prompts; the tag makes that separable.

Test plan

- tests/test_claude_human_typed.py: parametrized cases for string prompts, text-block prompts, tool_result/tool_use blocks, isSidechain, assistant events, <local-command-stdout>, and the list-form [Request interrupted] marker, plus the additive-metadata behavior of build_search_record.
- ruff check, ruff format, ty check clean.
- Existing Claude parser tests pass (the change is additive).

why: PR #68 separated Claude's typed prompts from tool results, but the same flattening happens in Codex (function_call_output) and Grok (tool_use/tool_result), and the human/tool tag was only reachable from the graph engine. Tagging every adapter that carries inline tool-output and exposing it as a `human:` query field lets `agentgrep search`/`grep` and the MCP server filter the user's real asks from tool noise.

what:
- Add codex_event_is_human_authored() and tag Codex response_item records; function_call/function_call_output/reasoning are non-human.
- Tag Grok tool_use/tool_result turns with metadata["human_typed"]=False.
- Register a record-layer `human` enum field (true/false) and dispatch it in the query compiler: human:true keeps untagged turns, human:false selects tool/agent output.
- Cover the Codex detector, Grok tagging, and the human: predicate.
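
As a rough illustration of the Codex detector and the `human:` predicate, the sketch below assumes the response_item payload is a flat mapping with `type` and `role` keys, and that a non-user role is also non-human. `compile_human_predicate` is a hypothetical name; the real query compiler is not shown in this PR text.

```python
from collections.abc import Callable, Mapping

# Item types the commit message names as non-human.
NON_HUMAN_ITEM_TYPES = frozenset({"function_call", "function_call_output", "reasoning"})


def codex_event_is_human_authored(payload: Mapping[str, object]) -> bool:
    """Assumed payload shape: {"type": ..., "role": ...}."""
    if payload.get("type") in NON_HUMAN_ITEM_TYPES:
        return False
    # Assumption: only user-role messages count as typed prompts.
    return payload.get("role") == "user"


def compile_human_predicate(value: str) -> Callable[[Mapping[str, object]], bool]:
    """Turn the enum value of a `human:` term into a check on record metadata."""
    if value not in ("true", "false"):
        raise ValueError(f"human: expects true or false, got {value!r}")
    want_human = value == "true"

    def predicate(metadata: Mapping[str, object]) -> bool:
        tagged_non_human = metadata.get("human_typed") is False
        # human:true keeps untagged turns; human:false keeps tagged ones.
        return tagged_non_human != want_human

    return predicate
```

With this shape, `compile_human_predicate("true")({})` accepts an untagged record and `compile_human_predicate("false")({"human_typed": False})` accepts tool output.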

…ted transformers LLM defaults

why: The insights ladder ended at the L5 narrative summary over a single gated Gemma model. Users had no way to see which prompts they repeat, which past conversations resemble the one in front of them, or which workflows are worth saving as Skills, and the only GPU summary path required an HF token plus an accepted license. This adds the `graph` enrichment level and makes the local-LLM backend work out of the box.

what:
- Add the `graph` level: a prompt/reply/conversation similarity network (sentence-transformers or model2vec embeddings, optional HDBSCAN archetype clustering, sqlite-vec or LanceDB IVF-PQ vector store) that surfaces recurring asks, forgotten-but-similar conversations, and mined workflows, persisted incrementally in a content-hash-keyed graph store.
- Draft reusable Skills (SKILL.md) from mined workflows, print-by-default, also exposed over an `insights_skills` MCP tool.
- Add a transformers/CUDA LLM backend with a non-gated default chain that needs no HF token: Phi-4-mini (4-bit, native phi3), SmolLM2-1.7B (fp16), Granite-3.3-2b (4-bit), tried in order until one loads. gemma-3-1b-it stays curated but gated and is no longer the default. 4-bit weights load through bitsandbytes NF4 behind the insights-llm-transformers-quant extra.
- Add optional conversation-summary vectors: each conversation is embedded by a cached LLM one-line summary instead of a prompt mean, sharpening forgotten-but-similar.
- Bundle the #68 human-typed prompt detection (human: query field, Claude/Codex authored-turn tagging) so the branch hand-tests self-contained; it overlaps with that PR and should be reconciled at merge time.
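
The "tried in order until one loads" chain could be sketched as below. `load_first_available` and the injected `loader` callable are hypothetical; the model labels are copied from the commit message and are not real repository IDs.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")

# Labels from the commit message, in the stated order (not real repository IDs).
DEFAULT_CHAIN = ("Phi-4-mini", "SmolLM2-1.7B", "Granite-3.3-2b")


def load_first_available(
    candidates: Sequence[str],
    loader: Callable[[str], T],
) -> tuple[str, T]:
    """Return (label, model) for the first candidate whose loader succeeds."""
    errors = []
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:  # any load failure moves on to the next candidate
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no model in the chain could be loaded: " + "; ".join(errors))
```

Injecting the loader keeps the sketch free of heavyweight dependencies; a real backend would pass a function that builds the model and tokenizer.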

why: Claude Code records tool results and subagent output as type=user messages, so prompt search/grep (and any consumer that treats a user-role turn as a typed prompt) sees pasted command output, git-status dumps, and subagent reports as if the user had typed them.

what:
- Add claude_event_is_human_authored(): classify a Claude project JSONL event by structure. tool_result/tool_use content blocks, the toolUseResult marker, isSidechain transcripts, non-user event types, and machine-authored string content (slash-command stdout/caveat, the interrupt marker) are not human-authored.
- Thread the verdict through parse_claude_project_file into build_search_record(human_typed=...), which records metadata["human_typed"]=False only for non-human turns. The change is additive: a normal record's metadata stays empty, so existing consumers and tests are unaffected.

…eyed agents

why: Only Claude, Codex (JSONL), and Grok distinguished user-typed prompts from tool/assistant output flattened into the prompt stream. The remaining adapters left every turn untagged, so the human: query field and any downstream cleaning saw tool output as if the user had typed it. The Codex *legacy* rollout path was a gap too: it parsed the same response_item shapes as the JSONL path but never tagged them.

what:
- Add the shared candidate_is_human_typed() helper (role in USER_ROLES) and wire it into the role-keyed parsers: Gemini (chat + legacy), Cursor-CLI transcripts, Cursor-IDE state.vscdb, and Pi sessions.
- Tag OpenCode message rows by their joined role (non-user -> human_typed=False).
- Tag the Codex legacy rollout path with codex_event_is_human_authored, closing the gap vs the JSONL path.
- Antigravity (protobuf / prompt-only stores) carries no role marker to filter on, so it stays untagged by design.
- Cover the shared helper and the Cursor-CLI assistant path with tests.
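
For the role-keyed adapters, the shared helper might be as small as the sketch below. The membership of `USER_ROLES`, the case and whitespace normalization, and the `role_metadata` helper are assumptions; the commit message only states the `role in USER_ROLES` rule.

```python
from typing import Optional

# Assumed membership; the project's USER_ROLES may list different spellings.
USER_ROLES = frozenset({"user", "human"})


def candidate_is_human_typed(role: Optional[str]) -> bool:
    """True when the turn's role marks it as typed by the user."""
    # Normalizing case and whitespace is an assumption of this sketch.
    return role is not None and role.strip().lower() in USER_ROLES


def role_metadata(role: Optional[str]) -> dict:
    """Hypothetical helper: metadata for a role-keyed row, tagging only non-user turns."""
    return {} if candidate_is_human_typed(role) else {"human_typed": False}
```

A store with no role marker at all, like the Antigravity case above, has nothing to pass to such a helper, which is why it stays untagged.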

why: The human: distinction was queryable from the CLI but not over MCP: the search tool had no way to keep user-typed prompts or isolate tool output.

what:
- Add an optional human ("true"|"false") parameter to the MCP search tool and SearchRequestModel; "true" keeps user-typed turns, "false" keeps tool/agent output, omitted keeps both.
- Thread it through the page cursor so paginated human-filtered searches stay consistent.
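
One way to keep a paginated search consistent is to carry the filter inside the cursor itself. The sketch below assumes an opaque base64-encoded JSON cursor; `SearchRequest`, `encode_cursor`, and `decode_cursor` are hypothetical names, not the project's SearchRequestModel or cursor format.

```python
import base64
import json
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class SearchRequest:  # hypothetical stand-in for SearchRequestModel
    query: str
    human: Optional[Literal["true", "false"]] = None  # None keeps both kinds
    offset: int = 0


def encode_cursor(request: SearchRequest, next_offset: int) -> str:
    """Pack the query, the human filter, and the next offset into an opaque token."""
    payload = {"query": request.query, "human": request.human, "offset": next_offset}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> SearchRequest:
    """Rebuild the request for the next page, including the original filter."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    human = payload.get("human")
    if human not in ("true", "false", None):
        raise ValueError(f"invalid human filter in cursor: {human!r}")
    return SearchRequest(query=payload["query"], human=human, offset=int(payload["offset"]))
```

Since the second page is derived from the cursor alone, a client cannot accidentally drop or change the human filter midway through a result set.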

tony temporarily deployed to docs June 14, 2026 13:06 — with GitHub Actions Inactive

tony temporarily deployed to docs June 14, 2026 13:46 — with GitHub Actions Inactive

tony mentioned this pull request Jun 14, 2026

Insights graph similarity engine + non-gated transformers LLM defaults #69

Open

4 tasks

tony force-pushed the agentgrep-human-typed-prompts branch from 025f037 to 22f79f3 on June 14, 2026 20:32

tony temporarily deployed to docs June 14, 2026 20:32 — with GitHub Actions Inactive

tony force-pushed the agentgrep-human-typed-prompts branch from 22f79f3 to ca905ec on June 22, 2026 11:41

tony temporarily deployed to docs June 22, 2026 11:41 — with GitHub Actions Inactive

tony added 4 commits June 27, 2026 12:32

tony force-pushed the agentgrep-human-typed-prompts branch from ca905ec to ea37d76 on June 27, 2026 18:47

tony temporarily deployed to docs June 27, 2026 18:47 — with GitHub Actions Inactive

tony wants to merge 4 commits into master from agentgrep-human-typed-prompts
