Distinguish human-typed prompts from tool results in Claude history #68
Open
tony wants to merge 4 commits into
tony added a commit that referenced this pull request Jun 14, 2026
why: PR #68 separated Claude's typed prompts from tool results, but the same flattening happens in Codex (function_call_output) and Grok (tool_use/tool_result), and the human/tool tag was only reachable from the graph engine. Tagging every adapter that carries inline tool-output and exposing it as a `human:` query field lets `agentgrep search`/`grep` and the MCP server filter the user's real asks from tool noise.
what:
- Add codex_event_is_human_authored() and tag Codex response_item records; function_call/function_call_output/reasoning are non-human.
- Tag Grok tool_use/tool_result turns with metadata["human_typed"]=False.
- Register a record-layer `human` enum field (true/false) and dispatch it in the query compiler: human:true keeps untagged turns, human:false selects tool/agent output.
- Cover the Codex detector, Grok tagging, and the human: predicate.
025f037 to 22f79f3
tony added a commit that referenced this pull request Jun 14, 2026
…ted transformers LLM defaults
why: The insights ladder ended at the L5 narrative summary over a single gated Gemma model. Users had no way to see which prompts they repeat, which past conversations resemble the one in front of them, or which workflows are worth saving as Skills, and the only GPU summary path required an HF token plus an accepted license. This adds the `graph` enrichment level and makes the local-LLM backend work out of the box.
what:
- Add the `graph` level: a prompt/reply/conversation similarity network (sentence-transformers or model2vec embeddings, optional HDBSCAN archetype clustering, sqlite-vec or LanceDB IVF-PQ vector store) that surfaces recurring asks, forgotten-but-similar conversations, and mined workflows, persisted incrementally in a content-hash-keyed graph store.
- Draft reusable Skills (SKILL.md) from mined workflows, print-by-default, also exposed over an `insights_skills` MCP tool.
- Add a transformers/CUDA LLM backend with a non-gated default chain that needs no HF token: Phi-4-mini (4-bit, native phi3), SmolLM2-1.7B (fp16), Granite-3.3-2b (4-bit), tried in order until one loads. gemma-3-1b-it stays curated but gated and is no longer the default. 4-bit weights load through bitsandbytes NF4 behind the insights-llm-transformers-quant extra.
- Add optional conversation-summary vectors: each conversation is embedded by a cached LLM one-line summary instead of a prompt mean, sharpening forgotten-but-similar.
- Bundle the #68 human-typed prompt detection (human: query field, Claude/Codex authored-turn tagging) so the branch hand-tests self-contained; it overlaps with that PR and should be reconciled at merge time.
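The "tried in order until one loads" behavior of the default chain can be illustrated with a minimal sketch. The loader callable, `load_first_available`, and `fake_loader` are hypothetical names for illustration; the model labels are copied from the commit message and are not exact model identifiers, and the real backend may select models differently.

```python
from typing import Callable, List, Sequence, Tuple

# A loader takes a model label and returns a loaded model, raising on failure.
Loader = Callable[[str], object]

# Labels from the commit message (assumption: not exact hub identifiers).
DEFAULT_CHAIN: Tuple[str, ...] = ("Phi-4-mini", "SmolLM2-1.7B", "Granite-3.3-2b")


def load_first_available(
    loader: Loader, chain: Sequence[str] = DEFAULT_CHAIN
) -> Tuple[str, object]:
    """Try each model in order and return the first one that loads."""
    errors: List[str] = []
    for name in chain:
        try:
            return name, loader(name)
        except Exception as exc:  # any load failure moves on to the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no model in the chain could be loaded: " + "; ".join(errors))


def fake_loader(name: str) -> object:
    """Stand-in loader: pretends the first model does not fit in memory."""
    if name == "Phi-4-mini":
        raise MemoryError("not enough GPU memory")
    return {"model": name}


chosen_name, chosen_model = load_first_available(fake_loader)
```

Collecting the per-model errors keeps the final failure message useful when nothing in the chain loads.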
tony added a commit that referenced this pull request Jun 22, 2026
22f79f3 to ca905ec
tony added a commit that referenced this pull request Jun 22, 2026
why: Claude Code records tool results and subagent output as type=user messages, so prompt search/grep, and any consumer that treats a user-role turn as a typed prompt, sees pasted command output, git-status dumps, and subagent reports as if the user had typed them.
what:
- Add claude_event_is_human_authored(): classify a Claude project JSONL event by structure. tool_result/tool_use content blocks, the toolUseResult marker, isSidechain transcripts, non-user event types, and machine-authored string content (slash-command stdout/caveat, the interrupt marker) are not human-authored.
- Thread the verdict through parse_claude_project_file into build_search_record(human_typed=...), which records metadata["human_typed"]=False only for non-human turns. The change is additive: a normal record's metadata stays empty, so existing consumers and tests are unaffected.
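The structural checks listed above could look roughly like the following sketch. It is not the PR's implementation: the function name `sketch_is_human_authored`, the event layout (`message.content` holding a string or a list of blocks), and the exact marker prefixes are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumed prefixes for machine-authored string content; the real detector
# may match different or additional markers.
MACHINE_PREFIXES = ("<local-command-stdout>", "[Request interrupted")

NON_HUMAN_BLOCK_TYPES = frozenset({"tool_result", "tool_use"})


def sketch_is_human_authored(event: Dict[str, Any]) -> bool:
    """Return True only when the event looks like a prompt the user typed."""
    if event.get("type") != "user":
        return False  # non-user event types are never typed prompts
    if event.get("isSidechain"):
        return False  # subagent transcript
    if "toolUseResult" in event:
        return False  # tool output attached to a user-role turn
    content = (event.get("message") or {}).get("content")
    if isinstance(content, str):
        return not content.lstrip().startswith(MACHINE_PREFIXES)
    if isinstance(content, list):
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") in NON_HUMAN_BLOCK_TYPES:
                return False
            text = block.get("text")
            if isinstance(text, str) and text.lstrip().startswith(MACHINE_PREFIXES):
                return False
        return len(content) > 0
    return False  # missing or unrecognized content shape


typed = {"type": "user", "message": {"content": "fix the failing test"}}
tool = {
    "type": "user",
    "message": {"content": [{"type": "tool_result", "content": "On branch main"}]},
}
print(sketch_is_human_authored(typed), sketch_is_human_authored(tool))
```

Classifying by structure rather than by text heuristics keeps the check cheap and avoids guessing at what pasted output looks like.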
why: PR #68 separated Claude's typed prompts from tool results, but the same flattening happens in Codex (function_call_output) and Grok (tool_use/tool_result), and the human/tool tag was only reachable from the graph engine. Tagging every adapter that carries inline tool-output and exposing it as a `human:` query field lets `agentgrep search`/`grep` and the MCP server filter the user's real asks from tool noise.
what:
- Add codex_event_is_human_authored() and tag Codex response_item records; function_call/function_call_output/reasoning are non-human.
- Tag Grok tool_use/tool_result turns with metadata["human_typed"]=False.
- Register a record-layer `human` enum field (true/false) and dispatch it in the query compiler: human:true keeps untagged turns, human:false selects tool/agent output.
- Cover the Codex detector, Grok tagging, and the human: predicate.
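As a rough illustration of how the Codex tagging and the `human:` predicate fit together, here is a minimal sketch. The event layout (`type` plus a `payload` dict with its own `type`), the `text` key, and all function names are assumptions for the example, not the project's actual code.

```python
from typing import Any, Dict, Iterable, List, Optional

# Payload types the commit message names as non-human.
NON_HUMAN_PAYLOAD_TYPES = frozenset({"function_call", "function_call_output", "reasoning"})


def sketch_codex_is_human(event: Dict[str, Any]) -> bool:
    """Classify a Codex event; only response_item records are inspected here."""
    if event.get("type") != "response_item":
        return True
    payload = event.get("payload") or {}
    return payload.get("type") not in NON_HUMAN_PAYLOAD_TYPES


def to_record(event: Dict[str, Any]) -> Dict[str, Any]:
    """Build a record whose metadata is tagged only for non-human turns."""
    metadata: Dict[str, Any] = {}
    if not sketch_codex_is_human(event):
        metadata["human_typed"] = False
    payload = event.get("payload") or {}
    return {"text": str(payload.get("text", "")), "metadata": metadata}


def filter_human(records: Iterable[Dict[str, Any]], human: Optional[bool]) -> List[Dict[str, Any]]:
    """human=True keeps untagged turns, human=False keeps tagged ones, None keeps all."""
    if human is None:
        return list(records)
    return [
        r for r in records
        if (r["metadata"].get("human_typed", True) is not False) == human
    ]


events = [
    {"type": "response_item", "payload": {"type": "message", "text": "add a retry"}},
    {"type": "response_item", "payload": {"type": "function_call_output", "text": "exit 0"}},
    {"type": "response_item", "payload": {"type": "reasoning", "text": "thinking"}},
]
records = [to_record(e) for e in events]
human_only = filter_human(records, True)
tool_only = filter_human(records, False)
```

Because only non-human turns carry the tag, `human:true` is defined as "not tagged", so records produced before the change still match it.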
…eyed agents
why: Only Claude, Codex (JSONL), and Grok distinguished user-typed prompts from tool/assistant output flattened into the prompt stream. The remaining adapters left every turn untagged, so the human: query field and any downstream cleaning saw tool output as if the user had typed it. The Codex *legacy* rollout path was a gap too: it parsed the same response_item shapes as the JSONL path but never tagged them.
what:
- Add the shared candidate_is_human_typed() helper (role in USER_ROLES) and wire it into the role-keyed parsers: Gemini (chat + legacy), Cursor-CLI transcripts, Cursor-IDE state.vscdb, and Pi sessions.
- Tag OpenCode message rows by their joined role (non-user -> human_typed=False).
- Tag the Codex legacy rollout path with codex_event_is_human_authored, closing the gap vs the JSONL path.
- Antigravity (protobuf / prompt-only stores) carries no role marker to filter on, so it stays untagged by design.
- Cover the shared helper and the Cursor-CLI assistant path with tests.
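For the role-keyed adapters, the shared helper reduces to a set-membership test. The sketch below assumes a particular `USER_ROLES` set and case-insensitive matching; the real constant and the helper's signature are not shown in this PR text, so treat both as hypothetical.

```python
from typing import Any, Dict, Optional

# Assumed contents; the project's USER_ROLES may differ.
USER_ROLES = frozenset({"user", "human"})


def sketch_is_human_typed(role: Optional[str]) -> bool:
    """A turn counts as human-typed when its role is one of the user roles."""
    return isinstance(role, str) and role.strip().lower() in USER_ROLES


def metadata_for_role(role: Optional[str]) -> Dict[str, Any]:
    """Tag only non-user turns, leaving user turns with empty metadata."""
    return {} if sketch_is_human_typed(role) else {"human_typed": False}


rows = [("user", "rename this module"), ("assistant", "Done."), ("tool", "3 files changed")]
tagged = [{"text": text, "metadata": metadata_for_role(role)} for role, text in rows]
```

An adapter with no role marker at all, such as the prompt-only stores mentioned above, has nothing to pass to a helper like this, which is why it stays untagged.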
why: The human: distinction was queryable from the CLI but not over MCP — the
search tool had no way to keep user-typed prompts or isolate tool output.
what:
- Add an optional human ("true"|"false") parameter to the MCP search tool and
SearchRequestModel; "true" keeps user-typed turns, "false" keeps tool/agent
output, omitted keeps both.
- Thread it through the page cursor so paginated human-filtered searches stay
consistent.
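One way to keep a paginated, human-filtered search consistent is to carry the filter inside the page cursor, so a follow-up page cannot silently switch filters. The sketch below is an assumption about the general technique, not the server's cursor format: `parse_human`, `encode_cursor`, `decode_cursor`, and the JSON-in-base64 layout are all hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def parse_human(value: Optional[str]) -> Optional[bool]:
    """Map the tri-state parameter: "true", "false", or omitted (None)."""
    if value is None:
        return None
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f'human must be "true" or "false", got {value!r}')


def encode_cursor(offset: int, human: Optional[str]) -> str:
    """Pack the next offset together with the active human filter."""
    payload = json.dumps({"offset": offset, "human": human}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[int, Optional[str]]:
    """Recover the offset and the filter that produced this cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return int(data["offset"]), data["human"]


cursor = encode_cursor(50, "false")
next_offset, next_human = decode_cursor(cursor)
```

Storing the raw string rather than the parsed boolean preserves the distinction between an omitted filter and an explicit one across pages.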
tony added a commit that referenced this pull request Jun 27, 2026
ca905ec to ea37d76
Summary
- Claude Code records `tool_result` blocks, `<local-command-stdout>`, the `[Request interrupted]` marker, and subagent transcripts as `type=user` messages, so a user-role turn is not reliably a typed prompt.
- Add `claude_event_is_human_authored()`, which classifies a Claude project JSONL event by structure (`tool_result`/`tool_use` blocks, the `toolUseResult` marker, `isSidechain`, non-`user` event types, and machine-authored string content).
- Thread the verdict through `parse_claude_project_file` into `build_search_record(human_typed=...)`, which sets `metadata["human_typed"]=False` only for non-human turns.

This is additive: a normal record's `metadata` stays empty, so existing consumers and tests are unchanged. The new signal is opt-in for anything (search ranking, analytics, the in-progress insights work) that wants to filter to the user's actual asks.

On a real history sample, 397 of 436 user-role records were tool/sidechain noise rather than typed prompts; the tag makes that separable.
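The additive behavior can be shown with a small sketch. `sketch_build_record` is a hypothetical stand-in for `build_search_record`; only the `human_typed` keyword and the `metadata["human_typed"]=False` tag come from the PR text, and the record shape is assumed.

```python
from typing import Any, Dict, List


def sketch_build_record(text: str, *, human_typed: bool = True) -> Dict[str, Any]:
    """Write the tag only for non-human turns so normal records stay unchanged."""
    metadata: Dict[str, Any] = {}
    if not human_typed:
        metadata["human_typed"] = False
    return {"text": text, "metadata": metadata}


records: List[Dict[str, Any]] = [
    sketch_build_record("why does the build fail on CI?"),
    sketch_build_record("On branch main\nnothing to commit", human_typed=False),
    sketch_build_record("subagent report: 3 files scanned", human_typed=False),
]

# Consumers that opt in can separate typed prompts from tool/sidechain noise.
typed_prompts = [r for r in records if r["metadata"].get("human_typed", True)]
noise = [r for r in records if r["metadata"].get("human_typed", True) is False]
```

Applied to the sample quoted above, a split like this would leave 436 - 397 = 39 records as typed prompts.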
Test plan
- `tests/test_claude_human_typed.py`: parametrized cases for string prompts, text-block prompts, `tool_result`/`tool_use` blocks, `isSidechain`, assistant events, `<local-command-stdout>`, and the list-form `[Request interrupted]` marker, plus the additive-metadata behavior of `build_search_record`.
- `ruff check`, `ruff format`, `ty check` clean.