feat: llm exploration#105

Open
ArtemisMucaj wants to merge 8 commits into main from claude/llm-onnx-exploration-LWsBv
@ArtemisMucaj ArtemisMucaj commented Mar 5, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added "Explain" command to analyze code symbols with AI-powered insights (supports Anthropic and OpenAI backend selection)
    • Added ability to retrieve code chunks by file path
  • Improvements

    • Impact analysis now performs unrestricted traversal without depth limitations
    • Simplified Impact command by removing the depth parameter

claude added 7 commits March 5, 2026 09:15
Introduces `codesearch explain <symbol>` which combines impact analysis
with an LLM to produce a structured explanation of a symbol's purpose,
data flow, and business requirements.

How it works:
- Runs BFS impact analysis (same engine as `impact`) to build the call
  graph up to a configurable depth (default: 3).
- Resolves the root symbol's source file via its callees, then reads a
  40-line window around each reference site directly from disk.
- Sends a structured Markdown prompt — root source + depth-1 caller
  sources + summary of deeper nodes — to the configured LLM backend.
- Supports both Anthropic (/v1/messages) and OpenAI-compatible
  (/v1/chat/completions, e.g. LM Studio) endpoints via `--llm`.
- Opens the database read-only (no write lock) so it can run alongside
  concurrent indexing or search processes.
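The backend selection described above can be sketched as a small enum that maps each `--llm` value to its endpoint path. This is only an illustration: the variant names, the `from_cli` and `endpoint_url` helpers, and the `anthropic` CLI spelling are assumptions (only `open-ai` and the two paths appear in the commit message).

```rust
/// Hypothetical stand-in for the provider enum selected via `--llm`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmTarget {
    Anthropic,
    OpenAi,
}

impl LlmTarget {
    /// Parse the CLI value. `open-ai` comes from the usage examples;
    /// `anthropic` is an assumed spelling.
    pub fn from_cli(value: &str) -> Option<Self> {
        match value {
            "anthropic" => Some(Self::Anthropic),
            "open-ai" => Some(Self::OpenAi),
            _ => None,
        }
    }

    /// Endpoint path named in the commit message for each backend.
    pub fn endpoint_path(self) -> &'static str {
        match self {
            Self::Anthropic => "/v1/messages",
            Self::OpenAi => "/v1/chat/completions",
        }
    }

    /// Join a base URL and the endpoint path without doubling the slash.
    pub fn endpoint_url(self, base_url: &str) -> String {
        format!("{}{}", base_url.trim_end_matches('/'), self.endpoint_path())
    }
}
```

Keeping the path on the enum means the controller can match on the target once and stay agnostic about which OpenAI-compatible server (for example LM Studio) sits behind the base URL.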

New files:
  src/connector/api/controller/explain_controller.rs

Changed files:
  src/cli/mod.rs                           — Explain command variant
  src/connector/api/controller/mod.rs      — export ExplainController
  src/connector/api/router.rs              — route Commands::Explain
  src/main.rs                              — read-only flag for Explain

Usage:
  codesearch explain authenticate
  codesearch explain MyStruct::new --depth 5 --llm open-ai
  codesearch explain process_payment --repository my-repo

https://claude.ai/code/session_01494XuGs5Ez5SvRxtaN8dFy

Add a new trait method for retrieving all chunks for a given file path
without performing a similarity search. Useful for snippet-lookup use
cases (e.g. TUI reference navigation).

- Trait default: no-op returning empty vec for backwards compatibility
- DuckDB adapter: SQL query with optional repository_id filter, ordered by start_line
- InMemory adapter: HashMap filter with matching sort
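A minimal sketch of the pattern this commit describes: a trait method with a no-op default, plus an in-memory adapter that filters a `HashMap` and sorts by `start_line`. The `Chunk` fields, the synchronous signature, and `LegacyRepository` are assumptions made for illustration; the real trait is likely async and richer.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the project's chunk type (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub id: String,
    pub repository_id: String,
    pub file_path: String,
    pub start_line: u32,
}

pub trait VectorRepository {
    /// Default no-op: adapters that do not override it return an empty Vec,
    /// so existing implementors keep compiling.
    fn find_chunks_by_file(&self, _file_path: &str, _repository_id: Option<&str>) -> Vec<Chunk> {
        Vec::new()
    }
}

/// Hypothetical adapter that inherits the default.
pub struct LegacyRepository;
impl VectorRepository for LegacyRepository {}

pub struct InMemoryRepository {
    chunks: HashMap<String, Chunk>,
}

impl InMemoryRepository {
    pub fn new(chunks: Vec<Chunk>) -> Self {
        Self {
            chunks: chunks.into_iter().map(|c| (c.id.clone(), c)).collect(),
        }
    }
}

impl VectorRepository for InMemoryRepository {
    fn find_chunks_by_file(&self, file_path: &str, repository_id: Option<&str>) -> Vec<Chunk> {
        let mut found: Vec<Chunk> = self
            .chunks
            .values()
            .filter(|c| c.file_path == file_path)
            // `None` means "any repository", mirroring the optional SQL filter.
            .filter(|c| repository_id.map_or(true, |r| c.repository_id == r))
            .cloned()
            .collect();
        // HashMap iteration order is arbitrary, so sort explicitly.
        found.sort_by_key(|c| c.start_line);
        found
    }
}
```

The explicit sort matters for the in-memory adapter because `HashMap` gives no ordering guarantee, whereas the DuckDB adapter can rely on an `ORDER BY` clause.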

Align method ordering in DuckdbVectorRepository impl block with the
convention established in claude/fix-duckdb-tui-state-scA0C: flush and
count precede find_chunks_by_file.

The enum controls which LLM provider is used for any LLM call (query
expansion, explain command, etc.), so the narrower name was misleading.

- cli: QueryExpansionTarget → LlmTarget
- container: query_expansion_target field → llm_target
- main: expand_query_target field → llm_target
- lib: update re-export
- explain_controller: update import and match arms

Drop the max_depth cap so the call graph walk runs until all reachable
callers are visited. Removes the --depth flag from the impact and
explain CLI commands and the depth field from the MCP ImpactToolInput
schema.
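As a rough illustration of an uncapped walk, the sketch below runs a breadth-first search over a caller map until the queue is empty. With no depth limit, termination depends entirely on the visited set, which also protects against cycles. The `walk_callers` function, the `callers_of` map, and the `ImpactNode` field types are assumptions, not the project's actual API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Simplified stand-in for the project's node type (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct ImpactNode {
    pub symbol: String,
    pub depth: usize,
    /// The symbol through which this caller was reached.
    pub via_symbol: Option<String>,
}

/// Hypothetical BFS: `callers_of[x]` lists the symbols that call `x`.
pub fn walk_callers(root: &str, callers_of: &HashMap<String, Vec<String>>) -> Vec<ImpactNode> {
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert(root.to_string());

    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((root.to_string(), 0));

    let mut nodes = Vec::new();
    // No depth check here: the loop ends when every reachable caller is visited.
    while let Some((current, depth)) = queue.pop_front() {
        let Some(callers) = callers_of.get(&current) else {
            continue;
        };
        for caller in callers {
            // `insert` returns false for already-seen symbols, which breaks cycles.
            if visited.insert(caller.clone()) {
                nodes.push(ImpactNode {
                    symbol: caller.clone(),
                    depth: depth + 1,
                    via_symbol: Some(current.clone()),
                });
                queue.push_back((caller.clone(), depth + 1));
            }
        }
    }
    nodes
}
```

The trade-off is that cost now scales with the size of the reachable call graph rather than with a fixed depth, so very large repositories may produce long result lists.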

Replace the flat depth-1-only snippet approach with path-based source
gathering. reconstruct_paths() traces each leaf back to the root via
via_symbol links (same algorithm as ImpactController), producing one
Vec<&ImpactNode> per call chain ordered outermost-caller-first.

build_prompt() then:
- Collects unique symbols across all paths and reads a source window
  for each (capped at MAX_UNIQUE_SYMBOLS_WITH_SOURCE=20 to bound prompt
  size; no depth limit is applied to the traversal itself).
- Renders every path as a chain header (A → B → … → root_symbol) with
  an inline source block per node, giving the LLM full context for each
  call chain rather than just the first five depth-1 callers.
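The path reconstruction described above might look roughly like the following. It treats every node that no deeper node was reached through as a leaf, then follows `via_symbol` links back toward the root, yielding chains ordered outermost-caller-first. The `ImpactNode` shape and the `chain_header` helper are assumptions; the real `reconstruct_paths()` may differ in signature and detail.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the project's node type (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct ImpactNode {
    pub symbol: String,
    pub depth: usize,
    pub via_symbol: Option<String>,
}

/// One Vec per call chain, outermost caller first, depth-1 caller last.
pub fn reconstruct_paths(nodes: &[ImpactNode]) -> Vec<Vec<&ImpactNode>> {
    let by_depth_symbol: HashMap<(usize, &str), &ImpactNode> = nodes
        .iter()
        .map(|n| ((n.depth, n.symbol.as_str()), n))
        .collect();

    // A node is an intermediate hop if some deeper node was reached through it.
    let intermediates: HashSet<(usize, &str)> = nodes
        .iter()
        .filter(|n| n.depth > 1)
        .filter_map(|n| n.via_symbol.as_deref().map(|via| (n.depth - 1, via)))
        .collect();

    let mut paths = Vec::new();
    for leaf in nodes
        .iter()
        .filter(|n| !intermediates.contains(&(n.depth, n.symbol.as_str())))
    {
        let mut path = vec![leaf];
        let mut current = leaf;
        // Follow via_symbol links one depth level at a time toward the root.
        while current.depth > 1 {
            let Some(via) = current.via_symbol.as_deref() else {
                break;
            };
            let Some(parent) = by_depth_symbol.get(&(current.depth - 1, via)) else {
                break;
            };
            path.push(*parent);
            current = *parent;
        }
        paths.push(path);
    }
    paths
}

/// Hypothetical helper rendering a chain header such as `A → B → root_symbol`.
pub fn chain_header(path: &[&ImpactNode], root_symbol: &str) -> String {
    let mut parts: Vec<&str> = path.iter().map(|n| n.symbol.as_str()).collect();
    parts.push(root_symbol);
    parts.join(" → ")
}
```

Note that this sketch keys nodes by `(depth, symbol)` only, which is the same keying the review below flags as collision-prone across repositories.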

Read source for every unique symbol across all call paths with no
artificial limit. The exploration is now fully unbounded end to end.

coderabbitai bot commented Mar 5, 2026

Walkthrough

The PR introduces a file-scoped chunk lookup capability via a new VectorRepository method, removes depth constraints from impact analysis traversal, renames the LLM-provider enum for generality, adds a new Explain command with an accompanying ExplainController for LLM-based call-flow analysis, and removes depth parameters from the CLI Impact command and related method signatures.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Vector Repository Interface & Adapters | `src/application/interfaces/vector_repository.rs`, `src/connector/adapter/duckdb_vector_repository.rs`, `src/connector/adapter/in_memory_vector_repository.rs` | Added new `find_chunks_by_file` method to retrieve code chunks filtered by file path. DuckDB adapter uses conditional SQL queries based on `repository_id` presence; in-memory adapter filters and sorts by `start_line`. |
| Impact Analysis & Traversal | `src/application/use_cases/impact_analysis.rs`, `src/connector/adapter/mcp/server.rs`, `src/connector/api/controller/impact_controller.rs` | Removed `max_depth` parameter from `analyze` method signature and all call sites, eliminating traversal depth limits. Depth field also removed from `ImpactToolInput` struct. |
| CLI & Command Definitions | `src/cli/mod.rs`, `src/lib.rs`, `src/main.rs` | Renamed `QueryExpansionTarget` enum to `LlmTarget`; removed depth field from Impact command; added new Explain command variant with symbol, repository, and llm fields; updated public re-exports. |
| LLM Configuration & Container | `src/connector/api/container.rs` | Updated `ContainerConfig` field from `query_expansion_target` to `llm_target` and adjusted routing logic to match renamed enum variants. |
| Explain Feature | `src/connector/api/controller/explain_controller.rs`, `src/connector/api/controller/mod.rs` | New `ExplainController` (257 lines) implementing call-flow analysis via LLM. Includes impact analysis, path reconstruction, source window extraction, and prompt construction with call path details. Also introduced helper functions for path traversal and prompt building. |
| Router & Command Dispatch | `src/connector/api/router.rs` | Added `ExplainController` field to Router; integrated new `Commands::Explain` branch to route explain requests; removed depth argument from Impact command dispatch. |

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/Router
    participant EC as ExplainController
    participant IA as ImpactAnalysis
    participant VR as VectorRepository
    participant LLM as LLM Service
    
    CLI->>EC: explain(symbol, repo, llm_target)
    EC->>IA: analyze(symbol, repo)
    IA->>VR: query for symbol calls
    VR-->>IA: return call graph
    IA-->>EC: return ImpactAnalysis
    
    alt No callers found
        EC-->>CLI: return "no callers" message
    else Callers found
        EC->>VR: find_chunks_by_file(repo, file)
        VR-->>EC: return code chunks
        EC->>EC: reconstruct_paths(impact_nodes)
        EC->>EC: build_prompt(symbol, paths, sources)
        EC->>LLM: query with prompt + system context
        LLM-->>EC: return explanation
        EC->>EC: format with metrics
        EC-->>CLI: return formatted explanation
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #87: Modifies impact analysis traversal logic in src/application/use_cases/impact_analysis.rs (augments ImpactNode with line and via_symbol fields) — directly related to the depth-removal and traversal changes.
  • PR #99: Updates CLI enum from QueryExpansionTarget to a more generic target and modifies ContainerConfig — shares the same enum rename and field refactoring pattern.
  • PR #48: Adds initial max_depth parameter to ImpactAnalysisUseCase::analyze and modifies VectorRepository — directly superseded by the depth-removal in this PR.

Poem

🐰 A new Explain sprouts forth with paths so clear,
Depth limits shed, traversal roams without fear,
LlmTarget's name more broadly does apply,
Chunks by file now fetchable on high,
The call-flow whispers secrets to the sky! 🌙

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'feat: llm exploration' is vague and generic, using non-descriptive terms that don't convey the specific changes made in the pull request. | Consider using a more descriptive title that captures the main feature being added, such as 'feat: add explain controller for LLM-based code analysis' or 'feat: introduce symbol explanation with LLM integration'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/connector/api/controller/explain_controller.rs`:
- Around line 139-143: The current maps (node_by_depth_symbol and the similar
map at lines ~199-213) use keys of (node.depth, node.symbol) which can collide
across files/repos; change the key to include a unique source identifier (e.g.
node.source_id, node.file_path, repo_id or node.id) so keys become (node.depth,
node.symbol.as_str(), node.source_id.as_str()) or equivalent unique field;
update the HashMap type signatures and all .entry(...) calls and lookups (both
where node_by_depth_symbol is built and in the later map at ~199-213) to use
that expanded tuple key so path reconstruction and snippet lookup use the full
identity rather than symbol+depth only.
- Around line 75-79: The call to
call_graph.find_callees(...).await.unwrap_or_default() in explain_controller.rs
is swallowing errors; change it to capture the Result, log the error with
tracing::warn! or tracing::error! (including the error value and context like
analysis.root_symbol and cg_query) and then fallback to an empty Vec only after
logging; locate the invocation around the variable callees (the
call_graph.find_callees call) and replace the unwrap_or_default() with explicit
error handling (e.g., match or if let Err(e) = ...) that logs the error before
assigning the default.
- Around line 252-256: Clamp the computed center index to the valid range before
slicing: when computing `center` from `center_line` (the existing `let center =
center_line.saturating_sub(1) as usize`), replace that with a clamped value
using `lines.len().saturating_sub(1)` to ensure `center <= last_index`; then
recompute `start` and `end` (as you already do) and add an early-return check
(e.g., return None) if `start >= end` to avoid slicing with out-of-range bounds
in the function that builds the source window (the block using `center`, `half`,
`start`, `end`, and `lines[start..end].join("\n")`).
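The clamping fix from the last comment can be sketched as follows. The function works on an in-memory string rather than reading from disk, and its name, parameters, and the choice of `window / 2` lines on each side are assumptions; only the clamping and the `start >= end` early return mirror the review text.

```rust
/// Return up to `window` lines centered on the 1-based `center_line`,
/// or `None` when there is nothing to slice.
pub fn source_window(content: &str, center_line: u32, window: usize) -> Option<String> {
    let lines: Vec<&str> = content.lines().collect();

    // Clamp the 0-based center so a stale line number past EOF cannot
    // push `start` beyond the end of the file.
    let last_index = lines.len().saturating_sub(1);
    let center = (center_line.saturating_sub(1) as usize).min(last_index);

    let half = window / 2;
    let start = center.saturating_sub(half);
    let end = (center + half).min(lines.len());

    // Covers empty files and zero-width windows.
    if start >= end {
        return None;
    }
    Some(lines[start..end].join("\n"))
}
```

Without the clamp, a stored line number larger than the file length would make `start` exceed `end`, and the slice `lines[start..end]` would panic.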

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fd915bd3-c7e4-43de-bab1-d2203c0313a0

📥 Commits

Reviewing files that changed from the base of the PR and between 8230ffd and 635ee6e.

📒 Files selected for processing (13)
  • src/application/interfaces/vector_repository.rs
  • src/application/use_cases/impact_analysis.rs
  • src/cli/mod.rs
  • src/connector/adapter/duckdb_vector_repository.rs
  • src/connector/adapter/in_memory_vector_repository.rs
  • src/connector/adapter/mcp/server.rs
  • src/connector/api/container.rs
  • src/connector/api/controller/explain_controller.rs
  • src/connector/api/controller/impact_controller.rs
  • src/connector/api/controller/mod.rs
  • src/connector/api/router.rs
  • src/lib.rs
  • src/main.rs
💤 Files with no reviewable changes (1)
  • src/application/use_cases/impact_analysis.rs

…urce window

- Expand node_by_depth_symbol key from (depth, symbol) to
  (depth, symbol, repository_id) so the same symbol name at the same
  depth in different repositories no longer clobbers each other.
- Expand source_cache and dedup seen-set keys from symbol to
  (symbol, file_path) for the same reason.
- Replace unwrap_or_default() on find_callees with explicit match that
  logs the error via tracing::warn! before falling back to an empty Vec.
- Clamp center index to lines.len()-1 in read_source_window and add an
  early None return when start >= end to prevent an out-of-range slice
  panic when a stored line number exceeds the actual file length.
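To illustrate why the wider key matters, the sketch below indexes the same nodes twice: once by `(depth, symbol)` and once by `(depth, symbol, repository_id)`. The struct fields and both function names are assumptions made for this example, not the controller's real code.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the project's node type (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct ImpactNode {
    pub symbol: String,
    pub depth: usize,
    pub repository_id: String,
}

/// Narrow key: nodes sharing depth and symbol overwrite each other.
pub fn index_by_depth_symbol(nodes: &[ImpactNode]) -> HashMap<(usize, &str), &ImpactNode> {
    nodes
        .iter()
        .map(|n| ((n.depth, n.symbol.as_str()), n))
        .collect()
}

/// Wider key: the repository id keeps same-named symbols apart.
pub fn index_by_identity(nodes: &[ImpactNode]) -> HashMap<(usize, &str, &str), &ImpactNode> {
    nodes
        .iter()
        .map(|n| ((n.depth, n.symbol.as_str(), n.repository_id.as_str()), n))
        .collect()
}
```

With two nodes named `new` at depth 1 in different repositories, the narrow index keeps only one entry while the wider index keeps both, so path reconstruction and snippet lookup no longer pick up the wrong node.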