[Feature]: Add minimum score threshold to filter low-relevance search results #35

@iamvirul

Problem Statement

search_code currently returns every result up to top_k, regardless of similarity score. Low-scoring results (e.g. 0.25–0.35) are not relevant to the query but are still sent to Claude, which wastes tokens without improving answer quality.

Proposed Solution

Add a min_score parameter to search_code with a sensible default (e.g. 0.35):

@mcp.tool()
def search_code(query: str, path: str, top_k: int = 8, min_score: float = 0.35) -> str:
    ...  # existing search logic, plus the filter described below

After retrieving results from the vector store, filter out any result below min_score before formatting the output:

results = [r for r in results if r["score"] >= min_score]

This is a pure token reduction: results below 0.35 cosine similarity are noise, so Claude reading them adds no value and costs tokens on every search call.
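As a rough illustration of the filtering step, here is a minimal sketch. It assumes each result is a dict with a "score" key, as in the one-liner above; the `filter_by_score` helper, the "file" key, and the sample data are hypothetical and not taken from the codebase.

```python
from typing import Any


def filter_by_score(
    results: list[dict[str, Any]], min_score: float = 0.35
) -> list[dict[str, Any]]:
    """Keep only results whose similarity score is at least min_score."""
    return [r for r in results if r["score"] >= min_score]


# Hypothetical results; only the "score" key comes from the issue text.
sample_results = [
    {"file": "auth.py", "score": 0.62},
    {"file": "utils.py", "score": 0.35},
    {"file": "README.md", "score": 0.27},
]

kept = filter_by_score(sample_results)
print([r["file"] for r in kept])  # ['auth.py', 'utils.py']
```

The comparison is inclusive, so a result scoring exactly min_score is kept. Passing min_score=0.0 would restore the current behaviour for non-negative scores.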

Alternatives Considered

  • Hardcoding the threshold: less flexible, harder to tune per-codebase
  • Lowering top_k default: reduces results but doesn't remove genuinely irrelevant ones that happen to rank in the top N
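To make the second alternative concrete, the sketch below contrasts a smaller top_k with a score threshold. The scores are made up for illustration and do not come from a real query.

```python
# Hypothetical ranked scores for one query, highest first.
scores = [0.71, 0.48, 0.31, 0.28, 0.26]

# Lowering top_k keeps the best N results, whether or not they are relevant.
top_3 = scores[:3]

# A threshold keeps only results at or above min_score, however many that is.
min_score = 0.35
above_threshold = [s for s in scores if s >= min_score]

print(top_3)            # [0.71, 0.48, 0.31]
print(above_threshold)  # [0.71, 0.48]
```

With top_k lowered to 3, the 0.31 result still gets through; the threshold drops it while keeping everything that clears the bar.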

Additional Context

  • No quality risk: sub-0.35 results are not semantically related to the query
  • The fine-tuned model (isuruwijesiri/all-MiniLM-L6-v2-code-search-512) produces well-calibrated scores, so 0.35 is a reasonable default
  • Should also be documented in get_index_status output or README so users know the default threshold

Labels: enhancement, performance, priority: medium
