Case Study

This directory contains case analyses that compare the "both" method (grep + Claude Context semantic search) with the traditional grep-only method.

These cases are selected from Princeton NLP's SWE-bench_Verified dataset. The results and logs are generated by the run_evaluation.py script. For more details, refer to the evaluation README.md file.

  • 📁 django_14170: Query optimization in YearLookup breaks filtering by "__iso_year"
  • 📁 pydata_xarray_6938: .swap_dims() can modify original object

Each case study includes:

  • Original Issue: The GitHub issue description and requirements
  • Problem Analysis: Technical breakdown of the bug and expected solution
  • Method Comparison: Detailed comparison of both approaches
  • Conversation Logs: The interaction records showing how the LLM agent calls the tools and generates the final answer
  • Results: Performance metrics and outcome analysis

Key Results

In these case studies, the "both" method (grep + Claude Context semantic search) is more efficient and more accurate than the traditional grep-only method.

Why Grep Fails

  1. Information Overload - Generates hundreds of irrelevant matches
  2. No Semantic Understanding - Only literal text matching
  3. Context Loss - Can't understand code relationships
  4. Inefficient Navigation - Forces the agent to sift through many irrelevant results before reaching the relevant code
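
The points above can be illustrated with a minimal sketch of literal matching. The file names and contents in `TOY_REPO` are invented for illustration and are not taken from the case studies; `grep` here is a small Python stand-in for the command-line tool, not the agent's actual tool.

```python
import re

# Hypothetical repository contents, invented for illustration only.
TOY_REPO = {
    "lookups.py": [
        "class YearLookup(Lookup):",
        "    def year_lookup_bounds(self, connection, year):",
        "        return connection.ops.year_lookup_bounds_for_date_field(year)",
    ],
    "utils.py": [
        "def format_year(year):",
        "    return str(year).zfill(4)",
    ],
    "views.py": [
        "def archive(request, year):",
        "    # render the yearly archive page",
    ],
}


def grep(pattern, repo):
    """Return (file, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    return [
        (path, number, line)
        for path, lines in repo.items()
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


# A literal, case-sensitive search for "year" matches unrelated helpers
# and views, yet skips the `YearLookup` class definition (capital "Y").
matches = grep("year", TOY_REPO)
for path, number, line in matches:
    print(f"{path}:{number}: {line.strip()}")
```

Even in this tiny example, most hits are unrelated to the lookup logic, and every hit carries the same weight, so the caller has to read each one to decide which matters.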

How Grep + Semantic Search Wins

  1. Intelligent Filtering - Automatically ranks by relevance
  2. Conceptual Understanding - Grasps code meaning and relationships
  3. Efficient Navigation - Direct targeting of relevant sections
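
The idea of ranking by relevance can be sketched with a toy example. This is not how Claude Context is implemented: a real semantic search would typically compare learned embeddings, whereas the sketch below swaps in a plain bag-of-words cosine similarity so that it runs with the standard library alone. The snippet names and descriptions in `SNIPPETS` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical code chunks with short descriptions, invented for illustration.
SNIPPETS = {
    "lookups.py::YearLookup": "year lookup that optimizes filtering by year using a between range on dates",
    "utils.py::format_year": "format a year as a zero padded string",
    "views.py::archive": "render the archive page for a given year",
}


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, snippets):
    """Score every snippet against the query and sort best-first."""
    query_vector = vectorize(query)
    scored = [
        (name, cosine(query_vector, vectorize(text)))
        for name, text in snippets.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank("optimize year filtering lookup", SNIPPETS)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

Unlike the unordered list of literal matches that grep returns, the output is ordered, so the agent can start from the top-ranked chunk instead of scanning every hit.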