
thapar25/HyDE


HyDE RAG Experiment

An experimental comparison of three Retrieval-Augmented Generation (RAG) approaches to understand the effectiveness of HyDE (Hypothetical Document Embeddings) versus traditional RAG techniques.

Overview

This project evaluates three different RAG implementations:

  1. Vanilla RAG - Direct embedding of user queries for semantic search
  2. Two-Step RAG - Query rewriting followed by semantic search
  3. HyDE RAG - Hypothetical answer generation followed by semantic search

The experiment uses the Ragas Airlines FAQ dataset, together with a synthetic ground-truth dataset generated with RAGAS, to measure and compare performance across several evaluation metrics.

What is HyDE?

HyDE (Hypothetical Document Embeddings) is a retrieval technique that generates a hypothetical answer to the user's question before performing semantic search. Instead of embedding the question directly, the system:

  1. Generates a plausible answer using an LLM
  2. Embeds this hypothetical answer
  3. Uses these embeddings to search for similar documents

The hypothesis is that a hypothetical answer lies closer in the embedding space to documents that contain real answers than the question itself does, which could improve retrieval accuracy compared with searching on the question directly.
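The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `generate` and `embed` are hypothetical stand-ins for the Qwen3 and mxbai-embed-large calls, and the brute-force cosine ranking over an in-memory list takes the place of the Qdrant search.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hyde_search(
    question: str,
    generate: Callable[[str], str],           # hypothetical LLM call (e.g. Qwen3)
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding call
    documents: list[str],
    k: int = 3,
) -> list[str]:
    """Return the k documents most similar to a generated hypothetical answer."""
    # 1. Generate a plausible (possibly factually wrong) answer to the question.
    hypothetical_answer = generate(question)
    # 2. Embed the hypothetical answer instead of the question.
    query_vector = embed(hypothetical_answer)
    # 3. Rank documents by similarity to that vector. A real system would keep
    #    document vectors in a vector store rather than re-embedding them here.
    scored = [(cosine(query_vector, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

Replacing `generate(question)` with `question` itself turns this into vanilla retrieval, which is the only change HyDE makes at the retrieval step. The hypothetical answer may contain invented details; that is tolerable because it is used only to locate real documents.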

Architecture

Tech Stack

  • Framework: LangGraph for orchestrating RAG workflows
  • Vector Store: Qdrant for document storage and retrieval
  • Embeddings: Ollama (mxbai-embed-large) for local embedding generation
  • LLM: Ollama Qwen3 for query rewriting, hypothetical answer generation, and final response generation; OpenAI GPT-4o-mini for evaluation
  • Evaluation: RAGAS framework with metrics for accuracy, recall, precision, and groundedness
  • Tracking: LangSmith for observability and tracing
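For reference, the local embedding step reduces to one HTTP call to Ollama. The sketch below uses only the standard library and Ollama's `/api/embed` endpoint on its default port 11434; the project itself most likely goes through a LangChain/LangGraph integration instead, so treat `build_embed_request` and `embed` as hypothetical helpers.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if changed


def build_embed_request(texts: list[str], model: str = "mxbai-embed-large") -> urllib.request.Request:
    """Build a POST request asking Ollama to embed a batch of texts."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], model: str = "mxbai-embed-large") -> list[list[float]]:
    """Send the request; Ollama returns one vector per input under "embeddings"."""
    request = build_embed_request(texts, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["embeddings"]
```

Assuming the Ollama server is running and the model has been pulled, `embed(["Can I change my flight?"])` returns a list containing one vector.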

RAG Variants

1. Vanilla RAG (src/simple_rag.py)


2. Two-Step RAG (src/two_step.py)


3. HyDE RAG (src/hyde_rag.py)

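The three variants share the same retrieve-then-generate tail and differ only in which text is embedded for search. The sketch below expresses that in a LangGraph-like style, with nodes that read a shared state dict and return partial updates, but it uses plain functions so it runs without LangGraph. The prompts, the `RAGState` fields, and the `llm`/`retrieve` callables are illustrative assumptions; the project's actual prompts live in src/prompts.yaml.

```python
from typing import Callable, TypedDict

LLM = Callable[[str], str]              # hypothetical: prompt in, text out
Retriever = Callable[[str], list[str]]  # hypothetical: search text in, chunks out


class RAGState(TypedDict, total=False):
    question: str
    search_text: str   # the text that actually gets embedded for retrieval
    context: list[str]
    answer: str


def vanilla_query(state: RAGState, llm: LLM) -> RAGState:
    # Vanilla RAG: embed the user's question as-is.
    return {"search_text": state["question"]}


def two_step_query(state: RAGState, llm: LLM) -> RAGState:
    # Two-Step RAG: ask the LLM to rewrite the question into a search query.
    return {"search_text": llm(f"Rewrite this question as a search query: {state['question']}")}


def hyde_query(state: RAGState, llm: LLM) -> RAGState:
    # HyDE: ask the LLM for a hypothetical answer and search with that instead.
    return {"search_text": llm(f"Write a short passage that answers: {state['question']}")}


def run_variant(
    query_node: Callable[[RAGState, LLM], RAGState],
    question: str,
    llm: LLM,
    retrieve: Retriever,
) -> RAGState:
    state: RAGState = {"question": question}
    state.update(query_node(state, llm))               # variant-specific step
    state["context"] = retrieve(state["search_text"])  # shared retrieval step
    context = "\n".join(state["context"])
    # Shared generation step: answer the original question from the retrieved context.
    state["answer"] = llm(f"Context:\n{context}\n\nQuestion: {state['question']}")
    return state
```

Because only the first node differs, Two-Step RAG and HyDE each cost one extra LLM call per question compared with Vanilla RAG.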

Setup

Prerequisites

  • Python 3.13+
  • Docker (for Qdrant)
  • Ollama with the qwen3 and mxbai-embed-large models

Installation

  1. Clone the repository:
git clone "https://github.com/thapar25/HyDE"
cd HyDE
  2. Install dependencies using uv:
uv sync
  3. Start Qdrant using Docker:
docker-compose up -d
  4. Configure environment variables: Create a .env file with the following:
QDRANT_HOST=localhost
QDRANT_PORT=6333
COLLECTION_NAME=faqs

OPENAI_API_KEY=your_openai_api_key

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_PROJECT=hyde-rag
LANGSMITH_API_KEY=your_langsmith_api_key
  5. Pull the Ollama models:
ollama pull qwen3
ollama pull mxbai-embed-large
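Before preparing data, it can help to confirm that both services are reachable and that the models are present. Below is a minimal standard-library sketch, assuming Qdrant's REST API on the configured host and port and Ollama on its default port 11434; the helper names are hypothetical. It reads QDRANT_HOST and QDRANT_PORT from the process environment, so the .env values must be exported (or loaded, e.g. with python-dotenv) first.

```python
import json
import os
import urllib.request

REQUIRED_MODELS = ("qwen3", "mxbai-embed-large")


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Compare Ollama's GET /api/tags output with the models this project needs."""
    # Ollama reports names with a tag, e.g. "qwen3:latest"; compare on the base name.
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def collection_names(collections_payload: dict) -> list[str]:
    """Extract names from Qdrant's GET /collections response."""
    return [c["name"] for c in collections_payload["result"]["collections"]]


def preflight() -> None:
    host = os.getenv("QDRANT_HOST", "localhost")
    port = os.getenv("QDRANT_PORT", "6333")
    print("Qdrant collections:", collection_names(fetch_json(f"http://{host}:{port}/collections")))
    missing = missing_models(fetch_json("http://localhost:11434/api/tags"))
    print("Missing Ollama models:", ", ".join(missing) if missing else "none")
```

Call `preflight()` after `docker-compose up -d`. An empty collection list is expected at this stage; the faqs collection presumably appears only after the ingestion notebook has run.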

Data Preparation

  1. Generate synthetic ground truth dataset:
jupyter notebook notebooks/1-generate_ground_truth.ipynb
  2. Ingest data into Qdrant:
jupyter notebook notebooks/2-ingest_data.ipynb
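The ingestion notebook is not reproduced here. As a rough sketch of what loading the FAQ text into Qdrant involves, the hypothetical helpers below build request bodies for Qdrant's REST API: one for creating a collection (PUT /collections/{name}) and one for upserting points (PUT /collections/{name}/points). Cosine distance, integer point IDs, and the "text" payload key are assumptions, not details taken from the notebook.

```python
from typing import Optional, Sequence


def create_collection_body(vector_size: int, distance: str = "Cosine") -> dict:
    """Body for PUT /collections/{name}; vector_size must match the embedding model."""
    return {"vectors": {"size": vector_size, "distance": distance}}


def upsert_body(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    metadata: Optional[Sequence[dict]] = None,
) -> dict:
    """Body for PUT /collections/{name}/points: one point per text chunk."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    points = []
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        extra = metadata[i] if metadata else {}
        # Qdrant point IDs must be unsigned integers or UUIDs; the position is used here.
        points.append({"id": i, "vector": list(vector), "payload": {"text": text, **extra}})
    return {"points": points}
```

Deriving `vector_size` from the embeddings themselves (for example `len(vectors[0])`) keeps the collection's dimension in sync with whichever embedding model produced them.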

Running Experiments

Execute the evaluation notebook to compare all three RAG variants:

jupyter notebook notebooks/3-evals.ipynb

The evaluation measures:

  • Answer Accuracy: how closely the generated answer matches the ground-truth answer
  • Context Recall: how much of the relevant context was retrieved
  • Context Precision: how much of the retrieved context is relevant, and whether relevant chunks are ranked near the top
  • Response Groundedness: how well the response is supported by the retrieved context
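In this setup the metrics are scored by an LLM judge (GPT-4o-mini), so the judge decides which chunks count as relevant. To make the two retrieval metrics concrete, here is a simplified, ID-based version that assumes the set of relevant chunk IDs is known in advance. It follows the rank-weighted definition RAGAS documents for context precision, but it is an illustration, not the RAGAS implementation.

```python
from typing import Sequence


def context_precision(retrieved: Sequence[str], relevant: set[str]) -> float:
    """Average of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    weighted_sum = 0.0
    for k, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k at this relevant position
    return weighted_sum / hits if hits else 0.0


def context_recall(retrieved: Sequence[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that appear anywhere in the retrieved list."""
    if not relevant:
        raise ValueError("relevant must contain at least one chunk ID")
    return len(relevant & set(retrieved)) / len(relevant)
```

For example, retrieving the two relevant chunks at ranks 1 and 3 scores 5/6 ≈ 0.83 on precision, while retrieving them at ranks 2 and 3 scores 7/12 ≈ 0.58, even though recall is 1.0 in both cases.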

Analyze results:

jupyter notebook notebooks/4-analyze.ipynb
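A minimal sketch of the kind of per-variant summary this step produces, assuming per-question scores have been flattened into rows with a `variant` field and one key per metric. The row layout and the metric keys are hypothetical, not the notebook's actual format.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical metric keys; the real column names depend on the RAGAS version used.
METRICS = ("answer_accuracy", "context_recall", "context_precision", "response_groundedness")


def summarize(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Mean score per variant and metric, skipping missing or NaN scores."""
    grouped: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for metric in METRICS:
            value = row.get(metric)
            if value is None or math.isnan(float(value)):
                continue
            grouped[row["variant"]][metric].append(float(value))
    return {
        variant: {metric: mean(values) for metric, values in by_metric.items()}
        for variant, by_metric in grouped.items()
    }


def best_variant_per_metric(summary: dict[str, dict[str, float]]) -> dict[str, str]:
    """Name of the variant with the highest mean on each metric."""
    best = {}
    for metric in METRICS:
        candidates = {v: scores[metric] for v, scores in summary.items() if metric in scores}
        if candidates:
            best[metric] = max(candidates, key=candidates.get)
    return best
```

Mapping each metric to its leading variant makes it easy to see whether one variant wins everywhere or the lead shifts from metric to metric.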

Key Findings

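The experiment produced mixed results across the three RAG approaches:

  • No single variant led on every metric; different variants performed best on different metrics
  • Performance varied with query complexity and with how much context a question required
  • Two-Step RAG and HyDE each add an LLM call before retrieval, so any gain in retrieval quality has to be weighed against the extra latency and compute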

Detailed analysis and visualizations can be found in the artifacts/ directory after running the evaluation notebooks.

(Optional) LangGraph Graph API

Since the workflows are built with LangGraph, they can also be tested interactively in the LangSmith UI by starting the local LangGraph development server:

langgraph dev

Project Structure

HyDE/
├── src/
│   ├── simple_rag.py        # Vanilla RAG implementation
│   ├── two_step.py          # Query-rewriting RAG implementation
│   ├── hyde_rag.py          # HyDE RAG implementation
│   ├── utils.py             # Shared utilities and state definitions
│   └── prompts.yaml         # Prompt templates
├── notebooks/
│   ├── 1-generate_ground_truth.ipynb
│   ├── 2-ingest_data.ipynb
│   ├── 3-evals.ipynb
│   └── 4-analyze.ipynb
├── artifacts/               # Generated datasets and results
├── docker-compose.yaml      # Qdrant configuration
└── pyproject.toml           # Project dependencies
