HyDE RAG Experiment

An experimental comparison of three Retrieval-Augmented Generation (RAG) approaches to understand the effectiveness of HyDE (Hypothetical Document Embeddings) versus traditional RAG techniques.

Overview

This project evaluates three different RAG implementations:

Vanilla RAG - Direct embedding of user queries for semantic search
Two-Step RAG - Query rewriting followed by semantic search
HyDE RAG - Hypothetical answer generation followed by semantic search

The experiment uses the Ragas Airlines FAQ dataset, together with a synthetic ground truth dataset generated using RAGAS, to measure and compare the three approaches on four metrics: answer accuracy, context recall, context precision, and response groundedness.

What is HyDE?

HyDE (Hypothetical Document Embeddings) is a retrieval technique that generates a hypothetical answer to the user's question before performing semantic search. Instead of embedding the question directly, the system:

Generates a plausible answer using an LLM
Embeds this hypothetical answer
Uses these embeddings to search for similar documents

The hypothesis is that answers sit closer to other answers in the vector space than questions do, so searching with a hypothetical answer may retrieve more relevant documents than searching with the question itself.
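The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the code in `src/hyde_rag.py`: the bag-of-words `embed` function stands in for mxbai-embed-large, `fake_llm` stands in for Qwen3, and the documents are invented toy sentences rather than entries from the Ragas Airlines dataset.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_search(question, documents, generate_answer, k=1):
    hypothetical = generate_answer(question)  # 1. generate a plausible answer
    query_vec = embed(hypothetical)           # 2. embed the hypothetical answer
    ranked = sorted(                          # 3. rank documents by similarity
        documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True
    )
    return ranked[:k]


def fake_llm(question: str) -> str:
    """Hypothetical stub; a real system would call an LLM here."""
    return "Each passenger may bring one carry-on bag weighing up to 7 kg."


# Invented toy documents, for illustration only.
DOCS = [
    "Passengers may bring one carry-on bag of up to 7 kg in the cabin.",
    "Refunds are processed within 10 business days of cancellation.",
    "Online check-in opens 24 hours before departure.",
]

question = "How much hand luggage can I take?"
top = hyde_search(question, DOCS, fake_llm)

# Compare how close the question and the hypothetical answer are to the target.
direct_score = cosine(embed(question), embed(DOCS[0]))
hyde_score = cosine(embed(fake_llm(question)), embed(DOCS[0]))
```

In this toy setup the question shares no words with the relevant document while the hypothetical answer shares many, which exaggerates the effect. With a real embedding model the gap is usually smaller and depends on the quality of the generated answer.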

Architecture

Tech Stack

Framework: LangGraph for orchestrating RAG workflows
Vector Store: Qdrant for document storage and retrieval
Embeddings: Ollama (mxbai-embed-large) for local embedding generation
LLM: Ollama Qwen3 for query rewriting, hypothetical answer generation, and final response generation; OpenAI GPT-4o-mini for evaluation
Evaluation: RAGAS framework with metrics for accuracy, recall, precision, and groundedness
Tracking: LangSmith for observability and tracing

RAG Variants

1. Vanilla RAG (`src/simple_rag.py`)

2. Two-Step RAG (`src/two_step.py`)

3. HyDE RAG (`src/hyde_rag.py`)
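Taken together, the three variants can be viewed as one pipeline that differs only in the text sent to semantic search. The sketch below illustrates that idea under stated assumptions: the prompts are placeholders (the real ones live in `src/prompts.yaml`), and `CountingLLM`, `fake_retrieve`, and `run` are hypothetical helpers, not functions from this repository.

```python
from typing import Callable

LLM = Callable[[str], str]
Retriever = Callable[[str], list[str]]


def vanilla_query(question: str, llm: LLM) -> str:
    """Vanilla RAG: search with the user's question unchanged."""
    return question


def two_step_query(question: str, llm: LLM) -> str:
    """Two-Step RAG: ask the LLM to rewrite the question first."""
    return llm(f"Rewrite as a concise search query: {question}")


def hyde_query(question: str, llm: LLM) -> str:
    """HyDE RAG: ask the LLM for a hypothetical answer and search with it."""
    return llm(f"Write a short passage that answers: {question}")


VARIANTS = {"vanilla": vanilla_query, "two_step": two_step_query, "hyde": hyde_query}


class CountingLLM:
    """Stub LLM that only counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"<llm output {self.calls}>"


def fake_retrieve(search_text: str) -> list[str]:
    """Stub retriever; a real one would query the vector store."""
    return [f"<document similar to: {search_text}>"]


def run(variant: str, question: str, llm: LLM, retrieve: Retriever) -> str:
    search_text = VARIANTS[variant](question, llm)  # variant-specific step
    context = "\n".join(retrieve(search_text))      # shared semantic search
    return llm(f"Context:\n{context}\n\nQuestion: {question}")  # shared generation


llm_calls = {}
for name in VARIANTS:
    llm = CountingLLM()
    run(name, "Can I change my flight?", llm, fake_retrieve)
    llm_calls[name] = llm.calls

print(llm_calls)  # {'vanilla': 1, 'two_step': 2, 'hyde': 2}
```

Under these assumptions, Two-Step RAG and HyDE RAG each make one more LLM call per question than Vanilla RAG, which is one source of the extra computational overhead.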

Setup

Prerequisites

Python 3.13+
Docker (for Qdrant)
Ollama with the qwen3 and mxbai-embed-large models

Installation

Clone the repository:

git clone "https://github.com/thapar25/HyDE"
cd HyDE

Install dependencies using uv:

uv sync

Start Qdrant using Docker:

docker-compose up -d

Configure environment variables: Create a .env file with the following:

QDRANT_HOST=localhost
QDRANT_PORT=6333
COLLECTION_NAME=faqs

OPENAI_API_KEY=your_openai_api_key

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_PROJECT=hyde-rag
LANGSMITH_API_KEY=your_langsmith_api_key

Pull the Ollama models:

ollama pull qwen3
ollama pull mxbai-embed-large

Data Preparation

Generate synthetic ground truth dataset:

jupyter notebook notebooks/1-generate_ground_truth.ipynb

Ingest data into Qdrant:

jupyter notebook notebooks/2-ingest_data.ipynb

Running Experiments

Execute the evaluation notebook to compare all three RAG variants:

jupyter notebook notebooks/3-evals.ipynb

The evaluation measures:

Answer Accuracy: How correct is the generated answer compared to ground truth
Context Recall: How much relevant context was retrieved
Context Precision: How precise is the retrieved context
Response Groundedness: How well the response is grounded in retrieved context
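To build intuition for the two retrieval metrics, here is a deliberately simplified, ID-based version of context precision and context recall. It is not how RAGAS computes them (its metrics are typically LLM-judged and can take ranking into account), and the document IDs are made up for the example.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved documents that are relevant (simplified)."""
    if not retrieved:
        return 0.0
    return sum(doc_id in relevant for doc_id in retrieved) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of relevant documents that were retrieved (simplified)."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical IDs: three documents retrieved, two documents actually relevant.
retrieved = ["faq-12", "faq-07", "faq-31"]
relevant = {"faq-12", "faq-40"}

precision = context_precision(retrieved, relevant)  # 1 of 3 retrieved is relevant
recall = context_recall(retrieved, relevant)        # 1 of 2 relevant was retrieved
```

A retriever can score well on one metric and poorly on the other: returning more documents tends to raise recall while lowering precision.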

Analyze results:

jupyter notebook notebooks/4-analyze.ipynb

Key Findings

The experiment revealed mixed results across different RAG approaches:

Different variants excelled at different metrics
Performance varied based on query complexity and context requirements
Trade-offs exist between retrieval quality and computational overhead

Detailed analysis and visualizations can be found in the artifacts/ directory after running the evaluation notebooks.

(Optional) LangGraph Graph API

Because the graphs are written with LangGraph, the workflows can also be run and inspected interactively in the LangSmith UI. Start the local development server with:

langgraph dev

Project Structure

HyDE/
├── src/
│   ├── simple_rag.py       # Vanilla RAG implementation
│   ├── two_step.py         # Query rewriting RAG implementation
│   ├── hyde_rag.py         # HyDE RAG implementation
│   ├── utils.py            # Shared utilities and state definitions
│   └── prompts.yaml        # Prompt templates
├── notebooks/
│   ├── 1-generate_ground_truth.ipynb
│   ├── 2-ingest_data.ipynb
│   ├── 3-evals.ipynb
│   └── 4-analyze.ipynb
├── artifacts/              # Generated datasets and results
├── docker-compose.yaml     # Qdrant configuration
└── pyproject.toml          # Project dependencies
