Thesis RAG Chat

A RAG (Retrieval-Augmented Generation) app that lets you chat with a PDF thesis. It ingests the document into a vector database and uses semantic search + an LLM to answer questions grounded in the actual content.

Tech Stack

FastAPI — API server
LangChain — RAG pipeline (LCEL)
ChromaDB — vector store
OpenAI — embeddings (text-embedding-3-small) + chat (gpt-4o-mini)
pymupdf4llm — PDF to Markdown conversion
PostgreSQL — query logging
Docker Compose — containerized deployment

How It Works

Ingestion: PDF is converted to Markdown, split by document headers (chapters/sections), then sub-split into ~1500-char chunks. Each chunk is embedded and stored in ChromaDB with structural metadata.
Query: User question is embedded, top-k similar chunks are retrieved, and the LLM generates an answer citing specific chapters and pages.

Project Structure

app/
  main.py          # FastAPI entry point
  api.py           # /query, /health, /history endpoints
  schemas.py       # Pydantic models
  static/          # Chat UI (HTML + CSS + JS)
rag/
  ingest.py        # PDF chunking (hybrid structural + recursive)
  vectorstore.py   # ChromaDB setup
  rag_chain.py     # Retrieval + LLM chain
  prompts.py       # Prompt template
db/
  database.py      # SQLAlchemy engine
  models.py        # QueryLog model

Setup

Docker (recommended)

# 1. Create .env with your OpenAI key
cp .env.example .env

# 2. Place your thesis PDF
cp your_thesis.pdf thesis/master_thesis.pdf

# 3. Start services
docker compose up --build

# 4. Ingest the PDF (once)
docker compose exec api python -c "from rag.ingest import ingest_pdf; print(ingest_pdf('thesis/master_thesis.pdf'))"

# 5. Open http://localhost:8000

Local

pip install -r requirements.txt
# Set up PostgreSQL and update DATABASE_URL in .env
uvicorn app.main:app --reload

API

Endpoint	Method	Description
`/`	GET	Chat UI
`/query`	POST	`{"question": "..."}` → `{"answer": "...", "sources": [...]}`
`/health`	GET	Health check
`/history`	GET	Last 20 queries

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
app		app
db		db
rag		rag
scripts		scripts
thesis		thesis
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Thesis RAG Chat

Tech Stack

How It Works

Project Structure

Setup

Docker (recommended)

Local

API

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Thesis RAG Chat

Tech Stack

How It Works

Project Structure

Setup

Docker (recommended)

Local

API

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages