A lightweight end-to-end Retrieval-Augmented Generation (RAG) demo. You upload a PDF, the backend indexes it into a vector store, and you chat with an agent that can search the PDF, the web, and arXiv. The backend is built with FastAPI + LangGraph; the frontend is a simple Streamlit chat UI.
## Project structure

```
.
├── client/          # Streamlit chat UI
├── server/          # FastAPI app, LangGraph agent, RAG stack, observability helpers
├── shared/          # Small shared utilities
├── evaluation/      # RAGAS evaluation runner
├── pyproject.toml   # Python dependencies
└── README.md
```
- `server/main.py`: FastAPI routes to upload a PDF and chat. Creates sessions, builds the RAG stack, and stores the LangGraph agent per session.
- `server/rag/*`: PDF loader, HuggingFace embeddings, Chroma vector store builder.
- `server/agent/*`: Tooling (PDF search, web search via Serper, arXiv search) and the LangGraph graph definition.
- `client/app.py`: Streamlit chat surface that uploads the PDF, then sends chat turns to the API.
- `evaluation/run_ragas.py`: Skeleton for running RAGAS evaluations on stored conversations.
- `shared/utils.py`: Simple helpers (e.g., PDF file hashing).
## Requirements

- Python 3.11+
- `uv` or `pip` for installing dependencies
- Access tokens:
  - `GROQ_API_KEY` for the LLM
  - `SERPER_API_KEY` for Google Serper search
- A PDF with extractable text (PyPDF is used for parsing)
## Setup

1. (Recommended) Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   ```

2. Install dependencies (using `uv` or `pip`):

   ```bash
   uv pip install -r <(uv pip compile pyproject.toml)
   ```

   or

   ```bash
   pip install -e .
   ```

3. Create a `.env` file in the project root with your keys:

   ```env
   GROQ_API_KEY=your_groq_key
   SERPER_API_KEY=your_serper_key
   ```
## Run the API server

```bash
uvicorn server.main:app --reload --port 8000
```

What it does on first PDF upload:
- Saves the PDF to `.rag_workspace/<hash>.pdf`
- Splits & embeds text with `sentence-transformers/all-MiniLM-L6-v2`
- Stores vectors in a persisted Chroma DB
- Builds tools: PDF search, Serper web search, arXiv search
- Spins up a LangGraph agent bound to those tools
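The split step above can be illustrated with a minimal fixed-size splitter with overlap — a toy stand-in for the recursive character splitter commonly used in RAG pipelines, not the repo's actual implementation; chunk sizes are illustrative:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks.

    The overlap means sentences near a chunk boundary appear in both
    neighboring chunks, so retrieval doesn't lose context at the seams.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded with the sentence-transformers model and written to the persisted Chroma collection.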
## Run the Streamlit client

In a separate terminal:

```bash
streamlit run client/app.py --server.port 8501
```

Then open http://localhost:8501.
## How it works

- Upload a PDF via the Streamlit UI.
- The backend indexes the PDF and creates a session-specific agent.
- Each question is sent to `/chat` along with the session ID.
- The LangGraph agent decides whether to call tools (PDF search, web search, arXiv) before replying.
- The latest assistant message is returned to the UI and shown in the chat history.
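The decide-then-act cycle above can be sketched in plain Python. This is a toy illustration, not the actual LangGraph graph: a real agent lets the LLM choose tools via bound schemas, whereas the keyword routing below merely stands in for that decision, and the tool names are hypothetical:

```python
from typing import Callable


def run_agent(question: str, tools: dict[str, Callable[[str], str]]) -> str:
    """Toy decide-then-act loop: pick one tool, call it, then answer.

    Keyword matching stands in for the LLM's tool-choice step.
    """
    q = question.lower()
    if "arxiv" in q:
        choice = "arxiv_search"
    elif "web" in q or "latest" in q:
        choice = "web_search"
    else:
        choice = "pdf_search"  # default: ground the answer in the uploaded PDF
    evidence = tools[choice](question)
    return f"[{choice}] {evidence}"


# Stub tools standing in for the PDF retriever, Serper, and arXiv wrappers.
tools = {
    "pdf_search": lambda q: f"top PDF passage for {q!r}",
    "web_search": lambda q: f"top web result for {q!r}",
    "arxiv_search": lambda q: f"top arXiv abstract for {q!r}",
}
```

The real graph also loops: after a tool call, the LLM sees the tool output and may call another tool or produce the final reply.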
## API endpoints

- `GET /health` → `{ "status": "ok" }`
- `POST /upload_pdf` (multipart `file`) → `{ "session_id": "..." }`
- `POST /chat` (JSON `{ session_id, message }`) → `{ "answer": "..." }`
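A minimal Python client for the chat endpoint might look like this, using only the standard library (a sketch: the request and response shapes follow the endpoint list above, and `API_BASE` is your deployment's address):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # adjust for your deployment


def build_chat_payload(session_id: str, message: str) -> dict:
    """Body for POST /chat, matching the documented JSON shape."""
    return {"session_id": session_id, "message": message}


def chat(session_id: str, message: str) -> str:
    """Send one chat turn and return the assistant's answer."""
    body = json.dumps(build_chat_payload(session_id, message)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```

The `session_id` comes from the `POST /upload_pdf` response, so a full round trip is upload first, then chat.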
## Configuration

`server/config.py` loads settings from environment variables, with defaults:

- `GROQ_API_KEY` (required)
- `SERPER_API_KEY` (required)
- `MODEL_NAME` (default: `moonshotai/kimi-k2-instruct-0905`)
- `EMBEDDING_MODEL` (default: `sentence-transformers/all-MiniLM-L6-v2`)
- `WORKSPACE_DIR` (default: `.rag_workspace`)
If keys are missing, errors are surfaced in Streamlit to help you catch setup issues early.
## Evaluation

`evaluation/run_ragas.py` is a starter script for running RAGAS evaluations against saved conversations. Customize it with your dataset and metrics to benchmark answer quality.
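RAGAS-style metrics generally expect records with the question, the generated answer, and the retrieved contexts. Assembling those records from stored conversations might look like the sketch below (the stored-conversation shape here is an assumption; adapt the field access to however your conversations are actually logged):

```python
def to_ragas_records(conversations: list[dict]) -> list[dict]:
    """Flatten stored chat turns into RAGAS-style evaluation records.

    Assumes each conversation is {"turns": [{"question", "answer",
    "contexts"}, ...]} — a hypothetical storage format.
    """
    records = []
    for convo in conversations:
        for turn in convo.get("turns", []):
            records.append({
                "question": turn["question"],
                "answer": turn["answer"],
                "contexts": turn.get("contexts", []),  # retrieved chunks, if logged
            })
    return records
```

A list like this can then be turned into the dataset object the evaluation library expects and scored with metrics such as faithfulness and answer relevancy.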
## Notes

- Make sure `GROQ_API_KEY` and `SERPER_API_KEY` are set before starting the API.
- Remove `.rag_workspace` to clear cached PDFs and vector stores.
- If a PDF has no extractable text, PyPDF will raise an error during upload.
- For cross-origin setups, adjust `API_BASE` in `client/app.py` to point to your API host.
## Contributing

Feel free to open issues or PRs that improve reliability, add new tools, or enhance the UI. Keep code simple and documented so the project stays approachable.
## License

MIT (see the repository license if provided).