A powerful Retrieval-Augmented Generation (RAG) system for processing multiple PDFs and delivering accurate, context-aware responses to user queries. Built in Python, it features GPU-accelerated embeddings, a semantic search engine, and the Mistral model via Ollama, wrapped in an intuitive Streamlit interface.
- 📚 Multi-PDF Support – Upload and process several PDFs at once.
- ⚡ GPU-Accelerated Embeddings – Leverage NVIDIA GPUs for fast processing.
- 🔍 Semantic Search – Uses FAISS for efficient context retrieval with source tracking.
- 🖥️ Streamlit Interface – Clean, interactive web UI for document upload, chat, and performance monitoring.
- 🧩 Highly Configurable – Adjust chunk size, overlap, batch size, and worker threads.
- 📊 Real-time Metrics – Monitor chunking speed, memory usage, and system stats.
- 💬 Chat History Export – Save and download all chat sessions in JSON format.
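To make the pipeline concrete, below is a minimal sketch of the retrieve-then-generate loop these features describe, assuming Ollama is serving Mistral locally on its default port; the `embed` helper and sample chunks are illustrative stand-ins, not the repo's actual code.

```python
import faiss
import requests
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-small")
model = AutoModel.from_pretrained("intfloat/e5-small")

def embed(texts):
    # e5 models expect "query: " / "passage: " prefixes on their inputs.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)   # mask out padding tokens
    pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean-pool over tokens
    return torch.nn.functional.normalize(pooled, dim=-1).numpy()

chunks = ["passage: FAISS is a library for vector similarity search.",
          "passage: Ollama serves local LLMs over an HTTP API."]
vectors = embed(chunks)
index = faiss.IndexFlatIP(vectors.shape[1])  # inner product = cosine on unit vectors
index.add(vectors)

question = "What does FAISS do?"
_, ids = index.search(embed([f"query: {question}"]), 1)
context = chunks[ids[0][0]]

answer = requests.post("http://localhost:11434/api/generate", json={
    "model": "mistral",
    "prompt": f"Context:\n{context}\n\nQuestion: {question}",
    "stream": False,
}).json()["response"]
print(answer)
```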
| Field | Details |
|---|---|
| Repo Name | Advanced-RAG-Chatbot |
| License | MIT |
| Language | Python |
| Model | Mistral via Ollama |
| Embeddings | intfloat/e5-small (HuggingFace) |
| Interface | Streamlit |
| Vector Store | FAISS |
| PDF Processor | PyMuPDF |
| Status | 🚧 Actively maintained |
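As a quick illustration of the PDF-processing layer listed above, here is a minimal PyMuPDF extraction sketch; the file name is a placeholder:

```python
import fitz  # PyMuPDF

doc = fitz.open("example.pdf")             # placeholder file name
pages = [page.get_text() for page in doc]  # one string per page
doc.close()
print(f"Extracted {len(pages)} pages, {sum(map(len, pages))} characters")
```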
| Type | Minimum | Recommended |
|---|---|---|
| CPU | 4-core | 8-core |
| RAM | 8 GB | 16 GB |
| Storage | 10 GB | 20 GB |
| GPU (Optional) | ❌ | ✅ NVIDIA (GTX 1060 or higher) |
- Python 3.8+
- Git
- CUDA Toolkit 11.2+ (for GPU acceleration)
- Ollama (≥ 0.1.0)
Install Python from https://python.org, then verify the version:

```bash
python --version   # Ensure it's >= 3.8
```

Install Git from https://git-scm.com, then verify:

```bash
git --version
```

Clone the repository:

```bash
git clone https://github.com/milind899/Advanced-RAG-Chatbot.git
cd Advanced-RAG-Chatbot
```

Create and activate a virtual environment:

```bash
python -m venv venv

# Windows:
venv\Scripts\activate

# macOS/Linux:
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Key packages:

- `streamlit` – Frontend UI
- `langchain` – RAG orchestration
- `torch` – Model + CUDA acceleration
- `transformers` – Embedding generation
- `faiss-cpu` – Vector similarity search
- `PyMuPDF` – PDF parser

Install Ollama, pull the Mistral model, and start the server:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull mistral
ollama serve
```

Verify the server is responding:

```bash
curl http://localhost:11434/api/tags
```

(Optional) Enable GPU acceleration:

```bash
# Verify CUDA
nvidia-smi

# Enable GPU
unset OLLAMA_NO_CUDA

# Test with PyTorch
python -c "import torch; print(torch.cuda.is_available())"
```

Launch the app:

```bash
streamlit run app.py
```

🌐 Open your browser at: http://localhost:8501
- Open the app in your browser.
- Drag-and-drop or browse to upload PDFs.
- Adjust settings from the sidebar (see the chunking sketch after this list):
  - Chunk Size (default: 500)
  - Overlap (default: 50)
  - Workers (default: CPU count)
  - Batch Size (default: 100)
- Click Process Documents.
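The chunking settings above map naturally onto a text splitter. Here is a hedged sketch using LangChain's `RecursiveCharacterTextSplitter` (langchain is in requirements.txt, though whether the app uses this exact class is an assumption):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # "Chunk Size" sidebar setting
    chunk_overlap=50,  # "Overlap" sidebar setting
)
sample_text = "Lorem ipsum dolor sit amet. " * 200  # stand-in for extracted PDF text
chunks = splitter.split_text(sample_text)
print(f"{len(chunks)} chunks")
```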
- Go to the Chat with Your Documents section.
- Type a question and hit Send.
- View responses with source links.
- Review chat history at the bottom.
- Export your chats via the Export JSON button.
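For reference, a JSON export like the one behind the Export JSON button can be built with Streamlit's `st.download_button`; the `history` session key below is an assumption, not necessarily the app's actual state layout:

```python
import json
import streamlit as st

# Assumed session key; the app's real state layout may differ.
history = st.session_state.get("history", [])
st.download_button(
    label="Export JSON",
    data=json.dumps(history, indent=2),
    file_name="chat_history.json",
    mime="application/json",
)
```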
- View chunks/sec and total processing time in Performance Metrics.
- Sidebar shows GPU & system usage.
- Troubleshoot with real-time logs.
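A chunks/sec figure like the one in Performance Metrics reduces to a simple timer around the processing loop; `process` below is a placeholder, not the app's real function:

```python
import time

def process(chunk):
    pass  # placeholder for the real embed-and-index work

chunks = ["example chunk"] * 1000  # stand-in data
start = time.perf_counter()
for chunk in chunks:
    process(chunk)
elapsed = time.perf_counter() - start
print(f"{len(chunks) / elapsed:.1f} chunks/sec over {elapsed:.3f}s")
```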
```
Advanced-RAG-Chatbot/
├── app.py               # Streamlit UI
├── rag_backend.py       # Core RAG logic
├── requirements.txt     # Dependencies
├── LICENSE              # MIT License
├── faiss_index/         # Generated vector store
└── embedding_cache/     # Cached embeddings
```
| Parameter | Description | Default |
|---|---|---|
| `chunk_size` | Text chunk size | 500 |
| `chunk_overlap` | Overlap between chunks | 50 |
| `max_workers` | Concurrent threads | CPU count |
| `batch_size` | Documents per embedding batch | 100 |
| `embedding_model` | HuggingFace model | `intfloat/e5-small` |
| `cache_embeddings` | Use embedding cache | `True` |
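To show how these parameters might wire together, here is a hedged sketch using LangChain's FAISS integration; the actual code in rag_backend.py may differ, and `HuggingFaceEmbeddings` additionally needs the sentence-transformers package:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-small")
chunks = ["passage: example chunk"]  # output of the text splitter in practice
store = FAISS.from_texts(chunks, embeddings)
store.save_local("faiss_index")      # the folder shown in the project layout
```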
| Problem | Fix |
|---|---|
| ❌ Ollama not responding | Ensure `ollama serve` is running and the model is pulled. |
| ❌ CUDA not available | Install the CUDA Toolkit, check `nvidia-smi`, and enable the GPU with `unset OLLAMA_NO_CUDA`. |
| 💥 Memory crash | Reduce `batch_size` or `chunk_size`, and clear the cache folders. |
| 🐢 Slow performance | Enable the GPU, increase `max_workers`, and tune the chunking strategy. |
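The "clear cache folders" fix above amounts to deleting the generated index and embedding cache so they are rebuilt on the next run; a small stdlib sketch:

```python
import pathlib
import shutil

for folder in ("faiss_index", "embedding_cache"):
    path = pathlib.Path(folder)
    if path.exists():
        shutil.rmtree(path)
        print(f"Removed {folder}/")
```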
- 🧠 Use the Help & Tips section in the app.
- 🐛 File issues or suggestions on GitHub Issues.
- 📖 Refer to `rag_backend.py` for logic and model details.
This project is licensed under the MIT License. See LICENSE for more information.