spoturno/research-assistant
📚 Research Assistant - Talk to Your Papers

A local RAG (Retrieval-Augmented Generation) system that allows you to upload PDFs and have conversations about their content using local LLMs via Ollama.

🏗️ Architecture

![Architecture Diagram](architecture_diagram.png)

🔄 RAG Pipeline

![RAG Pipeline Diagram](rag_diagram.png)

The RAG (Retrieval-Augmented Generation) pipeline processes your documents through several stages:

  1. Document Loading: PyMuPDF for PDF processing
  2. Text Splitting: Recursive character splitter producing fixed-size chunks with overlap, so context is preserved across chunk boundaries
  3. Embeddings: Nomic embeddings via Ollama (nomic-embed-text)
  4. Vector Store: In-memory vector store with similarity search
  5. Retrieval: Context-aware similarity search with relevance scoring
  6. Generation: LangGraph workflow with query rewriting using llama3.2:3b
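The stages above can be sketched in miniature. The snippet below is a toy illustration, not the repository's code: a sliding-window splitter stands in for the recursive character splitter, and a bag-of-words `Counter` stands in for real nomic-embed-text embeddings, but the store-and-retrieve flow (stages 2, 4, and 5) has the same shape:

```python
# Toy illustration of stages 2, 4, and 5: overlapping chunking, an
# in-memory vector store, and cosine-similarity retrieval. The Counter
# "embedding" is a stand-in for nomic-embed-text via Ollama.
import math
from collections import Counter

def split_text(text, chunk_size=30, overlap=10):
    """Sliding-window character splitting with overlap between chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy embedding: a word-count vector (the real system calls Ollama)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    def __init__(self):
        self.entries = []  # (embedding, chunk) pairs

    def add(self, chunks):
        self.entries += [(embed(c), c) for c in chunks]

    def similarity_search(self, query, k=2):
        qv = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]),
                        reverse=True)
        return [chunk for _, chunk in ranked[:k]]

store = ToyVectorStore()
store.add(split_text("retrieval augmented generation grounds answers in documents"))
print(store.similarity_search("retrieval", k=1))
# ['retrieval augmented generation']
```

In the actual pipeline these roles are played by the recursive character splitter, nomic-embed-text, and the in-memory store in backend/rag/vector_store.py.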

🚀 Quick Start

Prerequisites

  • Ollama: Install from ollama.ai and have it running
  • Docker & Docker Compose: For containerized backend and frontend
  • System Requirements: 8GB+ RAM, 2GB free disk space

One-Command Setup

For Windows:

./start.bat

For Linux/macOS:

./start.sh

Or manually with Docker Compose:

# First, ensure Ollama is running: ollama serve
# Then pull required models: ollama pull nomic-embed-text && ollama pull llama3.2:3b
docker compose up

The setup process will:

  1. ✅ Verify system requirements and dependencies
  2. 🤖 Check/download required AI models (nomic-embed-text and llama3.2:3b)
  3. 🐳 Build and start the backend API (FastAPI)
  4. 🖥️ Build and start the frontend UI (Streamlit)
  5. 🔗 Connect everything to your native Ollama instance
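The model check in step 2 can also be performed programmatically. Below is a minimal sketch assuming Ollama's documented `/api/tags` response shape (`{"models": [{"name": ...}]}`); the rule that a bare repo name matches any `repo:tag` is this sketch's own convenience, not something the startup scripts necessarily do:

```python
# Pre-flight sketch for step 2: parse Ollama's /api/tags response and
# report required models that are not installed. The JSON shape matches
# Ollama's documented API; the name-matching rule is an assumption.
import json

REQUIRED = {"nomic-embed-text", "llama3.2:3b"}

def missing_models(tags_json: str) -> set[str]:
    """Return required models absent from an /api/tags payload."""
    installed = {m["name"] for m in json.loads(tags_json)["models"]}
    return {
        r for r in REQUIRED
        if r not in installed
        and not any(name.split(":")[0] == r for name in installed)
    }

# Against a live instance:
#   import urllib.request
#   with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
#       print(missing_models(resp.read().decode()))
sample = '{"models": [{"name": "nomic-embed-text:latest"}]}'
print(missing_models(sample))  # {'llama3.2:3b'}
```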

Access the Application

Once the containers are up, open the Streamlit frontend at http://localhost:8501.

⚠️ Important Note: If Streamlit prints 0.0.0.0:8501 in its console output, that is a bind address, not a browsable URL; use http://127.0.0.1:8501 or http://localhost:8501 in your browser instead.

📖 How to Use

  1. Upload Documents:

    • Add one or more PDF files through the upload interface
  2. Ask Questions:

    • Type your questions in the chat interface
    • Get AI-powered answers based on your documents
  3. Manage Session:

    • Clear chat history
    • Remove documents
    • Start fresh conversations

🔧 Development

Project Structure

.
├── backend/                # FastAPI backend
│   ├── api/               # API endpoints and dependencies
│   │   └── endpoints/     # Document and chat endpoints
│   ├── rag/               # RAG pipeline components
│   │   ├── agent.py       # LangGraph workflow agent
│   │   ├── loader.py      # PDF document loader
│   │   ├── embedder.py    # Ollama embeddings
│   │   ├── splitter.py    # Text chunking
│   │   ├── vector_store.py # In-memory vector storage
│   │   └── retriever.py   # Document retrieval
│   ├── services/          # Business logic services
│   ├── models/            # Pydantic DTOs
│   ├── config.py          # Configuration management
│   ├── main.py            # FastAPI application
│   ├── requirements.txt   # Python dependencies
│   └── Dockerfile         # Backend container
├── ui/                    # Streamlit frontend
│   ├── ui.py              # Main UI application
│   ├── ui-requirements.txt # UI dependencies
│   └── Dockerfile         # Frontend container
├── architecture_diagram.png # System architecture
├── rag_diagram.png       # RAG pipeline diagram
├── docker-compose.yml    # Service orchestration
├── start.sh              # Linux/macOS startup script
├── start.bat             # Windows startup script
└── README.md

Local Development

Backend:

cd backend
pip install -r requirements.txt
python main.py

Frontend:

cd ui
pip install -r ui-requirements.txt
streamlit run ui.py

Note: Ensure Ollama is running locally with required models before starting development.

📊 API Endpoints

  • POST /documents - Upload PDF document
  • GET /documents - List uploaded documents
  • DELETE /documents - Clear all documents
  • POST /chat - Send message and get AI response
  • GET /messages - Get chat history
  • DELETE /messages - Clear chat history
  • GET /health - Health check
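A thin client for these endpoints needs only the standard library. The base URL (localhost:8000, uvicorn's common default) and the `{"message": ...}` request body shape are assumptions here, not values confirmed from the repository; check docker-compose.yml and the Pydantic DTOs in backend/models for the real ones:

```python
# Standard-library client sketch for the endpoints above. The base URL
# and the chat body shape are assumptions; requests are built here and
# only sent when passed to urlopen().
import json
import urllib.request

BASE = "http://localhost:8000"  # assumed backend port

def chat_request(message: str) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request."""
    return urllib.request.Request(
        BASE + "/chat",
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def health_request() -> urllib.request.Request:
    """Build a GET /health request."""
    return urllib.request.Request(BASE + "/health", method="GET")

# To send (requires the backend to be running):
#   with urllib.request.urlopen(chat_request("Summarize the paper")) as r:
#       print(json.load(r))
```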

🐛 Troubleshooting

Common Issues

  1. Ollama not running:

    • Ensure Ollama is installed: Visit ollama.ai
    • Start Ollama: ollama serve
    • Verify it's running: curl http://localhost:11434/api/version
  2. Models not available:

    • Download required models: ollama pull nomic-embed-text && ollama pull llama3.2:3b
    • Check available models: ollama list
    • The startup scripts download missing models automatically
  3. Connection issues:

    • Verify Ollama is accessible: curl http://localhost:11434/api/tags
    • Check Docker containers: docker ps
    • Review logs: docker compose logs
  4. Out of memory:

    • Ensure at least 8GB RAM available
    • Close other applications
    • Consider using smaller models if issues persist

Logs & Debugging

# View all service logs
docker compose logs

# View specific service logs
docker compose logs backend
docker compose logs frontend

# Check Ollama status
ollama list
curl http://localhost:11434/api/version

Reset Everything

# Stop all services
docker compose down -v

# Clean Docker system
docker system prune -f

# Restart fresh
./start.sh  # or ./start.bat on Windows

🔐 Technical Details

Technologies

  • Backend: FastAPI, LangChain, LangGraph, Ollama integration
  • Frontend: Streamlit with intuitive file upload and chat interface
  • Containerization: Docker & Docker Compose
  • LLMs: Native Ollama with local models
  • Vector Storage: In-memory vector store for fast retrieval
  • Document Processing: PyMuPDF for robust PDF handling

Configuration

The system is configured to work with:

  • Embedding Model: nomic-embed-text (via Ollama)
  • Chat Model: llama3.2:3b (via Ollama)
  • Native Ollama: Running on localhost:11434
  • Auto-discovery: Backend automatically connects to local Ollama instance
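Settings like these are typically surfaced as environment-overridable defaults, which makes it easy to point the Docker containers at a native Ollama instance. A hypothetical sketch (the actual names live in backend/config.py):

```python
# Illustrative settings mirroring the configuration above. The variable
# names are assumptions; the real backend defines its own in
# backend/config.py. Environment variables override the defaults.
import os

def setting(name: str, default: str) -> str:
    """Read a setting from the environment, falling back to a default."""
    return os.getenv(name, default)

OLLAMA_BASE_URL = setting("OLLAMA_BASE_URL", "http://localhost:11434")
EMBEDDING_MODEL = setting("EMBEDDING_MODEL", "nomic-embed-text")
CHAT_MODEL = setting("CHAT_MODEL", "llama3.2:3b")
```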
