Local RAG Chat App

A fully local, privacy-friendly Retrieval-Augmented Generation (RAG) chat application powered by Reflex (frontend), LangChain (RAG pipeline), HuggingFace (embeddings), FAISS (vector store), and Ollama (local LLM inference). Built following the Apideck blog guide.


Features

  • Fully Local: No cloud dependencies; all data and inference stay on your machine.
  • Interactive Chat UI: Built with Reflex for a modern, responsive experience.
  • RAG Pipeline: Uses LangChain for document retrieval and LLM orchestration.
  • HuggingFace Embeddings: Power semantic search and retrieval.
  • FAISS Vector Store: Fast, in-memory document retrieval.
  • Ollama LLM: Local LLM inference (default: gemma3:4b-it-qat).
  • Customizable Dataset: Uses a HuggingFace dataset by default, but can be adapted to your own data.

Project Structure

rag_app/  # Root folder
│
├── .env
├── requirements.txt
├── rxconfig.py
│
└── rag_app/
    ├── __init__.py        # Exposes 'app' for Reflex
    ├── rag_app.py         # Main Reflex app (was rag_gemma_reflex.py)
    ├── rag_logic.py       # RAG backend logic
    └── state.py           # Reflex state and handlers
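
The rag_app/__init__.py file must re-export the app object so Reflex can discover it (this is the import referenced under Troubleshooting below):

# rag_app/__init__.py
from .rag_app import app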

Setup Guide

1. Prerequisites

  • Python 3.8+
  • Ollama installed and running locally
  • (Optional) uv for faster installs

2. Clone & Install Dependencies

git clone https://github.com/lalitnayyar/LocalRAGChatApp.git
cd LocalRAGChatApp
pip install -r requirements.txt
# or
uv pip install -r requirements.txt

3. Configure Environment

Create a .env file in the project root (see Project Structure above) to set the Ollama model name.
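
A minimal .env sketch. The variable name below is illustrative only; check rag_logic.py for the key the app actually reads:

# .env (OLLAMA_MODEL is a hypothetical key; see rag_logic.py for the real one)
OLLAMA_MODEL=gemma3:4b-it-qat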

4. Download Ollama Model

ollama pull gemma3:4b-it-qat

5. Run the App

reflex init
reflex run

Open your browser at http://localhost:3000


Troubleshooting

  • No module named 'rag_app.rag_app': This means Reflex cannot find the main app file. Ensure your directory structure is as above, with rag_app/rag_app.py as the main file and from .rag_app import app in rag_app/__init__.py.
  • Ollama not running: Start it with ollama serve.
  • Model not found: Pull with ollama pull <model_name>.
  • Dataset issues: Ensure you have internet access for the first run to download the dataset.
  • Port conflicts: Reflex serves the frontend on port 3000 by default. Change it in rxconfig.py if needed (see the sketch below).
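
A hedged rxconfig.py sketch for moving the frontend to another port (frontend_port is a standard rx.Config option; app_name must match the inner package name):

# rxconfig.py
import reflex as rx

config = rx.Config(
    app_name="rag_app",   # must match the inner package name
    frontend_port=3001,   # default is 3000; change to avoid conflicts
)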

Usage Guide

  1. Start Ollama locally (ollama serve if not auto-started).
  2. Start the Reflex app as above.
  3. Ask questions in the chat UI. Answers are generated by the local RAG pipeline and LLM.
  4. The app retrieves relevant context from the dataset using embeddings and FAISS, then sends it to the LLM for answer generation (see the sketch below).
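
The retrieve-then-generate flow in steps 3 and 4 can be sketched in a few lines of Python. This is a hedged illustration, not the repo's actual rag_logic.py: the dataset column name ("context"), the top-k value, and the langchain_community import paths are assumptions.

# Minimal RAG sketch (assumptions: langchain-community import paths and a
# "context" column in the dataset; not the repo's exact rag_logic.py).
from datasets import load_dataset
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# Load a small slice of the default dataset for indexing.
ds = load_dataset("neural-bridge/rag-dataset-12000", split="train[:100]")
texts = [row["context"] for row in ds]

# Embed the documents and build an in-memory FAISS index.
embeddings = HuggingFaceEmbeddings()          # default sentence-transformers model
store = FAISS.from_texts(texts, embeddings)

# Retrieve the top-k passages for a question, then ask the local LLM.
llm = Ollama(model="gemma3:4b-it-qat")
question = "Example question about the dataset"
docs = store.as_retriever(search_kwargs={"k": 3}).invoke(question)
context = "\n\n".join(d.page_content for d in docs)
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))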

Functional Guide

  • Chat: Type your question and hit "Ask". The app retrieves relevant context and generates an answer.
  • Dataset: By default, the app uses a subset of the neural-bridge/rag-dataset-12000 dataset. You can change this in rag_logic.py.
  • Model: The default LLM is gemma3:4b-it-qat. You can change the model in .env and pull it with Ollama.
  • Vector Store: On first run, the app builds a FAISS index for fast retrieval; subsequent runs load the index from disk (see the sketch after this list).
  • Error Handling: If the LLM or vector store is not available, errors will be shown in the console and in the chat.
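
The build-once, load-later behaviour described under Vector Store maps onto FAISS's save_local / load_local helpers in langchain-community. A hedged sketch (the index directory name is hypothetical, and the allow_dangerous_deserialization flag is only required on newer langchain-community releases):

import os

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

INDEX_DIR = "faiss_index"  # hypothetical path; the app may use a different one
embeddings = HuggingFaceEmbeddings()

if os.path.isdir(INDEX_DIR):
    # Subsequent runs: load the persisted index instead of re-embedding.
    store = FAISS.load_local(INDEX_DIR, embeddings,
                             allow_dangerous_deserialization=True)
else:
    # First run: embed the documents, build the index, and persist it.
    store = FAISS.from_texts(["your documents here"], embeddings)
    store.save_local(INDEX_DIR)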


References

  • Apideck blog: Building a Local RAG Chat App with Reflex, LangChain, Huggingface, and Ollama: https://www.apideck.com/blog/building-a-local-rag-chat-app-with-reflex-langchain-huggingface-and-ollama

License

MIT (or as preferred)
