A fully local, privacy-friendly Retrieval-Augmented Generation (RAG) chat application powered by Reflex (frontend), LangChain (RAG pipeline), HuggingFace (embeddings), FAISS (vector store), and Ollama (local LLM inference). Built following the Apideck blog guide.
- Fully Local: No cloud dependencies, all data and inference runs on your machine.
- Interactive Chat UI: Built with Reflex for a modern, responsive experience.
- RAG Pipeline: Uses LangChain for document retrieval and LLM orchestration.
- HuggingFace Embeddings: For semantic search and retrieval.
- FAISS Vector Store: Fast, in-memory document retrieval.
- Ollama LLM: Local LLM inference (default: `gemma3:4b-it-qat`).
- Customizable Dataset: Uses a HuggingFace dataset by default, but can be adapted.
```
rag_app/                  # Root folder
│
├── .env
├── requirements.txt
├── rxconfig.py
│
└── rag_app/
    ├── __init__.py       # Exposes 'app' for Reflex
    ├── rag_app.py        # Main Reflex app (was rag_gemma_reflex.py)
    ├── rag_logic.py      # RAG backend logic
    └── state.py          # Reflex state and handlers
```
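For Reflex to resolve `rag_app.rag_app`, the package's `__init__.py` only needs to re-export the app object (this is the same import the troubleshooting section checks for):

```python
# rag_app/__init__.py -- exposes 'app' so Reflex can import it
from .rag_app import app
```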
```bash
pip install -r requirements.txt
# or
uv pip install -r requirements.txt
```
- Edit `.env` to set your Ollama model and host if needed.
- Default model: `gemma3:4b-it-qat`
- Default dataset: `neural-bridge/rag-dataset-12000`
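A starting `.env` might look like the following. The variable names here are assumptions; match them to whatever `rag_logic.py` and the app actually read:

```env
# Hypothetical variable names -- align with what the app reads from .env
OLLAMA_MODEL=gemma3:4b-it-qat
OLLAMA_HOST=http://localhost:11434
```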
```bash
ollama pull gemma3:4b-it-qat
reflex init
reflex run
```
Open your browser at http://localhost:3000
- No module named 'rag_app.rag_app': Reflex cannot find the main app file. Ensure your directory structure matches the tree above, with `rag_app/rag_app.py` as the main file and `from .rag_app import app` in `rag_app/__init__.py`.
- Ollama not running: Start it with `ollama serve`.
- Model not found: Pull it with `ollama pull <model_name>`.
- Dataset issues: Ensure you have internet access on the first run so the dataset can be downloaded.
- Port conflicts: Reflex defaults to port 3000. Change it in `rxconfig.py` if needed (see the sketch after this list).
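For the port-conflict case, a minimal `rxconfig.py` sketch is shown below; the exact fields your Reflex version supports may differ, so treat this as illustrative:

```python
# rxconfig.py -- minimal sketch; the port value is illustrative
import reflex as rx

config = rx.Config(
    app_name="rag_app",
    frontend_port=3001,  # move off the default 3000 if it is already taken
)
```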
- Start Ollama locally (`ollama serve` if it is not auto-started).
- Start the Reflex app as above.
- Ask questions in the chat UI. Answers are generated by the local RAG pipeline and LLM.
- The app retrieves relevant context from the dataset using embeddings and FAISS, then sends it to the LLM for answer generation (see the sketch after this list).
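A condensed sketch of that flow is below. It is not the app's actual `rag_logic.py`: the embedding model, the dataset slice, the `context` column name, and the prompt wording are all assumptions layered on the stack the README names (HuggingFace embeddings, FAISS, and Ollama via LangChain):

```python
from datasets import load_dataset
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# Load a small slice of the default dataset (internet needed on first run).
data = load_dataset("neural-bridge/rag-dataset-12000", split="train[:500]")
texts = [row["context"] for row in data]

# Embed the documents and build an in-memory FAISS index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(texts, embeddings)

# Local LLM served by Ollama.
llm = Ollama(model="gemma3:4b-it-qat")

def answer(question: str, k: int = 4) -> str:
    # Retrieve the k most similar contexts, then let the LLM answer from them.
    docs = vector_store.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only this context:\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt)

print(answer("What is retrieval-augmented generation?"))
```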
- Chat: Type your question and hit "Ask". The app retrieves relevant context and generates an answer.
- Dataset: By default, the app uses a subset of the `neural-bridge/rag-dataset-12000` dataset. You can change this in `rag_logic.py`.
- Model: The default LLM is `gemma3:4b-it-qat`. You can change the model in `.env` and pull it with Ollama.
- Vector Store: On first run, the app builds a FAISS index for fast retrieval. Subsequent runs load the index from disk (see the persistence sketch after this list).
- Error Handling: If the LLM or vector store is not available, errors are shown in the console and in the chat.
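The index caching described in the Vector Store note can be done with FAISS's own persistence helpers. A sketch, continuing from the pipeline example above (`embeddings` and `texts` come from there; the index path is illustrative, and recent LangChain versions require the deserialization flag when loading):

```python
import os

from langchain_community.vectorstores import FAISS

INDEX_DIR = "faiss_index"  # illustrative path, not necessarily the app's

if os.path.isdir(INDEX_DIR):
    # Subsequent runs: load the persisted index from disk.
    vector_store = FAISS.load_local(
        INDEX_DIR, embeddings, allow_dangerous_deserialization=True
    )
else:
    # First run: build the index from the embedded texts, then save it.
    vector_store = FAISS.from_texts(texts, embeddings)
    vector_store.save_local(INDEX_DIR)
```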
MIT (or your preferred license)