A private, offline AI assistant with the entire English Wikipedia at your fingertips.
- 100% offline: Ollama + Kiwix + specialized models → answers with inline citations
- Zero API keys, zero telemetry, zero cost
- Pulls 6M+ English Wikipedia articles (≈102 GB) and answers in 10‑20 s on CPU
- One command (./run.sh) to start chatting; optional installer handles everything
git clone https://github.com/macromeer/offline-wikipedia-rag.git
cd offline-wikipedia-rag
./scripts/install.sh # installs Ollama, models, Wikipedia dump, env
./run.sh             # launches the assistant
Installer runtime: 2‑8 hours (mostly download). Disk: ~120 GB free.
./run.sh
The launcher activates the conda env, checks Ollama, starts Kiwix if needed, finds the best local models, and tears everything down when you exit.
| Resource | Recommended |
|---|---|
| OS | Linux (Ubuntu 20.04+/Fedora 35+) or macOS |
| Disk | ≥120 GB free (102 GB Wikipedia + models + env) |
| RAM | 16‑24 GB for Mistral‑7B + Llama‑3.1‑8B |
| CPU/GPU | Multi-core CPU. GPU optional (Ollama auto-detects CUDA/ROCm/Metal). |
Smaller RAM works with smaller models; high-memory rigs can bump to Qwen2.5‑32B or Gemma2‑27B. See docs/TWO_STAGE_AI_PIPELINE.md for pairings.
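The RAM-to-model pairing above can be sketched as a simple lookup. The tier boundaries and the helper name below are illustrative assumptions, not code from the repo:

```python
# Hypothetical sketch: pick a selection/synthesis model pair from available RAM.
# Tier boundaries and model names mirror this README's suggestions, not the
# actual logic in wikipedia_rag_kiwix.py.

def suggest_model_pair(ram_gb: float) -> tuple[str, str]:
    """Return (selection_model, synthesis_model) for a given amount of RAM."""
    if ram_gb >= 32:
        return ("qwen2.5:32b-instruct", "llama3.1:8b")  # high-memory profile
    if ram_gb >= 16:
        return ("mistral:7b", "llama3.1:8b")            # recommended default
    return ("mistral:7b", "llama3.2:3b")                # budget profile

print(suggest_model_pair(24))  # recommended default pair
```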
# 1. Install Ollama and pull models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull mistral:7b # selection
ollama pull llama3.1:8b # synthesis
# 2. Create the Python env
conda env create -f environment.yml
conda activate wikipedia-rag
# 3. Download Wikipedia (long download)
./scripts/setup_full_offline_wikipedia.sh
# 4. Run
./run.sh

# Preferred: handles env + services automatically
./run.sh
# Manual mode
mamba activate wikipedia-rag
python wikipedia_rag_kiwix.py --question "What is machine learning?"
While running you can ask follow-up questions interactively, or pass --question for single-shot mode. Helpful flags:
python wikipedia_rag_kiwix.py --help
--model llama3.1:8b # override synthesis model
--selection-model mistral:7b # override article selector
--max-results 4 # force number of articles
--no-auto-start              # skip auto Kiwix launch
Example session:
✓ Connected to Kiwix server at http://localhost:8080
✓ Selection model: mistral:7b
✓ Summarization model: llama3.1:8b
🔍 Searching local Wikipedia for: What are the goals of NASA?
🔑 Focus keywords: nasa, goals, ...
📄 Fetching article abstracts…
🤖 Selecting with mistral:7b…
✓ AI selected 3 article(s): Goals, NASA, Timeline of Solar System exploration
… answer with citations …
- Privacy by default – all computation happens locally (no telemetry, no API keys)
- Full-text Wikipedia (January 2024) served via Kiwix
- Two-stage pipeline: Mistral‑7B ranks articles using abstracts, Llama‑3.1‑8B synthesizes multi-source answers with inline [1][2][3] citations
- Adaptive retrieval depth: 3‑6 articles depending on question complexity
- Works on CPU, accelerates automatically if a GPU is available
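Adaptive retrieval depth could look something like the heuristic below. This is an assumed sketch; the real complexity estimator in wikipedia_rag_kiwix.py may weigh signals differently:

```python
# Assumed sketch of an adaptive-depth heuristic: simple questions fetch fewer
# articles, comparisons and multi-part questions fetch more.

def estimate_article_count(question: str) -> int:
    q = question.lower()
    count = 3  # baseline for simple factual questions
    if any(w in q for w in ("compare", "versus", " vs ", "difference between")):
        count += 2  # comparisons need articles on at least two subjects
    if " and " in q or ";" in q:
        count += 1  # multi-part questions widen the net
    return min(count, 6)  # cap matches the 3-6 range above

print(estimate_article_count("Compare NASA and ESA goals"))  # → 6
```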
Question → term extraction → Kiwix search → abstract fetch →
Stage 1 (selection model) → full article fetch → Stage 2 (synthesis model)
→ Answer + clickable citations
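The term-extraction step at the head of this pipeline can be sketched roughly as follows. Illustrative only: the stopword list and ranking are assumptions, not the project's actual extractor:

```python
# Illustrative term extraction: keep proper nouns and content words, drop
# stopwords. The real extractor in wikipedia_rag_kiwix.py may differ.
import re

STOPWORDS = {"what", "is", "are", "the", "of", "a", "an", "who", "was", "how"}

def extract_terms(question: str) -> list[str]:
    words = re.findall(r"[A-Za-z][A-Za-z\-]*", question)
    # Capitalized words past the first token are treated as proper nouns
    # and ranked first; remaining content words follow in lowercase.
    proper = [w for w in words[1:] if w[0].isupper()]
    content = [w for w in words if w.lower() not in STOPWORDS and w not in proper]
    return proper + [w.lower() for w in content]

print(extract_terms("What are the goals of NASA?"))  # → ['NASA', 'goals']
```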
Relevant docs:
- docs/TWO_STAGE_AI_PIPELINE.md – model pairing benchmarks
- docs/AUTOMATIC_SETUP.md – what run.sh and the installer configure
mamba activate wikipedia-rag
pytest tests -v # full suite
pytest tests/test_rag_functions.py -v
pytest -m "not integration"      # skip network/Kiwix checks
Unit tests cover search-term extraction, complexity estimation, model detection, and mocked Kiwix flows. CI runs tests.yml on every PR.
offline-wikipedia-rag/
├── run.sh / scripts/ # automation helpers
├── wikipedia_rag_kiwix.py # main application
├── docs/ # design notes and how-tos
├── tests/ # pytest suite
├── environment.yml # conda env
└── README.md
Issues and PRs are welcome! Ideas that help: additional language dumps, GUIs, Docker images, alternative model profiles, or performance/testing improvements. Please see CONTRIBUTING.md for details.
$ python wikipedia_rag_kiwix.py --question "What are the goals of NASA?"
✓ Connected to Kiwix server at http://localhost:8080
✓ Selection model: mistral:7b
✓ Summarization model: llama3.1:8b
🔍 Searching local Wikipedia for: What are the goals of NASA?
✓ Retrieved 11 unique candidates
✓ Found 11 candidate article(s)
📄 Fetching article abstracts for AI selection...
🤖 Selecting with mistral:7b (using article abstracts)...
✓ AI selected 3 article(s): Goals, NASA, Timeline of Solar System exploration
📊 Reading ~20 paragraphs per article (max 8k chars each)
📄 Fetching: Goals
📄 Fetching: NASA
📄 Fetching: Timeline of Solar System exploration
🤖 Generating synthesis with llama3.1:8b...
⏱️ Total time: 13.6s
======================================================================
❓ Question: What are the goals of NASA?
======================================================================
📖 Answer:
NASA's primary goals encompass a wide range of objectives, from advancing
space exploration to conducting aeronautics research. The agency was
established in 1958 [2] as an independent federal agency responsible for
the civil space program, aeronautics research, and space research.
NASA's early goals focused on achieving human spaceflight, which began with
Project Mercury [2]. The Apollo Program, launched in response to President
Kennedy's goal of landing an American on the Moon by the end of the 1960s [3],
marked a significant achievement in space exploration.
NASA's goals also extend beyond human spaceflight to exploring the Solar
System [3]. The agency has sent numerous robotic spacecraft to explore
various planets and celestial bodies, greatly expanding our understanding
of the universe.
----------------------------------------------------------------------
📚 Source Articles (click to open):
[1] Goals
http://localhost:8080/wikipedia_en_all_maxi_2024-01/A/Goals
[2] NASA
http://localhost:8080/wikipedia_en_all_maxi_2024-01/A/NASA
[3] Timeline of Solar System exploration
http://localhost:8080/content/wikipedia_en_all_maxi_2024-01/A/Timeline_of_Solar_System_exploration
======================================================================

# Clone and run automated setup
git clone https://github.com/yourusername/offline-wikipedia-rag.git
cd offline-wikipedia-rag
./scripts/install.sh
The installer will:
- ✅ Install Ollama and recommended AI models (Mistral-7B + Llama-3.1-8B)
- ✅ Download complete Wikipedia (102GB)
- ✅ Set up all dependencies
- ✅ Test the system
After installation, it's super simple - no manual steps needed:
./run.sh
That's it! The script automatically:
- ✅ Activates the correct Python environment
- ✅ Checks if Ollama is running
- ✅ Starts Kiwix server if needed
- ✅ Detects best AI models available
- ✅ Cleans up when you exit
Time needed: 2-8 hours (mostly downloading)
Disk space: ~120GB
- OS: Linux (Ubuntu 20.04+, Debian 11+, Fedora 35+) or macOS
- Disk: 120GB free space (Wikipedia + models)
- RAM: 16-24GB for recommended models (Mistral-7B + Llama-3.1-8B)
- CPU: Multi-core processor (x86_64 or ARM64)
- Budget: 12GB RAM - Use smaller models (still works well!)
- High-end: 32GB+ RAM - Use larger models (Qwen2.5-32B + Gemma2-27B)
See docs/TWO_STAGE_AI_PIPELINE.md for detailed model recommendations.
- GPU acceleration is automatically detected and used if available
- Works perfectly fine without GPU (CPU-only mode)
- Supported: NVIDIA (CUDA), Apple Silicon (Metal), AMD (ROCm)
- No manual GPU setup required - Ollama handles everything automatically
Click to expand manual installation steps
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull recommended AI models
ollama pull mistral:7b # Selection model (~4.4GB)
ollama pull llama3.1:8b # Summarization model (~4.9GB)
# Install Python environment
conda env create -f environment.yml
conda activate wikipedia-rag

# Automated download (~102GB, takes 2-8 hours)
./scripts/setup_full_offline_wikipedia.sh

# Just run it - everything is automatic!
./run.sh

See docs/AUTOMATIC_SETUP.md for details on what happens automatically.
- Total time: 10-25 seconds for complete answers (search + selection + synthesis)
- Token generation: ~50-60 tokens/sec with GPU, ~5-15 tokens/sec CPU-only
- Network: Fully offline, zero network latency
- Performance varies based on hardware, models used, and number of articles retrieved
GPU support is completely optional:
- ✅ Ollama automatically detects and uses GPU if available (NVIDIA/AMD/Apple Silicon)
- ✅ No CUDA/ROCm installation needed - Ollama includes everything
- ✅ Works great on CPU-only systems - no GPU required
- ✅ Seamlessly falls back to CPU if no GPU found
Bottom line: Just install and run - GPU acceleration works automatically if you have one, runs fine without it.
The script automatically handles everything - just run it! It will:
- ✅ Check if Ollama is running
- ✅ Auto-start Kiwix server if needed
- ✅ Auto-detect best available models
- ✅ Clean up on exit
📖 See docs/AUTOMATIC_SETUP.md for full details on automatic setup.
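Those checks amount to probing two local HTTP endpoints. A minimal Python equivalent (run.sh itself is shell, so this is an assumed translation) uses Ollama's real GET /api/tags endpoint and Kiwix-serve's root URL:

```python
# Assumed Python equivalent of run.sh's service checks: Ollama answers on
# GET /api/tags (default port 11434), Kiwix-serve on its root URL (port 8080).
import urllib.request
import urllib.error

def service_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers at all within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

ollama_ok = service_up("http://localhost:11434/api/tags")
kiwix_ok = service_up("http://localhost:8080/")
print(f"Ollama: {ollama_ok}, Kiwix: {kiwix_ok}")
```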
# One command - handles environment activation and everything!
./run.sh

# Activate environment
mamba activate wikipedia-rag
# Run the script
python wikipedia_rag_kiwix.py
Then type your questions naturally:
❓ Your question: Explain photosynthesis
❓ Your question: Who was Marie Curie?
❓ Your question: What caused World War 2?
./run.sh --question "What is machine learning?"
# Or with manual activation:
# mamba activate wikipedia-rag
# python wikipedia_rag_kiwix.py --question "What is machine learning?"

python wikipedia_rag_kiwix.py --help
Options:
--question TEXT Ask a single question
--model TEXT Summarization model (default: auto-detect)
--selection-model TEXT Article selection model (default: auto-detect)
--kiwix-url TEXT Kiwix server URL (default: http://localhost:8080)
--max-results INT Number of articles (default: auto by complexity)
--no-auto-start          Don't automatically start Kiwix server

# The script auto-detects the best models, but you can override:
# Recommended setup (most users)
./run.sh --selection-model mistral:7b --model llama3.1:8b
# Alternative for better selection (if you have 32GB+ RAM)
./run.sh --selection-model qwen2.5:32b-instruct --model llama3.1:8b

graph LR
A[Your Question] --> B[Search Wikipedia]
B --> C[Fetch Abstracts]
C --> D[Stage 1: AI Selection]
D --> E[Fetch Full Articles]
E --> F[Stage 2: AI Synthesis]
F --> G[Answer with Citations]
- Wikipedia Search
- Extracts search terms from your question
- Searches for 25+ candidate articles
- Direct lookup finds main articles
- Stage 1: Content-Based Selection (Mistral-7B)
- Fetches first paragraph (abstract) from each candidate
- AI evaluates actual content, not just titles
- Selects 3-6 most relevant articles
- Filters out lists, stubs, and irrelevant topics
- Stage 2: Synthesis with Citations (Llama-3.1-8B)
- Reads full content of selected articles
- Synthesizes comprehensive answer
- Adds inline citations [1][2][3] for every fact
- Provides clickable URLs to source articles
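The two stages boil down to two differently shaped prompts sent to Ollama's real POST /api/generate endpoint. A sketch with assumed prompt wording (the project's actual prompts are not shown in this README):

```python
# Sketch of the two-stage calls. The endpoint is Ollama's real /api/generate;
# the prompt wording here is an assumption, not the project's actual prompts.
import json
import urllib.request

def ollama_generate(model: str, prompt: str,
                    host: str = "http://localhost:11434") -> str:
    """One blocking, non-streaming call to Ollama's POST /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def selection_prompt(question: str, abstracts: dict[str, str]) -> str:
    """Stage 1: rank candidates from abstracts, return titles only."""
    listing = "\n".join(f"- {title}: {text}" for title, text in abstracts.items())
    return (f"Question: {question}\n"
            f"Candidate articles:\n{listing}\n"
            "Reply with the 3-6 most relevant titles, one per line.")

def synthesis_prompt(question: str, articles: dict[str, str]) -> str:
    """Stage 2: answer from the full articles with inline [n] citations."""
    sources = "\n\n".join(f"[{i}] {title}\n{text}"
                          for i, (title, text) in enumerate(articles.items(), 1))
    return (f"Answer the question using only these sources, citing [n] inline.\n"
            f"Question: {question}\n{sources}")
```

With a live Ollama, stage 1 would be `ollama_generate("mistral:7b", selection_prompt(...))` and stage 2 `ollama_generate("llama3.1:8b", synthesis_prompt(...))`.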
Specialized models perform better than one model doing everything:
- Selection model (default: Mistral-7B): Fast, accurate classification from article abstracts
- Summarization model (default: Llama-3.1-8B): Excellent world knowledge and synthesis
- Result: 85-88% selection accuracy + high-quality answers in 10-18 seconds
- System auto-detects available models and selects best options
- AI Models:
- Mistral-7B - Fast, accurate article selection from abstracts
- Llama-3.1-8B - High-quality answer synthesis with citations
- Alternative selection models: Qwen2.5 (32B/14B/7B), Hermes-3-8B
- Alternative synthesis models: Gemma-2 (27B/9B), Llama-3.3-70B
- Wikipedia: Kiwix - Offline Wikipedia server (ZIM format)
- Runtime: Ollama - Local AI model runner
- Language: Python 3.10+
offline-wikipedia-rag/
├── wikipedia_rag_kiwix.py # Main RAG application
├── environment.yml # Python environment
├── requirements.txt # Python dependencies
├── scripts/
│ ├── install.sh # One-line installer
│ ├── setup_full_offline_wikipedia.sh # Wikipedia downloader
│ └── start_offline_rag.sh # Quick start script
├── tests/
│ ├── test_system.py # System validation tests
│ ├── test_token_speed.py # Token generation benchmarks
│ └── test_cpu_speed.py # CPU performance tests
└── docs/ # Additional documentation
The project includes a comprehensive pytest test suite:
# Run all unit tests
pytest tests/ -v
# Run specific test file
pytest tests/test_rag_functions.py -v
# Run tests with coverage
pytest tests/ --cov=. --cov-report=html
# Run only unit tests (skip integration tests)
pytest -m "not integration"
Test Coverage:
- ✅ Search term extraction (proper nouns, content words, stopword filtering)
- ✅ Complexity estimation (simple, multi-part, comparison questions)
- ✅ Model detection (selection priority, reasoning model avoidance)
- ✅ Integration tests for Kiwix/Ollama (marked separately)
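Mocking the Kiwix call is what keeps the unit tests fully offline. An illustrative shape (the helper name fetch_abstract is hypothetical, not the project's actual API):

```python
# Illustrative mocked-Kiwix unit test; fetch_abstract is a hypothetical
# stand-in for the project's real fetch helper.
from unittest import mock

def fetch_abstract(title: str, opener=None) -> str:
    """Fetch an article abstract via an injectable opener (for easy mocking)."""
    opener = opener or (lambda t: f"<p>Abstract of {t}</p>")
    return opener(title)

def test_fetch_abstract_mocked():
    # The network call is replaced by a Mock, so no Kiwix server is needed.
    fake = mock.Mock(return_value="<p>NASA is a space agency.</p>")
    assert "space agency" in fetch_abstract("NASA", opener=fake)
    fake.assert_called_once_with("NASA")

test_fetch_abstract_mocked()
```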
Contributions are welcome! Areas for improvement:
- Support for other languages (German, French, Spanish Wikipedia)
- GUI interface
- Docker container
- Additional AI models
- Performance optimizations
- More test coverage
See CONTRIBUTING.md for guidelines.
- First download takes time: Wikipedia is 102GB
- RAM usage: the two models need ~10-12GB RAM (16GB recommended)
- CPU-based: Works great on CPU, GPU support optional
- English only: Currently supports English Wikipedia only
Problem: "Connection refused to Kiwix"
# Restart Kiwix server
./scripts/start_offline_rag.sh
Problem: "Ollama model not found"
# Pull the recommended models
ollama pull mistral:7b
ollama pull llama3.1:8b
Problem: "Out of memory"
# System will automatically fall back to smaller available models
# Or manually specify a smaller model
python wikipedia_rag_kiwix.py --model llama3.2:3b
See TROUBLESHOOTING.md for more solutions.
MIT License - see LICENSE file for details.
- Mistral AI - For the excellent Mistral-7B model (default selection model)
- Meta AI - For the powerful Llama-3.1-8B model (default synthesis model)
- Kiwix - For offline Wikipedia technology
- Ollama - For easy local AI model deployment
- Wikimedia Foundation - For Wikipedia
If you find this project useful, please consider giving it a star! It helps others discover the project.
Made with ❤️ for the open-source community