kasimmj/local-ai-stack


Self-host an entire production-grade AI stack with ONE command. LLMs • Vector DB • Web UI • Workflow Automation • RAG • Voice: all local, all yours.

Quick Start • What's Inside • Architecture • Use Cases


🎯 Why Local AI Stack?

You don't have to choose between convenience and privacy. You don't have to pay OpenAI $0.03 per request for a chatbot. You don't have to leak company data through an API.

Run your own GPT at home, on-prem, or in your private cloud.

git clone https://github.com/kasimmj/local-ai-stack
cd local-ai-stack
./start.sh

That's it. You now have a ChatGPT clone at http://localhost:3000, a vector database at http://localhost:6333, a workflow editor at http://localhost:5678, and a model API at http://localhost:11434.
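To confirm that the four endpoints came up, a quick port probe can help. The following is a minimal sketch, assuming the services listen on localhost at the ports listed above; `SERVICES`, `is_port_open`, and `check_stack` are illustrative names, not part of this repository.

```python
import socket

# Ports taken from the README; the host is assumed to be the local machine.
SERVICES = {
    "Open WebUI": 3000,
    "Qdrant": 6333,
    "n8n": 5678,
    "Ollama": 11434,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_stack(host: str = "localhost") -> dict:
    """Probe every service port and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_stack()` returns a mapping such as service name to `True` or `False`. An open port only shows that something is listening, not that the service behind it is healthy.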


📦 What's Inside

| Service | Purpose | Port |
|---|---|---|
| 🦙 Ollama | Run LLMs locally (Llama, Mistral, Qwen, DeepSeek...) | 11434 |
| 💬 Open WebUI | ChatGPT-style UI, RAG, voice, image, multi-user | 3000 |
| 🔍 Qdrant | Production vector database for embeddings/RAG | 6333 |
| 🔄 n8n | Visual workflow automation (1000+ integrations) | 5678 |
| 🗄️ Postgres | Persistent storage for n8n and your apps | 5432 |
| ⚡ Redis | Fast cache + job queue | 6379 |
| 🌐 Caddy | Auto-HTTPS reverse proxy (optional) | 80/443 |

🚀 Quick Start

Prerequisites

  • Docker 24+ and Docker Compose v2
  • 16GB RAM minimum (32GB recommended for larger models)
  • ~50GB free disk space

Installation

git clone https://github.com/kasimmj/local-ai-stack
cd local-ai-stack
cp .env.example .env
./start.sh

Open http://localhost:3000 → create your admin account → start chatting.

Pull your first model

docker exec -it ollama ollama pull llama3.2:3b      # Fast & small (2GB)
docker exec -it ollama ollama pull qwen2.5:7b       # Great quality (4GB)
docker exec -it ollama ollama pull deepseek-r1:8b   # Reasoning (5GB)
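Once a model is pulled, it can be queried over the model API on port 11434. Below is a minimal sketch that uses Ollama's `/api/generate` endpoint with streaming turned off. It assumes the stack is running on localhost and that the named model has already been pulled; `build_generate_request` and `generate` are illustrative helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # model API address from the README


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs the stack running)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False, Ollama replies with a single JSON object.
        return json.loads(resp.read())["response"]
```

For example, `generate("llama3.2:3b", "Say hello in one sentence.")` should return the model's reply as a string once the stack is up.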

Stop / Reset

./stop.sh                  # Graceful shutdown
./reset.sh                 # Nuke everything (delete data)

πŸ—οΈ Architecture

                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                β”‚              Caddy (Reverse Proxy)        β”‚
                β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚                β”‚            β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”  β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”
              β”‚ Open WebUI  β”‚  β”‚     n8n    β”‚  β”‚ Qdrant β”‚
              β”‚   :3000     β”‚  β”‚   :5678    β”‚  β”‚  :6333 β”‚
              β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚                β”‚
                     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”
                     β”‚  β”‚   Postgres     β”‚
                     β”‚  β”‚     :5432      β”‚
                     β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
              β”‚   Ollama    β”‚
              β”‚   :11434    β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

All services share a private Docker network. Only Open WebUI, n8n, and Qdrant are exposed by default; everything else stays internal.


💡 Use Cases

🏒 Private Company Assistant

Replace ChatGPT Teams with an internal AI that knows your docs and never leaks data.

🎓 University RAG Research

Index thousands of papers and chat with them: no API costs, full reproducibility.
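Indexing documents for RAG comes down to storing embedding vectors in Qdrant and searching them later. The sketch below builds the three REST requests involved (create a collection, upsert a chunk, search), assuming Qdrant is reachable on port 6333. The collection name `papers` and the helper names are hypothetical, and the vectors are assumed to come from whichever embedding model you run.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # vector database address from the README
COLLECTION = "papers"                 # hypothetical collection name


def qdrant_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request against Qdrant's REST API."""
    return urllib.request.Request(
        f"{QDRANT_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_collection(dim: int) -> urllib.request.Request:
    """Request that creates a cosine-distance collection for vectors of size dim."""
    body = {"vectors": {"size": dim, "distance": "Cosine"}}
    return qdrant_request("PUT", f"/collections/{COLLECTION}", body)


def upsert_chunk(point_id: int, vector: list, text: str) -> urllib.request.Request:
    """Request that stores one text chunk together with its embedding vector."""
    point = {"id": point_id, "vector": vector, "payload": {"text": text}}
    return qdrant_request("PUT", f"/collections/{COLLECTION}/points", {"points": [point]})


def search(vector: list, limit: int = 5) -> urllib.request.Request:
    """Request that returns the stored chunks closest to the query vector."""
    body = {"vector": vector, "limit": limit, "with_payload": True}
    return qdrant_request("POST", f"/collections/{COLLECTION}/points/search", body)
```

Each request can be sent with `urllib.request.urlopen(request)` while the stack is running. The `dim` passed to `create_collection` has to match the output size of the embedding model in use.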

🤖 Customer Support Bot

Train on your knowledge base, deploy via n8n webhooks to WhatsApp/Telegram/Slack.

πŸ›‘οΈ Privacy-First Personal AI

Your conversations, your data, your model. No telemetry.

🌐 Edge AI for Disconnected Regions

Bring AI to areas with limited internet. The stack runs fully offline after the first install.


βš™οΈ Configuration

Choose your model

Edit .env:

DEFAULT_MODEL=qwen2.5:7b
EMBEDDING_MODEL=nomic-embed-text
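If your own scripts need the same model settings, they can read them from `.env` directly. This is a minimal sketch that assumes the file uses plain `KEY=VALUE` lines as shown above. `parse_env` and `model_settings` are illustrative helpers, not something the repository provides.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def model_settings(env: dict) -> tuple:
    """Return (chat model, embedding model) as configured in .env."""
    return env["DEFAULT_MODEL"], env["EMBEDDING_MODEL"]
```

A typical call would be `model_settings(parse_env(open(".env").read()))`, which yields the two model names as a tuple.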

Add Arabic / RTL support

Open WebUI already supports RTL out of the box. Just pick العربية (Arabic) in Settings → Interface.

Enable HTTPS (production)

DOMAIN=ai.yourcompany.com ./start.sh --with-caddy

Caddy will auto-provision a Let's Encrypt certificate.

Add a custom model

docker exec -it ollama ollama pull <model-name>
# Then refresh Open WebUI; the model appears in the model picker.

📊 Resource Requirements

| Model size | RAM | Disk | Speed (RTX 4090) |
|---|---|---|---|
| 3B | 4GB | 2GB | 80 tok/s |
| 7B | 8GB | 4GB | 50 tok/s |
| 13B | 16GB | 8GB | 28 tok/s |
| 34B | 32GB | 20GB | 12 tok/s |
| 70B | 64GB | 40GB | 6 tok/s |

CPU-only mode is supported (slower).
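The RAM column above can double as a quick lookup for which model tier a machine can hold. The sketch below copies those figures into a list and picks the largest tier that fits. `TIERS` and `largest_model_for` are illustrative names, and the result is only a rough guide because the other services in the stack also need memory.

```python
from typing import Optional

# (model size, RAM in GB) rows copied from the resource table, smallest first.
TIERS = [("3B", 4), ("7B", 8), ("13B", 16), ("34B", 32), ("70B", 64)]


def largest_model_for(ram_gb: float) -> Optional[str]:
    """Return the largest model size whose listed RAM figure fits in ram_gb."""
    best = None
    for size, needed_gb in TIERS:
        if needed_gb <= ram_gb:
            best = size  # tiers are sorted, so later matches are larger
    return best
```

For instance, `largest_model_for(16)` returns `"13B"`, and a value below 4 returns `None`.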


🧩 Extensions

Drop YAML files into extensions/ to add more services:

  • voice/: Whisper STT + Piper TTS for voice chat
  • vision/: Stable Diffusion for image generation
  • search/: SearXNG for web-grounded answers
  • monitoring/: Grafana + Loki for observability

Enable extensions by name when starting the stack:

./start.sh --enable voice,vision,search

πŸ›‘οΈ Security Notes

  • ⚠️ Default credentials are random per-install (stored in .env)
  • ⚠️ Never expose ports directly to public internet without Caddy + auth
  • βœ… All inter-service traffic is on a private Docker network
  • βœ… No telemetry, no analytics, no external calls (unless you add them)

🤝 Contributing

PRs welcome! Especially:

  • Additional Ollama model preset bundles
  • n8n workflow templates
  • Extensions for new services (TTS, vision, scrapers)
  • Translations for Open WebUI

📜 License

MIT © 2026 Kasim Mohammed


Star ⭐ if you'd rather own your AI than rent it.
