A portable, self-contained AI chatbot that runs from a USB drive on macOS. Plug in, double-click, chat — no installation required on the target machine.
- Local LLM inference from a USB stick — no internet, no cloud, no data leaves the machine
- Works on any Mac, with no Ollama installation needed on the target machine
- Clean browser-based chat UI with multi-model support and streaming responses
- One double-click to launch, Ctrl+C to shut down
- macOS (Apple Silicon or Intel)
- Ollama installed on the machine you use for setup
- USB 3.0 drive (minimum size depends on models — ~10GB for small models, 32GB+ for larger ones)
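
Before copying models, it can help to confirm that the drive has enough free space. The following is a minimal sketch, not part of this repo; the function names `free_gib` and `has_space_for` are hypothetical, and the 10 GiB figure in the example simply mirrors the small-model estimate above.

```shell
#!/bin/bash
# Print the free space, in whole GiB, on the volume that contains the given path.
# `df -Pk` reports sizes in 1024-byte blocks; column 4 of the second line is "Available".
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the volume has at least the requested number of GiB free.
has_space_for() {
  local path="$1" needed_gib="$2"
  [ "$(free_gib "$path")" -ge "$needed_gib" ]
}

# Example: has_space_for /Volumes/YOUR-USB-NAME 10 || echo "Not enough free space"
```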
Run this once on your own Mac to prepare the USB drive.
1. Clone this repo directly onto your USB drive:

       git clone https://github.com/GhostInTheBus/ollama-usb /Volumes/YOUR-USB-NAME

2. Run the setup script:

       cd /Volumes/YOUR-USB-NAME
       bash setup.sh

The setup script will:
- Copy the Ollama binary from your local installation
- Show you a list of your installed models
- Let you choose which ones to copy to the USB
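
The first two steps above could be sketched roughly as follows. This is an illustration only, not the actual contents of `setup.sh`: the function names are hypothetical, it assumes the `ollama` CLI is on your `PATH`, and it assumes `ollama list` prints a header row followed by one model per row. Copying the selected models is not shown.

```shell
#!/bin/bash
# Hypothetical sketch of the setup steps; not the repository's actual setup.sh.

# Copy the ollama binary found on PATH into the target directory.
# cp -L follows symlinks, so a symlinked CLI resolves to the real binary.
copy_ollama_binary() {
  local dest_dir="$1"
  local src
  src="$(command -v ollama)" || { echo "ollama not found on PATH" >&2; return 1; }
  cp -L "$src" "$dest_dir/ollama"
  chmod +x "$dest_dir/ollama"
}

# Print the names of installed models, one per line.
# Assumes the model name is the first column of `ollama list` output.
list_installed_models() {
  ollama list | awk 'NR > 1 { print $1 }'
}

# Example: copy_ollama_binary /Volumes/YOUR-USB-NAME && list_installed_models
```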
3. Eject and go.
- Plug the USB into any Mac
- Open the USB in Finder
- Double-click `launch.command`
- A browser window opens at `http://localhost:8765` with the chat interface
- Select a model from the dropdown and start chatting
- Press `Ctrl+C` in the Terminal window to shut down
First run on a new Mac: macOS may block the `ollama` binary with a Gatekeeper warning. Right-click the `ollama` file → Open → click Open in the dialog. Then re-run `launch.command`.
    USB Drive
    ├── ollama           macOS binary (copied from your Ollama.app)
    ├── launch.command   startup script
    ├── setup.sh         one-time setup script
    ├── models/          model files (not tracked in git)
    │   ├── blobs/       content-addressed model weights
    │   └── manifests/   model metadata
    └── ui/
        └── chat.html    single-file chat interface
Launch flow:
1. `launch.command` sets `OLLAMA_MODELS` to the USB models folder and `OLLAMA_HOST` to port `11435` (avoids conflict with any locally running Ollama instance)
2. Starts `ollama serve` in the background
3. Serves `chat.html` via Python's built-in HTTP server on port `8765` (required to avoid browser CORS restrictions on `file://` URLs)
4. Opens the browser
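
The launch flow above could look something like the sketch below. It is not the actual `launch.command`: the function names are hypothetical, and it assumes `python3` is on the `PATH`, that the `ui/` folder is served as the web root, and that the bind address is `127.0.0.1`.

```shell
#!/bin/bash
# Hypothetical sketch of the launch flow; not the repository's actual launch.command.

# Point Ollama at the models folder on the USB drive and at a non-default port.
# Ollama's default port is 11434, so 11435 avoids a clash with a local instance.
configure_ollama_env() {
  local usb_dir="$1"
  export OLLAMA_MODELS="$usb_dir/models"
  export OLLAMA_HOST="127.0.0.1:11435"
}

launch() {
  local usb_dir="$1"
  configure_ollama_env "$usb_dir"

  # Start the Ollama server in the background and remember its PID.
  "$usb_dir/ollama" serve &
  local ollama_pid=$!

  # Serve the ui/ folder over HTTP so the page is not loaded from file://.
  python3 -m http.server 8765 --bind 127.0.0.1 --directory "$usb_dir/ui" &
  local http_pid=$!

  # Stop both background processes when the script exits.
  # The PIDs are expanded now, while the local variables are still in scope.
  trap "kill $ollama_pid $http_pid 2>/dev/null" EXIT

  # `open` is the macOS command for opening a URL in the default browser.
  open "http://localhost:8765/chat.html"

  # Keep the script in the foreground until the servers stop or it is interrupted.
  wait
}

# Example invocation, assuming the script sits in the USB root:
# launch "$(cd "$(dirname "$0")" && pwd)"
```

Keeping the environment setup in its own function makes it easy to check the two variables without starting any server.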
| Model | Size | Good for |
|---|---|---|
| `mistral:instruct` | ~4GB | General Q&A, fast |
| `llama3.1:8b` | ~5GB | General chat, instruction following |
| `llama3.2:3b` | ~2GB | Tight on space, still capable |
| `qwen2.5:7b` | ~5GB | Strong reasoning |
Pull models with `ollama pull <model>` before running setup.
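
To pull several models in one go, a small loop is enough. This is a sketch only: `pull_models` is a hypothetical helper, and `OLLAMA_BIN` is a hypothetical override (not an Ollama setting) that makes dry runs possible, for example with `OLLAMA_BIN=echo`.

```shell
#!/bin/bash
# Pull each model named on the command line, stopping at the first failure.
# OLLAMA_BIN is a hypothetical override for dry runs; it defaults to `ollama`.
pull_models() {
  local model
  for model in "$@"; do
    "${OLLAMA_BIN:-ollama}" pull "$model" || return 1
  done
}

# Example: pull_models llama3.2:3b mistral:instruct
```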
- macOS only — the included binary is macOS-specific. Linux and Windows support would require separate binaries and launchers.
- No autorun — macOS (and all modern OSes) block USB autorun for security reasons. One double-click is the minimum.
- Ollama must be installed on the machine you run setup on (to get the binary and models). The target machine does not need Ollama installed.
- RAM — model inference happens in the host machine's RAM. A 7B model needs ~8GB free RAM; a 30B model needs ~20GB.
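
As a rough pre-flight check for the RAM limitation, the host's installed memory can be read on macOS with `sysctl -n hw.memsize`. The sketch below uses hypothetical function names and compares total installed RAM, not free RAM, so treat a passing check as a necessary condition rather than a guarantee.

```shell
#!/bin/bash
# Convert a byte count to whole GiB (integer division).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Print the host's total physical memory in GiB.
# `sysctl -n hw.memsize` is macOS-specific and reports bytes.
total_ram_gib() {
  bytes_to_gib "$(sysctl -n hw.memsize)"
}

# Succeed if the host has at least the given number of GiB installed.
has_ram_for() {
  [ "$(total_ram_gib)" -ge "$1" ]
}

# Example: has_ram_for 8 || echo "A 7B model may not fit in memory on this Mac"
```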
MIT