Run any GGUF language model locally — no Python, no Docker, no cloud, no install headaches.
Works from a USB drive. Runs on Linux, macOS, and Windows. One script to set up everything.
PortableAI wraps llama.cpp's llama-server into a zero-dependency portable package. You plug in a USB drive (or clone the repo), drop in a model, and run a single script. A local web UI opens in your browser and you can start chatting — fully offline, fully private.
No Python environment. No package managers. No GPU required.
- Truly portable — runs from a USB stick on any machine
- Cross-platform — Linux (x64 + arm64), macOS (Intel + Apple Silicon), Windows (x64)
- Auto-installer — fetches the correct llama-server binary for every platform in one run
- Model picker — prompts you to choose when multiple .gguf models are present
- CPU-only — works on any modern machine without a GPU
- 100% offline after setup — no data leaves your machine
- Built-in web UI — chat interface served directly by llama-server
- LAN sharing — the server binds to 0.0.0.0:8080, so any device on the same Wi-Fi can reach the UI via your machine's LAN IP
PortableAI/
├── install.sh ← Linux/macOS installer (all platforms)
├── install.bat ← Windows installer
├── start.sh ← Linux/macOS launcher
├── start.bat ← Windows launcher
├── models/ ← Drop your .gguf model files here
├── ui/ ← (optional) custom web UI override
└── bin/
├── linux/
│ ├── linux_x64/
│ │ ├── llama-server-linux-x64 ← renamed server binary
│ │ ├── libllama.so ← required shared libs
│ │ ├── libggml.so
│ │ └── libggml-cpu.so ...
│ └── linux_arm64/
│ └── llama-server-linux-arm + .so libs
├── mac/
│ ├── mac_arm64/
│ │ └── llama-server-mac-arm + .dylib libs
│ └── mac_x64/
│ └── llama-server-mac-x64 + .dylib libs
└── windows/
├── llama-server-win.exe
└── *.dll ← required DLLs
Download any .gguf model and place it in the models/ folder.
Recommended for most machines (4–8 GB RAM): a 3B–7B instruct model at Q4_K_M quantization.
Search huggingface.co for any model — filter by GGUF format and pick a Q4_K_M quantization for the best balance of size and quality.
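If you prefer the command line, a model can be pulled straight into models/ with curl. The URL below is only a placeholder; substitute the organization, repository, and filename of the model you picked on Hugging Face:

```bash
# Illustrative download - replace <org>/<repo> and the filename with your chosen model
curl -L -o models/llama-3.2-3b-instruct-q4_k_m.gguf \
  "https://huggingface.co/<org>/<repo>/resolve/main/llama-3.2-3b-instruct-q4_k_m.gguf"
```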
Linux / macOS
chmod +x install.sh
./install.sh
The installer asks which platform(s) to set up:
Select which platform(s) to install:
[1] Linux x64 (most PCs/servers)
[2] Linux arm64 (Raspberry Pi, ARM servers)
[3] macOS arm64 (Apple Silicon M1/M2/M3)
[4] macOS x64 (Intel Mac)
[5] Windows x64 (CPU)
[A] All platforms (for a shared USB drive)
[Q] Quit
Tip: enter multiple numbers separated by spaces (e.g. 1 3)
Choose A to pre-load all platforms if you're preparing a USB drive that will be used on multiple machines.
Windows
Double-click install.bat
Linux / macOS
chmod +x start.sh
./start.sh
Windows
Double-click start.bat
The browser opens automatically at http://127.0.0.1:8080
If you have multiple models, you'll be asked to choose one:
[?] Multiple models found — select one:
─────────────────────────────────────────────
[1] mistral-7b-instruct-v0.2.Q4_K_M.gguf 3.8G
[2] llama-3.2-3b-instruct-q4_k_m.gguf 1.9G
Enter number [1-2]:
| Platform | Requirement |
|---|---|
| Linux | glibc 2.17+ (any modern distro) |
| macOS | 11.0+ (Big Sur or later) |
| Windows | Windows 10 (build 17063+), Visual C++ Redistributable |
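On Linux, you can confirm the glibc requirement with a one-liner:

```bash
# Should report version 2.17 or newer
ldd --version | head -n 1
```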
| Tool | Notes |
|---|---|
| curl | Pre-installed on most systems |
| tar | Pre-installed on all Linux/macOS |
| unzip | Only needed for the Windows zip on Linux (e.g. sudo pacman -S unzip) |
| RAM | Recommended model size |
|---|---|
| 4 GB | 1B–3B models (Q4_K_M) |
| 8 GB | 7B models (Q4_K_M) |
| 16 GB | 13B models (Q4_K_M) |
| 32 GB | 30B+ models (Q4_K_M) |
install.sh / install.bat
│
├── Queries GitHub API for latest llama.cpp release
├── Downloads the correct archive per platform:
│ Linux x64 → llama-bXXXX-bin-ubuntu-x64.tar.gz
│ Linux arm64 → llama-bXXXX-bin-ubuntu-arm64.tar.gz
│ macOS arm64 → llama-bXXXX-bin-macos-arm64.tar.gz
│ macOS x64 → llama-bXXXX-bin-macos-x64.tar.gz
│ Windows x64 → llama-bXXXX-bin-win-cpu-x64.zip
├── Extracts ALL files (binary + shared libs / DLLs)
└── Renames llama-server → platform-specific canonical name
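In shell terms, the download step boils down to something like the following sketch (simplified; the real install.sh covers every platform and handles errors):

```bash
# Ask the GitHub API for the latest llama.cpp release tag (e.g. b4393),
# then fetch and unpack the matching prebuilt archive for Linux x64.
TAG=$(curl -s https://api.github.com/repos/ggml-org/llama.cpp/releases/latest \
      | grep -o '"tag_name": *"[^"]*"' | cut -d '"' -f 4)
curl -LO "https://github.com/ggml-org/llama.cpp/releases/download/${TAG}/llama-${TAG}-bin-ubuntu-x64.tar.gz"
tar -xzf "llama-${TAG}-bin-ubuntu-x64.tar.gz"
```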
start.sh / start.bat
│
├── Scans models/ for .gguf files
├── Prompts for selection if > 1 model found
├── Detects OS + architecture
├── Sets LD_LIBRARY_PATH (Linux) / DYLD_LIBRARY_PATH (macOS)
│ so shared libs next to the binary are always found
└── Launches llama-server on port 8080
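The platform-detection and library-path logic, sketched in shell (simplified; the real start.sh also scans models/, prompts for a selection, and handles Windows separately):

```bash
# Map OS + architecture to the matching bin directory and binary name
case "$(uname -s)-$(uname -m)" in
  Linux-x86_64)  BIN_DIR="bin/linux/linux_x64";   BIN="llama-server-linux-x64" ;;
  Linux-aarch64) BIN_DIR="bin/linux/linux_arm64"; BIN="llama-server-linux-arm" ;;
  Darwin-arm64)  BIN_DIR="bin/mac/mac_arm64";     BIN="llama-server-mac-arm"   ;;
  Darwin-x86_64) BIN_DIR="bin/mac/mac_x64";       BIN="llama-server-mac-x64"   ;;
esac
# Make the shared libs next to the binary visible (DYLD_LIBRARY_PATH on macOS)
export LD_LIBRARY_PATH="$BIN_DIR:${LD_LIBRARY_PATH:-}"
```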
Why copy all the .so / .dll files?
llama-server dynamically links against libllama, libggml, libggml-cpu and others. Copying only the executable would cause an immediate crash with a "shared library not found" error. The installer copies the entire archive contents into the bin directory, and start.sh sets LD_LIBRARY_PATH to that directory so the binary is fully self-contained without touching any system paths.
The server starts with sensible defaults. To customize, edit the exec "$BIN" block at the bottom of start.sh / start.bat:
exec "$BIN" \
-m "$MODEL" \ # model file path (set automatically)
-c 4096 \ # context window size (tokens)
-t "$THREADS" \ # CPU threads (auto: nproc - 1)
--port 8080 \ # HTTP port
--host 0.0.0.0 # bind address (0.0.0.0 = LAN accessible)Common tweaks:
| Flag | Example | Effect |
|---|---|---|
| -c | -c 8192 | Larger context (needs more RAM) |
| -t | -t 4 | Fixed thread count |
| --port | --port 9090 | Change the port |
| --host | --host 127.0.0.1 | Localhost only (disable LAN) |
| -ngl | -ngl 35 | Offload layers to GPU (if available) |
Full flag reference: ./bin/linux/linux_x64/llama-server-linux-x64 --help
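For example, combining several of these tweaks (values are illustrative), the edited block could read:

```bash
# Larger context, fixed thread count, different port, localhost-only binding
exec "$BIN" \
  -m "$MODEL" \
  -c 8192 \
  -t 4 \
  --port 9090 \
  --host 127.0.0.1
```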
While the server is running, any device on the same network can access the UI:
- Find your machine's local IP: ip addr (Linux) / ipconfig (Windows)
- Open http://192.168.x.x:8080 on the other device
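Besides the web UI, llama-server exposes an OpenAI-compatible HTTP API, so other devices or scripts on the LAN can query the model directly. A minimal example:

```bash
# Replace 192.168.x.x with the host machine's LAN IP
curl http://192.168.x.x:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```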
llama-server: error while loading shared libraries: libllama.so
The shared libs are missing from the bin directory. Re-run install.sh — it copies all files from the archive, not just the binary.
Binary not found / install.sh stops silently after printing the release tag
GitHub API rate-limited the request (60 requests/hour for unauthenticated IPs). Wait a few minutes and try again.
VCRUNTIME140_1.dll not found (Windows)
Install the Visual C++ Redistributable and re-run start.bat.
Model loads but responses are very slow
Use a smaller or more quantized model (e.g., Q2_K instead of Q8_0). Reduce context with -c 2048.
Port 8080 already in use
Change --port 8080 to another port (e.g. --port 9090) in start.sh / start.bat.
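To find out which process is holding the port (Linux/macOS):

```bash
# List processes using port 8080
lsof -i :8080
```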
Just re-run the installer. It always fetches the latest release from GitHub:
./install.sh    # picks what to update, overwrites existing binaries
Portable_Local_AI/
├── install.sh Universal installer — Linux/macOS, all platforms
├── install.bat Windows installer
├── start.sh Launcher — Linux/macOS
├── start.bat Launcher — Windows
├── models/ Your .gguf model files go here
├── ui/ Optional: override the built-in web UI
└── bin/ Auto-populated by installer
- llama.cpp by ggml-org — the inference engine powering everything
- Models from HuggingFace — community-converted GGUF weights