Gollama.cpp provides comprehensive GPU acceleration support across multiple platforms and vendors. This guide covers installation, configuration, and troubleshooting for GPU acceleration.
The build automatically detects installed GPU SDKs and selects a matching backend at build time. Most setups need no manual configuration.
| Backend | Platforms | GPU Vendors | Status |
|---|---|---|---|
| Metal | macOS | Apple Silicon | ✅ Production |
| CUDA | Linux, Windows | NVIDIA | ✅ Production |
| HIP/ROCm | Linux, Windows | AMD | ✅ Production |
| Vulkan | Linux, Windows | NVIDIA, AMD, Intel | ✅ Production |
| OpenCL | Windows, Linux | Qualcomm Adreno, Intel, AMD | ✅ Production |
| SYCL | Linux, Windows | Intel, NVIDIA | ✅ Production |
| CPU | All | All | ✅ Fallback |
Metal support is enabled automatically on macOS, both on Apple Silicon (M1/M2/M3) and on Intel Macs with a Metal-compatible GPU.
Requirements:
- macOS 10.15+ (Catalina)
- Apple Silicon Mac (M1/M2/M3) or Intel Mac with Metal-compatible GPU
- Xcode Command Line Tools
Installation:
```bash
# Install Xcode Command Line Tools (if not already installed)
xcode-select --install
# Build with Metal support (automatic)
make build
```
Verification:
```bash
# Check Metal availability
system_profiler SPDisplaysDataType | grep Metal
```

On Linux, CUDA support is automatically detected when the NVIDIA CUDA Toolkit is installed.
Requirements:
- NVIDIA GPU with Compute Capability 3.5+
- CUDA Toolkit 11.8 or later
- Compatible NVIDIA driver
Installation:
```bash
# Ubuntu/Debian - Install CUDA Toolkit
# (the repository path below targets Ubuntu 20.04; adjust it for your release)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit
# Verify CUDA installation
nvcc --version
nvidia-smi
# Build with CUDA support (automatic detection)
make build
```
Fedora/RHEL:
```bash
# Enable NVIDIA repository
# (the repository path below targets Fedora 37; adjust it for your release)
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
# Install CUDA
sudo dnf install cuda-toolkit
# Build with CUDA support
make build
```

On Linux, HIP support is automatically detected when AMD ROCm is installed.
Requirements:
- AMD GPU with GCN 4th gen (gfx803) or newer
- ROCm 5.0 or later
- Compatible AMD driver (amdgpu)
Installation:
```bash
# Ubuntu/Debian - Install ROCm
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt-get install rocm-dev hip-dev
# Add user to the render and video groups
sudo usermod -a -G render,video $USER
# Verify HIP installation
/opt/rocm/bin/hipconfig --platform
/opt/rocm/bin/rocm-smi
# Build with HIP support (automatic detection)
make build
```

Requirements (CUDA on Windows):
- NVIDIA GPU with Compute Capability 3.5+
- CUDA Toolkit 11.8 or later
- Visual Studio 2019+ or compatible compiler
Installation:
- Download and install the CUDA Toolkit
- Ensure `nvcc` is in your PATH
- Build with automatic CUDA detection:
```bash
# Verify CUDA installation
nvcc --version
nvidia-smi
# Build with CUDA support
make build
```

Requirements (HIP on Windows):
- AMD GPU with GCN 4th gen or newer
- HIP SDK for Windows
- Visual Studio 2019+ or compatible compiler
Installation:
- Download and install the HIP SDK
- Ensure HIP tools are in your PATH
- Build with automatic HIP detection:
```bash
# Verify HIP installation
hipconfig --platform
# Build with HIP support
make build
```

Vulkan provides cross-platform GPU acceleration for NVIDIA, AMD, and Intel GPUs on Windows.
Requirements:
- Vulkan-capable GPU (NVIDIA GTX 600+, AMD GCN+, Intel HD 4000+)
- Latest GPU drivers with Vulkan support
- Vulkan SDK (optional, for development)
Installation:
```bash
# Install Vulkan SDK (optional, for development)
# Download from: https://vulkan.lunarg.com/sdk/home
# Verify Vulkan support (if SDK installed)
vulkaninfo
# Or check driver support
# NVIDIA: GeForce Experience -> Drivers
# AMD: AMD Software -> Drivers
# Intel: Intel Graphics Command Center
# Build with Vulkan support (automatic detection)
make build
```

On Windows, OpenCL provides cross-platform parallel computing and is especially useful for Qualcomm Adreno GPUs on ARM64.
Requirements:
- OpenCL-capable GPU or CPU
- Latest GPU drivers with OpenCL support
Installation:
```bash
# For Intel GPUs
# Download Intel Graphics Driver from Intel website
# For AMD GPUs
# Install AMD Software (includes OpenCL support)
# For NVIDIA GPUs
# Install NVIDIA GPU drivers (includes OpenCL support)
# For Qualcomm Adreno (ARM64 devices)
# Usually pre-installed on ARM64 Windows devices
# Verify OpenCL support (if available)
# Install GPU-Z or similar tool to check OpenCL support
# Build with OpenCL support (automatic detection)
make build
```

On Windows, SYCL provides unified parallel programming for CPUs, GPUs, and other accelerators.
Requirements:
- Intel oneAPI Toolkit or compatible SYCL implementation
- Compatible hardware (Intel GPUs, NVIDIA GPUs via CUDA backend)
Installation:
```bash
# Install Intel oneAPI Toolkit
# Download from: https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html
# Source the environment (in Developer Command Prompt)
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
# Verify SYCL installation
sycl-ls
# Build with SYCL support (automatic detection)
make build
```

On Linux, Vulkan provides cross-platform GPU acceleration for NVIDIA, AMD, and Intel GPUs.
Requirements:
- Vulkan-capable GPU (NVIDIA GTX 600+, AMD GCN+, Intel HD 4000+)
- Vulkan drivers installed
- Vulkan SDK (optional, for development)
Installation:
```bash
# Ubuntu/Debian - Install Vulkan support
sudo apt-get update
sudo apt-get install vulkan-tools vulkan-utils
sudo apt-get install mesa-vulkan-drivers  # For AMD/Intel
sudo apt-get install nvidia-driver-XXX    # For NVIDIA (replace XXX with version)
# Verify Vulkan installation
vulkaninfo --summary
vkcube  # Test Vulkan rendering
# Build with Vulkan support (automatic detection)
make build
```
Fedora/RHEL:
```bash
# Install Vulkan support
sudo dnf install vulkan-tools vulkan-validation-layers
sudo dnf install mesa-vulkan-drivers  # For AMD/Intel
sudo dnf install nvidia-driver        # For NVIDIA
# Build with Vulkan support
make build
```

On Linux, SYCL provides unified parallel programming for CPUs, GPUs, and other accelerators.
Requirements:
- Intel oneAPI Toolkit or compatible SYCL implementation
- Compatible hardware (Intel GPUs, NVIDIA GPUs via CUDA backend)
Installation:
```bash
# Install Intel oneAPI Toolkit
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
sudo apt-get install intel-oneapi-toolkit
# Source the environment
source /opt/intel/oneapi/setvars.sh
# Verify SYCL installation
sycl-ls
# Build with SYCL support (automatic detection)
make build
```

On Linux, OpenCL provides cross-platform parallel computing support.
Requirements:
- OpenCL-capable GPU or CPU
- OpenCL runtime and drivers
Installation:
```bash
# Ubuntu/Debian - Install OpenCL support
sudo apt-get update
sudo apt-get install opencl-headers clinfo
sudo apt-get install intel-opencl-icd   # For Intel
sudo apt-get install mesa-opencl-icd    # For AMD
sudo apt-get install nvidia-opencl-dev  # For NVIDIA
# Verify OpenCL installation
clinfo
# Build with OpenCL support (automatic detection)
make build
```

The Makefile detects GPU support with the following checks:
- CUDA: Checks for `nvcc` or the `CUDA_PATH` environment variable
- HIP: Checks for `hipconfig` or the `ROCM_PATH` environment variable
- Vulkan: Checks for the `vulkaninfo` command or the Vulkan loader
- OpenCL: Checks for the `clinfo` command or an OpenCL runtime
- SYCL: Checks for the `sycl-ls` command or the Intel oneAPI toolkit
- CPU: Fallback when no GPU SDK is detected
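
The checks listed above can be mirrored in Go for quick diagnostics. This is a minimal sketch, not the Makefile's actual logic: the `probe` type, the `detectBackend` and `onPath` helpers, and the assumption that checks run in the listed order with the first match winning are all illustrative. It does not probe the Vulkan loader, OpenCL runtime, or oneAPI install directories, only the command-line tools and the two environment variables named in the list.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// probe describes one backend check: a tool to look for on PATH and an
// optional environment variable that also counts as evidence.
type probe struct {
	backend string
	tool    string
	envVar  string // empty when the list names no variable
}

// probes follows the order of the list above. This ordering is an
// assumption; the real Makefile may prioritize differently.
var probes = []probe{
	{"cuda", "nvcc", "CUDA_PATH"},
	{"hip", "hipconfig", "ROCM_PATH"},
	{"vulkan", "vulkaninfo", ""},
	{"opencl", "clinfo", ""},
	{"sycl", "sycl-ls", ""},
}

// detectBackend returns the first backend whose tool or variable is found,
// or "cpu" when none is. hasTool and getenv are injected so the logic can
// be exercised without touching the real system.
func detectBackend(hasTool func(string) bool, getenv func(string) string) string {
	for _, p := range probes {
		if hasTool(p.tool) || (p.envVar != "" && getenv(p.envVar) != "") {
			return p.backend
		}
	}
	return "cpu"
}

// onPath reports whether name resolves to an executable on PATH.
func onPath(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	fmt.Println("detected backend:", detectBackend(onPath, os.Getenv))
}
```

Passing the lookups in as functions keeps the decision logic separate from the environment, which makes it easy to check what would be selected on a machine you do not have at hand.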
```bash
# Check if GPU support is available in downloaded binaries
make detect-gpu
# Test all GPU detection logic
nvcc --version        # CUDA detection
hipconfig --version   # HIP detection
vulkaninfo --summary  # Vulkan detection
clinfo                # OpenCL detection
sycl-ls               # SYCL detection
system_profiler SPDisplaysDataType | grep Metal  # Metal (macOS)
```
```bash
make test-download  # Downloads and tests appropriate binaries
ls ~/.cache/gollama/libs/
```
## GPU Support in Pre-Built Binaries
Gollama.cpp now uses pre-built binaries from official llama.cpp releases that include GPU support:
- **macOS**: Binaries include Metal support automatically
- **Linux**: Binaries include CUDA and HIP support where available
- **Windows**: CPU support available, GPU support planned
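
To make the per-platform list above easy to check from code, the sketch below maps a Go `GOOS` value to the GPU backends the pre-built binaries are described as including. The `prebuiltBackends` function is hypothetical and only restates the list; it does not inspect the downloaded libraries, and "where available" on Linux means the result is an upper bound rather than a guarantee.

```go
package main

import (
	"fmt"
	"runtime"
)

// prebuiltBackends restates the list above: which GPU backends the
// pre-built binaries are described as including for a given GOOS.
// A nil result means CPU only.
func prebuiltBackends(goos string) []string {
	switch goos {
	case "darwin":
		return []string{"metal"}
	case "linux":
		// "Where available", per the list: treat this as an upper bound.
		return []string{"cuda", "hip"}
	default:
		// Windows (GPU support planned) and any other platform.
		return nil
	}
}

func main() {
	backends := prebuiltBackends(runtime.GOOS)
	if len(backends) == 0 {
		fmt.Printf("%s: CPU only\n", runtime.GOOS)
		return
	}
	fmt.Printf("%s: %v\n", runtime.GOOS, backends)
}
```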
### Binary Selection
The downloader automatically selects GPU-enabled binaries when available:
```bash
# Downloads appropriate binary for your platform with GPU support
make download-libs
```

Control how many model layers are offloaded to GPU:
```go
import "github.com/dianlight/gollama.cpp"

// Configure GPU offloading
params := gollama.Context_default_params()
params.n_gpu_layers = 32 // Offload 32 layers to GPU
// For models with many layers, use -1 for all layers
params.n_gpu_layers = -1 // Offload all layers to GPU
```

Configure GPU memory usage:
```go
// Set maximum GPU memory usage (in MB)
params.vram_budget = 8192 // 8GB VRAM limit
// Enable memory mapping for large models
model_params := gollama.Model_default_params()
model_params.use_mmap = true
```

For systems with multiple GPUs:
```go
// Split model across multiple GPUs
params.split_mode = gollama.LLAMA_SPLIT_MODE_LAYER
params.main_gpu = 0                       // Primary GPU device ID
params.tensor_split = []float32{0.6, 0.4} // Split ratio between GPUs
```

The optimal number of GPU layers depends on:
- Available VRAM
- Model size
- Sequence length
Guidelines:
- Small models (7B): 32-40 layers on 8GB+ VRAM
- Medium models (13B): 20-32 layers on 8GB VRAM
- Large models (30B+): Adjust based on available VRAM
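
For a starting point when the guidelines above do not match your model, a rough estimate can be computed from the model size and available VRAM. The sketch below is a heuristic under stated assumptions, not how the library decides anything: `estimateGPULayers` is a hypothetical helper, it assumes every layer takes an equal share of the model file, and `reserveMiB` is a guess at the space needed for the KV cache and scratch buffers (which grows with sequence length).

```go
package main

import "fmt"

// estimateGPULayers returns how many of totalLayers fit in vramMiB after
// setting aside reserveMiB, assuming each layer occupies an equal share of
// modelMiB. The result is capped at totalLayers and never negative.
func estimateGPULayers(modelMiB, totalLayers, vramMiB, reserveMiB int) int {
	if modelMiB <= 0 || totalLayers <= 0 {
		return 0
	}
	usable := vramMiB - reserveMiB
	if usable <= 0 {
		return 0
	}
	// Equivalent to floor(usable / (modelMiB / totalLayers)) without
	// losing precision to an early integer division.
	n := usable * totalLayers / modelMiB
	if n > totalLayers {
		n = totalLayers
	}
	return n
}

func main() {
	// Hypothetical numbers chosen only to illustrate the arithmetic:
	// an 8192 MiB model with 40 layers, 8192 MiB of VRAM, 1024 MiB reserved.
	fmt.Println(estimateGPULayers(8192, 40, 8192, 1024)) // prints 35
}
```

Treat the result as an initial value for `n_gpu_layers` and adjust it while watching actual VRAM usage.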
```go
// Optimize batch size for your GPU
params.n_batch = 512  // Larger batches for high-end GPUs
params.n_ubatch = 512 // Micro-batch size for memory efficiency
```

```bash
# Check CUDA installation
nvcc --version
ls -la /usr/local/cuda/bin/nvcc
# Check environment variables
echo $CUDA_PATH
echo $LD_LIBRARY_PATH
```

```bash
# Check ROCm installation
/opt/rocm/bin/hipconfig --platform
ls -la /opt/rocm/bin/
# Check environment variables
echo $ROCM_PATH
echo $HIP_PATH
```

```go
// Reduce GPU memory usage
params.n_gpu_layers = 16  // Reduce from 32
params.vram_budget = 4096 // Reduce VRAM limit
```

```go
// Optimize for your hardware
params.n_threads = 8       // Match CPU cores
params.n_threads_batch = 8 // Batch processing threads
params.rope_scaling_type = gollama.LLAMA_ROPE_SCALING_TYPE_LINEAR
```

Enable detailed GPU information during build:
```bash
# Verbose GPU detection
make build V=1
# Check library GPU backend
ldd libs/linux_amd64/libllama.so | grep -E "(cuda|hip)"
```

Test that GPU acceleration is working:
```go
package main

import (
	"fmt"

	"github.com/dianlight/gollama.cpp"
)

func main() {
	// Load model with GPU acceleration
	model_params := gollama.Model_default_params()
	model := gollama.Load_model_from_file("model.gguf", model_params)
	defer gollama.Free_model(model)

	// Create context with GPU layers
	ctx_params := gollama.Context_default_params()
	ctx_params.n_gpu_layers = 32
	ctx := gollama.New_context_with_model(model, ctx_params)
	defer gollama.Free(ctx)

	// Print the requested number of GPU layers. This only echoes the
	// setting; it does not confirm that layers were actually offloaded.
	fmt.Printf("GPU layers: %d\n", ctx_params.n_gpu_layers)
	// Monitor GPU usage with nvidia-smi or rocm-smi during inference
}
```

Monitor GPU utilization:
```bash
# NVIDIA GPUs
watch -n 1 nvidia-smi
# AMD GPUs
watch -n 1 rocm-smi
# Check GPU memory usage during inference
```

- Start Conservative: Begin with fewer GPU layers and increase gradually
- Monitor Memory: Watch VRAM usage to avoid out-of-memory errors
- Profile Performance: Test different configurations for your specific use case
- Update Drivers: Keep GPU drivers updated for best performance
- Check Compatibility: Verify your GPU is supported by the chosen backend
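
To act on the "Monitor Memory" advice from a Go program rather than a terminal, one option is to parse the CSV output of `nvidia-smi`. The sketch below is illustrative only: `gpuMem` and `parseMemCSV` are hypothetical names, it covers NVIDIA GPUs only, and it relies on the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` query, which prints one `used, total` line per GPU in MiB.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// gpuMem holds used and total memory for one GPU, in MiB.
type gpuMem struct {
	usedMiB  int
	totalMiB int
}

// parseMemCSV parses lines of the form "used, total", one per GPU.
// Blank lines are skipped; any other malformed line is an error.
func parseMemCSV(out string) ([]gpuMem, error) {
	var gpus []gpuMem
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		used, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, fmt.Errorf("used memory in %q: %w", line, err)
		}
		total, err := strconv.Atoi(strings.TrimSpace(fields[1]))
		if err != nil {
			return nil, fmt.Errorf("total memory in %q: %w", line, err)
		}
		gpus = append(gpus, gpuMem{usedMiB: used, totalMiB: total})
	}
	return gpus, nil
}

func main() {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.used,memory.total",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		fmt.Println("nvidia-smi not available:", err)
		return
	}
	gpus, err := parseMemCSV(string(out))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, g := range gpus {
		fmt.Printf("GPU %d: %d/%d MiB used\n", i, g.usedMiB, g.totalMiB)
	}
}
```

Sampling this before and after loading a model gives a concrete number to compare against when raising the GPU layer count step by step.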
| Platform | GPU | Backend | Model Sizes | Status |
|---|---|---|---|---|
| macOS M1/M2 | Apple Silicon | Metal | 7B-70B | ✅ Verified |
| Ubuntu 22.04 | RTX 4090 | CUDA 12.0 | 7B-70B | ✅ Verified |
| Ubuntu 22.04 | RX 7900 XTX | ROCm 5.7 | 7B-30B | ✅ Verified |
| Windows 11 | RTX 3080 | CUDA 11.8 | 7B-30B | ✅ Verified |
| Fedora 38 | RTX 3070 | CUDA 12.1 | 7B-13B | ✅ Verified |
For the latest compatibility information, see our CI test matrix.