System Info
- OS: macOS Sequoia 15.x (Apple M4 Pro)
- Go: 1.25.5
- gollama.cpp: v0.2.2-llamacpp.b6862
- Model: nomic-embed-text-v1.5.Q8_0.gguf (138MB)
Description
When generating embeddings for ~400+ text chunks using the embedding API, the process crashes with SIGBUS (bus error). The crash occurs in Metal GPU operations and is reproducible.
Stack Trace
SIGBUS: bus error
PC=0x1913e6388 m=17 sigcode=2 addr=0x1913e6388
goroutine 0 gp=0x2003d8f00 m=17 mp=0x200400d08 [idle]:
runtime.asyncPreempt2()
/opt/homebrew/Cellar/go/1.25.5/libexec/src/runtime/preempt.go:254 +0x28
Key observation: PC == addr indicates an instruction fetch fault, suggesting code pointer corruption.
Reproduction Steps
- Load model with default params (Metal GPU enabled)
- Call Decode() repeatedly in a loop for embedding generation
- Process ~400+ text chunks (each ~500 chars)
- Crash occurs consistently after processing ~400 files' worth of embeddings
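The input volume described in the steps above can be approximated with synthetic data, which may help anyone trying to reproduce this without the original files. The sketch below builds a `manyTexts` slice of the stated shape; `makeChunks` and the filler sentence are hypothetical helpers, not part of gollama.cpp, and the real chunks were of course ordinary file content.

```go
package main

import (
	"fmt"
	"strings"
)

// makeChunks returns n synthetic text chunks, each exactly size bytes long.
// Hypothetical helper: the filler text is arbitrary, only the chunk count
// and length are meant to mirror the reproduction steps.
func makeChunks(n, size int) []string {
	const filler = "The quick brown fox jumps over the lazy dog. "
	chunks := make([]string, n)
	for i := range chunks {
		// Prefix with the index so every chunk is distinct.
		prefix := fmt.Sprintf("chunk %d: ", i)
		repeats := size/len(filler) + 1
		body := prefix + strings.Repeat(filler, repeats)
		chunks[i] = body[:size] // filler is ASCII, so byte slicing is safe
	}
	return chunks
}

func main() {
	manyTexts := makeChunks(450, 500)
	fmt.Println(len(manyTexts), len(manyTexts[0])) // prints "450 500"
}
```

Whether synthetic text triggers the crash as reliably as the original files is untested.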
// Simplified reproduction
modelParams := gollama.Model_default_params()
model, _ := gollama.Model_load_from_file("nomic-embed-text-v1.5.Q8_0.gguf", modelParams)

ctxParams := gollama.Context_default_params()
ctxParams.Embeddings = 1
ctxParams.PoolingType = gollama.LLAMA_POOLING_TYPE_MEAN
ctx, _ := gollama.Init_from_model(model, ctxParams)

// Process many texts - crash after ~400 iterations
for _, text := range manyTexts {
    tokens, _ := gollama.Tokenize(model, text, true, false)
    gollama.Memory_clear(ctx, true)
    batch := gollama.Batch_get_one(tokens)
    gollama.Decode(ctx, batch) // <-- SIGBUS occurs here eventually
    embPtr := gollama.Get_embeddings_ith(ctx, 0)
    // ... copy embeddings
}
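For completeness, the elided "copy embeddings" step looks roughly like the sketch below. It assumes the pointer returned by the embedding getter can be treated as a `*float32` to `n` contiguous values, where `n` is the model's embedding dimension; the binding's exact return type is not confirmed here, and `copyEmbedding` is a hypothetical helper. The copy matters because the native buffer is presumably reused by the next Decode() call.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copyEmbedding copies n float32 values starting at p into a Go-owned slice.
// Assumption: p points at at least n contiguous float32 values. The caller
// is responsible for passing the correct n.
func copyEmbedding(p *float32, n int) []float32 {
	if p == nil || n <= 0 {
		return nil
	}
	view := unsafe.Slice(p, n) // aliases the source buffer, no copy yet
	out := make([]float32, n)
	copy(out, view)
	return out
}

func main() {
	// Stand-in for the native buffer; in the real loop this would be embPtr.
	native := []float32{1, 2, 3, 4}
	emb := copyEmbedding(&native[0], len(native))

	native[0] = 99 // a later overwrite of the source must not affect the copy
	fmt.Println(emb[0] == 1, len(emb))
}
```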
Attempted Workarounds (All Failed)
| Approach | Result |
|---|---|
| NGpuLayers = 0 | Metal still initializes, crash persists |
| runtime.LockOSThread() | Still crashes |
| GODEBUG=asyncpreemptoff=1 | Still crashes |
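For reference, the LockOSThread attempt followed the pattern sketched below: every native call is funneled through one goroutine that stays locked to a single OS thread, so load, tokenize, and Decode() never migrate between threads. This is an illustrative sketch rather than the exact code used; `nativeWorker` is a hypothetical name, and the closure body is a placeholder for the gollama calls. It is shown only to make clear what was tested, since it did not prevent the crash.

```go
package main

import (
	"fmt"
	"runtime"
)

// nativeWorker runs every submitted function on one goroutine that is
// locked to a single OS thread for its whole lifetime.
type nativeWorker struct {
	jobs chan func()
}

func newNativeWorker() *nativeWorker {
	w := &nativeWorker{jobs: make(chan func())}
	go func() {
		// Never unlocked, so the thread is dedicated to this goroutine.
		runtime.LockOSThread()
		for job := range w.jobs {
			job()
		}
	}()
	return w
}

// Do runs fn on the worker thread and blocks until it returns.
func (w *nativeWorker) Do(fn func()) {
	done := make(chan struct{})
	w.jobs <- func() {
		defer close(done)
		fn()
	}
	<-done
}

// Close stops the worker; Do must not be called afterwards.
func (w *nativeWorker) Close() { close(w.jobs) }

func main() {
	w := newNativeWorker()
	defer w.Close()

	total := 0
	for i := 1; i <= 3; i++ {
		w.Do(func() { total += i }) // placeholder for Tokenize/Decode calls
	}
	fmt.Println(total) // prints "6"
}
```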
Current Workaround
Falling back to non-neural embeddings entirely, which defeats the purpose of using llama.cpp.
Analysis
The crash appears to be related to:
- Metal GPU state corruption during repeated Decode() calls
- Possible interaction between Go's async preemption (SIGURG signals) and purego FFI
- Memory alignment issues in Metal operations on ARM64
Since NGpuLayers = 0 doesn't disable Metal (it still initializes), there's no way to use CPU-only mode with the pre-built binaries.
Questions
- Is there a way to completely disable Metal backend at runtime?
- Are CPU-only pre-built binaries available for macOS ARM64?
- Is this a known issue with llama.cpp Metal on Apple Silicon?
Environment Details
ggml_metal_device_init: GPU name: Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 17179.89 MB
Thank you for this excellent library! Looking forward to any guidance on this issue.