SIGBUS crash on ARM64/Metal (Apple M4 Pro) after ~400+ embeddings

## System Info
- **OS**: macOS Sequoia 15.x (Apple M4 Pro)
- **Go**: 1.25.5
- **gollama.cpp**: v0.2.2-llamacpp.b6862
- **Model**: nomic-embed-text-v1.5.Q8_0.gguf (138MB)

## Description

When generating embeddings for roughly 400 or more text chunks through the embedding API, the process crashes with SIGBUS (bus error). The crash is reproducible and appears to occur during Metal GPU operations.

## Stack Trace

```
SIGBUS: bus error
PC=0x1913e6388 m=17 sigcode=2 addr=0x1913e6388

goroutine 0 gp=0x2003d8f00 m=17 mp=0x200400d08 [idle]:
runtime.asyncPreempt2()
    /opt/homebrew/Cellar/go/1.25.5/libexec/src/runtime/preempt.go:254 +0x28
```

Key observation: `PC == addr` indicates an **instruction fetch fault**, suggesting code pointer corruption.
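
For triaging further crash logs, the `PC == addr` check can be automated. The following is a minimal sketch, assuming the fault line has the same `PC=0x... addr=0x...` shape as the trace above; the helper name `isInstructionFetchFault` is hypothetical and not part of gollama.cpp or the Go runtime.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// faultRe captures the hex values of PC and addr from a fault line
// shaped like "PC=0x... m=17 sigcode=2 addr=0x...".
var faultRe = regexp.MustCompile(`PC=0x([0-9a-fA-F]+).*\baddr=0x([0-9a-fA-F]+)`)

// isInstructionFetchFault (hypothetical helper) reports whether the
// faulting address equals the program counter on the given line.
func isInstructionFetchFault(line string) (bool, error) {
	m := faultRe.FindStringSubmatch(line)
	if m == nil {
		return false, fmt.Errorf("no PC/addr pair found in %q", line)
	}
	pc, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil {
		return false, err
	}
	addr, err := strconv.ParseUint(m[2], 16, 64)
	if err != nil {
		return false, err
	}
	return pc == addr, nil
}

func main() {
	// Fault line copied from the stack trace in this report.
	ok, err := isInstructionFetchFault("PC=0x1913e6388 m=17 sigcode=2 addr=0x1913e6388")
	fmt.Println(ok, err)
}
```

Running this over several crash logs would show whether every occurrence has `PC == addr` or only some of them.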

## Reproduction Steps

1. Load model with default params (Metal GPU enabled)
2. Call `Decode()` repeatedly in a loop for embedding generation
3. Process ~400+ text chunks (each ~500 chars)
4. The crash occurs consistently after processing ~400 files' worth of embeddings

```go
// Simplified reproduction
modelParams := gollama.Model_default_params()
model, _ := gollama.Model_load_from_file("nomic-embed-text-v1.5.Q8_0.gguf", modelParams)

ctxParams := gollama.Context_default_params()
ctxParams.Embeddings = 1
ctxParams.PoolingType = gollama.LLAMA_POOLING_TYPE_MEAN
ctx, _ := gollama.Init_from_model(model, ctxParams)

// Process many texts - crash after ~400 iterations
for _, text := range manyTexts {
    tokens, _ := gollama.Tokenize(model, text, true, false)
    gollama.Memory_clear(ctx, true)
    batch := gollama.Batch_get_one(tokens)
    gollama.Decode(ctx, batch)  // <-- SIGBUS occurs here eventually
    embPtr := gollama.Get_embeddings_ith(ctx, 0)
    _ = embPtr // ... copy embeddings (placeholder use so the snippet compiles)
}
```
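
The elided copy step in the loop above can be sketched as follows, to rule out a stale pointer into native memory as a contributing factor. This assumes `Get_embeddings_ith` returns a `*float32` pointing at a buffer owned by the native library, and that the embedding dimension `n` is known to the caller; `copyEmbedding` is a hypothetical helper, and a plain Go slice stands in for the native buffer here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copyEmbedding (hypothetical helper) copies n float32 values starting
// at ptr into Go-owned memory, so the result stays valid even if the
// source buffer is later overwritten or freed.
func copyEmbedding(ptr *float32, n int) []float32 {
	if ptr == nil || n <= 0 {
		return nil
	}
	view := unsafe.Slice(ptr, n) // aliases the source buffer, no copy yet
	out := make([]float32, n)
	copy(out, view)
	return out
}

func main() {
	// Stand-in for the native embedding buffer.
	backing := []float32{0.1, 0.2, 0.3, 0.4}
	emb := copyEmbedding(&backing[0], len(backing))

	backing[0] = 9 // simulate the source buffer being reused
	fmt.Println(emb[0] == 0.1, len(emb))
}
```

If the crash persists with a copy like this in place, lingering references to the embedding buffer are unlikely to be the cause.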

## Attempted Workarounds (All Failed)

| Approach | Result |
|----------|--------|
| `NGpuLayers = 0` | Metal still initializes, crash persists |
| `runtime.LockOSThread()` | Still crashes |
| `GODEBUG=asyncpreemptoff=1` | Still crashes |
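
For reference, one way to apply the `runtime.LockOSThread()` approach so that every native call (model load, context init, `Decode()`) runs on the same OS thread is a dedicated worker goroutine. This is a minimal sketch of that general Go pattern, not a confirmed fix; the `pinnedWorker` type is hypothetical and the closures passed to `Do` stand in for the gollama calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// pinnedWorker (hypothetical) runs every submitted function on a
// single goroutine that is locked to one OS thread.
type pinnedWorker struct {
	jobs chan func()
}

func newPinnedWorker() *pinnedWorker {
	w := &pinnedWorker{jobs: make(chan func())}
	go func() {
		// Lock for the lifetime of this goroutine; the lock is never
		// released, so the thread is discarded when the goroutine exits.
		runtime.LockOSThread()
		for job := range w.jobs {
			job()
		}
	}()
	return w
}

// Do runs fn on the pinned thread and blocks until it returns.
func (w *pinnedWorker) Do(fn func()) {
	done := make(chan struct{})
	w.jobs <- func() {
		defer close(done)
		fn()
	}
	<-done
}

// Close stops the worker; Do must not be called afterwards.
func (w *pinnedWorker) Close() { close(w.jobs) }

func main() {
	w := newPinnedWorker()
	defer w.Close()

	sum := 0
	for i := 1; i <= 3; i++ {
		w.Do(func() { sum += i }) // stand-in for a Decode() call
	}
	fmt.Println(sum)
}
```

Routing all FFI calls through one worker like this distinguishes "locked only the calling goroutine" from "all native state touched from a single thread", which helps narrow down whether thread affinity plays any role.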

## Current Workaround

Falling back to non-neural embeddings entirely, which defeats the purpose of using llama.cpp.

## Analysis

The crash appears to be related to one or more of the following (none confirmed):
1. Metal GPU state corruption during repeated `Decode()` calls
2. An interaction between Go's async preemption (SIGURG signals) and the purego FFI
3. Memory alignment issues in Metal operations on ARM64

Since `NGpuLayers = 0` doesn't disable Metal (it still initializes), there's no way to use CPU-only mode with the pre-built binaries.

## Questions

1. Is there a way to completely disable Metal backend at runtime?
2. Are CPU-only pre-built binaries available for macOS ARM64?
3. Is this a known issue with llama.cpp Metal on Apple Silicon?

## Environment Details

```
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 17179.89 MB
```

Thank you for this excellent library! Looking forward to any guidance on this issue.
