Hi - thanks for gollama.cpp; the purego approach is a great fit for us (in particular, we build on Windows and cross-compile to Linux and Darwin).
What we'd like
A release that bumps LlamaCppBuild past b6862 and regenerates the Go side of LlamaContextParams / LlamaModelParams (and any other structs whose C layout has shifted) from the corresponding llama.h. The motivation is downstream model support — newer architectures like qwen35 (Qwen 3.5 family) only load with a libllama from roughly b7990 onward, but the runtime LoadLibraryWithVersion(...) override doesn't safely cross that boundary on its own (see below).
Environment
- github.com/dianlight/gollama.cpp v0.2.2-llamacpp.b6862 (pinned via go.mod)
- Windows 11, amd64, Go 1.25
- Downstream consumer: a Go service that wraps gollama.cpp behind a small adapter
What I tried (workaround)
Used the runtime override to ask for a newer libllama, expecting upstream's qwen35 arch support to come along for the ride:
_ = gollama.LoadLibraryWithVersion("b9292") // tried various from b7990 up
_ = gollama.Backend_init()
_ = gollama.Ggml_backend_load_all()
// ...
ctx, err := gollama.Init_from_model(model, cparams) // CRASH
The cache populated correctly with llama-b9292-bin-win-cuda-12.4-x64, the backends loaded, Model_load_from_file succeeded, and print_info reported arch = qwen35 happily. Then Init_from_model segfaults deep in libffi:
Exception 0xc0000005 0x0 0x20000000a 0x7ffad809544d
...
github.com/dianlight/gollama.cpp.ffiInitFromModel(0x1817a64a8e0,
{0x200, 0x1000, 0x200, 0x1, 0x0, 0xc, 0xc, 0x0, 0xffffffff, ...})
.../gollama.cpp@v0.2.2-llamacpp.b6862/ffi.go:216 +0x42c
github.com/dianlight/gollama.cpp.Init_from_model(...)
.../gollama.go:1075 +0x94
Reproduces identically with n_gpu_layers = 0 (CPU-only) and with full offload, so it isn't a GPU/CUDA-runtime issue.
Suspected cause
LlamaContextParams (and likely LlamaModelParams) in gollama.go / ffi.go mirror the C struct shape as of b6862. Between b6862 and b9292 the C side gained / reordered fields (e.g. context params picked up new flags around flash-attn, KV cache typing, etc.). When the Go struct is marshalled via libffi against a b9292 libllama, field offsets don't line up - the function receives garbage in some slots and access-violates on the first pointer-shaped one. This isn't something that can be worked around at the LoadLibraryWithVersion level because the Go-side struct definitions are fixed at the dependency version.
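To illustrate the failure mode I suspect, here is a minimal sketch. The two structs are hypothetical stand-ins that I made up for this example, not the real llama.h or gollama.cpp layouts, and readAsNew only mimics what I assume happens when a newer library reads a struct laid out by older bindings. It shows how inserting a single field moves a pointer-shaped field to a different offset:

```go
package main

import (
	"fmt"
	"unsafe"
)

// paramsOld is a HYPOTHETICAL params struct as the Go bindings might mirror it
// at the pinned build. It is not the real llama_context_params layout.
type paramsOld struct {
	NCtx     uint32
	NBatch   uint32
	Callback uintptr // pointer-shaped field
	Flag     bool
}

// paramsNew is the same HYPOTHETICAL struct after the C side inserted one
// field ahead of the pointer.
type paramsNew struct {
	NCtx     uint32
	NBatch   uint32
	NewKind  int32 // field added by the newer library
	Callback uintptr
	Flag     bool
}

// callbackOffsets returns the byte offset of Callback in each layout.
func callbackOffsets() (oldOff, newOff uintptr) {
	return unsafe.Offsetof(paramsOld{}.Callback), unsafe.Offsetof(paramsNew{}.Callback)
}

// readAsNew copies the bytes of an old-layout value into a new-layout value,
// mimicking a newer library reading a struct written by older bindings.
func readAsNew(old paramsOld) paramsNew {
	var seen paramsNew
	dst := unsafe.Slice((*byte)(unsafe.Pointer(&seen)), unsafe.Sizeof(seen))
	src := unsafe.Slice((*byte)(unsafe.Pointer(&old)), unsafe.Sizeof(old))
	copy(dst, src) // copies only len(src) bytes; the tail of seen stays zero
	return seen
}

func main() {
	oldOff, newOff := callbackOffsets()
	fmt.Printf("Callback offset: old=%d new=%d\n", oldOff, newOff)

	seen := readAsNew(paramsOld{NCtx: 512, NBatch: 4096, Callback: 0xdeadbeef, Flag: true})
	fmt.Printf("NCtx=%d (intact), Callback=%#x (no longer the value that was written)\n",
		seen.NCtx, seen.Callback)
}
```

The fields ahead of the inserted one still read correctly, which would be consistent with model loading and print_info working while the first call that dereferences a shifted pointer field crashes.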
Ask
Could you please cut a release that:
- Bumps LlamaCppBuild to a recent upstream tag (anything b9000+ that supports qwen35 / Qwen3.x would unblock us; happy to test any specific tag you pick).
- Regenerates LlamaContextParams, LlamaModelParams, and any other structs whose C layout changed, against that tag's llama.h.
If I've completely missed the mark on any of this, please do let me know. I'm happy to help test or to attempt a PR, though I'm still learning Go best practices as I'm fairly new to the language (coming from a Java background).
Many thanks!