
Pin libllama beyond b6862 (and refresh Go struct layouts) so we can run newer architectures (e.g. qwen35) #39

Description

@kcow

Hi - thanks for gollama.cpp. The purego approach is a great fit for us (in particular, we build on Windows and cross-compile to Linux and darwin).

What we'd like

A release that bumps LlamaCppBuild past b6862 and regenerates the Go side of LlamaContextParams / LlamaModelParams (and any other structs whose C layout has shifted) from the corresponding llama.h. The motivation is downstream model support — newer architectures like qwen35 (Qwen 3.5 family) only load with a libllama from roughly b7990 onward, but the runtime LoadLibraryWithVersion(...) override doesn't safely cross that boundary on its own (see below).

Environment

  • github.com/dianlight/gollama.cpp v0.2.2-llamacpp.b6862 (pinned via go.mod)
  • Windows 11, amd64, Go 1.25
  • Downstream consumer: a Go service that wraps gollama.cpp behind a small adapter

What I tried (workaround)

Used the runtime override to ask for a newer libllama, expecting upstream's qwen35 arch support to come along for the ride:

_ = gollama.LoadLibraryWithVersion("b9292") // tried various from b7990 up
_ = gollama.Backend_init()
_ = gollama.Ggml_backend_load_all()
// ...
ctx, err := gollama.Init_from_model(model, cparams) // CRASH

The cache populated correctly with llama-b9292-bin-win-cuda-12.4-x64, the backends loaded, Model_load_from_file succeeded, and print_info reported arch = qwen35 happily. Then Init_from_model segfaults deep in libffi:

Exception 0xc0000005 0x0 0x20000000a 0x7ffad809544d
...
github.com/dianlight/gollama.cpp.ffiInitFromModel(0x1817a64a8e0,
  {0x200, 0x1000, 0x200, 0x1, 0x0, 0xc, 0xc, 0x0, 0xffffffff, ...})
  .../gollama.cpp@v0.2.2-llamacpp.b6862/ffi.go:216 +0x42c
github.com/dianlight/gollama.cpp.Init_from_model(...)
  .../gollama.go:1075 +0x94

Reproduces identically with n_gpu_layers = 0 (CPU-only) and with full offload, so it isn't a GPU/CUDA-runtime issue.

Suspected cause

LlamaContextParams (and likely LlamaModelParams) in gollama.go / ffi.go mirror the C struct shape as of b6862. Between b6862 and b9292 the C side gained / reordered fields (e.g. context params picked up new flags around flash-attn, KV cache typing, etc.). When the Go struct is marshalled via libffi against a b9292 libllama, field offsets don't line up - the function receives garbage in some slots and access-violates on the first pointer-shaped one. This isn't something that can be worked around at the LoadLibraryWithVersion level because the Go-side struct definitions are fixed at the dependency version.
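To illustrate the kind of drift I suspect, here is a minimal sketch using two made-up structs. They are not the real llama.h layouts; paramsOld, paramsNew, and the NewMode field are hypothetical stand-ins. It only shows how a single field inserted on the C side pushes every later field, including pointer-shaped ones, to a different offset than the Go binding expects.

```go
package main

import (
	"fmt"
	"unsafe"
)

// paramsOld is a hypothetical stand-in for a params struct as mirrored
// by the Go binding. It is NOT the real llama_context_params layout.
type paramsOld struct {
	NCtx     uint32
	NBatch   uint32
	Callback uintptr // pointer-shaped slot
	Flag     bool
}

// paramsNew is the same hypothetical struct after the C side inserted
// one extra field ahead of the pointer-shaped slot.
type paramsNew struct {
	NCtx     uint32
	NBatch   uint32
	NewMode  int32 // hypothetical field added upstream
	Callback uintptr
	Flag     bool
}

// callbackOffsets reports where Callback sits in each layout.
func callbackOffsets() (oldOff, newOff uintptr) {
	return unsafe.Offsetof(paramsOld{}.Callback), unsafe.Offsetof(paramsNew{}.Callback)
}

func main() {
	oldOff, newOff := callbackOffsets()
	fmt.Printf("Callback offset: old=%d new=%d\n", oldOff, newOff)
	fmt.Printf("struct size:     old=%d new=%d\n",
		unsafe.Sizeof(paramsOld{}), unsafe.Sizeof(paramsNew{}))
}
```

If the library is built against the newer layout but receives bytes laid out the older way, it reads the pointer-shaped slot from the wrong offset, which would be consistent with the access violation above.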

Ask

Could you please cut a release that:

  1. Bumps LlamaCppBuild to a recent upstream tag (anything b9000+
    that supports qwen35 / Qwen3.x would unblock us — happy to test
    any specific tag you pick).
  2. Regenerates LlamaContextParams, LlamaModelParams, and any other
    structs whose C layout changed, against that tag's llama.h.
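As a side thought, a guard along the following lines might turn this class of crash into a readable error. This is only a sketch under assumptions: checkBuild, parseBuild, and pinnedBuild are hypothetical names that do not exist in gollama.cpp, and strict equality is a deliberately conservative policy, since many adjacent builds probably share the same struct layout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pinnedBuild is the llama.cpp build the Go structs are assumed to have
// been generated against (b6862 for the version reported in this issue).
const pinnedBuild = 6862

// parseBuild converts a tag such as "b9292" into 9292.
func parseBuild(tag string) (int, error) {
	digits, ok := strings.CutPrefix(tag, "b")
	if !ok {
		return 0, fmt.Errorf("build tag %q does not start with 'b'", tag)
	}
	n, err := strconv.Atoi(digits)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("build tag %q has no valid build number", tag)
	}
	return n, nil
}

// checkBuild is a hypothetical guard: it rejects any requested build
// other than the one the struct layouts were generated for.
func checkBuild(tag string) error {
	n, err := parseBuild(tag)
	if err != nil {
		return err
	}
	if n != pinnedBuild {
		return fmt.Errorf("libllama %s requested, but struct layouts were generated for b%d", tag, pinnedBuild)
	}
	return nil
}

func main() {
	for _, tag := range []string{"b6862", "b9292", "latest"} {
		if err := checkBuild(tag); err != nil {
			fmt.Println("refusing to load:", err)
			continue
		}
		fmt.Println("ok to load:", tag)
	}
}
```

A check like this would not unblock qwen35 by itself; it would just make the mismatch visible before any struct crosses the FFI boundary.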

If I've completely missed the mark on any of this, please do let me know. I'm happy to help test or to attempt a PR, though I'm fairly new to Go (coming from a Java background) and still learning its best practices.

Many thanks!
