Pin libllama beyond b6862 (and refresh Go struct layouts) so we can run newer architectures (e.g. qwen35)

Hi, and thanks for `gollama.cpp`. The purego approach is a great fit for us (in particular, we build on Windows and cross-compile to Linux and darwin).

## What we'd like

A release that bumps `LlamaCppBuild` past `b6862` *and* regenerates the Go side of `LlamaContextParams` / `LlamaModelParams` (and any other structs whose C layout has shifted) from the corresponding `llama.h`. The motivation is downstream model support — newer architectures like `qwen35` (Qwen 3.5 family) only load with a libllama from roughly `b7990` onward, but the runtime `LoadLibraryWithVersion(...)` override doesn't safely cross that boundary on its own (see below).

## Environment

- `github.com/dianlight/gollama.cpp v0.2.2-llamacpp.b6862` (pinned via go.mod)
- Windows 11, amd64, Go 1.25
- Downstream consumer: a Go service that wraps `gollama.cpp` behind a small adapter

## What I tried (workaround)

Used the runtime override to ask for a newer libllama, expecting upstream's `qwen35` arch support to come along for the ride:

```go
_ = gollama.LoadLibraryWithVersion("b9292") // tried various from b7990 up
_ = gollama.Backend_init()
_ = gollama.Ggml_backend_load_all()
// ...
ctx, err := gollama.Init_from_model(model, cparams) // CRASH
```

The cache populated correctly with `llama-b9292-bin-win-cuda-12.4-x64`, the backends loaded, `Model_load_from_file` succeeded, and `print_info` reported `arch = qwen35`. Then `Init_from_model` segfaulted deep in libffi:

```
Exception 0xc0000005 0x0 0x20000000a 0x7ffad809544d
...
github.com/dianlight/gollama.cpp.ffiInitFromModel(0x1817a64a8e0,
  {0x200, 0x1000, 0x200, 0x1, 0x0, 0xc, 0xc, 0x0, 0xffffffff, ...})
  .../gollama.cpp@v0.2.2-llamacpp.b6862/ffi.go:216 +0x42c
github.com/dianlight/gollama.cpp.Init_from_model(...)
  .../gollama.go:1075 +0x94
```

Reproduces identically with `n_gpu_layers = 0` (CPU-only) and with full offload, so it isn't a GPU/CUDA-runtime issue.

## Suspected cause

`LlamaContextParams` (and likely `LlamaModelParams`) in `gollama.go` / `ffi.go` mirror the C struct shape as of `b6862`. Between `b6862` and `b9292` the C side gained / reordered fields (e.g. context params picked up new flags around flash-attn, KV cache typing, etc.). When the Go struct is marshalled via libffi against a `b9292` libllama, field offsets don't line up - the function receives garbage in some slots and access-violates on the first pointer-shaped one. This isn't something that can be worked around at the `LoadLibraryWithVersion` level because the Go-side struct definitions are fixed at the dependency version.

## Ask

Could you please cut a release that:

1. Bumps `LlamaCppBuild` to a recent upstream tag (anything `b9000+`
   that supports `qwen35` / Qwen3.x would unblock us — happy to test
   any specific tag you pick).
2. Regenerates `LlamaContextParams`, `LlamaModelParams`, and any other
   structs whose C layout changed, against that tag's `llama.h`.
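
Until such a release exists, a consumer-side guard could at least turn the crash into an ordinary error. The following is a minimal sketch under stated assumptions: `bindingsBuild`, `parseBuild`, and `checkBuild` are hypothetical names on our side, not part of `gollama.cpp`, and the guard conservatively assumes that any build other than the pinned one may have a different struct layout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bindingsBuild is the llama.cpp build the Go structs were generated
// against, taken from the pinned module version v0.2.2-llamacpp.b6862.
const bindingsBuild = "b6862"

// parseBuild converts a tag such as "b6862" into its numeric part.
func parseBuild(tag string) (int, error) {
	if !strings.HasPrefix(tag, "b") {
		return 0, fmt.Errorf("build tag %q does not start with 'b'", tag)
	}
	n, err := strconv.Atoi(strings.TrimPrefix(tag, "b"))
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("build tag %q has no valid build number", tag)
	}
	return n, nil
}

// checkBuild returns an error unless the requested libllama build is
// the same build the Go struct definitions were generated against.
func checkBuild(requested string) error {
	want, err := parseBuild(bindingsBuild)
	if err != nil {
		return err
	}
	got, err := parseBuild(requested)
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("libllama b%d requested, but bindings target b%d: struct layouts may differ", got, want)
	}
	return nil
}

func main() {
	for _, tag := range []string{"b6862", "b9292"} {
		if err := checkBuild(tag); err != nil {
			fmt.Println("refusing to load:", err)
			continue
		}
		fmt.Println("ok to load:", tag)
	}
}
```

A check like this would run before the call to `LoadLibraryWithVersion(...)`. It does not make newer builds usable; it only fails early instead of access-violating inside `Init_from_model`.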

If I've missed the mark on any of this, please do let me know. I'm happy to help test, or to attempt a PR, though I'm fairly new to Go (coming from a Java background) and still learning its best practices.

Many thanks!
