Skip to content

fix: align LlamaContextParams/LlamaModelParams with llama.cpp b6862 ABI#40

Open
Marvinthebored wants to merge 1 commit into
dianlight:mainfrom
Marvinthebored:fix-context-params-b6862-abi
Open

fix: align LlamaContextParams/LlamaModelParams with llama.cpp b6862 ABI#40
Marvinthebored wants to merge 1 commit into
dianlight:mainfrom
Marvinthebored:fix-context-params-b6862-abi

Conversation

@Marvinthebored

Copy link
Copy Markdown

Problem

LlamaContextParams matches a much older llama.cpp than the bundled b6862 libraries. By coincidence both layouts are 120 bytes, so nothing crashes — but every field in the first 40 bytes (and the bool tail) silently lands on the wrong C struct member, on every platform and both FFI paths (purego struct registration on darwin, libffi descriptors elsewhere — ffi.go encodes the same stale layout).

Go field (old) actual C field it landed on (b6862)
Seed n_ctx
NCtx n_batch
NBatch n_ubatch
NUbatch n_seq_max
NSeqMax n_threads
NThreads n_threads_batch
NThreadsBatch rope_scaling_type
RopeScalingType pooling_type
PoolingType attention_type
AttentionType flash_attn_type
(aligned again from rope_freq_base through abort_callback_data)
Logits embeddings
Embeddings offload_kqv
Offload_kqv no_perf
FlashAttn op_offload
NoPerf swa_full

Observed symptoms (darwin-amd64, official b6862 dylib):

  • cp.NUbatch = 512llama_init_from_model: failed to initialize the context: n_seq_max must be <= 256
  • cp.Embeddings = 1 actually enabled offload_kqv; real embeddings stayed falsellama_get_embeddings* returns NULL
  • Context_default_params() reported NCtx=2048 (that's n_batch's default read through the wrong offset)

Fix

  • LlamaContextParams: drop Seed (removed from llama_context_params upstream long ago; LLAMA_DEFAULT_SEED kept for samplers), add FlashAttnType (LLAMA_FLASH_ATTN_TYPE_{AUTO,DISABLED,ENABLED}), replace the bool tail with b6862's {embeddings, offload_kqv, no_perf, op_offload, swa_full, kv_unified}.
  • LlamaModelParams: add trailing NoHost (b6862).
  • ffi.go: both libffi struct descriptors updated to the same layout.
  • Hardcoded default fallbacks updated; new fields use llama.cpp's defaults.
  • Examples: drop obsolete ctxParams.Logits = 1 (no context-level logits flag in b6862; per-token logits via batch.logits).
  • ffi_test: assert on real fields instead of Seed (that assertion only passed because Seed was reading n_ctx).

Verification (darwin-amd64, official b6862 dylib)

With this patch, through the public API only:

defaults via gollama API: n_ctx=512 n_batch=2048 n_ubatch=512 n_seq_max=1 flash_attn_type=-1   ← correct llama.cpp defaults, correct fields
Init_from_model with mutated NCtx/NBatch/NUbatch/Embeddings/PoolingType → context created
Tokenize + Encode + Get_embeddings_ith → real 768-dim embedding (nomic-embed-text-v1.5 Q8_0), L2≈24.4

Note for darwin-amd64

Unrelated to this PR but adjacent: with the pinned purego v0.9.1, RegisterFunc(llama_init_from_model) panics with too many arguments on darwin-amd64 (the 120-byte by-value struct exceeds purego's stack-slot budget). Fixed upstream in purego via ebitengine/purego#425 (first available in v0.11.0-alpha.3). Happy to send that bump as a separate PR if wanted.

🤖 Generated with Claude Code

LlamaContextParams matched a much older llama.cpp than the bundled b6862
libraries. Both layouts happen to be 120 bytes, so calls succeed but every
field after the divergence lands on the wrong C struct member:

  Go field (old)   -> actual C field (b6862)
  Seed             -> n_ctx
  NCtx             -> n_batch
  NBatch           -> n_ubatch
  NUbatch          -> n_seq_max
  NSeqMax          -> n_threads
  NThreads         -> n_threads_batch
  NThreadsBatch    -> rope_scaling_type
  RopeScalingType  -> pooling_type
  PoolingType      -> attention_type
  AttentionType    -> flash_attn_type
  (aligned again from rope_freq_base ... abort_callback_data)
  Logits           -> embeddings
  Embeddings       -> offload_kqv
  Offload_kqv      -> no_perf
  FlashAttn        -> op_offload
  NoPerf           -> swa_full
  (kv_unified unreachable)

Observed: setting NUbatch=512 was rejected by llama_init_from_model with
"n_seq_max must be <= 256"; setting Embeddings=1 actually enabled
offload_kqv while real embeddings stayed false (llama_get_embeddings
returns NULL).

Changes:
- LlamaContextParams: drop Seed (removed from llama_context_params long
  ago; LLAMA_DEFAULT_SEED remains for samplers), add FlashAttnType enum
  (LLAMA_FLASH_ATTN_TYPE_{AUTO,DISABLED,ENABLED}), replace the stale bool
  tail {Logits, Embeddings, Offload_kqv, FlashAttn, NoPerf} with b6862's
  {Embeddings, Offload_kqv, NoPerf, OpOffload, SwaFull, KvUnified}.
- LlamaModelParams: add trailing NoHost bool (b6862).
- ffi.go: update both libffi struct descriptors to the same layout (the
  non-darwin path marshals through these, so it had the same bug).
- Hardcoded default fallbacks updated; new fields default to llama.cpp's
  values (flash_attn_type=AUTO, op_offload=true, swa_full=true,
  kv_unified=false).
- Examples: drop obsolete `ctxParams.Logits = 1` (per-token logits are
  requested via batch.logits; the context-level flag no longer exists).
- ffi_test: assert on real fields instead of Seed (that assertion only
  passed because Seed was reading C n_ctx).

Verified on darwin-amd64 against the official b6862 dylib: defaults
round-trip with correct values (n_ctx=512, n_batch=2048, n_ubatch=512,
n_seq_max=1, flash_attn_type=-1), llama_init_from_model accepts mutated
params, and Encode produces real embeddings through the public API.

Note for darwin-amd64 users: RegisterFunc of llama_init_from_model needs
purego >= v0.11.0-alpha.3 (ebitengine/purego#425) — with the pinned
v0.9.1 registration panics with "too many arguments" regardless of this
fix. Not bumped here to keep the change focused.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant