fix: align LlamaContextParams/LlamaModelParams with llama.cpp b6862 ABI#40
Open
Marvinthebored wants to merge 1 commit into
Open
fix: align LlamaContextParams/LlamaModelParams with llama.cpp b6862 ABI#40Marvinthebored wants to merge 1 commit into
Marvinthebored wants to merge 1 commit into
Conversation
LlamaContextParams matched a much older llama.cpp than the bundled b6862
libraries. Both layouts happen to be 120 bytes, so calls succeed but every
field after the divergence lands on the wrong C struct member:
Go field (old) -> actual C field (b6862)
Seed -> n_ctx
NCtx -> n_batch
NBatch -> n_ubatch
NUbatch -> n_seq_max
NSeqMax -> n_threads
NThreads -> n_threads_batch
NThreadsBatch -> rope_scaling_type
RopeScalingType -> pooling_type
PoolingType -> attention_type
AttentionType -> flash_attn_type
(aligned again from rope_freq_base ... abort_callback_data)
Logits -> embeddings
Embeddings -> offload_kqv
Offload_kqv -> no_perf
FlashAttn -> op_offload
NoPerf -> swa_full
(kv_unified unreachable)
Observed: setting NUbatch=512 was rejected by llama_init_from_model with
"n_seq_max must be <= 256"; setting Embeddings=1 actually enabled
offload_kqv while real embeddings stayed false (llama_get_embeddings
returns NULL).
Changes:
- LlamaContextParams: drop Seed (removed from llama_context_params long
ago; LLAMA_DEFAULT_SEED remains for samplers), add FlashAttnType enum
(LLAMA_FLASH_ATTN_TYPE_{AUTO,DISABLED,ENABLED}), replace the stale bool
tail {Logits, Embeddings, Offload_kqv, FlashAttn, NoPerf} with b6862's
{Embeddings, Offload_kqv, NoPerf, OpOffload, SwaFull, KvUnified}.
- LlamaModelParams: add trailing NoHost bool (b6862).
- ffi.go: update both libffi struct descriptors to the same layout (the
non-darwin path marshals through these, so it had the same bug).
- Hardcoded default fallbacks updated; new fields default to llama.cpp's
values (flash_attn_type=AUTO, op_offload=true, swa_full=true,
kv_unified=false).
- Examples: drop obsolete `ctxParams.Logits = 1` (per-token logits are
requested via batch.logits; the context-level flag no longer exists).
- ffi_test: assert on real fields instead of Seed (that assertion only
passed because Seed was reading C n_ctx).
Verified on darwin-amd64 against the official b6862 dylib: defaults
round-trip with correct values (n_ctx=512, n_batch=2048, n_ubatch=512,
n_seq_max=1, flash_attn_type=-1), llama_init_from_model accepts mutated
params, and Encode produces real embeddings through the public API.
Note for darwin-amd64 users: RegisterFunc of llama_init_from_model needs
purego >= v0.11.0-alpha.3 (ebitengine/purego#425) — with the pinned
v0.9.1 registration panics with "too many arguments" regardless of this
fix. Not bumped here to keep the change focused.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
LlamaContextParamsmatches a much older llama.cpp than the bundled b6862 libraries. By coincidence both layouts are 120 bytes, so nothing crashes — but every field in the first 40 bytes (and the bool tail) silently lands on the wrong C struct member, on every platform and both FFI paths (purego struct registration on darwin, libffi descriptors elsewhere —ffi.goencodes the same stale layout).Seedn_ctxNCtxn_batchNBatchn_ubatchNUbatchn_seq_maxNSeqMaxn_threadsNThreadsn_threads_batchNThreadsBatchrope_scaling_typeRopeScalingTypepooling_typePoolingTypeattention_typeAttentionTypeflash_attn_typerope_freq_basethroughabort_callback_data)LogitsembeddingsEmbeddingsoffload_kqvOffload_kqvno_perfFlashAttnop_offloadNoPerfswa_fullObserved symptoms (darwin-amd64, official b6862 dylib):
cp.NUbatch = 512→llama_init_from_model: failed to initialize the context: n_seq_max must be <= 256cp.Embeddings = 1actually enabledoffload_kqv; realembeddingsstayedfalse→llama_get_embeddings*returns NULLContext_default_params()reportedNCtx=2048(that'sn_batch's default read through the wrong offset)Fix
LlamaContextParams: dropSeed(removed fromllama_context_paramsupstream long ago;LLAMA_DEFAULT_SEEDkept for samplers), addFlashAttnType(LLAMA_FLASH_ATTN_TYPE_{AUTO,DISABLED,ENABLED}), replace the bool tail with b6862's{embeddings, offload_kqv, no_perf, op_offload, swa_full, kv_unified}.LlamaModelParams: add trailingNoHost(b6862).ffi.go: both libffi struct descriptors updated to the same layout.ctxParams.Logits = 1(no context-level logits flag in b6862; per-token logits viabatch.logits).ffi_test: assert on real fields instead ofSeed(that assertion only passed becauseSeedwas readingn_ctx).Verification (darwin-amd64, official b6862 dylib)
With this patch, through the public API only:
Note for darwin-amd64
Unrelated to this PR but adjacent: with the pinned purego v0.9.1,
RegisterFunc(llama_init_from_model)panics withtoo many argumentson darwin-amd64 (the 120-byte by-value struct exceeds purego's stack-slot budget). Fixed upstream in purego via ebitengine/purego#425 (first available in v0.11.0-alpha.3). Happy to send that bump as a separate PR if wanted.🤖 Generated with Claude Code