feat: implement local LLM client with yzma (purego) #60
base: feat/local-llm-foundations-v2
Conversation
Replace llama-go CGo bindings with hybridgroup/yzma, which uses purego to load llama.cpp shared libraries at runtime. No CGo, no build tags, no C++ compiler needed: `go build` always works.

- Single `local.go` file (no stub); package-level `sync.Once` for lib init
- Per-call context creation in `Embed()` for simplicity
- `Available()` checks lib dir + model file via `os.Stat`
- Integration tests gated on `FLOOP_TEST_LIB_PATH` + `FLOOP_TEST_MODEL_PATH`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
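As a rough illustration of the `Available()` check described above, a minimal sketch in Go; the field names `libPath` and `modelPath` are placeholders, not the PR's actual identifiers:

```go
package llm

import "os"

// LocalClient wraps a llama.cpp library loaded at runtime via yzma.
// Field names here are illustrative, not the PR's exact ones.
type LocalClient struct {
	libPath   string // directory containing the llama.cpp shared libraries
	modelPath string // path to the GGUF model file
}

// Available reports whether the shared libraries and the model file
// exist on disk. It only calls os.Stat, so no native code is loaded.
func (c *LocalClient) Available() bool {
	if _, err := os.Stat(c.libPath); err != nil {
		return false
	}
	_, err := os.Stat(c.modelPath)
	return err == nil
}
```

Because the check never touches the shared libraries, it stays cheap enough to call on every request.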
Greptile Overview

Greptile Summary

Replaces llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings, eliminating C++ compiler requirements at build time. Key implementation details are covered in the file-by-file overview below.

The PR achieves its goal of pure Go builds while maintaining runtime flexibility for local embeddings.

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| go.mod | Added github.com/hybridgroup/yzma v1.7.0 dependency with transitive deps ebitengine/purego and jupiterrider/ffi; updated golang.org/x/* packages |
| go.sum | Updated checksums for new dependencies and updated Go standard library packages |
| internal/llm/local.go | Implemented full yzma-based embedding client with thread-safe lazy loading, per-call context creation, package-level library init, and proper resource cleanup; one minor style suggestion on context padding |
| internal/llm/local_test.go | Comprehensive unit tests covering config validation, Available() checks, error paths, and interface compliance; tests properly use t.TempDir() and t.Setenv() |
| internal/llm/local_integration_test.go | Well-structured integration tests with build tag, clear skip conditions via env vars, and realistic test cases for embeddings and similarity comparisons |
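The package-level library init noted for `internal/llm/local.go` could look roughly like the sketch below. Here `loadLib` stands in for the PR's purego-based loader (it appears in the sequence diagram further down); its signature and the surrounding variable names are assumptions:

```go
package llm

import "sync"

var (
	libOnce sync.Once // the shared library is loaded at most once per process
	libErr  error     // sticky: every caller after a failed load sees the same error
)

// loadLib stands in for the PR's purego-based loader; the real one
// dlopen()s the llama.cpp shared libraries found under libPath.
func loadLib(libPath string) error { return nil }

// ensureLib loads the llama.cpp shared library exactly once, no matter
// how many LocalClient instances call it.
func ensureLib(libPath string) error {
	libOnce.Do(func() {
		libErr = loadLib(libPath)
	})
	return libErr
}
```

Caching the error alongside the `sync.Once` means a bad library path fails fast on every subsequent call instead of retrying a load that cannot succeed.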
Sequence Diagram
```mermaid
sequenceDiagram
    participant App as Application
    participant LC as LocalClient
    participant Lib as yzma Library
    participant Model as GGUF Model

    Note over App,Model: First Call (Lazy Initialization)
    App->>LC: Embed(ctx, text)
    LC->>LC: loadModel() [once.Do]
    LC->>Lib: loadLib(libPath) [package-level once]
    Lib-->>LC: library loaded
    LC->>Lib: llama.ModelLoadFromFile()
    Lib->>Model: load model
    Model-->>Lib: model handle
    Lib-->>LC: model, vocab, nEmbd
    Note over LC: Model loaded, proceed with embedding
    LC->>LC: mu.Lock()
    LC->>Lib: llama.Tokenize(vocab, text)
    Lib-->>LC: tokens
    LC->>Lib: llama.InitFromModel() [per-call context]
    Lib-->>LC: lctx
    LC->>Lib: llama.SetEmbeddings(lctx, true)
    LC->>Lib: llama.Decode(lctx, batch)
    LC->>Lib: llama.GetEmbeddingsSeq(lctx, 0, nEmbd)
    Lib-->>LC: rawVec
    LC->>LC: normalize(vec)
    LC->>Lib: llama.Free(lctx)
    LC->>LC: mu.Unlock()
    LC-->>App: []float32 embedding

    Note over App,Model: Subsequent Calls (Model Already Loaded)
    App->>LC: CompareEmbeddings(ctx, a, b)
    LC->>LC: Embed(ctx, a) [reuses loaded model]
    LC->>LC: Embed(ctx, b) [reuses loaded model]
    LC->>LC: CosineSimilarity(embA, embB)
    LC-->>App: float64 similarity
```
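The diagram's final steps, `normalize` and `CosineSimilarity`, are pure Go with no yzma involvement. A self-contained sketch of what such helpers typically look like; the names are taken from the diagram, the implementations are assumptions:

```go
package llm

import "math"

// normalize returns an L2-normalized copy of v; after normalization,
// cosine similarity reduces to a plain dot product.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := math.Sqrt(sum)
	if n == 0 {
		return v // zero vector: nothing to scale
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / n)
	}
	return out
}

// CosineSimilarity computes dot(a, b) / (|a| * |b|). It returns 0 for
// mismatched lengths or zero-magnitude inputs.
func CosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}
```

Normalizing at embed time (as the diagram shows) is a common choice: downstream comparisons then only need the dot product, and the accumulation in float64 avoids precision loss on long vectors.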
5 files reviewed, 1 comment
```go
tokens := llama.Tokenize(c.vocab, text, true, true)

ctxParams := llama.ContextDefaultParams()
ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
```
padding of +16 tokens may be insufficient for special tokens and context overhead
Most embedding models require additional space beyond the raw token count for BOS/EOS tokens, padding tokens, and internal context management. The current +16 may cause allocation issues with longer inputs or models with larger special token sets.
```diff
- ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
+ ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
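For what it's worth, one alternative to any fixed pad is headroom that grows with the input, clamped to the model's training context. In this sketch `llama.ModelNCtxTrain` is an assumed accessor (llama.cpp itself exposes `llama_model_n_ctx_train`; check what yzma actually mirrors before relying on it):

```go
// Headroom that scales with input length, with a floor of 64 tokens
// for BOS/EOS and other special tokens.
pad := len(tokens)/8 + 64
nCtx := uint32(len(tokens) + pad)

// Clamp to the model's training context. ModelNCtxTrain is an assumed
// yzma accessor mirroring llama.cpp's llama_model_n_ctx_train.
if train := uint32(llama.ModelNCtxTrain(c.model)); nCtx > train {
	nCtx = train
}
ctxParams.NCtx = nCtx
```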
Summary

- Single `local.go` file: no build tags, no stub file, no C++ compiler needed
- `go build` always works; shared libs only needed at runtime (`yzma install --lib`)
- Package-level `sync.Once` for library init, instance-level `sync.Once` for model loading
- `Available()` does cheap `os.Stat` checks without loading any libraries

Stacked on

- `feat/local-llm-foundations-v2`: pure Go foundations
- `revert/local-llm-llamago`: revert llama-go

Test plan

- `go build ./cmd/floop`: builds cleanly (no yzma libs needed at build time)
- `go vet ./...`: no issues
- `go test ./...`: all 25 packages pass
- `FLOOP_TEST_LIB_PATH=/path/to/libs FLOOP_TEST_MODEL_PATH=/path/to/model.gguf go test -tags integration ./internal/llm/ -v`

🤖 Generated with Claude Code
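A minimal sketch of the env-var gating pattern the test plan exercises; the test name and the `newTestClient` helper are illustrative, not the PR's actual code:

```go
//go:build integration

package llm

import (
	"context"
	"os"
	"testing"
)

// TestEmbedIntegration runs only when both env vars point at real
// artifacts, matching the test plan's invocation above.
func TestEmbedIntegration(t *testing.T) {
	libPath := os.Getenv("FLOOP_TEST_LIB_PATH")
	modelPath := os.Getenv("FLOOP_TEST_MODEL_PATH")
	if libPath == "" || modelPath == "" {
		t.Skip("FLOOP_TEST_LIB_PATH and FLOOP_TEST_MODEL_PATH not set")
	}

	c := newTestClient(t, libPath, modelPath) // hypothetical helper wrapping the PR's constructor
	vec, err := c.Embed(context.Background(), "hello world")
	if err != nil {
		t.Fatalf("Embed: %v", err)
	}
	if len(vec) == 0 {
		t.Fatal("expected a non-empty embedding")
	}
}
```

The `//go:build integration` tag keeps `go test ./...` pure Go, while the skip guard lets the tagged run degrade gracefully on machines without the shared libraries or model.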