feat: implement local LLM client with yzma (purego) #60

nvandessel · 2026-02-11T07:27:02Z

Summary

Replace llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings
Single local.go file — no build tags, no stub file, no C++ compiler needed
go build always works; shared libs only needed at runtime (yzma install --lib)
Per-call embedding context creation (simpler for low-throughput dedup workload)
Package-level sync.Once for library init, instance-level sync.Once for model loading
Available() does cheap os.Stat checks without loading any libraries

Stacked on

PR feat: re-add local LLM foundations with yzma-ready config #58 (feat/local-llm-foundations-v2) — pure Go foundations
PR revert: remove llama-go local LLM integration #57 (revert/local-llm-llamago) — revert llama-go

Test plan

go build ./cmd/floop — builds cleanly (no yzma libs needed at build time)
go vet ./... — no issues
go test ./... — all 25 packages pass
Integration: FLOOP_TEST_LIB_PATH=/path/to/libs FLOOP_TEST_MODEL_PATH=/path/to/model.gguf go test -tags integration ./internal/llm/ -v

🤖 Generated with Claude Code

Replace llama-go CGo bindings with hybridgroup/yzma, which uses purego to load llama.cpp shared libraries at runtime. No CGo, no build tags, no C++ compiler needed — go build always works. - Single local.go file (no stub), package-level sync.Once for lib init - Per-call context creation in Embed() for simplicity - Available() checks lib dir + model file via os.Stat - Integration tests gated on FLOOP_TEST_LIB_PATH + FLOOP_TEST_MODEL_PATH Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-02-11T07:29:59Z

Greptile Overview

Greptile Summary

Replaces llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings, eliminating C++ compiler requirements at build time.

Key implementation details:

Single local.go file with full yzma integration (no build tags or stub files)
Package-level sync.Once ensures llama.Load() and llama.Init() happen once per process
Instance-level sync.Once for lazy model loading on first Embed() call
Per-call embedding context creation with immediate cleanup (simpler for low-throughput dedup workload)
Thread-safe via sync.Mutex protecting all model operations
Available() performs cheap os.Stat checks without loading libraries
Comprehensive unit tests (error paths, config validation, interface compliance)
Integration tests with build tag and env-based skip conditions

The PR achieves its goal of pure Go builds while maintaining runtime flexibility for local embeddings.

Confidence Score: 4/5

This PR is safe to merge with low risk; the implementation is solid with comprehensive tests and proper resource management
Score reflects well-structured implementation with thread-safe lazy loading, proper error handling, comprehensive test coverage (both unit and integration), and adherence to Go best practices. Minor style suggestion on context padding doesn't affect correctness. The change successfully eliminates CGo dependency while maintaining all functionality.
No files require special attention - all changes follow project conventions and include proper tests

Important Files Changed

Filename	Overview
go.mod	Added `github.com/hybridgroup/yzma v1.7.0` dependency with transitive deps `ebitengine/purego` and `jupiterrider/ffi`; updated `golang.org/x/*` packages
go.sum	Updated checksums for new dependencies and updated Go standard library packages
internal/llm/local.go	Implemented full yzma-based embedding client with thread-safe lazy loading, per-call context creation, package-level library init, and proper resource cleanup; one minor style suggestion on context padding
internal/llm/local_test.go	Comprehensive unit tests covering config validation, `Available()` checks, error paths, and interface compliance; tests properly use `t.TempDir()` and `t.Setenv()`
internal/llm/local_integration_test.go	Well-structured integration tests with build tag, clear skip conditions via env vars, and realistic test cases for embeddings and similarity comparisons

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant LC as LocalClient
    participant Lib as yzma Library
    participant Model as GGUF Model

    Note over App,Model: First Call (Lazy Initialization)
    
    App->>LC: Embed(ctx, text)
    LC->>LC: loadModel() [once.Do]
    LC->>Lib: loadLib(libPath) [package-level once]
    Lib-->>LC: library loaded
    LC->>Lib: llama.ModelLoadFromFile()
    Lib->>Model: load model
    Model-->>Lib: model handle
    Lib-->>LC: model, vocab, nEmbd
    
    Note over LC: Model loaded, proceed with embedding
    
    LC->>LC: mu.Lock()
    LC->>Lib: llama.Tokenize(vocab, text)
    Lib-->>LC: tokens
    LC->>Lib: llama.InitFromModel() [per-call context]
    Lib-->>LC: lctx
    LC->>Lib: llama.SetEmbeddings(lctx, true)
    LC->>Lib: llama.Decode(lctx, batch)
    LC->>Lib: llama.GetEmbeddingsSeq(lctx, 0, nEmbd)
    Lib-->>LC: rawVec
    LC->>LC: normalize(vec)
    LC->>Lib: llama.Free(lctx)
    LC->>LC: mu.Unlock()
    LC-->>App: []float32 embedding

    Note over App,Model: Subsequent Calls (Model Already Loaded)
    
    App->>LC: CompareEmbeddings(ctx, a, b)
    LC->>LC: Embed(ctx, a) [reuses loaded model]
    LC->>LC: Embed(ctx, b) [reuses loaded model]
    LC->>LC: CosineSimilarity(embA, embB)
    LC-->>App: float64 similarity

greptile-apps

_{5 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-11T07:30:02Z

internal/llm/local.go

+	tokens := llama.Tokenize(c.vocab, text, true, true)
+
+	ctxParams := llama.ContextDefaultParams()
+	ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding


padding of +16 tokens may be insufficient for special tokens and context overhead

Most embedding models require additional space beyond the raw token count for BOS/EOS tokens, padding tokens, and internal context management. The current +16 may cause allocation issues with longer inputs or models with larger special token sets.

Suggested change

ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding

ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps bot reviewed Feb 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement local LLM client with yzma (purego) #60

feat: implement local LLM client with yzma (purego) #60

nvandessel commented Feb 11, 2026

Uh oh!

greptile-apps bot commented Feb 11, 2026

Important Files Changed

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
	ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens

feat: implement local LLM client with yzma (purego) #60

Are you sure you want to change the base?

feat: implement local LLM client with yzma (purego) #60

Conversation

nvandessel commented Feb 11, 2026

Summary

Stacked on

Test plan

Uh oh!

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant