Skip to content

Conversation

@nvandessel
Copy link
Owner

Summary

  • Replace llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings
  • Single local.go file — no build tags, no stub file, no C++ compiler needed
  • go build always works; shared libs only needed at runtime (yzma install --lib)
  • Per-call embedding context creation (simpler for low-throughput dedup workload)
  • Package-level sync.Once for library init, instance-level sync.Once for model loading
  • Available() does cheap os.Stat checks without loading any libraries

Stacked on

Test plan

  • go build ./cmd/floop — builds cleanly (no yzma libs needed at build time)
  • go vet ./... — no issues
  • go test ./... — all 25 packages pass
  • Integration: FLOOP_TEST_LIB_PATH=/path/to/libs FLOOP_TEST_MODEL_PATH=/path/to/model.gguf go test -tags integration ./internal/llm/ -v

🤖 Generated with Claude Code

Replace llama-go CGo bindings with hybridgroup/yzma, which uses purego
to load llama.cpp shared libraries at runtime. No CGo, no build tags,
no C++ compiler needed — go build always works.

- Single local.go file (no stub), package-level sync.Once for lib init
- Per-call context creation in Embed() for simplicity
- Available() checks lib dir + model file via os.Stat
- Integration tests gated on FLOOP_TEST_LIB_PATH + FLOOP_TEST_MODEL_PATH

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Copy link

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Replaces llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings, eliminating C++ compiler requirements at build time.

Key implementation details:

  • Single local.go file with full yzma integration (no build tags or stub files)
  • Package-level sync.Once ensures llama.Load() and llama.Init() happen once per process
  • Instance-level sync.Once for lazy model loading on first Embed() call
  • Per-call embedding context creation with immediate cleanup (simpler for low-throughput dedup workload)
  • Thread-safe via sync.Mutex protecting all model operations
  • Available() performs cheap os.Stat checks without loading libraries
  • Comprehensive unit tests (error paths, config validation, interface compliance)
  • Integration tests with build tag and env-based skip conditions

The PR achieves its goal of pure Go builds while maintaining runtime flexibility for local embeddings.

Confidence Score: 4/5

  • This PR is safe to merge with low risk; the implementation is solid with comprehensive tests and proper resource management
  • Score reflects well-structured implementation with thread-safe lazy loading, proper error handling, comprehensive test coverage (both unit and integration), and adherence to Go best practices. Minor style suggestion on context padding doesn't affect correctness. The change successfully eliminates CGo dependency while maintaining all functionality.
  • No files require special attention - all changes follow project conventions and include proper tests

Important Files Changed

Filename Overview
go.mod Added github.com/hybridgroup/yzma v1.7.0 dependency with transitive deps ebitengine/purego and jupiterrider/ffi; updated golang.org/x/* packages
go.sum Updated checksums for new dependencies and updated Go standard library packages
internal/llm/local.go Implemented full yzma-based embedding client with thread-safe lazy loading, per-call context creation, package-level library init, and proper resource cleanup; one minor style suggestion on context padding
internal/llm/local_test.go Comprehensive unit tests covering config validation, Available() checks, error paths, and interface compliance; tests properly use t.TempDir() and t.Setenv()
internal/llm/local_integration_test.go Well-structured integration tests with build tag, clear skip conditions via env vars, and realistic test cases for embeddings and similarity comparisons

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant LC as LocalClient
    participant Lib as yzma Library
    participant Model as GGUF Model

    Note over App,Model: First Call (Lazy Initialization)
    
    App->>LC: Embed(ctx, text)
    LC->>LC: loadModel() [once.Do]
    LC->>Lib: loadLib(libPath) [package-level once]
    Lib-->>LC: library loaded
    LC->>Lib: llama.ModelLoadFromFile()
    Lib->>Model: load model
    Model-->>Lib: model handle
    Lib-->>LC: model, vocab, nEmbd
    
    Note over LC: Model loaded, proceed with embedding
    
    LC->>LC: mu.Lock()
    LC->>Lib: llama.Tokenize(vocab, text)
    Lib-->>LC: tokens
    LC->>Lib: llama.InitFromModel() [per-call context]
    Lib-->>LC: lctx
    LC->>Lib: llama.SetEmbeddings(lctx, true)
    LC->>Lib: llama.Decode(lctx, batch)
    LC->>Lib: llama.GetEmbeddingsSeq(lctx, 0, nEmbd)
    Lib-->>LC: rawVec
    LC->>LC: normalize(vec)
    LC->>Lib: llama.Free(lctx)
    LC->>LC: mu.Unlock()
    LC-->>App: []float32 embedding

    Note over App,Model: Subsequent Calls (Model Already Loaded)
    
    App->>LC: CompareEmbeddings(ctx, a, b)
    LC->>LC: Embed(ctx, a) [reuses loaded model]
    LC->>LC: Embed(ctx, b) [reuses loaded model]
    LC->>LC: CosineSimilarity(embA, embB)
    LC-->>App: float64 similarity
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

tokens := llama.Tokenize(c.vocab, text, true, true)

ctxParams := llama.ContextDefaultParams()
ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

padding of +16 tokens may be insufficient for special tokens and context overhead

Most embedding models require additional space beyond the raw token count for BOS/EOS tokens, padding tokens, and internal context management. The current +16 may cause allocation issues with longer inputs or models with larger special token sets.

Suggested change
ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant