feat: Add multi-model support, ONNX backend, and performance optimizations by uriafranko · Pull Request #7 · uriafranko/AgentState

uriafranko · 2026-01-15T13:10:29Z

Major improvements to the Rust embedding engine:

Model Support

Add 6 embedding models: MiniLM-L6, MiniLM-L12, BGE-Small, BGE-Base, E5-Small, GTE-Small
Default changed from MiniLM-L6 (MTEB 56.3) to BGE-Small (MTEB 62.2) - a 5.9-point gain, ~10% relative improvement in MTEB score
BGE-Base option for maximum accuracy (MTEB 64.2, 768 dims)
All models work offline with local files - no HuggingFace API key needed
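As a rough illustration of the model list above, the sketch below mirrors the six models as an enum with per-model metadata. The variant names other than `BgeBase`, the method names, and the `relative_gain_percent` helper are assumptions for illustration and may not match the crate. Only the MTEB scores stated in this PR are filled in; the rest return `None`. The 384-dimension value for the non-BGE-Base models is likewise an assumption based on the "384 or 768" note.

```rust
/// Hypothetical mirror of the models listed in this PR; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingModel {
    MiniLmL6,
    MiniLmL12,
    BgeSmall,
    BgeBase,
    E5Small,
    GteSmall,
}

impl EmbeddingModel {
    /// Output vector size. The PR states 768 for BGE-Base; 384 for the
    /// remaining models is an assumption drawn from "384 or 768".
    pub fn embedding_dim(self) -> usize {
        match self {
            EmbeddingModel::BgeBase => 768,
            _ => 384,
        }
    }

    /// MTEB score, only where the PR description states one.
    pub fn mteb_score(self) -> Option<f32> {
        match self {
            EmbeddingModel::MiniLmL6 => Some(56.3),
            EmbeddingModel::BgeSmall => Some(62.2),
            EmbeddingModel::BgeBase => Some(64.2),
            _ => None, // not stated in the PR, so left unknown
        }
    }
}

/// Relative improvement of `new` over `old`, in percent.
pub fn relative_gain_percent(old: f32, new: f32) -> f32 {
    (new - old) / old * 100.0
}
```

With these numbers, `relative_gain_percent(56.3, 62.2)` comes out at roughly 10.5, which is where the "~10%" figure for the new default comes from.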

Backend Support

Candle backend (default): Pure Rust, no external dependencies
ONNX Runtime backend (optional): Faster CPU inference with --features onnx-backend
Feature flags allow choosing backends at compile time
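Compile-time backend selection of this kind is usually done with Cargo features and `cfg`. The sketch below shows one way it might look; the `candle-backend` feature name for the Rust crate, the `compiled_backends` and `resolve_backend` helpers, and the error type are assumptions, not the PR's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Candle,
    Onnx,
}

/// Backends compiled into this build, based on Cargo feature flags.
/// Feature names are assumed; `onnx-backend` is the one named in the PR.
pub fn compiled_backends() -> Vec<Backend> {
    let mut backends = Vec::new();
    if cfg!(feature = "candle-backend") {
        backends.push(Backend::Candle);
    }
    if cfg!(feature = "onnx-backend") {
        backends.push(Backend::Onnx);
    }
    backends
}

/// Returns the requested backend if it is in `available`, else an error
/// telling the caller which feature flag would be needed.
pub fn resolve_backend(requested: Backend, available: &[Backend]) -> Result<Backend, String> {
    if available.contains(&requested) {
        Ok(requested)
    } else {
        Err(format!(
            "{:?} backend not compiled in; rebuild with the matching --features flag",
            requested
        ))
    }
}
```

A caller would typically pass `&compiled_backends()` as `available`, so a request for ONNX fails with a clear message unless the crate was built with `--features onnx-backend`.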

Performance Optimizations

CPU optimization flags via .cargo/config.toml (target-cpu=native for AVX/SIMD)
Embedding cache (1000 entries LRU)
Dynamic embedding dimensions support (384 or 768)
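To make the embedding cache item concrete, here is a minimal LRU cache sketch using only the standard library. The `EmbeddingCache` name and its methods are hypothetical; the PR does not show how its cache is implemented, and a production version would likely use a dedicated LRU crate or a linked structure instead of the linear scan used here.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache from input text to embedding vector.
pub struct EmbeddingCache {
    capacity: usize,
    map: HashMap<String, Vec<f32>>,
    order: VecDeque<String>, // front = least recently used
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position (O(n) scan).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_owned());
    }

    /// Look up an embedding and mark it as recently used on a hit.
    pub fn get(&mut self, key: &str) -> Option<&Vec<f32>> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Insert an embedding, evicting the least recently used entry if full.
    pub fn put(&mut self, key: &str, embedding: Vec<f32>) {
        if !self.map.contains_key(key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.to_owned(), embedding);
        self.touch(key);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Under this sketch, `EmbeddingCache::new(1000)` would correspond to the 1000-entry limit mentioned above; repeated texts skip model inference entirely on a cache hit.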

API Improvements

AgentEngine::new_fast() - MiniLM-L6 for speed
AgentEngine::new_accurate() - BGE-Base for accuracy
AgentEngine::with_config() - Full customization
Export EmbeddingModel, Backend, BrainConfig for configuration

New Examples

benchmark.rs - Compare models and backends

All tests pass (43 unit + integration tests).


Replaces multiple constructor methods with a clean builder pattern:

## Before

```rust
AgentEngine::new_fast("db")
AgentEngine::new_accurate("db")
AgentEngine::with_config("db", config)
```

## After

```rust
AgentEngine::builder()
    .db_path("db")
    .model(EmbeddingModel::BgeBase)
    .backend(Backend::Candle)
    .build()?
```

## Changes

### Brain Module

- BrainConfig::builder() -> BrainConfigBuilder
- Fluent API: .model(), .backend(), .local_model_dir(), .mock()

### AgentEngine

- AgentEngine::builder() -> AgentEngineBuilder
- Fluent API: .db_path(), .in_memory(), .model(), .backend(), .mock(), .without_metrics()
- Convenience methods retained: new(), new_in_memory(), new_mock_in_memory(), new_mock()

### Benefits

- Single entry point for configuration
- Discoverable API via IDE autocomplete
- Follows Rust builder pattern best practices
- Easy to extend with new options
- Better maintainability

All tests pass (43 unit + integration).
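For readers unfamiliar with the pattern, a consuming builder like the one this commit describes could be sketched as follows. This is a self-contained stand-in, not the crate's code: the `Storage` enum, the field layout, the `String` error type, and the rule that a storage choice is required are all assumptions. The defaults (BGE-Small model, Candle backend) follow the PR description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingModel { MiniLmL6, BgeSmall, BgeBase }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend { Candle, Onnx, Mock }

/// Hypothetical storage selector; the real engine may model this differently.
#[derive(Debug, PartialEq, Eq)]
pub enum Storage { Path(String), InMemory }

#[derive(Debug)]
pub struct AgentEngine {
    pub storage: Storage,
    pub model: EmbeddingModel,
    pub backend: Backend,
}

#[derive(Default)]
pub struct AgentEngineBuilder {
    storage: Option<Storage>,
    model: Option<EmbeddingModel>,
    backend: Option<Backend>,
}

impl AgentEngine {
    /// Single entry point for configuration.
    pub fn builder() -> AgentEngineBuilder {
        AgentEngineBuilder::default()
    }
}

impl AgentEngineBuilder {
    pub fn db_path(mut self, path: &str) -> Self {
        self.storage = Some(Storage::Path(path.to_owned()));
        self
    }
    pub fn in_memory(mut self) -> Self {
        self.storage = Some(Storage::InMemory);
        self
    }
    pub fn model(mut self, model: EmbeddingModel) -> Self {
        self.model = Some(model);
        self
    }
    pub fn backend(mut self, backend: Backend) -> Self {
        self.backend = Some(backend);
        self
    }
    /// Shorthand for selecting the mock backend.
    pub fn mock(mut self) -> Self {
        self.backend = Some(Backend::Mock);
        self
    }
    /// Fill in defaults and validate; requiring a storage choice is an assumption.
    pub fn build(self) -> Result<AgentEngine, String> {
        let storage = self
            .storage
            .ok_or_else(|| "call db_path() or in_memory() before build()".to_string())?;
        Ok(AgentEngine {
            storage,
            model: self.model.unwrap_or(EmbeddingModel::BgeSmall),
            backend: self.backend.unwrap_or(Backend::Candle),
        })
    }
}
```

Each setter takes and returns `self` by value, which is what allows the chained call style shown in the "After" snippet, and `build()` is the one place where defaults and validation live.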

- Add EmbeddingModel and Backend enums to Python SDK
- Update AgentEngine constructor to accept model and backend parameters
- Add feature flags for candle-backend and onnx-backend
- Expose model(), backend(), and embedding_dim() methods
- Add comprehensive tests for new model configuration API

Python SDK now supports:

- Selecting embedding models (MiniLM-L6, MiniLM-L12, BGE-Small, BGE-Base, E5-Small, GTE-Small)
- Selecting inference backends (Candle, ONNX, Mock)
- Querying model properties (embedding_dim, mteb_score, hf_repo)

Example usage:

    from agent_state import AgentEngine, EmbeddingModel, Backend
    engine = AgentEngine(model=EmbeddingModel.BgeBase, backend=Backend.Onnx)

claude added 3 commits January 15, 2026 12:25

uriafranko wants to merge 3 commits into master from claude/rust-hf-offline-performance-wamDP
