feat: Add multi-model support, ONNX backend, and performance optimizations #7

Open
uriafranko wants to merge 3 commits into master from claude/rust-hf-offline-performance-wamDP

@uriafranko
Owner

Major improvements to the Rust embedding engine:

Model Support

  • Add 6 embedding models: MiniLM-L6, MiniLM-L12, BGE-Small, BGE-Base, E5-Small, GTE-Small
  • Default changed from MiniLM-L6 (MTEB 56.3) to BGE-Small (MTEB 62.2) - a gain of 5.9 MTEB points (~10% relative)
  • BGE-Base option for maximum accuracy (MTEB 64.2, 768 dims)
  • All models work offline with local files - no HuggingFace API key needed
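
To make the quoted numbers concrete, here is a minimal sketch of how per-model properties and the relative MTEB gain could be expressed. `ModelSketch` and `relative_gain_percent` are hypothetical names, not the PR's `EmbeddingModel` API. Only the three models whose scores appear above are included, and the 384-dimension width for the two small models is an assumption based on their published sizes.

```rust
/// Hypothetical stand-in for the PR's `EmbeddingModel` enum. Only the three
/// models whose MTEB scores are quoted in the description are listed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModelSketch {
    MiniLmL6,
    BgeSmall,
    BgeBase,
}

impl ModelSketch {
    /// MTEB score as quoted in the PR description.
    pub fn mteb_score(self) -> f64 {
        match self {
            ModelSketch::MiniLmL6 => 56.3,
            ModelSketch::BgeSmall => 62.2,
            ModelSketch::BgeBase => 64.2,
        }
    }

    /// Embedding width: 768 for BGE-Base (from the PR); 384 for the two
    /// small models (assumed from their published sizes).
    pub fn embedding_dim(self) -> usize {
        match self {
            ModelSketch::BgeBase => 768,
            ModelSketch::MiniLmL6 | ModelSketch::BgeSmall => 384,
        }
    }
}

/// Relative MTEB gain of `new` over `old`, in percent.
pub fn relative_gain_percent(old: ModelSketch, new: ModelSketch) -> f64 {
    (new.mteb_score() - old.mteb_score()) / old.mteb_score() * 100.0
}
```

Calling `relative_gain_percent(ModelSketch::MiniLmL6, ModelSketch::BgeSmall)` gives roughly 10.5, which is where the "~10%" figure comes from.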

Backend Support

  • Candle backend (default): Pure Rust, no external dependencies
  • ONNX Runtime backend (optional): Faster CPU inference with --features onnx-backend
  • Feature flags allow choosing backends at compile time
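
As an illustration of compile-time backend selection, the sketch below gates the ONNX option behind the `onnx-backend` feature named above. `BackendSketch`, `available_backends`, and `select_backend` are hypothetical names; treating Candle as always compiled in is an assumption of this sketch, and the crate's real gating may differ.

```rust
/// Hypothetical mirror of the PR's `Backend` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendSketch {
    Candle,
    Onnx,
}

/// Backends compiled into this build. `onnx-backend` is the feature name
/// from the PR; Candle being unconditionally present is an assumption.
pub fn available_backends() -> Vec<BackendSketch> {
    let mut backends = vec![BackendSketch::Candle];
    // `cfg!` evaluates to a compile-time boolean, so the ONNX entry only
    // appears when the crate is built with `--features onnx-backend`.
    if cfg!(feature = "onnx-backend") {
        backends.push(BackendSketch::Onnx);
    }
    backends
}

/// Resolve a requested backend, failing if it was not compiled in.
pub fn select_backend(requested: BackendSketch) -> Result<BackendSketch, String> {
    if available_backends().contains(&requested) {
        Ok(requested)
    } else {
        Err(format!("{:?} backend not enabled at compile time", requested))
    }
}
```

Returning an error for a backend that was not compiled in keeps the failure explicit instead of silently falling back to another backend.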

Performance Optimizations

  • CPU optimization flags via .cargo/config.toml (target-cpu=native for AVX/SIMD)
  • Embedding cache (LRU, 1000 entries)
  • Dynamic embedding dimensions support (384 or 768)
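
The embedding cache could look roughly like the following. This is a minimal LRU sketch keyed by input text, not the PR's implementation; `EmbeddingCacheSketch` and its methods are hypothetical. It uses a linear scan to update recency, which is simple but O(n) per access.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache from input text to embedding vector (illustrative only).
pub struct EmbeddingCacheSketch {
    capacity: usize,
    entries: HashMap<String, Vec<f32>>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<String>,
}

impl EmbeddingCacheSketch {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Look up an embedding and mark it as recently used on a hit.
    pub fn get(&mut self, key: &str) -> Option<&Vec<f32>> {
        if self.entries.contains_key(key) {
            self.touch(key);
        }
        self.entries.get(key)
    }

    /// Insert an embedding, evicting the least recently used entry if full.
    pub fn insert(&mut self, key: &str, embedding: Vec<f32>) {
        if !self.entries.contains_key(key) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key.to_string(), embedding);
        self.touch(key);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With `EmbeddingCacheSketch::new(1000)`, repeated texts skip inference entirely; a production cache would more likely use a linked-list-backed LRU to avoid the linear scan.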

API Improvements

  • AgentEngine::new_fast() - MiniLM-L6 for speed
  • AgentEngine::new_accurate() - BGE-Base for accuracy
  • AgentEngine::with_config() - Full customization
  • Export EmbeddingModel, Backend, BrainConfig for configuration

New Examples

  • benchmark.rs - Compare models and backends

All tests pass (43 unit + integration tests).

Replaces multiple constructor methods with a clean builder pattern:

## Before
```rust
AgentEngine::new_fast("db")
AgentEngine::new_accurate("db")
AgentEngine::with_config("db", config)
```

## After
```rust
AgentEngine::builder()
    .db_path("db")
    .model(EmbeddingModel::BgeBase)
    .backend(Backend::Candle)
    .build()?
```

## Changes

### Brain Module
- BrainConfig::builder() -> BrainConfigBuilder
- Fluent API: .model(), .backend(), .local_model_dir(), .mock()

### AgentEngine
- AgentEngine::builder() -> AgentEngineBuilder
- Fluent API: .db_path(), .in_memory(), .model(), .backend(), .mock(), .without_metrics()
- Convenience methods retained: new(), new_in_memory(), new_mock_in_memory(), new_mock()
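
For readers unfamiliar with the pattern, the following is a minimal sketch of a consuming builder with the same method names as the fluent API above. All types here (`DemoEngine`, `DemoEngineBuilder`, and so on) are hypothetical. The defaults of BGE-Small and Candle follow the PR description, while requiring an explicit storage choice is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoModel {
    MiniLmL6,
    BgeSmall,
    BgeBase,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoBackend {
    Candle,
    Onnx,
    Mock,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DemoStorage {
    InMemory,
    Path(String),
}

#[derive(Debug, PartialEq, Eq)]
pub struct DemoEngine {
    pub storage: DemoStorage,
    pub model: DemoModel,
    pub backend: DemoBackend,
}

/// Each setter takes `self` by value and returns it, so calls can be chained.
#[derive(Debug, Default)]
pub struct DemoEngineBuilder {
    storage: Option<DemoStorage>,
    model: Option<DemoModel>,
    backend: Option<DemoBackend>,
}

impl DemoEngineBuilder {
    pub fn db_path(mut self, path: &str) -> Self {
        self.storage = Some(DemoStorage::Path(path.to_string()));
        self
    }

    pub fn in_memory(mut self) -> Self {
        self.storage = Some(DemoStorage::InMemory);
        self
    }

    pub fn model(mut self, model: DemoModel) -> Self {
        self.model = Some(model);
        self
    }

    pub fn backend(mut self, backend: DemoBackend) -> Self {
        self.backend = Some(backend);
        self
    }

    pub fn mock(mut self) -> Self {
        self.backend = Some(DemoBackend::Mock);
        self
    }

    /// Validate the configuration and fill in defaults for unset options.
    pub fn build(self) -> Result<DemoEngine, String> {
        let storage = self
            .storage
            .ok_or_else(|| "call db_path() or in_memory() before build()".to_string())?;
        Ok(DemoEngine {
            storage,
            model: self.model.unwrap_or(DemoModel::BgeSmall),
            backend: self.backend.unwrap_or(DemoBackend::Candle),
        })
    }
}
```

Keeping validation in `build()` means new options can be added as extra setters without touching existing call sites.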

### Benefits
- Single entry point for configuration
- Discoverable API via IDE autocomplete
- Follows Rust builder pattern best practices
- Easy to extend with new options
- Better maintainability

All tests pass (43 unit + integration).

- Add EmbeddingModel and Backend enums to Python SDK
- Update AgentEngine constructor to accept model and backend parameters
- Add feature flags for candle-backend and onnx-backend
- Expose model(), backend(), and embedding_dim() methods
- Add comprehensive tests for new model configuration API

Python SDK now supports:
- Selecting embedding models (MiniLM-L6, MiniLM-L12, BGE-Small, BGE-Base, E5-Small, GTE-Small)
- Selecting inference backends (Candle, ONNX, Mock)
- Querying model properties (embedding_dim, mteb_score, hf_repo)

Example usage:
  from agent_state import AgentEngine, EmbeddingModel, Backend
  engine = AgentEngine(model=EmbeddingModel.BgeBase, backend=Backend.Onnx)