Cache Gemini embeddings by content hash #9

Every chunk's embedding costs a Gemini API call. If the same paragraph appears in two uploads (or the same file is uploaded twice), we pay twice. A SQLite cache mapping SHA-256 of chunk text → embedding cuts cost and latency on repeated content with no loss of accuracy, since a cached vector is exactly what the same model returned earlier for the same text.

**Current state:**
No cache. Every chunk is sent to Gemini on every upload, even when identical text has been embedded before.

**Proposed implementation:**
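1. SQLite table `embedding_cache(chunk_hash TEXT, model TEXT, dimension INT, embedding BLOB, PRIMARY KEY (chunk_hash, model, dimension))`. The key must include `model` and `dimension`; with `chunk_hash` alone as the primary key, the same text could not be stored under two models.
2. Before calling Gemini: look up `sha256(text)[:32]` together with the current model and dimension. If present, reuse the stored embedding.
3. After calling Gemini: insert the new embeddings.
4. Namespace the cache by `model + dimension` so upgrading `gemini-embedding-001` to a newer model forces a rebuild: old rows no longer match, and chunks are re-embedded under the new model.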
5. Log cache hit rate; surface in `/metrics`.
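The steps above could look roughly like the sketch below. It is illustrative only: `EmbeddingCache`, `embed_fn`, and the `hit_rate` property are hypothetical names, `embed_fn` stands in for whatever function in `backend/rag_utils.py` actually calls Gemini, and storing vectors as packed float32 is an assumption about what precision is acceptable.

```python
import hashlib
import sqlite3
from array import array
from typing import Callable, List, Sequence

# Hypothetical sketch: class, method, and parameter names are assumptions,
# not existing project code.
EmbedFn = Callable[[List[str]], List[List[float]]]


class EmbeddingCache:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embedding_cache ("
            " chunk_hash TEXT NOT NULL, model TEXT NOT NULL,"
            " dimension INT NOT NULL, embedding BLOB NOT NULL,"
            " PRIMARY KEY (chunk_hash, model, dimension))"
        )
        self.hits = 0    # chunks served without an API call
        self.misses = 0  # chunks sent to embed_fn

    @staticmethod
    def chunk_hash(text: str) -> str:
        # First 32 hex characters (128 bits) of the SHA-256 digest.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]

    def embed(self, texts: Sequence[str], model: str, dimension: int,
              embed_fn: EmbedFn) -> List[List[float]]:
        results: List[List[float]] = [[] for _ in texts]
        missing: dict = {}  # chunk_hash -> indices that still need a vector
        for i, text in enumerate(texts):
            h = self.chunk_hash(text)
            row = self.db.execute(
                "SELECT embedding FROM embedding_cache"
                " WHERE chunk_hash = ? AND model = ? AND dimension = ?",
                (h, model, dimension),
            ).fetchone()
            if row is not None:
                results[i] = array("f", row[0]).tolist()
            else:
                missing.setdefault(h, []).append(i)

        if missing:
            # Duplicates inside one batch are embedded only once.
            todo = [texts[idxs[0]] for idxs in missing.values()]
            vectors = embed_fn(todo)
            rows = []
            for (h, idxs), vec in zip(missing.items(), vectors):
                packed = array("f", vec)  # float32, assumed precise enough
                rows.append((h, model, dimension, packed.tobytes()))
                for i in idxs:
                    results[i] = packed.tolist()
            with self.db:  # commit the inserts as one transaction
                self.db.executemany(
                    "INSERT OR REPLACE INTO embedding_cache VALUES (?, ?, ?, ?)",
                    rows,
                )

        self.misses += len(missing)
        self.hits += len(texts) - len(missing)
        return results

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A `/metrics` handler could then report `cache.hit_rate`. Counting in-batch duplicates as hits is one possible convention; the issue does not specify how the rate should be defined.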

**Files likely affected:**
- `backend/rag_utils.py`
- `backend/embedding_cache.py` (new)
- Tests.

**Acceptance criteria:**
- Uploading the same file twice yields zero Gemini embed API calls on the second upload.
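One way to check this criterion is a test that counts calls to a fake embedder. The sketch below is self-contained for illustration: `embed_with_cache` is a hypothetical stand-in in which a plain dict plays the role of the SQLite table, and `counting_embed` replaces the real Gemini call. The actual test would exercise the real upload path instead.

```python
import hashlib
from typing import Callable, Dict, List


def embed_with_cache(chunks: List[str], cache: Dict[str, List[float]],
                     embed_fn: Callable[[List[str]], List[List[float]]]
                     ) -> List[List[float]]:
    """Hypothetical stand-in for the cached embedder (dict instead of SQLite)."""
    keys = [hashlib.sha256(c.encode("utf-8")).hexdigest()[:32] for c in chunks]
    # Only chunks whose hash is not cached yet are sent to the embedder.
    todo = {k: c for k, c in zip(keys, chunks) if k not in cache}
    if todo:
        cache.update(zip(todo.keys(), embed_fn(list(todo.values()))))
    return [cache[k] for k in keys]


def test_second_upload_makes_no_embed_calls() -> None:
    api_calls: List[int] = []  # one entry per call, holding the batch size

    def counting_embed(batch: List[str]) -> List[List[float]]:
        api_calls.append(len(batch))
        return [[float(len(text))] for text in batch]

    cache: Dict[str, List[float]] = {}
    chunks = ["intro paragraph", "shared paragraph", "outro"]

    first = embed_with_cache(chunks, cache, counting_embed)
    assert api_calls == [3]  # first upload embeds every chunk

    second = embed_with_cache(chunks, cache, counting_embed)
    assert api_calls == [3]  # second upload adds no new API calls
    assert second == first
```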
