Every chunk's embedding costs a Gemini API call. If the same paragraph appears in two uploads (or the same file is uploaded twice), we pay twice. A SHA-256-of-chunk-text → embedding cache in SQLite cuts cost and latency on repeated content at no accuracy loss.
Current state:
No cache. Every chunk is always embedded.
Proposed implementation:
- SQLite table
embedding_cache(chunk_hash TEXT PRIMARY KEY, model TEXT, dimension INT, embedding BLOB).
- Before calling Gemini: look up
sha256(text)[:32]. If present, reuse.
- After calling Gemini: insert the new embeddings.
- Namespace the cache by
model + dimension so upgrading gemini-embedding-001 to a newer model forces a rebuild.
- Log cache hit rate; surface in
/metrics.
Files likely affected:
backend/rag_utils.py
backend/embedding_cache.py (new)
- Tests.
Acceptance criteria:
- Uploading the same file twice yields zero Gemini embed API calls on the second upload.
Every chunk's embedding costs a Gemini API call. If the same paragraph appears in two uploads (or the same file is uploaded twice), we pay twice. A SHA-256-of-chunk-text → embedding cache in SQLite cuts cost and latency on repeated content at no accuracy loss.
Current state:
No cache. Every chunk is always embedded.
Proposed implementation:
embedding_cache(chunk_hash TEXT PRIMARY KEY, model TEXT, dimension INT, embedding BLOB).sha256(text)[:32]. If present, reuse.model + dimensionso upgradinggemini-embedding-001to a newer model forces a rebuild./metrics.Files likely affected:
backend/rag_utils.pybackend/embedding_cache.py(new)Acceptance criteria: