millingtonsully/CacheAI

Cache AI

The screenshot is the most spontaneous, simple note-taker there is. When we are inspired or need to note something, a screenshot is often our go-to. After a while, screenshots become disorganized, and it is easy to lose the context of your thought process when you took them.

Cache AI is a Chrome extension and web application that captures, analyzes, and organizes your browsing context using AI-powered screenshot analysis and a knowledge-graph-style memory. Each time you take a screenshot, the image is added as a node and placed near other nodes based on semantic similarity. AI analyzes the image and produces a short label and a written analysis. At capture time you can add an optional audio note that is transcribed, attached to the screenshot, and stored with the node.

Cache AI is powered by the Gemini API (Google) for vision, embeddings, transcription, and chat, and by the Mem0 API for optional long-term memory in chat.

A future iOS app is planned for more realistic use cases and easy, on-the-fly screenshotting.


Table of Contents

  1. Overview
  2. Architecture
  3. Features
  4. Prerequisites
  5. Setup
  6. Configuration
  7. Development
  8. Deployment
  9. API Reference
  10. Security
  11. Troubleshooting

Overview

Chrome extension (CacheExtension)
Runs on any tab. A floating overlay offers one-click capture (a screenshot of the current tab) and an optional audio note. Screenshots and any audio are sent to the Cache AI backend; the extension does not hold API keys. When a user is signed in, the extension syncs its session with the web app so captured nodes appear there.

Web app (CacheWeb)
React SPA hosted at cacheai.app (or your own domain). Users sign in with Supabase Auth (e.g. Google). The main view is a canvas (Cache Plane) where each screenshot is a node. Nodes are laid out by semantic clustering (UMAP over Gemini embeddings) or randomly. You can open a node to see the image, label, analysis, intent, and any audio note; search nodes; and use an AI chat that can use Mem0-backed memory and attach nodes or files as context.
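
The choice between semantic and random layout described above can be sketched as a small pure function. This is an illustration only: `LayoutNode`, `chooseLayoutMode`, and `cosineSimilarity` are hypothetical names, the threshold of two embedded nodes is taken from the Troubleshooting section, and cosine similarity is shown as one common way to compare embeddings, not necessarily the metric the app's UMAP step uses.

```typescript
// Hypothetical node shape: only the fields needed to decide on a layout.
interface LayoutNode {
  id: string;
  embedding?: number[];
}

type LayoutMode = "semantic" | "random";

// Cosine similarity of two equal-length vectors (1 means same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Semantic layout needs at least two nodes with embeddings;
// otherwise fall back to random placement.
function chooseLayoutMode(nodes: LayoutNode[]): LayoutMode {
  const embedded = nodes.filter(
    (n) => Array.isArray(n.embedding) && n.embedding.length > 0
  );
  return embedded.length >= 2 ? "semantic" : "random";
}
```

In the semantic case, the embeddings would then be handed to the UMAP step to obtain 2D positions; that call is omitted here.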

Backend
Serverless API on Vercel: screenshot analysis (Gemini vision + optional embeddings), audio transcription (Gemini), storage and retrieval of cache nodes (Supabase), chat (Gemini with optional Mem0 context), and Mem0 proxy for memory. Auth is Supabase; the extension uses the same session via a bridge when the user is on the web app.


Architecture

  • Extension: Manifest V3. Content script injects the overlay; background script handles capture, API calls, and sync. Communicates with the web app via webapp-bridge.js when the user is on cacheai.app (or localhost).
  • Web app: React 19, TypeScript, React Flow for the graph, UMAP-js for 2D layout from embeddings. Supabase client for auth and (via API) for cache_nodes.
  • API (Vercel): Node handlers under CacheWeb/api/. Supabase service role for DB and auth checks; env vars for Gemini and Mem0.
  • Data: Supabase cache_nodes (id, user_id, image_data base64, label, analysis, intent, audio_note, embedding array, timestamp). Optional Mem0 usage for chat memory (user_id = email).
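
As a rough sketch of the data shape, the cache_nodes columns listed above can be written as a TypeScript type, together with a mapping from the camelCase body accepted by POST /api/cache-nodes. The type names, the `toCacheNodeRow` helper, and the choice of epoch milliseconds for `timestamp` are assumptions for illustration; the authoritative definition is CacheWeb/database/schema.sql.

```typescript
// Request body fields as listed in the API Reference (camelCase).
interface CacheNodeBody {
  id: string;
  imageData?: string; // base64 image
  label: string;
  analysis: string;
  intent: string;
  audioNote?: string;
  embedding?: number[];
  timestamp?: number; // assumption: epoch milliseconds
}

// Row fields as listed for the cache_nodes table (snake_case).
interface CacheNodeRow {
  id: string;
  user_id: string;
  image_data: string | null;
  label: string;
  analysis: string;
  intent: string;
  audio_note: string | null;
  embedding: number[] | null;
  timestamp: number;
}

// Map an incoming body to a row owned by the authenticated user.
// Optional fields become null; a missing timestamp defaults to `now`.
function toCacheNodeRow(
  body: CacheNodeBody,
  userId: string,
  now: number = Date.now()
): CacheNodeRow {
  return {
    id: body.id,
    user_id: userId,
    image_data: body.imageData ?? null,
    label: body.label,
    analysis: body.analysis,
    intent: body.intent,
    audio_note: body.audioNote ?? null,
    embedding: body.embedding ?? null,
    timestamp: body.timestamp ?? now,
  };
}
```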

Features

  • Screenshot capture: One-click capture of the active tab from the extension overlay.
  • Optional audio note: Record a short note; it is transcribed (Gemini) and stored with the screenshot node.
  • AI analysis: Each screenshot is analyzed by Gemini to produce a concise label, a longer analysis, and an inferred intent. Optional embedding is computed for semantic layout.
  • Semantic clustering: In the web app, nodes with embeddings are laid out with UMAP so related content appears near each other. Fallback to random layout when embeddings are missing or few.
  • Cache Plane: Graph view of all your nodes; click a node to see image, label, analysis, intent, and audio note.
  • Search: Text search over node labels, analysis, and intent on the current canvas.
  • Chat: AI chat (Gemini) with optional Mem0 memory. You can attach cache nodes or files so the model has context.
  • Account: Sign in with Supabase (e.g. Google). Delete account option removes your Supabase-backed data and Mem0 memories.
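
The search feature above can be approximated with a simple filter over the three text fields. This is a minimal sketch assuming case-insensitive substring matching; the app's actual matching rules are not documented here, and `SearchableNode` and `searchNodes` are hypothetical names.

```typescript
// Only the text fields that search is described as covering.
interface SearchableNode {
  id: string;
  label: string;
  analysis: string;
  intent: string;
}

// Return nodes whose label, analysis, or intent contains the query
// (case-insensitive). An empty query returns all nodes unchanged.
function searchNodes<T extends SearchableNode>(nodes: T[], query: string): T[] {
  const q = query.trim().toLowerCase();
  if (q === "") return nodes;
  return nodes.filter((n) =>
    [n.label, n.analysis, n.intent].some((field) =>
      field.toLowerCase().includes(q)
    )
  );
}
```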

Prerequisites

  • Node.js 16+
  • Chrome (for the extension)
  • Accounts and keys:
    • Supabase: Project for auth and database (see CacheWeb/database/schema.sql).
    • Google Cloud / AI Studio: Gemini API key (vision, embeddings, transcription, chat).
    • Mem0 (optional): API key for chat memory. Without it, chat still works but has no long-term memory.
    • Google OAuth (optional): For “Sign in with Google” on the web app; must be configured in Supabase and in the app’s env.

Setup

1. Clone and install

git clone <repo-url>
cd CacheAI

Extension

cd CacheExtension
npm install
npm run build

Web app (from the repository root)

cd CacheWeb
npm install

2. Supabase

  • Create a Supabase project.
  • In the SQL Editor, run the contents of CacheWeb/database/schema.sql.
  • In Dashboard > Authentication > Providers, enable Google (or others) if desired.
  • Note the Project URL, anon key, and service_role key; the backend needs the URL and the service_role key.

3. Load the extension in Chrome

  • Open chrome://extensions, enable “Developer mode”.
  • Click “Load unpacked” and select the CacheExtension folder (after npm run build, so that the built JS that manifest.json expects exists).
  • The extension’s API base is set in its background script (e.g. https://cacheai.app/api for production; see Configuration). REACT_APP_API_URL applies only to the web app; for local development you can use it to point the web app at a local API.

4. Environment variables

Web app (e.g. .env or .env.local for local dev)

  • REACT_APP_API_URL: Backend base URL (e.g. https://cacheai.app/api or http://localhost:3000/api if you run API locally).
  • REACT_APP_GOOGLE_CLIENT_ID: Optional; for Google OAuth.
  • Supabase URL and anon key are usually set in the app’s Supabase client config (e.g. CacheWeb/src/config/supabase.ts).

Backend (Vercel or local serverless)

  • SUPABASE_URL, SUPABASE_SERVICE_KEY: Supabase project URL and service_role key.
  • GEMINI_API_KEY: Google AI Studio / Gemini API key.
  • MEM0_API_KEY: Optional; for Mem0 memory in chat.

See CacheWeb/env.production.example for a full list.
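
A backend handler could fail fast when a required variable is missing. The sketch below checks the backend variables named above against any key/value map; `checkBackendEnv` and its return shape are hypothetical, not part of the repository.

```typescript
// Backend variables named in this section.
const REQUIRED_BACKEND_ENV = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_KEY",
  "GEMINI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

interface EnvCheck {
  missing: string[]; // required variables that are unset or empty
  memoryEnabled: boolean; // true when the optional MEM0_API_KEY is set
}

// Report which required variables are missing and whether chat memory
// can be enabled.
function checkBackendEnv(env: Env): EnvCheck {
  const isSet = (name: string): boolean => {
    const value = env[name];
    return typeof value === "string" && value.trim() !== "";
  };
  return {
    missing: REQUIRED_BACKEND_ENV.filter((name) => !isSet(name)),
    memoryEnabled: isSet("MEM0_API_KEY"),
  };
}
```

In a Node handler this would typically be called with process.env, returning an error response when `missing` is non-empty.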


Configuration

  • Extension API base: In CacheExtension/background.js, API_URL is set to the Cache AI backend (e.g. https://cacheai.app/api). Change it for a custom backend.
  • Web app API base: Set via REACT_APP_API_URL. Must match the backend that serves the API routes and has the env vars above.
  • Auth sync: When the user is logged in on the web app (cacheai.app or localhost), the app posts a message to the extension with the Supabase access token so the extension can call the same API as the logged-in user.
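
The auth sync step can be sketched as a message built by the web app and validated on the extension side. The message type string, field names, and helper functions below are hypothetical; the actual format used by webapp-bridge.js is not documented here. Only the allowed origins (cacheai.app and localhost) come from this README.

```typescript
// Hypothetical message format for passing the Supabase access token.
const AUTH_SYNC_TYPE = "CACHE_AI_AUTH_SYNC" as const;

interface AuthSyncMessage {
  type: typeof AUTH_SYNC_TYPE;
  accessToken: string;
}

// Only the production site and localhost (any port) are trusted.
function isTrustedOrigin(origin: string): boolean {
  return (
    origin === "https://cacheai.app" ||
    /^http:\/\/localhost(:\d+)?$/.test(origin)
  );
}

// Web app side: build the message to post.
function buildAuthSyncMessage(accessToken: string): AuthSyncMessage {
  return { type: AUTH_SYNC_TYPE, accessToken };
}

// Extension side: accept the message only from a trusted origin and
// only if it has the expected shape; otherwise return null.
function parseAuthSyncMessage(
  origin: string,
  data: unknown
): AuthSyncMessage | null {
  if (!isTrustedOrigin(origin)) return null;
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as Record<string, unknown>;
  if (candidate["type"] !== AUTH_SYNC_TYPE) return null;
  const token = candidate["accessToken"];
  if (typeof token !== "string" || token.length === 0) return null;
  return { type: AUTH_SYNC_TYPE, accessToken: token };
}
```

On the page, the message could be sent with window.postMessage(message, window.location.origin), and the bridge's message listener would pass event.origin and event.data to the parser. Checking the origin matters because any page can post messages to a content script's window.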

Development

Web app

cd CacheWeb
npm start

Runs the React app (e.g. http://localhost:3000). For local API, you need to run the Vercel dev server or equivalent so that CacheWeb/api/* handlers and env vars are available.

Extension

After changing TypeScript/React in CacheExtension, run npm run build and reload the extension in chrome://extensions.

API locally

Use Vercel CLI from the project root (or from CacheWeb) so that api/ is served and env is loaded:

vercel dev

Point the web app’s REACT_APP_API_URL to the URL Vercel dev prints, with the /api path appended (e.g. http://localhost:3000/api).


Deployment

  • Web app + API: Typically deployed together on Vercel. Build the React app; configure routes so that /* serves the SPA and /api/* goes to the serverless functions. Set all environment variables in Vercel.
  • Supabase: Already hosted; ensure RLS and schema are applied as in schema.sql.
  • Extension: No separate “deploy”; users load the unpacked extension or you distribute via the Chrome Web Store. Ensure the extension’s API_URL points to your deployed API.

See the Setup and Configuration sections above and CacheWeb/env.production.example for Supabase and Vercel setup.


API Reference

All API routes live under CacheWeb/api/. Authenticated endpoints expect Authorization: Bearer <supabase_access_token>.
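
On the server side, a handler first has to pull the token out of the Authorization header. A minimal sketch of that parsing step follows; `extractBearerToken` is a hypothetical helper, and how the repository's handlers actually do this is not shown in this README.

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is absent or malformed.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}
```

The extracted token would then be verified against Supabase (for example with the supabase-js auth.getUser call) before any data is read or written for that user.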

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/health | Health check; no auth. |
| POST | /api/analyze-screenshot | Body: { imageData (base64), audioNote? }. Returns { success, analysis: { label, analysis, intent, audioNote?, embedding? } }. Uses Gemini for vision and optional embedding. |
| POST | /api/transcribe-audio | Body: { audioData, mimeType? }. Returns { success, transcription }. Uses Gemini. |
| GET | /api/cache-nodes | Returns { success, nodes } for the authenticated user. |
| POST | /api/cache-nodes | Body: { id, imageData?, label, analysis, intent, audioNote?, embedding?, timestamp? }. Creates a cache node. |
| DELETE | /api/cache-nodes?id=<id> | Deletes the given cache node for the user. |
| POST | /api/chat | Body: { messages, useMemory? }. Returns stream or JSON with Gemini reply; optionally uses Mem0 for context. |
| GET/POST/DELETE | /api/memories | Proxy to Mem0 for listing, adding, and deleting memories (optional). |
| GET/POST | /api/user/account | User account info and delete-account (delete also clears Mem0 for that user). |
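
A client calling these routes mostly needs to join the API base with a path and attach the bearer token. The sketch below builds such a request as a plain object, so it can be inspected before being passed to fetch; `buildApiRequest` and `deleteNodeRequest` are hypothetical helpers, and JSON request bodies are assumed.

```typescript
type HttpMethod = "GET" | "POST" | "DELETE";

interface ApiRequest {
  url: string;
  method: HttpMethod;
  headers: Record<string, string>;
  body?: string;
}

// Join base URL and path, add the bearer token when given, and
// JSON-encode the body when present.
function buildApiRequest(
  baseUrl: string,
  method: HttpMethod,
  path: string,
  token?: string,
  body?: unknown
): ApiRequest {
  const url = baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
  const headers: Record<string, string> = {};
  if (token) headers["Authorization"] = `Bearer ${token}`;
  const request: ApiRequest = { url, method, headers };
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";
    request.body = JSON.stringify(body);
  }
  return request;
}

// DELETE /api/cache-nodes?id=<id>, with the id URL-encoded.
function deleteNodeRequest(baseUrl: string, token: string, id: string): ApiRequest {
  return buildApiRequest(
    baseUrl,
    "DELETE",
    `/cache-nodes?id=${encodeURIComponent(id)}`,
    token
  );
}
```

A caller would then run fetch(req.url, { method: req.method, headers: req.headers, body: req.body }) and read the JSON response, checking the success field described in the table.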

Security

Before pushing this repo to a public host (e.g. GitHub):

  • Do not commit any .env, .env.local, .env.production, or other env files. They are listed in .gitignore; ensure they were never added with git add -f.
  • Do not put API keys, secrets, or tokens in source code. Use environment variables only (e.g. Vercel dashboard for the backend; local .env for development, which must stay untracked).
  • Backend secrets (e.g. GEMINI_API_KEY, MEM0_API_KEY, SUPABASE_SERVICE_KEY) must exist only on the server (Vercel env). The extension and web app call your backend; they do not need those keys.
  • Frontend env (e.g. REACT_APP_API_URL, REACT_APP_SUPABASE_URL) is baked into the client build and is visible to anyone who loads the app. Use REACT_APP_* only for non-secret config such as the API base URL or the Supabase anon key, which is designed to be public; never put secret keys there.
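  • If you ever committed a secret, rotate it immediately (create a new key with the provider and update the env vars), then remove the secret from history (e.g. git filter-repo or BFG; git filter-branch also works, but the Git documentation recommends against it) or make the repo private. Rotate the key in either case, since an exposed key should be treated as compromised.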

Troubleshooting

  • Screenshot not appearing in web app: Ensure you are signed in on the web app and that the extension has received auth sync (visit the web app with the extension enabled). Check that the backend has valid Supabase and Gemini env vars and that the extension’s API_URL is correct.
  • Analysis or transcription fails: Verify GEMINI_API_KEY is set and has access to the models used (e.g. gemini-2.5-flash-lite, embedding-001). Check payload size (e.g. image base64 under 10MB).
  • Chat has no memory: Mem0 is optional. Set MEM0_API_KEY for memory; ensure the chat API is calling Mem0 and that the user_id (e.g. email) is consistent.
  • Nodes not clustering semantically: Semantic layout requires at least two nodes with embeddings. If the analyze-screenshot step fails to return embeddings or the node is created without them, that node will be placed randomly.
  • Extension cannot reach API: Check host_permissions in manifest.json and that API_URL in the background script points to a reachable backend (and CORS allows the extension origin if required).
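
For the payload size check mentioned above, the decoded size of a base64 image can be estimated from the string length without decoding it. This is a sketch under stated assumptions: the 10MB limit is read as 10 * 1024 * 1024 bytes of decoded data, which may not match how the backend measures it, and the function names are hypothetical.

```typescript
// Assumption: the limit applies to decoded bytes, with 1MB = 1024 * 1024.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

// Estimate decoded byte length of a base64 string. Accepts either raw
// base64 or a data URL such as "data:image/png;base64,...".
function base64DecodedBytes(input: string): number {
  const comma = input.indexOf(",");
  const b64 =
    input.startsWith("data:") && comma !== -1 ? input.slice(comma + 1) : input;
  // Every 4 base64 characters encode 3 bytes; "=" padding encodes none.
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

function isWithinImageLimit(
  input: string,
  limit: number = MAX_IMAGE_BYTES
): boolean {
  return base64DecodedBytes(input) <= limit;
}
```

Running this check in the extension before upload would turn an oversized capture into a clear local error instead of a failed analysis request.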

About

Google Chrome Extension for Cache AI
