vida — Spanish for "life." Turn your day into searchable memory with Claude Code, Codex, or Gemini.
A personal AI that quietly watches your day-to-day life, remembers everything, and helps you understand how you spend your time. The recommended setup is local analysis through Claude Code or Codex CLI, with Gemini API support still available.
Experience a simulated vida session in the browser. The demo uses the real UI with generated data and a virtual live feed so you can understand the product without camera or microphone setup.
Prerequisites: Python 3.12+, Node.js 22+, uv, plus one AI provider: Claude Code CLI, Codex CLI, or a Gemini API key. Don't have these yet? See the full setup guide for installation instructions.
Recommended: use Claude Code or Codex CLI so vida can analyze your day locally. Gemini API keys remain fully supported in onboarding and Settings.
Windows (PowerShell) — 5 min
# 1. Clone and install
git clone https://github.com/Andyyyy64/vida.git
cd vida
uv sync
cd web; npm install; cd ..
# 2. Pick one AI provider
# Recommended: sign in to Claude Code or Codex CLI first.
# If you prefer Gemini, set your API key:
"GEMINI_API_KEY=your-key-here" | Out-File -Encoding utf8 .env
# You can also choose the provider during onboarding or later in Settings.
# 3. Launch the desktop app
cd web; npx tauri dev

Permissions: When prompted, allow Camera and Microphone access in Settings → Privacy & Security.
macOS (Terminal) — 5 min
# 1. Clone and install
git clone https://github.com/Andyyyy64/vida.git
cd vida
uv sync
cd web && npm install && cd ..
# 2. Pick one AI provider
# Recommended: sign in to Claude Code or Codex CLI first.
# If you prefer Gemini, set your API key:
echo "GEMINI_API_KEY=your-key-here" > .env
# You can also choose the provider during onboarding or later in Settings.
# 3. Launch the desktop app
cd web && npx tauri dev

Permissions: Grant Camera, Microphone, Screen Recording, and Accessibility access for your terminal in System Settings → Privacy & Security. See the macOS permission guide for details.
Linux / WSL2
git clone https://github.com/Andyyyy64/vida.git
cd vida
uv sync
cd web && npm install && cd ..
# Recommended: sign in to Claude Code or Codex CLI first.
# If you prefer Gemini, set your API key:
echo "GEMINI_API_KEY=your-key-here" > .env
# You can also choose the provider during onboarding or later in Settings.
# Start daemon + web UI
./start.sh
# Desktop app opens automatically

For WSL2 camera setup (usbipd), see the full guide.
Verify it's working:
life look # Capture + analyze a single frame
life status # Check daemon is running

The desktop app opens automatically with the timeline. Alternatively, download a pre-built installer from Releases.
- Vision
- Features
- Architecture
- Project Structure
- Setup — Requirements, Configuration, Docker
- CLI Commands
- Configuration Reference
- IPC Commands
- Database Schema
- Tech Stack
- Security
"Monitor, manage, and analyze your life."
Three pillars:
- Monitoring — Continuous, automatic recording of your day. Camera, screen, audio, and app focus — all captured without any manual input.
- Management — Instantly answer "what was I doing then?" An externalized, searchable memory that replaces journaling.
- Analysis — See "how focused was I?" and "where did my time go?" in concrete patterns — daily, weekly, and monthly.
- Interval capture — Webcam + screen + audio captured every 30 seconds (configurable). Between ticks, change detection checks every 1 second and saves extra frames/screenshots when significant visual changes occur (screen: 10% threshold, camera: 15% perceptual hash difference).
- Foreground window tracking — Persistent PowerShell process monitors app focus changes every 500ms via Win32 P/Invoke (`GetForegroundWindow`), recording process name and window title for precise per-app duration tracking.
- Presence detection — Haar cascade face detection + MOG2 motion analysis with hysteresis state machine (present → absent → sleeping). Requires 3 consecutive ticks without face before transitioning. Sleep state detected by low brightness during configured night hours.
- Audio capture & transcription — ALSA recording with auto-device detection. Silence trimming (500 amplitude threshold, min 0.3s voice) keeps only meaningful audio. Transcription via LLM with user context awareness.
- Live feed — MJPEG streaming server on port 3002 at ~30fps, independent of the main capture interval.
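The presence rules above can be read as a small state machine. The following is only a sketch under stated assumptions, not vida's actual code: the class name, the `update` method, and the brightness cutoff are hypothetical, while the 3-tick threshold and the 23:00 to 08:00 night window mirror the documented defaults.

```python
from dataclasses import dataclass


@dataclass
class PresenceTracker:
    """Hypothetical hysteresis tracker: present -> absent -> sleeping."""

    absent_threshold_ticks: int = 3   # documented default
    sleep_start_hour: int = 23        # documented default
    sleep_end_hour: int = 8           # documented default
    dark_brightness: float = 40.0     # assumption: reuse the DARK scene cutoff
    state: str = "present"
    missed_ticks: int = 0

    def _is_night(self, hour: int) -> bool:
        # The default night window wraps past midnight (23 -> 8).
        if self.sleep_start_hour > self.sleep_end_hour:
            return hour >= self.sleep_start_hour or hour < self.sleep_end_hour
        return self.sleep_start_hour <= hour < self.sleep_end_hour

    def update(self, face_detected: bool, brightness: float, hour: int) -> str:
        if face_detected:
            # A detected face resets the counter and returns to "present".
            self.missed_ticks = 0
            self.state = "present"
            return self.state
        self.missed_ticks += 1
        # Hysteresis: a single missed face does not change the state.
        if self.missed_ticks >= self.absent_threshold_ticks:
            dark_at_night = brightness < self.dark_brightness and self._is_night(hour)
            self.state = "sleeping" if dark_at_night else "absent"
        return self.state
```

Counting consecutive misses before switching state keeps one failed face detection from flipping the timeline to absent.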
- Frame analysis — Each tick sends camera image + screen capture + audio + foreground window info to LLM (Gemini, Claude Code, or Codex CLI). Returns structured JSON with activity category and natural language description.
- Activity classification — LLM freely generates activity category names. Existing categories are shown as examples for consistency, and new categories are accepted and registered automatically. Fuzzy matching (LCS similarity ≥ 0.7) normalizes variants to existing categories. All activity → meta-category mappings are stored in the `activity_mappings` DB table.
- Meta-categories — Activities are dynamically mapped to 6 meta-categories for productivity scoring: focus, communication, entertainment, browsing, break, idle. The LLM outputs the meta-category alongside each activity. The mapping is stored in DB and served to the frontend via API.
- Multi-scale summaries — Hierarchical generation: 10m (from raw frames) → 30m → 1h → 6h → 12h → 24h (includes keyframe images + transcriptions + improvement suggestions). Each scale builds from the one below.
- Daily reports — Auto-generated on day change. Includes activity breakdown, timeline narrative, focus percentage (focus frames / active frames), and event list. Delivered via webhook.
- Context awareness — User profile (`data/context.md`) and recent 5-frame history included in every LLM prompt for continuity.
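To make the fuzzy-matching step concrete, here is a minimal sketch of normalizing a new activity name against known categories. It assumes "LCS similarity" means the longest common subsequence length divided by the longer name's length, compared case-insensitively; the README does not spell out the formula, and the function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed definition: LCS length over the longer string's length."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def normalize_activity(name: str, known: list[str], threshold: float = 0.7) -> str:
    """Map a variant onto the closest known category, or keep it as a new one."""
    best = max(known, key=lambda k: lcs_similarity(name, k), default=None)
    if best is not None and lcs_similarity(name, best) >= threshold:
        return best
    return name


print(normalize_activity("programing", ["programming", "reading"]))  # programming
print(normalize_activity("cooking", ["programming", "reading"]))     # cooking
```

With this definition a one-letter typo scores 10/11 and is merged, while an unrelated name falls below 0.7 and would be registered as a new category.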
- Timeline — Frames grouped by hour, sized by motion score, colored by activity meta-category. Keyboard navigation (arrow keys) and scroll-wheel frame switching.
- Detail panel — Camera image (click to expand), screen captures (main + change-detected extras with thumbnail strip), audio player with transcription, foreground window info, and all metadata.
- Summary panel — Browse summaries by scale (10m–24h) with expand/collapse. Click a summary to highlight its time range on the timeline.
- Dashboard — Focus score %, pie chart by meta-category, activity list with duration bars, top 10 app usage with switch counts, weekly stacked bar chart, gantt-style session timeline.
- Search — FTS5 trigram full-text search across frame descriptions, transcriptions, activities, window titles, and summaries. Click results to jump to date/frame.
- Activity heatmap — Frames-per-hour intensity visualization across 24 hours.
- Live feed — Real-time MJPEG stream with LIVE/OFFLINE indicator, expandable to full-screen modal.
- Mobile — Responsive layout with tab switching (Summaries / Timeline / Detail) on narrow viewports.
- Auto-refresh — 30-second polling for new frames, summaries, and events when viewing today's data.
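As a small illustration of the data behind the activity heatmap, the sketch below buckets frame timestamps into 24 hourly counts. It assumes timestamps are ISO 8601 strings; the function name is hypothetical and the real frontend may compute this differently.

```python
from collections import Counter
from datetime import datetime


def frames_per_hour(timestamps: list[str]) -> list[int]:
    """Count frames in each of the 24 hours of a day from ISO 8601 timestamps."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


buckets = frames_per_hour(
    ["2025-01-01T09:00:00", "2025-01-01T09:00:30", "2025-01-01T13:15:00"]
)
print(buckets[9], buckets[13])  # 2 1
```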
Collects conversations from external chat platforms to enrich the "externalized memory." Knowing what you were discussing — and with whom — adds a critical dimension to daily activity tracking.
Architecture: Adapter pattern with a unified ChatSource interface. Each platform has its own adapter; users enable only what they use via life.toml.
| Platform | Status | Method | DMs | Servers/Groups |
|---|---|---|---|---|
| Discord | Implemented | REST API polling (user token) | Yes | Yes |
| LINE | Planned | Chat export import | Yes | Yes |
| Slack | Planned | Bot token + Events API | — | Yes |
| Telegram | Planned | Bot API / TDLib | Yes | Yes |
| | Planned | Chat export import | Yes | Yes |
| Teams | Planned | Microsoft Graph API | Yes | Yes |
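The adapter pattern described in this section might look roughly like the sketch below. Only the `ChatSource` name and the unified message fields (platform, channel, author, content, timestamp) come from this README; the method names, `ChatMessage`, and the `StaticSource` stand-in are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    # Fields follow the unified schema named in this README.
    platform: str
    channel: str
    author: str
    content: str
    timestamp: str


class ChatSource(ABC):
    """Interface each platform adapter implements (method name is hypothetical)."""

    platform: str

    @abstractmethod
    def fetch_new_messages(self) -> list[ChatMessage]:
        """Return messages that arrived since the previous call."""


class ChatManager:
    """Polls every enabled adapter and merges the results by timestamp."""

    def __init__(self, sources: list[ChatSource]) -> None:
        self.sources = sources

    def poll_once(self) -> list[ChatMessage]:
        messages: list[ChatMessage] = []
        for source in self.sources:
            messages.extend(source.fetch_new_messages())
        return sorted(messages, key=lambda m: m.timestamp)


class StaticSource(ChatSource):
    """Stand-in adapter for the example; a real one would call the platform API."""

    def __init__(self, platform: str, messages: list[ChatMessage]) -> None:
        self.platform = platform
        self._messages = messages

    def fetch_new_messages(self) -> list[ChatMessage]:
        pending, self._messages = self._messages, []
        return pending
```

Because the manager only depends on the abstract interface, adding a platform means adding one adapter class without touching the polling or storage code.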
How it works:
- Platform adapter polls for new messages in a background thread
- Messages are stored in `chat_messages` table with unified schema (platform, channel, author, content, timestamp)
- Recent conversations are injected into LLM prompts — frame analysis sees "user was discussing X on Discord" alongside screen/camera data
- Daily reports include a chat activity summary (message counts per channel)
Discord specifics: Uses a user token for full access (servers + DMs + group DMs). On first run, backfills past N months of history (backfill_months, default 3) by paginating backwards through all channels. Thereafter, polls every 60s (configurable) for new messages, comparing last_message_id per channel to fetch only deltas. Handles rate limiting with automatic retry.
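The delta-polling idea can be sketched as follows. This is an illustration only: `fetch_after` is a hypothetical stand-in for the REST call, and rate-limit handling is left out. It relies on the fact that Discord message IDs are snowflakes, so a numerically larger ID is a newer message.

```python
from typing import Callable

# fetch_after(channel_id, after_id) stands in for the REST request (hypothetical).
FetchAfter = Callable[[str, int], list[dict]]


def poll_deltas(fetch_after: FetchAfter, last_message_id: dict[str, int]) -> list[dict]:
    """Fetch only messages newer than the stored ID for each channel."""
    new_messages: list[dict] = []
    for channel_id, after_id in last_message_id.items():
        batch = fetch_after(channel_id, after_id)
        if batch:
            # Advance the per-channel cursor to the newest ID seen.
            last_message_id[channel_id] = max(int(m["id"]) for m in batch)
            new_messages.extend(batch)
    return new_messages
```

Keeping one cursor per channel means a quiet channel costs one empty response per poll, and a restart resumes from the stored IDs instead of refetching history.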
- Discord — Daily reports via webhook embed (4000 char limit, purple accent)
- LINE Notify — Daily reports via LINE API (1000 char limit)
- Test with `life notify-test`
daemon/ (Python)              tauri/ (Rust)            frontend (React)
├─ Camera capture             ├─ IPC commands          ├─ Timeline view
├─ Screen capture             ├─ rusqlite queries      ├─ Frame detail
├─ Audio capture              ├─ Asset Protocol        ├─ Summary panel
├─ Window monitor             ├─ Daemon lifecycle      ├─ Live feed
├─ Presence detection         └─ System tray           ├─ Dashboard
├─ LLM analysis                                        ├─ Search
├─ Summary generation                                  ├─ Activity heatmap
├─ Report generation                                   └─ Mobile responsive
├─ Chat integration
├─ Change detection
├─ SQLite write
└─ MJPEG live server (port 3002)
- Daemon writes to SQLite, web reads it (WAL mode for concurrent access)
- Window monitor runs a persistent PowerShell process with its own SQLite connection
- Shared `data/` directory: `frames/`, `screens/`, `audio/`, `life.db`
- LLM provider abstracted: Gemini API, Claude Code CLI, Codex CLI, or external WebSocket, configured in `life.toml`
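The single-writer, separate-reader arrangement can be sketched with Python's built-in `sqlite3` module. The helper names are hypothetical, and in vida the reader is the Rust side using rusqlite; this only illustrates how WAL mode lets a reader query while another connection writes.

```python
import sqlite3


def open_writer(db_path: str) -> sqlite3.Connection:
    """Open the writing connection and switch the database file to WAL mode."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_reader(db_path: str) -> sqlite3.Connection:
    """Open a read-only connection, as a separate reader process might."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

The journal mode is stored in the database file, so it only needs to be set once by the writer; later connections pick it up automatically.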
| Thread | Purpose | Rate |
|---|---|---|
| Main loop | Capture + analysis + summaries | Every 30s (configurable) |
| Live feed | Webcam → MJPEG stream | ~30fps |
| Audio recording | ALSA capture during interval | Per tick |
| Window monitor | PowerShell → `window_events` table | 500ms polls |
| Change detection | Screen/camera hash comparison | Every 1s between ticks |
| Chat poller | Discord/etc → `chat_messages` table | Every 60s (configurable) |
| Live HTTP server | Serve MJPEG to clients | On demand |
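The hash comparison used by the change-detection thread can be illustrated with a dependency-free sketch. It uses a simple average hash over a small grayscale grid as a stand-in, since the README does not say which perceptual hash vida uses, and it assumes the documented thresholds (10% for screen, 15% for camera) are fractions of differing hash bits. All function names are hypothetical.

```python
def average_hash(gray: list[list[int]]) -> list[bool]:
    """One bit per pixel: True where the pixel is brighter than the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [value > mean for value in pixels]


def hash_difference(a: list[bool], b: list[bool]) -> float:
    """Fraction of bits that differ between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def has_changed(prev: list[list[int]], curr: list[list[int]], threshold: float) -> bool:
    """True when the hash difference reaches the threshold (e.g. 0.10 for screens)."""
    return hash_difference(average_hash(prev), average_hash(curr)) >= threshold
```

In practice the input grid would be a heavily downscaled grayscale frame, which keeps the once-per-second comparison cheap.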
daemon/ # Python package
├─ cli.py # CLI entry point (Click)
├─ daemon.py # Main observer loop
├─ config.py # TOML config loading
├─ analyzer.py # Frame analysis + summary generation
├─ activity.py # ActivityManager: DB-backed normalization + meta-category mapping
├─ report.py # Daily report generation
├─ notify.py # Discord / LINE webhook notifications
├─ live.py # MJPEG streaming server
├─ chat/ # Chat platform integration
│ ├─ base.py # Abstract ChatSource interface
│ ├─ discord.py # Discord adapter (user token, REST polling)
│ └─ manager.py # ChatManager: orchestrates adapters
├─ llm/ # LLM provider abstraction
│ ├─ base.py # Abstract base class
│ ├─ gemini.py # Google Gemini (image + audio support)
│ ├─ claude.py # Anthropic Claude (via CLI)
│ └─ codex.py # OpenAI Codex (via CLI)
├─ capture/ # Data capture modules
│ ├─ camera.py # Webcam (V4L2 / MJPEG)
│ ├─ screen.py # Screen capture (PowerShell)
│ ├─ audio.py # Audio recording (ALSA)
│ ├─ window.py # Foreground window monitor (PowerShell + Win32)
│ └─ frame_store.py # JPEG file storage
├─ analysis/ # Local analysis (no LLM)
│ ├─ motion.py # MOG2 background subtraction
│ ├─ scene.py # Brightness classification
│ ├─ change.py # Perceptual hash change detection
│ ├─ presence.py # Face detection + state machine
│ └─ transcribe.py # Audio → text via LLM
├─ summary/ # Summary formatting
│ ├─ formatter.py # CLI output formatting
│ └─ timeline.py # Timeline data builder
├─ claude/ # Claude-specific features
│ ├─ analyzer.py # Review analysis
│ └─ review.py # Daily review package generator
└─ storage/ # Database layer
├─ database.py # SQLite schema, migrations, queries
└─ models.py # Frame, Event, Summary, Report dataclasses
web/ # Tauri v2 desktop application
├─ src-tauri/
│ ├─ src/lib.rs # App setup, daemon lifecycle, tray
│ ├─ src/db.rs # SQLite connection, settings, cache
│ ├─ src/commands/ # IPC command handlers (18 modules)
│ └─ tauri.conf.json # App config, bundle resources
└─ src/
├─ App.tsx # Main SPA orchestrator
├─ components/ # React components
├─ hooks/ # Data fetching with 30s polling
└─ lib/ # IPC client, types, activity module, utilities
data/ # Runtime data (gitignored)
├─ frames/ # Camera JPEGs (YYYY-MM-DD/*.jpg)
├─ screens/ # Screen PNGs (YYYY-MM-DD/*.png)
├─ audio/ # Audio WAVs (YYYY-MM-DD/*.wav)
├─ live/ # Current MJPEG stream frame
├─ context.md # User profile for LLM context
├─ life.db # SQLite database (WAL mode)
└─ life.pid # Daemon PID file
See getting-started.md for full platform-specific instructions (Japanese version).
| Platform | Guide |
|---|---|
| Windows (Native) | getting-started.md#windows-native |
| Windows (WSL2) | getting-started.md#windows-wsl2 |
| Mac | getting-started.md#mac |
| | Windows (Native) | Windows (WSL2) | Mac |
|---|---|---|---|
| Python | 3.12+ (Windows) | 3.12+ (in WSL2) | 3.12+ |
| Node.js | 22+ (Windows) | 22+ (in WSL2) | 22+ |
| Camera | Built-in / USB (DirectShow) | External USB via usbipd | Built-in |
| Microphone | Built-in / USB (WASAPI) | External USB via usbipd | Built-in |
| Screen capture | PowerShell + Windows Forms | PowerShell + Windows Forms | screencapture (built-in) |
| Window tracking | PowerShell + Win32 API | PowerShell + Win32 API | osascript (built-in) |
| AI provider | Claude Code CLI / Codex CLI / Gemini API key | Claude Code CLI / Codex CLI / Gemini API key | Claude Code CLI / Codex CLI / Gemini API key |
Settings are managed via the Settings UI inside the desktop app (stored in the settings table of data/life.db). On first launch, defaults are applied automatically. For CLI-only use, settings can also be configured via life.toml and .env as fallback.
Recommended provider: Claude Code or Codex CLI for local analysis. Gemini remains available when you want API-key-based setup.
Tip: Create data/context.md with your name, occupation, and habits — the AI uses this for more accurate activity descriptions.
See the full configuration reference below for all options.
docker compose up

For camera/audio device passthrough, configure docker-compose.override.yml. See Docker setup.
| Command | Description |
|---|---|
| `life start [-d]` | Start the observer daemon (-d for background) |
| `life stop` | Stop the running daemon |
| `life status` | Show status (frame count, summaries, disk usage) |
| `life capture` | Capture a single test frame |
| `life look` | Capture and analyze a frame immediately |
| `life recent [-n 5]` | Show recent frame analyses |
| `life today [DATE]` | Show timeline for the day |
| `life stats [DATE]` | Show daily statistics |
| `life summaries [DATE] [--scale 1h]` | Show summaries (10m/30m/1h/6h/12h/24h) |
| `life events [DATE]` | List detected events |
| `life report [DATE]` | Generate daily diary report |
| `life review [DATE] [--json]` | Generate review package |
| `life consolidate-activities` | Merge similar activity categories via LLM |
| `life notify-test` | Test webhook notification |
Settings are managed via the Settings UI in the desktop app and stored in the settings table of data/life.db. For CLI-only use, life.toml and .env serve as fallback sources. The following are the available setting keys (DB key names shown):
| Key | Default | Description |
|---|---|---|
| `data_dir` | `"data"` | Data directory path |
| `capture.device` | `0` | Camera device ID (/dev/videoN) |
| `capture.interval_sec` | `30` | Capture interval (seconds) |
| `capture.width` | `640` | Capture width |
| `capture.height` | `480` | Capture height |
| `capture.jpeg_quality` | `85` | JPEG quality |
| `capture.audio_device` | `""` | Audio device (empty = auto-detect) |
| `capture.audio_sample_rate` | `44100` | Audio sample rate |
| `analysis.motion_threshold` | `0.02` | MOG2 foreground pixel ratio |
| `analysis.brightness_dark` | `40.0` | Below = DARK scene |
| `analysis.brightness_bright` | `180.0` | Above = BRIGHT scene |
| `llm.provider` | `"claude"` | "gemini", "claude", "codex", or "external" |
| `llm.claude_model` | `"haiku"` | Claude model name |
| `llm.codex_model` | `"gpt-5.4"` | Codex model name |
| `llm.gemini_model` | `"gemini-3.1-flash-lite-preview"` | Gemini model name |
| `presence.enabled` | `true` | Enable presence detection |
| `presence.absent_threshold_ticks` | `3` | Ticks before absent state |
| `presence.sleep_start_hour` | `23` | Sleep detection start hour |
| `presence.sleep_end_hour` | `8` | Sleep detection end hour |
| `notify.provider` | `"discord"` | "discord" or "line" |
| `notify.webhook_url` | `""` | Webhook URL |
| `notify.enabled` | `false` | Enable notifications |
| `chat.enabled` | `false` | Master switch for chat integration |
| `chat.discord.enabled` | `false` | Enable Discord adapter |
| `chat.discord.user_token` | `""` | Discord user token |
| `chat.discord.user_id` | `""` | Your Discord user ID |
| `chat.discord.poll_interval` | `60` | Seconds between polls |
| `chat.discord.backfill_months` | `3` | Months of history to backfill on first run (0 = skip) |
The frontend communicates with the Rust backend via Tauri invoke() commands, not HTTP endpoints. These are defined in web/src-tauri/src/commands/.
| Command | Module | Description |
|---|---|---|
| `get_frames` | frames | List frames for a date |
| `get_frame` | frames | Get frame by ID |
| `get_latest_frame` | frames | Get latest frame |
| `get_summaries` | summaries | List summaries by date and scale |
| `get_events` | events | List events for a date |
| `get_stats` | stats | Daily statistics (counts, averages, hourly activity) |
| `get_activities` | stats | Activity breakdown with duration and hourly detail |
| `get_apps` | stats | App usage from window events (duration, switch count) |
| `get_dates` | stats | List dates with data |
| `get_range_stats` | stats | Per-day stats with meta-category breakdown |
| `get_sessions` | sessions | Activity sessions (consecutive frame grouping) |
| `get_report` | reports | Get daily report |
| `list_reports` | reports | List recent reports |
| `list_activities` | activities | List activity categories with meta-categories |
| `get_activity_mappings` | activities | Activity to meta-category mapping table |
| `search_text` | search | Full-text search (frames + summaries) |
| `export_frames_csv` | export | Export frames as CSV |
| `export_summaries_csv` | export | Export summaries as CSV |
| `export_report` | export | Export daily report as JSON |
| `get_live_frame` | live | Single JPEG snapshot from live feed |
| `get_settings` | settings | Get all settings from DB |
| `put_settings` | settings | Update settings in DB |
| `get_memo` | memos | Get memo for a date |
| `put_memo` | memos | Save memo for a date |
| `get_context` | context | Get user profile context |
| `put_context` | context | Update user profile context |
| `get_devices` | devices | Enumerate camera and audio devices |
| `get_status` | status | Daemon status and data directory info |
| `get_data_dir` | status | Get data directory path |
| `get_chat` | chat | Get chat messages for a date |
| `ask_rag` | rag | RAG-based question answering |
| `get_data_stats` | data | Data storage statistics |
| `export_table` | data | Export a database table |
Core capture data: timestamp, camera path, screen path, extra screen paths, audio path, transcription, brightness, motion score, scene type, LLM description, activity category, foreground window.
Focus change events recorded by the window monitor: timestamp, process name, window title. Used for precise app usage duration calculation via LEAD() window function.
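The `LEAD()` approach to app usage can be sketched as follows: each focus event lasts until the next event's timestamp. The column names (`ts`, `process_name`, `window_title`) and the use of integer epoch seconds are assumptions for the example; the real schema may differ. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

APP_USAGE_SQL = """
SELECT process_name, SUM(next_ts - ts) AS seconds
FROM (
    SELECT process_name,
           ts,
           LEAD(ts) OVER (ORDER BY ts) AS next_ts
    FROM window_events
)
WHERE next_ts IS NOT NULL   -- the latest event has no end time yet
GROUP BY process_name
ORDER BY seconds DESC
"""


def app_usage(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Total focused seconds per process, longest first."""
    return conn.execute(APP_USAGE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE window_events (ts INTEGER, process_name TEXT, window_title TEXT)")
conn.executemany(
    "INSERT INTO window_events VALUES (?, ?, ?)",
    [(0, "code", "daemon.py"), (60, "chrome", "Docs"), (90, "code", "cli.py"), (150, "slack", "general")],
)
print(app_usage(conn))  # [('code', 120), ('chrome', 30)]
```

Because durations come from the gaps between focus events rather than from 30-second frames, short app switches are still counted accurately.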
Multi-scale summaries (10m to 24h) with timestamp, scale, content, and frame count.
Detected events: scene changes, motion spikes, presence state changes. Linked to source frame.
Dynamic activity → meta-category mapping. Primary key is activity name, with meta_category, first_seen timestamp, and frame_count. Seeded from existing frame data on first migration. Updated automatically as the LLM generates new activities.
Daily auto-generated reports with content, frame count, and focus percentage.
Messages collected from chat platforms: platform, platform-specific message ID, channel/guild info, author, is_self flag, content, timestamp, and JSON metadata for attachments/embeds. Unique constraint on (platform, platform_message_id).
Daily user memos: date (primary key), content, updated_at. Editable only for today, read-only for past dates.
frames_fts (trigram) over description, transcription, activity, foreground_window. summaries_fts (trigram) over content.
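A trigram FTS5 index supports substring search, which the sketch below demonstrates on an in-memory table with the same column names. How vida actually populates and queries these tables (for example, whether they are external-content tables) is not stated here, and `search_frames` is a hypothetical helper. The trigram tokenizer needs an SQLite build with FTS5 at version 3.34 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE frames_fts USING fts5("
    "description, transcription, activity, foreground_window, tokenize='trigram')"
)
conn.executemany(
    "INSERT INTO frames_fts VALUES (?, ?, ?, ?)",
    [
        ("Editing daemon.py in the editor", "", "programming", "Code"),
        ("Watching a video", "", "entertainment", "Browser"),
    ],
)


def search_frames(conn: sqlite3.Connection, text: str) -> list[str]:
    """Substring search; trigram queries need at least three characters."""
    phrase = '"' + text.replace('"', '""') + '"'  # quote the input as an FTS5 string
    rows = conn.execute(
        "SELECT description FROM frames_fts WHERE frames_fts MATCH ?", (phrase,)
    )
    return [row[0] for row in rows]


print(search_frames(conn, "aemon"))  # ['Editing daemon.py in the editor']
```

Matching on character trigrams rather than whole words also helps with text that has no spaces between words, such as Japanese window titles.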
- Daemon: Python 3.12 / Click / OpenCV / SQLite (WAL mode)
- LLM: Anthropic Claude Code CLI / OpenAI Codex CLI / Google Gemini (image + audio)
- Window tracking: PowerShell / Win32 P/Invoke (`GetForegroundWindow`)
- Desktop: Tauri v2 / Rust / rusqlite / WebView2 (Windows) / WebKitGTK (Linux) / WKWebView (macOS)
- Frontend: React 19 / TypeScript / Vite 6
- Infra: Docker Compose / WSL2
vida is a local-only system. Captured data stays on your machine; outbound traffic is limited to the configured LLM provider (optional) and allowlisted notification webhooks.
- Loopback-only servers. The daemon binds MJPEG (3002), RAG (3003), and WebSocket (3004) to `127.0.0.1` and rejects non-loopback remotes, bad `Host` headers (DNS-rebinding defense), and untrusted origins.
- Hardened Tauri layer. Every IPC command that takes a date, path, or scale runs through shared validators; file reads are confined to `data_dir` via a `safe_join` helper; Python binary discovery canonicalizes under the repo root; `assetProtocol` scope is registered dynamically at runtime only.
- Hardened webview. CSP pins `img`/`media`/`connect` to the specific loopback ports; `script-src 'self'`, `object-src 'none'`, `frame-ancestors 'none'`. RAG markdown output is sanitized with DOMPurify.
- Prompt-injection mitigations. Untrusted strings (window titles, transcription, chat) are JSON-escaped in the analyzer prompt, and the system note tells the model that captured data is not instructions.
- Secrets hygiene. API keys live in the `settings` table of `life.db`; on POSIX the daemon chmods `data/` to `700` and DB files to `600`, and LLM error messages are scrubbed of API keys before being broadcast over the WebSocket.
- End-to-end tests. Network-level invariants (Host/Origin/body-size/query-length) are covered by `tests/e2e/` booting real daemon servers on loopback; frontend sanitization and UI flows are covered by Playwright in `web/e2e/`.
For the full hardening inventory, reporting instructions, and user best practices, see SECURITY.md.