WhimCraft is a bilingual (English/Chinese) AI agent with advanced memory, personalization, and agentic capabilities, powered by Google Gemini 2.5 Flash.
- 🤖 Agentic Architecture: ReAct (Reason-Act-Observe) pattern for autonomous AI behavior
- 🧠 Intelligent Memory System: Automatic extraction with tiered retention (CORE/IMPORTANT/CONTEXT)
- 🔍 Web Search & Fetch: Real-time search plus page fetching with a smart fallback chain (90-95% fetch success rate)
- 🌐 Resilient Content Fetching: Cache → Direct → Jina.ai → Archive.org fallback for blocked sites
- 📊 Progress Tracking: Real-time visual feedback during AI response generation
- 🎨 Native Image Generation: Built-in Gemini 2.5 Flash Image generation
- 🚀 PRO Mode: Access to advanced Gemini 2.0 Flash Pro and Thinking models
- 🌏 Bilingual Support: Full English and Chinese support (175+ keywords)
- 📎 File Attachments: Upload and analyze images and PDFs with multimodal AI
- 🖼️ Image Upload & Storage: Paste/drag-drop images with Cloudflare R2 storage
- 📑 Paper Reader: Analyze arXiv papers with structured AI summaries
- 📂 Repo Reader: Analyze GitHub repositories and generate architecture docs
- 🔬 Deep Research: Multi-turn research sessions with Gemini Interactions API
- 💬 Streaming Responses: Real-time AI chat with syntax highlighting and LaTeX support
- 🔐 Google OAuth: Secure authentication with whitelist control
- 📝 Conversation Management: Auto-generated titles, full history
- 📄 Whim Editor: Notion-like WYSIWYG editor with LaTeX, code, tables, images
- 👨‍💼 Admin Panel: User management, whitelist, prompt configuration
- 🎯 Smart Personalization: AI remembers your preferences and context
- ⚙️ Dynamic Prompts: Admin-configurable system prompts
- 🎨 Clean UI: Modern interface with Tailwind CSS
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Database: Firestore (serverless)
- Authentication: NextAuth.js (Google OAuth)
- AI: Google Gemini API (2.5 Flash, 2.5 Flash Image, 2.5 Flash-Lite)
- Styling: Tailwind CSS + shadcn/ui
- Unit Testing: Jest + TypeScript (374 tests, 100% pass rate)
- E2E Testing: Playwright (142 tests in 14 files, 100% pass rate)
- Deployment: Cloud Run (GCP)
- Node.js 20+
- npm or yarn
- Google Cloud Platform account
- Firebase project
```bash
npm install
```

Create a `.env.local` file in the root directory:
```bash
# Next.js
NEXTAUTH_URL=http://localhost:8080
NEXTAUTH_SECRET=your-secret-key-here
# Google OAuth (Get from GCP Console)
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
# Gemini API (Get from https://ai.google.dev/)
GEMINI_API_KEY=your-gemini-api-key
# Firebase/Firestore
FIREBASE_PROJECT_ID=your-firebase-project-id
FIREBASE_PRIVATE_KEY="your-firebase-private-key"
FIREBASE_CLIENT_EMAIL=your-firebase-client-email
# Google Custom Search (for web search feature)
GOOGLE_SEARCH_API_KEY=your-google-search-api-key
GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id
# Jina.ai Reader API (optional, for higher rate limits on web content fetching)
JINA_API_KEY=your-jina-api-key
# Admin Email
ADMIN_EMAIL=archeryue7@gmail.com
# Feature Flags (optional, all default to false)
NEXT_PUBLIC_USE_INTELLIGENT_ANALYSIS=true
NEXT_PUBLIC_USE_WEB_SEARCH=true
NEXT_PUBLIC_USE_AGENTIC_MODE=true
```

To get a Gemini API key:
- Go to https://ai.google.dev/
- Click "Get API Key"
- Create a new API key
- Copy the key to `GEMINI_API_KEY` in `.env.local`
To set up Firestore:
- Go to the Firebase Console
- Create a new project (or use existing)
- Enable Firestore Database (Native mode)
- Go to Project Settings → Service Accounts
- Click "Generate new private key"
- Download the JSON file
- Copy these values to `.env.local`:
  - `project_id` → `FIREBASE_PROJECT_ID`
  - `private_key` → `FIREBASE_PRIVATE_KEY`
  - `client_email` → `FIREBASE_CLIENT_EMAIL`
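If you want to sanity-check the Firestore settings before the app starts, a small loader along these lines can help. It is a sketch, not WhimCraft's `firebase-admin.ts`: `readServiceAccount` is a hypothetical helper, and the `\n` handling is a defensive assumption for hosts that pass the private key's line breaks through as literal backslash-n sequences.

```typescript
// Hypothetical helper; WhimCraft's own src/lib/firebase-admin.ts may differ.
interface ServiceAccountConfig {
  projectId: string;
  clientEmail: string;
  privateKey: string;
}

type EnvVars = Record<string, string | undefined>;

function readServiceAccount(env: EnvVars): ServiceAccountConfig {
  const required: Array<[string, string | undefined]> = [
    ["FIREBASE_PROJECT_ID", env.FIREBASE_PROJECT_ID],
    ["FIREBASE_CLIENT_EMAIL", env.FIREBASE_CLIENT_EMAIL],
    ["FIREBASE_PRIVATE_KEY", env.FIREBASE_PRIVATE_KEY],
  ];
  // Report every missing variable at once instead of failing on the first.
  const missing = required.filter(([, value]) => !value).map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Assumption: some hosts deliver the PEM key with literal "\n" sequences
  // instead of line breaks; convert them back so the key parses.
  const privateKey = env.FIREBASE_PRIVATE_KEY!.replace(/\\n/g, "\n");

  return {
    projectId: env.FIREBASE_PROJECT_ID!,
    clientEmail: env.FIREBASE_CLIENT_EMAIL!,
    privateKey,
  };
}

// Usage (server side): const account = readServiceAccount(process.env);
```

The returned object uses the camelCase field names (`projectId`, `clientEmail`, `privateKey`) that the Firebase Admin SDK's `cert()` helper accepts, so it can be passed to `initializeApp({ credential: cert(account) })`.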
To set up Google OAuth:
- Go to the GCP Console
- Navigate to APIs & Services → Credentials
- Click "Create Credentials" → "OAuth 2.0 Client ID"
- Application type: Web application
- Add authorized redirect URI: `http://localhost:8080/api/auth/callback/google`
- Copy the Client ID and Client Secret to `.env.local`
```bash
npm run dev
```

Open http://localhost:8080 in your browser.
- Click "Get Started" → "Sign in with Google"
- Sign in with the Google account set as `ADMIN_EMAIL` (archeryue7@gmail.com in the example above)
- You'll be automatically whitelisted as admin
- Start chatting!
```
src/
  app/
    api/ # API routes
      auth/ # NextAuth endpoints
      chat/ # Chat streaming endpoint
      conversations/ # Conversation management
      memory/ # Memory system API
      admin/ # Admin endpoints (whitelist, users, prompts, cleanup)
    chat/ # Main chat interface
    admin/ # Admin panel
    profile/ # User memory profile page
    whim/ # Whim management page
    paper-reader/ # arXiv paper analysis page
    repo-reader/ # GitHub repo analysis page
    deep-research/ # Deep research page
    login/ # Login page
    layout.tsx # Root layout with providers
  components/
    chat/ # Chat components (input, message, sidebar, topbar, progress)
    admin/ # Admin components (whitelist, stats, prompts)
    ui/ # UI components (shadcn/ui)
    providers/ # Context providers
  lib/
    firebase-admin.ts # Firestore setup (lazy initialization)
    auth.ts # NextAuth config
    prompts.ts # Dynamic prompt management
    providers/ # AI provider abstraction
      provider-factory.ts
      gemini.provider.ts
    agent/ # Agentic architecture (ReAct pattern)
      core/ # Agent core, context manager, prompts
      tools/ # Tool implementations (web_search, memory, etc.)
    prompt-analysis/ # AI-powered intent analysis
      analyzer.ts # PromptAnalyzer using Gemini Flash Lite
    context-engineering/ # Context orchestration
      orchestrator.ts # Coordinates web search, memory, model selection
    web-search/ # Web search integration
      google-search.ts # Google Custom Search API client
      rate-limiter.ts # Per-user rate limiting
      content-fetcher.ts # Fetch and extract web content
    progress/ # Progress tracking system
      emitter.ts # Server-side event emitter
      types.ts # Progress step types
    memory/ # Memory system
      storage.ts # CRUD operations
      extractor.ts # AI-powered extraction
      loader.ts # Memory loading for chat
      cleanup.ts # Automatic cleanup
    keywords/ # Keyword trigger system (legacy)
      system.ts
      triggers.ts
    paper-reader/ # arXiv paper analysis
    repo-reader/ # GitHub repository analysis
    deep-research/ # Multi-turn research with Gemini
  config/
    models.ts # Gemini model tiering
    keywords.ts # Bilingual keywords (175+ triggers)
    feature-flags.ts # Feature toggles
  types/
    index.ts # Main types
    memory.ts # Memory system types
    prompts.ts # Prompt types
    file.ts # File attachment types
    ai-providers.ts # Provider interfaces
    agent.ts # Agent types
    prompt-analysis.ts # Analysis types
  __tests__/ # Jest unit tests (374 tests)
e2e/ # Playwright E2E tests (142 tests, 14 files)
```
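The per-user limits handled by `src/lib/web-search/rate-limiter.ts` (20 searches per hour and 100 per day, as listed under web search) map naturally onto a sliding-window limiter. The sketch below keeps timestamps in memory; the class name and the in-memory storage are assumptions, and a Cloud Run deployment with several instances would need shared storage instead.

```typescript
// Hypothetical sliding-window limiter; not WhimCraft's actual implementation.
interface RateWindow {
  limit: number; // maximum requests allowed inside the window
  ms: number; // window length in milliseconds
}

const HOUR_MS = 60 * 60 * 1000;
const SEARCH_WINDOWS: RateWindow[] = [
  { limit: 20, ms: HOUR_MS }, // 20 searches per hour
  { limit: 100, ms: 24 * HOUR_MS }, // 100 searches per day
];

class SlidingWindowLimiter {
  private readonly windows: RateWindow[];
  private readonly longestMs: number;
  private readonly history = new Map<string, number[]>(); // userId -> request timestamps

  constructor(windows: RateWindow[]) {
    this.windows = windows;
    this.longestMs = Math.max(...windows.map((w) => w.ms));
  }

  // Returns true and records the request only if every window still has room.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    // Forget timestamps that no window can see any more.
    const recent = (this.history.get(userId) ?? []).filter((t) => now - t < this.longestMs);
    const allowed = this.windows.every(
      (w) => recent.filter((t) => now - t < w.ms).length < w.limit,
    );
    if (allowed) recent.push(now);
    this.history.set(userId, recent);
    return allowed;
  }
}
```

Checking every window on each request means a user who spreads searches out still hits the daily cap, while a burst is stopped by the hourly one.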
The AI automatically learns from your conversations:
- Hybrid Triggering: Keywords ("remember that") or automatic after 5+ messages
- Tiered Retention: CORE (permanent), IMPORTANT (90 days), CONTEXT (30 days)
- Smart Cleanup: Removes low-value facts to stay under a 500-token budget
- User Control: View and delete facts at `/profile`
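The retention and cleanup rules above can be sketched as a two-pass filter: drop facts whose tier has expired, then keep the highest-value facts until the 500-token budget is used. The `importance` score, the tier-first ordering, and the four-characters-per-token estimate are assumptions made for illustration; the real `lib/memory/cleanup.ts` may rank facts differently.

```typescript
type Tier = "CORE" | "IMPORTANT" | "CONTEXT";

interface MemoryFact {
  text: string;
  tier: Tier;
  createdAt: number; // epoch milliseconds
  importance: number; // hypothetical 0..1 score used for ranking
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Retention per tier; null means the fact never expires.
const RETENTION_MS: Record<Tier, number | null> = {
  CORE: null,
  IMPORTANT: 90 * DAY_MS,
  CONTEXT: 30 * DAY_MS,
};
const TOKEN_BUDGET = 500;

// Rough heuristic (assumption): about four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function cleanupFacts(facts: MemoryFact[], now: number): MemoryFact[] {
  // Pass 1: drop facts whose tier retention has elapsed.
  const live = facts.filter((f) => {
    const ttl = RETENTION_MS[f.tier];
    return ttl === null || now - f.createdAt < ttl;
  });

  // Pass 2: CORE before IMPORTANT before CONTEXT, then by importance,
  // keeping facts greedily while they fit in the token budget.
  const rank: Record<Tier, number> = { CORE: 0, IMPORTANT: 1, CONTEXT: 2 };
  const ordered = [...live].sort(
    (a, b) => rank[a.tier] - rank[b.tier] || b.importance - a.importance,
  );
  const kept: MemoryFact[] = [];
  let used = 0;
  for (const fact of ordered) {
    const cost = estimateTokens(fact.text);
    if (used + cost > TOKEN_BUDGET) continue; // would overflow the budget: skip it
    kept.push(fact);
    used += cost;
  }
  return kept;
}
```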
Generate images directly in chat:
- English: "create an image of a sunset"
- Chinese: "生成一幅图片,描绘星空" ("generate an image depicting a starry sky")
- Native Gemini 2.5 Flash Image model
- Inline display in conversation
Upload and analyze files:
- Images: PNG, JPG, GIF, WebP
- Documents: PDF
- AI can analyze and discuss file contents
- Multimodal processing with Gemini
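Before a file reaches the model, it has to be one of the supported types. A minimal check keyed on MIME type might look like this; `classifyAttachment` is a hypothetical name, and the MIME strings are the standard ones for the formats listed above.

```typescript
type AttachmentKind = "image" | "pdf";

// Standard MIME types for the supported formats.
const SUPPORTED_TYPES: Record<string, AttachmentKind> = {
  "image/png": "image",
  "image/jpeg": "image", // covers .jpg and .jpeg
  "image/gif": "image",
  "image/webp": "image",
  "application/pdf": "pdf",
};

// Hypothetical helper: returns the attachment kind, or null if unsupported.
function classifyAttachment(mimeType: string): AttachmentKind | null {
  // Ignore parameters such as "; name=x" and compare case-insensitively.
  const base = mimeType.split(";")[0].trim().toLowerCase();
  // Own-property check so keys like "constructor" are not mistaken for entries.
  return Object.prototype.hasOwnProperty.call(SUPPORTED_TYPES, base)
    ? SUPPORTED_TYPES[base]
    : null;
}
```

A supported file can then be sent to Gemini as an inline data part (MIME type plus base64-encoded bytes).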
Full Chinese and English support:
- 138 memory trigger keywords (both languages)
- 37 image generation keywords (both languages)
- Language preference auto-detection
- Hybrid mode for mixed conversations
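Keyword triggering and language detection can both be done with plain string checks. In this sketch the trigger lists are tiny, hypothetical stand-ins for the 175+ entries in `src/config/keywords.ts`, and the 0.3/0.7 thresholds for calling a message English, Chinese, or mixed are arbitrary choices. Substring matching is used because Chinese text has no spaces between words.

```typescript
// Hypothetical trigger lists; the real ones are far longer.
const MEMORY_TRIGGERS = ["remember that", "记住"];
const IMAGE_TRIGGERS = ["create an image", "生成一幅图片"];

type DetectedLanguage = "en" | "zh" | "mixed";

// Substring matching suits Chinese; lowercasing makes English triggers case-insensitive.
function matchesTrigger(message: string, triggers: string[]): boolean {
  const text = message.toLowerCase();
  return triggers.some((trigger) => text.includes(trigger.toLowerCase()));
}

// Compare Chinese characters against English words; thresholds are arbitrary.
function detectLanguage(message: string): DetectedLanguage {
  const hanChars = (message.match(/[\u4e00-\u9fff]/g) ?? []).length;
  const latinWords = (message.match(/[A-Za-z]+/g) ?? []).length;
  const total = hanChars + latinWords;
  if (total === 0) return "en"; // assumption: default to English when there is no text signal
  const zhShare = hanChars / total;
  if (zhShare >= 0.7) return "zh";
  if (zhShare <= 0.3) return "en";
  return "mixed";
}
```

Counting English words rather than letters keeps a short English term from outweighing a Chinese sentence, since one Chinese character carries roughly as much meaning as a word.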
ReAct (Reason-Act-Observe) pattern for autonomous AI:
- Iterative loop: Up to 5 iterations per request
- Available tools: web_search, web_fetch, memory_save, memory_retrieve, get_current_time
- `sourceCategory`: Target reliable sources (Wikipedia, Stack Overflow, Reuters, etc.)
- Agent autonomously decides when to use tools vs respond directly
- Enable with `NEXT_PUBLIC_USE_AGENTIC_MODE=true`
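The Reason-Act-Observe loop reduces to a bounded loop in which the model either requests a tool or answers. The sketch below is synchronous to stay short (real model and tool calls would be async), and the `ModelStep` shape and `runReAct` function are hypothetical rather than WhimCraft's agent core API; only the five-iteration cap comes from the description above.

```typescript
// Hypothetical step shape: the model either asks for a tool or gives a final answer.
type ModelStep =
  | { kind: "tool"; thought: string; tool: string; input: string }
  | { kind: "answer"; text: string };

type ToolFn = (input: string) => string;

interface Observation {
  tool: string;
  input: string;
  output: string;
}

const MAX_ITERATIONS = 5;

function runReAct(
  question: string,
  reason: (question: string, history: Observation[]) => ModelStep,
  tools: Record<string, ToolFn>,
): string {
  const history: Observation[] = [];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // Reason: the model sees the question plus every observation so far.
    const step = reason(question, history);
    if (step.kind === "answer") return step.text; // respond directly, no tool needed
    // Act: run the requested tool; an unknown tool becomes an error observation.
    const tool = tools[step.tool];
    const output = tool ? tool(step.input) : `Unknown tool: ${step.tool}`;
    // Observe: the result is fed back on the next iteration.
    history.push({ tool: step.tool, input: step.input, output });
  }
  return "Stopped after reaching the iteration limit.";
}
```

The iteration cap is what keeps a model that never settles on an answer from looping (and spending tokens) indefinitely.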
Real-time web search with resilient content fetching:
- Search Provider: Google Custom Search API (20/hour, 100/day per user)
- Content Fetching: Multi-tier fallback chain for 90-95% success rate
  - Cache: In-memory LRU (500 entries, 1h TTL) for instant responses
  - Direct: Cheerio + 8 diverse User-Agents
  - Jina.ai Reader: JavaScript rendering + bot bypass for blocked sites
  - Archive.org: Final fallback for historical/blocked content
- Smart Handling: Automatic fallback for 401/403 errors (Reuters, Bloomberg, WSJ)
- Source Tracking: Metadata shows which method succeeded (direct/jina.ai/archive.org)
- Enable with `NEXT_PUBLIC_USE_WEB_SEARCH=true`
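The fetch path can be sketched as a cache lookup followed by an ordered list of fetchers, each of which rejects on failure (for example, on a 401 or 403). The cache size and TTL (500 entries, 1 hour) come from the list above; `LruTtlCache`, `fetchWithFallback`, and the fetcher signature are illustrative names, and real fetchers would wrap Cheerio, the Jina.ai Reader, and Archive.org.

```typescript
// Minimal LRU cache with a per-entry TTL. A Map iterates in insertion order,
// so its first key is always the least recently used entry.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    this.entries.delete(key);
    if (hit.expiresAt <= now) return undefined; // expired: treat as a miss
    this.entries.set(key, hit); // re-insert to mark as most recently used
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest); // evict the LRU entry
    }
  }
}

type FetchSource = "cache" | "direct" | "jina.ai" | "archive.org";
interface FetchResult {
  content: string;
  source: FetchSource; // which method produced the content
}
// Each fetcher resolves with page text or rejects (e.g. on HTTP 401/403).
type Fetcher = (url: string) => Promise<string>;

const pageCache = new LruTtlCache<FetchResult>(500, 60 * 60 * 1000); // 500 entries, 1h TTL

async function fetchWithFallback(
  url: string,
  chain: Array<[FetchSource, Fetcher]>,
): Promise<FetchResult> {
  const cached = pageCache.get(url);
  if (cached !== undefined) return { content: cached.content, source: "cache" };

  const failures: string[] = [];
  for (const [source, fetcher] of chain) {
    try {
      const result: FetchResult = { content: await fetcher(url), source };
      pageCache.set(url, result);
      return result;
    } catch (err) {
      // Record why this tier failed and move on to the next one.
      failures.push(`${source}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All fetch methods failed for ${url} (${failures.join("; ")})`);
}
```

Returning the `source` with every result is what lets the metadata report which method succeeded.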
Real-time visual feedback during AI responses:
- Steps: Analyzing → Searching → Retrieving Memory → Building Context → Generating
- Single updating badge shows current progress
- Server-Sent Events protocol for streaming updates
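Server-Sent Events frames are plain text: a `data:` line, optionally preceded by an `event:` line, terminated by a blank line. The sketch encodes one progress update per frame and parses frames back on the client; the step identifiers and the `ProgressUpdate` shape are assumptions based on the step list above.

```typescript
// Step identifiers mirror the list above; the names themselves are assumptions.
type ProgressStep =
  | "analyzing"
  | "searching"
  | "retrieving_memory"
  | "building_context"
  | "generating";

interface ProgressUpdate {
  step: ProgressStep;
  message: string; // text shown in the progress badge
}

// Encode one update as an SSE frame: "event:" and "data:" lines, then a blank line.
function toSseFrame(update: ProgressUpdate): string {
  return `event: progress\ndata: ${JSON.stringify(update)}\n\n`;
}

// Client side: split buffered text into frames and decode the progress payloads.
// Simplification: assumes each frame carries a single-line JSON "data:" field.
function parseProgressFrames(buffer: string): ProgressUpdate[] {
  const updates: ProgressUpdate[] = [];
  for (const frame of buffer.split("\n\n")) {
    const lines = frame.split("\n");
    if (lines[0] !== "event: progress") continue;
    const data = lines.find((line) => line.startsWith("data: "));
    if (data !== undefined) updates.push(JSON.parse(data.slice("data: ".length)));
  }
  return updates;
}
```

On the server, frames like these would be written to a response with `Content-Type: text/event-stream`; the single badge only needs to render the most recent update.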
Upload images directly in chat:
- Paste: Ctrl+V to paste images from clipboard
- Drag & drop: Drop images directly into chat input
- Storage: Cloudflare R2 for cost-effective storage
- CDN: Fast global delivery via Cloudflare CDN
Analyze academic papers from arXiv:
- URL validation: arXiv paper URL detection
- Multi-phase analysis: Overview, methodology, results, critique
- Save to Whim: Export analysis as editable document
- Progress tracking: Real-time analysis status
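URL validation can be a single regular expression over the `abs` and `pdf` URL forms. This hypothetical `parseArxivUrl` only accepts new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` version); old-style IDs such as `hep-th/9901001` would need an extra pattern.

```typescript
// Hypothetical validator for arXiv abstract and PDF URLs with new-style IDs.
const ARXIV_URL =
  /^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?\/?$/i;

interface ArxivRef {
  id: string; // e.g. "1706.03762"
  version?: string; // e.g. "v2", when the URL pins a version
}

function parseArxivUrl(url: string): ArxivRef | null {
  const match = ARXIV_URL.exec(url.trim());
  if (match === null) return null;
  const [, id, version] = match;
  return version ? { id, version } : { id };
}
```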
Analyze GitHub repositories:
- Deterministic exploration: Import-tracing instead of AI guessing
- 4-phase analysis: Recon → Entry points → Module exploration → Synthesis
- Architecture docs: Generate comprehensive architecture documentation
- Token-budgeted: Smart file selection within limits
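Deterministic exploration with a token budget can be sketched as a breadth-first walk over relative imports, starting from the entry points. The regex-based import extractor, the `.ts`/`.tsx`/`index.ts` resolution rules, and the four-characters-per-token estimate are simplifying assumptions for a TypeScript repository, not a description of the actual Repo Reader.

```typescript
// Hypothetical in-memory repository: file path -> source text.
type RepoFiles = Map<string, string>;

// Collect relative ES-module import specifiers ("./x", "../y").
function relativeImports(source: string): string[] {
  const specs: string[] = [];
  const re = /import\s[^'"]*['"](\.{1,2}\/[^'"]+)['"]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) specs.push(m[1]);
  return specs;
}

// Resolve a specifier against the importing file's directory.
function resolveImport(fromFile: string, spec: string, files: RepoFiles): string | null {
  const parts = fromFile.split("/").slice(0, -1);
  for (const seg of spec.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== ".") parts.push(seg);
  }
  const base = parts.join("/");
  for (const candidate of [base, `${base}.ts`, `${base}.tsx`, `${base}/index.ts`]) {
    if (files.has(candidate)) return candidate;
  }
  return null;
}

// Breadth-first walk from the entry points, following imports within the budget.
function selectFiles(entries: string[], files: RepoFiles, tokenBudget: number): string[] {
  const estimate = (text: string) => Math.ceil(text.length / 4); // assumption: ~4 chars/token
  const queue = [...entries];
  const seen = new Set(entries);
  const selected: string[] = [];
  let used = 0;
  while (queue.length > 0) {
    const file = queue.shift()!;
    const source = files.get(file);
    if (source === undefined) continue;
    const cost = estimate(source);
    if (used + cost > tokenBudget) continue; // too big for what is left: skip it and its imports
    selected.push(file);
    used += cost;
    for (const spec of relativeImports(source)) {
      const target = resolveImport(file, spec, files);
      if (target !== null && !seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  return selected;
}
```

Because the walk only follows real import edges, the same repository and budget always yield the same file list, which is the point of avoiding AI guessing here.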
Multi-turn research sessions:
- Gemini Interactions API: Grounded search with real-time results
- Iterative queries: Up to 10 search iterations
- Source aggregation: Automatic citation collection
- Save to Whim: Export research as document
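Source aggregation across iterations is mostly deduplication, because the same page often comes back from several queries. The sketch below caps the number of rounds at the 10 iterations mentioned above and treats URLs that differ only by a `#fragment` or trailing slash as the same source; that normalization rule, the `Citation` shape, and the output format are assumptions.

```typescript
interface Citation {
  title: string;
  url: string;
}

const MAX_SEARCH_ITERATIONS = 10;

// Assumption: URLs differing only by "#fragment" or a trailing slash are one source.
function citationKey(url: string): string {
  return url.split("#")[0].replace(/\/+$/, "");
}

// Merge citations from each search round, keeping the first copy of each source.
function aggregateCitations(rounds: Citation[][]): Citation[] {
  const byKey = new Map<string, Citation>();
  for (const round of rounds.slice(0, MAX_SEARCH_ITERATIONS)) {
    for (const citation of round) {
      const key = citationKey(citation.url);
      if (!byKey.has(key)) byKey.set(key, citation);
    }
  }
  return Array.from(byKey.values());
}

// Render a numbered reference list, e.g. for an exported Whim document.
function formatCitations(citations: Citation[]): string {
  return citations.map((c, i) => `[${i + 1}] ${c.title}: ${c.url}`).join("\n");
}
```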
As an admin, you can:
- Manage Whitelist: Add/remove emails that can access the app
- View User Stats: See all users, message counts, and last active times
- Configure Prompts: Edit system prompts and temperature settings
- Access Admin Panel: Click "Admin Panel" in the sidebar
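The whitelist check behind sign-in can be sketched as a case-insensitive email lookup, with the `ADMIN_EMAIL` account always allowed (matching the automatic admin whitelisting described in the setup steps). `checkAccess` is a hypothetical helper; in NextAuth.js a check like this would typically run inside the `signIn` callback, which can return `false` to deny access.

```typescript
interface AccessDecision {
  allowed: boolean;
  isAdmin: boolean;
}

// Emails are compared after trimming and lowercasing.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Hypothetical check: the admin is always allowed; others must be whitelisted.
function checkAccess(
  email: string,
  whitelist: readonly string[],
  adminEmail: string,
): AccessDecision {
  const normalized = normalizeEmail(email);
  const isAdmin = normalized === normalizeEmail(adminEmail);
  const allowed = isAdmin || whitelist.some((entry) => normalizeEmail(entry) === normalized);
  return { allowed, isAdmin };
}
```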
For family use (5-10 users, ~1000 messages/month):
- Firestore: FREE (within free tier)
- Cloud Run: $5-10/month (scales to zero when idle)
- Gemini API: $2-5/month (tiered models for optimization)
  - Chat (2.5 Flash): ~$1.70
  - Memory extraction (2.5 Flash-Lite): ~$0.50
  - Image generation (occasional): ~$0.50
- Web content extraction: ~$0.35/month (down from $0.50 thanks to caching!)
- Total: $7.50-17.50/month ✅ Well under $30 budget!
Cost optimizations:
- WebFetch caching: Saves ~$0.15/month (30-40% cache hit rate)
- Jina.ai Reader: FREE (with API key, unlimited use)
- Archive.org: FREE (unlimited historical content)
Cost per feature:
- Base chat: ~$6-12/month
- Memory system: +$0.50-1/month
- Image generation: +$0.50-2/month
- Web search & fetch: +$0.35/month (optimized with caching)
- File attachments: included (no extra cost)
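These costs rely on routing each task to the cheapest model that handles it well. A tiering table in the spirit of `src/config/models.ts` might look like the sketch below; the model ID strings are assumptions (check the Gemini API model list for current names), prompt analysis is mapped to Flash-Lite because the project layout says the analyzer uses Gemini Flash Lite, and the PRO-mode override is a hypothetical option.

```typescript
type ModelTask = "chat" | "memory_extraction" | "prompt_analysis" | "image_generation";

// Assumed model IDs; verify against the current Gemini model list.
const MODEL_FOR_TASK: Record<ModelTask, string> = {
  chat: "gemini-2.5-flash", // main conversation model
  memory_extraction: "gemini-2.5-flash-lite", // cheap background extraction
  prompt_analysis: "gemini-2.5-flash-lite", // intent analysis before each reply
  image_generation: "gemini-2.5-flash-image", // native image output
};

interface PickOptions {
  proModel?: string; // hypothetical: model ID chosen when PRO mode is on
}

function pickModel(task: ModelTask, options: PickOptions = {}): string {
  // PRO mode upgrades the chat model only; background tasks stay on cheap tiers.
  if (task === "chat" && options.proModel) return options.proModel;
  return MODEL_FOR_TASK[task];
}
```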
WhimCraft has comprehensive test coverage with Jest (unit) and Playwright (E2E).
```bash
# Run all tests
npx jest
# Run with coverage
npx jest --coverage
# Run specific suite
npx jest src/__tests__/lib/memory/cleanup.test.ts
# Watch mode
npx jest --watch
```

Current Status: 374 tests passing (100% pass rate)
- Memory system (42 tests): cleanup, extraction, loading, storage
- Agent system (58 tests): core, tools, context manager
- Web search & fetch (27 tests): search, rate limiting, content fetching, fallback chain, cache
- Context orchestration (8 tests)
- Prompt analysis (31 tests)
- Whim system (124 tests): editor, converter, storage, validation
- Paper/Repo readers (40+ tests): parsing, analysis, validation
- Deep Research (40+ tests): API integration, flow control
```bash
# Run all E2E tests (headless, ~2 minutes)
npm run test:e2e:fast
# or
npx playwright test
# Interactive UI mode
npx playwright test --ui
# Run with visible browser
npx playwright test --headed
# Debug mode with inspector
npx playwright test --debug
```

Current Status: 142 tests in 14 files (100% pass rate)
- `01-ui-and-ux.e2e.ts`: UI/UX fundamentals
- `02-authenticated-chat.e2e.ts`: Chat flows
- `03-visual-and-accessibility.e2e.ts`: Accessibility
- `04-core-features.e2e.ts`: Core functionality
- `05-whim-editor.e2e.ts`: Whim editor
- `06-pro-mode.e2e.ts`: PRO mode
- `07-paper-reader.e2e.ts`: Paper Reader
- `08-pdf-tools.e2e.ts`: PDF tools
- `09-image-upload.e2e.ts`: Image upload
- `10-welcome-navigator.e2e.ts`: Welcome page
- `11-repo-reader.e2e.ts`: Repo Reader
- `12-deep-research.e2e.ts`: Deep Research
- `web-fetch-resilience.spec.ts`: Web fetch fallback chain
- `financial-web-fetch.spec.ts`: Financial website handling
See `docs/TESTING.md` for a detailed testing guide.
For complete instructions on deploying to Google Cloud Run, see `docs/DEPLOYMENT.md`.