WhimCraft - AI Agent

A bilingual (English/Chinese) AI agent with advanced memory, personalization, and agentic capabilities, powered by Google Gemini 2.5 Flash.

Features

🤖 Agentic Architecture: ReAct (Reason-Act-Observe) pattern for autonomous AI behavior
🧠 Intelligent Memory System: Automatic extraction with tiered retention (CORE/IMPORTANT/CONTEXT)
🔍 Web Search & Fetch: Real-time search with smart fallback chain (90-95% success rate)
🌐 Resilient Content Fetching: Cache → Direct → Jina.ai → Archive.org fallback for blocked sites
📊 Progress Tracking: Real-time visual feedback during AI response generation
🎨 Native Image Generation: Built-in Gemini 2.5 Flash Image generation
🚀 PRO Mode: Access to advanced Gemini 2.0 Flash Pro and Thinking models
🌏 Bilingual Support: Full English and Chinese support (175+ keywords)
📎 File Attachments: Upload and analyze images, PDFs with multimodal AI
🖼️ Image Upload & Storage: Paste/drag-drop images with Cloudflare R2 storage
📑 Paper Reader: Analyze arXiv papers with structured AI summaries
📂 Repo Reader: Analyze GitHub repositories and generate architecture docs
🔬 Deep Research: Multi-turn research sessions with Gemini Interactions API
💬 Streaming Responses: Real-time AI chat with syntax highlighting and LaTeX support
🔐 Google OAuth: Secure authentication with whitelist control
📝 Conversation Management: Auto-generated titles, full history
📄 Whim Editor: Notion-like WYSIWYG editor with LaTeX, code, tables, images
👨‍💼 Admin Panel: User management, whitelist, prompt configuration
🎯 Smart Personalization: AI remembers your preferences and context
⚙️ Dynamic Prompts: Admin-configurable system prompts
🎨 Clean UI: Modern interface with Tailwind CSS

Tech Stack

Framework: Next.js 14 (App Router)
Language: TypeScript
Database: Firestore (serverless)
Authentication: NextAuth.js (Google OAuth)
AI: Google Gemini API (2.5 Flash, Image, Lite)
Styling: Tailwind CSS + shadcn/ui
Unit Testing: Jest + TypeScript (374 tests, 100% pass rate)
E2E Testing: Playwright (142 tests in 14 files, 100% pass rate)
Deployment: Cloud Run (GCP)

Local Development Setup

Prerequisites

Node.js 20+
npm or yarn
Google Cloud Platform account
Firebase project

Step 1: Install Dependencies

npm install

Step 2: Set Up Environment Variables

Create a .env.local file in the root directory:

# Next.js
NEXTAUTH_URL=http://localhost:8080
NEXTAUTH_SECRET=your-secret-key-here

# Google OAuth (Get from GCP Console)
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret

# Gemini API (Get from https://ai.google.dev/)
GEMINI_API_KEY=your-gemini-api-key

# Firebase/Firestore
FIREBASE_PROJECT_ID=your-firebase-project-id
FIREBASE_PRIVATE_KEY="your-firebase-private-key"
FIREBASE_CLIENT_EMAIL=your-firebase-client-email

# Google Custom Search (for web search feature)
GOOGLE_SEARCH_API_KEY=your-google-search-api-key
GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id

# Jina.ai Reader API (optional, for higher rate limits on web content fetching)
JINA_API_KEY=your-jina-api-key

# Admin Email (the Google account that becomes the first admin)
ADMIN_EMAIL=your-admin-email@example.com

# Feature Flags (optional, all default to false)
NEXT_PUBLIC_USE_INTELLIGENT_ANALYSIS=true
NEXT_PUBLIC_USE_WEB_SEARCH=true
NEXT_PUBLIC_USE_AGENTIC_MODE=true
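Environment variables are always strings, so a flag needs an explicit parsing rule. This sketch treats a flag as enabled only when its value is the string true (ignoring case and surrounding spaces); anything else, including an unset variable, falls back to the documented default of false. The readFlag and loadFeatureFlags helpers are hypothetical; the project's own src/config/feature-flags.ts may work differently.

```typescript
// Hypothetical helpers for illustration; not the project's actual feature-flags module.
type Env = Record<string, string | undefined>;

// Enabled only for "true" (case-insensitive, trimmed); unset, "", "false", "1" all mean false.
function readFlag(env: Env, name: string): boolean {
  return env[name]?.trim().toLowerCase() === "true";
}

function loadFeatureFlags(env: Env) {
  return {
    intelligentAnalysis: readFlag(env, "NEXT_PUBLIC_USE_INTELLIGENT_ANALYSIS"),
    webSearch: readFlag(env, "NEXT_PUBLIC_USE_WEB_SEARCH"),
    agenticMode: readFlag(env, "NEXT_PUBLIC_USE_AGENTIC_MODE"),
  };
}
```

In a Next.js app, client-side code must reference each variable by its literal name (for example process.env.NEXT_PUBLIC_USE_WEB_SEARCH) so the build can inline it; passing the whole process.env object only works on the server.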

Step 3: Get API Keys and Credentials

1. Gemini API Key

Go to https://ai.google.dev/
Click "Get API Key"
Create a new API key
Copy the key to GEMINI_API_KEY in .env.local

2. Firebase Project

Go to Firebase Console
Create a new project (or use existing)
Enable Firestore Database (Native mode)
Go to Project Settings → Service Accounts
Click "Generate new private key"
Download the JSON file
Copy values to .env.local:
- project_id → FIREBASE_PROJECT_ID
- private_key → FIREBASE_PRIVATE_KEY
- client_email → FIREBASE_CLIENT_EMAIL
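Depending on how the variable is loaded, the private_key line breaks can arrive as literal \n sequences, which typically makes the PEM key fail to parse. A common pattern is to restore real newlines before handing the key to the Firebase Admin SDK. The sketch below shows only that normalization step; the normalizePrivateKey name is hypothetical, and how src/lib/firebase-admin.ts handles this is not shown in this README.

```typescript
// Hypothetical helper: turn FIREBASE_PRIVATE_KEY from the environment back into a PEM string.
function normalizePrivateKey(raw: string | undefined): string {
  if (!raw) {
    throw new Error("FIREBASE_PRIVATE_KEY is not set");
  }
  // Strip one pair of surrounding double quotes if the loader kept them.
  const unquoted = raw.replace(/^"([\s\S]*)"$/, "$1");
  // Convert literal backslash-n sequences into real newlines (no-op if already converted).
  return unquoted.replace(/\\n/g, "\n");
}
```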

3. Google OAuth Credentials

Go to GCP Console
Navigate to APIs & Services → Credentials
Click "Create Credentials" → "OAuth 2.0 Client ID"
Application type: Web application
Add authorized redirect URI: http://localhost:8080/api/auth/callback/google
Copy Client ID and Client Secret to .env.local

Step 4: Run Development Server

npm run dev

Open http://localhost:8080 in your browser.

Step 5: First Login

Click "Get Started" → "Sign in with Google"
Sign in with the Google account you set as ADMIN_EMAIL
You'll be automatically whitelisted as admin
Start chatting!

Project Structure

src/
  app/
    api/                  # API routes
      auth/               # NextAuth endpoints
      chat/               # Chat streaming endpoint
      conversations/      # Conversation management
      memory/             # Memory system API
      admin/              # Admin endpoints (whitelist, users, prompts, cleanup)
    chat/                 # Main chat interface
    admin/                # Admin panel
    profile/              # User memory profile page
    whim/                 # Whim management page
    paper-reader/         # arXiv paper analysis page
    repo-reader/          # GitHub repo analysis page
    deep-research/        # Deep research page
    login/                # Login page
    layout.tsx            # Root layout with providers
  components/
    chat/                 # Chat components (input, message, sidebar, topbar, progress)
    admin/                # Admin components (whitelist, stats, prompts)
    ui/                   # UI components (shadcn/ui)
    providers/            # Context providers
  lib/
    firebase-admin.ts     # Firestore setup (lazy initialization)
    auth.ts               # NextAuth config
    prompts.ts            # Dynamic prompt management
    providers/            # AI provider abstraction
      provider-factory.ts
      gemini.provider.ts
    agent/                # Agentic architecture (ReAct pattern)
      core/               # Agent core, context manager, prompts
      tools/              # Tool implementations (web_search, memory, etc.)
    prompt-analysis/      # AI-powered intent analysis
      analyzer.ts         # PromptAnalyzer using Gemini Flash Lite
    context-engineering/  # Context orchestration
      orchestrator.ts     # Coordinates web search, memory, model selection
    web-search/           # Web search integration
      google-search.ts    # Google Custom Search API client
      rate-limiter.ts     # Per-user rate limiting
      content-fetcher.ts  # Fetch and extract web content
    progress/             # Progress tracking system
      emitter.ts          # Server-side event emitter
      types.ts            # Progress step types
    memory/               # Memory system
      storage.ts          # CRUD operations
      extractor.ts        # AI-powered extraction
      loader.ts           # Memory loading for chat
      cleanup.ts          # Automatic cleanup
    keywords/             # Keyword trigger system (legacy)
      system.ts
      triggers.ts
    paper-reader/         # arXiv paper analysis
    repo-reader/          # GitHub repository analysis
    deep-research/        # Multi-turn research with Gemini
  config/
    models.ts             # Gemini model tiering
    keywords.ts           # Bilingual keywords (175+ triggers)
    feature-flags.ts      # Feature toggles
  types/
    index.ts              # Main types
    memory.ts             # Memory system types
    prompts.ts            # Prompt types
    file.ts               # File attachment types
    ai-providers.ts       # Provider interfaces
    agent.ts              # Agent types
    prompt-analysis.ts    # Analysis types
  __tests__/              # Jest unit tests (374 tests)
e2e/                      # Playwright E2E tests (142 tests, 14 files)

Key Features Explained

🧠 Memory System

The AI automatically learns from your conversations:

Hybrid Triggering: Keywords ("remember that") or automatic after 5+ messages
Tiered Retention: CORE (permanent), IMPORTANT (90 days), CONTEXT (30 days)
Smart Cleanup: Removes low-value facts to stay within a 500-token budget
User Control: View and delete facts at /profile
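The retention and budget rules above can be sketched as two passes: drop facts whose tier has expired, then keep the highest-ranked remaining facts until the 500-token budget is used up. The MemoryFact shape, the importance score, and the 4-characters-per-token estimate are assumptions for illustration, not the project's actual types or the logic in src/lib/memory/cleanup.ts.

```typescript
// Hypothetical fact shape for illustration; the real types live in src/types/memory.ts.
type Tier = "CORE" | "IMPORTANT" | "CONTEXT";

interface MemoryFact {
  content: string;
  tier: Tier;
  importance: number; // assumed 0..1 score used to rank facts of the same tier
  createdAt: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Retention windows from the README; CORE facts never expire.
const RETENTION_MS: Record<Tier, number> = {
  CORE: Infinity,
  IMPORTANT: 90 * DAY_MS,
  CONTEXT: 30 * DAY_MS,
};

const TIER_RANK: Record<Tier, number> = { CORE: 0, IMPORTANT: 1, CONTEXT: 2 };

// Assumption: roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function cleanupMemory(facts: MemoryFact[], now: number, budget = 500): MemoryFact[] {
  // 1. Drop facts that have outlived their tier's retention window.
  const live = facts.filter((f) => now - f.createdAt < RETENTION_MS[f.tier]);

  // 2. Rank CORE first, then by importance, and keep facts while they fit the budget.
  const ranked = [...live].sort(
    (a, b) => TIER_RANK[a.tier] - TIER_RANK[b.tier] || b.importance - a.importance,
  );

  const kept: MemoryFact[] = [];
  let used = 0;
  for (const fact of ranked) {
    const cost = estimateTokens(fact.content);
    if (used + cost > budget) continue; // would overflow: skip it, but still try smaller facts
    kept.push(fact);
    used += cost;
  }
  return kept;
}
```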

🎨 Image Generation

Generate images directly in chat:

English: "create an image of a sunset"
Chinese: "生成一幅图片，描绘星空" ("generate an image depicting a starry sky")
Native Gemini 2.5 Flash Image model
Inline display in conversation

📎 File Attachments

Upload and analyze files:

Images: PNG, JPG, GIF, WebP
Documents: PDF
AI can analyze and discuss file contents
Multimodal processing with Gemini
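Before a file reaches Gemini, an upload handler typically rejects unsupported types. A minimal sketch of that check for the formats listed above, keyed on MIME type (JPG files report image/jpeg); the classifyAttachment name is hypothetical.

```typescript
// MIME types for the supported formats. JPG/JPEG files report "image/jpeg".
const SUPPORTED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
  "application/pdf",
]);

// Hypothetical helper: returns the attachment kind, or null if the type is unsupported.
function classifyAttachment(mimeType: string): "image" | "document" | null {
  const normalized = mimeType.trim().toLowerCase();
  if (!SUPPORTED_MIME_TYPES.has(normalized)) return null;
  return normalized === "application/pdf" ? "document" : "image";
}
```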

🌏 Bilingual Support

Full Chinese and English support:

138 memory trigger keywords (both languages)
37 image generation keywords (both languages)
Language preference auto-detection
Hybrid mode for mixed conversations
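Keyword triggers have to work for both scripts: English phrases are compared case-insensitively, and because Chinese has no spaces between words, a plain substring match is the simplest approach. A small sketch under that assumption; the sample keywords and the detectTriggers name are hypothetical stand-ins for the real lists in src/config/keywords.ts.

```typescript
// Hypothetical sample of trigger keywords; the real lists are much longer.
const MEMORY_KEYWORDS = ["remember that", "don't forget", "记住", "别忘了"];
const IMAGE_KEYWORDS = ["create an image", "draw", "生成一幅图片", "画一张"];

interface TriggerResult {
  memory: boolean;
  image: boolean;
}

// Case-insensitive substring match. This suits Chinese, which has no word spaces,
// but can over-match English (e.g. "draw" inside "withdraw").
function matchesAny(message: string, keywords: string[]): boolean {
  const text = message.toLowerCase();
  return keywords.some((kw) => text.includes(kw.toLowerCase()));
}

function detectTriggers(message: string): TriggerResult {
  return {
    memory: matchesAny(message, MEMORY_KEYWORDS),
    image: matchesAny(message, IMAGE_KEYWORDS),
  };
}
```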

🤖 Agentic Architecture

ReAct (Reason-Act-Observe) pattern for autonomous AI:

Iterative loop: Up to 5 iterations per request
Available tools: web_search, web_fetch, memory_save, memory_retrieve, get_current_time
sourceCategory: Target reliable sources (Wikipedia, StackOverflow, Reuters, etc.)
Agent autonomously decides when to use tools vs respond directly
Enable with NEXT_PUBLIC_USE_AGENTIC_MODE=true
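The loop can be sketched as follows: on each iteration the model either requests a tool call or returns a final answer, and each tool result is fed back as an observation until the 5-iteration cap is reached. The ModelStep shape, the CallModel signature, and the stub tools in the usage check are assumptions; the real agent core in src/lib/agent/core/ is not reproduced here.

```typescript
// Hypothetical step format: the model either calls a tool or answers.
type ModelStep =
  | { kind: "tool"; thought: string; tool: string; input: string }
  | { kind: "answer"; text: string };

type Tool = (input: string) => Promise<string>;
type CallModel = (transcript: string[]) => Promise<ModelStep>;

const MAX_ITERATIONS = 5;

async function runReAct(
  question: string,
  callModel: CallModel,
  tools: Record<string, Tool>,
): Promise<string> {
  const transcript: string[] = [`Question: ${question}`];

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // Reason: ask the model what to do next, given everything observed so far.
    const step = await callModel(transcript);
    if (step.kind === "answer") return step.text;

    // Act: run the requested tool; an unknown tool becomes an error observation.
    transcript.push(`Thought: ${step.thought}`, `Action: ${step.tool}(${step.input})`);
    const tool = tools[step.tool];
    const observation = tool ? await tool(step.input) : `Unknown tool: ${step.tool}`;

    // Observe: feed the result back for the next iteration.
    transcript.push(`Observation: ${observation}`);
  }
  return "Stopped after reaching the iteration limit without a final answer.";
}
```

Because the model decides on every turn whether to call a tool, a simple question can be answered on the first iteration with no tool use at all.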

🔍 Web Search & Content Fetching

Real-time web search with resilient content fetching:

Search Provider: Google Custom Search API (20/hour, 100/day per user)
Content Fetching: Multi-tier fallback chain for 90-95% success rate
- Cache: In-memory LRU (500 entries, 1h TTL) for instant responses
- Direct: Cheerio + 8 diverse User-Agents
- Jina.ai Reader: JavaScript rendering + bot bypass for blocked sites
- Archive.org: Final fallback for historical/blocked content
Smart Handling: Automatic fallback for 401/403 errors (Reuters, Bloomberg, WSJ)
Source Tracking: Metadata shows which method succeeded (direct/jina.ai/archive.org)
Enable with NEXT_PUBLIC_USE_WEB_SEARCH=true
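The fetch chain can be sketched as an ordered list of strategies tried in turn, with successful results stored in a small LRU cache whose entries expire after an hour. The strategy labels mirror the README, but the Fetcher signature, the FetchResult shape, and the TtlLruCache class are assumptions rather than the code in src/lib/web-search/content-fetcher.ts. Real fetchers would throw on a 401/403 so the next strategy gets a turn.

```typescript
type FetchSource = "cache" | "direct" | "jina.ai" | "archive.org";

interface FetchResult {
  content: string;
  source: FetchSource; // which method succeeded, as in the README's source tracking
}

// A fetcher resolves to page text, or throws so the next strategy is tried.
type Fetcher = (url: string) => Promise<string>;

// Minimal LRU cache with TTL. A Map iterates in insertion order, so re-inserting a key
// on every hit keeps the least recently used entry first.
class TtlLruCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries = 500, ttlMs = 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now = Date.now()): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= now) return undefined; // expired: leave it deleted
    this.entries.set(key, entry); // refresh recency
    return entry.value;
  }

  set(key: string, value: string, now = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest); // evict least recently used
    }
  }
}

async function fetchWithFallback(
  url: string,
  cache: TtlLruCache,
  chain: [Exclude<FetchSource, "cache">, Fetcher][],
): Promise<FetchResult> {
  const cached = cache.get(url);
  if (cached !== undefined) return { content: cached, source: "cache" };

  const errors: string[] = [];
  for (const [source, fetcher] of chain) {
    try {
      const content = await fetcher(url);
      cache.set(url, content);
      return { content, source };
    } catch (err) {
      errors.push(`${source}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All fetchers failed for ${url} (${errors.join("; ")})`);
}
```

In use, the chain would list the strategies in the README's order: direct, then Jina.ai, then Archive.org.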

📊 Progress Tracking

Real-time visual feedback during AI responses:

Steps: Analyzing → Searching → Retrieving Memory → Building Context → Generating
Single updating badge shows current progress
Server-Sent Events protocol for streaming updates
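In the Server-Sent Events format, each update is a small text frame: an optional event: line, a data: line, and a blank line that ends the frame. A sketch of encoding the steps listed above, assuming hypothetical step identifiers and a JSON payload (the real types are in src/lib/progress/types.ts).

```typescript
// Hypothetical step identifiers matching the sequence in the README.
const PROGRESS_STEPS = [
  "analyzing",
  "searching",
  "retrieving_memory",
  "building_context",
  "generating",
] as const;

type ProgressStep = (typeof PROGRESS_STEPS)[number];

// Encode one update as an SSE frame. JSON.stringify escapes newlines inside strings,
// so the payload always fits on a single "data:" line; the blank line ends the frame.
function encodeProgressEvent(step: ProgressStep, message?: string): string {
  const payload = JSON.stringify({ step, message, index: PROGRESS_STEPS.indexOf(step) });
  return `event: progress\ndata: ${payload}\n\n`;
}
```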

🖼️ Image Upload & Storage

Upload images directly in chat:

Paste: Ctrl+V to paste images from clipboard
Drag & drop: Drop images directly into chat input
Storage: Cloudflare R2 for cost-effective storage
CDN: Fast global delivery via Cloudflare CDN

📑 Paper Reader

Analyze academic papers from arXiv:

URL validation: arXiv paper URL detection
Multi-phase analysis: Overview, methodology, results, critique
Save to Whim: Export analysis as editable document
Progress tracking: Real-time analysis status
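URL validation for the Paper Reader comes down to extracting an arXiv identifier. The sketch below accepts abs/ and pdf/ links on arxiv.org with new-style IDs (four digits, a dot, then four or five digits, optionally followed by a version such as v2). The parseArxivUrl name is hypothetical, and old-style IDs such as hep-th/9901001 are deliberately not handled.

```typescript
interface ArxivRef {
  id: string; // e.g. "2401.12345"
  version?: string; // e.g. "v2"
}

// New-style arXiv IDs: YYMM, a dot, 4 or 5 digits, optional version, optional ".pdf".
const ARXIV_PATH = /^\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?\/?$/;

// Hypothetical validator: returns the parsed ID, or null for anything that is not an arXiv paper URL.
function parseArxivUrl(input: string): ArxivRef | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a URL at all
  }
  if (url.hostname !== "arxiv.org" && url.hostname !== "www.arxiv.org") return null;
  const match = ARXIV_PATH.exec(url.pathname);
  if (!match) return null;
  return match[2] ? { id: match[1], version: match[2] } : { id: match[1] };
}
```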

📂 Repo Reader

Analyze GitHub repositories:

Deterministic exploration: Import-tracing instead of AI guessing
4-phase analysis: Recon → Entry points → Module exploration → Synthesis
Architecture docs: Generate comprehensive architecture documentation
Token-budgeted: Smart file selection within limits
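Import tracing with a token budget can be sketched as a breadth-first walk from the entry points: each selected file's relative imports are queued, and files are added until the budget runs out. The in-memory Repo map, the regex-based import extraction (relative import ... from statements only), and the 4-characters-per-token estimate are simplifying assumptions, not the Repo Reader's actual implementation.

```typescript
// Hypothetical in-memory repository: path -> source text.
type Repo = Map<string, string>;

// Rough extraction of relative specifiers from `import ... from "./x"` statements.
function extractRelativeImports(source: string): string[] {
  const re = /import\s[^'"]*?from\s+['"](\.{1,2}\/[^'"]+)['"]/g;
  const specs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) specs.push(m[1]);
  return specs;
}

// Resolve "./x" or "../x" against the importing file, trying common TypeScript extensions.
function resolveImport(fromPath: string, spec: string, repo: Repo): string | null {
  const parts = fromPath.split("/").slice(0, -1);
  for (const seg of spec.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== ".") parts.push(seg);
  }
  const base = parts.join("/");
  for (const candidate of [base, `${base}.ts`, `${base}.tsx`, `${base}/index.ts`]) {
    if (repo.has(candidate)) return candidate;
  }
  return null;
}

function selectFiles(repo: Repo, entryPoints: string[], tokenBudget: number): string[] {
  const selected: string[] = [];
  const seen = new Set<string>(entryPoints);
  const queue = [...entryPoints];
  let used = 0;

  while (queue.length > 0) {
    const path = queue.shift()!;
    const source = repo.get(path);
    if (source === undefined) continue;
    const cost = Math.ceil(source.length / 4); // assumption: ~4 characters per token
    if (used + cost > tokenBudget) continue; // too big for the remaining budget: skip it and its imports
    selected.push(path);
    used += cost;

    for (const spec of extractRelativeImports(source)) {
      const target = resolveImport(path, spec, repo);
      if (target && !seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  return selected;
}
```

Following imports rather than asking a model which files matter is what makes the selection deterministic: the same repository and entry points always yield the same file list.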

🔬 Deep Research

Multi-turn research sessions:

Gemini Interactions API: Grounded search with real-time results
Iterative queries: Up to 10 search iterations
Source aggregation: Automatic citation collection
Save to Whim: Export research as document
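Collecting citations across up to 10 search iterations mostly amounts to de-duplicating sources as they arrive. A sketch assuming a hypothetical Citation shape with url and title fields; the normalization rules (lowercased host via URL parsing, fragment removed, trailing slash dropped) are illustrative choices, not the behavior of the Gemini Interactions API.

```typescript
// Hypothetical citation shape for illustration.
interface Citation {
  url: string;
  title: string;
}

const MAX_SEARCH_ITERATIONS = 10;

// Normalize URLs so trivially different links count as one source.
function normalizeUrl(raw: string): string {
  try {
    const url = new URL(raw);
    url.hash = "";
    const text = url.toString();
    return text.endsWith("/") ? text.slice(0, -1) : text;
  } catch {
    return raw.trim(); // leave unparseable strings as-is
  }
}

// Merge per-iteration results in first-seen order, ignoring batches past the iteration cap.
function aggregateCitations(iterations: Citation[][]): Citation[] {
  const byUrl = new Map<string, Citation>();
  for (const batch of iterations.slice(0, MAX_SEARCH_ITERATIONS)) {
    for (const citation of batch) {
      const key = normalizeUrl(citation.url);
      if (!byUrl.has(key)) byUrl.set(key, { ...citation, url: key });
    }
  }
  return Array.from(byUrl.values());
}
```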

Admin Features

As an admin, you can:

Manage Whitelist: Add/remove emails that can access the app
View User Stats: See all users, message counts, and last active times
Configure Prompts: Edit system prompts and temperature settings
Access Admin Panel: Click "Admin Panel" in the sidebar
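Access control reduces to one rule at sign-in: the configured ADMIN_EMAIL is always allowed and treated as admin, and everyone else must be on the whitelist. A minimal sketch of that rule with case-insensitive email comparison; the checkAccess name and AccessDecision shape are hypothetical. In a NextAuth app, a check like this would typically run inside the signIn callback.

```typescript
// Hypothetical decision shape for illustration.
interface AccessDecision {
  allowed: boolean;
  isAdmin: boolean;
}

const normalizeEmail = (email: string): string => email.trim().toLowerCase();

// The admin is always allowed; other users need a whitelist entry.
function checkAccess(
  email: string | null | undefined,
  whitelist: string[],
  adminEmail: string,
): AccessDecision {
  if (!email) return { allowed: false, isAdmin: false };
  const normalized = normalizeEmail(email);
  const isAdmin = normalized === normalizeEmail(adminEmail);
  const allowed = isAdmin || whitelist.some((entry) => normalizeEmail(entry) === normalized);
  return { allowed, isAdmin };
}
```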

Cost Estimation

For family use (5-10 users, ~1000 messages/month):

Firestore: FREE (within free tier)
Cloud Run: $5-10/month (scales to zero when idle)
Gemini API: $2-5/month (tiered models for optimization)
- Chat (2.5 Flash): ~$1.70
- Memory extraction (2.5 Flash-Lite): ~$0.50
- Image generation (occasional): ~$0.50
- Web content extraction: ~$0.35/month (down from $0.50 thanks to caching!)
Total: ~$7-15/month ✅ Well under the $30/month budget!
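As a quick sanity check, the ranges above can be added up directly. The figures are copied from the line items (Firestore counted as $0 within the free tier); the script only does the arithmetic.

```typescript
// Monthly cost ranges in USD, copied from the line items above.
const lineItems: Array<{ name: string; low: number; high: number }> = [
  { name: "Firestore", low: 0, high: 0 }, // within the free tier
  { name: "Cloud Run", low: 5, high: 10 }, // scales to zero when idle
  { name: "Gemini API", low: 2, high: 5 },
];

const total = lineItems.reduce(
  (sum, item) => ({ low: sum.low + item.low, high: sum.high + item.high }),
  { low: 0, high: 0 },
);

// Gemini sub-items: chat, memory extraction, image generation, web content extraction.
// Work in cents to avoid floating-point noise.
const geminiCents = [170, 50, 50, 35].reduce((a, b) => a + b, 0);

console.log(`Total: $${total.low}-${total.high}/month`); // → Total: $7-15/month
console.log(`Gemini sub-items: $${(geminiCents / 100).toFixed(2)}/month`); // → Gemini sub-items: $3.05/month
```

The four Gemini sub-items come to $3.05/month, which falls inside the $2-5 Gemini range.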

Cost optimizations:

WebFetch caching: Saves ~$0.15/month (30-40% cache hit rate)
Jina.ai Reader: FREE tier (an API key raises the rate limits)
Archive.org: FREE (unlimited historical content)

Cost per feature:

Base chat: ~$6-12/month
Memory system: +$0.50-1/month
Image generation: +$0.50-2/month
Web search & fetch: +$0.35/month (optimized with caching)
File attachments: included (no extra cost)

Testing

WhimCraft has comprehensive test coverage with Jest (unit) and Playwright (E2E).

Unit Tests (Jest + TypeScript)

# Run all tests
npx jest

# Run with coverage
npx jest --coverage

# Run specific suite
npx jest src/__tests__/lib/memory/cleanup.test.ts

# Watch mode
npx jest --watch

Current Status: 374 tests passing (100% pass rate)

Memory system (42 tests): cleanup, extraction, loading, storage
Agent system (58 tests): core, tools, context manager
Web search & fetch (27 tests): search, rate limiting, content fetching, fallback chain, cache
Context orchestration (8 tests)
Prompt analysis (31 tests)
Whim system (124 tests): editor, converter, storage, validation
Paper/Repo readers (40+ tests): parsing, analysis, validation
Deep Research (40+ tests): API integration, flow control

E2E Tests (Playwright)

# Run all E2E tests (headless, ~2 minutes)
npm run test:e2e:fast
# or
npx playwright test

# Interactive UI mode
npx playwright test --ui

# Run with visible browser
npx playwright test --headed

# Debug mode with inspector
npx playwright test --debug

Current Status: 142 tests in 14 files (100% pass rate)

01-ui-and-ux.e2e.ts - UI/UX fundamentals
02-authenticated-chat.e2e.ts - Chat flows
03-visual-and-accessibility.e2e.ts - Accessibility
04-core-features.e2e.ts - Core functionality
05-whim-editor.e2e.ts - Whim editor
06-pro-mode.e2e.ts - PRO mode
07-paper-reader.e2e.ts - Paper Reader
08-pdf-tools.e2e.ts - PDF tools
09-image-upload.e2e.ts - Image upload
10-welcome-navigator.e2e.ts - Welcome page
11-repo-reader.e2e.ts - Repo Reader
12-deep-research.e2e.ts - Deep Research
web-fetch-resilience.spec.ts - Web fetch fallback chain
financial-web-fetch.spec.ts - Financial website handling

See docs/TESTING.md for detailed testing guide.

Deployment

For complete deployment instructions to Google Cloud Run, see docs/DEPLOYMENT.md.
