
archeryue/WhimCraft


WhimCraft - AI Agent

A bilingual (English/Chinese) AI agent with advanced memory, personalization, and agentic capabilities, powered by Google Gemini 2.5 Flash.

Features

  • 🤖 Agentic Architecture: ReAct (Reason-Act-Observe) pattern for autonomous AI behavior
  • 🧠 Intelligent Memory System: Automatic extraction with tiered retention (CORE/IMPORTANT/CONTEXT)
  • 🔍 Web Search & Fetch: Real-time search with smart fallback chain (90-95% success rate)
  • 🌐 Resilient Content Fetching: Cache → Direct → Jina.ai → Archive.org fallback for blocked sites
  • 📊 Progress Tracking: Real-time visual feedback during AI response generation
  • 🎨 Native Image Generation: Built-in Gemini 2.5 Flash Image generation
  • 🚀 PRO Mode: Access to advanced Gemini 2.0 Flash Pro and Thinking models
  • 🌏 Bilingual Support: Full English and Chinese support (175+ keywords)
  • 📎 File Attachments: Upload and analyze images, PDFs with multimodal AI
  • 🖼️ Image Upload & Storage: Paste/drag-drop images with Cloudflare R2 storage
  • 📑 Paper Reader: Analyze arXiv papers with structured AI summaries
  • 📂 Repo Reader: Analyze GitHub repositories and generate architecture docs
  • 🔬 Deep Research: Multi-turn research sessions with Gemini Interactions API
  • 💬 Streaming Responses: Real-time AI chat with syntax highlighting and LaTeX support
  • 🔐 Google OAuth: Secure authentication with whitelist control
  • 📝 Conversation Management: Auto-generated titles, full history
  • 📄 Whim Editor: Notion-like WYSIWYG editor with LaTeX, code, tables, images
  • 👨‍💼 Admin Panel: User management, whitelist, prompt configuration
  • 🎯 Smart Personalization: AI remembers your preferences and context
  • ⚙️ Dynamic Prompts: Admin-configurable system prompts
  • 🎨 Clean UI: Modern interface with Tailwind CSS

Tech Stack

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • Database: Firestore (serverless)
  • Authentication: NextAuth.js (Google OAuth)
  • AI: Google Gemini API (2.5 Flash, Image, Lite)
  • Styling: Tailwind CSS + shadcn/ui
  • Unit Testing: Jest + TypeScript (374 tests, 100% pass rate)
  • E2E Testing: Playwright (142 tests in 14 files, 100% pass rate)
  • Deployment: Cloud Run (GCP)

Local Development Setup

Prerequisites

  • Node.js 20+
  • npm or yarn
  • Google Cloud Platform account
  • Firebase project

Step 1: Install Dependencies

npm install

Step 2: Setup Environment Variables

Create a .env.local file in the root directory:

# Next.js
NEXTAUTH_URL=http://localhost:8080
NEXTAUTH_SECRET=your-secret-key-here

# Google OAuth (Get from GCP Console)
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret

# Gemini API (Get from https://ai.google.dev/)
GEMINI_API_KEY=your-gemini-api-key

# Firebase/Firestore
FIREBASE_PROJECT_ID=your-firebase-project-id
FIREBASE_PRIVATE_KEY="your-firebase-private-key"
FIREBASE_CLIENT_EMAIL=your-firebase-client-email

# Google Custom Search (for web search feature)
GOOGLE_SEARCH_API_KEY=your-google-search-api-key
GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id

# Jina.ai Reader API (optional, for higher rate limits on web content fetching)
JINA_API_KEY=your-jina-api-key

# Admin Email
ADMIN_EMAIL=archeryue7@gmail.com

# Feature Flags (optional, all default to false)
NEXT_PUBLIC_USE_INTELLIGENT_ANALYSIS=true
NEXT_PUBLIC_USE_WEB_SEARCH=true
NEXT_PUBLIC_USE_AGENTIC_MODE=true
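
The three feature flags above are plain strings in the environment. As a minimal sketch of how they might be parsed, assuming that only the literal string "true" enables a flag (the helper name readFeatureFlags and the returned shape are hypothetical and not taken from src/config/feature-flags.ts):

```typescript
// Hypothetical helper: the real src/config/feature-flags.ts may differ.
type FeatureFlags = {
  intelligentAnalysis: boolean;
  webSearch: boolean;
  agenticMode: boolean;
};

// Assumption: only the exact string "true" turns a flag on. Anything else,
// including an unset variable, leaves it off ("all default to false").
function readFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  const on = (name: string): boolean => env[name] === "true";
  return {
    intelligentAnalysis: on("NEXT_PUBLIC_USE_INTELLIGENT_ANALYSIS"),
    webSearch: on("NEXT_PUBLIC_USE_WEB_SEARCH"),
    agenticMode: on("NEXT_PUBLIC_USE_AGENTIC_MODE"),
  };
}
```

Passing the environment in as an argument keeps the function easy to test without touching process.env.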

Step 3: Get API Keys and Credentials

1. Gemini API Key

  1. Go to https://ai.google.dev/
  2. Click "Get API Key"
  3. Create a new API key
  4. Copy the key to GEMINI_API_KEY in .env.local

2. Firebase Project

  1. Go to Firebase Console
  2. Create a new project (or use existing)
  3. Enable Firestore Database (Native mode)
  4. Go to Project Settings → Service Accounts
  5. Click "Generate new private key"
  6. Download the JSON file
  7. Copy values to .env.local:
    • project_id → FIREBASE_PROJECT_ID
    • private_key → FIREBASE_PRIVATE_KEY
    • client_email → FIREBASE_CLIENT_EMAIL
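
A common stumbling block with this mapping is the private key, whose newlines often arrive as literal "\n" sequences when stored in an env file. The sketch below shows one way to read the three values and restore the newlines. The helper name readFirebaseEnv is hypothetical and not taken from src/lib/firebase-admin.ts. The returned field names (projectId, clientEmail, privateKey) follow the camelCase form that firebase-admin's service-account credential accepts.

```typescript
// camelCase counterparts of project_id, client_email, and private_key.
type ServiceAccountFields = {
  projectId: string;
  clientEmail: string;
  privateKey: string;
};

// Hypothetical helper, shown for illustration only.
function readFirebaseEnv(env: Record<string, string | undefined>): ServiceAccountFields {
  const projectId = env["FIREBASE_PROJECT_ID"];
  const clientEmail = env["FIREBASE_CLIENT_EMAIL"];
  const rawKey = env["FIREBASE_PRIVATE_KEY"];
  if (!projectId || !clientEmail || !rawKey) {
    throw new Error(
      "Missing FIREBASE_PROJECT_ID, FIREBASE_CLIENT_EMAIL, or FIREBASE_PRIVATE_KEY",
    );
  }
  // If the key reached the process with literal backslash-n pairs,
  // turn them back into real newlines.
  const privateKey = rawKey.replace(/\\n/g, "\n");
  return { projectId, clientEmail, privateKey };
}
```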

3. Google OAuth Credentials

  1. Go to GCP Console
  2. Navigate to APIs & Services → Credentials
  3. Click "Create Credentials" → "OAuth 2.0 Client ID"
  4. Application type: Web application
  5. Add authorized redirect URI: http://localhost:8080/api/auth/callback/google
  6. Copy Client ID and Client Secret to .env.local

Step 4: Run Development Server

npm run dev

Open http://localhost:8080 in your browser.

Step 5: First Login

  1. Click "Get Started" → "Sign in with Google"
  2. Sign in with the Google account you set as ADMIN_EMAIL (archeryue7@gmail.com in the example above)
  3. That account is automatically whitelisted as admin
  4. Start chatting!

Project Structure

src/
  app/
    api/                  # API routes
      auth/               # NextAuth endpoints
      chat/               # Chat streaming endpoint
      conversations/      # Conversation management
      memory/             # Memory system API
      admin/              # Admin endpoints (whitelist, users, prompts, cleanup)
    chat/                 # Main chat interface
    admin/                # Admin panel
    profile/              # User memory profile page
    whim/                 # Whim management page
    paper-reader/         # arXiv paper analysis page
    repo-reader/          # GitHub repo analysis page
    deep-research/        # Deep research page
    login/                # Login page
    layout.tsx            # Root layout with providers
  components/
    chat/                 # Chat components (input, message, sidebar, topbar, progress)
    admin/                # Admin components (whitelist, stats, prompts)
    ui/                   # UI components (shadcn/ui)
    providers/            # Context providers
  lib/
    firebase-admin.ts     # Firestore setup (lazy initialization)
    auth.ts               # NextAuth config
    prompts.ts            # Dynamic prompt management
    providers/            # AI provider abstraction
      provider-factory.ts
      gemini.provider.ts
    agent/                # Agentic architecture (ReAct pattern)
      core/               # Agent core, context manager, prompts
      tools/              # Tool implementations (web_search, memory, etc.)
    prompt-analysis/      # AI-powered intent analysis
      analyzer.ts         # PromptAnalyzer using Gemini Flash Lite
    context-engineering/  # Context orchestration
      orchestrator.ts     # Coordinates web search, memory, model selection
    web-search/           # Web search integration
      google-search.ts    # Google Custom Search API client
      rate-limiter.ts     # Per-user rate limiting
      content-fetcher.ts  # Fetch and extract web content
    progress/             # Progress tracking system
      emitter.ts          # Server-side event emitter
      types.ts            # Progress step types
    memory/               # Memory system
      storage.ts          # CRUD operations
      extractor.ts        # AI-powered extraction
      loader.ts           # Memory loading for chat
      cleanup.ts          # Automatic cleanup
    keywords/             # Keyword trigger system (legacy)
      system.ts
      triggers.ts
    paper-reader/         # arXiv paper analysis
    repo-reader/          # GitHub repository analysis
    deep-research/        # Multi-turn research with Gemini
  config/
    models.ts             # Gemini model tiering
    keywords.ts           # Bilingual keywords (175+ triggers)
    feature-flags.ts      # Feature toggles
  types/
    index.ts              # Main types
    memory.ts             # Memory system types
    prompts.ts            # Prompt types
    file.ts               # File attachment types
    ai-providers.ts       # Provider interfaces
    agent.ts              # Agent types
    prompt-analysis.ts    # Analysis types
  __tests__/              # Jest unit tests (374 tests)
e2e/                      # Playwright E2E tests (142 tests, 14 files)

Key Features Explained

🧠 Memory System

The AI automatically learns from your conversations:

  • Hybrid Triggering: Keywords ("remember that") or automatic after 5+ messages
  • Tiered Retention: CORE (permanent), IMPORTANT (90 days), CONTEXT (30 days)
  • Smart Cleanup: Removes low-value facts to stay under a 500-token budget
  • User Control: View and delete facts at /profile

🎨 Image Generation

Generate images directly in chat:

  • English: "create an image of a sunset"
  • Chinese: "生成一幅图片,描绘星空" ("generate a picture depicting a starry sky")
  • Native Gemini 2.5 Flash Image model
  • Inline display in conversation

📎 File Attachments

Upload and analyze files:

  • Images: PNG, JPG, GIF, WebP
  • Documents: PDF
  • AI can analyze and discuss file contents
  • Multimodal processing with Gemini

🌏 Bilingual Support

Full Chinese and English support:

  • 138 memory trigger keywords (both languages)
  • 37 image generation keywords (both languages)
  • Language preference auto-detection
  • Hybrid mode for mixed conversations
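
Keyword triggering in two languages can be sketched as a simple substring check. The keyword lists below are illustrative stand-ins only ("记住", meaning "remember", is an assumed example). The real lists of 175+ entries live in src/config/keywords.ts, and the function name detectTrigger is hypothetical.

```typescript
// Illustrative keywords only, not the project's actual lists.
const MEMORY_TRIGGERS = ["remember that", "记住"];
const IMAGE_TRIGGERS = ["create an image", "生成一幅图片"];

type Trigger = "memory" | "image" | null;

// Lowercasing handles English case variations and leaves Chinese text unchanged.
function detectTrigger(message: string): Trigger {
  const text = message.toLowerCase();
  if (IMAGE_TRIGGERS.some((keyword) => text.includes(keyword))) return "image";
  if (MEMORY_TRIGGERS.some((keyword) => text.includes(keyword))) return "memory";
  return null;
}
```

Because both languages share one list per trigger type, mixed-language messages need no separate code path.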

🤖 Agentic Architecture

ReAct (Reason-Act-Observe) pattern for autonomous AI:

  • Iterative loop: Up to 5 iterations per request
  • Available tools: web_search, web_fetch, memory_save, memory_retrieve, get_current_time
  • sourceCategory: Target reliable sources (Wikipedia, StackOverflow, Reuters, etc.)
  • Agent autonomously decides when to use tools vs respond directly
  • Enable with NEXT_PUBLIC_USE_AGENTIC_MODE=true
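
The loop described above can be sketched as follows. All names here (Step, Decide, runAgent) are hypothetical, and the decide callback stands in for the model call. The real agent lives in src/lib/agent/ and may be structured differently, including what it does when the iteration budget runs out.

```typescript
// What the model wants to do next: call a tool, or give the final answer.
type Step =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "answer"; text: string };

type Tool = (input: string) => Promise<string>;
// Stand-in for the LLM call: given the transcript so far, pick the next step.
type Decide = (transcript: string[]) => Promise<Step>;

const MAX_ITERATIONS = 5; // "Up to 5 iterations per request"

async function runAgent(
  question: string,
  decide: Decide,
  tools: Record<string, Tool | undefined>,
): Promise<string> {
  const transcript = [`Question: ${question}`];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const step = await decide(transcript); // Reason
    if (step.kind === "answer") return step.text;
    const tool = tools[step.tool];
    const observation = tool
      ? await tool(step.input) // Act
      : `Unknown tool: ${step.tool}`;
    transcript.push(`Action: ${step.tool}(${step.input})`);
    transcript.push(`Observation: ${observation}`); // Observe
  }
  // Assumption: when the budget is exhausted, return a fallback message.
  return "Stopped after reaching the iteration limit.";
}
```

Reporting an unknown tool back as an observation, rather than throwing, gives the model a chance to recover on the next iteration.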

🔍 Web Search & Content Fetching

Real-time web search with resilient content fetching:

  • Search Provider: Google Custom Search API (rate-limited to 20 searches/hour and 100/day per user)
  • Content Fetching: Multi-tier fallback chain for 90-95% success rate
    • Cache: In-memory LRU (500 entries, 1h TTL) for instant responses
    • Direct: Cheerio + 8 diverse User-Agents
    • Jina.ai Reader: JavaScript rendering + bot bypass for blocked sites
    • Archive.org: Final fallback for historical/blocked content
  • Smart Handling: Automatic fallback when a site returns 401/403 (e.g., Reuters, Bloomberg, WSJ)
  • Source Tracking: Metadata shows which method succeeded (direct/jina.ai/archive.org)
  • Enable with NEXT_PUBLIC_USE_WEB_SEARCH=true
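
The fallback chain can be sketched as an ordered list of fetchers tried one after another. The types and the function name fetchWithFallback are hypothetical, and each tier is passed in as a plain function, so nothing here reflects the actual code in src/lib/web-search/content-fetcher.ts.

```typescript
type Source = "cache" | "direct" | "jina.ai" | "archive.org";

// One tier of the chain: returns the content, or null if it has nothing.
type Fetcher = {
  source: Source;
  fetch: (url: string) => Promise<string | null>;
};

type FetchResult = { content: string; source: Source };

// Try each tier in order. A null result or a thrown error (for example a
// 401/403 from the site) moves on to the next tier.
async function fetchWithFallback(
  url: string,
  chain: Fetcher[],
): Promise<FetchResult | null> {
  for (const tier of chain) {
    try {
      const content = await tier.fetch(url);
      if (content !== null) return { content, source: tier.source };
    } catch {
      // Fall through to the next tier.
    }
  }
  return null; // every tier failed
}
```

Returning the source alongside the content is what lets the caller report which method succeeded.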

📊 Progress Tracking

Real-time visual feedback during AI responses:

  • Steps: Analyzing → Searching → Retrieving Memory → Building Context → Generating
  • Single updating badge shows current progress
  • Server-Sent Events protocol for streaming updates
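
On the wire, a Server-Sent Events message is a few text lines ending with a blank line. The sketch below formats one progress update. The step names come from the list above, while the event name "progress" and the JSON payload shape are assumptions rather than the format used by src/lib/progress/emitter.ts.

```typescript
type ProgressStep =
  | "Analyzing"
  | "Searching"
  | "Retrieving Memory"
  | "Building Context"
  | "Generating";

// Serialize one update as an SSE message: an "event:" line, a "data:" line,
// and a blank line that terminates the message.
function formatProgressEvent(step: ProgressStep): string {
  return `event: progress\ndata: ${JSON.stringify({ step })}\n\n`;
}
```

In a browser, an EventSource listener registered for the "progress" event would receive the JSON string in event.data and could update the badge from it.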

🖼️ Image Upload & Storage

Upload images directly in chat:

  • Paste: Ctrl+V to paste images from clipboard
  • Drag & drop: Drop images directly into chat input
  • Storage: Cloudflare R2 for cost-effective storage
  • CDN: Fast global delivery via Cloudflare CDN

📑 Paper Reader

Analyze academic papers from arXiv:

  • URL validation: arXiv paper URL detection
  • Multi-phase analysis: Overview, methodology, results, critique
  • Save to Whim: Export analysis as editable document
  • Progress tracking: Real-time analysis status
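
URL validation for arXiv links can be sketched with one regular expression. This is an assumption about what "detection" involves, not the project's code. It accepts new-style IDs such as 1706.03762 under /abs/ or /pdf/ and does not handle old-style IDs such as hep-th/9901001. The function name parseArxivId is hypothetical.

```typescript
// New-style arXiv ID (YYMM.NNNNN), optional version suffix, optional ".pdf".
const ARXIV_URL =
  /^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?$/;

// Returns the paper ID without its version suffix, or null if the URL does
// not look like an arXiv paper link.
function parseArxivId(url: string): string | null {
  const match = ARXIV_URL.exec(url.trim());
  if (!match) return null;
  return match[1] ?? null;
}
```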

📂 Repo Reader

Analyze GitHub repositories:

  • Deterministic exploration: Import-tracing instead of AI guessing
  • 4-phase analysis: Recon → Entry points → Module exploration → Synthesis
  • Architecture docs: Generate comprehensive architecture documentation
  • Token-budgeted: Smart file selection within limits
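
Import tracing can be sketched as a breadth-first walk over import statements, starting from an entry file. Everything here is illustrative: the function names are hypothetical, a regular expression stands in for a real parser, path resolution is delegated to a caller-supplied resolve function, and a simple file-count cap (maxFiles) stands in for the token budget.

```typescript
// Captures the specifier in side-effect imports (`import "x"`) and in
// `import ... from "x"` / `export ... from "x"` statements.
const IMPORT_RE =
  /\bimport\s+['"]([^'"]+)['"]|\b(?:import|export)\b[^'";]*?\bfrom\s*['"]([^'"]+)['"]/g;

function extractImports(source: string): string[] {
  const re = new RegExp(IMPORT_RE.source, "g");
  const specs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const spec = m[1] ?? m[2];
    if (spec) specs.push(spec);
  }
  return specs;
}

// `files` maps path -> source text. `resolve` turns (importer, specifier)
// into a path in `files`, or null for anything outside the repo (npm packages).
function traceImports(
  entry: string,
  files: Map<string, string>,
  resolve: (from: string, spec: string) => string | null,
  maxFiles: number,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([entry]);
  const queue: string[] = [entry];
  while (queue.length > 0 && visited.length < maxFiles) {
    const path = queue.shift() as string;
    const source = files.get(path);
    if (source === undefined) continue;
    visited.push(path);
    for (const spec of extractImports(source)) {
      const target = resolve(path, spec);
      if (target !== null && !seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  return visited;
}
```

Because the walk follows only what the code actually imports, the visit order is the same on every run for the same inputs.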

🔬 Deep Research

Multi-turn research sessions:

  • Gemini Interactions API: Grounded search with real-time results
  • Iterative queries: Up to 10 search iterations
  • Source aggregation: Automatic citation collection
  • Save to Whim: Export research as document

Admin Features

As an admin, you can:

  1. Manage Whitelist: Add/remove emails that can access the app
  2. View User Stats: See all users, message counts, and last active times
  3. Configure Prompts: Edit system prompts and temperature settings
  4. Access Admin Panel: Click "Admin Panel" in the sidebar

Cost Estimation

For family use (5-10 users, ~1000 messages/month):

  • Firestore: FREE (within free tier)
  • Cloud Run: $5-10/month (scales to zero when idle)
  • Gemini API: $2-5/month (tiered models for optimization)
    • Chat (2.5 Flash): ~$1.70
    • Memory extraction (2.5 Flash-Lite): ~$0.50
    • Image generation (occasional): ~$0.50
    • Web content extraction: ~$0.35/month (down from $0.50 thanks to caching!)
  • Total: $7-15/month (sum of the line items above) ✅ Well under $30 budget!

Cost optimizations:

  • WebFetch caching: Saves ~$0.15/month (30-40% cache hit rate)
  • Jina.ai Reader: FREE (with API key, unlimited use)
  • Archive.org: FREE (unlimited historical content)

Cost per feature:

  • Base chat: ~$6-12/month
  • Memory system: +$0.50-1/month
  • Image generation: +$0.50-2/month
  • Web search & fetch: +$0.35/month (optimized with caching)
  • File attachments: included (no extra cost)

Testing

WhimCraft has comprehensive test coverage with Jest (unit) and Playwright (E2E).

Unit Tests (Jest + TypeScript)

# Run all tests
npx jest

# Run with coverage
npx jest --coverage

# Run specific suite
npx jest src/__tests__/lib/memory/cleanup.test.ts

# Watch mode
npx jest --watch

Current Status: 374 tests passing (100% pass rate)

  • Memory system (42 tests): cleanup, extraction, loading, storage
  • Agent system (58 tests): core, tools, context manager
  • Web search & fetch (27 tests): search, rate limiting, content fetching, fallback chain, cache
  • Context orchestration (8 tests)
  • Prompt analysis (31 tests)
  • Whim system (124 tests): editor, converter, storage, validation
  • Paper/Repo readers (40+ tests): parsing, analysis, validation
  • Deep Research (40+ tests): API integration, flow control

E2E Tests (Playwright)

# Run all E2E tests (headless, ~2 minutes)
npm run test:e2e:fast
# or
npx playwright test

# Interactive UI mode
npx playwright test --ui

# Run with visible browser
npx playwright test --headed

# Debug mode with inspector
npx playwright test --debug

Current Status: 142 tests in 14 files (100% pass rate)

  • 01-ui-and-ux.e2e.ts - UI/UX fundamentals
  • 02-authenticated-chat.e2e.ts - Chat flows
  • 03-visual-and-accessibility.e2e.ts - Accessibility
  • 04-core-features.e2e.ts - Core functionality
  • 05-whim-editor.e2e.ts - Whim editor
  • 06-pro-mode.e2e.ts - PRO mode
  • 07-paper-reader.e2e.ts - Paper Reader
  • 08-pdf-tools.e2e.ts - PDF tools
  • 09-image-upload.e2e.ts - Image upload
  • 10-welcome-navigator.e2e.ts - Welcome page
  • 11-repo-reader.e2e.ts - Repo Reader
  • 12-deep-research.e2e.ts - Deep Research
  • web-fetch-resilience.spec.ts - Web fetch fallback chain
  • financial-web-fetch.spec.ts - Financial website handling

See docs/TESTING.md for detailed testing guide.

Deployment

For complete deployment instructions to Google Cloud Run, see docs/DEPLOYMENT.md.

About

A private AI agent for family and friends.
