🏗️ Architecture Documentation

Comprehensive technical documentation for the AI Interview Simulator.

System Overview
Architecture Diagram
Technology Stack
Core Components
Data Flow
Database Schema
Application Modes
Key Features Deep Dive
Project Structure
WebSocket Message Reference

System Overview

The AI Interview Simulator is a real-time, voice-based interview practice platform that uses Google's Gemini Live API for bidirectional audio streaming. Users practice job interviews by speaking with an AI interviewer that answers in real time.

Key Characteristics

Real-time audio streaming - No record-then-send, pure live conversation
Server-side orchestration - Spring Boot manages sessions and security
Multi-language support - English and Bulgarian with localized prompts
CV-aware interviews - AI uses uploaded resumes for personalized questions
Automatic grading - Post-interview AI analysis and scoring

Architecture Diagram

┌─────────────────┐      STOMP/WebSocket      ┌──────────────────┐     OkHttp WebSocket     ┌─────────────────┐
│                 │                           │                  │                          │                 │
│     Browser     │ ◄───────────────────────► │   Spring Boot    │ ◄──────────────────────► │ Gemini Live API │
│   (16kHz PCM)   │                           │   Orchestrator   │                          │   (24kHz PCM)   │
│                 │                           │                  │                          │                 │
└─────────────────┘                           └──────────────────┘                          └─────────────────┘
        │                                             │
        │ REST API                                    │
        ├────────────────────────────────────────────►│
        │ • CV upload                                 │
        │ • API key validation                        │
        │ • Voice preview                             │
        │ • Mode check                                │◄────► PostgreSQL
        │                                             │       (Sessions & Feedback)

Why Three-Tier Architecture?

Benefit	Description
Security	Server-owned API keys (DEV and REVIEWER modes) stay on the server and are never exposed to the browser
Persistence	Transcripts and feedback stored permanently
Business Logic	Grading, prompts, session management centralized
Multi-user	Concurrent sessions with proper isolation
i18n	Server-side internationalization in English/Bulgarian

Technology Stack

Backend

Technology	Version	Purpose
Spring Boot	4.0.0	Application framework
Java	21	Runtime (LTS)
Spring WebSocket	-	STOMP broker for browser communication
OkHttp	4.12.0	WebSocket client for Gemini API
Spring Data JPA	-	Database abstraction
Flyway	-	Database migrations
Lombok	-	Boilerplate reduction
Apache PDFBox	3.0.4	PDF text extraction
Apache POI	5.3.0	DOCX text extraction

Frontend

Technology	Purpose
Vanilla JavaScript	~1,400 lines, no frontend framework
Web Audio API	Microphone capture & audio playback
STOMP.js	WebSocket messaging protocol
SockJS	WebSocket fallback for older browsers
Tailwind CSS	Utility-first styling (CDN)
Thymeleaf	Server-side HTML rendering

AI Models

Model	Purpose
Gemini 2.5 Flash	Real-time audio conversations (live-model)
Gemini 3 Flash	Post-interview grading (grading-model)

Infrastructure

Component	Purpose
PostgreSQL 16	Primary database
Docker	Containerization
Docker Compose	Multi-container orchestration

Core Components

Backend Services

src/main/java/net/k2ai/interviewSimulator/
├── service/
│   ├── GeminiIntegrationService.java   # Session lifecycle, message routing
│   ├── GeminiLiveClient.java           # Low-level WebSocket to Gemini API
│   ├── GeminiModelRotationService.java # Model/key rotation with rate limit tracking
│   ├── InterviewService.java           # Database CRUD for sessions
│   ├── GradingService.java             # AI-powered post-interview evaluation
│   ├── InterviewPromptService.java     # Language/difficulty-aware prompts
│   ├── CvProcessingService.java        # PDF/DOCX text extraction
│   ├── InputSanitizerService.java      # Input validation & sanitization
│   ├── AdminService.java               # Admin dashboard stats, password management
│   └── RateLimitService.java           # API key validation rate limiting

Controllers

Controller	Path	Purpose
`SetupController`	`/setup/*`	Multi-step interview setup wizard
`InterviewWebSocketController`	`/app/interview/*`	STOMP message handlers
`ReportController`	`/report/{id}`	Server-rendered interview reports
`AdminController`	`/admin/*`	Admin dashboard, login, password change
`LegalController`	`/legal/*`	Mode-aware Privacy Policy & Terms pages
`ApiKeyController`	`/api/mode`, `/api/validate-key`	App mode check and API key validation
`CvController`	`/api/cv/upload`	CV file upload
`VoiceController`	`/api/voices/*`	Voice list and preview audio
`ErrorController`	`/error/*`	Mobile-not-supported and error pages

Configuration Classes

Config	Purpose
`GeminiConfig`	API keys, model names, app mode
`WebSocketConfig`	STOMP broker configuration
`WebSocketEventListener`	Session connect/disconnect handling
`I18nConfig`	Locale resolver with cookie persistence
`SecurityConfig`	Spring Security configuration

Data Flow

Audio Streaming Flow

1. User speaks → Browser captures 16kHz PCM via Web Audio API
2. Browser → Server → STOMP message to /app/interview/audio (Base64)
3. Server → Gemini → OkHttp WebSocket sends Base64 PCM
4. Gemini → Server → Returns 24kHz audio + transcriptions
5. Server → Browser → STOMP to /user/queue/audio
6. Browser plays → Web Audio API at 24kHz (gapless crossfade scheduling)
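The browser code in `audio-processor.js` is not reproduced here. The framing arithmetic behind the steps above can still be sketched in Java.

This is a minimal illustration. It assumes mono, signed 16-bit little-endian PCM, which is the raw format the Gemini Live API documents for audio. `PcmFraming` and its methods are hypothetical names, not project classes. The sketch does three things:

- converts float samples in [-1, 1], as produced by the Web Audio API, to PCM16;
- Base64-encodes the result the way it travels in the STOMP message;
- computes how long a chunk plays at 16 kHz versus 24 kHz.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical helper illustrating PCM framing for the audio flow. */
final class PcmFraming {
    static final int INPUT_RATE_HZ = 16_000;  // browser microphone -> Gemini
    static final int OUTPUT_RATE_HZ = 24_000; // Gemini -> browser playback
    static final int BYTES_PER_SAMPLE = 2;    // signed 16-bit, mono

    private PcmFraming() {}

    /** Converts float samples in [-1, 1] to signed 16-bit little-endian PCM. */
    static byte[] toPcm16(float[] samples) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * BYTES_PER_SAMPLE)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float s : samples) {
            float clamped = Math.max(-1f, Math.min(1f, s)); // guard against out-of-range input
            // Scale so that -1 maps to -32768 and +1 maps to 32767.
            buf.putShort((short) (clamped < 0 ? clamped * 32768f : clamped * 32767f));
        }
        return buf.array();
    }

    /** Base64 text, as carried in the /app/interview/audio STOMP message. */
    static String toBase64(byte[] pcm) {
        return Base64.getEncoder().encodeToString(pcm);
    }

    /** Playback duration in milliseconds of a mono PCM16 buffer. */
    static double durationMs(int byteLength, int sampleRateHz) {
        return byteLength * 1000.0 / (BYTES_PER_SAMPLE * sampleRateHz);
    }
}
```

For example, a 640-byte chunk lasts 20 ms at the 16 kHz input rate, while 20 ms of 24 kHz output needs 960 bytes. Playback therefore has to schedule buffers at 24 kHz rather than reuse the capture rate.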

CV Processing Flow

1. User uploads → PDF/DOCX file (max 10MB)
2. Browser → Server → POST /api/cv/upload with multipart form
3. Server extracts → Text via PDFBox (PDF) or Apache POI (DOCX)
4. Text included → In system instruction for personalized interview
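Before any text extraction, the upload can be checked cheaply for size and type. The sketch below is an assumption about how such a check might look; it is not the code of `CvProcessingService`.

- It reads "10MB" as 10 MiB, which is how Spring's `DataSize` parses `10MB`.
- It checks magic bytes as well as the extension: a PDF starts with `%PDF-`, and a DOCX file is a ZIP container starting with `PK\3\4`.
- It leaves out the actual extraction, done with PDFBox 3 (`Loader.loadPDF` plus `PDFTextStripper`) or Apache POI (`XWPFDocument` plus `XWPFWordExtractor`), so the example runs without those dependencies.

`CvUploadCheck` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical pre-check before handing the bytes to PDFBox or Apache POI. */
final class CvUploadCheck {
    static final long MAX_BYTES = 10L * 1024 * 1024; // "max 10MB", read as 10 MiB

    enum CvType { PDF, DOCX }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4}; // DOCX is a ZIP container

    private CvUploadCheck() {}

    /** Returns the detected type, or empty if the upload should be rejected. */
    static Optional<CvType> detect(String filename, byte[] content) {
        if (content == null || content.length == 0 || content.length > MAX_BYTES) {
            return Optional.empty();
        }
        String name = filename == null ? "" : filename.toLowerCase(Locale.ROOT);
        // Require the extension and the content signature to agree.
        if (name.endsWith(".pdf") && startsWith(content, PDF_MAGIC)) {
            return Optional.of(CvType.PDF);
        }
        if (name.endsWith(".docx") && startsWith(content, ZIP_MAGIC)) {
            return Optional.of(CvType.DOCX);
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```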

Interview Session Flow

1. /setup/step1 → User enters name
2. /setup/step2 → User selects position, difficulty, uploads CV
3. /setup/step3 → User selects language and voice
4. /interview → WebSocket connects, interview begins
5. AI greets user → Interview conversation flows
6. AI concludes → Automatic grading triggered
7. /report/{id} → User views detailed feedback

Database Schema

Tables

-- Interview Sessions
CREATE TABLE interview_sessions (
    id UUID PRIMARY KEY,
    candidate_name VARCHAR(255) NOT NULL,
    job_position VARCHAR(255) NOT NULL,
    difficulty VARCHAR(50) NOT NULL,
    started_at TIMESTAMP NOT NULL,
    ended_at TIMESTAMP,
    transcript TEXT,
    score INTEGER,
    feedback_json TEXT
);

-- Interview Feedback
CREATE TABLE interview_feedback (
    id UUID PRIMARY KEY,
    session_id UUID NOT NULL REFERENCES interview_sessions(id),
    overall_score INTEGER NOT NULL,
    communication_score INTEGER NOT NULL,
    technical_score INTEGER NOT NULL,
    confidence_score INTEGER NOT NULL,
    strengths TEXT,           -- JSON array
    improvements TEXT,        -- JSON array
    detailed_analysis TEXT,
    verdict VARCHAR(50),      -- STRONG_HIRE, HIRE, MAYBE, NO_HIRE
    created_at TIMESTAMP NOT NULL
);

Indexes

CREATE INDEX idx_sessions_candidate ON interview_sessions(candidate_name);
CREATE INDEX idx_sessions_started_at ON interview_sessions(started_at);
CREATE INDEX idx_feedback_session ON interview_feedback(session_id);

Application Modes

Mode	API Key Source	Use Case
DEV	`GEMINI_API_KEY` env var on server	Local development
PROD	User provides via modal	Production deployment
REVIEWER	Multi-key rotation via `GEMINI_REVIEWER_KEYS`	Competition judges, demos
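The key-selection rule in the table can be expressed as a single switch on the mode. This is a hypothetical sketch. `AppMode`, `ApiKeyResolver` and the comma-separated format assumed for `GEMINI_REVIEWER_KEYS` are illustrations, and the project's `GeminiConfig` may parse and expose these values differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical mirror of the DEV / PROD / REVIEWER modes. */
enum AppMode { DEV, PROD, REVIEWER }

/** Hypothetical sketch of where the Gemini API key comes from in each mode. */
final class ApiKeyResolver {
    private ApiKeyResolver() {}

    /**
     * @param env          environment variables, e.g. System.getenv()
     * @param userApiKey   key from the /app/interview/start payload (may be null)
     * @param reviewerKeys keys from GEMINI_REVIEWER_KEYS, in rotation order
     */
    static Optional<String> resolve(AppMode mode, Map<String, String> env,
                                    String userApiKey, List<String> reviewerKeys) {
        return switch (mode) {
            case DEV -> nonBlank(env.get("GEMINI_API_KEY"));
            case PROD -> nonBlank(userApiKey);
            // Only the starting key; later keys are tried by the rotation logic on errors.
            case REVIEWER -> reviewerKeys.stream().findFirst();
        };
    }

    /** Assumes a comma-separated list; the real variable format may differ. */
    static List<String> parseReviewerKeys(String raw) {
        if (raw == null || raw.isBlank()) {
            return List.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(k -> !k.isEmpty())
                .toList();
    }

    private static Optional<String> nonBlank(String value) {
        return value == null || value.isBlank() ? Optional.empty() : Optional.of(value.trim());
    }
}
```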

PROD Mode Flow

User loads page → App checks mode via /api/mode
Modal appears → Step-by-step guide to getting a free API key
User enters key → Validated via /api/validate-key
Key stored → In browser localStorage
Key sent → With the /app/interview/start message; the server uses it only to open the Gemini Live connection and does not save it to the database

REVIEWER Mode Flow

User loads page → No API key modal shown
Server uses pre-configured API keys with automatic rotation
Grading uses model fallback chain: gemini-3-flash-preview → gemini-2.5-flash → gemini-2.5-flash-lite → gemma-3-12b-it
Each model is paired with its own API key
On rate limit or access error → automatically tries next model/key combo
Rate limit tracking is error-based (no hardcoded limits)
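The fallback behavior can be sketched as a loop over model/key pairs that marks a pair as unavailable only after Gemini rejects it. This is an illustration under stated assumptions, not the API of `GeminiModelRotationService`:

- `ModelFallback`, `Candidate` and `RetryableException` are hypothetical; `RetryableException` stands in for the project's `RateLimitException` and `ModelAccessException`.
- The cooldown period before a rejected pair is retried is an assumed parameter.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of error-driven model/key fallback. */
final class ModelFallback {
    /** One entry of the fallback chain: a model paired with its own API key. */
    record Candidate(String model, String apiKey) {}

    /** Stands in for rate-limit (HTTP 429) or model-access errors from Gemini. */
    static final class RetryableException extends RuntimeException {
        RetryableException(String message) {
            super(message);
        }
    }

    private final List<Candidate> chain;
    private final Duration cooldown;
    private final Map<Candidate, Instant> blockedUntil = new HashMap<>();

    ModelFallback(List<Candidate> chain, Duration cooldown) {
        this.chain = List.copyOf(chain);
        this.cooldown = cooldown;
    }

    /** Tries each pair in chain order, skipping pairs that were rejected recently. */
    synchronized <T> T call(Function<Candidate, T> request, Instant now) {
        RuntimeException lastError = null;
        for (Candidate candidate : chain) {
            Instant until = blockedUntil.get(candidate);
            if (until != null && now.isBefore(until)) {
                continue; // still cooling down after an earlier rejection
            }
            try {
                T result = request.apply(candidate);
                blockedUntil.remove(candidate);
                return result;
            } catch (RetryableException e) {
                // Error-based tracking: a pair is marked only after Gemini rejects it.
                blockedUntil.put(candidate, now.plus(cooldown));
                lastError = e;
            }
        }
        throw new IllegalStateException("All model/key pairs are unavailable", lastError);
    }
}
```

Passing `now` in, instead of calling `Instant.now()` inside the loop, keeps the cooldown logic deterministic to test. No quota numbers are configured, so the class reacts only to actual error responses.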

Key Features Deep Dive

Voice Selection

Voice ID	English Name	Bulgarian Name	Gender
Algieba	George	Георги	Male
Kore	Victoria	Виктория	Female
Fenrir	Max	Макс	Male
Despina	Diana	Диана	Female

Voice previews are available as WAV files in both languages (stored under `static/audio/voices/`).
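The voice table maps naturally onto an enum keyed by the Gemini voice ID. The sketch assumes a language code of `bg` or `en`. `InterviewerVoice`, `fromId` and `displayName` are hypothetical names, not the `VoiceController` API.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical enum of the interviewer voices listed in the table. */
enum InterviewerVoice {
    ALGIEBA("Algieba", "George", "Георги", Gender.MALE),
    KORE("Kore", "Victoria", "Виктория", Gender.FEMALE),
    FENRIR("Fenrir", "Max", "Макс", Gender.MALE),
    DESPINA("Despina", "Diana", "Диана", Gender.FEMALE);

    enum Gender { MALE, FEMALE }

    private final String voiceId; // Gemini prebuilt voice name
    private final String englishName;
    private final String bulgarianName;
    private final Gender gender;

    InterviewerVoice(String voiceId, String englishName, String bulgarianName, Gender gender) {
        this.voiceId = voiceId;
        this.englishName = englishName;
        this.bulgarianName = bulgarianName;
        this.gender = gender;
    }

    String voiceId() {
        return voiceId;
    }

    Gender gender() {
        return gender;
    }

    /** Localized display name; anything other than "bg" falls back to English. */
    String displayName(String language) {
        return "bg".equalsIgnoreCase(language) ? bulgarianName : englishName;
    }

    /** Looks up a voice by its Gemini ID, ignoring case. */
    static Optional<InterviewerVoice> fromId(String voiceId) {
        return Arrays.stream(values())
                .filter(v -> v.voiceId.equalsIgnoreCase(voiceId))
                .findFirst();
    }
}
```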

Difficulty Levels

Level	Behavior	CV Usage
Chill	Friendly, encouraging, hints provided	Primary focus - conversational about projects
Standard	Professional, balanced questioning	Mixed with technical questions
Stress	Challenging, time pressure, follow-ups	Background context only - deep technical focus

Position-Specific Prompts

The AI tailors its questions to the target position:

Java/Backend → OOP, Spring Boot, databases, API design
Frontend → HTML/CSS/JS, React/Vue/Angular, UX
QA Engineer → Testing methodologies, automation
DevOps → CI/CD, cloud, containerization
PM/Product → Leadership, planning, stakeholders
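One simple way to pick focus areas is to match words in the position title against the categories listed. This is a hypothetical sketch: `PositionFocus` and its keyword table are assumptions, and `InterviewPromptService` may use fixed position IDs instead of free text. It matches whole words rather than substrings, because a substring test for `pm` would wrongly match "development".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical keyword lookup from a free-text position to interview focus areas. */
final class PositionFocus {
    private static final List<String> BACKEND = List.of("OOP", "Spring Boot", "databases", "API design");
    private static final List<String> PRODUCT = List.of("leadership", "planning", "stakeholders");
    // Insertion order matters: the first matching keyword wins.
    private static final Map<String, List<String>> FOCUS = new LinkedHashMap<>();

    static {
        FOCUS.put("java", BACKEND);
        FOCUS.put("backend", BACKEND);
        FOCUS.put("frontend", List.of("HTML/CSS/JS", "React/Vue/Angular", "UX"));
        FOCUS.put("qa", List.of("testing methodologies", "automation"));
        FOCUS.put("devops", List.of("CI/CD", "cloud", "containerization"));
        FOCUS.put("pm", PRODUCT);
        FOCUS.put("product", PRODUCT);
    }

    private PositionFocus() {}

    /** Returns the focus areas, or an empty list for positions not covered above. */
    static List<String> focusFor(String position) {
        // Compare whole words so that "development" does not match "pm".
        Set<String> words = Arrays.stream(position.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
        for (Map.Entry<String, List<String>> entry : FOCUS.entrySet()) {
            if (words.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return List.of();
    }
}
```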

Session Resumption

The Gemini Live API connection supports automatic resumption:

Session handles stored and updated on each message
On disconnect, server attempts automatic reconnection
Audio buffered during reconnection, flushed on resume
GoAway signals trigger proactive reconnection
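The buffering part of resumption can be sketched as a small state holder with three jobs:

- forward audio while connected;
- queue audio while reconnecting;
- flush the queue once the session resumes.

`ReconnectingAudioSender` is hypothetical. The bounded queue that drops the oldest chunk is an assumed policy to cap memory during a long outage. The actual reconnect, which opens a new socket with the stored handle, is outside the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical sketch: buffer outbound audio while the Gemini socket reconnects. */
final class ReconnectingAudioSender {
    private final Consumer<String> transport; // e.g. wraps the current OkHttp WebSocket's send(String)
    private final int maxBufferedChunks;      // assumed cap to bound memory
    private final Deque<String> pending = new ArrayDeque<>();
    private String resumptionHandle;          // latest handle reported by Gemini
    private boolean connected = true;

    ReconnectingAudioSender(Consumer<String> transport, int maxBufferedChunks) {
        if (maxBufferedChunks < 1) {
            throw new IllegalArgumentException("maxBufferedChunks must be positive");
        }
        this.transport = transport;
        this.maxBufferedChunks = maxBufferedChunks;
    }

    /** Stores the newest handle so a reconnect can resume the same session. */
    synchronized void onResumptionHandle(String handle) {
        resumptionHandle = handle;
    }

    synchronized String handleForReconnect() {
        return resumptionHandle;
    }

    /** Called on socket failure or on a GoAway signal, before reconnecting. */
    synchronized void onDisconnect() {
        connected = false;
    }

    synchronized void send(String base64Chunk) {
        if (connected) {
            transport.accept(base64Chunk);
            return;
        }
        if (pending.size() >= maxBufferedChunks) {
            pending.pollFirst(); // drop the oldest chunk rather than grow without bound
        }
        pending.addLast(base64Chunk);
    }

    /** Called once the resumed session is ready; flushes buffered audio in order. */
    synchronized void onResumed() {
        connected = true;
        while (!pending.isEmpty()) {
            transport.accept(pending.pollFirst());
        }
    }
}
```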

Interview Auto-Conclusion

The AI naturally concludes interviews after 5-7 questions. Conclusion phrases (English and Bulgarian) are pattern-matched to trigger automatic grading:

// English patterns
"thank you for your time"
"that concludes our interview"
"we'll be in touch"

// Bulgarian patterns
"благодаря ви за отделеното време"
"ще се свържем с вас"
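Matching can be done on the AI's output transcription after light normalization. This is a minimal sketch that assumes the phrase list above is complete; `ConclusionDetector` is a hypothetical name. Transcriptions arrive in fragments, so the text passed in should be the accumulated transcript of the current AI turn rather than a single fragment.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical detector for the conclusion phrases listed above. */
final class ConclusionDetector {
    private static final List<String> PHRASES = List.of(
            // English
            "thank you for your time",
            "that concludes our interview",
            "we'll be in touch",
            // Bulgarian
            "благодаря ви за отделеното време",
            "ще се свържем с вас");

    private ConclusionDetector() {}

    /** @param turnTranscript accumulated output transcription of the current AI turn */
    static boolean isConclusion(String turnTranscript) {
        if (turnTranscript == null) {
            return false;
        }
        String text = turnTranscript.toLowerCase(Locale.ROOT)
                .replace('\u2019', '\'')  // typographic apostrophe, as in "we’ll"
                .replaceAll("\\s+", " "); // collapse line breaks and double spaces
        return PHRASES.stream().anyMatch(text::contains);
    }
}
```

Plain substring matching is cheap, but it can fire early if the interviewer says "thank you for your time" mid-interview. Requiring the match near the end of a turn would reduce such false positives.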

Privacy by Design

The application is designed to collect as little data as possible:

Aspect	Approach
No user accounts	Users are never asked to register — no emails, phone numbers, or passwords collected
No CV file storage	Uploaded CVs are parsed in-memory and immediately discarded — only extracted text is used during the session
No audio storage	Voice is streamed in real-time via WebSocket and never saved to disk or database
No transcript retention	The interview transcript exists only in-memory during the session for grading, then is discarded
Automatic cleanup	`SessionCleanupScheduler` runs every 6 hours and deletes all sessions + feedback older than 2 weeks
Mode-aware legal pages	Privacy Policy and Terms & Conditions hide API key sections when not in PROD mode

What IS Stored (temporarily)

The only data persisted to the database is the grading report (scores, strengths, improvements, verdict) and basic session metadata (candidate name, position, difficulty). This is automatically deleted after 2 weeks.
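The retention rule reduces to a cutoff computed from the current time. The sketch is hypothetical (`RetentionPolicy` is not a project class) and assumes sessions are aged by `started_at`.

In the Spring scheduler, this logic would typically sit behind `@Scheduled(fixedRate = 6, timeUnit = TimeUnit.HOURS)`. Feedback rows must be deleted before their sessions, because `interview_feedback.session_id` references `interview_sessions(id)` and the schema above declares no `ON DELETE CASCADE`.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper mirroring the cleanup rule described above. */
final class RetentionPolicy {
    static final Duration RUN_EVERY = Duration.ofHours(6);
    static final Duration RETENTION = Duration.ofDays(14);

    private RetentionPolicy() {}

    /** Rows that started before this instant are due for deletion. */
    static Instant cutoff(Instant now) {
        return now.minus(RETENTION);
    }

    static boolean isExpired(Instant startedAt, Instant now) {
        return startedAt.isBefore(cutoff(now));
    }

    /** Worst case: a row expires just after a run and waits for the next one. */
    static Duration maxLifetime() {
        return RETENTION.plus(RUN_EVERY);
    }
}
```

Since the job runs every 6 hours, a row can survive up to 14 days and 6 hours (342 hours) before it is removed.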

Mobile Device Blocking

The MobileDeviceInterceptor detects mobile User-Agent strings (Android, iPhone, iPad, etc.) and redirects them to a /error/mobile-not-supported page. This is intentional — a professional interview simulation requires a desktop environment with a proper microphone and screen.
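A User-Agent check of this kind is usually a case-insensitive keyword search. The pattern below is an assumption, not the interceptor's actual list; `MobileUserAgent` is a hypothetical name. In a Spring `HandlerInterceptor`, `preHandle` would:

- read `request.getHeader("User-Agent")`;
- on a match, call `response.sendRedirect(...)` and return `false`.

```java
import java.util.regex.Pattern;

/** Hypothetical User-Agent test; the real interceptor's keyword list may differ. */
final class MobileUserAgent {
    // "Mobile" catches most phone browsers; "Android" also catches Android tablets.
    private static final Pattern MOBILE = Pattern.compile(
            "Android|iPhone|iPad|iPod|Mobile|BlackBerry|Opera Mini|IEMobile",
            Pattern.CASE_INSENSITIVE);

    private MobileUserAgent() {}

    static boolean isMobile(String userAgent) {
        return userAgent != null && MOBILE.matcher(userAgent).find();
    }
}
```

User-Agent sniffing is best-effort. Since iPadOS 13, Safari on iPad requests desktop sites by default and sends a Mac-style User-Agent, so those devices pass this check.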

Admin Panel

Features

Login page at /admin/login with Spring Security form-based authentication
Dashboard at /admin/dashboard showing:
- Total sessions (last 2 weeks), sessions today, average score, top position
- Paginated session table with filtering by position, difficulty, language
- Session duration calculation
- Feedback/verdict per session
Password change via POST to /admin/change-password

Default Credentials

Username: admin
Password: noit2026P4$$ (BCrypt-hashed in Flyway migration V3__create_admin_and_cleanup.sql)

⚠️ Change immediately after first deployment via the admin dashboard.

Security

/admin/** routes require ROLE_ADMIN authentication
/admin/login is publicly accessible
CSRF protection is enabled for admin forms
Passwords are hashed with BCrypt (strength 12)

Project Structure

src/main/java/net/k2ai/interviewSimulator/
├── InterviewSimulatorApplication.java
├── config/
│   ├── GeminiConfig.java
│   ├── I18nConfig.java
│   ├── SecurityConfig.java
│   ├── WebSocketConfig.java
│   └── WebSocketEventListener.java
├── controller/
│   ├── AdminController.java
│   ├── ApiKeyController.java
│   ├── CvController.java
│   ├── ErrorController.java
│   ├── InterviewWebSocketController.java
│   ├── LegalController.java
│   ├── ReportController.java
│   ├── SetupController.java
│   └── VoiceController.java
├── dto/
│   └── InterviewSetupDTO.java
├── entity/
│   ├── AdminUser.java
│   ├── InterviewFeedback.java
│   └── InterviewSession.java
├── exception/
│   ├── RateLimitException.java
│   └── ModelAccessException.java
├── interceptor/
│   └── MobileDeviceInterceptor.java
├── page/
│   └── PageController.java
├── repository/
│   ├── AdminUserRepository.java
│   ├── InterviewFeedbackRepository.java
│   └── InterviewSessionRepository.java
├── scheduler/
│   └── SessionCleanupScheduler.java
├── service/
│   ├── AdminService.java
│   ├── AdminServiceImpl.java
│   ├── AdminUserDetailsService.java
│   ├── CvProcessingService.java
│   ├── GeminiIntegrationService.java
│   ├── GeminiLiveClient.java
│   ├── GeminiModelRotationService.java
│   ├── GradingService.java
│   ├── InputSanitizerService.java
│   ├── InterviewPromptService.java
│   ├── InterviewService.java
│   └── RateLimitService.java
└── validation/
    └── (custom validators)

src/main/resources/
├── application.properties
├── messages.properties         # English (default)
├── messages_bg.properties      # Bulgarian
├── messages_en.properties      # English
├── db/migration/
│   ├── V1__initial_schema.sql
│   ├── V2__add_language_column.sql
│   └── V3__create_admin_and_cleanup.sql
├── static/
│   ├── audio/voices/          # Voice preview WAV files
│   └── js/
│       ├── apikey.js          # API key modal handling (PROD only)
│       ├── audio-processor.js # WebSocket, mic, playback (interview only)
│       ├── interview.js       # Interview UI controls (interview only)
│       ├── language-switcher.js # Language dropdown (all pages)
│       └── microphone-check.js  # Mic availability check (setup only)
└── templates/
    ├── layouts/
    │   ├── main.html          # Base layout
    │   └── fragments/
    │       ├── apikey-modal.html
    │       ├── bodyBottom.html   # Conditional script loading
    │       ├── head.html
    │       ├── microphone-modal.html
    │       └── styles.html
    └── pages/
        ├── admin/
        │   ├── dashboard.html # Admin dashboard with stats & sessions
        │   └── login.html     # Admin login page
        ├── legal/
        │   ├── privacy.html   # Mode-aware Privacy Policy
        │   └── terms.html     # Mode-aware Terms & Conditions
        ├── setup/
        │   ├── step1.html     # Profile
        │   ├── step2.html     # Details + CV
        │   └── step3.html     # Voice & Language
        ├── interview-standalone.html
        ├── mobile-not-supported.html
        ├── report-standalone.html
        └── report-error.html

WebSocket Message Reference

Client → Server

Destination	Purpose	Payload
`/app/interview/start`	Start session	`{candidateName, position, difficulty, language, cvText?, voiceId?, userApiKey?}`
`/app/interview/audio`	Send audio chunk	Base64-encoded 16kHz PCM
`/app/interview/end`	End interview	(none)
`/app/interview/mic-off`	Signal mic muted	(none)

Server → Client

Destination	Purpose	Payload
`/user/queue/status`	Connection/turn status	`{type, message}`
`/user/queue/audio`	AI audio response	`{data: base64}`
`/user/queue/transcript`	Speech transcription	`{speaker, text}`
`/user/queue/report`	Final grading report	Full feedback object
`/user/queue/error`	Error messages	`{message, rateLimited?, invalidKey?}`

← Back to README | API Reference →
