baverstockUX/conversAItion

ConversAItion 🎙️

A web application for realistic multi-agent conversations powered by AI. Create up to 3 AI agents with distinct personalities and voices, then engage in natural conversations where agents listen to each other and respond dynamically.

Features

✅ Multi-Agent Conversations: Up to 3 AI agents + 1 human user
✅ Natural Turn-Taking: Agents compete naturally for speaking turns
✅ Voice I/O: Real-time speech-to-text and text-to-speech
✅ Custom Agents: Define personalities, roles, voices, and avatars
✅ Interruptions: Interrupt at any time for natural conversation flow
✅ Conversation Analysis: AI-powered feedback on your performance
✅ Transcript Saving: Full conversation history with audio

Use Cases

  • Interview Preparation: Practice with AI interviewers
  • D&D Adventures: Play with AI game masters and NPCs
  • Debate Practice: Engage multiple AI perspectives
  • Role-Playing: Develop characters through conversation

Technology Stack

Backend:

  • Node.js + Express + TypeScript
  • Socket.io (WebSocket real-time communication)
  • SQLite (database)
  • AWS Bedrock (Claude Sonnet 4.5 & Haiku)
  • OpenAI Whisper (speech-to-text)
  • ElevenLabs (text-to-speech)

Frontend:

  • React 18 + TypeScript
  • Tailwind CSS
  • Socket.io-client
  • Web Audio API

Prerequisites

  • Node.js 20+
  • npm or yarn
  • AWS Account with Bedrock access (Claude models)
  • ElevenLabs API Key (already configured)
  • OpenAI API Key (for Whisper STT)

Setup Instructions

1. Clone & Install

# Install backend dependencies
cd backend
npm install

# Install frontend dependencies (once created)
cd ../frontend
npm install

2. Configure Environment Variables

The backend .env file is already set up with your ElevenLabs key. You need to add:

# Edit backend/.env

# Add your OpenAI API key
OPENAI_API_KEY=your_openai_key_here

# AWS keys are already configured

3. Initialize Database

cd backend
npm run db:init

4. Start Development Servers

# Terminal 1: Start backend
cd backend
npm run dev

# Terminal 2: Start frontend (once created)
cd frontend
npm run dev

5. Open Browser

Navigate to http://localhost:5173

Project Structure

conversAItion/
├── backend/              # Node.js backend server
│   ├── src/
│   │   ├── server.ts     # Express + Socket.io entry point
│   │   ├── orchestrator.ts # Conversation orchestration
│   │   ├── services/     # AI services (Claude, Whisper, ElevenLabs)
│   │   ├── models/       # Database models
│   │   └── routes/       # REST API routes
│   ├── database/         # SQLite database
│   └── uploads/          # User-uploaded avatars
├── frontend/             # React frontend (to be created)
│   └── src/
│       ├── components/   # React components
│       ├── hooks/        # Custom hooks
│       └── services/     # API client
├── shared/               # Shared TypeScript types
└── project-plan.md       # Detailed implementation plan

API Endpoints

REST API

  • GET /api/agents - List all agents
  • POST /api/agents - Create agent
  • PUT /api/agents/:id - Update agent
  • DELETE /api/agents/:id - Delete agent
  • POST /api/agents/upload-avatar - Upload avatar image
  • GET /api/conversations - List conversations
  • GET /api/conversations/:id - Get conversation details
  • POST /api/conversations - Create conversation
  • GET /api/conversations/:id/analysis - Get AI analysis
  • GET /api/voices - Get ElevenLabs voices
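The endpoints above can be called from any HTTP client. A minimal typed client sketch follows; the backend port and the Agent payload shape are assumptions for illustration, not the actual contract:

```typescript
// Hypothetical base URL -- adjust to wherever the backend actually listens.
const BASE_URL = "http://localhost:3000";

// Assumed shape of an agent record; the real model may differ.
interface Agent {
  id?: string;
  name: string;
  role: string;
  persona: string;
  voiceId: string;
}

// Build the URL for the agent collection or a single agent.
function agentUrl(id?: string): string {
  return id ? `${BASE_URL}/api/agents/${id}` : `${BASE_URL}/api/agents`;
}

// GET /api/agents - list all agents
async function listAgents(): Promise<Agent[]> {
  const res = await fetch(agentUrl());
  if (!res.ok) throw new Error(`GET /api/agents failed: ${res.status}`);
  return res.json();
}

// POST /api/agents - create an agent
async function createAgent(agent: Agent): Promise<Agent> {
  const res = await fetch(agentUrl(), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(agent),
  });
  if (!res.ok) throw new Error(`POST /api/agents failed: ${res.status}`);
  return res.json();
}
```

The other endpoints follow the same pattern with PUT/DELETE and the `/api/conversations` and `/api/voices` paths.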

WebSocket Events

Client → Server:

  • conversation:start - Start new conversation
  • user:speak - Send audio
  • user:interrupt - Interrupt agent
  • conversation:end - End conversation

Server → Client:

  • conversation:started - Conversation ready
  • status:update - Status change (listening/thinking/speaking)
  • agent:speaking - Agent about to speak
  • agent:audio - Agent audio data
  • transcript:update - New message
  • error - Error occurred
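A sketch of how a client might type and wire these events. Only the event names come from this README; the payload shapes are assumptions, and the tiny in-memory socket below is a stand-in for a real socket.io-client connection:

```typescript
// Assumed payload shapes for the server -> client events listed above.
type ServerToClientEvents = {
  "conversation:started": { conversationId: string };
  "status:update": { status: "listening" | "thinking" | "speaking" };
  "agent:speaking": { agentId: string };
  "agent:audio": { agentId: string; audio: ArrayBuffer };
  "transcript:update": { speaker: string; text: string };
  "error": { message: string };
};

// Minimal in-memory stand-in for a socket, enough to show handler wiring.
class MiniSocket {
  private handlers = new Map<string, (data: unknown) => void>();
  on<K extends keyof ServerToClientEvents & string>(
    event: K,
    handler: (data: ServerToClientEvents[K]) => void,
  ): void {
    this.handlers.set(event, handler as (data: unknown) => void);
  }
  emit<K extends keyof ServerToClientEvents & string>(
    event: K,
    data: ServerToClientEvents[K],
  ): void {
    this.handlers.get(event)?.(data);
  }
}

// Wire up a transcript handler the way a real client would with socket.on(...).
const socket = new MiniSocket();
const transcript: string[] = [];
socket.on("transcript:update", (msg) => {
  transcript.push(`${msg.speaker}: ${msg.text}`);
});
socket.emit("transcript:update", { speaker: "Dr. Sarah Chen", text: "Hello" });
```

With the real library, `io(url)` from socket.io-client replaces `MiniSocket` and the same `on`/`emit` calls apply.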

Usage Guide

1. Create Agents

  1. Navigate to Agent Creator
  2. Fill in agent details:
    • Name: e.g., "Dr. Sarah Chen"
    • Role: e.g., "Senior Technical Interviewer"
    • Persona: Detailed personality description
  3. Select voice from ElevenLabs library
  4. Choose or upload avatar
  5. Save agent

2. Start Conversation

  1. Select 1-3 agents
  2. Enter conversation topic
  3. Click "Start Conversation"
  4. Allow microphone access

3. Have Conversation

  • Speak: Just talk naturally
  • Listen: Agents will respond in turn
  • Interrupt: Click interrupt button or press hotkey
  • View Transcript: See conversation in real-time

4. End & Analyze

  1. Click "End Conversation"
  2. View AI-generated analysis:
    • Conversation summary
    • Your strengths
    • Areas for improvement
    • Key moments
    • Detailed feedback
  3. Export transcript if desired

Cost Estimation

Approximate costs per hour of conversation:

  • Claude API: ~$13-15 (agent intelligence + scoring)
  • Whisper STT: ~$0.36 (speech transcription)
  • ElevenLabs TTS: ~$18 (voice synthesis)

Total: ~$30-35/hour

Optimization strategies in project-plan.md can reduce this to ~$20-25/hour.
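A back-of-envelope check of the figures above. The Claude and ElevenLabs per-hour rates are taken from this README's own estimates; Whisper is priced per minute of audio ($0.006/min at the time of writing):

```typescript
// Per-hour cost estimate, using the rates from the table above.
const whisperPerMinute = 0.006;              // OpenAI Whisper rate per audio minute
const whisperPerHour = whisperPerMinute * 60; // -> 0.36, matching the ~$0.36 figure
const claudePerHour = 14;                     // midpoint of the ~$13-15 estimate
const elevenLabsPerHour = 18;                 // from the ~$18 estimate

const totalPerHour = claudePerHour + whisperPerHour + elevenLabsPerHour;
// Lands inside the quoted ~$30-35/hour range.
```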

Development

Run Checks

cd backend
npm run type-check
npm run lint

Build for Production

# Backend
cd backend
npm run build
npm start

# Frontend
cd frontend
npm run build

Troubleshooting

Backend won't start

  • Check that all environment variables are set
  • Ensure database is initialized: npm run db:init
  • Verify Node.js version (20+)

Voice not working

  • Verify ElevenLabs API key is valid
  • Check voice IDs are correct
  • Ensure agent has valid voiceId

STT not working

  • Add OpenAI API key to .env
  • Check microphone permissions
  • Verify audio format is supported

High latency

  • Check network connection
  • Consider using Claude Haiku for some agents
  • Optimize prompt lengths
  • Enable audio caching
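One way to implement the audio-caching suggestion: memoize TTS output keyed by voice and text, so repeated lines (greetings, filler phrases) skip the ElevenLabs round trip. The synthesize function here is a stand-in, not the project's actual service:

```typescript
// Memoizing TTS cache keyed on (voiceId, text). The synthesize callback
// is a placeholder for the real ElevenLabs call.
class TtsCache {
  private cache = new Map<string, Buffer>();

  constructor(private synthesize: (voiceId: string, text: string) => Buffer) {}

  get(voiceId: string, text: string): Buffer {
    const key = `${voiceId}\u0000${text}`; // NUL separator avoids key collisions
    let audio = this.cache.get(key);
    if (!audio) {
      audio = this.synthesize(voiceId, text); // cache miss: call the service
      this.cache.set(key, audio);
    }
    return audio; // cache hit: no network round trip
  }
}

// Demo: the second identical request never reaches the synthesizer.
let synthCalls = 0;
const tts = new TtsCache((_voiceId, text) => {
  synthCalls++;
  return Buffer.from(text);
});
tts.get("voice-1", "Hello there!");
tts.get("voice-1", "Hello there!");
```

A production version would also bound the cache size (e.g. LRU eviction) so long conversations don't leak memory.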

Architecture

Conversation Flow

  1. User speaks → Audio captured
  2. STT → Whisper transcribes to text
  3. Agent generation → All agents generate responses (parallel)
  4. Scoring → Claude scores each response
  5. Winner selection → Highest scored response chosen
  6. TTS → ElevenLabs synthesizes audio
  7. Playback → Audio played to user
  8. Repeat → Loop continues
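The loop above can be sketched as a single turn with stubbed services. Every service signature here is a placeholder; only the ordering of the steps reflects this README:

```typescript
type AgentDef = { name: string };
type Candidate = { agent: AgentDef; text: string; score: number };

// One turn of the conversation flow, steps 2-6 above.
async function runTurn(
  userAudio: Buffer,
  agents: AgentDef[],
  stt: (audio: Buffer) => Promise<string>,                      // Whisper stand-in
  generate: (agent: AgentDef, userText: string) => Promise<string>, // Claude stand-in
  score: (text: string) => Promise<number>,                     // scoring stand-in
  tts: (text: string) => Promise<Buffer>,                       // ElevenLabs stand-in
): Promise<{ winner: AgentDef; text: string; audio: Buffer }> {
  const userText = await stt(userAudio);                        // 2. transcribe
  const candidates: Candidate[] = await Promise.all(            // 3. generate in parallel
    agents.map(async (agent) => {
      const text = await generate(agent, userText);
      return { agent, text, score: await score(text) };         // 4. score each response
    }),
  );
  const winner = candidates.reduce((a, b) => (b.score > a.score ? b : a)); // 5. pick winner
  const audio = await tts(winner.text);                         // 6. synthesize
  return { winner: winner.agent, text: winner.text, audio };    // 7. hand off for playback
}
```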

Natural Competition Algorithm

All agents generate responses simultaneously. Each response is scored on:

  • Relevance (0-4): Advances conversation
  • Consistency (0-3): Matches persona
  • Engagement (0-3): Interesting/valuable

Highest scoring agent speaks next.
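The rubric can be written out as a concrete scorer: three sub-scores clamped to their maxima, summed to a 0-10 total, with the argmax speaking next. In the real system the sub-scores would come from a Claude scoring call; here they are plain inputs:

```typescript
interface ResponseScores {
  relevance: number;   // 0-4: advances conversation
  consistency: number; // 0-3: matches persona
  engagement: number;  // 0-3: interesting/valuable
}

// Sum the sub-scores, clamping each to its rubric maximum (total 0-10).
function totalScore(s: ResponseScores): number {
  const clamp = (v: number, max: number) => Math.max(0, Math.min(v, max));
  return clamp(s.relevance, 4) + clamp(s.consistency, 3) + clamp(s.engagement, 3);
}

// The highest-scoring candidate speaks next (first wins ties).
function pickWinner<T extends { scores: ResponseScores }>(candidates: T[]): T {
  return candidates.reduce((best, c) =>
    totalScore(c.scores) > totalScore(best.scores) ? c : best,
  );
}
```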

Future Enhancements

See project-plan.md for complete roadmap, including:

  • Voice cloning for custom voices
  • Long-term agent memory
  • Multi-modal with video avatars
  • Mobile applications
  • Team workspaces
  • API access

Production Status & Known Issues

⚠️ IMPORTANT: This application is currently NOT PRODUCTION READY

A comprehensive architectural code review (2025-11-14) identified critical issues that must be addressed before production deployment:

Critical Issues

  • Zero test coverage (no automated tests)
  • Security vulnerabilities (exposed API keys, weak auth, no input validation)
  • No database backups or disaster recovery
  • Missing production logging and error monitoring
  • Single-server architecture (cannot scale)
  • Memory leaks in conversation state
  • No CI/CD pipeline

Overall Grade: B- (70/100)

  • Strengths: Innovative architecture, excellent performance optimizations
  • Weaknesses: Security, testing, scalability, observability gaps

Timeline to Production Ready

  • Minimum viable: 4-5 weeks
  • Production ready: 8-10 weeks
  • Enterprise grade: 12-14 weeks

For complete details, see critical-fixes.md

Development vs Production

This is currently a development prototype demonstrating:

  • ✅ Innovative multi-agent AI architecture
  • ✅ Real-time voice conversation capabilities
  • ✅ Natural turn-taking algorithms
  • ✅ Clean, maintainable codebase

Not yet suitable for:

  • ❌ Production deployment
  • ❌ Handling real user data
  • ❌ Scaling beyond demo usage
  • ❌ Enterprise security requirements

Contributing

This is currently an MVP. Contributions welcome after initial release.

Priority areas for contribution:

  1. Test suite implementation (Jest/Vitest)
  2. Security hardening (JWT auth, input validation)
  3. Production logging and monitoring
  4. Docker containerization and CI/CD
  5. Database migration system

License

MIT

Support

  • Documentation: See project-plan.md and CLAUDE.md for technical docs
  • Code Review: See critical-fixes.md for production readiness assessment
  • Issues: GitHub Issues
  • Questions: Contact project maintainer

Status: Development MVP ✅ | Production Ready 🚧 (8-10 weeks)

Built with ❤️ using Claude Sonnet 4.5
