A web application for realistic multi-agent conversations powered by AI. Create up to 3 AI agents with distinct personalities and voices, then engage in natural conversations where agents listen to each other and respond dynamically.
- ✅ Multi-Agent Conversations: Up to 3 AI agents + 1 human user
- ✅ Natural Turn-Taking: Agents compete naturally for speaking turns
- ✅ Voice I/O: Real-time speech-to-text and text-to-speech
- ✅ Custom Agents: Define personalities, roles, voices, and avatars
- ✅ Interruptions: Interrupt at any time for natural conversation flow
- ✅ Conversation Analysis: AI-powered feedback on your performance
- ✅ Transcript Saving: Full conversation history with audio
- Interview Preparation: Practice with AI interviewers
- D&D Adventures: Play with AI game masters and NPCs
- Debate Practice: Engage multiple AI perspectives
- Role-Playing: Develop characters through conversation
Backend:
- Node.js + Express + TypeScript
- Socket.io (WebSocket real-time communication)
- SQLite (database)
- AWS Bedrock (Claude Sonnet 4.5 & Haiku)
- OpenAI Whisper (speech-to-text)
- ElevenLabs (text-to-speech)
Frontend:
- React 18 + TypeScript
- Tailwind CSS
- Socket.io-client
- Web Audio API
- Node.js 20+
- npm or yarn
- AWS Account with Bedrock access (Claude models)
- ElevenLabs API Key (already configured)
- OpenAI API Key (for Whisper STT)
```bash
# Install backend dependencies
cd backend
npm install

# Install frontend dependencies (once created)
cd ../frontend
npm install
```

The backend .env file is already set up with your ElevenLabs key. You need to add:
```bash
# Edit backend/.env
# Add your OpenAI API key
OPENAI_API_KEY=your_openai_key_here
# AWS keys are already configured
```

```bash
# Initialize the database
cd backend
npm run db:init
```

```bash
# Terminal 1: Start backend
cd backend
npm run dev

# Terminal 2: Start frontend (once created)
cd frontend
npm run dev
```

Navigate to http://localhost:5173
```
conversAItion/
├── backend/                 # Node.js backend server
│   ├── src/
│   │   ├── server.ts        # Express + Socket.io entry point
│   │   ├── orchestrator.ts  # Conversation orchestration
│   │   ├── services/        # AI services (Claude, Whisper, ElevenLabs)
│   │   ├── models/          # Database models
│   │   └── routes/          # REST API routes
│   ├── database/            # SQLite database
│   └── uploads/             # User-uploaded avatars
├── frontend/                # React frontend (to be created)
│   └── src/
│       ├── components/      # React components
│       ├── hooks/           # Custom hooks
│       └── services/        # API client
├── shared/                  # Shared TypeScript types
└── project-plan.md          # Detailed implementation plan
```
- `GET /api/agents` - List all agents
- `POST /api/agents` - Create agent
- `PUT /api/agents/:id` - Update agent
- `DELETE /api/agents/:id` - Delete agent
- `POST /api/agents/upload-avatar` - Upload avatar image
- `GET /api/conversations` - List conversations
- `GET /api/conversations/:id` - Get conversation details
- `POST /api/conversations` - Create conversation
- `GET /api/conversations/:id/analysis` - Get AI analysis
- `GET /api/voices` - Get ElevenLabs voices
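A minimal typed client for the agent endpoints could look like the sketch below. The base URL (backend on port 3000) and the `AgentInput` fields are assumptions; check `backend/src/routes` and `backend/src/models` for the actual schema.

```typescript
// Sketch of a REST client for the agents API (field names are assumed).
const BASE_URL = "http://localhost:3000"; // assumed backend port

interface AgentInput {
  name: string;
  role: string;
  persona: string;
  voiceId: string; // ElevenLabs voice ID
}

// Pure URL builder shared by all agent calls
function agentEndpoint(base: string, id?: string): string {
  return id ? `${base}/api/agents/${id}` : `${base}/api/agents`;
}

async function createAgent(agent: AgentInput): Promise<unknown> {
  const res = await fetch(agentEndpoint(BASE_URL), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(agent),
  });
  if (!res.ok) throw new Error(`POST /api/agents failed: ${res.status}`);
  return res.json();
}
```

`fetch` is global in Node.js 20+ (the stated prerequisite), so no HTTP library is needed.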
Client → Server:
- `conversation:start` - Start new conversation
- `user:speak` - Send audio
- `user:interrupt` - Interrupt agent
- `conversation:end` - End conversation
Server → Client:
- `conversation:started` - Conversation ready
- `status:update` - Status change (listening/thinking/speaking)
- `agent:speaking` - Agent about to speak
- `agent:audio` - Agent audio data
- `transcript:update` - New message
- `error` - Error occurred
- Navigate to Agent Creator
- Fill in agent details:
- Name: e.g., "Dr. Sarah Chen"
- Role: e.g., "Senior Technical Interviewer"
- Persona: Detailed personality description
- Select voice from ElevenLabs library
- Choose or upload avatar
- Save agent
- Select 1-3 agents
- Enter conversation topic
- Click "Start Conversation"
- Allow microphone access
- Speak: Just talk naturally
- Listen: Agents will respond in turn
- Interrupt: Click interrupt button or press hotkey
- View Transcript: See conversation in real-time
- Click "End Conversation"
- View AI-generated analysis:
- Conversation summary
- Your strengths
- Areas for improvement
- Key moments
- Detailed feedback
- Export transcript if desired
Approximate costs per hour of conversation:
- Claude API: ~$13-15 (agent intelligence + scoring)
- Whisper STT: ~$0.36 (speech transcription)
- ElevenLabs TTS: ~$18 (voice synthesis)
Total: ~$30-35/hour
Optimization strategies in project-plan.md can reduce to ~$20-25/hour.
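The quoted total can be sanity-checked from the per-service figures, taking the midpoint of each range:

```typescript
// Back-of-the-envelope check of the hourly total quoted above.
const hourlyCosts: Record<string, number> = {
  claude: 14.0,    // midpoint of ~$13-15
  whisper: 0.36,
  elevenlabs: 18.0,
};

function totalPerHour(costs: Record<string, number>): number {
  return Object.values(costs).reduce((sum, c) => sum + c, 0);
}
// totalPerHour(hourlyCosts) → 32.36, within the quoted ~$30-35/hour
```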
```bash
cd backend
npm run type-check
npm run lint
```

```bash
# Backend
cd backend
npm run build
npm start

# Frontend
cd frontend
npm run build
```

- Check that all environment variables are set
- Ensure database is initialized: `npm run db:init`
- Verify Node.js version (20+)
- Verify ElevenLabs API key is valid
- Check voice IDs are correct
- Ensure agent has valid voiceId
- Add OpenAI API key to .env
- Check microphone permissions
- Verify audio format is supported
- Check network connection
- Consider using Claude Haiku for some agents
- Optimize prompt lengths
- Enable audio caching
- User speaks → Audio captured
- STT → Whisper transcribes to text
- Agent generation → All agents generate responses (parallel)
- Scoring → Claude scores each response
- Winner selection → Highest scored response chosen
- TTS → ElevenLabs synthesizes audio
- Playback → Audio played to user
- Repeat → Loop continues
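One turn of this loop can be sketched as below. The service interfaces (`transcribe`, `generateAll`, `score`, `synthesize`, `play`) are hypothetical stand-ins for the real modules in `backend/src/services`, not their actual signatures.

```typescript
// Sketch of one turn of the conversation pipeline (service names assumed).
interface AgentResponse {
  agentId: string;
  text: string;
  score: number;
}

interface TurnServices {
  transcribe: (audio: ArrayBuffer) => Promise<string>;  // Whisper STT
  generateAll: (userText: string) => Promise<{ agentId: string; text: string }[]>; // Claude, parallel
  score: (text: string) => Promise<number>;             // Claude scoring
  synthesize: (text: string) => Promise<ArrayBuffer>;   // ElevenLabs TTS
  play: (audio: ArrayBuffer) => Promise<void>;
}

async function runTurn(audio: ArrayBuffer, svc: TurnServices): Promise<AgentResponse> {
  const userText = await svc.transcribe(audio);       // STT
  const drafts = await svc.generateAll(userText);     // parallel generation
  const scored: AgentResponse[] = await Promise.all(  // score each draft
    drafts.map(async (d) => ({ ...d, score: await svc.score(d.text) }))
  );
  const winner = scored.reduce((a, b) => (b.score > a.score ? b : a)); // winner selection
  await svc.play(await svc.synthesize(winner.text));  // TTS + playback
  return winner;                                      // then the loop repeats
}
```

Injecting the services as an argument keeps the turn logic testable with mocks, independent of the live APIs.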
All agents generate responses simultaneously. Each response is scored on:
- Relevance (0-4): Advances conversation
- Consistency (0-3): Matches persona
- Engagement (0-3): Interesting/valuable
Highest scoring agent speaks next.
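The rubric above can be expressed as a pure function. The dimension names and ranges come from this README; the clamping and tie-handling details are a sketch, not the orchestrator's actual implementation.

```typescript
// The turn-taking rubric as pure, testable functions (sketch).
interface TurnScore {
  relevance: number;   // 0-4: advances the conversation
  consistency: number; // 0-3: matches persona
  engagement: number;  // 0-3: interesting/valuable
}

function totalScore(s: TurnScore): number {
  const clamp = (v: number, max: number) => Math.min(Math.max(v, 0), max);
  return clamp(s.relevance, 4) + clamp(s.consistency, 3) + clamp(s.engagement, 3);
}

// Highest total wins the next speaking turn (first seen wins ties here).
function pickSpeaker(scores: Map<string, TurnScore>): string {
  let best = "";
  let bestTotal = -1;
  for (const [agentId, s] of scores) {
    const t = totalScore(s);
    if (t > bestTotal) {
      best = agentId;
      bestTotal = t;
    }
  }
  return best;
}
```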
See project-plan.md for complete roadmap, including:
- Voice cloning for custom voices
- Long-term agent memory
- Multi-modal with video avatars
- Mobile applications
- Team workspaces
- API access
A comprehensive architectural code review (2025-11-14) identified critical issues that must be addressed before production deployment:
- Zero test coverage (no automated tests)
- Security vulnerabilities (exposed API keys, weak auth, no input validation)
- No database backups or disaster recovery
- Missing production logging and error monitoring
- Single-server architecture (cannot scale)
- Memory leaks in conversation state
- No CI/CD pipeline
- Strengths: Innovative architecture, excellent performance optimizations
- Weaknesses: Security, testing, scalability, observability gaps
- Minimum viable: 4-5 weeks
- Production ready: 8-10 weeks
- Enterprise grade: 12-14 weeks
For complete details, see critical-fixes.md
This is currently a development prototype demonstrating:
- ✅ Innovative multi-agent AI architecture
- ✅ Real-time voice conversation capabilities
- ✅ Natural turn-taking algorithms
- ✅ Clean, maintainable codebase
Not yet suitable for:
- ❌ Production deployment
- ❌ Handling real user data
- ❌ Scaling beyond demo usage
- ❌ Enterprise security requirements
This is currently an MVP. Contributions welcome after initial release.
Priority areas for contribution:
- Test suite implementation (Jest/Vitest)
- Security hardening (JWT auth, input validation)
- Production logging and monitoring
- Docker containerization and CI/CD
- Database migration system
MIT
- Documentation: See project-plan.md and CLAUDE.md for technical docs
- Code Review: See critical-fixes.md for production readiness assessment
- Issues: GitHub Issues
- Questions: Contact project maintainer
Status: Development MVP ✅ | Production Ready 🚧 (8-10 weeks)
Built with ❤️ using Claude Sonnet 4.5