Real-time AI voice conversation application using OpenAI Whisper, GPT-4o-mini, and TTS, with LiveKit for WebRTC audio streaming.
- 🎤 Voice Activity Detection - Automatically detects when you start and stop speaking
- 🗣️ Real-time Transcription - Converts speech to text using OpenAI Whisper
- 🤖 AI Responses - Generates intelligent responses using GPT-4o-mini
- 🔊 Text-to-Speech - Plays AI responses with high-quality voice synthesis
- ⏱️ 2-Second Silence Detection - Automatically sends audio after you finish speaking
- 🔄 Sequential Flow - Prevents overlapping conversations for natural interaction
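The 2-second silence cutoff can be sketched as a simple RMS energy check over the trailing audio frames. This is illustrative only: the threshold, frame size, and PCM format below are assumptions, and the real app performs this detection in the browser.

```python
import struct

SILENCE_RMS = 500        # 16-bit amplitude threshold (assumed; tune per microphone)
SILENCE_SECONDS = 2.0    # matches the app's 2-second cutoff

def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one little-endian 16-bit PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def should_send(frames: list[bytes], frame_seconds: float) -> bool:
    """True once the most recent SILENCE_SECONDS worth of frames are all quiet."""
    needed = int(SILENCE_SECONDS / frame_seconds)
    tail = frames[-needed:]
    return len(tail) == needed and all(rms(f) < SILENCE_RMS for f in tail)
```

Once `should_send` returns True, recording stops and the buffered audio is sent for transcription.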
Backend:
- FastAPI
- OpenAI API (Whisper, GPT-4o-mini, TTS-1-HD)
- LiveKit
- Python 3.11+
Frontend:
- Next.js 15
- TypeScript
- LiveKit Client
- Tailwind CSS
- Python 3.11+
- Node.js 18+
- OpenAI API key
- LiveKit credentials (optional, for production)
cd backend
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
Create backend/.env:
OPENAI_API_KEY=sk-your-key-here
LIVEKIT_API_KEY=your-key
LIVEKIT_API_SECRET=your-secret
LIVEKIT_URL=ws://localhost:7880
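A small startup helper (a sketch, not part of the repo) can fail fast when one of these variables is missing, instead of erroring mid-request:

```python
import os

def require_env(name: str) -> str:
    """Read a required setting, failing at startup instead of mid-request."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example: OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```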
Run backend:
uvicorn main:app --reload --port 8000
cd frontend
npm install
Create frontend/.env.local:
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
Run frontend:
npm run dev
📖 Backend Documentation - API endpoints, configuration, dependencies
📖 Frontend Documentation - Components, hooks, architecture
📖 AI Prompts - AI tools and prompts used during development
skill.io/
├── backend/
│ ├── main.py # FastAPI app with endpoints
│ ├── ai_handler.py # OpenAI integrations
│ ├── requirements.txt # Python dependencies
│ └── README.md # Backend documentation
├── frontend/
│ ├── app/ # Next.js app directory
│ ├── components/ # React components
│ ├── hooks/ # Custom hooks
│ └── README.md # Frontend documentation
├── README.md # This file
└── AI_PROMPTS.md # AI assistance documentation
- User speaks → Frontend records audio continuously
- 2 seconds of silence → Recording stops, audio sent to backend
- Backend transcribes → OpenAI Whisper converts speech to text
- Text displayed → User sees transcription immediately
- AI generates response → GPT-4o-mini creates reply
- Response played → TTS converts text to speech and plays audio
- Ready for next question → Cycle repeats
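The sequential flow above amounts to a small turn state machine: a new recording can only begin once the previous reply has finished playing. A minimal sketch (the state names are mine, not the app's):

```python
from enum import Enum, auto

class TurnState(Enum):
    LISTENING = auto()      # recording user audio
    TRANSCRIBING = auto()   # Whisper speech-to-text
    RESPONDING = auto()     # GPT-4o-mini reply generation
    SPEAKING = auto()       # TTS playback

ORDER = [TurnState.LISTENING, TurnState.TRANSCRIBING,
         TurnState.RESPONDING, TurnState.SPEAKING]

class TurnMachine:
    """Enforces one turn at a time: recording is only allowed while LISTENING."""

    def __init__(self):
        self.state = TurnState.LISTENING

    def advance(self) -> "TurnState":
        """Move to the next stage, wrapping back to LISTENING after SPEAKING."""
        i = ORDER.index(self.state)
        self.state = ORDER[(i + 1) % len(ORDER)]
        return self.state

    @property
    def can_record(self) -> bool:
        return self.state is TurnState.LISTENING
```

Because `advance` wraps around, the cycle naturally repeats for the next question, which is what prevents overlapping conversations.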
- GET / - Health check
- POST /token - Generate LiveKit access token
- POST /transcribe - Transcribe audio to text
- POST /respond - Generate AI response with audio
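For example, a client could prepare a call to the transcription endpoint like this (a stdlib-only sketch; the real frontend uses the browser's fetch API, and the raw-bytes body and content type here are assumptions — the actual endpoint may expect multipart form data):

```python
import urllib.request

BACKEND = "http://localhost:8000"  # NEXT_PUBLIC_BACKEND_URL in the frontend config

def build_transcribe_request(audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST /transcribe request.
    The raw-bytes body and octet-stream content type are assumptions;
    check the backend documentation for the expected format."""
    return urllib.request.Request(
        f"{BACKEND}/transcribe",
        data=audio,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```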
Live Application Demo:
🔗 Watch Full Demo - Complete walkthrough of the AI voice conversation app
Additional Demo:
🔗 Extended Demo - Additional features and functionality
- Improve voice activity detection accuracy with a better or custom model
- Stream live transcribed text while the user is speaking