Voice-in, AI-out, voice-back. This small Python app lets you speak into your mic, sends the transcribed text to an OpenAI model for a reply, and then speaks the reply aloud.
It uses:
- speech_recognition for microphone capture and Google Web Speech for transcription (internet required)
- OpenAI Python SDK (Responses API) for the AI reply
- pyttsx3 for offline text-to-speech output on Windows (SAPI5)
- python-dotenv to load your OpenAI API key from a local .env file
- Windows 10/11
- Python 3.9–3.12 (recommended 3.10+)
- A working microphone and speakers/headphones
- Internet access (for speech recognition and OpenAI API)
- An OpenAI API key
Keep your API key private. Don’t commit it to source control.
The commands below are for Windows PowerShell.
-
Clone or open this folder in VS Code.
-
Create and activate a virtual environment:
python -m venv .venv
.\.venv\Scripts\Activate- Install dependencies:
pip install -r requirements.txtIf you run into issues installing PyAudio on Windows, see Troubleshooting below.
- Create a .env file in the project root with your key:
OPENAI_API_KEY="your_api_key_here"- Run the app:
python aiwaifu.pyThe app will:
- calibrate for ambient noise,
- record up to 15 seconds of speech,
- send your words to an OpenAI model,
- and speak the model’s reply.
You can tweak a few things in aiwaifu.py.
-
Speech language: in
speech_to_text, changelanguage="en-US"to your locale, e.g."en-GB","fr-FR","ja-JP". -
Record duration: in
recognizer.listen(...), changetimeoutandphrase_time_limit(both are 15 seconds by default). -
Voice selection (TTS): in
speak_text, the code usesvoices[2]. If you get an index error or don’t like the voice, change the index. To list installed voices:import pyttsx3 e = pyttsx3.init() for i, v in enumerate(e.getProperty('voices')): print(i, v.id)
-
PyAudio installation fails on Windows
- Try installing via pipwin:
pip install pipwin pipwin install pyaudio
- Or install a prebuilt wheel compatible with your Python version from a trusted source, then
pip install <wheel_file.whl>.
- Try installing via pipwin:
-
No default input device / microphone not found
- In Windows Sound Settings, set a default input device and ensure the mic is allowed for desktop apps.
-
speech_recognition.WaitTimeoutError- You were silent past the
timeout. Speak sooner or increasetimeout.
- You were silent past the
-
Google speech recognition
RequestError- Check your internet connection. The default recognizer relies on Google’s web API.
-
OpenAI
InvalidRequestError: model not foundorAuthenticationError- Update the
modelname inaiwaifu.pyto a valid one (e.g.,gpt-4o,gpt-4o-mini). - Ensure
.envcontains a validOPENAI_API_KEYand it’s being loaded.
- Update the
-
TTS voice issues or index errors
- List voices (see snippet above) and pick an available index.
- Ensure your output device is working and not muted.
AIWaifuBot/
├─ aiwaifu.py # Main script: STT → OpenAI → TTS
├─ .env # Your OpenAI API key (not committed)
├─ requirements.txt # Python dependencies
└─ README.md # This file
- Speech transcription uses Google’s web API via
speech_recognition, which sends audio snippets over the internet. - OpenAI API usage is billed per token. Consider using
gpt-5-minifor lower cost.
No license specified yet.
- Add a system prompt/persona so the bot keeps a specific “waifu” style.
- Use a local or different STT provider (e.g., Whisper) for better privacy.
- Add a wake word or continuous listening loop.
- Persist conversation history for multi-turn context.