A local feedback tool for German language teachers (DaF). Upload a student's audio recording and receive detailed pronunciation analysis — privacy-first: audio never leaves your machine.
- faster-whisper large-v3-turbo (local, free): Transcribes the audio file with per-segment confidence scores
- Word alignment (local): Compares transcription against the target text, highlights uncertain words
- Gemini 2.5 Flash (Google AI): Analyses only the anonymised text — no audio, no biometric data sent to Google
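The word-alignment step can be illustrated with Python's standard library. This is a minimal sketch, not the tool's actual implementation: the function names `normalise` and `align_words`, the status labels, and the choice of `difflib.SequenceMatcher` are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(word: str) -> str:
    """Lowercase and drop punctuation so 'Berlin.' matches 'berlin'."""
    return re.sub(r"\W", "", word.lower())


def align_words(target: str, transcript: str) -> list[tuple[str, str, str]]:
    """Return (target_word, recognised_word, status) triples.

    status is 'correct', 'wrong' or 'missing' (hypothetical labels).
    """
    target_words = target.split()
    heard_words = transcript.split()
    matcher = SequenceMatcher(
        None,
        [normalise(w) for w in target_words],
        [normalise(w) for w in heard_words],
        autojunk=False,
    )
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        expected = target_words[i1:i2]
        heard = heard_words[j1:j2]
        if tag == "equal":
            result += [(t, h, "correct") for t, h in zip(expected, heard)]
        elif tag == "replace":
            # Pair words position by position; leftover target words are missing.
            for k, t in enumerate(expected):
                if k < len(heard):
                    result.append((t, heard[k], "wrong"))
                else:
                    result.append((t, "", "missing"))
        elif tag == "delete":
            result += [(t, "", "missing") for t in expected]
        # 'insert' means extra words in the transcript; ignored in this sketch.
    return result
```

Calling `align_words("Ich wohne in Berlin.", "Ich wohne Berlin")` marks "in" as missing and the other three words as correct.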
- Python 3.9 or newer (tested with Python 3.13)
- A free Google AI Studio API key: https://aistudio.google.com/apikey
- Chrome, Edge, or Safari browser
Click Code → Download ZIP on this page, then unzip the folder.
Open a terminal, navigate to the project folder, and run:
cd path/to/aussprache_tool
# Create virtual environment
python3 -m venv venv
# Activate it (macOS/Linux)
source venv/bin/activate
# Activate it (Windows)
venv\Scripts\activate
You will see (venv) at the start of your terminal prompt when it is active.
pip install -r requirements.txt
This downloads faster-whisper, Flask, and the Google AI library (~1–2 GB, one-time only).
Note: On first launch, faster-whisper will automatically download the large-v3-turbo model (~1.6 GB). This happens once in the background.
- Go to https://aistudio.google.com/apikey and click Create API Key
- Open the file key.txt in the project folder
- Replace the placeholder text with your key:
AIzaSy...
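For readers curious how a key file like this might be read, here is a minimal sketch. The helper name `load_api_key` and its checks are hypothetical; the tool's own loading code may differ.

```python
from pathlib import Path


def load_api_key(path: str = "key.txt") -> str:
    """Read the API key from a text file (hypothetical helper)."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your Google AI Studio key into it")
    if any(ch.isspace() for ch in key):
        raise ValueError(f"{path} should contain only the key, on a single line")
    return key
```

Stripping whitespace guards against a stray newline or space that is easy to introduce when pasting the key.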
Important — API quota: The Gemini 2.5 Flash Free Tier currently allows only 20 requests/day, which is not enough for a full class session. To increase this to ~250 requests/day at effectively no cost, simply add a payment method in Google AI Studio (Billing → Enable). Actual charges for classroom use are typically less than €0.10/month.
Run the following commands each time you want to use the tool:
# Navigate to the project folder
cd path/to/aussprache_tool
# Activate the virtual environment
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Start the server
python3 app.py
Then open http://127.0.0.1:5000 in your browser.
Keep the terminal window open while using the tool. To stop: press Ctrl+C.
- Enter the student's name (optional — appears in the feedback document)
- Select the feedback language: German / English / Traditional Chinese
- Optionally tick known pronunciation issues for targeted feedback even when the speech recogniser misses them:
  - ei/ie confusion
  - Umlauts ä/ö/ü
  - ch-sound (ach vs. ich)
  - r-sound
  - Final consonants (-t/-d/-st)
  - Word stress
  - Number pronunciation
  - Diphthongs (au/eu/äu)
- Paste the target text (the text the student was asked to read aloud)
- Upload the audio file by drag & drop or click
- Click "Aussprache analysieren" (Analyse Pronunciation)
- Wait ~20–40 seconds: faster-whisper transcribes locally, then Gemini analyses
Transcription & Word Alignment (top section):
- 🟢 Green: correctly recognised
- 🔴 Red + wavy underline: incorrectly recognised
- ⚫ Grey + [brackets]: not recognised (missing)
- 🟡 Yellow background: low confidence score
- Hover over a word to see: target word | recognised word | confidence %
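The colour legend above can be read as a small mapping from alignment status and confidence to a display style. The sketch below is illustrative only: the function name `word_style`, the status labels, and the 0.6 confidence threshold are assumptions, not values taken from the tool.

```python
from typing import Optional

# Text colour per alignment status, following the legend above.
TEXT_COLOURS = {"correct": "green", "wrong": "red", "missing": "grey"}


def word_style(
    status: str,
    confidence: Optional[float] = None,
    low_confidence: float = 0.6,  # hypothetical threshold
) -> tuple[str, bool]:
    """Return (text_colour, yellow_background) for one word."""
    if status not in TEXT_COLOURS:
        raise ValueError(f"unknown status: {status!r}")
    # Missing words have no recognised counterpart, hence no confidence.
    highlight = confidence is not None and confidence < low_confidence
    return TEXT_COLOURS[status], highlight
```

Under these assumptions, a correctly recognised word with confidence 0.45 would be shown in green on a yellow background.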
Pronunciation Feedback (report section):
- Recognition rate with progress bar
- Target text (as continuous prose)
- Transcription & word alignment (colour-coded)
- Overall impression, strengths, problem table, targeted tips, practice exercise
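One plausible way to compute a recognition rate is the share of target words that were recognised correctly. This definition is an assumption, since the exact formula is not documented here, and `recognition_rate` is a hypothetical name.

```python
def recognition_rate(statuses: list[str]) -> float:
    """Percentage of target words with status 'correct' (assumed definition)."""
    if not statuses:
        return 0.0
    correct = sum(1 for s in statuses if s == "correct")
    return 100.0 * correct / len(statuses)
```

For example, three correct words out of four target words give a rate of 75 %.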
Click "🖨️ A4 drucken" or "📱 A5 / Mobil" to open a print-ready version of the feedback. In the print dialog, choose "Save as PDF". The A5 format uses a larger font size for comfortable reading on mobile devices.
| Data | Where it is processed |
|---|---|
| Audio file | Stays on your computer; deleted immediately after analysis |
| Transcription & alignment | Computed locally; never leaves your machine |
| Sent to Google | Only: target text + transcription + error list (plain text) |
| Biometric data | None — Google sees text only |
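To make the table concrete, the sketch below assembles a text-only request body from the three items listed as sent to Google. The function name `build_prompt` and the layout of the text are hypothetical; the point is only that the input and output are plain strings, with no audio bytes involved.

```python
def build_prompt(target_text: str, transcription: str, errors: list[str]) -> str:
    """Combine target text, transcription and error list into plain text."""
    error_lines = [f"- {e}" for e in errors] if errors else ["- (none)"]
    lines = [
        "Target text:",
        target_text,
        "",
        "Transcription:",
        transcription,
        "",
        "Errors:",
        *error_lines,
    ]
    return "\n".join(lines)
```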
| Component | Cost |
|---|---|
| faster-whisper (transcription) | Free — runs locally |
| Gemini 2.5 Flash — Free Tier | 20 requests/day (not sufficient for a full class) |
| Gemini 2.5 Flash — Tier 1 | ~250 requests/day; billing enabled but effectively free |
| Actual cost per analysis | ~€0.002–0.003 |
Recommended setup: Add a payment method in Google AI Studio to unlock Tier 1. For a class of 30 students per day, typical monthly costs are well under €0.10.
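To adapt the estimate to your own usage, multiply the per-analysis cost from the table by the number of analyses you expect to run. A minimal sketch, assuming the ~€0.002–0.003 range above holds; the function name is illustrative.

```python
def estimate_cost_eur(
    analyses: int,
    low_per_analysis: float = 0.002,   # lower bound from the cost table
    high_per_analysis: float = 0.003,  # upper bound from the cost table
) -> tuple[float, float]:
    """Return (lowest, highest) estimated cost in euros."""
    return (
        round(analyses * low_per_analysis, 4),
        round(analyses * high_per_analysis, 4),
    )
```

With these figures, `estimate_cost_eur(30)` returns a range of about €0.06 to €0.09 for 30 analyses.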
MP3, WAV, M4A, OGG, FLAC, WebM, MP4, AAC — max. 50 MB
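A check like the following could enforce these limits before transcription starts. It is a sketch under stated assumptions: `check_upload` is a hypothetical helper, and "50 MB" is interpreted here as 50 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm", ".mp4", ".aac"}
MAX_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the file is acceptable, otherwise an error message."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"Unsupported format: {extension or '(no extension)'}"
    if size_bytes > MAX_BYTES:
        return "File is larger than 50 MB"
    return None
```

The extension is lowercased first, so `Aufnahme.MP3` is accepted just like `aufnahme.mp3`.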
First launch takes longer: The Whisper model is downloaded and loaded on first use (~30–60 sec). All subsequent analyses are faster.
Shorter recordings = better recognition: Separate recordings for questions and answers improve transcription quality for A1 learners significantly.
Limits of speech recognition: Whisper may not catch all errors in heavily accented A1 German. The "known issues" checkboxes allow targeted feedback even without direct transcription evidence.
PDF in Chinese: The Noto Sans TC font is loaded from Google Fonts when printing. An internet connection is required for Chinese PDF export.
Built with faster-whisper and Google Gemini.