Aussprache-Trainer (German Pronunciation Trainer)

A local feedback tool for German language teachers (DaF). Upload a student's audio recording and receive detailed pronunciation analysis — privacy-first: audio never leaves your machine.

How It Works

1. faster-whisper large-v3-turbo (local, free): Transcribes the audio file with per-segment confidence scores
2. Word alignment (local): Compares the transcription against the target text and highlights uncertain words
3. Gemini 2.5 Flash (Google AI): Analyses only the anonymised text; no audio or biometric data is sent to Google
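
The word-alignment step can be pictured with a small sketch. This is not the code in app.py; it is one possible approach using Python's standard difflib module, and the function names normalise and align_words are hypothetical.

```python
from difflib import SequenceMatcher
import re


def normalise(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def align_words(target_text, transcript_text):
    """Return (status, target_word, heard_word) triples.

    status is "ok", "wrong" (substituted) or "missing".
    """
    target = normalise(target_text)
    heard = normalise(transcript_text)
    result = []
    matcher = SequenceMatcher(a=target, b=heard, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result += [("ok", w, w) for w in target[i1:i2]]
        elif tag == "replace":
            heard_span = heard[j1:j2]
            for k, w in enumerate(target[i1:i2]):
                # Pair words by position; surplus target words count as missing.
                if k < len(heard_span):
                    result.append(("wrong", w, heard_span[k]))
                else:
                    result.append(("missing", w, None))
        elif tag == "delete":
            result += [("missing", w, None) for w in target[i1:i2]]
        # "insert" (extra words the student said) is ignored in this sketch.
    return result
```

For the target "Ich wohne seit drei Jahren in Wien" and the transcription "ich wohne seit zwei Jahren Wien", this marks "drei" as wrong (heard as "zwei") and "in" as missing.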

Requirements

Python 3.9 or newer (tested with Python 3.13)
A free Google AI Studio API key: https://aistudio.google.com/apikey
Chrome, Edge, or Safari browser

Installation (one-time, ~10–15 minutes)

1. Download the project

Click Code → Download ZIP on this page, then unzip the folder.

2. Create a virtual environment (important on macOS)

Open a terminal, navigate to the project folder, and run:

cd path/to/aussprache_tool

# Create virtual environment (on Windows, use python or py if python3 is not found)
python3 -m venv venv

# Activate it (macOS/Linux)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate

You will see (venv) at the start of your terminal prompt when it is active.

3. Install dependencies

pip install -r requirements.txt

This downloads faster-whisper, Flask, and the Google AI library (~1–2 GB, one-time only).

Note: On first launch, faster-whisper will automatically download the large-v3-turbo model (~1.6 GB). This happens once in the background.

4. Add your Google API key

Go to https://aistudio.google.com/apikey and click Create API Key
Open the file key.txt in the project folder
Replace the placeholder text with your key:
```
AIzaSy...
```
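
For readers curious how a tool might read this file, here is a minimal sketch. It is an illustration, not the code in app.py; load_api_key is a hypothetical name, and the check assumes that a real key is a single token without spaces while placeholder text is not.

```python
from pathlib import Path


def load_api_key(path="key.txt"):
    """Read the API key from a text file and reject obviously invalid content."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"{path} not found in the project folder")
    key = key_file.read_text(encoding="utf-8").strip()
    # Assumption: a real key is one token; empty files or phrases are rejected.
    if not key or any(ch.isspace() for ch in key):
        raise ValueError(f"{path} does not look like it contains an API key")
    return key
```

Stripping surrounding whitespace means a trailing newline added by the text editor does no harm.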

Important — API quota: The Gemini 2.5 Flash Free Tier currently allows only 20 requests/day, which is not enough for a full class session. To increase this to ~250 requests/day at effectively no cost, simply add a payment method in Google AI Studio (Billing → Enable). Actual charges for classroom use are typically less than €0.10/month.

Starting the Tool

Run the following commands each time you want to use the tool:

# Navigate to the project folder
cd path/to/aussprache_tool

# Activate the virtual environment
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows

# Start the server
python3 app.py                  # on Windows, use: python app.py

Then open http://127.0.0.1:5000 in your browser.

Keep the terminal window open while using the tool. To stop: press Ctrl+C.

How to Use

Input

1. Enter the student's name (optional; appears in the feedback document)
2. Select the feedback language: German / English / Traditional Chinese
3. Optionally tick known pronunciation issues for targeted feedback even when the speech recogniser misses them:
   - ei/ie confusion
   - Umlauts ä/ö/ü
   - ch-sound (ach vs. ich)
   - r-sound
   - Final consonants (-t/-d/-st)
   - Word stress
   - Number pronunciation
   - Diphthongs (au/eu/äu)
4. Paste the target text (the text the student was asked to read aloud)
5. Upload the audio file by drag & drop or click

Analysis

1. Click "Aussprache analysieren" (Analyse Pronunciation)
2. Wait ~20–40 seconds: faster-whisper transcribes locally, then Gemini analyses
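
The local transcription step might look roughly like the sketch below. It uses the faster-whisper calls WhisperModel and transcribe; the device and compute_type settings, the function names, and the use of exp(avg_logprob) as a per-segment confidence score are assumptions, since the tool's actual settings are not documented here.

```python
import math


def segment_confidence(avg_logprob):
    """Map a mean token log-probability to a 0..1 score (geometric-mean probability)."""
    return max(0.0, min(1.0, math.exp(avg_logprob)))


def transcribe(audio_path, model_name="large-v3-turbo"):
    """Transcribe a German recording locally; returns (text, confidence) per segment."""
    # Imported lazily so the helper above works without the library installed.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU with int8 quantisation, a common low-memory choice.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language="de")
    return [(seg.text.strip(), segment_confidence(seg.avg_logprob)) for seg in segments]
```

Fixing the language to "de" avoids automatic language detection, which can misfire on short recordings from beginners.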

Results

Transcription & Word Alignment (top section):

🟢 Green: correctly recognised
🔴 Red + wavy underline: incorrectly recognised
⚫ Grey + [brackets]: not recognised (missing)
🟡 Yellow background: low confidence score
Hover over a word to see: target word | recognised word | confidence %
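
The legend above can be read as a small decision rule. The sketch below is an illustration only: the 0.6 threshold, the precedence of red over yellow, and the names display_style and tooltip are assumptions, not taken from the tool's source.

```python
LOW_CONFIDENCE = 0.6  # hypothetical threshold; the tool's actual cut-off is not documented


def display_style(status, confidence):
    """Pick the legend colour for one aligned word."""
    if status == "missing":
        return "grey"      # shown in [brackets]
    if status == "wrong":
        return "red"       # shown with a wavy underline
    if confidence is not None and confidence < LOW_CONFIDENCE:
        return "yellow"    # recognised, but with a low confidence score
    return "green"


def tooltip(target_word, heard_word, confidence):
    """Build the hover text: target word | recognised word | confidence %."""
    heard = heard_word if heard_word is not None else "-"
    pct = f"{confidence:.0%}" if confidence is not None else "n/a"
    return f"{target_word} | {heard} | {pct}"
```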

Pronunciation Feedback (report section):

Recognition rate with progress bar
Target text (as continuous prose)
Transcription & word alignment (colour-coded)
Overall impression, strengths, problem table, targeted tips, practice exercise
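
How the recognition rate is computed is not specified here. A plausible reading, assumed in the sketch below, is the share of target words that were recognised correctly; recognition_rate and progress_bar are hypothetical names.

```python
def recognition_rate(statuses):
    """Share of target words marked "ok", as a percentage rounded to one decimal."""
    if not statuses:
        return 0.0
    correct = sum(1 for s in statuses if s == "ok")
    return round(100 * correct / len(statuses), 1)


def progress_bar(percent, width=20):
    """Render a plain-text bar, for example for a terminal or log line."""
    filled = round(width * percent / 100)
    return "#" * filled + "-" * (width - filled)
```

With five of seven target words recognised, the rate under this definition is 71.4 %.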

PDF Export

Click "🖨️ A4 drucken" or "📱 A5 / Mobil" to open a print-ready version of the feedback. In the print dialog, choose "Save as PDF". The A5 format uses a larger font size for comfortable reading on mobile devices.

Privacy

| Data | Where it is processed |
|---|---|
| Audio file | Stays on your computer; deleted immediately after analysis |
| Transcription & alignment | Computed locally; never leaves your machine |
| Sent to Google | Only: target text + transcription + error list (plain text) |
| Biometric data | None; Google sees text only |
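
To make the text-only boundary concrete, here is a sketch of how such a request body could be assembled from the three items listed above. The prompt wording and the name build_analysis_prompt are assumptions; the point is that the function only ever receives plain text, with no audio, file name, or student name.

```python
def build_analysis_prompt(target_text, transcript, errors, language="English"):
    """Assemble a text-only prompt from target text, transcription and error list."""
    lines = [
        f"Give pronunciation feedback in {language} for a learner of German.",
        f"Target text: {target_text}",
        f"Transcription: {transcript}",
        "Misrecognised or missing words:",
    ]
    for target_word, heard_word in errors:
        # heard_word is None when the word was not recognised at all.
        lines.append(f"- {target_word} -> {heard_word or '(missing)'}")
    return "\n".join(lines)
```

Keeping the student's name out of the function signature makes it harder to leak by accident; the name can be added locally when the feedback document is rendered.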

Cost

| Component | Cost |
|---|---|
| faster-whisper (transcription) | Free; runs locally |
| Gemini 2.5 Flash, Free Tier | 20 requests/day (not sufficient for a full class) |
| Gemini 2.5 Flash, Tier 1 | ~250 requests/day; billing enabled but effectively free |
| Actual cost per analysis | ~€0.002–0.003 |

Recommended setup: Add a payment method in Google AI Studio to unlock Tier 1. At ~€0.002–0.003 per analysis, one session with 30 students costs roughly €0.06–0.09.

Supported Audio Formats

MP3, WAV, M4A, OGG, FLAC, WebM, MP4, AAC — max. 50 MB
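
A check like the following could enforce these limits before any processing starts. It is a sketch rather than the tool's own validation; check_upload is a hypothetical name, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm", ".mp4", ".aac"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, read as mebibytes (an assumption)


def check_upload(filename, size_bytes):
    """Return None if the upload is acceptable, else a short error message."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported format: {ext or '(no extension)'}"
    if size_bytes > MAX_BYTES:
        return "File is larger than 50 MB"
    return None
```

The extension is compared in lower case, so "anna.MP3" is accepted just like "anna.mp3".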

Tips

First launch takes longer: The Whisper model is downloaded and loaded on first use (~30–60 sec). All subsequent analyses are faster.

Shorter recordings = better recognition: Separate recordings for questions and answers improve transcription quality for A1 learners significantly.

Limits of speech recognition: Whisper may not catch all errors in heavily accented A1 German. The "known issues" checkboxes allow targeted feedback even without direct transcription evidence.

PDF in Chinese: The Noto Sans TC font is loaded from Google Fonts when printing. An internet connection is required for Chinese PDF export.

Acknowledgements

Built with faster-whisper and Google Gemini.
