Multilingual Speech Language Identification

Machine Learning Final Project

Overview

This project implements an end-to-end spoken language identification system using classical machine learning techniques. It is developed in two phases:

Phase 1: Dataset Collection and Problem Understanding
Phase 2: Processing, Modeling, and Evaluation

The goal is to classify short speech segments into one of four languages and to analyze the structure of the data using both supervised and unsupervised learning.

Phase 1 – Dataset Construction

A multilingual speech dataset was created from audiobook-style podcast recordings in:

Italian
German
Korean
Spanish

Each audio file:

Is approximately one minute long
Starts at the beginning of a sentence
Ends at the end of a sentence
Contains clean, continuous speech

This phase focused on building a balanced and structured dataset suitable for machine learning.
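One way to check the "balanced" property is to count recordings per language and compare the smallest class to the largest. The sketch below is illustrative only: the `class_balance` helper and the example label list are hypothetical and are not taken from the project's code or data.

```python
from collections import Counter

LANGUAGES = ["Italian", "German", "Korean", "Spanish"]

def class_balance(labels):
    """Return per-language counts and the ratio of smallest to largest class."""
    counts = Counter(labels)
    # Include languages with zero recordings so gaps are visible.
    per_language = {lang: counts.get(lang, 0) for lang in LANGUAGES}
    largest = max(per_language.values())
    smallest = min(per_language.values())
    ratio = smallest / largest if largest else 0.0
    return per_language, ratio

# Hypothetical label list: one entry per one-minute recording.
example_labels = ["Italian"] * 3 + ["German"] * 3 + ["Korean"] * 3 + ["Spanish"] * 2
per_language, ratio = class_balance(example_labels)
print(per_language, round(ratio, 2))
```

A ratio close to 1.0 indicates that the four languages are represented about equally.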

Phase 2 – Machine Learning Pipeline

In this phase, the collected audio data was processed and analyzed through:

Data cleaning and preprocessing
Feature extraction
Supervised classification (multiple ML models)
Unsupervised clustering
Quantitative evaluation (Accuracy, F1-score, Confusion Matrix, Silhouette Score)
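The classification metrics listed above can all be derived from the confusion matrix. The following is a minimal sketch, not the project's implementation. It assumes macro-averaged F1 (the README does not say which averaging is used), and the label lists are made-up examples. scikit-learn's `sklearn.metrics` module provides equivalent `confusion_matrix`, `accuracy_score`, and `f1_score` functions.

```python
LANGUAGES = ["Italian", "German", "Korean", "Spanish"]

def confusion_matrix(y_true, y_pred, labels=LANGUAGES):
    """Rows are true languages, columns are predicted languages."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of segments on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def macro_f1(matrix):
    """Unweighted mean of per-language F1 (assumed averaging scheme)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

# Made-up predictions: one Italian segment is mistaken for Spanish.
y_true = ["Italian", "Italian", "German", "German",
          "Korean", "Korean", "Spanish", "Spanish"]
y_pred = ["Italian", "Spanish", "German", "German",
          "Korean", "Korean", "Spanish", "Spanish"]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
f1 = macro_f1(cm)
print(cm, acc, round(f1, 4))
```

Reading the matrix row by row shows which language pairs are confused, which accuracy alone does not reveal.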

The project demonstrates a complete workflow from raw speech data to language classification and analysis.
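For the unsupervised part, the Silhouette Score named among the evaluation metrics measures how well each point fits its own cluster compared with the nearest other cluster. The sketch below computes it from scratch, assuming Euclidean distance. The toy 2-D points stand in for real feature vectors and are purely illustrative. In practice `sklearn.metrics.silhouette_score` does the same job.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, cluster_ids):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, cid in enumerate(cluster_ids):
        own = [j for j in clusters[cid] if j != i]
        if not own:
            continue  # singleton cluster: coefficient is defined as 0
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != cid
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across the groups
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated clusters. Values near 0 or below suggest that the clusters overlap or that points are assigned to the wrong cluster.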
