A Python-based machine learning project designed to detect AI-generated speech (deepfakes). The system uses the librosa library to extract acoustic features from audio files and trains a Linear Support Vector Machine (SVM) model to classify each file as real or fake.
- Advanced Feature Extraction: Extracts 98 distinct features per audio file, including 13 MFCCs + Deltas, Spectral Centroid, Spectral Contrast, Spectral Rolloff, and Zero-Crossing Rate (ZCR).
- Audio Preprocessing: Automatically detects and trims silence from audio files before feature extraction.
- Robust Classification: Uses a Linear Support Vector Machine (SVM) whose hyperparameters are tuned with grid search.
- Explainable AI: Maps SVM coefficients to specific acoustic categories (Spectral, Pitch, Phase, Background) to provide insights and explain the model's predictions.
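The explainability step can be pictured as grouping the weights of a linear model by feature family. The sketch below is a minimal illustration, not the project's actual code: the feature names, the prefix-to-category assignments, and the coefficient values are all hypothetical placeholders chosen only to show the aggregation.

```python
from collections import defaultdict

# Hypothetical prefix-to-category assignments. The category names come from
# the README; which feature belongs to which category is a placeholder here.
CATEGORY_BY_PREFIX = {
    "mfcc": "Spectral",
    "spectral_contrast": "Spectral",
    "spectral_centroid": "Pitch",
    "delta": "Phase",
    "zcr": "Background",
}


def category_importance(feature_names, coefficients):
    """Sum absolute linear-model weights per category, normalized to sum to 1."""
    if len(feature_names) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")
    totals = defaultdict(float)
    for name, weight in zip(feature_names, coefficients):
        prefix = name.rsplit("_", 1)[0]  # "mfcc_3" -> "mfcc"
        category = CATEGORY_BY_PREFIX.get(prefix, "Other")
        totals[category] += abs(weight)
    grand_total = sum(totals.values()) or 1.0  # avoid dividing by zero
    return {cat: value / grand_total for cat, value in totals.items()}


# Made-up weights, for illustration only.
names = ["mfcc_0", "mfcc_1", "delta_0", "spectral_centroid_0", "zcr_0"]
weights = [0.5, -1.5, 1.0, -0.5, 0.5]
importance = category_importance(names, weights)
for category, share in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share:.1%}")
```

Absolute values are used so that weights pushing toward "real" and weights pushing toward "fake" both count as influence. In scikit-learn, a fitted linear-kernel SVM exposes such weights through its `coef_` attribute.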
Ensure you have Python 3.8 or higher installed. You will also need pip for installing the required packages.
- Clone this repository to your local machine:
git clone https://github.com/DevAlnahari/Deepfake-Audio-Detector.git
cd Deepfake-Audio-Detector
- Install the required dependencies:
pip install -r requirements.txt
Before running the model, ensure your dataset is properly organized into the following directory structure:
.
├── 📁 Train/
│ ├── 📁 Real/
│ └── 📁 Fake/
├── 📁 Val/
│ ├── 📁 Real/
│ └── 📁 Fake/
└── 📁 uploads/
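Before training, the layout above can be checked with a few lines of standard-library Python. This is a minimal sketch, not part of the repository: `count_audio_files` is a hypothetical helper, and the accepted file extensions are an assumption, since the README does not state which audio formats are supported.

```python
from pathlib import Path

# Folder names are taken from the directory layout in this README.
SPLITS = ("Train", "Val")
LABELS = ("Real", "Fake")
# Assumed set of audio extensions; adjust to match your dataset.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def count_audio_files(root):
    """Return {(split, label): file_count}; raise if an expected folder is missing."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        for label in LABELS:
            folder = root / split / label
            if not folder.is_dir():
                raise FileNotFoundError(f"missing dataset folder: {folder}")
            counts[(split, label)] = sum(
                1
                for path in folder.iterdir()
                if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
            )
    return counts
```

Calling `count_audio_files(".")` from the project root returns a per-folder count, which makes a missing folder or a strong imbalance between Real and Fake easy to spot before a long training run.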
Once your dataset is organized, run the main script to start extracting features and training the model:
python main.py
The model has been evaluated on a comprehensive validation dataset with the following results:
- Model Architecture: Linear SVM (`kernel='linear'`, `class_weight='balanced'`)
- Validation Dataset Size: 1,846 audio files
- Accuracy: 97.56%
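The parameter names above match scikit-learn's `SVC`, so a comparable setup can be sketched as follows. This is an illustration under stated assumptions, not the code in `main.py`: it assumes scikit-learn is the library in use, trains on synthetic data rather than real audio features, adds feature scaling, and searches a hypothetical grid over `C`. The accuracy it prints says nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted feature matrix (98 columns per file).
X, y = make_classification(
    n_samples=400, n_features=98, n_informative=20, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling before the SVM is an assumption; the README does not mention it.
pipeline = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", class_weight="balanced"),
)

# Hypothetical search grid; the project's actual grid is not documented.
param_grid = {"svc__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best model on the held-out split.
val_accuracy = search.score(X_val, y_val)
# One weight per feature, available because the kernel is linear.
coefficients = search.best_estimator_.named_steps["svc"].coef_.ravel()

print(f"best C: {search.best_params_['svc__C']}")
print(f"validation accuracy: {val_accuracy:.2%}")
```

`class_weight='balanced'` reweights each class inversely to its frequency, which helps when the Real and Fake folders hold different numbers of files.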
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This project is open-source and available under the MIT License.
