MiNi_Wildlife-Identification-Using-Audio_ML

Audio Classification of Dog Barks, Cat Meows, and Bird Chirps

Project Overview

This mini-project aims to classify audio recordings into three categories: dog barks, cat meows, and bird chirps. It leverages deep learning techniques, specifically convolutional neural networks (CNNs), to extract features from audio spectrograms and perform classification.

Dataset

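The project uses the UrbanSound8K dataset, a collection of 8,732 labeled urban sound excerpts in ten classes, and keeps only the audio files that belong to the three selected classes: dog barks, cat meows, and bird chirps. Note that the standard UrbanSound8K metadata contains a 'dog_bark' class but no 'cat_meow' or 'chirping_birds' class, so the cat and bird recordings have to come from an additional source, which this README does not name.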

Dataset Source: https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz
Selected Classes: 'dog_bark', 'cat_meow', 'chirping_birds'
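
The class filtering described above can be sketched with pandas. This is a minimal illustration, not the notebook's exact code: it assumes the metadata has 'class' and 'slice_file_name' columns, as in the standard UrbanSound8K.csv, and the helper name filter_metadata and the demo rows are hypothetical.

```python
import pandas as pd

# Class names as listed in this README.
SELECTED_CLASSES = ["dog_bark", "cat_meow", "chirping_birds"]


def filter_metadata(meta: pd.DataFrame, classes=SELECTED_CLASSES) -> pd.DataFrame:
    """Keep only rows whose 'class' value is one of the selected classes."""
    subset = meta[meta["class"].isin(classes)].copy()
    # Map each class name to an integer label for training.
    label_map = {name: idx for idx, name in enumerate(classes)}
    subset["label"] = subset["class"].map(label_map)
    return subset.reset_index(drop=True)


# Tiny stand-in for the metadata CSV (hypothetical rows, for illustration only).
demo = pd.DataFrame(
    {
        "slice_file_name": ["a.wav", "b.wav", "c.wav"],
        "fold": [1, 2, 3],
        "class": ["dog_bark", "siren", "dog_bark"],
    }
)
filtered = filter_metadata(demo)
print(filtered)
```

With the real dataset, the DataFrame would come from pd.read_csv on the extracted metadata file instead of the demo rows.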

Libraries

The following Python libraries are used in this project:

librosa: For audio loading, feature extraction (mel spectrograms), and audio processing.
audiomentations: For potential data augmentation (not used in this version).
matplotlib: For visualizing spectrograms.
torch: For deep learning model building and training.
torchvision: For dataset and data loading utilities.
seaborn: For creating visually appealing confusion matrices.
wget: For downloading the dataset.
numpy: For numerical computations.
pandas: For data manipulation and loading the metadata.
scikit-learn (imported as sklearn): For data splitting and model evaluation (classification report, confusion matrix).
tqdm: For displaying progress bars during training (not used in this version).

Methods

Data Preprocessing:
- Audio files are loaded using librosa.load.
- Mel spectrograms are generated using librosa.feature.melspectrogram.
- Spectrograms are converted to decibels using librosa.power_to_db.
- Spectrograms are padded/truncated to a fixed length for consistency.
- Spectrograms are normalized using z-score normalization.
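
The preprocessing steps above might look roughly like the sketch below. The sample rate, number of mel bands, target length, and the choice to pad with the spectrogram's minimum value are all assumptions, since the README does not state them; the function names are hypothetical.

```python
import numpy as np


def load_mel_db(path, sr=22050, n_mels=128):
    """Load audio and return a mel spectrogram in decibels, shape (n_mels, frames)."""
    import librosa  # imported here so the helpers below also work without librosa

    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def fix_length(spec, target_frames):
    """Truncate, or pad with the minimum value, along the time axis."""
    n_frames = spec.shape[1]
    if n_frames >= target_frames:
        return spec[:, :target_frames]
    pad = target_frames - n_frames
    # Assumption: pad with the quietest value so padding resembles silence.
    return np.pad(spec, ((0, 0), (0, pad)), mode="constant", constant_values=spec.min())


def zscore(spec, eps=1e-8):
    """Normalize to zero mean and unit variance over the whole spectrogram."""
    return (spec - spec.mean()) / (spec.std() + eps)


# Demo on a random stand-in for a decibel spectrogram (no audio file needed).
rng = np.random.default_rng(0)
fake_db = rng.uniform(-80.0, 0.0, size=(128, 100))
features = zscore(fix_length(fake_db, target_frames=173))
print(features.shape)  # (128, 173)
```

For a real clip, the input to fix_length would be load_mel_db("some_clip.wav") rather than the random array.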
Model Building:
- A CNN model is implemented using torch.nn modules.
- The model consists of convolutional layers, max pooling, and fully connected layers.
- The architecture is designed to learn relevant features from the spectrograms.
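
As an illustration of the kind of model described above, here is a small CNN built from torch.nn modules. The layer counts and sizes are assumptions, not the notebook's actual architecture, and the adaptive average pooling layer is an addition that keeps the fully connected layer independent of the spectrogram length.

```python
import torch
from torch import nn


class SmallAudioCNN(nn.Module):
    """Hypothetical CNN: two conv/pool stages followed by fully connected layers."""

    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            # Assumption: pool to a fixed 4x4 grid regardless of input size.
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        # x has shape (batch, 1, n_mels, frames); the output is one logit per class.
        return self.classifier(self.features(x))


model = SmallAudioCNN()
batch = torch.randn(8, 1, 128, 173)  # stand-in batch of spectrograms
logits = model(batch)
print(logits.shape)  # torch.Size([8, 3])
```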
Training:
- The model is trained using the Adam optimizer and cross-entropy loss.
- Data is split into training and testing sets.
- Training is performed for a fixed number of epochs.
- Performance metrics (loss, accuracy, precision, recall, F1-score) are tracked during training.
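
The training procedure above can be sketched as follows. The data here is random stand-in input, and the split ratio, batch size, learning rate, epoch count, and the tiny model are assumptions chosen to keep the example fast; only the Adam optimizer, the cross-entropy loss, and the train/test split come from the description.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 60 "spectrograms" of shape (1, 32, 32), 3 classes.
X = np.random.default_rng(0).normal(size=(60, 1, 32, 32)).astype(np.float32)
y = np.repeat(np.arange(3), 20).astype(np.int64)

# Stratified split so each class keeps the same share in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
train_loader = DataLoader(
    TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train)),
    batch_size=16,
    shuffle=True,
)

# Deliberately tiny model so the sketch runs in well under a second.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

history = []
for epoch in range(3):  # fixed number of epochs
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for xb, yb in train_loader:
        optimizer.zero_grad()
        logits = model(xb)
        loss = criterion(logits, yb)
        loss.backward()
        optimizer.step()
        # Accumulate per-sample loss and accuracy for this epoch.
        total_loss += loss.item() * len(xb)
        correct += (logits.argmax(dim=1) == yb).sum().item()
        seen += len(xb)
    history.append({"epoch": epoch + 1, "loss": total_loss / seen, "acc": correct / seen})
    print(history[-1])
```

Precision, recall, and F1-score could be tracked per epoch in the same loop by collecting the predictions and passing them to scikit-learn's metric functions.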
Evaluation:
- The trained model is evaluated on the testing set.
- A classification report and confusion matrix are generated to assess the model's performance.
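
The evaluation step can be illustrated with scikit-learn. The label arrays below are made up for the example and are not results from this project.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

class_names = ["dog_bark", "cat_meow", "chirping_birds"]

# Hypothetical true labels and predictions, for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)

report = classification_report(
    y_true, y_pred, labels=[0, 1, 2], target_names=class_names, zero_division=0
)
print(report)
```

The matrix cm can then be passed to seaborn.heatmap(cm, annot=True) to draw the confusion matrix as a plot.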

Steps to Run the Project

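Install Libraries: Install the libraries listed above with pip, for example: pip install librosa audiomentations matplotlib torch torchvision seaborn wget numpy pandas scikit-learn tqdm
Download Dataset: Download the UrbanSound8K archive from the Dataset Source URL above and extract it.
Run the Python Code: Execute the provided notebook, MiNi_Wildlife_Identification_Using_Audio_ML.ipynb, which performs the following steps: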
- Loads the dataset and metadata.
- Filters the dataset for the selected classes.
- Preprocesses the audio data.
- Splits the data into training and testing sets.
- Defines and trains the CNN model.
- Evaluates the model and prints the results.
- Saves the trained model.
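
For the final step, one common way to save a PyTorch model is to store its state_dict and load it back into a freshly built model. This is a sketch under assumptions: the README does not say how the notebook saves the model, the file name is hypothetical, and a tiny stand-in network replaces the trained CNN.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    """Stand-in for the trained CNN; the real architecture would go here."""
    return nn.Sequential(nn.Flatten(), nn.Linear(16, 3))


model = build_model()
model.eval()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audio_cnn.pt")  # hypothetical file name
    # Save only the learned parameters, not the whole Python object.
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it.
    restored = build_model()
    restored.load_state_dict(torch.load(path))
    restored.eval()

x = torch.randn(2, 4, 4)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```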

Potential Improvements

Data Augmentation: Explore using audiomentations to augment the training data and improve generalization.
Hyperparameter Tuning: Experiment with different hyperparameters (learning rate, batch size, number of layers, etc.) to optimize model performance.
Advanced Architectures: Consider using more sophisticated CNN architectures or other deep learning models for audio classification.
Larger Dataset: Train the model on a larger and more diverse dataset to further improve its accuracy and robustness.
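
To make the data augmentation idea concrete, here is a NumPy-only sketch of two simple waveform transforms, additive Gaussian noise and a random time shift. It does not use the audiomentations API; it only illustrates the kind of transforms such a library provides, and the function names and parameter values are assumptions.

```python
import numpy as np


def add_gaussian_noise(samples, snr_db, rng):
    """Add white noise at the given signal-to-noise ratio (in dB)."""
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise


def time_shift(samples, max_fraction, rng):
    """Circularly shift the waveform by a random offset of up to max_fraction of its length."""
    limit = int(len(samples) * max_fraction)
    offset = int(rng.integers(-limit, limit + 1))
    return np.roll(samples, offset)


# Demo on a one-second 440 Hz tone as a stand-in for a real recording.
rng = np.random.default_rng(0)
sr = 22050
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
augmented = time_shift(add_gaussian_noise(clean, snr_db=20, rng=rng), 0.1, rng)
print(augmented.shape)
```

Augmentations like these would be applied to training clips only, before the spectrograms are computed.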

Conclusion

This mini-project demonstrates the application of deep learning for audio classification using the UrbanSound8K dataset. The results show the potential of CNNs in extracting relevant features from audio spectrograms and achieving reasonable classification accuracy. Further improvements can be made through data augmentation, hyperparameter tuning, and exploring more advanced models and architectures.

👋 Hello There! Let's Dive Into the World of Ideas 🚀

Hey, folks! I'm Himanshu Rajak, your friendly neighborhood tech enthusiast. When I'm not busy solving DSA problems or training models that make computers a tad bit smarter, you’ll find me diving deep into the realms of Data Science, Machine Learning, and Artificial Intelligence.

Here’s the fun part: I’m totally obsessed with exploring Large Language Models (LLMs), Generative AI (yes, those mind-blowing models that can create art, text, and maybe even jokes one day 🤖), and Quantum Computing (because who doesn’t love qubits doing magical things?).

But wait, there's more! I’m also super passionate about publishing research papers and sharing my nerdy findings with the world. If you’re a fellow explorer or just someone who loves discussing tech, memes, or AI breakthroughs, let’s connect!

LinkedIn: Himanshu Rajak (Professional vibes only 😉)
Medium: Himanshu Rajak (Where I pen my thoughts and experiments 🖋️)

Let’s team up and create something epic. Whether it’s about generative algorithms or quantum wizardry, I’m all ears—and ideas!
🎯 Ping me, let’s innovate, and maybe grab some virtual coffee. ☕✨
