Audio-Event-Detection

The end-semester project for MLSP-2020.

Task-1 is the implementation of the model from [1]. Here the goal is to classify a given audio file that contains a single event-class, for example "dog_bark". There are 10 classes of interest in total.

Task-2 is more interesting: here we detect the sequence of these event-classes. The network used is shown in the following figure. We train the network with the CTC [2] loss. The features are Spectrogram Image Features (SIF) augmented with the energy in each frame, as described in [1]. The evaluation metric is the mean normalized edit distance.
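The decoding and scoring steps for Task-2 can be sketched roughly as follows. This is a minimal illustration, not the code in `task2_code.ipynb`: it assumes greedy (best-path) CTC decoding, where consecutive repeated labels are merged and blanks are dropped, and it assumes the edit distance is normalized by the length of the reference sequence before averaging. The names `BLANK`, `ctc_greedy_collapse`, `edit_distance`, and `mean_normalized_edit_distance` are hypothetical, as is the `siren` class used in the usage note.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank symbol


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Turn per-frame labels into an event sequence (best-path CTC decoding).

    Consecutive repeats are merged first, then blank symbols are removed.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_normalized_edit_distance(refs, hyps):
    """Average of edit_distance / len(reference) over all files.

    Assumption: normalization is by reference length; the project may
    normalize differently.
    """
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and equally long")
    scores = [edit_distance(r, h) / max(len(r), 1) for r, h in zip(refs, hyps)]
    return sum(scores) / len(scores)
```

For example, the frame labels `["<blank>", "dog_bark", "dog_bark", "<blank>", "siren"]` collapse to the two-event sequence `["dog_bark", "siren"]`, which can then be compared with the reference sequence for that file.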

[1] Phan, H., Hertel, L., Maaß, M., & Mertins, A. (2016). Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks. INTERSPEECH.

[2] Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.
