feat: Sound Event Detection module (Goal 1) with YAMNet and India-specific labels #19
bhuvan-somisetty wants to merge 1 commit into
Signed-off-by: bhuvan-somisetty <somisettybhuvan5@gmail.com>
Adds a focused, modular implementation of Goal 1 (Sound Event Detection) from issue #2.
The core of this module is `SoundEventDetector` in `src/audio/detector.py`. It loads YAMNet from TF-Hub, runs it frame-by-frame on a 16 kHz mono WAV file, suppresses speech classes, maps the surviving events to CC labels, and merges adjacent same-label detections within a 0.5 s gap. All heavy imports (`tensorflow_hub`, `soundfile`) are deferred to method bodies, so the module can be imported without any ML stack installed; this keeps the test suite fast and dependency-free.

The label layer in `src/audio/labels.py` does a few things worth calling out. The `SPEECH_LABELS` frozenset covers all 26 YAMNet classes that represent human speech or breath sounds, which are the main source of over-captioning. `LABEL_MAP` goes beyond generic AudioSet names and includes India-specific sounds that come up in regional-language content: Fireworks maps to `[firecrackers]` (Diwali crackers), Tabla and Dhol map to their own labels, and Temple bells gets its own entry. Unmapped non-speech events fall back to `[<lowercased class name>]` instead of being silently dropped.

The test suite covers all of this without requiring TensorFlow or `soundfile` to be installed. The detector tests inject a fake model directly and mock `soundfile` at the `sys.modules` level, so all 26 tests run in under 0.3 s.
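The post-processing described above (speech suppression, label mapping with a lowercase fallback, and merging adjacent same-label detections within a 0.5 s gap) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in `src/audio/`: `Event`, `to_caption`, `frames_to_events` and `merge_events` are hypothetical names, the two tables are tiny illustrative subsets of `SPEECH_LABELS` and `LABEL_MAP`, the `<=` comparison against the gap is a guess, and the 0.96 s window / 0.48 s hop timing is assumed from YAMNet's framing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, heavily trimmed stand-ins for the real tables in
# src/audio/labels.py (the PR's SPEECH_LABELS has 26 entries).
SPEECH_LABELS = frozenset({"Speech", "Conversation", "Breathing"})
LABEL_MAP = {"Fireworks": "[firecrackers]"}

# Assumed YAMNet framing: a 0.96 s analysis window advanced by a 0.48 s hop.
FRAME_HOP_S = 0.48
FRAME_WIN_S = 0.96
MERGE_GAP_S = 0.5  # gap threshold stated in the PR description


@dataclass
class Event:
    label: str
    start: float  # seconds
    end: float    # seconds


def to_caption(class_name: str) -> Optional[str]:
    """Map a YAMNet class name to a CC label; None means 'suppress'."""
    if class_name in SPEECH_LABELS:
        return None
    # Unmapped non-speech classes fall back to "[<lowercased class name>]".
    return LABEL_MAP.get(class_name, f"[{class_name.lower()}]")


def frames_to_events(top_classes: List[str]) -> List[Event]:
    """Turn the top-scoring class of each frame into a timed event."""
    events = []
    for i, name in enumerate(top_classes):
        label = to_caption(name)
        if label is None:  # speech frames are dropped entirely
            continue
        start = i * FRAME_HOP_S
        events.append(Event(label, start, start + FRAME_WIN_S))
    return events


def merge_events(events: List[Event], gap: float = MERGE_GAP_S) -> List[Event]:
    """Merge consecutive same-label events separated by at most `gap` seconds."""
    merged: List[Event] = []
    for ev in sorted(events, key=lambda e: e.start):
        last = merged[-1] if merged else None
        if last is not None and last.label == ev.label and ev.start - last.end <= gap:
            # Overlapping frames give a negative gap, so they always merge.
            last.end = max(last.end, ev.end)
        else:
            # Copy so the caller's Event objects are never mutated.
            merged.append(Event(ev.label, ev.start, ev.end))
    return merged


if __name__ == "__main__":
    frames = ["Speech", "Fireworks", "Fireworks", "Speech", "Dog"]
    for event in merge_events(frames_to_events(frames)):
        print(event)
```

For the five-frame example, the speech frames disappear, the two overlapping Fireworks frames collapse into one `[firecrackers]` event spanning roughly 0.48 s to 1.92 s, and "Dog" falls through to the generic `[dog]` label. Merging only against the most recent event means a different label in between keeps two same-label detections apart, which is one reading of "adjacent".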
Tested with `pytest tests/ -v`: 26/26 passed.

Closes #2