feat: Sound Event Detection module (Goal 1) with YAMNet and India-specific labels #19

Open
bhuvan-somisetty wants to merge 1 commit into PlanetRead:main from bhuvan-somisetty:feat/sed-module-goal1

@bhuvan-somisetty

Adds a focused, modular implementation of Goal 1 (Sound Event Detection) from issue #2.

The core of this module is SoundEventDetector in src/audio/detector.py. It loads YAMNet from TF-Hub, runs it frame by frame on a 16 kHz mono WAV file, suppresses speech classes, maps the surviving events to CC labels, and merges adjacent same-label detections that fall within a 0.5 s gap of each other. All heavy imports (tensorflow_hub, soundfile) are deferred to method bodies, so the module can be imported without any ML stack installed. This keeps the test suite fast and free of heavy dependencies.
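The merge step described above can be sketched as follows. This is a minimal illustration, not the code in src/audio/detector.py: the Event dataclass and the merge_adjacent function are hypothetical names, merging is assumed to happen per label, and the 0.5 s threshold is assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical container for one detection (times in seconds)."""
    label: str
    start: float
    end: float


def merge_adjacent(events: List[Event], max_gap: float = 0.5) -> List[Event]:
    """Merge same-label events whose gap is at most max_gap seconds.

    Assumption: an event is compared with the most recent merged event
    carrying the same label, even if other labels occur in between.
    """
    merged: List[Event] = []
    last_by_label: Dict[str, Event] = {}
    for ev in sorted(events, key=lambda e: e.start):
        prev = last_by_label.get(ev.label)
        if prev is not None and ev.start - prev.end <= max_gap:
            # Close enough to the previous detection: extend it.
            prev.end = max(prev.end, ev.end)
        else:
            # Start a new merged event (copy so the input is not mutated).
            cur = Event(ev.label, ev.start, ev.end)
            merged.append(cur)
            last_by_label[ev.label] = cur
    return merged


detections = [
    Event("[firecrackers]", 0.0, 1.0),
    Event("[firecrackers]", 1.25, 2.0),  # gap 0.25 s: merged into the first
    Event("[firecrackers]", 3.0, 3.5),   # gap 1.0 s: kept separate
]
merged_events = merge_adjacent(detections)
```

With these inputs the first two detections collapse into a single event and the third stays on its own, so one caption spans a burst of short frame-level hits.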

The label layer in src/audio/labels.py does a few things worth calling out. The SPEECH_LABELS frozenset covers all 26 YAMNet classes that represent human speech or breath sounds; these classes are the main source of over-captioning. The LABEL_MAP goes beyond generic AudioSet names and includes India-specific sounds that come up in regional-language content: Fireworks maps to [firecrackers] (Diwali crackers), Tabla and Dhol map to their own labels, and Temple bells gets its own entry. Unmapped non-speech events fall back to [<lowercased class name>] rather than being silently dropped.
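The lookup order described above (suppress speech, apply the map, then fall back) could look roughly like this. It is a sketch under assumptions: SPEECH_LABELS here is a three-item illustrative subset rather than the full set of 26, the [tabla] and [dhol] strings are guesses at the mapped labels, and to_cc_label is a hypothetical helper name.

```python
from typing import Optional

# Illustrative subset only; the real frozenset is said to hold 26 classes.
SPEECH_LABELS = frozenset({"Speech", "Whispering", "Breathing"})

# "Fireworks" -> "[firecrackers]" comes from the PR description;
# the other two values are assumed for illustration.
LABEL_MAP = {
    "Fireworks": "[firecrackers]",
    "Tabla": "[tabla]",
    "Dhol": "[dhol]",
}


def to_cc_label(class_name: str) -> Optional[str]:
    """Return a CC label for a YAMNet class name, or None if suppressed."""
    if class_name in SPEECH_LABELS:
        return None  # speech and breath sounds are never captioned
    # Unmapped classes fall back to the lowercased class name in brackets.
    return LABEL_MAP.get(class_name, f"[{class_name.lower()}]")
```

Returning None for speech keeps suppression and mapping in one place, while the fallback ensures an unmapped class such as Dog still surfaces as [dog].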

The test suite covers all of this without requiring TensorFlow or soundfile to be installed. The detector tests inject a fake model directly and mock soundfile at the sys.modules level, so all 26 tests run in under 0.3 s.
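The testing approach above can be illustrated with a small stand-in. This is not the PR's test code: TinyDetector, fake_model, and the clip.wav path are hypothetical, and the sketch assumes the detector accepts an injected model and imports soundfile inside a method.

```python
import sys
import types
from unittest import mock


class TinyDetector:
    """Hypothetical stand-in for SoundEventDetector."""

    def __init__(self, model=None):
        # An injected model means TF-Hub is never touched in tests.
        self._model = model

    def detect(self, path):
        # Deferred import: resolved only when detect() is called, so
        # importing this module does not require soundfile.
        import soundfile as sf
        samples, _sample_rate = sf.read(path)
        return self._model(samples)


# Fake soundfile module exposing only the read() call used above.
fake_sf = types.ModuleType("soundfile")
fake_sf.read = lambda path: ([0.0] * 16000, 16000)


def fake_model(samples):
    """Pretend model that always reports one class name."""
    return ["Fireworks"]


# patch.dict swaps the entry in sys.modules and restores it on exit.
with mock.patch.dict(sys.modules, {"soundfile": fake_sf}):
    result = TinyDetector(model=fake_model).detect("clip.wav")
```

Because the import statement consults sys.modules first, the fake module is picked up without any file being read from disk.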

Tested with pytest tests/ -v: 26/26 passed.

Closes #2

Signed-off-by: bhuvan-somisetty <somisettybhuvan5@gmail.com>

Development

Successfully merging this pull request may close these issues.

[DMP 2026]: Create Intelligent Closed Caption (CC) Suggestion Tool
