Implement for Goal 1 and Goal 2 for intelligent CC suggestion tool by imtushar01 · Pull Request #20 · PlanetRead/Intelligent-cc-generation

imtushar01 · 2026-05-11T15:24:39Z

Summary

This PR implements the midpoint milestone for the Intelligent Closed Caption (CC) Suggestion Tool by completing:

Goal 1 — Sound Event Detection
Goal 2 — Speaker/Scene Reaction Detection

The pipeline detects non-speech audio events in a video and checks whether there is a visible reaction on screen before marking the event as a CC candidate.

The goal is to reduce over-captioning by filtering out background sounds that do not meaningfully affect the scene.

File Structure

.
├── sound_event_detector.py   # Goal 1
├── reaction_detector.py      # Goal 2
├── requirements.txt
└── README.md

Setup & Installation

Prerequisites

Install ffmpeg:

macOS

brew install ffmpeg

Ubuntu / Debian

sudo apt install ffmpeg

Clone the repository

git clone <repo-url>
cd Intelligent-cc-generation

Create virtual environment

macOS / Linux

python3 -m venv venv

Windows

python -m venv venv

Activate virtual environment

macOS / Linux

source venv/bin/activate

Windows

venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

The first run downloads the YAMNet model from TensorFlow Hub and caches it locally.

Goal 1 — Sound Event Detection

Run:

python3 sound_event_detector.py video.mp4

Implemented:

audio extraction using ffmpeg
YAMNet-based sound classification
speech filtering
event merging
CC-friendly label mapping
timestamp and confidence score generation
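
The post-processing steps listed above (speech filtering, event merging, label mapping, timestamps and confidence) could look roughly like the sketch below. It is not the code in `sound_event_detector.py`. The names `Event`, `merge_events`, `to_caption`, `SPEECH_LABELS`, `CC_LABELS` and the thresholds `min_score` and `max_gap` are all hypothetical. It assumes the classifier has already produced per-window predictions as `(start, end, label, score)` tuples sorted by start time.

```python
from dataclasses import dataclass

# Assumed, illustrative sets; the real mapping in the PR may differ.
SPEECH_LABELS = {"Speech", "Conversation"}
CC_LABELS = {
    "Bark": "[dog barking]",
    "Knock": "[knocking]",
    "Applause": "[applause]",
}


@dataclass
class Event:
    start: float       # seconds
    end: float         # seconds
    label: str         # classifier class name
    confidence: float  # highest window score seen for this event


def merge_events(windows, min_score=0.3, max_gap=1.0):
    """Drop speech and low-score windows, then merge same-label neighbours.

    Only the most recent event is considered for merging, so labels that
    interleave in time are kept as separate events.
    """
    events = []
    for start, end, label, score in windows:
        if label in SPEECH_LABELS or score < min_score:
            continue
        last = events[-1] if events else None
        if last and last.label == label and start - last.end <= max_gap:
            last.end = max(last.end, end)
            last.confidence = max(last.confidence, score)
        else:
            events.append(Event(start, end, label, score))
    return events


def to_caption(event):
    """Render one event as 'start-end [label] (confidence)'."""
    text = CC_LABELS.get(event.label, f"[{event.label.lower()}]")
    return f"{event.start:.2f}-{event.end:.2f} {text} ({event.confidence:.2f})"


windows = [
    (0.00, 0.96, "Speech", 0.90),
    (0.48, 1.44, "Bark", 0.70),
    (0.96, 1.92, "Bark", 0.80),
    (5.00, 5.96, "Bark", 0.20),
    (6.00, 6.96, "Applause", 0.60),
]
events = merge_events(windows)
for e in events:
    print(to_caption(e))
```

Keeping the maximum window score as the event confidence is one possible choice; averaging the scores would be an equally reasonable alternative.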

Goal 2 — Speaker Reaction Detection

Run:

python3 reaction_detector.py video.mp4

Implemented:

frame extraction around detected events
motion analysis using frame differencing
face detection using OpenCV Haar cascades
combined reaction scoring

Events with low visible impact are filtered out to reduce unnecessary CC suggestions.
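
As a rough illustration of how frame differencing and a face signal might be combined into one reaction score, here is a minimal sketch. It does not reproduce `reaction_detector.py`. The function names, the weights `motion_weight` and `face_weight`, and the `threshold` are assumptions chosen for the example. Frames are assumed to be grayscale `uint8` NumPy arrays of equal shape, and `face_present` is assumed to come from a separate face detector (for instance an OpenCV Haar cascade).

```python
import numpy as np


def motion_score(frames):
    """Mean absolute difference between consecutive frames, scaled to [0, 1]."""
    if len(frames) < 2:
        return 0.0
    diffs = [
        np.abs(b.astype(np.float32) - a.astype(np.float32)).mean() / 255.0
        for a, b in zip(frames, frames[1:])
    ]
    return float(np.mean(diffs))


def reaction_score(motion, face_present, motion_weight=0.7, face_weight=0.3):
    """Weighted sum of motion and face presence (weights are hypothetical)."""
    face = 1.0 if face_present else 0.0
    return motion_weight * min(motion, 1.0) + face_weight * face


def is_cc_candidate(score, threshold=0.2):
    """Keep an event only if its reaction score reaches the threshold."""
    return score >= threshold


# Two synthetic clips: one static, one flipping from black to white.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)

static_score = reaction_score(motion_score([black, black]), face_present=False)
active_score = reaction_score(motion_score([black, white]), face_present=True)

print(static_score, is_cc_candidate(static_score))
print(active_score, is_cc_candidate(active_score))
```

In this sketch a static, face-free clip scores zero and is dropped, while a clip with strong motion and a detected face is kept.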

Sample outputs and filtering behavior are demonstrated in the attached demo video.

Demo Video

Limitations & Future Improvements

YAMNet may not classify some Indian/regional sounds accurately
Haar cascades struggle with profile or partially occluded faces
Reaction scoring currently uses a limited frame sample window
Goal 3 (CC decision engine + SRT/SLS generation) is not yet implemented

git commit -m "Implement Goal 1 and Goal 2 for intelligent CC suggestion tool"

96dd6f1

imtushar01 changed the title ~~git commit -m "Implement Goal 1 and Goal 2 for intelligent CC suggest…~~ Implement Goal 1 and Goal 2 for intelligent CC suggestions May 11, 2026

imtushar01 changed the title ~~Implement Goal 1 and Goal 2 for intelligent CC suggestions~~ Implement for Goal 1 and Goal 2 for intelligent CC suggestion tool May 11, 2026

imtushar01 mentioned this pull request May 11, 2026

[DMP 2026]: Create Intelligent Closed Caption (CC) Suggestion Tool #2

Open

4 tasks

imtushar01 wants to merge 1 commit into PlanetRead:main from imtushar01:feature/intelligent-cc-demo
