
Implement for Goal 1 and Goal 2 for intelligent CC suggestion tool #20

Open
imtushar01 wants to merge 1 commit into PlanetRead:main from imtushar01:feature/intelligent-cc-demo

Summary

This PR implements the midpoint milestone for the Intelligent Closed Caption (CC) Suggestion Tool by completing:

  • Goal 1 — Sound Event Detection
  • Goal 2 — Speaker/Scene Reaction Detection

The pipeline detects non-speech audio events in a video and checks whether there is a visible reaction on screen before marking the event as a CC candidate.

The goal is to reduce over-captioning by filtering out background sounds that do not meaningfully affect the scene.
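The gating step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SoundEvent`, `select_cc_candidates`, and the `min_reaction` threshold of 0.3 are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SoundEvent:
    """A detected non-speech sound (hypothetical structure for illustration)."""
    label: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # classifier confidence in 0..1


def select_cc_candidates(
    events: List[SoundEvent],
    reaction_score: Callable[[SoundEvent], float],
    min_reaction: float = 0.3,  # assumed threshold, not taken from the PR
) -> List[SoundEvent]:
    """Keep only events whose on-screen reaction score reaches the threshold."""
    return [e for e in events if reaction_score(e) >= min_reaction]


# Example with a stubbed reaction scorer keyed on the event label.
events = [
    SoundEvent("Door slam", 3.0, 3.9, 0.82),
    SoundEvent("Traffic", 10.0, 14.0, 0.64),
]
stub_scores = {"Door slam": 0.7, "Traffic": 0.1}
candidates = select_cc_candidates(events, lambda e: stub_scores[e.label])
```

Passing the scorer in as a function keeps the audio stage and the visual stage independent, so either one can be swapped out or tested separately.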


File Structure

.
├── sound_event_detector.py   # Goal 1
├── reaction_detector.py      # Goal 2
├── requirements.txt
└── README.md

Setup & Installation

Prerequisites

Install ffmpeg:

macOS

brew install ffmpeg

Ubuntu / Debian

sudo apt install ffmpeg

Clone the repository

git clone <repo-url>
cd Intelligent-cc-generation

Create virtual environment

macOS / Linux

python3 -m venv venv

Windows

python -m venv venv

Activate virtual environment

macOS / Linux

source venv/bin/activate

Windows

venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

The first run downloads the YAMNet model from TensorFlow Hub and caches it locally.


Goal 1 — Sound Event Detection

Run:

python3 sound_event_detector.py video.mp4

Implemented:

  • audio extraction using ffmpeg
  • YAMNet-based sound classification
  • speech filtering
  • event merging
  • CC-friendly label mapping
  • timestamp and confidence score generation
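As a rough sketch of how speech filtering and event merging could fit together, the example below takes one top label per classifier frame and merges runs of the same label into timestamped events. It assumes YAMNet's usual framing (0.96 s windows with a 0.48 s hop) and a `Speech` class label; `merge_frames` and its parameters are hypothetical and may differ from what `sound_event_detector.py` actually does.

```python
from typing import Dict, List, Sequence, Tuple

HOP_SECONDS = 0.48     # assumed hop between consecutive classifier frames
WINDOW_SECONDS = 0.96  # assumed length of each classifier frame


def merge_frames(
    frames: Sequence[Tuple[str, float]],
    skip_labels: Sequence[str] = ("Speech",),
    max_gap_frames: int = 1,
) -> List[Dict]:
    """Merge per-frame (label, confidence) pairs into events.

    Frames whose label is in skip_labels are dropped. A frame extends the
    previous event when it has the same label and is at most max_gap_frames
    frames after that event's last frame. The event confidence is the
    maximum frame confidence seen.
    """
    events: List[Dict] = []
    for i, (label, conf) in enumerate(frames):
        if label in skip_labels:
            continue  # speech is handled by regular captions, not CC events
        start = i * HOP_SECONDS
        end = start + WINDOW_SECONDS
        last = events[-1] if events else None
        if (
            last is not None
            and last["label"] == label
            and i - last["last_frame"] <= max_gap_frames
        ):
            last["end"] = end
            last["confidence"] = max(last["confidence"], conf)
            last["last_frame"] = i
        else:
            events.append({
                "label": label,
                "start": start,
                "end": end,
                "confidence": conf,
                "last_frame": i,
            })
    return events


frames = [("Speech", 0.9), ("Dog", 0.6), ("Dog", 0.8), ("Speech", 0.9), ("Door", 0.7)]
merged = merge_frames(frames)
```

Raising `max_gap_frames` would let an event survive a short dropout in the classifier output, at the cost of occasionally joining two separate sounds.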

Goal 2 — Speaker Reaction Detection

Run:

python3 reaction_detector.py video.mp4

Implemented:

  • frame extraction around detected events
  • motion analysis using frame differencing
  • face detection using OpenCV Haar cascades
  • combined reaction scoring

Events with low visible impact are filtered out to reduce unnecessary CC suggestions.
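To make the scoring idea concrete, here is a small sketch of frame differencing combined with a face-presence flag. It is written in plain Python on nested lists so it runs without OpenCV. The weights (0.7 for motion, 0.3 for a detected face) and the function names are assumptions for illustration, not values taken from `reaction_detector.py`.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame, pixel values 0..255


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equal-size frames, in 0..1."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / (255 * count) if count else 0.0


def reaction_score(
    motion: float,
    face_present: bool,
    motion_weight: float = 0.7,  # assumed weight
    face_weight: float = 0.3,    # assumed weight
) -> float:
    """Weighted combination of motion and face presence, in 0..1."""
    face_term = 1.0 if face_present else 0.0
    return motion_weight * min(motion, 1.0) + face_weight * face_term


# Two tiny 2x2 frames: the top row changes from black to white.
before = [[0, 0], [0, 0]]
after = [[255, 255], [0, 0]]
m = motion_score(before, after)
score = reaction_score(m, face_present=True)
```

In a real pipeline the frames would come from the video around each event's timestamp, and the face flag from a detector such as OpenCV's Haar cascade classifier.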

Sample outputs and filtering behavior are demonstrated in the attached demo video.


Demo Video

Demo Video


Limitations & Future Improvements

  • YAMNet may not classify some Indian/regional sounds accurately
  • Haar cascades struggle with profile or partially occluded faces
  • Reaction scoring currently uses a limited frame sample window
  • Goal 3 (CC decision engine + SRT/SLS generation) is not yet implemented

imtushar01 changed the title from "git commit -m "Implement Goal 1 and Goal 2 for intelligent CC suggest…" to "Implement Goal 1 and Goal 2 for intelligent CC suggestions" on May 11, 2026
imtushar01 changed the title from "Implement Goal 1 and Goal 2 for intelligent CC suggestions" to "Implement for Goal 1 and Goal 2 for intelligent CC suggestion tool" on May 11, 2026