[DMP 2026]: Create Intelligent Closed Caption (CC) Suggestion Tool

### Ticket Contents

## Description
Our goal is to develop an AI-powered tool that intelligently identifies moments in a video where a Closed Caption (CC) annotation is genuinely necessary — such as when a non-speech audio event meaningfully affects the speakers or the scene — and suggests contextually relevant CC text, without over-captioning routine or low-impact sounds. The tool will analyze both the audio and visual tracks together to determine whether a non-speech event is significant enough to warrant a CC, reducing the manual effort of editors and accessibility teams who currently add CC annotations by hand.


### Goals & Mid-Point Milestone

## Goals
- [ ] **Goal 1: Sound Event Detection Module.** Automatically detect and classify non-speech audio events in a given video file, with confidence scores and timestamps. Steps involved:
  - The video file is taken as input.
  - The audio track is extracted and passed through an open-source sound event detection model.
  - The model classifies events such as honking, explosions, laughter, music, glass breaking, alarms, and applause.
  - The output is a list of detected events with confidence scores and start/end timestamps.
- [ ] **Goal 2: Speaker Reaction Detection Module (Mid-Point Milestone).** Detect visible speaker or scene reactions to audio events using visual analysis of video frames. Steps involved:
  - At each detected audio event timestamp, the corresponding video frames are extracted.
  - A visual analysis model detects reactions such as head turns, startled body language, paused speech, or facial expressions.
  - A reaction confidence score is assigned per event and stored alongside the audio event data for downstream combination.
- [ ] **Goal 3: CC Decision Engine & SRT/SLS Output.** Combine audio event signals and visual reaction signals to make a CC/no-CC decision and generate a labelled output file. Steps involved:
  - The audio event confidence and visual reaction confidence are combined to determine whether a CC is warranted.
  - A CC text label is auto-generated for each accepted event (e.g., [honking], [gunshot], [crowd cheering]).
  - The accepted suggestions are exported with correct timestamps into a standard SRT or SLS file.
  - The tool is tested on a sample set of Hindi and regional-language content, and feedback is collected from editors on suggestion accuracy.

- [ ] The mid-point milestone is the completion of Goal 1 and Goal 2.
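
As a rough illustration of how the Goal 1 and Goal 2 outputs might feed the Goal 3 decision, here is a minimal sketch of a decision combiner. Everything in it is an assumption made for illustration: the `CandidateEvent` fields, the weighted-average rule, and the values of `audio_weight`, `threshold`, and `min_audio_conf` are hypothetical and would need tuning against editor feedback.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    label: str            # e.g. "honking"
    start: float          # seconds from the start of the video
    end: float            # seconds from the start of the video
    audio_conf: float     # 0..1, from the sound event detector (Goal 1)
    reaction_conf: float  # 0..1, from the visual reaction detector (Goal 2)


def cc_score(event, audio_weight=0.5):
    """Weighted average of the two confidences (the weight is an assumed value)."""
    return audio_weight * event.audio_conf + (1.0 - audio_weight) * event.reaction_conf


def select_captions(events, threshold=0.6, min_audio_conf=0.3):
    """Return (start, end, text) for events whose combined score clears the threshold."""
    accepted = []
    for ev in events:
        if ev.audio_conf < min_audio_conf:
            continue  # too uncertain that the sound occurred at all
        if cc_score(ev) >= threshold:
            accepted.append((ev.start, ev.end, f"[{ev.label}]"))
    return accepted


events = [
    # A loud horn that the speaker visibly reacts to.
    CandidateEvent("honking", 12.0, 13.5, audio_conf=0.9, reaction_conf=0.8),
    # Background music that nobody on screen reacts to.
    CandidateEvent("music", 20.0, 45.0, audio_conf=0.95, reaction_conf=0.05),
]
captions = select_captions(events)
```

With these assumed values the horn is accepted and the background music is dropped, even though the detector is more confident about the music. A learned classifier could replace the weighted average later without changing the surrounding data flow.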


### Setup/Installation

_No response_

### Expected Outcome

The Intelligent Closed Caption (CC) Suggestion Tool is a Python-based backend pipeline that accepts any video file as input and produces a ready-to-use SRT or SLS file containing only contextually meaningful, non-speech closed caption annotations — reducing manual effort for accessibility editors and teams working on Hindi and regional-language content.
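
The SRT side of the output could look roughly like the sketch below, which renders accepted suggestions as numbered cues with `HH:MM:SS,mmm` timestamps. The function names and the `(start, end, text)` tuple layout are hypothetical. SLS output is not covered because its layout is not specified in this ticket.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions):
    """Render (start, end, text) tuples as SRT cues, sorted by time and numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(sorted(captions), start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)


srt_text = to_srt([
    (12.0, 13.5, "[honking]"),
    (3725.25, 3727.0, "[crowd cheering]"),
])
```

Writing `srt_text` to a `.srt` file with UTF-8 encoding should keep the file usable alongside Hindi and regional-language subtitles.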

### Acceptance Criteria

The tool should successfully detect non-speech audio events, assess speaker/scene reaction, and produce a CC-annotated SRT or SLS file for any given video file. It must avoid over-captioning ambient sounds that do not affect the speakers or narrative.
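
One possible way to make the over-captioning criterion measurable is to compare the tool's suggestions with captions an editor approved for the same video. The sketch below is an assumption, not part of the ticket: it counts a suggestion as correct when an approved cue with the same label overlaps it in time, so low precision indicates over-captioning and low recall indicates missed events.

```python
def overlaps(a, b):
    """True if two (start, end, label) cues share a label and overlap in time."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def precision_recall(suggested, reference):
    """Compare suggested cues with editor-approved reference cues."""
    matched_suggestions = sum(
        any(overlaps(s, r) for r in reference) for s in suggested
    )
    matched_references = sum(
        any(overlaps(r, s) for s in suggested) for r in reference
    )
    precision = matched_suggestions / len(suggested) if suggested else 1.0
    recall = matched_references / len(reference) if reference else 1.0
    return precision, recall


suggested = [(12.0, 13.5, "[honking]"), (20.0, 45.0, "[music]")]
reference = [(11.8, 13.0, "[honking]"), (60.0, 61.0, "[glass breaking]")]
precision, recall = precision_recall(suggested, reference)
```

In this example one of the two suggestions matches an approved cue and one of the two approved cues is found, so both values are 0.5.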

### Implementation Details

Open-source stack — Python, audio event detection model (e.g., YAMNet or PANNs), OpenCV (frame extraction), MediaPipe or similar (pose and expression analysis), decision combiner logic, SRT/SLS file output.
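
Audio tagging models of this kind typically emit one score per class for each short analysis frame, so a post-processing step is needed to turn those scores into events with start/end timestamps. The sketch below shows one way to do that for a single class. The default `hop` and `window` values assume YAMNet-style framing (0.96 s windows every 0.48 s) and should be adjusted for other models. The function name and the threshold are hypothetical.

```python
def frames_to_events(scores, label, threshold=0.5, hop=0.48, window=0.96):
    """Merge consecutive above-threshold frames into (label, start, end, confidence)."""
    events = []
    run_start = None
    run_scores = []
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if run_start is None:
                run_start = i
            run_scores.append(score)
        elif run_start is not None:
            start = run_start * hop
            end = (i - 1) * hop + window  # end of the last frame in the run
            events.append((label, round(start, 2), round(end, 2), max(run_scores)))
            run_start, run_scores = None, []
    return events


# Per-frame scores for one class, e.g. one column of the model's score matrix.
honking_scores = [0.1, 0.7, 0.9, 0.2, 0.1, 0.6]
honking_events = frames_to_events(honking_scores, "honking")
```

Taking the maximum frame score as the event confidence is one simple choice. The mean over the run would penalise brief spikes more heavily.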

### Mockups/Wireframes

_No response_

### Product Name

Intelligent Closed Caption (CC) Suggestion Tool

### Organisation Name

Planet Read

### Domain

Education

### Tech Skills Needed

Artificial Intelligence, Computer Vision, Python, Machine Learning

### Mentor(s)

@abinash-sketch @keerthiseelan-planetread 

### Category

Backend, Machine Learning, AI
