cc_quality: post-processing validator that ensures generated captions are actually readable by bhuvan-somisetty · Pull Request #24 · PlanetRead/Intelligent-cc-generation

bhuvan-somisetty · 2026-05-14T17:14:23Z

What this is

Every other PR in this project answers the question of when to add a caption. This one answers the question that comes after: are the resulting captions usable by the viewers they are meant to serve?

A caption can be technically present in an SRT file and still be invisible to a deaf student - if it flashes on-screen for 0.4 seconds, or contains 60 words that scroll at 900 WPM, or overlaps the next caption and produces a visual blur. This module catches those problems before the file reaches a viewer.

What changed

cc_quality/ - a standalone post-processing module, zero ML dependencies, no coupling to any other PR:

cc_quality/
├── models.py      # Caption, Violation, ValidationReport dataclasses
├── validator.py   # Five accessibility rules → quality score 0–100
├── optimizer.py   # Timestamp-only auto-fix + SRT serialiser
└── cli.py         # cc-quality CLI

tests/
├── test_validator.py   # 28 tests covering every rule and edge case
├── test_optimizer.py   # 11 tests for auto-fix and round-trip
└── fixtures/           # good.srt, violations.srt, hindi.srt
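
For readers unfamiliar with the SRT format that optimizer.py serialises, the sketch below shows how a cue's timestamps are conventionally written (`HH:MM:SS,mmm` with a comma before the milliseconds). It is only an illustration of the format, not the module's code: `format_srt_time` and `format_cue` are hypothetical names, and rounding to the nearest millisecond is an assumption.

```python
def format_srt_time(seconds: float) -> str:
    """Render seconds as an SRT timestamp, e.g. 2.5 -> '00:00:02,500'."""
    total_ms = round(seconds * 1000)  # assumption: round to nearest millisecond
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one SRT cue: index line, timing line, then the caption text."""
    timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
    return f"{index}\n{timing}\n{text}\n"


print(format_cue(1, 1.0, 2.5, "Hello"))
```

Cues in a full file are separated by a blank line, so a serialiser along these lines would join the rendered cues with `"\n"`.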

Rules enforced

| Rule | Severity | Standard |
| --- | --- | --- |
| `MIN_DURATION` - caption < 1.5 s on-screen | error | BBC Subtitle Guidelines 2024 |
| `READING_SPEED` - WPM exceeds limit | error | FCC 47 CFR § 79.1 (220 adult / 130 children) |
| `LINE_LENGTH` - line too long to fit video frame | warning | BBC (42 chars Latin, 28 Devanagari) |
| `OVERLAP` - caption end > next caption start | error | WCAG 2.1 SC 1.2.2 |
| `MIN_GAP` - gap < 83 ms between captions | warning | ~2 frames at 24 fps |
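
To make the thresholds in the table concrete, here is a minimal sketch of how the four timing rules could be checked. It is not the code in validator.py: `Cue`, `words_per_minute` and `check_timing` are hypothetical names, the thresholds are copied from the table, and WPM is assumed to mean whitespace-separated words divided by on-screen minutes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds copied from the rules table; all names here are illustrative.
MIN_DURATION_S = 1.5
MIN_GAP_S = 0.083  # roughly 2 frames at 24 fps
WPM_LIMITS = {"adult": 220, "children": 130}


@dataclass
class Cue:
    """Hypothetical stand-in for the module's Caption dataclass."""
    start: float  # seconds
    end: float    # seconds
    text: str


def words_per_minute(cue: Cue) -> float:
    """Assumed definition: whitespace-separated words per on-screen minute."""
    duration = cue.end - cue.start
    if duration <= 0:
        return float("inf")
    return len(cue.text.split()) * 60.0 / duration


def check_timing(cues: list[Cue], content_type: str = "adult") -> list[tuple[int, str]]:
    """Return (1-based caption number, rule name) for each timing violation."""
    found = []
    limit = WPM_LIMITS[content_type]
    for i, cue in enumerate(cues):
        if cue.end - cue.start < MIN_DURATION_S:
            found.append((i + 1, "MIN_DURATION"))
        if words_per_minute(cue) > limit:
            found.append((i + 1, "READING_SPEED"))
        if i + 1 < len(cues):
            gap = cues[i + 1].start - cue.end
            if gap < 0:
                found.append((i + 1, "OVERLAP"))   # this cue ends after the next starts
            elif gap < MIN_GAP_S:
                found.append((i + 1, "MIN_GAP"))   # cues touch too closely
    return found


cues = [
    Cue(1.0, 1.8, "Hi"),                    # too short
    Cue(3.0, 4.0, " ".join(["word"] * 16)), # 16 words in 1 s, overlaps next cue
    Cue(3.9, 6.0, "ok fine"),
]
print(check_timing(cues))
```

Sixteen words shown for one second work out to 16 × 60 = 960 WPM, far above either limit, which is why the second cue is flagged for reading speed as well as for duration and overlap.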

CLI

pip install -e .

# Validate
cc-quality output.srt
cc-quality output.srt --content-type children
cc-quality output.srt --report json

# Auto-fix timing violations (never changes caption text)
cc-quality output.srt --fix
cc-quality output.srt --fix --output reviewed.srt

Example output on a problematic file:

────────────────────────────────────────────────────────────
  CC Quality Report  ·  output.srt
────────────────────────────────────────────────────────────
  Quality score : 74.0 / 100
  Captions      : 4  |  Errors : 2  |  Warnings : 1
────────────────────────────────────────────────────────────

  ✗ [MIN_DURATION] Caption #1  @00:01.00
     Caption displays for 0.80s (minimum 1.5s per BBC guidelines)
     → Extend end time to 2.500

  ✗ [READING_SPEED] Caption #2  @00:03.00
     Reading speed 960 WPM exceeds FCC limit of 220 WPM
     → Extend display duration or shorten the caption text
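
The "Extend end time to 2.500" suggestion is the caption's start time plus the 1.5 s minimum (1.000 + 1.5). A timestamp-only fix of that kind could look like the sketch below. It is an assumption about the approach, not the code in optimizer.py: `Cue` and `extend_short_cues` are hypothetical names, and capping the new end time so that the 83 ms gap to the next caption is preserved is a design choice made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

MIN_DURATION_S = 1.5
MIN_GAP_S = 0.083  # roughly 2 frames at 24 fps


@dataclass(frozen=True)
class Cue:
    """Hypothetical stand-in for the module's Caption dataclass."""
    start: float  # seconds
    end: float    # seconds
    text: str


def extend_short_cues(cues: list[Cue]) -> list[Cue]:
    """Lengthen captions shorter than the minimum; text is never touched."""
    fixed = []
    for i, cue in enumerate(cues):
        end = cue.end
        if end - cue.start < MIN_DURATION_S:
            end = cue.start + MIN_DURATION_S
            if i + 1 < len(cues):
                # Assumption: stop short of the next caption, keeping the minimum gap.
                end = min(end, cues[i + 1].start - MIN_GAP_S)
            end = max(end, cue.end)  # never make a caption shorter than it was
        fixed.append(replace(cue, end=end))
    return fixed


before = [Cue(1.0, 1.8, "Hello"), Cue(3.0, 5.0, "World again")]
after = extend_short_cues(before)
print(after[0].end)  # 2.5
```

When the next caption starts too soon for the full 1.5 s to fit, this sketch extends as far as it can and leaves the remaining violation for a human reviewer rather than shifting later captions.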

As a library

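from cc_quality import parse_srt, validate, optimize, write_srt

# Read explicitly as UTF-8 so Devanagari captions are decoded correctly
with open("output.srt", encoding="utf-8") as f:
    captions = parse_srt(f.read())

report = validate(captions, content_type="adult")

print(f"Score: {report.quality_score:.1f}/100")
for v in report.violations:
    print(f"[{v.severity}] {v.rule}: {v.detail}")

# Auto-fix adjusts timestamps only; caption text is left unchanged
if not report.passed():
    fixed = optimize(captions)
    with open("fixed.srt", "w", encoding="utf-8") as f:
        f.write(write_srt(fixed))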

Hindi / Devanagari

Script is auto-detected. Hindi captions get the tighter 28-character line limit and are preserved intact through parse → validate → optimize → write cycles. Test fixture included.
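
One plausible way to auto-detect the script is to look for code points in the Unicode Devanagari block (U+0900 to U+097F). The sketch below is an assumption, not the module's actual detection logic: `is_devanagari`, `line_limit` and `long_lines` are hypothetical helpers, the two limits are taken from the rules table, and measuring length with `len()` (code points, including combining vowel signs) is a simplification.

```python
from __future__ import annotations

# Per-script line limits from the rules table; names are illustrative.
LINE_LIMITS = {"latin": 42, "devanagari": 28}


def is_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari block U+0900..U+097F."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def line_limit(text: str) -> int:
    """Pick the tighter limit as soon as any Devanagari character appears."""
    return LINE_LIMITS["devanagari" if is_devanagari(text) else "latin"]


def long_lines(text: str) -> list[str]:
    """Return the lines of a caption that exceed the limit for its script."""
    limit = line_limit(text)
    return [line for line in text.splitlines() if len(line) > limit]


print(line_limit("नमस्ते दुनिया"))  # 28
print(line_limit("Hello world"))   # 42
```

Counting code points overstates the visual width of Hindi text, because vowel signs and the virama combine with the preceding consonant; a stricter implementation might count grapheme clusters instead.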

Tests

39 passed in 0.17s

No runtime dependencies beyond the standard library; pytest is needed only to run the tests.

Why this fits alongside the other PRs

This module sits at the end of every pipeline, not inside any one of them. It does not compete with Goal 1, 2, or 3 implementations - it validates whatever they produce. Any merged implementation can pipe its SRT output through cc-quality as a final quality gate.

Closes #23

… SRT files

Validates generated captions against WCAG 2.1, FCC 47 CFR § 79.1, and BBC Subtitle Guidelines before they reach viewers. Catches problems that detection modules cannot (reading speed, minimum on-screen duration, overlaps, and line-length limits) and auto-fixes timing violations without altering text.

- validator.py: five rules (MIN_DURATION, READING_SPEED, LINE_LENGTH, OVERLAP, MIN_GAP) with per-caption violation details and a 0-100 quality score
- optimizer.py: timestamp-only auto-fix pass + SRT serialiser
- cli.py: cc-quality CLI with --fix, --report json, --content-type children
- Full Hindi/Devanagari support (script auto-detected, tighter line limit applied)
- 39 passing unit tests with zero external dependencies
- Closes PlanetRead#23

bhuvan-somisetty mentioned this pull request May 16, 2026

feat: syllabic caption localizer for 9 Indian languages with RCI accessibility scoring #29

Open

bhuvan-somisetty wants to merge 1 commit into PlanetRead:main from bhuvan-somisetty:feat/cc-quality-validator
