diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..5d37a97
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+# Demo videos (too large for GitHub)
+demo_videos/*.mp4
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1cac3a9
--- /dev/null
+++ b/README.md
@@ -0,0 +1,1451 @@
+# Intelligent Closed Caption Suggestion Tool
+
+An AI-powered Python backend and editor review tool for generating meaningful non-speech closed caption suggestions from raw video.
+
+The project focuses on detecting moments where a non-speech sound meaningfully affects the scene, speaker, or narrative, then suggesting concise SRT captions such as `[horn honks]`, `[glass breaks]`, or `[crowd cheering]`. The goal is to assist accessibility editors without over-captioning routine, ambient, or low-impact sounds.
+
+## Project Context
+
+- **Product name:** Intelligent Closed Caption (CC) Suggestion Tool
+- **Organisation:** Planet Read
+- **Domain:** Education, accessibility, media tooling
+- **Category:** Backend, Machine Learning, AI, Computer Vision
+- **Primary users:** Accessibility editors, subtitling teams, content review teams
+- **Initial content focus:** Hindi and Indian regional-language videos
+- **Primary output:** SRT files containing only non-speech CC annotations
+
+This project does not generate full dialogue subtitles in the first version. It analyzes raw videos and produces non-speech closed caption suggestions only.
+
+## Implementation Status
+
+The first runnable Python implementation has been started under [`main/`](main/). It includes the modular package, a CLI, diagnostics, mock audio and vision backends, a CPU DSP audio backend, an OpenCV visual baseline, a decision engine, multilingual caption labels, a Streamlit UI client, and SRT/JSON/CSV exports.
+
+Current scaffold commands:
+
+```bash
+cd main
+python -m cc_suggester doctor
+python -m cc_suggester analyze README.md --lang hi --device auto --out outputs
+python -m cc_suggester labels
+python -m pytest tests
+```
+
+The mock backends remain for deterministic tests, while `--audio-backend dsp` and `--vision-backend opencv` provide local real-processing baselines. YAMNet is wired as an optional TensorFlow Hub backend, and MediaPipe is wired as an optional pose-based reaction backend. PANNs, AST, BEATs, and richer MediaPipe face/expression scoring remain documented next steps in [`docs/implementation-plan.md`](docs/implementation-plan.md).
+
+For environments without system ffmpeg, the sample generator can create an OpenCV video plus sidecar WAV file so the DSP/OpenCV path can still be tested locally.
+
+## Interface Overview
+
+### Web UI Editor Review Workspace
+
+The Web UI is built as a modern editor workspace with a **warm dark glassmorphism design** and full light/dark theme support. It features:
+
+- **Interactive video player** with event markers and draggable timeline
+- **Real-time review panel** for editing and accepting/rejecting captions
+- **Multilingual support** with live caption label switching
+- **Device & backend controls** for audio/vision model selection
+- **Comprehensive event table** with all confidence scores and reasoning
+
+#### Dark Mode (Default) — Hindi
+
+![Web UI Dark Mode with Hindi captions](mockups/hindi.png)
+
+The warm dark glassmorphism design features:
+- Deep amber/charcoal background with warm gold accents
+- Frosted glass panels with subtle warm-tinted borders
+- Smooth theme toggle (☀/🌙) for light/dark switching
+
+#### Multilingual Support
+
+**Telugu:**
+![Web UI Telugu](mockups/telugu.png)
+
+**Malayalam:**
+![Web UI Malayalam](mockups/mallu.png)
+
+Caption labels update live across all panels when the language is changed.
+
+#### Architecture & System Diagram
+
+```mermaid
+flowchart TB
+    subgraph Inputs["Video Input"]
+        VIDEO["Raw Video\n(.mp4, .mov, .mkv)"]
+    end
+
+    subgraph Audio["Audio Analysis"]
+        direction TB
+        EXTRACT["Audio Extraction\n(ffmpeg)"]
+        DSP["DSP Baseline\n(RMS, STFT, Onsets)"]
+        A_MODELS["Audio ML Backends\n(YAMNet / PANNs / AST)"]
+        SMOOTH["Event Smoothing\n(Merge, Filter, Normalize)"]
+        EXTRACT --> DSP --> A_MODELS --> SMOOTH
+    end
+
+    subgraph Vision["Visual Reaction"]
+        direction TB
+        FRAMES["Frame Sampler\n(before / during / after)"]
+        FLOW["Optical Flow\n(OpenCV)"]
+        V_MODELS["Vision ML Backends\n(MediaPipe / MMPose)"]
+        REACT["Reaction Scoring"]
+        FRAMES --> FLOW --> REACT
+        FRAMES --> V_MODELS --> REACT
+    end
+
+    subgraph Decision["Decision Engine"]
+        direction TB
+        SCORER["Scorer\n(audio + reaction + importance\n- ambient penalty)"]
+        LABELS["Caption Labels\n(Glossary per language)"]
+        SCORER --> LABELS
+    end
+
+    subgraph Outputs["Exports"]
+        direction LR
+        SRT["SRT\n(accepted captions)"]
+        JSON["JSON\n(full debug report)"]
+        CSV["CSV\n(reviewer spreadsheet)"]
+    end
+
+    subgraph Clients["User Interfaces"]
+        direction LR
+        CLI["CLI\n(ccs analyze / doctor / export)"]
+        WEB["Web UI\n(Streamlit editor workspace)"]
+    end
+
+    VIDEO --> EXTRACT
+    SMOOTH --> AudioEvents["Audio Event\nCandidates"]
+    AudioEvents --> FRAMES
+    AudioEvents --> SCORER
+    REACT --> SCORER
+    LABELS --> SRT
+    SCORER --> JSON
+    SCORER --> CSV
+
+    CLI --> VIDEO
+    CLI --> SCORER
+    WEB --> VIDEO
+    WEB --> SCORER
+
+    style Inputs fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+    style Audio fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+    style Vision fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+    style Decision fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+    style Outputs fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+    style Clients fill:#1e1308,stroke:#f59e0b,color:#f0e4cc
+```
+
+### Demo Video & Sample Data
+
+Demo videos, sample recordings, and sample SRT outputs are available in this [Google Drive folder](https://drive.google.com/drive/folders/1Ti5aqztP9VHas_5AbrH7utSn-G27HZXW?usp=sharing).
+
+## Problem Statement
+
+Accessibility editors currently add non-speech closed caption annotations by hand. This is time-consuming and requires judgment: not every sound should be captioned.
+
+For example:
+
+- A horn that causes a speaker to turn around may need `[horn honks]`.
+- Constant background traffic may not need any caption.
+- A glass breaking off-screen may need `[glass breaks]` if it affects the scene.
+- Background music may not need a caption unless it is narratively important.
+
+The tool should detect candidate sound events, inspect nearby visual reaction cues, decide whether the event is meaningful enough to caption, and export accepted suggestions into an SRT file.
+
+## Goals
+
+### Goal 1: Sound Event Detection Module
+
+Automatically detect and classify non-speech audio events in a given video file with confidence scores and timestamps.
+
+Expected behavior:
+
+- Accept a video file as input.
+- Extract the audio track.
+- Run audio analysis using a pluggable sound event detection backend.
+- Detect events such as honking, explosions, laughter, music, glass breaking, alarms, applause, door slams, phone rings, and crowd reactions.
+- Produce timestamped audio event candidates with confidence scores.
+
+Output example:
+
+```json
+{
+  "event_id": "horn_honk",
+  "label": "Horn honk",
+  "start_time": 12.4,
+  "end_time": 13.8,
+  "audio_confidence": 0.87
+}
+```
+
+### Goal 2: Speaker Reaction Detection Module
+
+Detect visible speaker or scene reactions to audio events using visual analysis of video frames.
+
+Expected behavior:
+
+- For each detected audio event, sample video frames before, during, and after the event.
+- Detect visual reaction cues such as:
+  - head turn
+  - sudden posture shift
+  - startled body movement
+  - facial expression change
+  - mouth/eye/brow change
+  - speech pause or freeze
+  - scene-level movement spike
+- Assign a reaction confidence score per event.
+- Store visual analysis results alongside audio event data.
+
+Output example:
+
+```json
+{
+  "event_id": "horn_honk",
+  "start_time": 12.4,
+  "end_time": 13.8,
+  "reaction_confidence": 0.71,
+  "reaction_signals": {
+    "head_turn": 0.82,
+    "optical_flow_spike": 0.64,
+    "facial_expression_change": 0.55
+  }
+}
+```
+
+### Goal 3: CC Decision Engine and SRT Output
+
+Combine audio and visual signals to decide whether a caption is warranted, then export accepted captions to SRT.
+
+Expected behavior:
+
+- Combine audio confidence, visual reaction confidence, event importance, and ambient sound penalties.
+- Reject low-impact ambient sounds.
+- Generate short, editor-friendly CC labels.
+- Export accepted captions to SRT.
+- Export full debug results to JSON/CSV for review.
+
+Example accepted SRT:
+
+```srt
+1
+00:00:12,400 --> 00:00:13,800
+[horn honks]
+
+2
+00:01:03,100 --> 00:01:04,600
+[glass breaks]
+```
+
+## Midpoint Milestone
+
+The midpoint milestone is completion of Goal 1 and Goal 2.
+
+At midpoint, the project should demonstrate:
+
+- CLI accepts a raw video input.
+- Audio is extracted successfully.
+- Non-speech sound events are detected with timestamps and confidence scores.
+- Video frames are sampled around detected events.
+- Visual reaction scores are computed and attached to each audio event.
+- JSON debug output is generated.
+- Basic SRT export may exist, but the final decision engine can still be simple.
+- The pipeline runs on CPU and can optionally use GPU when available.
+- The project is tested on a small sample set of Hindi and regional-language videos.
+
+## Final Expected Outcome
+
+The final project should provide:
+
+- A Python-based backend pipeline.
+- A command-line interface.
+- A web-based editor review UI.
+- Pluggable audio and vision model backends.
+- CPU/GPU device selection and diagnostics.
+- Multilingual non-speech CC label export.
+- SRT export for accepted captions.
+- JSON/CSV debug reports for all candidates.
+- Documentation for installation, usage, troubleshooting, and contribution.
+
+## Non-Goals for Version 1
+
+The first version will not focus on:
+
+- Full dialogue transcription.
+- Full dialogue translation.
+- Dubbing.
+- Speaker diarization.
+- Live real-time captioning.
+- Perfect automatic caption approval without editor review.
+- Training a custom large model from scratch.
+
+These can become future extensions, but the core value is non-speech CC suggestion.
+
+## Supported Languages
+
+Version 1 should support caption label export in:
+
+- English
+- Hindi
+- Tamil
+- Telugu
+- Bengali
+- Marathi
+- Malayalam
+
+Ideally, caption labels use the same language as the video. Because Version 1 only generates non-speech captions, language handling can stay simple:
+
+- User selects the video language in CLI or UI.
+- If language is not selected, default to English.
+- Later, add automatic spoken-language detection.
+
+Caption labels should be generated from a curated glossary first. Machine translation can be used later as a fallback, but editor-approved labels are safer because CC labels must be short, consistent, and natural.
+
+Example:
+
+| Event ID | English | Hindi |
+| --- | --- | --- |
+| `horn_honk` | `[horn honks]` | `[हॉर्न बजता है]` |
+| `glass_break` | `[glass breaks]` | `[कांच टूटता है]` |
+| `crowd_cheer` | `[crowd cheering]` | `[भीड़ जयकार करती है]` |
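+
+The sketch below shows one way to implement a glossary-first lookup with an English fallback. The in-memory `GLOSSARY` dict stands in for the planned `label_maps/events.<lang>.json` files. The `caption_for` helper, its fallback order, and the placeholder format for unknown events are all assumptions made for illustration.
+
+```python
+from __future__ import annotations
+
+# Hypothetical in-memory glossary; the planned project would load
+# label_maps/events.<lang>.json files instead.
+GLOSSARY: dict[str, dict[str, str]] = {
+    "en": {
+        "horn_honk": "[horn honks]",
+        "glass_break": "[glass breaks]",
+        "crowd_cheer": "[crowd cheering]",
+    },
+    "hi": {
+        "horn_honk": "[हॉर्न बजता है]",
+        "glass_break": "[कांच टूटता है]",
+        "crowd_cheer": "[भीड़ जयकार करती है]",
+    },
+}
+
+DEFAULT_LANGUAGE = "en"
+
+
+def caption_for(event_id: str, language: str | None = None) -> str:
+    """Return the curated label, falling back to English, then to the raw event ID."""
+    labels = GLOSSARY.get(language or DEFAULT_LANGUAGE, {})
+    if event_id in labels:
+        return labels[event_id]
+    # Fall back to the English glossary when a language lacks this event.
+    english = GLOSSARY[DEFAULT_LANGUAGE]
+    if event_id in english:
+        return english[event_id]
+    # Last resort: a readable placeholder that an editor should review.
+    return f"[{event_id.replace('_', ' ')}]"
+
+
+print(caption_for("horn_honk", "hi"))  # [हॉर्न बजता है]
+print(caption_for("glass_break", "ta"))  # [glass breaks]
+```
+
+Falling back to English rather than to machine translation keeps unapproved translated labels out of exported SRT files. An English label in a Tamil file is also easy for an editor to spot and correct.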
+
+## High-Level Architecture
+
+The project should be designed as reusable modules, not as logic embedded inside the CLI or Web UI.
+
+```text
+Core pipeline modules
+  used by CLI
+  used by Web UI
+  later used by VLC plugin, API, or desktop app
+```
+
+The diagrams below use Mermaid, which renders directly in GitHub and many Markdown viewers.
+
+```mermaid
+flowchart TB
+    subgraph Clients["User-Facing Clients"]
+        CLI["CLI\nccs analyze / doctor / export"]
+        WEB["Web UI\neditor review workspace"]
+        VLC["Future VLC Plugin"]
+        API["Future Local API"]
+    end
+
+    subgraph Core["Reusable Core Pipeline"]
+        PIPE["Pipeline Orchestrator"]
+        CONFIG["Config + Thresholds"]
+        DIAG["Diagnostics + Friendly Errors"]
+        TYPES["Shared Data Models"]
+    end
+
+    subgraph Audio["Audio Analysis"]
+        EXTRACT["Audio Extraction"]
+        DSP["DSP Features\nFFT / STFT / RMS / Onsets"]
+        A_BACKENDS["Audio Backends\nYAMNet / PANNs / AST / BEATs"]
+        EVENTS["Event Smoothing\nMerge / Filter / Normalize"]
+    end
+
+    subgraph Vision["Visual Reaction Analysis"]
+        FRAMES["Frame Sampler"]
+        FLOW["Optical Flow"]
+        V_BACKENDS["Vision Backends\nMediaPipe / MMPose / MMAction2"]
+        REACT["Reaction Scoring"]
+    end
+
+    subgraph Decision["Caption Decision"]
+        SCORE["Decision Scorer"]
+        RULES["Importance Rules\nAmbient Penalties"]
+        LABELS["Caption Labels\nGlossary + Translation"]
+    end
+
+    subgraph Outputs["Exports"]
+        SRT["SRT"]
+        JSON["JSON Debug Report"]
+        CSV["CSV Review Report"]
+    end
+
+    CLI --> PIPE
+    WEB --> PIPE
+    VLC --> API
+    API --> PIPE
+
+    PIPE --> CONFIG
+    PIPE --> DIAG
+    PIPE --> TYPES
+    PIPE --> EXTRACT
+    EXTRACT --> DSP
+    DSP --> A_BACKENDS
+    A_BACKENDS --> EVENTS
+    EVENTS --> FRAMES
+    FRAMES --> FLOW
+    FRAMES --> V_BACKENDS
+    FLOW --> REACT
+    V_BACKENDS --> REACT
+    EVENTS --> SCORE
+    REACT --> SCORE
+    RULES --> SCORE
+    SCORE --> LABELS
+    LABELS --> SRT
+    SCORE --> JSON
+    SCORE --> CSV
+```
+
+Recommended repository structure:
+
+```text
+cc-suggester/
+  cc_suggester/
+    core/
+      pipeline.py
+      config.py
+      diagnostics.py
+      errors.py
+      types.py
+
+    audio/
+      extractor.py
+      dsp.py
+      vad.py
+      events.py
+      backends/
+        base.py
+        yamnet.py
+        panns.py
+        ast.py
+        beats.py
+
+    vision/
+      frame_sampler.py
+      optical_flow.py
+      reactions.py
+      backends/
+        base.py
+        mediapipe_face.py
+        mediapipe_pose.py
+        mmaction.py
+
+    decision/
+      scorer.py
+      rules.py
+      labels.py
+
+    output/
+      srt.py
+      json_report.py
+      csv_report.py
+
+    translation/
+      glossary.py
+      indictrans.py
+
+    cli/
+      app.py
+
+    ui/
+      streamlit_app.py
+
+  configs/
+    default.yaml
+    cpu.yaml
+    gpu.yaml
+
+  label_maps/
+    events.en.json
+    events.hi.json
+    events.ta.json
+    events.te.json
+    events.bn.json
+    events.mr.json
+    events.ml.json
+
+  docs/
+    architecture.md
+    cli.md
+    web-ui.md
+    models.md
+    troubleshooting.md
+    evaluation.md
+    vlc-plugin.md
+
+  examples/
+    README.md
+
+  tests/
+    unit/
+    integration/
+
+  requirements.txt
+  requirements-ui.txt
+  requirements-dev.txt
+  requirements-translate.txt
+  README.md
+  CONTRIBUTING.md
+  LICENSE
+```
+
+The exact file names can change during implementation, but the separation of responsibilities should remain.
+
+## Data Model
+
+The pipeline should pass structured objects between modules.
+
+### Audio Event Candidate
+
+Represents a detected sound event before visual analysis.
+
+Fields:
+
+- `event_id`
+- `label`
+- `start_time`
+- `end_time`
+- `audio_confidence`
+- `audio_backend`
+- `raw_class_name`
+- `debug_info`
+
+### Reaction Result
+
+Represents visual reaction evidence for an audio event.
+
+Fields:
+
+- `event_id`
+- `start_time`
+- `end_time`
+- `reaction_confidence`
+- `reaction_signals`
+- `frames_sampled`
+- `vision_backend`
+- `debug_info`
+
+### Caption Suggestion
+
+Represents the final decision.
+
+Fields:
+
+- `event_id`
+- `start_time`
+- `end_time`
+- `audio_confidence`
+- `reaction_confidence`
+- `decision_score`
+- `accepted`
+- `reason`
+- `caption_text`
+- `language`
+- `requires_review`
+- `debug_info`
+
+This structure allows the same result to be used by:
+
+- CLI output
+- Web UI review panel
+- SRT export
+- JSON report
+- CSV report
+- future VLC integration
+
+```mermaid
+classDiagram
+    class AudioEventCandidate {
+        string event_id
+        string label
+        float start_time
+        float end_time
+        float audio_confidence
+        string audio_backend
+        string raw_class_name
+        dict debug_info
+    }
+
+    class ReactionResult {
+        string event_id
+        float start_time
+        float end_time
+        float reaction_confidence
+        dict reaction_signals
+        int frames_sampled
+        string vision_backend
+        dict debug_info
+    }
+
+    class CaptionSuggestion {
+        string event_id
+        float start_time
+        float end_time
+        float audio_confidence
+        float reaction_confidence
+        float decision_score
+        bool accepted
+        string reason
+        string caption_text
+        string language
+        bool requires_review
+        dict debug_info
+    }
+
+    AudioEventCandidate --> ReactionResult : analyzed visually at event timestamp
+    AudioEventCandidate --> CaptionSuggestion : contributes audio evidence
+    ReactionResult --> CaptionSuggestion : contributes visual evidence
+```
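+
+The three records map naturally onto Python dataclasses. This minimal sketch uses only the standard-library `dataclasses` module, with no validation library. Field names and types follow the lists and class diagram above. The empty-dict default for `debug_info` is an illustrative choice.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any
+
+
+@dataclass
+class AudioEventCandidate:
+    """A detected sound event before visual analysis."""
+    event_id: str
+    label: str
+    start_time: float  # seconds
+    end_time: float  # seconds
+    audio_confidence: float
+    audio_backend: str
+    raw_class_name: str
+    debug_info: dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class ReactionResult:
+    """Visual reaction evidence for one audio event."""
+    event_id: str
+    start_time: float
+    end_time: float
+    reaction_confidence: float
+    reaction_signals: dict[str, float]
+    frames_sampled: int
+    vision_backend: str
+    debug_info: dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class CaptionSuggestion:
+    """The final caption decision for one event."""
+    event_id: str
+    start_time: float
+    end_time: float
+    audio_confidence: float
+    reaction_confidence: float
+    decision_score: float
+    accepted: bool
+    reason: str
+    caption_text: str
+    language: str
+    requires_review: bool
+    debug_info: dict[str, Any] = field(default_factory=dict)
+
+
+event = AudioEventCandidate(
+    event_id="horn_honk", label="Horn honk", start_time=12.4, end_time=13.8,
+    audio_confidence=0.87, audio_backend="yamnet",
+    raw_class_name="Vehicle horn, car horn, honking",
+)
+```
+
+Keeping `debug_info` as a free-form dict lets each backend attach its own diagnostics without changing the shared schema. `dataclasses.asdict` can then serialize any of these records for the JSON report.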
+
+## Pipeline Flow
+
+```text
+Input video
+  -> validate input
+  -> extract metadata
+  -> extract audio
+  -> run DSP candidate detection
+  -> run sound event model backend
+  -> merge and smooth audio events
+  -> sample frames around event timestamps
+  -> run visual reaction analysis
+  -> combine audio and visual signals
+  -> generate caption suggestions
+  -> export SRT, JSON, CSV
+```
+
+```mermaid
+flowchart TD
+    A["Raw video input"] --> B{"Valid video?"}
+    B -- "No" --> B_ERR["Friendly error\nsuggest inspect/doctor command"]
+    B -- "Yes" --> C["Extract metadata\nfps, duration, resolution"]
+    C --> D["Extract audio with ffmpeg"]
+    D --> E["Compute DSP features\nRMS, STFT, spectral flux"]
+    E --> F["Run audio backend\nYAMNet first, PANNs/AST later"]
+    F --> G["Smooth + merge detections"]
+    G --> H["Audio event candidates"]
+    H --> I["Sample frames around each event"]
+    I --> J["Run visual backends\nMediaPipe face/pose + optical flow"]
+    J --> K["Reaction confidence per event"]
+    H --> L["Decision engine"]
+    K --> L
+    L --> M{"Caption warranted?"}
+    M -- "No" --> N["Rejected candidate\nkept in JSON/CSV debug report"]
+    M -- "Yes" --> O["Accepted caption suggestion"]
+    O --> P["Language label mapping"]
+    P --> Q["Export SRT"]
+    L --> R["Export full JSON report"]
+    L --> S["Export CSV review report"]
+```
+
+## Audio Module Plan
+
+The audio module should combine explainable signal processing with model-based classification.
+
+### DSP Baseline
+
+Use lightweight mathematical features to find candidate regions and explain event salience:
+
+- RMS energy
+- short-time Fourier transform
+- log-mel spectrogram
+- spectral flux
+- onset strength
+- zero-crossing rate
+- peak detection
+- duration filtering
+
+This layer is useful because it is:
+
+- fast
+- CPU-friendly
+- explainable
+- helpful for debugging model outputs
+
+However, DSP should not be the final classifier. It can identify that something happened, but not reliably classify what happened.
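+
+The sketch below illustrates the RMS and peak-detection steps without any dependencies. It flags frames whose RMS energy rises well above the clip's median level. The frame size, hop, `ratio`, and median-based threshold are assumptions chosen for readability. The `min_duration` default mirrors `min_event_duration` in the example configuration. A real implementation would more likely use NumPy or librosa and combine several of the features listed above.
+
+```python
+from __future__ import annotations
+
+import math
+
+
+def frame_rms(samples: list[float], frame_size: int = 1024, hop: int = 512) -> list[float]:
+    """Root-mean-square energy of each overlapping analysis frame."""
+    return [
+        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
+        for i in range(0, len(samples) - frame_size + 1, hop)
+    ]
+
+
+def candidate_regions(energies: list[float], hop: int, sample_rate: int,
+                      ratio: float = 4.0, min_duration: float = 0.25) -> list[tuple[float, float]]:
+    """Group frames louder than `ratio` x the median RMS into (start, end) times in seconds."""
+    if not energies:
+        return []
+    median = sorted(energies)[len(energies) // 2]
+    threshold = ratio * max(median, 1e-8)
+    frame_seconds = hop / sample_rate
+    regions: list[tuple[float, float]] = []
+    start = None
+    for i, energy in enumerate(energies + [0.0]):  # trailing 0.0 closes an open region
+        if energy > threshold and start is None:
+            start = i
+        elif energy <= threshold and start is not None:
+            # The end time is approximated by the start of the first quiet frame.
+            t0, t1 = start * frame_seconds, i * frame_seconds
+            if t1 - t0 >= min_duration:
+                regions.append((round(t0, 3), round(t1, 3)))
+            start = None
+    return regions
+
+
+# Synthetic 2 s clip at 16 kHz: a quiet tone with a loud burst from 1.0 s to 1.5 s.
+sr = 16_000
+clip = [
+    (0.8 if 1.0 <= n / sr < 1.5 else 0.01) * math.sin(2 * math.pi * 440 * n / sr)
+    for n in range(2 * sr)
+]
+print(candidate_regions(frame_rms(clip), hop=512, sample_rate=sr))
+```
+
+On the synthetic clip this reports one region of roughly 0.96 s to 1.50 s. The region starts slightly early because overlapping frames begin picking up the burst before it starts. As noted above, this tells us only that something happened, not what it was.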
+
+### Model Backends
+
+Recommended backend priority:
+
+1. **YAMNet** as the first baseline.
+2. **PANNs** as a stronger optional backend.
+3. **AST** for transformer-based audio classification experiments.
+4. **BEATs** for advanced audio representation experiments.
+5. **CLAP** later for open-vocabulary event matching.
+
+The backend interface should stay stable:
+
+```text
+detect(audio_path, config) -> list of audio events
+```
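+
+One way to keep `detect(audio_path, config)` stable across backends is to pair a `typing.Protocol` with a name-to-backend registry. The `AudioEvent`, `MockAudioBackend`, `BACKENDS`, and `run_audio_backend` names below are hypothetical. They illustrate the shape of the contract, not the project's actual module layout.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Protocol
+
+
+@dataclass
+class AudioEvent:
+    """Minimal event record returned by a backend (simplified for this sketch)."""
+    event_id: str
+    start_time: float
+    end_time: float
+    audio_confidence: float
+
+
+class AudioBackend(Protocol):
+    """Contract every audio backend (YAMNet, PANNs, AST, ...) would satisfy."""
+    name: str
+
+    def detect(self, audio_path: str, config: dict[str, Any]) -> list[AudioEvent]:
+        ...
+
+
+class MockAudioBackend:
+    """Deterministic backend for tests: always 'hears' one horn event."""
+    name = "mock"
+
+    def detect(self, audio_path: str, config: dict[str, Any]) -> list[AudioEvent]:
+        threshold = config.get("audio_threshold", 0.45)
+        events = [AudioEvent("horn_honk", 12.4, 13.8, 0.87)]
+        return [e for e in events if e.audio_confidence >= threshold]
+
+
+BACKENDS: dict[str, AudioBackend] = {"mock": MockAudioBackend()}
+
+
+def run_audio_backend(name: str, audio_path: str, config: dict[str, Any]) -> list[AudioEvent]:
+    """Look up a backend by name and run it, with a friendly error for unknown names."""
+    try:
+        backend = BACKENDS[name]
+    except KeyError:
+        raise ValueError(f"Unknown audio backend {name!r}; available: {sorted(BACKENDS)}") from None
+    return backend.detect(audio_path, config)
+```
+
+With this structure, the pipeline depends only on the protocol. Heavier backends such as a TensorFlow-based YAMNet wrapper can then be imported lazily and registered only when their dependencies are installed.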
+
+### Event Smoothing
+
+Raw model outputs should be post-processed:
+
+- merge adjacent detections of the same event
+- remove very short low-confidence events
+- suppress speech-like classes unless desired
+- suppress constant ambient sounds
+- normalize model labels into project event IDs
+
+Example:
+
+```text
+Raw model labels:
+  Vehicle horn, car horn, honking
+
+Normalized event ID:
+  horn_honk
+```
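+
+The smoothing steps above could be sketched as follows. `LABEL_MAP` is a hypothetical normalization table; the class names in it are AudioSet-style labels of the kind YAMNet emits. The `merge_gap`, `min_duration`, and `min_confidence` defaults mirror `merge_gap`, `min_event_duration`, and `audio_threshold` in the example configuration. Dropping unmapped classes is one simple way to suppress speech and other classes the project does not caption.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+# Hypothetical normalization table from raw model class names to project event IDs.
+LABEL_MAP = {
+    "Vehicle horn, car horn, honking": "horn_honk",
+    "Shatter": "glass_break",
+}
+
+
+@dataclass
+class Detection:
+    label: str
+    start: float  # seconds
+    end: float  # seconds
+    confidence: float
+
+
+def smooth(detections: list[Detection], merge_gap: float = 0.40,
+           min_duration: float = 0.25, min_confidence: float = 0.45) -> list[Detection]:
+    """Normalize labels, merge nearby same-event detections, and drop short weak ones."""
+    # Unmapped classes (e.g. "Speech") are suppressed by skipping them.
+    normalized = sorted(
+        (Detection(LABEL_MAP[d.label], d.start, d.end, d.confidence)
+         for d in detections if d.label in LABEL_MAP),
+        key=lambda d: (d.label, d.start),
+    )
+    merged: list[Detection] = []
+    for d in normalized:
+        last = merged[-1] if merged else None
+        if last and last.label == d.label and d.start - last.end <= merge_gap:
+            last.end = max(last.end, d.end)
+            last.confidence = max(last.confidence, d.confidence)
+        else:
+            merged.append(d)
+    # Drop events that are both very short and low-confidence.
+    kept = [d for d in merged
+            if not (d.end - d.start < min_duration and d.confidence < min_confidence)]
+    return sorted(kept, key=lambda d: d.start)
+
+
+raw = [
+    Detection("Vehicle horn, car horn, honking", 12.4, 12.9, 0.81),
+    Detection("Vehicle horn, car horn, honking", 13.1, 13.8, 0.87),
+    Detection("Speech", 12.0, 15.0, 0.95),
+    Detection("Shatter", 63.1, 63.2, 0.30),
+]
+print(smooth(raw))
+# [Detection(label='horn_honk', start=12.4, end=13.8, confidence=0.87)]
+```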
+
+## Vision Module Plan
+
+The vision module should detect whether people or the scene visibly react to an audio event.
+
+### Frame Sampling
+
+For each audio event, sample frames from:
+
+- before the event
+- during the event
+- after the event
+
+Example:
+
+```text
+event_start - 1.0s
+event_start - 0.5s
+event_start
+event_midpoint
+event_end
+event_end + 0.5s
+event_end + 1.0s
+```
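+
+Converting the offsets above into concrete timestamps is simple arithmetic. The sketch below also clamps them to the video length, so events near the start or end of a file do not request frames outside it. The `before` and `after` defaults mirror `sample_window_before` and `sample_window_after` in the example configuration. The function name is hypothetical.
+
+```python
+def sample_timestamps(start: float, end: float, duration: float,
+                      before: float = 1.0, after: float = 1.0) -> list[float]:
+    """Seconds at which to grab frames before, during, and after one audio event."""
+    points = [
+        start - before,
+        start - before / 2,
+        start,
+        (start + end) / 2,  # event midpoint
+        end,
+        end + after / 2,
+        end + after,
+    ]
+    # Clamp to [0, duration], round, and drop duplicates that clamping can create.
+    return sorted({round(min(max(t, 0.0), duration), 3) for t in points})
+
+
+print(sample_timestamps(12.4, 13.8, duration=120.0))
+# [11.4, 11.9, 12.4, 13.1, 13.8, 14.3, 14.8]
+```
+
+Each timestamp could then be read with OpenCV, for example by setting `cv2.CAP_PROP_POS_MSEC` on a `cv2.VideoCapture` before calling `read()`.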
+
+### Reaction Signals
+
+The reaction score can combine:
+
+- head turn magnitude
+- pose shift magnitude
+- sudden optical flow spike
+- facial expression change
+- mouth open or close change
+- eye/brow movement
+- speaker pause proxy
+- scene movement spike
+
+### First Backend
+
+Use:
+
+- OpenCV for frame extraction and optical flow.
+- MediaPipe Face Landmarker for facial landmarks and expression blendshapes.
+- MediaPipe Pose Landmarker for body and head movement.
+
+This is suitable for the midpoint because it is interpretable and can run on CPU.
+
+### Future Backends
+
+Potential later backends:
+
+- MMPose for stronger pose estimation.
+- MMAction2 for action recognition.
+- Video-language models for heavier scene reasoning.
+
+These should remain optional because they may be GPU-heavy.
+
+## Decision Engine Plan
+
+The decision engine decides whether a sound event deserves a caption.
+
+A simple scoring formula:
+
+```text
+decision_score =
+  audio_confidence
+  + reaction_confidence
+  + event_importance_prior
+  + speech_pause_bonus
+  - ambient_penalty
+```
+
+Example rules:
+
+- Caption high-impact events even if reaction is weak:
+  - gunshot
+  - explosion
+  - alarm
+  - siren
+  - glass breaking
+- Require reaction or high confidence for common events:
+  - horn
+  - door slam
+  - phone ring
+  - applause
+- Usually reject ambient continuous sounds:
+  - fan noise
+  - traffic hum
+  - low background music
+  - crowd murmur
+
+Every decision should include a human-readable reason.
+
+Example:
+
+```text
+Accepted because the audio model detected horn_honk with high confidence and the speaker turned their head immediately after the event.
+```
+
+Example rejection:
+
+```text
+Rejected because traffic noise was continuous, low-confidence, and no visible reaction was detected.
+```
+
+```mermaid
+flowchart LR
+    A["Audio confidence"] --> E["Decision scorer"]
+    B["Reaction confidence"] --> E
+    C["Event importance prior"] --> E
+    D["Ambient sound penalty"] --> E
+    P["Speech pause / scene impact bonus"] --> E
+
+    E --> F{"Decision score >= threshold?"}
+    F -- "Yes" --> G["Accept caption"]
+    F -- "Borderline" --> H["Needs editor review"]
+    F -- "No" --> I["Reject candidate"]
+
+    G --> J["Generate caption text"]
+    H --> J
+    I --> K["Keep reason in debug output"]
+    J --> L["SRT / JSON / CSV"]
+```
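+
+The additive formula and the three-way outcome in the diagram could be sketched as below. The prior and penalty values, the `accept_threshold` of 1.0, and the review band are placeholders. Because the unweighted sum adds two confidences, scores can exceed 1.0, so a real scorer would tune these thresholds (or add weights) against editor-reviewed data. The `decide` name and its tuple return shape are hypothetical.
+
+```python
+# Illustrative values only; the plan leaves priors, penalties, and weights to config/rules.
+IMPORTANCE_PRIOR = {"gunshot": 0.4, "explosion": 0.4, "alarm": 0.3, "siren": 0.3, "glass_break": 0.3}
+AMBIENT_PENALTY = {"fan_noise": 0.6, "traffic_hum": 0.6, "background_music": 0.4, "crowd_murmur": 0.5}
+
+
+def decide(event_id: str, audio_confidence: float, reaction_confidence: float,
+           speech_pause_bonus: float = 0.0, accept_threshold: float = 1.0,
+           review_margin: float = 0.2) -> tuple[float, str, str]:
+    """Apply the additive decision_score formula and map the score to accept/review/reject."""
+    score = (
+        audio_confidence
+        + reaction_confidence
+        + IMPORTANCE_PRIOR.get(event_id, 0.0)
+        + speech_pause_bonus
+        - AMBIENT_PENALTY.get(event_id, 0.0)
+    )
+    if score >= accept_threshold:
+        status = "accept"
+    elif score >= accept_threshold - review_margin:
+        status = "review"  # borderline: flag for editor review
+    else:
+        status = "reject"
+    reason = (f"{status}: {event_id} scored {score:.2f} "
+              f"(audio {audio_confidence:.2f}, reaction {reaction_confidence:.2f})")
+    return round(score, 3), status, reason
+
+
+for args in [("horn_honk", 0.87, 0.71), ("door_slam", 0.60, 0.25), ("traffic_hum", 0.40, 0.05)]:
+    print(decide(*args)[2])
+```
+
+Here the horn is accepted, the door slam falls into the review band, and the traffic hum is rejected by its ambient penalty. A production reason string would be phrased more like the human-readable examples above.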
+
+## CLI Plan
+
+The CLI should be useful for developers, batch processing, debugging, and reviewers who prefer terminal workflows.
+
+Recommended command shape:
+
+```bash
+ccs analyze input.mp4 --lang hi --device auto
+ccs analyze input.mp4 --audio-backend yamnet --vision-backend mediapipe --out outputs/
+ccs inspect input.mp4
+ccs doctor
+ccs export outputs/result.json --format srt --lang ta
+ccs web
+```
+
+### CLI Commands
+
+| Command | Purpose |
+| --- | --- |
+| `ccs analyze` | Run full pipeline on a video |
+| `ccs audio` | Run only sound event detection |
+| `ccs vision` | Run visual reaction analysis from existing audio events |
+| `ccs export` | Convert JSON results to SRT/CSV |
+| `ccs inspect` | Show video metadata and input validity |
+| `ccs doctor` | Check environment, ffmpeg, models, CPU/GPU |
+| `ccs web` | Launch the Web UI |
+
+### CLI Error Suggestions
+
+The CLI should explain errors and suggest next steps.
+
+Wrong command example:
+
+```text
+No such command: analize
+Did you mean: analyze?
+
+Try:
+  ccs analyze input.mp4 --device auto --lang hi
+```
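+
+Python's standard library can produce the "Did you mean" hint above with `difflib.get_close_matches`. The command names and message wording below come from the CLI plan. The `unknown_command_message` helper is hypothetical, and the 0.6 cutoff is simply difflib's default similarity threshold.
+
+```python
+import difflib
+
+COMMANDS = ["analyze", "audio", "vision", "export", "inspect", "doctor", "web"]
+
+
+def unknown_command_message(typed: str) -> str:
+    """Build the friendly error text shown for a mistyped subcommand."""
+    lines = [f"No such command: {typed}"]
+    matches = difflib.get_close_matches(typed, COMMANDS, n=1, cutoff=0.6)
+    if matches:
+        lines.append(f"Did you mean: {matches[0]}?")
+    lines += ["", "Try:", "  ccs analyze input.mp4 --device auto --lang hi"]
+    return "\n".join(lines)
+
+
+print(unknown_command_message("analize"))
+```
+
+For `analize`, this prints the same message as the example above.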
+
+Missing video example:
+
+```text
+Input file was not found:
+  videos/sample.mp4
+
+Suggestions:
+1. Check the path.
+2. Run:
+   ccs inspect /path/to/video.mp4
+```
+
+GPU failure example:
+
+```text
+CUDA was requested, but no usable GPU was detected.
+
+Detected:
+- torch.cuda.is_available(): False
+- CUDA runtime: not found
+- NVIDIA driver: not found
+
+Suggestions:
+1. Retry on CPU:
+   ccs analyze input.mp4 --device cpu
+
+2. Check environment:
+   ccs doctor
+
+3. Install a CUDA-compatible PyTorch build if GPU acceleration is required.
+```
+
+## Device Handling
+
+The project should support:
+
+```text
+device = auto | cpu | cuda
+```
+
+Behavior:
+
+- `auto`: use GPU if available, otherwise CPU.
+- `cpu`: force CPU.
+- `cuda`: require GPU; fail clearly if unavailable.
+
+Each run should save device metadata:
+
+- selected device
+- actual device used
+- model backend
+- GPU name if available
+- CUDA availability
+- runtime
+- fallback reason if CPU was used
+
+The UI should provide:
+
+- Auto/CPU/GPU toggle
+- GPU diagnostics popup
+- Retry on CPU button
+- Copy diagnostic report button
+
+```mermaid
+flowchart TD
+    A["User selects device mode"] --> B{"Mode"}
+    B -- "auto" --> C{"GPU available?"}
+    C -- "Yes" --> D["Use GPU"]
+    C -- "No" --> E["Fallback to CPU\nrecord fallback reason"]
+    B -- "cpu" --> F["Force CPU"]
+    B -- "cuda" --> G{"GPU available?"}
+    G -- "Yes" --> D
+    G -- "No" --> H["Stop with clear diagnostic"]
+    H --> I["Suggest: retry with --device cpu"]
+    H --> J["Suggest: run ccs doctor"]
+    D --> K["Save device metadata"]
+    E --> K
+    F --> K
+```
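+
+The device rules above reduce to a small pure function. In this sketch, GPU availability is passed in as a boolean; in practice it might come from `torch.cuda.is_available()`. Passing it in keeps the logic testable on machines without a GPU. `resolve_device`, `DeviceChoice`, and `DeviceError` are hypothetical names.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+class DeviceError(RuntimeError):
+    """Raised when CUDA is explicitly requested but unavailable."""
+
+
+@dataclass
+class DeviceChoice:
+    requested: str
+    actual: str
+    fallback_reason: str | None = None
+
+
+def resolve_device(requested: str, gpu_available: bool) -> DeviceChoice:
+    """Apply the auto / cpu / cuda rules and record why a fallback happened."""
+    if requested not in {"auto", "cpu", "cuda"}:
+        raise ValueError(f"Unknown device {requested!r}; expected auto, cpu, or cuda")
+    if requested == "cpu":
+        return DeviceChoice("cpu", "cpu")
+    if gpu_available:
+        return DeviceChoice(requested, "cuda")
+    if requested == "auto":
+        return DeviceChoice("auto", "cpu", "no usable GPU detected")
+    raise DeviceError(
+        "CUDA was requested, but no usable GPU was detected. "
+        "Retry with --device cpu or run `ccs doctor`."
+    )
+```
+
+The returned `DeviceChoice` carries the selected device, the actual device, and the fallback reason that the run metadata should record.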
+
+## Web UI Plan
+
+The Web UI should be an editor review workspace, not a basic demo.
+
+Recommended initial framework:
+
+- Streamlit for the first implementation because it is fast to build and supports video display.
+- Later, consider React/FastAPI if the UI needs more advanced timeline editing.
+
+### UI Layout
+
+```text
+Top Bar
+  Product name
+  Device mode selector
+  Language selector
+  Audio backend selector
+  Vision backend selector
+  Run Doctor button
+
+Left Panel
+  Video dropdown/upload
+  Video metadata
+  Start Caption button
+  Export SRT button
+  Export JSON button
+  Export CSV button
+
+Center Panel
+  Video player
+  Play/Pause controls
+  Current timestamp
+  Draggable timeline
+  Event markers
+  Previous/Next event buttons
+
+Right Panel
+  Review SRT suggestions
+  Caption text editor
+  Accept/Reject toggle
+  Confidence scores
+  Decision reason
+  Warning/error badges
+
+Bottom Panel
+  Event table
+  Start/end timestamps
+  Event labels
+  Audio confidence
+  Reaction confidence
+  Decision score
+  Status
+```
+
+```mermaid
+flowchart TB
+    subgraph Top["Top Bar"]
+        T1["Device Mode"]
+        T2["Language"]
+        T3["Audio Backend"]
+        T4["Vision Backend"]
+        T5["Run Doctor"]
+    end
+
+    subgraph Left["Left Panel"]
+        L1["Video Dropdown / Upload"]
+        L2["Video Metadata"]
+        L3["Start Caption"]
+        L4["Export SRT / JSON / CSV"]
+    end
+
+    subgraph Center["Center Panel"]
+        C1["Video Player"]
+        C2["Play / Pause"]
+        C3["Draggable Timeline"]
+        C4["Event Markers"]
+        C5["Previous / Next Event"]
+    end
+
+    subgraph Right["Right Review Panel"]
+        R1["SRT Suggestions"]
+        R2["Editable Caption Text"]
+        R3["Accept / Reject"]
+        R4["Confidence Scores"]
+        R5["Decision Reason"]
+        R6["Error / Warning Badges"]
+    end
+
+    subgraph Bottom["Bottom Panel"]
+        B1["Event Table"]
+        B2["Timestamps"]
+        B3["Audio + Reaction Scores"]
+        B4["Status"]
+    end
+
+    L1 --> L3
+    T1 --> L3
+    T2 --> L3
+    T3 --> L3
+    T4 --> L3
+    L3 --> C1
+    L3 --> R1
+    L3 --> B1
+    C4 --> R1
+    B1 --> R1
+    R3 --> L4
+    R2 --> L4
+```
+
+### Timeline Behavior
+
+The timeline should show event markers:
+
+- Green: accepted caption
+- Yellow: needs review
+- Gray: rejected
+- Blue: currently selected event
+
+Clicking a marker should:
+
+- seek the video to that timestamp
+- open the suggestion in the right review panel
+- highlight the corresponding event table row
+
+### Editor Review Flow
+
+1. User selects or uploads a video.
+2. User selects language and device mode.
+3. User clicks **Start Caption**.
+4. Pipeline generates caption suggestions.
+5. User reviews suggestions in the right panel.
+6. User edits captions if needed.
+7. User accepts or rejects suggestions.
+8. User exports final SRT.
+
+```mermaid
+sequenceDiagram
+    actor Editor
+    participant UI as Web UI
+    participant Pipeline as Core Pipeline
+    participant Review as Review State
+    participant Export as Exporter
+
+    Editor->>UI: Select video, language, device
+    Editor->>UI: Click Start Caption
+    UI->>Pipeline: Run analysis with config
+    Pipeline-->>UI: Caption suggestions + diagnostics
+    UI->>Review: Load suggestions into timeline and panel
+    Editor->>Review: Jump to event marker
+    Editor->>Review: Edit caption text
+    Editor->>Review: Accept or reject suggestion
+    Editor->>UI: Export final SRT
+    UI->>Export: Export accepted captions
+    Export-->>Editor: SRT / JSON / CSV files
+```
+
+### UI Error Handling
+
+The UI should include:
+
+- toast notifications for recoverable errors
+- modal popups for blocking errors
+- expandable debug details
+- retry on CPU button when GPU fails
+- model download/setup hints
+- export success messages
+
+## Output Files
+
+Each run should produce a run directory:
+
+```text
+outputs/
+  sample-video/
+    captions.en.srt
+    captions.hi.srt
+    results.json
+    events.csv
+    diagnostics.json
+    config.yaml
+```
+
+### SRT
+
+Only accepted caption suggestions.
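+
+SRT cues are numbered from 1, use `HH:MM:SS,mmm` timestamps with a comma before the milliseconds, and are separated by blank lines. The sketch below renders only accepted suggestions, in time order. It assumes suggestions are plain dicts that use the Caption Suggestion field names. The `srt_timestamp` and `to_srt` helper names are hypothetical.
+
+```python
+def srt_timestamp(seconds: float) -> str:
+    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
+    total_ms = round(seconds * 1000)
+    hours, rem = divmod(total_ms, 3_600_000)
+    minutes, rem = divmod(rem, 60_000)
+    secs, millis = divmod(rem, 1000)
+    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
+
+
+def to_srt(suggestions: list[dict]) -> str:
+    """Render only accepted suggestions, numbered from 1 in time order."""
+    accepted = sorted((s for s in suggestions if s["accepted"]), key=lambda s: s["start_time"])
+    blocks = [
+        f"{index}\n"
+        f"{srt_timestamp(s['start_time'])} --> {srt_timestamp(s['end_time'])}\n"
+        f"{s['caption_text']}\n"
+        for index, s in enumerate(accepted, start=1)
+    ]
+    return "\n".join(blocks)  # blank line between cues
+
+
+print(to_srt([
+    {"start_time": 63.1, "end_time": 64.6, "caption_text": "[glass breaks]", "accepted": True},
+    {"start_time": 30.0, "end_time": 31.0, "caption_text": "[traffic]", "accepted": False},
+    {"start_time": 12.4, "end_time": 13.8, "caption_text": "[horn honks]", "accepted": True},
+]))
+```
+
+Run as-is, it reproduces the two-cue example shown under Goal 3. The rejected `[traffic]` candidate is left out and would appear only in the JSON/CSV reports.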
+
+### JSON
+
+Full structured output:
+
+- accepted suggestions
+- rejected candidates
+- confidence scores
+- reaction signals
+- decision reasons
+- diagnostics
+
+### CSV
+
+Reviewer-friendly table for spreadsheets.
+
+## Installation Plan
+
+Start with requirements files:
+
+```text
+requirements.txt
+requirements-ui.txt
+requirements-dev.txt
+requirements-translate.txt
+```
+
+Recommended split:
+
+- `requirements.txt`: core CPU pipeline
+- `requirements-ui.txt`: Streamlit/Web UI
+- `requirements-dev.txt`: test, lint, formatting, docs
+- `requirements-translate.txt`: IndicTrans2 or translation extras
+
+Example install flow:
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+pip install -r requirements-ui.txt
+ccs doctor
+```
+
+GPU installation should be documented separately because CUDA-compatible PyTorch/TensorFlow installation depends on the user's system.
+
+Docker should be added later for reproducibility, but requirements files are easier for first-time contributors.
+
+## Configuration Plan
+
+Use YAML config files for reproducible runs.
+
+Example settings:
+
+```yaml
+device: auto
+language: en
+audio_backend: yamnet
+vision_backend: mediapipe
+audio_threshold: 0.45
+reaction_threshold: 0.35
+decision_threshold: 0.65
+min_event_duration: 0.25
+merge_gap: 0.40
+sample_window_before: 1.0
+sample_window_after: 1.0
+```
+
+CLI flags should override config values.
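+
+The sketch below shows that precedence: built-in defaults first, then the YAML file, then CLI flags. It assumes the YAML file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`, and that unset CLI flags arrive as `None`. `effective_config` and the unknown-key check are hypothetical.
+
+```python
+from __future__ import annotations
+
+from typing import Any
+
+# Defaults copied from the example YAML above.
+DEFAULTS: dict[str, Any] = {
+    "device": "auto",
+    "language": "en",
+    "audio_backend": "yamnet",
+    "vision_backend": "mediapipe",
+    "audio_threshold": 0.45,
+    "reaction_threshold": 0.35,
+    "decision_threshold": 0.65,
+    "min_event_duration": 0.25,
+    "merge_gap": 0.40,
+    "sample_window_before": 1.0,
+    "sample_window_after": 1.0,
+}
+
+
+def effective_config(file_values: dict[str, Any], cli_values: dict[str, Any]) -> dict[str, Any]:
+    """Merge defaults < config file < CLI flags; CLI flags left as None are ignored."""
+    merged = {**DEFAULTS, **file_values}
+    merged.update({key: value for key, value in cli_values.items() if value is not None})
+    unknown = sorted(set(merged) - set(DEFAULTS))
+    if unknown:
+        raise ValueError(f"Unknown config keys: {unknown}")
+    return merged
+
+
+cfg = effective_config({"language": "hi", "device": "cpu"}, {"device": "cuda", "language": None})
+print(cfg["device"], cfg["language"])  # cuda hi
+```
+
+Saving the merged dict as the run's `config.yaml` makes each run reproducible, even when flags overrode the file.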
+
+## Evaluation Plan
+
+The tool should be evaluated on a small sample set of Hindi and regional-language content.
+
+### Suggested Evaluation Data
+
+Initial languages:
+
+- Hindi
+- Tamil
+- Telugu
+- Bengali
+- Marathi
+- Malayalam
+
+Video types:
+
+- educational videos
+- conversational scenes
+- public-service clips
+- classroom-style videos
+- documentary-style clips
+
+### Annotation Process
+
+Editors should review:
+
+- whether the suggested caption is needed
+- whether the label is correct
+- whether the timestamp is correct
+- whether any important sound was missed
+- whether any unnecessary sound was captioned
+
+### Metrics
+
+Track:
+
+- audio event precision
+- audio event recall
+- caption decision precision
+- over-captioning rate
+- missed-important-event rate
+- timestamp quality
+- editor acceptance rate
+- average editor correction time
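+
+Precision, recall, over-captioning rate, and missed-event rate all require matching tool output against editor-approved reference events. A minimal sketch follows. It assumes events are `(start, end, label)` tuples. A prediction matches a reference event when the labels agree and their temporal intersection-over-union (IoU) reaches `min_iou`. The 0.3 default is a placeholder, because the acceptable timestamp tolerance is still an open question. Greedy one-to-one matching is a simplification of optimal assignment:
+
+```python
+from typing import NamedTuple
+
+
+class Event(NamedTuple):
+    start: float
+    end: float
+    label: str
+
+
+def temporal_iou(a: Event, b: Event) -> float:
+    """Intersection-over-union of two time intervals."""
+    overlap = max(0.0, min(a.end, b.end) - max(a.start, b.start))
+    union = (a.end - a.start) + (b.end - b.start) - overlap
+    return overlap / union if union > 0 else 0.0
+
+
+def match_events(predicted: list[Event], reference: list[Event], min_iou: float = 0.3) -> dict[str, float]:
+    """Greedy one-to-one matching; returns precision, recall, and mean IoU of matches."""
+    unmatched = list(reference)
+    ious: list[float] = []
+    for pred in sorted(predicted, key=lambda e: e.start):
+        best, best_iou = None, min_iou
+        for ref in unmatched:
+            iou = temporal_iou(pred, ref)
+            if ref.label == pred.label and iou >= best_iou:
+                best, best_iou = ref, iou
+        if best is not None:
+            unmatched.remove(best)  # each reference event can match only once
+            ious.append(best_iou)
+    hits = len(ious)
+    return {
+        "precision": hits / len(predicted) if predicted else 0.0,
+        "recall": hits / len(reference) if reference else 0.0,
+        "mean_iou": sum(ious) / hits if hits else 0.0,
+    }
+```
+
+Suppose the reference set contains only the events editors consider caption-worthy. Then `1 - precision` approximates the over-captioning rate, and `1 - recall` approximates the missed-important-event rate. `mean_iou` is one possible proxy for timestamp quality.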
+
+### Feedback Fields
+
+Suggested review CSV columns:
+
+```text
+video_id
+event_id
+start_time
+end_time
+caption_text
+audio_confidence
+reaction_confidence
+decision_score
+accepted_by_tool
+accepted_by_editor
+editor_corrected_label
+editor_notes
+```
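+
+With these columns, the editor acceptance rate and label-correction rate come from a single pass over the CSV. The sketch below uses the standard `csv` module and makes two assumptions about the eventual file format. First, boolean columns are written as `true`/`false` (or `1`/`0`, `yes`/`no`). Second, an empty `editor_corrected_label` means no correction. `review_summary` is a hypothetical helper:
+
+```python
+import csv
+import io
+
+TRUE_VALUES = {"true", "1", "yes", "y"}
+
+
+def _as_bool(value: str) -> bool:
+    return value.strip().lower() in TRUE_VALUES
+
+
+def review_summary(csv_text: str) -> dict[str, float]:
+    """Summarize editor decisions on the suggestions the tool accepted."""
+    rows = list(csv.DictReader(io.StringIO(csv_text)))
+    tool_accepted = [row for row in rows if _as_bool(row["accepted_by_tool"])]
+    if not tool_accepted:
+        return {"editor_acceptance_rate": 0.0, "label_correction_rate": 0.0}
+    kept = [row for row in tool_accepted if _as_bool(row["accepted_by_editor"])]
+    corrected = [
+        row for row in kept
+        if row["editor_corrected_label"].strip()
+        and row["editor_corrected_label"].strip() != row["caption_text"].strip()
+    ]
+    return {
+        # Share of tool-accepted suggestions the editor kept.
+        "editor_acceptance_rate": len(kept) / len(tool_accepted),
+        # Share of kept suggestions whose label the editor changed.
+        "label_correction_rate": len(corrected) / len(kept) if kept else 0.0,
+    }
+```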
+
+## VLC Plugin Plan
+
+VLC integration is a useful extension, but it should not be part of the midpoint.
+
+Recommended phases:
+
+1. Generate SRT externally and let users load it in VLC.
+2. Add a helper command that analyzes the current video path.
+3. Build a VLC Lua extension that calls the local CLI or local API.
+4. Let the extension load the generated SRT when analysis finishes.
+
+The VLC plugin should reach the core modules only through the CLI or local API, rather than duplicating analysis logic.
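+
+Steps 2 and 4 above only need a thin wrapper around the existing CLI. A minimal Python sketch follows. It assumes the `analyze` command and the output layout documented in `main/README.md`: `<out>/<video stem>/captions.<lang>.srt`. The function names are hypothetical. A VLC Lua extension would issue the equivalent command itself:
+
+```python
+import subprocess
+import sys
+from pathlib import Path
+
+
+def build_analyze_command(video: Path, lang: str, out_dir: Path) -> list[str]:
+    """Command line for the existing `analyze` CLI entry point."""
+    return [sys.executable, "-m", "cc_suggester", "analyze", str(video),
+            "--lang", lang, "--out", str(out_dir)]
+
+
+def expected_srt_path(video: Path, lang: str, out_dir: Path) -> Path:
+    """Each run writes to <out_dir>/<video stem>/captions.<lang>.srt."""
+    return out_dir / video.stem / f"captions.{lang}.srt"
+
+
+def analyze_for_player(video: Path, lang: str = "en", out_dir: Path = Path("outputs")) -> Path:
+    """Run analysis, then return the SRT path a player such as VLC can load."""
+    subprocess.run(build_analyze_command(video, lang, out_dir), check=True)
+    srt_path = expected_srt_path(video, lang, out_dir)
+    if not srt_path.exists():
+        raise FileNotFoundError(f"Analysis finished but {srt_path} was not written.")
+    return srt_path
+```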
+
+## Roadmap
+
+### Phase 1: Project Foundation
+
+- Define module interfaces.
+- Create data models.
+- Add config system.
+- Add diagnostics and friendly errors.
+- Add README and project documentation.
+
+### Phase 2: Goal 1 Audio Detection
+
+- Extract audio with ffmpeg.
+- Add DSP baseline.
+- Add YAMNet backend.
+- Add event smoothing and label normalization.
+- Export audio event JSON.
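+
+Event smoothing can be driven by the `merge_gap` and `min_event_duration` settings from the configuration plan. The sketch below shows one plausible order of operations:
+
+1. Merge same-label detections separated by at most `merge_gap` seconds.
+2. Drop events shorter than `min_event_duration`.
+
+The `Detection` type and the keep-the-highest-confidence rule are assumptions. This is not the package's `cc_suggester.audio.events` implementation:
+
+```python
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class Detection:
+    start: float
+    end: float
+    label: str
+    confidence: float
+
+
+def smooth_events(detections: list[Detection], merge_gap: float = 0.40,
+                  min_event_duration: float = 0.25) -> list[Detection]:
+    """Merge nearby same-label detections, then drop very short events."""
+    merged: list[Detection] = []
+    # Sorting by label first keeps each label's detections contiguous.
+    for det in sorted(detections, key=lambda d: (d.label, d.start)):
+        last = merged[-1] if merged else None
+        if last and last.label == det.label and det.start - last.end <= merge_gap:
+            merged[-1] = replace(
+                last,
+                end=max(last.end, det.end),
+                confidence=max(last.confidence, det.confidence),
+            )
+        else:
+            merged.append(det)
+    kept = [d for d in merged if d.end - d.start >= min_event_duration]
+    return sorted(kept, key=lambda d: d.start)
+```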
+
+### Phase 3: Goal 2 Visual Reaction Detection
+
+- Sample event-aligned frames.
+- Add OpenCV optical flow features.
+- Add MediaPipe face/pose backends.
+- Compute reaction confidence.
+- Attach reaction data to audio events.
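+
+Event-aligned sampling can use the `sample_window_before` and `sample_window_after` settings. Frames are taken from a window around each audio event, so reactions just before and just after the sound are both covered. A minimal sketch that only computes frame indices; decoding them, for example with OpenCV, is left out. `samples_per_second` is a hypothetical knob that keeps the number of decoded frames small on CPU:
+
+```python
+import math
+
+
+def event_frame_indices(
+    event_start: float,
+    event_end: float,
+    fps: float,
+    duration: float,
+    window_before: float = 1.0,
+    window_after: float = 1.0,
+    samples_per_second: float = 4.0,
+) -> list[int]:
+    """Frame indices sampled across [start - before, end + after], clipped to the video."""
+    if fps <= 0 or samples_per_second <= 0:
+        raise ValueError("fps and samples_per_second must be positive")
+    window_start = max(0.0, event_start - window_before)
+    window_end = min(duration, event_end + window_after)
+    last_frame = max(0, math.floor(duration * fps) - 1)
+    # Evenly spaced sample times, computed from the index to avoid float drift.
+    count = math.floor((window_end - window_start) * samples_per_second) + 1
+    indices: list[int] = []
+    for i in range(max(0, count)):
+        t = window_start + i / samples_per_second
+        # Frame k is on screen during [k / fps, (k + 1) / fps).
+        index = min(last_frame, math.floor(t * fps + 1e-9))
+        if not indices or indices[-1] != index:
+            indices.append(index)
+    return indices
+```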
+
+### Phase 4: Goal 3 Decision and Output
+
+- Add decision scorer.
+- Add event importance rules.
+- Add ambient rejection logic.
+- Add SRT/JSON/CSV export.
+- Add multilingual caption label glossary.
+
+### Phase 5: CLI Productization
+
+- Add complete CLI commands.
+- Add typo suggestions.
+- Add error recovery suggestions.
+- Add `doctor` diagnostics.
+- Add CPU/GPU fallback behavior.
+
+### Phase 6: Web Editor Review UI
+
+- Add video selector/upload.
+- Add Start Caption button.
+- Add review panel.
+- Add timeline event markers.
+- Add accept/reject/edit flow.
+- Add export buttons.
+- Add error popups and diagnostics panel.
+
+### Phase 7: Advanced Backends
+
+- Add PANNs backend.
+- Add AST or BEATs backend.
+- Add optional translation backend.
+- Add stronger visual backends if needed.
+
+### Phase 8: Evaluation and Packaging
+
+- Evaluate on Hindi and regional-language videos.
+- Collect editor feedback.
+- Improve thresholds and label glossary.
+- Add Docker.
+- Add VLC integration prototype.
+
+```mermaid
+flowchart LR
+    P1["Phase 1\nProject Foundation"] --> P2["Phase 2\nAudio Detection"]
+    P2 --> P3["Phase 3\nVisual Reaction Detection"]
+    P3 --> P4["Phase 4\nDecision + Output"]
+    P4 --> P5["Phase 5\nCLI Productization"]
+    P5 --> P6["Phase 6\nWeb Editor UI"]
+    P6 --> P7["Phase 7\nAdvanced Backends"]
+    P7 --> P8["Phase 8\nEvaluation + Packaging"]
+
+    P2 -.->|Midpoint Goal 1| M["Midpoint\nGoals 1 + 2 complete"]
+    P3 -.->|Midpoint Goal 2| M
+```
+
+## Open Questions
+
+- Should caption labels be formal, conversational, or broadcaster-style in each language?
+- Should laughter be treated as speech-adjacent or non-speech for this project?
+- Should music be captioned only when it begins/stops or changes mood?
+- How should overlapping non-speech events be represented in SRT?
+- What timestamp tolerance is acceptable for editors?
+- Will sample videos be provided with editor-approved ground truth?
+- Should the Web UI support manual timestamp adjustment in Version 1?
+- Should rejected events be shown by default or hidden under debug/review mode?
+
+## Contribution Guidelines
+
+Contributors should follow these principles:
+
+- Keep backend logic independent from UI.
+- Add new models through backend interfaces.
+- Preserve JSON output compatibility where possible.
+- Include tests for decision rules and output formatting.
+- Prefer readable, debuggable logic over opaque automation.
+- Avoid over-captioning as a core product principle.
+- Document new event labels in the multilingual glossary.
+
+## Proposed Tech Stack
+
+Core:
+
+- Python
+- ffmpeg
+- OpenCV
+- NumPy
+- SciPy/librosa-style audio features
+- PyTorch and/or TensorFlow depending on model backend
+
+Audio:
+
+- DSP baseline
+- YAMNet
+- PANNs
+- AST or BEATs as optional advanced backends
+
+Vision:
+
+- OpenCV
+- MediaPipe Face Landmarker
+- MediaPipe Pose Landmarker
+- optional MMPose/MMAction2 later
+
+CLI:
+
+- Typer
+- Rich
+
+Web UI:
+
+- Streamlit first
+- optional FastAPI/React later
+
+Translation:
+
+- curated glossary first
+- IndicTrans2 fallback later
+
+Testing:
+
+- pytest
+- small synthetic fixtures
+- sample video integration tests
+
+## Success Criteria
+
+The project is successful when:
+
+- It accepts raw videos without subtitles or transcripts.
+- It detects non-speech audio events with timestamps.
+- It estimates visible reaction confidence around each event.
+- It avoids captioning low-impact ambient sounds.
+- It exports clean SRT files.
+- It provides useful debug information.
+- It runs on CPU and can use GPU when available.
+- It lets editors review, edit, accept, reject, and export suggestions.
+- It supports English plus initial Indian regional-language caption labels.
+
+## Summary
+
+This project should be built as a modular open-source tool:
+
+- The backend pipeline does the real work.
+- The CLI provides batch and debug workflows.
+- The Web UI provides editor review and export workflows.
+- Future VLC or API integrations reuse the same modules.
+
+The first implementation should prioritize a reliable, explainable pipeline using DSP, YAMNet, OpenCV, MediaPipe, and a rule-based decision engine. Stronger audio/video models can be added later through the pluggable backend system.
diff --git a/demo_videos/drivelink b/demo_videos/drivelink
new file mode 100644
index 0000000..8c9a4cc
--- /dev/null
+++ b/demo_videos/drivelink
@@ -0,0 +1,3 @@
+https://drive.google.com/drive/folders/1Ti5aqztP9VHas_5AbrH7utSn-G27HZXW?usp=sharing
+
+demo videos and recording
diff --git a/demo_videos/output/vid1.reviewed.en.srt b/demo_videos/output/vid1.reviewed.en.srt
new file mode 100644
index 0000000..6f6b2f5
--- /dev/null
+++ b/demo_videos/output/vid1.reviewed.en.srt
@@ -0,0 +1,7 @@
+1
+00:00:08,640 --> 00:00:09,600
+[laughter]
+
+2
+00:00:55,680 --> 00:00:59,040
+[students applauding]
diff --git a/demo_videos/output/vid10.reviewed.en.srt b/demo_videos/output/vid10.reviewed.en.srt
new file mode 100644
index 0000000..30f1961
--- /dev/null
+++ b/demo_videos/output/vid10.reviewed.en.srt
@@ -0,0 +1,31 @@
+1
+00:00:18,720 --> 00:00:19,680
+[explosion]
+
+2
+00:00:19,200 --> 00:00:20,640
+[explosion]
+
+3
+00:00:31,200 --> 00:00:39,840
+[music]
+
+4
+00:00:51,360 --> 00:00:55,200
+[music]
+
+5
+00:00:58,560 --> 00:01:00,000
+[explosion]
+
+6
+00:01:09,600 --> 00:01:11,520
+[music]
+
+7
+00:01:36,960 --> 00:01:38,880
+[music]
+
+8
+00:02:04,800 --> 00:02:06,720
+[explosion]
diff --git a/demo_videos/output/vid11.reviewed.en.srt b/demo_videos/output/vid11.reviewed.en.srt
new file mode 100644
index 0000000..e4bb91c
--- /dev/null
+++ b/demo_videos/output/vid11.reviewed.en.srt
@@ -0,0 +1,31 @@
+1
+00:00:10,560 --> 00:00:11,520
+[explosion]
+
+2
+00:00:10,560 --> 00:00:11,520
+[gunshot]
+
+3
+00:00:12,960 --> 00:00:13,920
+[gunshot]
+
+4
+00:00:12,960 --> 00:00:13,920
+[explosion]
+
+5
+00:00:14,400 --> 00:00:16,320
+[music]
+
+6
+00:00:23,520 --> 00:00:33,120
+[music]
+
+7
+00:01:31,200 --> 00:01:36,480
+[music]
+
+8
+00:01:54,720 --> 00:01:58,080
+[music]
diff --git a/demo_videos/output/vid2.reviewed.en.srt b/demo_videos/output/vid2.reviewed.en.srt
new file mode 100644
index 0000000..e837348
--- /dev/null
+++ b/demo_videos/output/vid2.reviewed.en.srt
@@ -0,0 +1,19 @@
+1
+00:00:02,880 --> 00:00:14,880
+[music]
+
+2
+00:00:15,360 --> 00:00:18,720
+[music]
+
+3
+00:01:12,960 --> 00:01:13,920
+[explosion]
+
+4
+00:01:12,960 --> 00:01:13,920
+[gunshot]
+
+5
+00:01:13,440 --> 00:01:14,400
+[explosion]
diff --git a/demo_videos/output/vid3.reviewed.en.srt b/demo_videos/output/vid3.reviewed.en.srt
new file mode 100644
index 0000000..ef6c046
--- /dev/null
+++ b/demo_videos/output/vid3.reviewed.en.srt
@@ -0,0 +1,31 @@
+1
+00:00:23,520 --> 00:00:24,480
+[gunshot]
+
+2
+00:00:26,400 --> 00:00:27,360
+[glass breaks]
+
+3
+00:00:47,520 --> 00:00:48,480
+[explosion]
+
+4
+00:00:50,400 --> 00:00:53,760
+[music]
+
+5
+00:00:59,040 --> 00:01:08,640
+[music]
+
+6
+00:01:11,040 --> 00:01:12,480
+[explosion]
+
+7
+00:01:16,320 --> 00:01:17,280
+[gunshot]
+
+8
+00:01:16,320 --> 00:01:17,760
+[explosion]
diff --git a/demo_videos/output/vid4.reviewed.en.srt b/demo_videos/output/vid4.reviewed.en.srt
new file mode 100644
index 0000000..5356448
--- /dev/null
+++ b/demo_videos/output/vid4.reviewed.en.srt
@@ -0,0 +1,7 @@
+1
+00:00:00,480 --> 00:00:04,320
+[music]
+
+2
+00:00:11,520 --> 00:00:19,680
+[music]
diff --git a/demo_videos/output/vid5.reviewed.en.srt b/demo_videos/output/vid5.reviewed.en.srt
new file mode 100644
index 0000000..01bfe5d
--- /dev/null
+++ b/demo_videos/output/vid5.reviewed.en.srt
@@ -0,0 +1,7 @@
+1
+00:00:00,480 --> 00:00:05,280
+[music]
+
+2
+00:00:05,760 --> 00:00:12,960
+[music]
diff --git a/main/.gitignore b/main/.gitignore
new file mode 100644
index 0000000..5c2d003
--- /dev/null
+++ b/main/.gitignore
@@ -0,0 +1,12 @@
+__pycache__/
+*.py[cod]
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.venv/
+outputs/
+*.wav
+*.mp4
+*.mkv
+*.mov
+*.avi
diff --git a/main/README.md b/main/README.md
new file mode 100644
index 0000000..fc64fbd
--- /dev/null
+++ b/main/README.md
@@ -0,0 +1,235 @@
+# cc-suggester
+
+Python implementation for the Intelligent Closed Caption Suggestion Tool.
+
+This package generates meaningful non-speech closed caption suggestions from video. The current implementation is a runnable foundation: it proves the modular pipeline, CLI, diagnostics, decision engine, multilingual labels, and SRT/JSON/CSV export flow before heavy ML backends are added.
+
+## Current Implementation Status
+
+Implemented now:
+
+- `cc_suggester.core`: pipeline orchestration, config, shared data models, diagnostics, media inspection, friendly errors
+- `cc_suggester.audio`: audio backend interface, deterministic mock backend, DSP backend, optional YAMNet backend, event smoothing, ffmpeg extraction helper, placeholders for PANNs/AST/BEATs/CLAP
+- `cc_suggester.vision`: vision backend interface, deterministic mock backend, OpenCV backend, optional MediaPipe pose backend, frame-sampling and reaction helpers
+- `cc_suggester.decision`: scoring rules, ambient penalties, multilingual caption glossary
+- `cc_suggester.output`: SRT, JSON, CSV, and reviewed export helpers
+- `cc_suggester.cli`: `analyze`, `audio`, `vision`, `inspect`, `doctor`, `export`, `labels`, and `web` commands
+- `cc_suggester.ui`: Streamlit editor review client with edited SRT/CSV/session downloads
+- `tests`: tests for SRT output, label lookup, config/CLI behavior, DSP detection, the vision pipeline, the YAMNet backend, real-video integration, and reviewed exports
+
+Not implemented yet:
+
+- PANNs/AST/BEATs semantic audio backends, and validation of the optional YAMNet backend against a real model
+- MediaPipe face-landmark/expression reaction scoring
+- Advanced Streamlit timeline editing and persisted review sessions
+- Real evaluation dataset and editor feedback loop
+- Docker and VLC integration
+
+The full roadmap is documented in [`../docs/implementation-plan.md`](../docs/implementation-plan.md).
+
+## Setup
+
+The current scaffold uses only the Python standard library for the core pipeline.
+
+```bash
+cd main
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+For development tests:
+
+```bash
+pip install -r requirements-dev.txt
+```
+
+For the Web UI:
+
+```bash
+pip install -r requirements-ui.txt
+```
+
+For the OpenCV vision backend:
+
+```bash
+pip install -r requirements-vision.txt
+```
+
+`requirements-vision.txt` also includes MediaPipe for the optional pose-based reaction backend.
+
+## CLI Usage
+
+Run diagnostics:
+
+```bash
+python -m cc_suggester doctor
+```
+
+Inspect a video:
+
+```bash
+python -m cc_suggester inspect path/to/video.mp4
+```
+
+Run the current mock pipeline:
+
+```bash
+python -m cc_suggester analyze path/to/video.mp4 --lang hi --device auto --out outputs/
+```
+
+Run the CPU DSP audio baseline:
+
+```bash
+python -m cc_suggester analyze path/to/video.mp4 --audio-backend dsp --vision-backend mock --lang en
+```
+
+Run only audio detection:
+
+```bash
+python -m cc_suggester audio path/to/video.mp4 --audio-backend dsp --out outputs/
+```
+
+Run only visual reaction scoring from an audio report:
+
+```bash
+python -m cc_suggester vision path/to/video.mp4 outputs/video/audio_events.json --vision-backend opencv
+```
+
+Run the optional YAMNet backend after installing audio dependencies:
+
+```bash
+pip install -r requirements-audio.txt
+python -m cc_suggester audio path/to/video.mp4 --audio-backend yamnet --out outputs/
+```
+
+For offline environments, point YAMNet to a local TensorFlow Hub model directory:
+
+```bash
+python -m cc_suggester audio path/to/video.mp4 \
+  --audio-backend yamnet \
+  --yamnet-model /path/to/local/yamnet
+```
+
+Export another language from an existing JSON report:
+
+```bash
+python -m cc_suggester export outputs/video/results.json --format srt --lang ml
+```
+
+Show Web UI guidance:
+
+```bash
+python -m cc_suggester web
+```
+
+List supported labels:
+
+```bash
+python -m cc_suggester labels
+```
+
+Once the package is installed, the same CLI is also available as `ccs`:
+
+```bash
+ccs analyze path/to/video.mp4 --lang hi --device auto
+```
+
+## Output Files
+
+Each analysis run creates a directory under `outputs/`:
+
+```text
+outputs/
+  video-name/
+    captions.<lang>.srt
+    results.json
+    events.csv
+    diagnostics.json
+    config.json
+```
+
+`captions.<lang>.srt` contains only accepted captions. `results.json` and `events.csv` include accepted, rejected, and review-needed candidates for debugging and editor review.
+
+The Streamlit UI can also export reviewed SRT, CSV, and JSON session files built from the editor's current choices, so edited caption text and manual accept/reject/review decisions are reflected in the downloads.
+
+## Backend Strategy
+
+Backends are intentionally pluggable.
+
+Audio backends implement:
+
+```text
+detect(video_path, metadata, config) -> list[AudioEventCandidate]
+```
+
+Vision backends implement:
+
+```text
+analyze(video_path, metadata, audio_events, config) -> list[ReactionResult]
+```
+
+The DSP audio backend and OpenCV vision backend are available as local baselines. YAMNet is implemented as an optional TensorFlow Hub backend and requires `requirements-audio.txt`. MediaPipe is implemented as an optional pose-based reaction backend and requires `requirements-vision.txt`. Mock backends should remain available for tests and demos.
+
+## Verification
+
+Run syntax checks:
+
+```bash
+python -m compileall cc_suggester
+```
+
+Run tests:
+
+```bash
+python -m pytest tests
+```
+
+Run CLI smoke checks:
+
+```bash
+python -m cc_suggester doctor
+python -m cc_suggester analize
+python -m cc_suggester analyze README.md --lang hi --device auto --out outputs
+python -m cc_suggester export outputs/README/results.json --format srt --lang ml --out outputs/README/captions.ml.srt
+python -m cc_suggester labels
+python -m cc_suggester vision tests/fixtures/sample_classroom.mp4 outputs/sample_classroom/audio_events.json --vision-backend opencv
+```
+
+The misspelled `analize` command is intentional: it checks that the CLI responds with a friendly typo suggestion.
+
+## Real Sample Video Fixture
+
+Generate a tiny deterministic MP4 fixture for local integration testing:
+
+```bash
+python scripts/generate_sample_video.py
+```
+
+Then run:
+
+```bash
+python -m cc_suggester inspect tests/fixtures/sample_classroom.mp4
+python -m cc_suggester analyze tests/fixtures/sample_classroom.mp4 --audio-backend dsp --vision-backend mock --lang hi
+```
+
+If `ffmpeg` is available, the MP4 includes embedded audio. If `ffmpeg` is unavailable but OpenCV is installed, the script writes a video-only MP4 plus a sidecar WAV file:
+
+```bash
+python -m cc_suggester analyze tests/fixtures/sample_classroom.mp4 \
+  --audio-backend dsp \
+  --vision-backend opencv \
+  --audio-path tests/fixtures/sample_classroom.wav \
+  --lang hi
+```
+
+## Immediate Next Sprint
+
+1. Test YAMNet with an installed TensorFlow/TensorFlow Hub environment and a cached/local model.
+2. Test MediaPipe in an environment with `requirements-vision.txt` installed and tune pose thresholds.
+3. Add face-landmark/expression scoring to the MediaPipe backend.
+4. Add more decision-rule and backend dependency tests.
+5. Add timeline markers and persisted review sessions to the Streamlit editor.
+6. Add evaluation scripts for editor feedback.
+
+After that, package the CPU pipeline with Docker.
diff --git a/main/cc_suggester.egg-info/PKG-INFO b/main/cc_suggester.egg-info/PKG-INFO
new file mode 100644
index 0000000..276fdb9
--- /dev/null
+++ b/main/cc_suggester.egg-info/PKG-INFO
@@ -0,0 +1,254 @@
+Metadata-Version: 2.4
+Name: cc-suggester
+Version: 0.1.0
+Summary: AI-assisted non-speech closed caption suggestion pipeline.
+Author: Planet Read project contributor
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Provides-Extra: audio
+Requires-Dist: numpy>=1.26; extra == "audio"
+Requires-Dist: tensorflow>=2.16; extra == "audio"
+Requires-Dist: tensorflow-hub>=0.16; extra == "audio"
+Provides-Extra: ui
+Requires-Dist: streamlit>=1.34; extra == "ui"
+Provides-Extra: vision
+Requires-Dist: opencv-python>=4.8; extra == "vision"
+Requires-Dist: mediapipe>=0.10; extra == "vision"
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+
+# cc-suggester
+
+Python implementation for the Intelligent Closed Caption Suggestion Tool.
+
+This package generates meaningful non-speech closed caption suggestions from video. The current implementation is a runnable foundation: it proves the modular pipeline, CLI, diagnostics, decision engine, multilingual labels, and SRT/JSON/CSV export flow before heavy ML backends are added.
+
+## Current Implementation Status
+
+Implemented now:
+
+- `cc_suggester.core`: pipeline orchestration, config, shared data models, diagnostics, media inspection, friendly errors
+- `cc_suggester.audio`: audio backend interface, deterministic mock backend, DSP backend, optional YAMNet backend, event smoothing, ffmpeg extraction helper, placeholders for PANNs/AST/BEATs/CLAP
+- `cc_suggester.vision`: vision backend interface, deterministic mock backend, OpenCV backend, optional MediaPipe pose backend, frame-sampling and reaction helpers
+- `cc_suggester.decision`: scoring rules, ambient penalties, multilingual caption glossary
+- `cc_suggester.output`: SRT, JSON, CSV, and reviewed export helpers
+- `cc_suggester.cli`: `analyze`, `audio`, `vision`, `inspect`, `doctor`, `export`, `labels`, and `web` commands
+- `cc_suggester.ui`: Streamlit editor review client with edited SRT/CSV/session downloads
+- `tests`: tests for SRT output, label lookup, config/CLI behavior, DSP detection, the vision pipeline, the YAMNet backend, real-video integration, and reviewed exports
+
+Not implemented yet:
+
+- PANNs/AST/BEATs semantic audio backends, and validation of the optional YAMNet backend against a real model
+- MediaPipe face-landmark/expression reaction scoring
+- Advanced Streamlit timeline editing and persisted review sessions
+- Real evaluation dataset and editor feedback loop
+- Docker and VLC integration
+
+The full roadmap is documented in [`../docs/implementation-plan.md`](../docs/implementation-plan.md).
+
+## Setup
+
+The current scaffold uses only the Python standard library for the core pipeline.
+
+```bash
+cd main
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+For development tests:
+
+```bash
+pip install -r requirements-dev.txt
+```
+
+For the Web UI:
+
+```bash
+pip install -r requirements-ui.txt
+```
+
+For the OpenCV vision backend:
+
+```bash
+pip install -r requirements-vision.txt
+```
+
+`requirements-vision.txt` also includes MediaPipe for the optional pose-based reaction backend.
+
+## CLI Usage
+
+Run diagnostics:
+
+```bash
+python -m cc_suggester doctor
+```
+
+Inspect a video:
+
+```bash
+python -m cc_suggester inspect path/to/video.mp4
+```
+
+Run the current mock pipeline:
+
+```bash
+python -m cc_suggester analyze path/to/video.mp4 --lang hi --device auto --out outputs/
+```
+
+Run the CPU DSP audio baseline:
+
+```bash
+python -m cc_suggester analyze path/to/video.mp4 --audio-backend dsp --vision-backend mock --lang en
+```
+
+Run only audio detection:
+
+```bash
+python -m cc_suggester audio path/to/video.mp4 --audio-backend dsp --out outputs/
+```
+
+Run only visual reaction scoring from an audio report:
+
+```bash
+python -m cc_suggester vision path/to/video.mp4 outputs/video/audio_events.json --vision-backend opencv
+```
+
+Run the optional YAMNet backend after installing audio dependencies:
+
+```bash
+pip install -r requirements-audio.txt
+python -m cc_suggester audio path/to/video.mp4 --audio-backend yamnet --out outputs/
+```
+
+For offline environments, point YAMNet to a local TensorFlow Hub model directory:
+
+```bash
+python -m cc_suggester audio path/to/video.mp4 \
+  --audio-backend yamnet \
+  --yamnet-model /path/to/local/yamnet
+```
+
+Export another language from an existing JSON report:
+
+```bash
+python -m cc_suggester export outputs/video/results.json --format srt --lang ml
+```
+
+Show Web UI guidance:
+
+```bash
+python -m cc_suggester web
+```
+
+List supported labels:
+
+```bash
+python -m cc_suggester labels
+```
+
+Once the package is installed, the same CLI is also available as `ccs`:
+
+```bash
+ccs analyze path/to/video.mp4 --lang hi --device auto
+```
+
+## Output Files
+
+Each analysis run creates a directory under `outputs/`:
+
+```text
+outputs/
+  video-name/
+    captions.<lang>.srt
+    results.json
+    events.csv
+    diagnostics.json
+    config.json
+```
+
+`captions.<lang>.srt` contains only accepted captions. `results.json` and `events.csv` include accepted, rejected, and review-needed candidates for debugging and editor review.
+
+The Streamlit UI can also export reviewed SRT, CSV, and JSON session files built from the editor's current choices, so edited caption text and manual accept/reject/review decisions are reflected in the downloads.
+
+## Backend Strategy
+
+Backends are intentionally pluggable.
+
+Audio backends implement:
+
+```text
+detect(video_path, metadata, config) -> list[AudioEventCandidate]
+```
+
+Vision backends implement:
+
+```text
+analyze(video_path, metadata, audio_events, config) -> list[ReactionResult]
+```
+
+The DSP audio backend and OpenCV vision backend are available as local baselines. YAMNet is implemented as an optional TensorFlow Hub backend and requires `requirements-audio.txt`. MediaPipe is implemented as an optional pose-based reaction backend and requires `requirements-vision.txt`. Mock backends should remain available for tests and demos.
+
+## Verification
+
+Run syntax checks:
+
+```bash
+python -m compileall cc_suggester
+```
+
+Run tests:
+
+```bash
+python -m pytest tests
+```
+
+Run CLI smoke checks:
+
+```bash
+python -m cc_suggester doctor
+python -m cc_suggester analize
+python -m cc_suggester analyze README.md --lang hi --device auto --out outputs
+python -m cc_suggester export outputs/README/results.json --format srt --lang ml --out outputs/README/captions.ml.srt
+python -m cc_suggester labels
+python -m cc_suggester vision tests/fixtures/sample_classroom.mp4 outputs/sample_classroom/audio_events.json --vision-backend opencv
+```
+
+The misspelled `analize` command is intentional: it checks that the CLI responds with a friendly typo suggestion.
+
+## Real Sample Video Fixture
+
+Generate a tiny deterministic MP4 fixture for local integration testing:
+
+```bash
+python scripts/generate_sample_video.py
+```
+
+Then run:
+
+```bash
+python -m cc_suggester inspect tests/fixtures/sample_classroom.mp4
+python -m cc_suggester analyze tests/fixtures/sample_classroom.mp4 --audio-backend dsp --vision-backend mock --lang hi
+```
+
+If `ffmpeg` is available, the MP4 includes embedded audio. If `ffmpeg` is unavailable but OpenCV is installed, the script writes a video-only MP4 plus a sidecar WAV file:
+
+```bash
+python -m cc_suggester analyze tests/fixtures/sample_classroom.mp4 \
+  --audio-backend dsp \
+  --vision-backend opencv \
+  --audio-path tests/fixtures/sample_classroom.wav \
+  --lang hi
+```
+
+## Immediate Next Sprint
+
+1. Test YAMNet with an installed TensorFlow/TensorFlow Hub environment and a cached/local model.
+2. Test MediaPipe in an environment with `requirements-vision.txt` installed and tune pose thresholds.
+3. Add face-landmark/expression scoring to the MediaPipe backend.
+4. Add more decision-rule and backend dependency tests.
+5. Add timeline markers and persisted review sessions to the Streamlit editor.
+6. Add evaluation scripts for editor feedback.
+
+After that, package the CPU pipeline with Docker.
diff --git a/main/cc_suggester.egg-info/SOURCES.txt b/main/cc_suggester.egg-info/SOURCES.txt
new file mode 100644
index 0000000..aefcf8f
--- /dev/null
+++ b/main/cc_suggester.egg-info/SOURCES.txt
@@ -0,0 +1,61 @@
+README.md
+pyproject.toml
+cc_suggester/__init__.py
+cc_suggester/__main__.py
+cc_suggester.egg-info/PKG-INFO
+cc_suggester.egg-info/SOURCES.txt
+cc_suggester.egg-info/dependency_links.txt
+cc_suggester.egg-info/entry_points.txt
+cc_suggester.egg-info/requires.txt
+cc_suggester.egg-info/top_level.txt
+cc_suggester/audio/__init__.py
+cc_suggester/audio/dsp.py
+cc_suggester/audio/events.py
+cc_suggester/audio/extractor.py
+cc_suggester/audio/label_mapping.py
+cc_suggester/audio/vad.py
+cc_suggester/audio/wav.py
+cc_suggester/audio/backends/__init__.py
+cc_suggester/audio/backends/base.py
+cc_suggester/audio/backends/dsp.py
+cc_suggester/audio/backends/mock.py
+cc_suggester/audio/backends/unavailable.py
+cc_suggester/audio/backends/yamnet.py
+cc_suggester/cli/__init__.py
+cc_suggester/cli/app.py
+cc_suggester/core/__init__.py
+cc_suggester/core/config.py
+cc_suggester/core/diagnostics.py
+cc_suggester/core/errors.py
+cc_suggester/core/media.py
+cc_suggester/core/pipeline.py
+cc_suggester/core/types.py
+cc_suggester/decision/__init__.py
+cc_suggester/decision/labels.py
+cc_suggester/decision/rules.py
+cc_suggester/decision/scorer.py
+cc_suggester/output/__init__.py
+cc_suggester/output/csv_report.py
+cc_suggester/output/json_report.py
+cc_suggester/output/review_export.py
+cc_suggester/output/srt.py
+cc_suggester/translation/__init__.py
+cc_suggester/translation/glossary.py
+cc_suggester/ui/__init__.py
+cc_suggester/ui/streamlit_app.py
+cc_suggester/vision/__init__.py
+cc_suggester/vision/frame_sampler.py
+cc_suggester/vision/optical_flow.py
+cc_suggester/vision/reactions.py
+cc_suggester/vision/backends/__init__.py
+cc_suggester/vision/backends/base.py
+cc_suggester/vision/backends/mediapipe.py
+cc_suggester/vision/backends/mock.py
+cc_suggester/vision/backends/opencv.py
+tests/test_config_cli.py
+tests/test_dsp_backend.py
+tests/test_outputs.py
+tests/test_real_video_integration.py
+tests/test_review_export.py
+tests/test_vision_pipeline.py
+tests/test_yamnet_backend.py
\ No newline at end of file
diff --git a/main/cc_suggester.egg-info/dependency_links.txt b/main/cc_suggester.egg-info/dependency_links.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/main/cc_suggester.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/main/cc_suggester.egg-info/entry_points.txt b/main/cc_suggester.egg-info/entry_points.txt
new file mode 100644
index 0000000..3b01b60
--- /dev/null
+++ b/main/cc_suggester.egg-info/entry_points.txt
@@ -0,0 +1,2 @@
+[console_scripts]
+ccs = cc_suggester.cli.app:main
diff --git a/main/cc_suggester.egg-info/requires.txt b/main/cc_suggester.egg-info/requires.txt
new file mode 100644
index 0000000..b339e44
--- /dev/null
+++ b/main/cc_suggester.egg-info/requires.txt
@@ -0,0 +1,15 @@
+
+[audio]
+numpy>=1.26
+tensorflow>=2.16
+tensorflow-hub>=0.16
+
+[dev]
+pytest>=8.0
+
+[ui]
+streamlit>=1.34
+
+[vision]
+opencv-python>=4.8
+mediapipe>=0.10
diff --git a/main/cc_suggester.egg-info/top_level.txt b/main/cc_suggester.egg-info/top_level.txt
new file mode 100644
index 0000000..4ebf94e
--- /dev/null
+++ b/main/cc_suggester.egg-info/top_level.txt
@@ -0,0 +1 @@
+cc_suggester
diff --git a/main/cc_suggester/__init__.py b/main/cc_suggester/__init__.py
new file mode 100644
index 0000000..d525093
--- /dev/null
+++ b/main/cc_suggester/__init__.py
@@ -0,0 +1,3 @@
+"""Intelligent Closed Caption Suggestion Tool."""
+
+__version__ = "0.1.0"
diff --git a/main/cc_suggester/__main__.py b/main/cc_suggester/__main__.py
new file mode 100644
index 0000000..ec49b7e
--- /dev/null
+++ b/main/cc_suggester/__main__.py
@@ -0,0 +1,7 @@
+"""Run the CLI with ``python -m cc_suggester``."""
+
+from cc_suggester.cli.app import main
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/main/cc_suggester/audio/__init__.py b/main/cc_suggester/audio/__init__.py
new file mode 100644
index 0000000..08392d6
--- /dev/null
+++ b/main/cc_suggester/audio/__init__.py
@@ -0,0 +1 @@
+"""Audio event detection modules."""
diff --git a/main/cc_suggester/audio/backends/__init__.py b/main/cc_suggester/audio/backends/__init__.py
new file mode 100644
index 0000000..7866ed7
--- /dev/null
+++ b/main/cc_suggester/audio/backends/__init__.py
@@ -0,0 +1,30 @@
+"""Audio backend registry."""
+
+from cc_suggester.audio.backends.base import AudioBackend
+from cc_suggester.audio.backends.dsp import DspAudioBackend
+from cc_suggester.audio.backends.mock import MockAudioBackend
+from cc_suggester.audio.backends.unavailable import UnavailableAudioBackend
+from cc_suggester.audio.backends.yamnet import YamnetAudioBackend
+
+
+def get_audio_backend(name: str) -> AudioBackend:
+    """Return an audio backend by name."""
+
+    normalized = name.lower().strip()
+    if normalized in {"mock", "demo"}:
+        return MockAudioBackend()
+    if normalized in {"dsp", "energy"}:
+        return DspAudioBackend()
+    if normalized == "yamnet":
+        return YamnetAudioBackend()
+    if normalized == "panns":
+        return UnavailableAudioBackend("panns", "Install PyTorch PANNs dependencies and add checkpoint loading.")
+    if normalized == "ast":
+        return UnavailableAudioBackend("ast", "Install AST dependencies and add an AudioSet checkpoint.")
+    if normalized == "beats":
+        return UnavailableAudioBackend("beats", "Install BEATs dependencies and add model checkpoint loading.")
+    if normalized == "clap":
+        return UnavailableAudioBackend("clap", "Install CLAP dependencies for open-vocabulary matching.")
+    raise ValueError(
+        f"Unknown audio backend '{name}'. Available: mock, dsp, yamnet, panns, ast, beats, clap."
+    )
diff --git a/main/cc_suggester/audio/backends/base.py b/main/cc_suggester/audio/backends/base.py
new file mode 100644
index 0000000..09db990
--- /dev/null
+++ b/main/cc_suggester/audio/backends/base.py
@@ -0,0 +1,26 @@
+"""Audio backend interface."""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, VideoMetadata
+
+
+class AudioBackend(ABC):
+    """Interface implemented by sound event detection backends."""
+
+    name: str
+    requires_audio_file: bool = False
+    requires_valid_media: bool = False
+
+    @abstractmethod
+    def detect(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        config: PipelineConfig,
+    ) -> list[AudioEventCandidate]:
+        """Detect non-speech audio event candidates."""
diff --git a/main/cc_suggester/audio/backends/dsp.py b/main/cc_suggester/audio/backends/dsp.py
new file mode 100644
index 0000000..e1bfa1b
--- /dev/null
+++ b/main/cc_suggester/audio/backends/dsp.py
@@ -0,0 +1,148 @@
+"""CPU-friendly DSP audio backend.
+
+This backend performs simple RMS-energy thresholding over a mono WAV file.
+It is not a semantic classifier like YAMNet or PANNs, but it is useful as a real
+offline baseline and as a candidate-region generator.
+"""
+
+from __future__ import annotations
+
+import math
+import statistics
+from dataclasses import dataclass
+from pathlib import Path
+
+from cc_suggester.audio.backends.base import AudioBackend
+from cc_suggester.audio.extractor import extract_audio
+from cc_suggester.audio.wav import load_wav_mono_pcm
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, VideoMetadata
+
+
+@dataclass(slots=True)
+class EnergyWindow:
+    """RMS summary for a short audio window."""
+
+    start: float
+    end: float
+    rms_norm: float
+
+
+class DspAudioBackend(AudioBackend):
+    """Detect non-speech candidate regions using RMS energy windows."""
+
+    name = "dsp"
+    requires_audio_file = True
+    requires_valid_media = True
+
+    def detect(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        config: PipelineConfig,
+    ) -> list[AudioEventCandidate]:
+        audio_path = self._audio_path_for(video_path, config)
+        windows = _read_energy_windows(audio_path)
+        if not windows:
+            return []
+
+        values = [window.rms_norm for window in windows]
+        median = statistics.median(values)
+        peak = max(values)
+        adaptive_threshold = max(0.015, median * 3.0, peak * 0.32)
+
+        active = [window for window in windows if window.rms_norm >= adaptive_threshold]
+        groups = _group_windows(active, max_gap=0.35)
+        events: list[AudioEventCandidate] = []
+        for index, group in enumerate(groups, start=1):
+            start = group[0].start
+            end = group[-1].end
+            duration = end - start
+            peak_norm = max(window.rms_norm for window in group)
+            confidence = _confidence(peak_norm, adaptive_threshold)
+            if confidence < config.audio_threshold:
+                continue
+            event_id = _event_id_for(duration, peak_norm)
+            events.append(
+                AudioEventCandidate(
+                    event_id=event_id,
+                    label=event_id.replace("_", " ").title(),
+                    start_time=round(start, 3),
+                    end_time=round(end, 3),
+                    audio_confidence=confidence,
+                    audio_backend=self.name,
+                    raw_class_name="RMS energy candidate",
+                    debug_info={
+                        "audio_path": str(audio_path),
+                        "window_index": index,
+                        "rms_peak": round(peak_norm, 6),
+                        "rms_median": round(median, 6),
+                        "adaptive_threshold": round(adaptive_threshold, 6),
+                        "duration": round(duration, 3),
+                    },
+                )
+            )
+        return events
+
+    def _audio_path_for(self, video_path: Path, config: PipelineConfig) -> Path:
+        if config.sidecar_audio_path is not None:
+            return Path(config.sidecar_audio_path)
+        if video_path.suffix.lower() == ".wav":
+            return video_path
+        run_dir = config.run_dir or config.output_dir / video_path.stem
+        return extract_audio(video_path, run_dir / "artifacts" / "audio.wav")
+
+
+def _read_energy_windows(
+    audio_path: Path,
+    *,
+    window_seconds: float = 0.50,
+    hop_seconds: float = 0.25,
+) -> list[EnergyWindow]:
+    wav = load_wav_mono_pcm(audio_path)
+    samples = wav.samples
+    if not samples:
+        return []
+
+    window_samples = max(1, int(wav.sample_rate * window_seconds))
+    hop_samples = max(1, int(wav.sample_rate * hop_seconds))
+    max_amplitude = float(2 ** (8 * wav.sample_width - 1))
+
+    windows: list[EnergyWindow] = []
+    for start_index in range(0, max(0, len(samples) - window_samples + 1), hop_samples):
+        chunk = samples[start_index : start_index + window_samples]
+        if len(chunk) < window_samples:
+            break
+        start = start_index / wav.sample_rate
+        end = start + window_seconds
+        rms = math.sqrt(sum(sample * sample for sample in chunk) / len(chunk))
+        windows.append(EnergyWindow(start=start, end=end, rms_norm=rms / max_amplitude))
+    return windows
+
+
+def _group_windows(windows: list[EnergyWindow], max_gap: float) -> list[list[EnergyWindow]]:
+    if not windows:
+        return []
+    groups: list[list[EnergyWindow]] = [[windows[0]]]
+    for window in windows[1:]:
+        previous = groups[-1][-1]
+        if window.start - previous.end <= max_gap:
+            groups[-1].append(window)
+        else:
+            groups.append([window])
+    return groups
+
+
+def _confidence(peak_norm: float, threshold: float) -> float:
+    if threshold <= 0:
+        return 0.0
+    ratio = peak_norm / threshold
+    return round(max(0.0, min(0.99, 0.35 + ratio * 0.28)), 3)
+
+
+def _event_id_for(duration: float, peak_norm: float) -> str:
+    if duration <= 0.85 and peak_norm >= 0.08:
+        return "impact_sound"
+    if duration >= 3.0:
+        return "sustained_sound"
+    return "loud_sound"
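The adaptive threshold and confidence mapping in the DSP backend are easiest to follow on numbers. This sketch reimplements only those two formulas and applies them to a synthetic RMS track (the values are made up for illustration):

```python
from __future__ import annotations

import statistics


def adaptive_threshold(rms_values: list[float]) -> float:
    # Same rule as DspAudioBackend.detect: a 0.015 floor, 3x the median, or 32% of the peak.
    return max(0.015, statistics.median(rms_values) * 3.0, max(rms_values) * 0.32)


def confidence(peak_norm: float, threshold: float) -> float:
    # Same mapping as _confidence(): linear in peak/threshold, clipped to [0.0, 0.99].
    if threshold <= 0:
        return 0.0
    return round(max(0.0, min(0.99, 0.35 + (peak_norm / threshold) * 0.28)), 3)


# Synthetic RMS track: a quiet room (0.01) with one short burst peaking at 0.12.
rms = [0.01] * 20 + [0.09, 0.12, 0.10] + [0.01] * 20
threshold = adaptive_threshold(rms)
active = [index for index, value in enumerate(rms) if value >= threshold]
print(f"threshold={threshold:.4f} active={active} confidence={confidence(max(rms), threshold)}")
```

Here the peak term wins (0.12 x 0.32 = 0.0384). Because a group is built only from windows at or above the threshold, its ratio is at least 1, so every DSP candidate scores at least 0.63; an `audio_threshold` below 0.63 therefore never filters DSP events. When the peak term sets the threshold, the loudest group's ratio is 1/0.32, which always clips to 0.99.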
diff --git a/main/cc_suggester/audio/backends/mock.py b/main/cc_suggester/audio/backends/mock.py
new file mode 100644
index 0000000..9a1956c
--- /dev/null
+++ b/main/cc_suggester/audio/backends/mock.py
@@ -0,0 +1,68 @@
+"""Deterministic demo audio backend.
+
+This backend keeps the first scaffold runnable without large model downloads.
+Model-based backends (YAMNet, PANNs, AST, BEATs) implement the same interface;
+this one stays available for deterministic demos and tests.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from cc_suggester.audio.backends.base import AudioBackend
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, VideoMetadata
+
+
+class MockAudioBackend(AudioBackend):
+    """Return classroom-style non-speech events for pipeline testing."""
+
+    name = "mock"
+
+    def detect(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        config: PipelineConfig,
+    ) -> list[AudioEventCandidate]:
+        duration = metadata.duration or 402.0
+        anchors = [
+            (0.34, "children_cheer", "Children cheering", 0.91),
+            (0.43, "school_bell", "School bell", 0.86),
+            (0.54, "applause", "Applause", 0.74),
+            (0.71, "chair_scrape", "Chair scrape", 0.58),
+            (0.81, "background_chatter", "Background chatter", 0.52),
+        ]
+        events: list[AudioEventCandidate] = []
+        for ratio, event_id, label, confidence in anchors:
+            start = max(0.0, min(duration - 1.0, duration * ratio))
+            end = min(duration, start + _duration_for(event_id))
+            if confidence < config.audio_threshold:
+                continue
+            events.append(
+                AudioEventCandidate(
+                    event_id=event_id,
+                    label=label,
+                    start_time=round(start, 3),
+                    end_time=round(end, 3),
+                    audio_confidence=confidence,
+                    audio_backend=self.name,
+                    raw_class_name=label,
+                    debug_info={
+                        "source": "deterministic mock backend",
+                        "input_name": video_path.name,
+                    },
+                )
+            )
+        return events
+
+
+def _duration_for(event_id: str) -> float:
+    durations = {
+        "children_cheer": 2.1,
+        "school_bell": 1.6,
+        "applause": 3.3,
+        "chair_scrape": 1.1,
+        "background_chatter": 7.5,
+    }
+    return durations.get(event_id, 1.5)
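The mock backend places each anchor at a fixed fraction of the video duration. A small sketch of that placement arithmetic; `place` is a hypothetical helper that mirrors the two start/end lines in `detect`:

```python
from __future__ import annotations


def place(duration: float, ratio: float, length: float) -> tuple[float, float]:
    # Mirrors MockAudioBackend.detect: anchor at duration * ratio, keep the start
    # no later than 1 s before the end, and clip the end to the video duration.
    start = max(0.0, min(duration - 1.0, duration * ratio))
    end = min(duration, start + length)
    return round(start, 3), round(end, 3)


print(place(402.0, 0.34, 2.1))  # children_cheer with the 402 s fallback duration
print(place(402.0, 0.81, 7.5))  # background_chatter with the 402 s fallback duration
print(place(5.0, 0.81, 7.5))    # short clip: start capped at 4.0 s, end clipped to 5.0 s
```

On short clips the late anchors collapse toward the last second, so the 7.5 s chatter event shrinks to 1 s.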
diff --git a/main/cc_suggester/audio/backends/unavailable.py b/main/cc_suggester/audio/backends/unavailable.py
new file mode 100644
index 0000000..6c92886
--- /dev/null
+++ b/main/cc_suggester/audio/backends/unavailable.py
@@ -0,0 +1,37 @@
+"""Registered placeholders for planned advanced audio backends."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from cc_suggester.audio.backends.base import AudioBackend
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.errors import BackendUnavailableError
+from cc_suggester.core.types import AudioEventCandidate, VideoMetadata
+
+
+class UnavailableAudioBackend(AudioBackend):
+    """Backend placeholder that explains how to proceed."""
+
+    requires_audio_file = True
+    requires_valid_media = True
+
+    def __init__(self, name: str, install_hint: str) -> None:
+        self.name = name
+        self.install_hint = install_hint
+
+    def detect(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        config: PipelineConfig,
+    ) -> list[AudioEventCandidate]:
+        raise BackendUnavailableError(
+            message=f"The {self.name} backend is registered but not implemented in this environment yet.",
+            code=f"{self.name}_not_installed",
+            suggestions=[
+                "Use --audio-backend dsp for a real offline CPU baseline.",
+                "Use --audio-backend mock for deterministic demos/tests.",
+                self.install_hint,
+            ],
+        )
diff --git a/main/cc_suggester/audio/backends/yamnet.py b/main/cc_suggester/audio/backends/yamnet.py
new file mode 100644
index 0000000..78e6df8
--- /dev/null
+++ b/main/cc_suggester/audio/backends/yamnet.py
@@ -0,0 +1,190 @@
+"""Optional YAMNet sound event detection backend."""
+
+from __future__ import annotations
+
+import csv
+import os
+from pathlib import Path
+from typing import Any, Sequence
+
+from cc_suggester.audio.backends.base import AudioBackend
+from cc_suggester.audio.extractor import extract_audio
+from cc_suggester.audio.label_mapping import normalize_sound_label
+from cc_suggester.audio.wav import load_wav_mono_float32
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.errors import BackendUnavailableError
+from cc_suggester.core.types import AudioEventCandidate, VideoMetadata
+
+
+DEFAULT_YAMNET_HANDLE = "https://tfhub.dev/google/yamnet/1"
+YAMNET_SAMPLE_RATE = 16000
+YAMNET_FRAME_HOP_SECONDS = 0.48
+YAMNET_FRAME_DURATION_SECONDS = 0.96
+
+
+class YamnetAudioBackend(AudioBackend):
+    """Classify non-speech events using TensorFlow Hub YAMNet when installed."""
+
+    name = "yamnet"
+    requires_audio_file = True
+    requires_valid_media = True
+
+    def detect(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        config: PipelineConfig,
+    ) -> list[AudioEventCandidate]:
+        tf, hub, np = _import_dependencies()
+        audio_path = _audio_path_for(video_path, config)
+        waveform = load_wav_mono_float32(audio_path, target_sample_rate=YAMNET_SAMPLE_RATE)
+        if not waveform:
+            return []
+
+        model_handle = config.yamnet_model or os.environ.get("CCS_YAMNET_MODEL") or DEFAULT_YAMNET_HANDLE
+        try:
+            model = hub.load(model_handle)
+        except Exception as exc:
+            raise BackendUnavailableError(
+                message=f"YAMNet model could not be loaded from: {model_handle}",
+                code="yamnet_model_load_failed",
+                suggestions=[
+                    "Use --audio-backend dsp for an offline CPU baseline.",
+                    "Set CCS_YAMNET_MODEL to a local TensorFlow Hub YAMNet model directory.",
+                    "Ensure internet/model cache access is available if using the default TF Hub handle.",
+                ],
+                details={"model_handle": model_handle, "error": str(exc)},
+            ) from exc
+
+        waveform_tensor = tf.convert_to_tensor(waveform, dtype=tf.float32)
+        try:
+            scores, _embeddings, _spectrogram = model(waveform_tensor)
+        except Exception as exc:
+            raise BackendUnavailableError(
+                message="YAMNet inference failed.",
+                code="yamnet_inference_failed",
+                suggestions=[
+                    "Verify the input is a PCM WAV file or a video with an extractable audio track.",
+                    "Try --audio-backend dsp to confirm audio extraction works.",
+                ],
+                details={"error": str(exc)},
+            ) from exc
+
+        class_names = _load_class_names(model, config.yamnet_class_map_path)
+        scores_array = scores.numpy() if hasattr(scores, "numpy") else np.asarray(scores)
+        return _events_from_scores(
+            scores_array=scores_array,
+            class_names=class_names,
+            audio_path=audio_path,
+            config=config,
+        )
+
+
+def _audio_path_for(video_path: Path, config: PipelineConfig) -> Path:
+    if config.sidecar_audio_path is not None:
+        return Path(config.sidecar_audio_path)
+    if video_path.suffix.lower() == ".wav":
+        return video_path
+    run_dir = config.run_dir or config.output_dir / video_path.stem
+    return extract_audio(video_path, run_dir / "artifacts" / "audio.wav")
+
+
+def _import_dependencies():
+    try:
+        import numpy as np  # type: ignore
+        import tensorflow as tf  # type: ignore
+        import tensorflow_hub as hub  # type: ignore
+    except Exception as exc:
+        raise BackendUnavailableError(
+            message="The YAMNet backend requires TensorFlow, TensorFlow Hub, and NumPy.",
+            code="yamnet_dependencies_missing",
+            suggestions=[
+                "Install audio dependencies: pip install -r requirements-audio.txt",
+                "Use --audio-backend dsp for an offline CPU baseline.",
+                "Use --audio-backend mock for deterministic demos/tests.",
+            ],
+            details={"error": str(exc)},
+        ) from exc
+    return tf, hub, np
+
+
+def _load_class_names(model: Any, class_map_path: Path | None) -> list[str]:
+    path = class_map_path
+    if path is None and hasattr(model, "class_map_path"):
+        raw_path = model.class_map_path()
+        if hasattr(raw_path, "numpy"):
+            raw_path = raw_path.numpy()
+        if isinstance(raw_path, bytes):
+            raw_path = raw_path.decode("utf-8")
+        path = Path(str(raw_path))
+
+    if path is None:
+        return []
+
+    with Path(path).open("r", newline="", encoding="utf-8") as file_obj:
+        reader = csv.DictReader(file_obj)
+        class_names: list[str] = []
+        for row in reader:
+            class_names.append(row.get("display_name") or row.get("name") or row.get("label") or "")
+        return class_names
+
+
+def _events_from_scores(
+    *,
+    scores_array: Sequence[Sequence[float]],
+    class_names: list[str],
+    audio_path: Path,
+    config: PipelineConfig,
+) -> list[AudioEventCandidate]:
+    events: list[AudioEventCandidate] = []
+    for frame_index, frame_scores in enumerate(scores_array):
+        scored_classes = _top_scored_classes(frame_scores, class_names, top_k=config.yamnet_top_k)
+        event_scores: dict[str, tuple[float, str]] = {}
+        for class_name, score in scored_classes:
+            if score < config.audio_threshold:
+                continue
+            event_id = normalize_sound_label(class_name)
+            if event_id is None:
+                continue
+            existing = event_scores.get(event_id)
+            if existing is None or score > existing[0]:
+                event_scores[event_id] = (score, class_name)
+
+        for event_id, (score, class_name) in event_scores.items():
+            start = frame_index * YAMNET_FRAME_HOP_SECONDS
+            end = start + YAMNET_FRAME_DURATION_SECONDS
+            events.append(
+                AudioEventCandidate(
+                    event_id=event_id,
+                    label=event_id.replace("_", " ").title(),
+                    start_time=round(start, 3),
+                    end_time=round(end, 3),
+                    audio_confidence=round(float(score), 3),
+                    audio_backend="yamnet",
+                    raw_class_name=class_name,
+                    debug_info={
+                        "audio_path": str(audio_path),
+                        "frame_index": frame_index,
+                        "yamnet_frame_hop_seconds": YAMNET_FRAME_HOP_SECONDS,
+                        "yamnet_frame_duration_seconds": YAMNET_FRAME_DURATION_SECONDS,
+                    },
+                )
+            )
+    return events
+
+
+def _top_scored_classes(
+    frame_scores: Sequence[float],
+    class_names: list[str],
+    *,
+    top_k: int,
+) -> list[tuple[str, float]]:
+    indexed = sorted(enumerate(frame_scores), key=lambda item: float(item[1]), reverse=True)
+    output: list[tuple[str, float]] = []
+    for class_index, score in indexed[:top_k]:
+        if class_index < len(class_names):
+            class_name = class_names[class_index]
+        else:
+            class_name = f"class_{class_index}"
+        output.append((class_name, float(score)))
+    return output
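YAMNet returns one score row per 0.96 s frame with a 0.48 s hop, so frame `i` maps to a time span and a ranked class list as below. This is a standalone sketch of the timing used in `_events_from_scores` and the ranking in `_top_scored_classes`, using plain lists instead of NumPy arrays; the class names in the example are illustrative.

```python
from __future__ import annotations

YAMNET_FRAME_HOP_SECONDS = 0.48
YAMNET_FRAME_DURATION_SECONDS = 0.96


def frame_span(frame_index: int) -> tuple[float, float]:
    # Frame i starts i hops into the clip and lasts one full 0.96 s frame.
    start = frame_index * YAMNET_FRAME_HOP_SECONDS
    return round(start, 3), round(start + YAMNET_FRAME_DURATION_SECONDS, 3)


def top_k(frame_scores: list[float], class_names: list[str], k: int) -> list[tuple[str, float]]:
    # Rank class indices by score; indices without a name fall back to "class_<i>".
    ranked = sorted(enumerate(frame_scores), key=lambda item: float(item[1]), reverse=True)
    return [
        (class_names[index] if index < len(class_names) else f"class_{index}", float(score))
        for index, score in ranked[:k]
    ]


print(frame_span(0), frame_span(10))
print(top_k([0.1, 0.7, 0.4], ["Speech", "Music", "Applause"], k=2))
```

Consecutive frames overlap by 0.48 s, so a sustained sound yields a chain of overlapping same-label candidates (gap of -0.48 s) that the smoothing step is expected to join.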
diff --git a/main/cc_suggester/audio/dsp.py b/main/cc_suggester/audio/dsp.py
new file mode 100644
index 0000000..75e62b7
--- /dev/null
+++ b/main/cc_suggester/audio/dsp.py
@@ -0,0 +1,27 @@
+"""Placeholder DSP feature definitions for the first scaffold."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(slots=True)
+class DspFeatureSummary:
+    """Small explainability summary for future DSP extraction."""
+
+    rms_energy: float
+    spectral_flux: float
+    onset_strength: float
+
+
+def describe_planned_features() -> list[str]:
+    """Return the DSP features planned for the first real audio backend."""
+
+    return [
+        "RMS energy",
+        "short-time Fourier transform",
+        "log-mel spectrogram",
+        "spectral flux",
+        "onset strength",
+        "zero-crossing rate",
+    ]
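Two of the planned features, zero-crossing rate and spectral flux, can be sketched in plain Python as follows. This is an illustrative assumption about how they might be computed, not the project's implementation; the naive DFT is O(n^2) and only suitable for short frames, where a real backend would use an FFT.

```python
from __future__ import annotations

import cmath
import math


def zero_crossing_rate(frame: list[float]) -> float:
    # Fraction of adjacent sample pairs whose sign differs (0.0 counts as positive).
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame: list[float]) -> list[float]:
    # Naive DFT magnitudes for bins 0..n/2.
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(previous: list[float], current: list[float]) -> float:
    # Sum of positive magnitude increases between consecutive frames (half-wave rectified).
    prev_mag = magnitude_spectrum(previous)
    curr_mag = magnitude_spectrum(current)
    return sum(max(0.0, c - p) for p, c in zip(prev_mag, curr_mag))


print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]), spectral_flux([0.0] * 8, [1.0] * 8))
```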
diff --git a/main/cc_suggester/audio/events.py b/main/cc_suggester/audio/events.py
new file mode 100644
index 0000000..0e68134
--- /dev/null
+++ b/main/cc_suggester/audio/events.py
@@ -0,0 +1,33 @@
+"""Post-processing helpers for audio events."""
+
+from __future__ import annotations
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate
+
+
+def smooth_events(
+    events: list[AudioEventCandidate],
+    config: PipelineConfig,
+) -> list[AudioEventCandidate]:
+    """Merge adjacent same-label events and remove very short events."""
+
+    filtered = [
+        event
+        for event in sorted(events, key=lambda item: (item.start_time, item.end_time))
+        if event.end_time - event.start_time >= config.min_event_duration
+    ]
+
+    merged: list[AudioEventCandidate] = []
+    # Track the latest kept event per label so concurrent labels do not block merging.
+    last_by_id: dict[str, AudioEventCandidate] = {}
+    for event in filtered:
+        previous = last_by_id.get(event.event_id)
+        if previous is not None and event.start_time - previous.end_time <= config.merge_gap:
+            previous.end_time = max(previous.end_time, event.end_time)
+            previous.audio_confidence = max(previous.audio_confidence, event.audio_confidence)
+            previous.debug_info["merged"] = True
+        else:
+            merged.append(event)
+            last_by_id[event.event_id] = event
+    return merged
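When several labels fire in the same YAMNet frame, a merge that compares each event only with the most recently kept event (of any label) cannot join frames of the same label, because another label sits in between. A self-contained sketch of per-label merging, using a hypothetical `Event` stand-in and copying events so the caller's objects are not mutated:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:  # hypothetical stand-in for AudioEventCandidate
    event_id: str
    start_time: float
    end_time: float
    audio_confidence: float


def merge_per_label(events: list[Event], merge_gap: float, min_duration: float) -> list[Event]:
    """Merge same-label events whose gap is <= merge_gap, even if other labels interleave."""
    kept = [
        Event(e.event_id, e.start_time, e.end_time, e.audio_confidence)  # copy, do not mutate input
        for e in sorted(events, key=lambda item: (item.start_time, item.end_time))
        if e.end_time - e.start_time >= min_duration
    ]
    merged: list[Event] = []
    last_by_id: dict[str, Event] = {}
    for event in kept:
        previous = last_by_id.get(event.event_id)
        if previous is not None and event.start_time - previous.end_time <= merge_gap:
            previous.end_time = max(previous.end_time, event.end_time)
            previous.audio_confidence = max(previous.audio_confidence, event.audio_confidence)
        else:
            merged.append(event)
            last_by_id[event.event_id] = event
    return merged


# Two overlapping YAMNet-style frames where "music" and "applause" fire together.
frames = [
    Event("music", 0.0, 0.96, 0.6), Event("applause", 0.0, 0.96, 0.7),
    Event("music", 0.48, 1.44, 0.8), Event("applause", 0.48, 1.44, 0.5),
]
result = merge_per_label(frames, merge_gap=0.5, min_duration=0.3)
print([(e.event_id, e.start_time, e.end_time, e.audio_confidence) for e in result])
```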
diff --git a/main/cc_suggester/audio/extractor.py b/main/cc_suggester/audio/extractor.py
new file mode 100644
index 0000000..7455428
--- /dev/null
+++ b/main/cc_suggester/audio/extractor.py
@@ -0,0 +1,53 @@
+"""Audio extraction helpers for real model backends."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+from pathlib import Path
+
+from cc_suggester.core.errors import AudioExtractionError, BackendUnavailableError
+
+
+def extract_audio(video_path: Path, output_path: Path, sample_rate: int = 16000) -> Path:
+    """Extract mono WAV audio using ffmpeg."""
+
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is None:
+        raise BackendUnavailableError(
+            message="ffmpeg was not found, so audio extraction cannot run.",
+            code="ffmpeg_missing",
+            suggestions=[
+                "Install ffmpeg and ensure it is on PATH.",
+                "Run ccs doctor to verify the environment.",
+            ],
+        )
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    command = [
+        ffmpeg,
+        "-y",
+        "-i",
+        str(video_path),
+        "-ac",
+        "1",
+        "-ar",
+        str(sample_rate),
+        str(output_path),
+    ]
+    try:
+        subprocess.run(command, capture_output=True, check=True, text=True, encoding="utf-8", errors="replace")
+    except subprocess.CalledProcessError as exc:
+        raise AudioExtractionError(
+            message="ffmpeg failed while extracting audio.",
+            code="audio_extraction_failed",
+            suggestions=[
+                "Run ccs inspect on the input file.",
+                "Try a different video container or re-encode the video.",
+                "Run ccs doctor to verify ffmpeg availability.",
+            ],
+            details={
+                "stderr": exc.stderr[-1000:] if exc.stderr else "",
+                "returncode": exc.returncode,
+            },
+        ) from exc
+    return output_path
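When ffmpeg fails, the raised error carries stderr but not the command line. A small sketch of rebuilding the same argument list and rendering it with `shlex.join` (Python 3.8+) so it can be logged and re-run by hand in a POSIX shell; `build_extract_command` is a hypothetical helper, not part of the module:

```python
from __future__ import annotations

import shlex


def build_extract_command(ffmpeg: str, video_path: str, output_path: str, sample_rate: int = 16000) -> list[str]:
    # Same argument order as extract_audio(): overwrite, input, mono, resample, output.
    return [ffmpeg, "-y", "-i", video_path, "-ac", "1", "-ar", str(sample_rate), output_path]


command = build_extract_command("ffmpeg", "class video.mp4", "out/audio.wav")
# shlex.join quotes arguments that contain spaces or shell metacharacters.
print(shlex.join(command))
```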
diff --git a/main/cc_suggester/audio/label_mapping.py b/main/cc_suggester/audio/label_mapping.py
new file mode 100644
index 0000000..12f28f5
--- /dev/null
+++ b/main/cc_suggester/audio/label_mapping.py
@@ -0,0 +1,45 @@
+"""Normalize model-specific sound labels into project event IDs."""
+
+from __future__ import annotations
+
+
+LABEL_RULES: tuple[tuple[str, tuple[str, ...]], ...] = (
+    ("horn_honk", ("vehicle horn", "car horn", "honking", "horn")),
+    ("glass_break", ("glass", "shatter", "breaking")),
+    ("crowd_cheer", ("cheering", "cheer", "crowd cheering")),
+    ("applause", ("applause", "clapping")),
+    ("laughter", ("laughter", "laughing", "giggle")),
+    ("music", ("music", "song", "singing", "musical")),
+    ("alarm", ("alarm", "beep", "buzzer")),
+    ("siren", ("siren", "police car", "ambulance")),
+    ("explosion", ("explosion", "blast", "boom")),
+    ("gunshot", ("gunshot", "gunfire", "shooting")),
+    ("door_slam", ("door", "slam", "knock")),
+    ("phone_ring", ("telephone", "ringtone", "ringing", "phone")),
+    ("dog_bark", ("bark", "dog")),
+)
+
+
+def normalize_sound_label(label: str) -> str | None:
+    """Map an AudioSet/YAMNet label to an internal event ID."""
+
+    normalized = label.lower().replace("_", " ")
+    for event_id, needles in LABEL_RULES:
+        required = _required_tokens(event_id)
+        if required and all(_matches(normalized, token) for token in required):
+            return event_id
+        if any(needle in normalized for needle in needles):
+            return event_id
+    return None
+
+
+def _matches(label: str, token: str) -> bool:
+    return token in label
+
+
+def _required_tokens(event_id: str) -> tuple[str, ...]:
+    if event_id == "glass_break":
+        return ("glass",)
+    if event_id == "door_slam":
+        return ("door",)
+    return ()
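`normalize_sound_label` tests needles as plain substrings, so a needle can fire inside a longer word: AudioSet's "Doorbell" class contains "door" and maps to `door_slam`. A sketch of whole-word matching with `re` and `\b` boundaries, using a two-rule subset of `LABEL_RULES` for brevity (`substring_match` and `word_match` are hypothetical names):

```python
from __future__ import annotations

import re

RULES: tuple[tuple[str, tuple[str, ...]], ...] = (
    ("door_slam", ("door", "slam", "knock")),
    ("horn_honk", ("vehicle horn", "car horn", "honking", "horn")),
)


def substring_match(label: str) -> str | None:
    # Plain substring test, like the needle check in normalize_sound_label.
    text = label.lower()
    for event_id, needles in RULES:
        if any(needle in text for needle in needles):
            return event_id
    return None


def word_match(label: str) -> str | None:
    # Each needle must appear as whole word(s): "door" no longer matches "doorbell".
    text = label.lower()
    for event_id, needles in RULES:
        if any(re.search(rf"\b{re.escape(needle)}\b", text) for needle in needles):
            return event_id
    return None


print(substring_match("Doorbell"), word_match("Doorbell"), word_match("Door"))
```

Whole-word matching does not fix every case: "French horn" still contains the word "horn", so rules like that need explicit exclusions.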
diff --git a/main/cc_suggester/audio/vad.py b/main/cc_suggester/audio/vad.py
new file mode 100644
index 0000000..6cff096
--- /dev/null
+++ b/main/cc_suggester/audio/vad.py
@@ -0,0 +1,9 @@
+"""Voice activity masking placeholder."""
+
+from __future__ import annotations
+
+
+def is_speech_masking_available() -> bool:
+    """Return whether a real VAD backend has been configured."""
+
+    return False
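Once a VAD backend exists, one way to consume its output is to measure how much of each candidate overlaps detected speech. A minimal sketch, assuming speech segments arrive as non-overlapping `(start, end)` pairs in seconds; `speech_overlap_fraction` is hypothetical, and any cut-off applied to its result is a policy choice:

```python
from __future__ import annotations


def speech_overlap_fraction(
    event: tuple[float, float],
    speech_segments: list[tuple[float, float]],
) -> float:
    """Fraction of the event interval covered by non-overlapping speech segments."""
    start, end = event
    if end <= start:
        return 0.0
    covered = sum(
        max(0.0, min(end, seg_end) - max(start, seg_start))
        for seg_start, seg_end in speech_segments
    )
    # Clamp in case the segments were not strictly non-overlapping.
    return min(1.0, covered / (end - start))


print(speech_overlap_fraction((10.0, 12.0), [(9.0, 10.5), (11.5, 13.0)]))
```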
diff --git a/main/cc_suggester/audio/wav.py b/main/cc_suggester/audio/wav.py
new file mode 100644
index 0000000..95abd5d
--- /dev/null
+++ b/main/cc_suggester/audio/wav.py
@@ -0,0 +1,89 @@
+"""Small WAV loading helpers shared by audio backends."""
+
+from __future__ import annotations
+
+import struct
+import wave
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass(slots=True)
+class WavPcm:
+    """Decoded mono PCM WAV samples."""
+
+    sample_rate: int
+    sample_width: int
+    samples: list[int]
+
+
+def load_wav_mono_pcm(path: Path) -> WavPcm:
+    """Load a WAV file as mono integer PCM samples."""
+
+    with wave.open(str(path), "rb") as wav:
+        sample_rate = wav.getframerate()
+        sample_width = wav.getsampwidth()
+        channels = wav.getnchannels()
+        frames = wav.readframes(wav.getnframes())
+    samples = _decode_pcm(frames, sample_width, channels)
+    return WavPcm(sample_rate=sample_rate, sample_width=sample_width, samples=samples)
+
+
+def load_wav_mono_float32(path: Path, target_sample_rate: int = 16000) -> list[float]:
+    """Load WAV samples normalized to [-1, 1] and resampled if required."""
+
+    wav = load_wav_mono_pcm(path)
+    max_amplitude = float(2 ** (8 * wav.sample_width - 1))
+    floats = [max(-1.0, min(1.0, sample / max_amplitude)) for sample in wav.samples]
+    if wav.sample_rate != target_sample_rate:
+        floats = _resample_linear(floats, wav.sample_rate, target_sample_rate)
+    return floats
+
+
+def _decode_pcm(frames: bytes, sample_width: int, channels: int) -> list[int]:
+    if sample_width == 1:
+        values = [byte - 128 for byte in frames]
+    elif sample_width == 2:
+        count = len(frames) // 2
+        values = list(struct.unpack(f"<{count}h", frames[: count * 2]))
+    elif sample_width == 4:
+        count = len(frames) // 4
+        values = list(struct.unpack(f"<{count}i", frames[: count * 4]))
+    elif sample_width == 3:
+        values = [_decode_24bit(frames[index : index + 3]) for index in range(0, len(frames) - 2, 3)]
+    else:
+        return []
+
+    if channels <= 1:
+        return values
+
+    mono: list[int] = []
+    for index in range(0, len(values) - channels + 1, channels):
+        mono.append(int(sum(values[index : index + channels]) / channels))
+    return mono
+
+
+def _decode_24bit(chunk: bytes) -> int:
+    padded = chunk + (b"\xff" if chunk[2] & 0x80 else b"\x00")
+    return struct.unpack("<i", padded)[0]
+
+
+def _resample_linear(samples: list[float], source_rate: int, target_rate: int) -> list[float]:
+    if not samples or source_rate <= 0 or target_rate <= 0:
+        return samples
+    if source_rate == target_rate:
+        return samples
+
+    target_len = max(1, int(len(samples) * target_rate / source_rate))
+    if target_len == 1:
+        return [samples[0]]
+
+    scale = (len(samples) - 1) / (target_len - 1)
+    output: list[float] = []
+    for index in range(target_len):
+        source_pos = index * scale
+        left = int(source_pos)
+        right = min(left + 1, len(samples) - 1)
+        fraction = source_pos - left
+        output.append(samples[left] * (1.0 - fraction) + samples[right] * fraction)
+    return output
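The two least obvious steps in this loader are the 24-bit sign extension and the linear resampler. A standalone sketch reproducing both for quick experiments:

```python
from __future__ import annotations

import struct


def decode_24bit(chunk: bytes) -> int:
    # Sign-extend a little-endian 3-byte sample to 4 bytes, then unpack as int32.
    padded = chunk + (b"\xff" if chunk[2] & 0x80 else b"\x00")
    return struct.unpack("<i", padded)[0]


def resample_linear(samples: list[float], source_rate: int, target_rate: int) -> list[float]:
    # Same scheme as _resample_linear: first and last samples are kept and the
    # points in between are linearly interpolated (no anti-aliasing filter).
    if not samples:
        return []
    target_len = max(1, int(len(samples) * target_rate / source_rate))
    if target_len == 1:
        return [samples[0]]
    scale = (len(samples) - 1) / (target_len - 1)
    out: list[float] = []
    for i in range(target_len):
        pos = i * scale
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


print(decode_24bit(b"\x00\x00\x80"), resample_linear([0.0, 1.0, 2.0, 3.0], 4, 2))
```

Without a low-pass filter, downsampling (for example 44.1 kHz to 16 kHz) can alias high-frequency content into the band YAMNet sees. Audio produced by `extract_audio` is already written at 16 kHz, so only sidecar WAVs at other rates go through this path.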
diff --git a/main/cc_suggester/cli/__init__.py b/main/cc_suggester/cli/__init__.py
new file mode 100644
index 0000000..429eb26
--- /dev/null
+++ b/main/cc_suggester/cli/__init__.py
@@ -0,0 +1 @@
+"""Command-line interface."""
diff --git a/main/cc_suggester/cli/app.py b/main/cc_suggester/cli/app.py
new file mode 100644
index 0000000..044280a
--- /dev/null
+++ b/main/cc_suggester/cli/app.py
@@ -0,0 +1,313 @@
+"""CLI entrypoint for the Intelligent CC Suggestion Tool."""
+
+from __future__ import annotations
+
+import argparse
+import difflib
+import sys
+from pathlib import Path
+from typing import Sequence
+
+from cc_suggester import __version__
+from cc_suggester.core.config import (
+    SUPPORTED_DEVICES,
+    SUPPORTED_LANGUAGES,
+    PipelineConfig,
+    load_config,
+    merge_config,
+)
+from cc_suggester.core.diagnostics import run_diagnostics
+from cc_suggester.core.errors import CCSuggesterError
+from cc_suggester.core.media import inspect_video
+from cc_suggester.core.pipeline import analyze_video, detect_audio_events, export_from_report, score_visual_reactions
+from cc_suggester.translation.glossary import supported_event_ids
+
+
+COMMANDS = ("analyze", "audio", "vision", "inspect", "doctor", "export", "labels", "web")
+
+
+class FriendlyParser(argparse.ArgumentParser):
+    """ArgumentParser that raises instead of exiting mid-flow."""
+
+    def error(self, message: str) -> None:
+        raise ValueError(message)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    """Run the CLI."""
+
+    args = list(sys.argv[1:] if argv is None else argv)
+    if args and not args[0].startswith("-") and args[0] not in COMMANDS:
+        return _unknown_command(args[0])
+
+    parser = _build_parser()
+    try:
+        namespace = parser.parse_args(args)
+        if not hasattr(namespace, "handler"):
+            parser.print_help()
+            return 0
+        return int(namespace.handler(namespace))
+    except CCSuggesterError as exc:
+        _print_friendly_error(exc)
+        return 2
+    except ValueError as exc:
+        print(f"Command error: {exc}", file=sys.stderr)
+        print("\nTry:", file=sys.stderr)
+        print("  ccs --help", file=sys.stderr)
+        return 2
+
+
+def _build_parser() -> FriendlyParser:
+    parser = FriendlyParser(
+        prog="ccs",
+        description="Generate meaningful non-speech closed caption suggestions from video.",
+    )
+    parser.add_argument("--version", action="version", version=f"ccs {__version__}")
+    subparsers = parser.add_subparsers(dest="command", parser_class=FriendlyParser)
+
+    analyze = subparsers.add_parser("analyze", help="Run the full CC suggestion pipeline.")
+    analyze.add_argument("input", type=Path, help="Input video path.")
+    _add_pipeline_args(analyze)
+    analyze.set_defaults(handler=_handle_analyze)
+
+    audio = subparsers.add_parser("audio", help="Run only audio event detection.")
+    audio.add_argument("input", type=Path, help="Input video or WAV path.")
+    audio.add_argument("--config", type=Path, default=None, help="JSON config file.")
+    audio.add_argument("--device", default=None, choices=SUPPORTED_DEVICES, help="Device mode.")
+    audio.add_argument("--audio-backend", default=None, help="Audio backend: mock, dsp, yamnet, panns, ast, beats, or clap.")
+    audio.add_argument("--out", default=None, type=Path, help="Output root directory.")
+    audio.add_argument("--audio-threshold", default=None, type=float, help="Audio event threshold.")
+    audio.add_argument("--audio-path", default=None, type=Path, help="Optional sidecar WAV audio path.")
+    audio.add_argument("--yamnet-model", default=None, help="YAMNet TF Hub handle or local model directory.")
+    audio.add_argument("--yamnet-class-map", default=None, type=Path, help="YAMNet class map CSV path.")
+    audio.add_argument("--yamnet-top-k", default=None, type=int, help="Top YAMNet classes to inspect per frame.")
+    audio.add_argument("--allow-demo-input", action="store_true", help="Allow non-video demo files.")
+    audio.set_defaults(handler=_handle_audio)
+
+    vision = subparsers.add_parser("vision", help="Run visual reaction scoring from audio event JSON.")
+    vision.add_argument("input", type=Path, help="Input video path.")
+    vision.add_argument("audio_report", type=Path, help="audio_events.json or results.json path.")
+    vision.add_argument("--config", type=Path, default=None, help="JSON config file.")
+    vision.add_argument("--device", default=None, choices=SUPPORTED_DEVICES, help="Device mode.")
+    vision.add_argument("--vision-backend", default=None, help="Vision backend name.")
+    vision.add_argument("--out", default=None, type=Path, help="Output root directory.")
+    vision.add_argument("--allow-demo-input", action="store_true", help="Allow probe fallback for demo media.")
+    vision.set_defaults(handler=_handle_vision)
+
+    inspect = subparsers.add_parser("inspect", help="Inspect video metadata.")
+    inspect.add_argument("input", type=Path, help="Input video path.")
+    inspect.set_defaults(handler=_handle_inspect)
+
+    doctor = subparsers.add_parser("doctor", help="Check ffmpeg, Python, and CPU/GPU status.")
+    doctor.add_argument("--device", default="auto", choices=SUPPORTED_DEVICES, help="Device mode to validate.")
+    doctor.set_defaults(handler=_handle_doctor)
+
+    export = subparsers.add_parser("export", help="Export SRT from a JSON result report.")
+    export.add_argument("report", type=Path, help="Pipeline results.json file.")
+    export.add_argument("--format", default="srt", choices=("srt",), help="Export format.")
+    export.add_argument("--lang", default="en", choices=SUPPORTED_LANGUAGES, help="Caption label language.")
+    export.add_argument("--out", type=Path, default=None, help="Output SRT path.")
+    export.set_defaults(handler=_handle_export)
+
+    labels = subparsers.add_parser("labels", help="List supported languages and event label IDs.")
+    labels.set_defaults(handler=_handle_labels)
+
+    web = subparsers.add_parser("web", help="Show how to launch the Streamlit Web UI.")
+    web.set_defaults(handler=_handle_web)
+    return parser
+
+
+def _handle_analyze(args: argparse.Namespace) -> int:
+    config = _config_from_args(args)
+    result = analyze_video(args.input, config)
+    accepted = sum(1 for item in result.suggestions if item.accepted)
+    review = sum(1 for item in result.suggestions if item.requires_review)
+    rejected = sum(1 for item in result.suggestions if not item.accepted and not item.requires_review)
+
+    print("Analysis complete.")
+    print(f"Input: {result.input_path}")
+    print(f"Output directory: {result.output_dir}")
+    print(f"Device used: {result.diagnostics.actual_device}")
+    print(f"Events: {len(result.audio_events)} detected, {accepted} accepted, {review} review, {rejected} rejected")
+    for name, path in result.files.items():
+        print(f"{name}: {path}")
+    return 0
+
+
+def _handle_audio(args: argparse.Namespace) -> int:
+    base = load_config(args.config) if args.config else PipelineConfig()
+    config = merge_config(
+        base,
+        device=args.device,
+        audio_backend=args.audio_backend,
+        output_dir=args.out,
+        audio_threshold=args.audio_threshold,
+        sidecar_audio_path=args.audio_path,
+        yamnet_model=args.yamnet_model,
+        yamnet_class_map_path=args.yamnet_class_map,
+        yamnet_top_k=args.yamnet_top_k,
+        allow_demo_input=args.allow_demo_input or None,
+    )
+    payload = detect_audio_events(args.input, config)
+    events = payload["audio_events"]
+    files = payload.get("files", {})
+    print("Audio detection complete.")
+    print(f"Input: {payload['input_path']}")
+    print(f"Events: {len(events)}")
+    if isinstance(files, dict):
+        for name, path in files.items():
+            print(f"{name}: {path}")
+    return 0
+
+
+def _handle_vision(args: argparse.Namespace) -> int:
+    base = load_config(args.config) if args.config else PipelineConfig()
+    config = merge_config(
+        base,
+        device=args.device,
+        vision_backend=args.vision_backend,
+        output_dir=args.out,
+        allow_demo_input=args.allow_demo_input or None,
+    )
+    payload = score_visual_reactions(args.input, args.audio_report, config)
+    reactions = payload["reactions"]
+    files = payload.get("files", {})
+    print("Visual reaction scoring complete.")
+    print(f"Input: {payload['input_path']}")
+    print(f"Audio report: {payload['audio_report_path']}")
+    print(f"Reactions: {len(reactions)}")
+    if isinstance(files, dict):
+        for name, path in files.items():
+            print(f"{name}: {path}")
+    return 0
+
+
+def _handle_inspect(args: argparse.Namespace) -> int:
+    metadata = inspect_video(args.input)
+    print(f"Path: {metadata.path}")
+    print(f"Exists: {metadata.exists}")
+    print(f"Size (bytes): {metadata.size_bytes}")
+    print(f"Container: {metadata.container}")
+    print(f"Duration: {metadata.duration}")
+    print(f"FPS: {metadata.fps}")
+    print(f"Resolution: {_format_resolution(metadata.width, metadata.height)}")
+    print(f"Has audio: {metadata.has_audio}")
+    if metadata.probe_error:
+        print(f"Probe warning: {metadata.probe_error}")
+    return 0
+
+
+def _handle_doctor(args: argparse.Namespace) -> int:
+    config = PipelineConfig(device=args.device)
+    diagnostics = run_diagnostics(config)
+    print("Environment diagnostics")
+    print(f"Python: {diagnostics.python_version}")
+    print(f"ffmpeg: {diagnostics.ffmpeg_path or 'not found'}")
+    print(f"ffprobe: {diagnostics.ffprobe_path or 'not found'}")
+    print(f"Torch available: {diagnostics.torch_available}")
+    print(f"CUDA available: {diagnostics.cuda_available}")
+    print(f"Selected device: {diagnostics.selected_device}")
+    print(f"Actual device: {diagnostics.actual_device}")
+    print(f"GPU: {diagnostics.gpu_name or 'none'}")
+    if diagnostics.fallback_reason:
+        print(f"Fallback: {diagnostics.fallback_reason}")
+    for warning in diagnostics.warnings:
+        print(f"Warning: {warning}")
+    return 0
+
+
+def _handle_export(args: argparse.Namespace) -> int:
+    output_path = args.out or args.report.with_name(f"captions.{args.lang}.srt")
+    written = export_from_report(args.report, output_path, args.lang)
+    print(f"Exported {args.format.upper()}: {written}")
+    return 0
+
+
+def _handle_labels(args: argparse.Namespace) -> int:
+    print("Supported languages:")
+    print("  " + ", ".join(SUPPORTED_LANGUAGES))
+    print("Supported event IDs:")
+    for event_id in supported_event_ids():
+        print(f"  {event_id}")
+    return 0
+
+
+def _handle_web(args: argparse.Namespace) -> int:
+    app_path = Path(__file__).resolve().parents[1] / "ui" / "streamlit_app.py"
+    mockup_path = Path(__file__).resolve().parents[3] / "mockups" / "web-ui.html"
+    print("The Streamlit Web UI uses the same core pipeline modules as the CLI.")
+    print("Run:")
+    print(f"  streamlit run {app_path}")
+    print("\nInteractive HTML mockup:")
+    print(f"  {mockup_path}")
+    return 0
+
+
+def _unknown_command(command: str) -> int:
+    suggestion = difflib.get_close_matches(command, COMMANDS, n=1)
+    print(f"No such command: {command}", file=sys.stderr)
+    if suggestion:
+        print(f"Did you mean: {suggestion[0]}?", file=sys.stderr)
+    print("\nTry:", file=sys.stderr)
+    print("  ccs analyze input.mp4 --device auto --lang hi", file=sys.stderr)
+    print("  ccs doctor", file=sys.stderr)
+    return 2
+
+
+def _print_friendly_error(error: CCSuggesterError) -> None:
+    print(error.message, file=sys.stderr)
+    if error.suggestions:
+        print("\nSuggestions:", file=sys.stderr)
+        for index, suggestion in enumerate(error.suggestions, start=1):
+            print(f"{index}. {suggestion}", file=sys.stderr)
+    if error.details:
+        print("\nDetails:", file=sys.stderr)
+        for key, value in error.details.items():
+            print(f"- {key}: {value}", file=sys.stderr)
+
+
+def _format_resolution(width: int | None, height: int | None) -> str:
+    if width is None or height is None:
+        return "unknown"
+    return f"{width} x {height}"
+
+
+def _add_pipeline_args(parser: argparse.ArgumentParser) -> None:
+    parser.add_argument("--config", type=Path, default=None, help="JSON config file.")
+    parser.add_argument("--lang", default=None, choices=SUPPORTED_LANGUAGES, help="Caption label language.")
+    parser.add_argument("--device", default=None, choices=SUPPORTED_DEVICES, help="Device mode.")
+    parser.add_argument("--audio-backend", default=None, help="Audio backend name.")
+    parser.add_argument("--vision-backend", default=None, help="Vision backend name.")
+    parser.add_argument("--out", default=None, type=Path, help="Output root directory.")
+    parser.add_argument("--audio-threshold", default=None, type=float, help="Audio event threshold.")
+    parser.add_argument("--audio-path", default=None, type=Path, help="Optional sidecar WAV audio path.")
+    parser.add_argument("--yamnet-model", default=None, help="YAMNet TF Hub handle or local model directory.")
+    parser.add_argument("--yamnet-class-map", default=None, type=Path, help="YAMNet class map CSV path.")
+    parser.add_argument("--yamnet-top-k", default=None, type=int, help="Top YAMNet classes to inspect per frame.")
+    parser.add_argument("--decision-threshold", default=None, type=float, help="Accept threshold.")
+    parser.add_argument("--review-threshold", default=None, type=float, help="Review threshold.")
+    parser.add_argument("--allow-demo-input", action="store_true", help="Allow non-video demo files.")
+
+
+def _config_from_args(args: argparse.Namespace) -> PipelineConfig:
+    base = load_config(args.config) if args.config else PipelineConfig()
+    return merge_config(
+        base,
+        language=args.lang,
+        device=args.device,
+        audio_backend=args.audio_backend,
+        vision_backend=args.vision_backend,
+        output_dir=args.out,
+        audio_threshold=args.audio_threshold,
+        sidecar_audio_path=args.audio_path,
+        yamnet_model=args.yamnet_model,
+        yamnet_class_map_path=args.yamnet_class_map,
+        yamnet_top_k=args.yamnet_top_k,
+        decision_threshold=args.decision_threshold,
+        review_threshold=args.review_threshold,
+        allow_demo_input=args.allow_demo_input or None,
+    )
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
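The CLI above dispatches each subcommand through `set_defaults(handler=...)` and offers a "Did you mean" hint via `difflib.get_close_matches`. A minimal standalone sketch of that pattern, assuming a hypothetical two-command parser (`build_demo_parser`, `demo_doctor`, `demo_labels`, and `suggest_command` are illustrative names, not part of `cc_suggester`):

```python
from __future__ import annotations

import argparse
import difflib

# Hypothetical command list; the real CLI keeps its own COMMANDS constant.
COMMANDS = ("analyze", "audio", "vision", "inspect", "doctor", "export", "labels", "web")


def suggest_command(command: str) -> str | None:
    """Return the closest known command, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(command, COMMANDS, n=1)
    return matches[0] if matches else None


def demo_doctor(args: argparse.Namespace) -> int:
    # Hypothetical handler: a real one would run diagnostics for args.device.
    return 0


def demo_labels(args: argparse.Namespace) -> int:
    return 0


def build_demo_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command")
    doctor = subparsers.add_parser("doctor")
    doctor.add_argument("--device", default="auto", choices=("auto", "cpu", "cuda"))
    # Each subparser stores its handler, so dispatch is one attribute lookup.
    doctor.set_defaults(handler=demo_doctor)
    labels = subparsers.add_parser("labels")
    labels.set_defaults(handler=demo_labels)
    return parser


def run(argv: list[str]) -> int:
    args = build_demo_parser().parse_args(argv)
    return args.handler(args)
```

For example, `suggest_command("analyse")` returns `"analyze"`. On its own, argparse rejects an unknown subcommand with an "invalid choice" error and exit status 2, so a separate suggestion step like this is what produces the friendlier hint.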
diff --git a/main/cc_suggester/core/__init__.py b/main/cc_suggester/core/__init__.py
new file mode 100644
index 0000000..ede8db5
--- /dev/null
+++ b/main/cc_suggester/core/__init__.py
@@ -0,0 +1 @@
+"""Core pipeline contracts and orchestration."""
diff --git a/main/cc_suggester/core/config.py b/main/cc_suggester/core/config.py
new file mode 100644
index 0000000..2a908fd
--- /dev/null
+++ b/main/cc_suggester/core/config.py
@@ -0,0 +1,80 @@
+"""Configuration for pipeline runs."""
+
+from __future__ import annotations
+
+from dataclasses import asdict, dataclass
+import json
+from pathlib import Path
+from typing import Any
+
+
+SUPPORTED_LANGUAGES = ("en", "hi", "ta", "te", "bn", "mr", "ml")
+SUPPORTED_DEVICES = ("auto", "cpu", "cuda")
+
+
+@dataclass(slots=True)
+class PipelineConfig:
+    """Runtime configuration shared by CLI, UI, and future integrations."""
+
+    language: str = "en"
+    device: str = "auto"
+    audio_backend: str = "mock"
+    vision_backend: str = "mock"
+    output_dir: Path = Path("outputs")
+    sidecar_audio_path: Path | None = None
+    yamnet_model: str | None = None
+    yamnet_class_map_path: Path | None = None
+    yamnet_top_k: int = 5
+    audio_threshold: float = 0.45
+    reaction_threshold: float = 0.35
+    decision_threshold: float = 0.65
+    review_threshold: float = 0.50
+    min_event_duration: float = 0.25
+    merge_gap: float = 0.40
+    sample_window_before: float = 1.0
+    sample_window_after: float = 1.0
+    write_rejected_to_reports: bool = True
+    allow_demo_input: bool = False
+    run_dir: Path | None = None
+
+    def __post_init__(self) -> None:
+        if self.language not in SUPPORTED_LANGUAGES:
+            supported = ", ".join(SUPPORTED_LANGUAGES)
+            raise ValueError(f"Unsupported language '{self.language}'. Supported: {supported}")
+        if self.device not in SUPPORTED_DEVICES:
+            supported = ", ".join(SUPPORTED_DEVICES)
+            raise ValueError(f"Unsupported device '{self.device}'. Supported: {supported}")
+        self.output_dir = Path(self.output_dir)
+        if self.run_dir is not None:
+            self.run_dir = Path(self.run_dir)
+        if self.sidecar_audio_path is not None:
+            self.sidecar_audio_path = Path(self.sidecar_audio_path)
+        if self.yamnet_class_map_path is not None:
+            self.yamnet_class_map_path = Path(self.yamnet_class_map_path)
+
+    def to_dict(self) -> dict[str, Any]:
+        data = asdict(self)
+        data["output_dir"] = str(self.output_dir)
+        data["run_dir"] = str(self.run_dir) if self.run_dir else None
+        data["sidecar_audio_path"] = str(self.sidecar_audio_path) if self.sidecar_audio_path else None
+        data["yamnet_class_map_path"] = str(self.yamnet_class_map_path) if self.yamnet_class_map_path else None
+        return data
+
+
+def load_config(path: Path) -> PipelineConfig:
+    """Load a JSON config file."""
+
+    path = Path(path)
+    payload = json.loads(path.read_text(encoding="utf-8"))
+    return PipelineConfig(**payload)
+
+
+def merge_config(base: PipelineConfig, **overrides: Any) -> PipelineConfig:
+    """Return a new config with non-None overrides applied (run_dir is reset)."""
+
+    data = base.to_dict()
+    data.pop("run_dir", None)
+    for key, value in overrides.items():
+        if value is not None:
+            data[key] = value
+    return PipelineConfig(**data)
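`merge_config` applies only overrides that are not `None`. This is why the CLI passes `args.allow_demo_input or None`: an absent `store_true` flag becomes `None` and leaves a `true` value from the JSON config untouched. A reduced sketch of that rule, assuming a hypothetical two-field `MiniConfig` in place of `PipelineConfig`:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any


@dataclass
class MiniConfig:
    # Hypothetical stand-in for PipelineConfig with two representative fields.
    output_dir: Path = Path("outputs")
    allow_demo_input: bool = False

    def __post_init__(self) -> None:
        # Mirror PipelineConfig: accept strings (e.g. from JSON) and coerce to Path.
        self.output_dir = Path(self.output_dir)


def merge(base: MiniConfig, **overrides: Any) -> MiniConfig:
    """Apply only non-None overrides, then rebuild so __post_init__ runs again."""
    data = asdict(base)
    data.update({key: value for key, value in overrides.items() if value is not None})
    return MiniConfig(**data)


# Values as they might arrive from a JSON config file.
file_config = MiniConfig(output_dir="runs", allow_demo_input=True)

flag_absent = False  # argparse store_true default when the flag is not given
merged = merge(file_config, allow_demo_input=flag_absent or None, output_dir=None)
```

Passing `False` directly would silently override the config file's `true`. `dataclasses.replace(base, **filtered)` would also rerun `__post_init__`, but unlike `merge_config` it would carry `run_dir` over from the base config.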
diff --git a/main/cc_suggester/core/diagnostics.py b/main/cc_suggester/core/diagnostics.py
new file mode 100644
index 0000000..4cd6f1d
--- /dev/null
+++ b/main/cc_suggester/core/diagnostics.py
@@ -0,0 +1,80 @@
+"""Environment and device diagnostics."""
+
+from __future__ import annotations
+
+import platform
+import shutil
+from typing import Any
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.errors import DeviceUnavailableError
+from cc_suggester.core.types import DiagnosticsReport
+
+
+def _torch_status() -> tuple[bool, bool, str | None]:
+    try:
+        import torch  # type: ignore
+    except Exception:
+        return False, False, None
+
+    cuda_available = bool(torch.cuda.is_available())
+    gpu_name = None
+    if cuda_available:
+        try:
+            gpu_name = str(torch.cuda.get_device_name(0))
+        except Exception:
+            gpu_name = "CUDA device"
+    return True, cuda_available, gpu_name
+
+
+def run_diagnostics(config: PipelineConfig) -> DiagnosticsReport:
+    """Collect environment details and resolve the actual processing device."""
+
+    torch_available, cuda_available, gpu_name = _torch_status()
+    ffmpeg_path = shutil.which("ffmpeg")
+    ffprobe_path = shutil.which("ffprobe")
+    warnings: list[str] = []
+
+    if ffmpeg_path is None:
+        warnings.append("ffmpeg was not found; real video/audio extraction will fail.")
+    if ffprobe_path is None:
+        warnings.append("ffprobe was not found; metadata inspection will be limited.")
+
+    actual_device = "cpu"
+    fallback_reason = None
+    if config.device == "cuda":
+        if not cuda_available:
+            details: dict[str, Any] = {
+                "torch_available": torch_available,
+                "cuda_available": cuda_available,
+                "gpu_name": gpu_name,
+                "ffmpeg_path": ffmpeg_path,
+            }
+            raise DeviceUnavailableError(
+                message="CUDA was requested, but no usable GPU was detected.",
+                code="cuda_unavailable",
+                suggestions=[
+                    "Retry with --device cpu.",
+                    "Run ccs doctor to inspect the environment.",
+                    "Install a CUDA-compatible PyTorch build if GPU acceleration is required.",
+                ],
+                details=details,
+            )
+        actual_device = "cuda"
+    elif config.device == "auto" and cuda_available:
+        actual_device = "cuda"
+    elif config.device == "auto":
+        fallback_reason = "CUDA was not detected; using CPU."
+
+    return DiagnosticsReport(
+        python_version=platform.python_version(),
+        ffmpeg_path=ffmpeg_path,
+        ffprobe_path=ffprobe_path,
+        selected_device=config.device,
+        actual_device=actual_device,
+        cuda_available=cuda_available,
+        gpu_name=gpu_name,
+        torch_available=torch_available,
+        fallback_reason=fallback_reason,
+        warnings=warnings,
+    )
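`run_diagnostics` mixes environment probing with the device decision. The decision itself is a small pure rule: `auto` prefers CUDA and records a fallback reason when CUDA is missing, an explicit `cuda` request fails loudly, and `cpu` is always honoured. A sketch of that rule in isolation, assuming a hypothetical `resolve_device` helper and `DeviceUnavailable` exception (neither exists in the package):

```python
from __future__ import annotations


class DeviceUnavailable(RuntimeError):
    """Hypothetical stand-in for cc_suggester's DeviceUnavailableError."""


def resolve_device(requested: str, cuda_available: bool) -> tuple[str, str | None]:
    """Return (actual_device, fallback_reason) following the rules in run_diagnostics."""
    if requested not in ("auto", "cpu", "cuda"):
        raise ValueError(f"Unsupported device '{requested}'")
    if requested == "cuda":
        if not cuda_available:
            # An explicit GPU request fails instead of silently using the CPU.
            raise DeviceUnavailable("CUDA was requested, but no usable GPU was detected.")
        return "cuda", None
    if requested == "auto":
        if cuda_available:
            return "cuda", None
        return "cpu", "CUDA was not detected; using CPU."
    # An explicit CPU request never records a fallback, even if a GPU exists.
    return "cpu", None
```

Keeping the availability check (for example, `torch.cuda.is_available()`) outside the rule lets the rule be unit-tested on machines without a GPU.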
diff --git a/main/cc_suggester/core/errors.py b/main/cc_suggester/core/errors.py
new file mode 100644
index 0000000..c0fb55e
--- /dev/null
+++ b/main/cc_suggester/core/errors.py
@@ -0,0 +1,38 @@
+"""Friendly error types surfaced by CLI and UI clients."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass(slots=True)
+class CCSuggesterError(Exception):
+    """Base exception with user-facing suggestions."""
+
+    message: str
+    code: str = "ccs_error"
+    suggestions: list[str] = field(default_factory=list)
+    details: dict[str, object] = field(default_factory=dict)
+
+    def __str__(self) -> str:
+        return self.message
+
+
+class InputNotFoundError(CCSuggesterError):
+    """Raised when the requested input video does not exist."""
+
+
+class InvalidMediaError(CCSuggesterError):
+    """Raised when a file cannot be processed as required media."""
+
+
+class AudioExtractionError(CCSuggesterError):
+    """Raised when ffmpeg audio extraction fails."""
+
+
+class DeviceUnavailableError(CCSuggesterError):
+    """Raised when a required device, such as CUDA, is unavailable."""
+
+
+class BackendUnavailableError(CCSuggesterError):
+    """Raised when a requested model backend is not installed or registered."""
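These errors carry a message, a machine-readable `code`, numbered suggestions, and free-form details, which `_print_friendly_error` in the CLI writes to stderr. A sketch that builds the same layout as a string so it can be asserted in tests, assuming a hypothetical `FriendlyError` that mirrors `CCSuggesterError` (without `slots=True`, so it also runs before Python 3.10):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FriendlyError(Exception):
    """Hypothetical stand-in for CCSuggesterError: a message plus actionable hints."""

    message: str
    code: str = "ccs_error"
    suggestions: list[str] = field(default_factory=list)
    details: dict[str, object] = field(default_factory=dict)

    def __str__(self) -> str:
        return self.message


def format_friendly_error(error: FriendlyError) -> str:
    """Build the text layout that _print_friendly_error prints (minus the final newline)."""
    lines = [error.message]
    if error.suggestions:
        lines.append("\nSuggestions:")
        lines.extend(f"{index}. {hint}" for index, hint in enumerate(error.suggestions, start=1))
    if error.details:
        lines.append("\nDetails:")
        lines.extend(f"- {key}: {value}" for key, value in error.details.items())
    return "\n".join(lines)


err = FriendlyError(
    message="CUDA was requested, but no usable GPU was detected.",
    code="cuda_unavailable",
    suggestions=["Retry with --device cpu."],
    details={"cuda_available": False},
)
```

Overriding `__str__` matters here. The dataclass-generated `__init__` never calls `Exception.__init__`, so returning `message` explicitly keeps tracebacks readable.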
diff --git a/main/cc_suggester/core/media.py b/main/cc_suggester/core/media.py
new file mode 100644
index 0000000..64d3d83
--- /dev/null
+++ b/main/cc_suggester/core/media.py
@@ -0,0 +1,179 @@
+"""Video metadata inspection utilities."""
+
+from __future__ import annotations
+
+import json
+import shutil
+import subprocess
+from pathlib import Path
+
+from cc_suggester.core.errors import InvalidMediaError
+from cc_suggester.core.types import VideoMetadata
+
+
+VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm", ".m4v"}
+AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".aac", ".m4a", ".ogg"}
+
+
+def inspect_video(path: Path) -> VideoMetadata:
+    """Inspect a video using ffprobe when available, with a safe fallback."""
+
+    path = Path(path)
+    exists = path.exists()
+    metadata = VideoMetadata(
+        path=path,
+        exists=exists,
+        size_bytes=path.stat().st_size if exists else None,
+        container=path.suffix.lstrip(".").lower() or None,
+    )
+    if not exists:
+        return metadata
+
+    ffprobe = shutil.which("ffprobe")
+    if ffprobe is None:
+        metadata.probe_error = "ffprobe not found"
+        _inspect_with_opencv(metadata)
+        return metadata
+
+    command = [
+        ffprobe,
+        "-v",
+        "error",
+        "-print_format",
+        "json",
+        "-show_format",
+        "-show_streams",
+        str(path),
+    ]
+    try:
+        completed = subprocess.run(command, capture_output=True, check=True, text=True)
+        payload = json.loads(completed.stdout or "{}")
+    except Exception as exc:
+        metadata.probe_error = str(exc)
+        return metadata
+
+    fmt = payload.get("format", {})
+    try:
+        metadata.duration = float(fmt["duration"]) if "duration" in fmt else None
+    except (TypeError, ValueError):
+        metadata.duration = None
+
+    has_video = False
+    has_audio = False
+    for stream in payload.get("streams", []):
+        if stream.get("codec_type") == "audio":
+            has_audio = True
+            metadata.audio_codec = stream.get("codec_name")
+            sample_rate = stream.get("sample_rate")
+            try:
+                metadata.audio_sample_rate = int(sample_rate) if sample_rate else None
+            except (TypeError, ValueError):
+                metadata.audio_sample_rate = None
+            metadata.audio_channels = stream.get("channels")
+        if stream.get("codec_type") == "video":
+            has_video = True
+            metadata.video_codec = stream.get("codec_name")
+            metadata.width = stream.get("width")
+            metadata.height = stream.get("height")
+            rate = stream.get("avg_frame_rate") or stream.get("r_frame_rate")
+            metadata.fps = _parse_fraction(rate)
+    metadata.has_audio = has_audio
+    metadata.has_video = has_video
+    return metadata
+
+
+def _inspect_with_opencv(metadata: VideoMetadata) -> None:
+    if metadata.path.suffix.lower() not in VIDEO_EXTENSIONS:
+        return
+    try:
+        import cv2  # type: ignore
+    except Exception:
+        return
+
+    capture = cv2.VideoCapture(str(metadata.path))
+    if not capture.isOpened():
+        return
+    try:
+        metadata.has_video = True
+        metadata.width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) or None
+        metadata.height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) or None
+        fps = float(capture.get(cv2.CAP_PROP_FPS) or 0)
+        frame_count = float(capture.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
+        metadata.fps = fps or None
+        metadata.duration = frame_count / fps if fps > 0 and frame_count > 0 else None
+        metadata.video_codec = "opencv-readable"
+    finally:
+        capture.release()
+
+
+def validate_media(
+    metadata: VideoMetadata,
+    *,
+    require_video: bool = True,
+    require_audio: bool = True,
+    allow_probe_failure: bool = False,
+) -> None:
+    """Validate input media for real processing backends."""
+
+    if not metadata.exists:
+        raise InvalidMediaError(
+            message=f"Input file was not found: {metadata.path}",
+            code="input_not_found",
+            suggestions=["Check the path and run ccs inspect /path/to/video.mp4."],
+        )
+
+    suffix = metadata.path.suffix.lower()
+    if require_video and suffix not in VIDEO_EXTENSIONS:
+        raise InvalidMediaError(
+            message=f"Input does not look like a supported video file: {metadata.path}",
+            code="unsupported_video_type",
+            suggestions=[
+                "Use MP4, MKV, MOV, AVI, WEBM, or M4V video input.",
+                "For demo-only testing, run with --allow-demo-input and mock backends.",
+            ],
+            details={"suffix": suffix or "none"},
+        )
+
+    if metadata.probe_error and not allow_probe_failure and metadata.has_video is not True:
+        raise InvalidMediaError(
+            message="Video metadata could not be probed.",
+            code="probe_failed",
+            suggestions=[
+                "Install ffprobe/ffmpeg and ensure they are on PATH.",
+                "Run ccs doctor to inspect the environment.",
+            ],
+            details={"probe_error": metadata.probe_error},
+        )
+
+    if require_video and metadata.has_video is False:
+        raise InvalidMediaError(
+            message="No video stream was found in the input file.",
+            code="missing_video_stream",
+            suggestions=["Use a video file that contains a valid video stream."],
+        )
+
+    if require_audio and metadata.has_audio is False:
+        raise InvalidMediaError(
+            message="No audio stream was found in the input file.",
+            code="missing_audio_stream",
+            suggestions=[
+                "Use a video file that contains audio.",
+                "Run ccs inspect /path/to/video.mp4 to confirm stream details.",
+            ],
+        )
+
+
+def _parse_fraction(value: str | None) -> float | None:
+    if not value:
+        return None
+    try:
+        numerator, denominator = value.split("/", maxsplit=1)
+        denominator_float = float(denominator)
+        if denominator_float == 0:
+            return None
+        return float(numerator) / denominator_float
+    except Exception:
+        try:
+            return float(value)
+        except Exception:
+            return None
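`inspect_video` folds the JSON from `ffprobe -v error -print_format json -show_format -show_streams` into stream flags and a frame rate. ffprobe reports rates as fractions such as `30000/1001` and durations as strings. A reduced sketch of that folding on a hand-written payload, assuming hypothetical `parse_rate` and `summarize_probe` helpers (only the key names follow ffprobe's JSON output; the values are made up):

```python
from __future__ import annotations

from typing import Any


def parse_rate(value: str | None) -> float | None:
    """Parse ffprobe rates such as '30000/1001' or '25'; '0/0' means unknown."""
    if not value:
        return None
    numerator, _, denominator = value.partition("/")
    try:
        if not denominator:
            return float(numerator)
        denom = float(denominator)
        return float(numerator) / denom if denom else None
    except ValueError:
        return None


def summarize_probe(payload: dict[str, Any]) -> dict[str, Any]:
    summary: dict[str, Any] = {"has_audio": False, "has_video": False, "fps": None}
    for stream in payload.get("streams", []):
        if stream.get("codec_type") == "audio":
            summary["has_audio"] = True
        elif stream.get("codec_type") == "video":
            summary["has_video"] = True
            # Prefer avg_frame_rate and fall back to r_frame_rate, as inspect_video does.
            summary["fps"] = parse_rate(stream.get("avg_frame_rate") or stream.get("r_frame_rate"))
    duration = payload.get("format", {}).get("duration")
    summary["duration"] = float(duration) if duration is not None else None
    return summary


sample = {
    "format": {"duration": "12.500000"},
    "streams": [
        {"codec_type": "video", "codec_name": "h264", "avg_frame_rate": "30000/1001"},
        {"codec_type": "audio", "codec_name": "aac", "sample_rate": "48000"},
    ],
}
```

`summarize_probe(sample)` reports both streams and an fps of about 29.97. A `0/0` rate is treated as unknown rather than raising `ZeroDivisionError`.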
diff --git a/main/cc_suggester/core/pipeline.py b/main/cc_suggester/core/pipeline.py
new file mode 100644
index 0000000..748483f
--- /dev/null
+++ b/main/cc_suggester/core/pipeline.py
@@ -0,0 +1,294 @@
+"""Pipeline orchestration."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from cc_suggester.audio.backends import get_audio_backend
+from cc_suggester.audio.events import smooth_events
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.diagnostics import run_diagnostics
+from cc_suggester.core.errors import BackendUnavailableError, InputNotFoundError
+from cc_suggester.core.media import AUDIO_EXTENSIONS, inspect_video, validate_media
+from cc_suggester.core.types import AudioEventCandidate, PipelineResult
+from cc_suggester.decision.scorer import decide_captions
+from cc_suggester.output.csv_report import write_csv_report
+from cc_suggester.output.json_report import write_json_report
+from cc_suggester.output.srt import write_srt
+from cc_suggester.vision.backends import get_vision_backend
+
+
+def analyze_video(video_path: Path, config: PipelineConfig) -> PipelineResult:
+    """Run the full caption suggestion pipeline."""
+
+    video_path = Path(video_path)
+    if not video_path.exists():
+        raise InputNotFoundError(
+            message=f"Input file was not found: {video_path}",
+            code="input_not_found",
+            suggestions=[
+                "Check the path and filename.",
+                "Run ccs inspect /path/to/video.mp4 to validate a video file.",
+            ],
+            details={"input_path": str(video_path)},
+        )
+
+    metadata = inspect_video(video_path)
+    diagnostics = run_diagnostics(config)
+    run_dir = _run_dir(config.output_dir, video_path)
+    config.run_dir = run_dir
+
+    try:
+        audio_backend = get_audio_backend(config.audio_backend)
+        vision_backend = get_vision_backend(config.vision_backend)
+    except ValueError as exc:
+        raise BackendUnavailableError(
+            message=str(exc),
+            code="backend_unavailable",
+            suggestions=[
+                "Use --audio-backend mock and --vision-backend mock for the first scaffold.",
+                "Install the optional backend dependencies before selecting advanced backends.",
+            ],
+        ) from exc
+
+    _validate_sidecar_audio(config)
+    if audio_backend.requires_valid_media or vision_backend.requires_valid_media:
+        is_audio_only_input = video_path.suffix.lower() in AUDIO_EXTENSIONS
+        require_audio = audio_backend.requires_audio_file and config.sidecar_audio_path is None
+        validate_media(
+            metadata,
+            require_video=vision_backend.requires_valid_media or not is_audio_only_input,
+            require_audio=require_audio,
+            allow_probe_failure=config.allow_demo_input or is_audio_only_input,
+        )
+
+    audio_events = smooth_events(audio_backend.detect(video_path, metadata, config), config)
+    reactions = vision_backend.analyze(video_path, metadata, audio_events, config)
+    suggestions = decide_captions(audio_events, reactions, config)
+
+    result = PipelineResult(
+        input_path=video_path,
+        output_dir=run_dir,
+        metadata=metadata,
+        diagnostics=diagnostics,
+        audio_events=audio_events,
+        reactions=reactions,
+        suggestions=suggestions,
+        artifacts=_collect_artifacts(run_dir, config),
+    )
+    result.files = _write_outputs(result, config)
+    return result
+
+
+def detect_audio_events(video_path: Path, config: PipelineConfig) -> dict[str, object]:
+    """Run only the audio detection stage and write an audio JSON report."""
+
+    video_path = Path(video_path)
+    if not video_path.exists():
+        raise InputNotFoundError(
+            message=f"Input file was not found: {video_path}",
+            code="input_not_found",
+            suggestions=["Check the path and run ccs inspect /path/to/video.mp4."],
+        )
+
+    metadata = inspect_video(video_path)
+    diagnostics = run_diagnostics(config)
+    run_dir = _run_dir(config.output_dir, video_path)
+    config.run_dir = run_dir
+
+    try:
+        audio_backend = get_audio_backend(config.audio_backend)
+    except ValueError as exc:
+        raise BackendUnavailableError(
+            message=str(exc),
+            code="backend_unavailable",
+            suggestions=["Use --audio-backend mock or --audio-backend dsp."],
+        ) from exc
+
+    _validate_sidecar_audio(config)
+    if audio_backend.requires_valid_media:
+        is_audio_only_input = video_path.suffix.lower() in AUDIO_EXTENSIONS
+        require_audio = audio_backend.requires_audio_file and config.sidecar_audio_path is None
+        validate_media(
+            metadata,
+            require_video=not is_audio_only_input,
+            require_audio=require_audio,
+            allow_probe_failure=config.allow_demo_input or is_audio_only_input,
+        )
+
+    events = smooth_events(audio_backend.detect(video_path, metadata, config), config)
+    payload: dict[str, object] = {
+        "input_path": str(video_path),
+        "output_dir": str(run_dir),
+        "metadata": metadata.to_dict(),
+        "diagnostics": diagnostics.to_dict(),
+        "audio_events": [event.to_dict() for event in events],
+        "artifacts": {name: str(path) for name, path in _collect_artifacts(run_dir, config).items()},
+    }
+    run_dir.mkdir(parents=True, exist_ok=True)
+    report_path = write_json_report(payload, run_dir / "audio_events.json")
+    payload["files"] = {"audio_json": str(report_path)}
+    return payload
+
+
+def score_visual_reactions(
+    video_path: Path,
+    audio_report_path: Path,
+    config: PipelineConfig,
+) -> dict[str, object]:
+    """Run only visual reaction scoring from an existing audio event report."""
+
+    video_path = Path(video_path)
+    audio_report_path = Path(audio_report_path)
+    if not video_path.exists():
+        raise InputNotFoundError(
+            message=f"Input file was not found: {video_path}",
+            code="input_not_found",
+            suggestions=["Check the path and run ccs inspect /path/to/video.mp4."],
+        )
+    if not audio_report_path.exists():
+        raise InputNotFoundError(
+            message=f"Audio event report was not found: {audio_report_path}",
+            code="audio_report_not_found",
+            suggestions=[
+                "Run ccs audio first to generate audio_events.json.",
+                "Pass a valid path to a pipeline results.json file or audio_events.json file.",
+            ],
+        )
+
+    metadata = inspect_video(video_path)
+    diagnostics = run_diagnostics(config)
+    run_dir = _run_dir(config.output_dir, video_path)
+    config.run_dir = run_dir
+
+    try:
+        vision_backend = get_vision_backend(config.vision_backend)
+    except ValueError as exc:
+        raise BackendUnavailableError(
+            message=str(exc),
+            code="backend_unavailable",
+            suggestions=["Use --vision-backend mock or --vision-backend opencv."],
+        ) from exc
+
+    if vision_backend.requires_valid_media:
+        validate_media(
+            metadata,
+            require_video=True,
+            require_audio=False,
+            allow_probe_failure=config.allow_demo_input,
+        )
+
+    audio_events = _load_audio_events(audio_report_path)
+    reactions = vision_backend.analyze(video_path, metadata, audio_events, config)
+    payload: dict[str, object] = {
+        "input_path": str(video_path),
+        "audio_report_path": str(audio_report_path),
+        "output_dir": str(run_dir),
+        "metadata": metadata.to_dict(),
+        "diagnostics": diagnostics.to_dict(),
+        "audio_events": [event.to_dict() for event in audio_events],
+        "reactions": [reaction.to_dict() for reaction in reactions],
+    }
+    run_dir.mkdir(parents=True, exist_ok=True)
+    report_path = write_json_report(payload, run_dir / "vision_reactions.json")
+    payload["files"] = {"vision_json": str(report_path)}
+    return payload
+
+
+def export_from_report(report_path: Path, output_path: Path, language: str) -> Path:
+    """Export SRT from a JSON report produced by the pipeline."""
+
+    import json
+
+    from cc_suggester.core.types import CaptionSuggestion
+    from cc_suggester.decision.labels import caption_for
+
+    payload = json.loads(Path(report_path).read_text(encoding="utf-8"))
+    suggestions: list[CaptionSuggestion] = []
+    for item in payload.get("suggestions", []):
+        suggestions.append(
+            CaptionSuggestion(
+                event_id=item["event_id"],
+                start_time=float(item["start_time"]),
+                end_time=float(item["end_time"]),
+                audio_confidence=float(item["audio_confidence"]),
+                reaction_confidence=float(item["reaction_confidence"]),
+                decision_score=float(item["decision_score"]),
+                accepted=bool(item["accepted"]),
+                reason=str(item["reason"]),
+                caption_text=caption_for(str(item["event_id"]), language),
+                language=language,
+                requires_review=bool(item.get("requires_review", False)),
+                debug_info=item.get("debug_info", {}),
+            )
+        )
+    return write_srt(suggestions, output_path)
+
+
+def _load_audio_events(report_path: Path) -> list[AudioEventCandidate]:
+    payload = json.loads(Path(report_path).read_text(encoding="utf-8"))
+    raw_events = payload.get("audio_events", [])
+    events: list[AudioEventCandidate] = []
+    for item in raw_events:
+        events.append(
+            AudioEventCandidate(
+                event_id=str(item["event_id"]),
+                label=str(item.get("label") or item["event_id"]),
+                start_time=float(item["start_time"]),
+                end_time=float(item["end_time"]),
+                audio_confidence=float(item["audio_confidence"]),
+                audio_backend=str(item.get("audio_backend") or "unknown"),
+                raw_class_name=item.get("raw_class_name"),
+                debug_info=item.get("debug_info", {}),
+            )
+        )
+    return events
+
+
+def _write_outputs(result: PipelineResult, config: PipelineConfig) -> dict[str, Path]:
+    result.output_dir.mkdir(parents=True, exist_ok=True)
+    files = {
+        "srt": result.output_dir / f"captions.{config.language}.srt",
+        "json": result.output_dir / "results.json",
+        "csv": result.output_dir / "events.csv",
+        "diagnostics": result.output_dir / "diagnostics.json",
+        "config": result.output_dir / "config.json",
+    }
+    result.files = files
+    write_srt(result.suggestions, files["srt"])
+    write_csv_report(result.suggestions, files["csv"])
+    write_json_report(result.to_dict(), files["json"])
+    write_json_report(result.diagnostics.to_dict(), files["diagnostics"])
+    write_json_report(config.to_dict(), files["config"])
+    return files
+
+
+def _run_dir(output_dir: Path, video_path: Path) -> Path:
+    stem = video_path.stem or "video"
+    safe_stem = "".join(char if char.isalnum() or char in {"-", "_"} else "-" for char in stem)
+    return Path(output_dir) / safe_stem
+
+
+def _collect_artifacts(run_dir: Path, config: PipelineConfig) -> dict[str, Path]:
+    artifacts = {
+        "audio_wav": run_dir / "artifacts" / "audio.wav",
+    }
+    if config.sidecar_audio_path is not None:
+        artifacts["audio_wav"] = Path(config.sidecar_audio_path)
+    return {name: path for name, path in artifacts.items() if path.exists()}
+
+
+def _validate_sidecar_audio(config: PipelineConfig) -> None:
+    if config.sidecar_audio_path is None:
+        return
+    if not Path(config.sidecar_audio_path).exists():
+        raise InputNotFoundError(
+            message=f"Sidecar audio file was not found: {config.sidecar_audio_path}",
+            code="sidecar_audio_not_found",
+            suggestions=[
+                "Check the --audio-path value.",
+                "Generate sample media with python scripts/generate_sample_video.py.",
+            ],
+            details={"audio_path": str(config.sidecar_audio_path)},
+        )
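`_load_audio_events` reads only the top-level `audio_events` list. A pipeline `results.json` and a standalone `audio_events.json` both carry that list, so either file can serve as the vision step's audio report. Four keys are required, and the rest fall back to defaults. The sketch below shows that contract as a standalone function. It uses plain dicts in place of `AudioEventCandidate` and a hypothetical `parse_audio_events` helper, so it runs without the package:

```python
import json
from typing import Any


def parse_audio_events(text: str) -> list[dict[str, Any]]:
    """Hypothetical mirror of _load_audio_events' field handling on a JSON string."""
    payload = json.loads(text)
    events: list[dict[str, Any]] = []
    for item in payload.get("audio_events", []):
        events.append(
            {
                # Required keys: a missing one raises KeyError, as in the pipeline.
                "event_id": str(item["event_id"]),
                "start_time": float(item["start_time"]),
                "end_time": float(item["end_time"]),
                "audio_confidence": float(item["audio_confidence"]),
                # Optional keys use the same fallbacks as the pipeline.
                "label": str(item.get("label") or item["event_id"]),
                "audio_backend": str(item.get("audio_backend") or "unknown"),
                "raw_class_name": item.get("raw_class_name"),
                "debug_info": item.get("debug_info", {}),
            }
        )
    return events


sample = json.dumps(
    {
        "audio_events": [
            {"event_id": "horn_honk", "start_time": 1.2, "end_time": 1.9, "audio_confidence": 0.81}
        ]
    }
)
print(parse_audio_events(sample))
```

A report without an `audio_events` key yields an empty list instead of an error. As a result, pointing the vision step at the wrong JSON file may quietly produce no reactions.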
diff --git a/main/cc_suggester/core/types.py b/main/cc_suggester/core/types.py
new file mode 100644
index 0000000..35149e7
--- /dev/null
+++ b/main/cc_suggester/core/types.py
@@ -0,0 +1,138 @@
+"""Shared data models passed between pipeline modules."""
+
+from __future__ import annotations
+
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any
+
+
+JsonDict = dict[str, Any]
+
+
+@dataclass(slots=True)
+class VideoMetadata:
+    """Basic metadata discovered from an input video."""
+
+    path: Path
+    exists: bool
+    size_bytes: int | None = None
+    duration: float | None = None
+    fps: float | None = None
+    width: int | None = None
+    height: int | None = None
+    has_audio: bool | None = None
+    has_video: bool | None = None
+    audio_codec: str | None = None
+    video_codec: str | None = None
+    audio_sample_rate: int | None = None
+    audio_channels: int | None = None
+    container: str | None = None
+    probe_error: str | None = None
+
+    def to_dict(self) -> JsonDict:
+        data = asdict(self)
+        data["path"] = str(self.path)
+        return data
+
+
+@dataclass(slots=True)
+class DiagnosticsReport:
+    """Environment and device diagnostics for a run."""
+
+    python_version: str
+    ffmpeg_path: str | None
+    ffprobe_path: str | None
+    selected_device: str
+    actual_device: str
+    cuda_available: bool
+    gpu_name: str | None = None
+    torch_available: bool = False
+    fallback_reason: str | None = None
+    warnings: list[str] = field(default_factory=list)
+
+    def to_dict(self) -> JsonDict:
+        return asdict(self)
+
+
+@dataclass(slots=True)
+class AudioEventCandidate:
+    """Detected sound event before visual reaction analysis."""
+
+    event_id: str
+    label: str
+    start_time: float
+    end_time: float
+    audio_confidence: float
+    audio_backend: str
+    raw_class_name: str | None = None
+    debug_info: JsonDict = field(default_factory=dict)
+
+    def to_dict(self) -> JsonDict:
+        return asdict(self)
+
+
+@dataclass(slots=True)
+class ReactionResult:
+    """Visual reaction evidence around an audio event."""
+
+    event_id: str
+    start_time: float
+    end_time: float
+    reaction_confidence: float
+    reaction_signals: JsonDict = field(default_factory=dict)
+    frames_sampled: int = 0
+    vision_backend: str = "mock"
+    debug_info: JsonDict = field(default_factory=dict)
+
+    def to_dict(self) -> JsonDict:
+        return asdict(self)
+
+
+@dataclass(slots=True)
+class CaptionSuggestion:
+    """Final caption decision for a candidate event."""
+
+    event_id: str
+    start_time: float
+    end_time: float
+    audio_confidence: float
+    reaction_confidence: float
+    decision_score: float
+    accepted: bool
+    reason: str
+    caption_text: str
+    language: str
+    requires_review: bool = False
+    debug_info: JsonDict = field(default_factory=dict)
+
+    def to_dict(self) -> JsonDict:
+        return asdict(self)
+
+
+@dataclass(slots=True)
+class PipelineResult:
+    """Complete result returned by the pipeline."""
+
+    input_path: Path
+    output_dir: Path
+    metadata: VideoMetadata
+    diagnostics: DiagnosticsReport
+    audio_events: list[AudioEventCandidate]
+    reactions: list[ReactionResult]
+    suggestions: list[CaptionSuggestion]
+    files: dict[str, Path] = field(default_factory=dict)
+    artifacts: dict[str, Path] = field(default_factory=dict)
+
+    def to_dict(self) -> JsonDict:
+        return {
+            "input_path": str(self.input_path),
+            "output_dir": str(self.output_dir),
+            "metadata": self.metadata.to_dict(),
+            "diagnostics": self.diagnostics.to_dict(),
+            "audio_events": [event.to_dict() for event in self.audio_events],
+            "reactions": [reaction.to_dict() for reaction in self.reactions],
+            "suggestions": [suggestion.to_dict() for suggestion in self.suggestions],
+            "files": {name: str(path) for name, path in self.files.items()},
+            "artifacts": {name: str(path) for name, path in self.artifacts.items()},
+        }
diff --git a/main/cc_suggester/decision/__init__.py b/main/cc_suggester/decision/__init__.py
new file mode 100644
index 0000000..9208dab
--- /dev/null
+++ b/main/cc_suggester/decision/__init__.py
@@ -0,0 +1 @@
+"""Caption decision engine."""
diff --git a/main/cc_suggester/decision/labels.py b/main/cc_suggester/decision/labels.py
new file mode 100644
index 0000000..8f16ec9
--- /dev/null
+++ b/main/cc_suggester/decision/labels.py
@@ -0,0 +1,198 @@
+"""Caption label glossary."""
+
+from __future__ import annotations
+
+
+LABELS: dict[str, dict[str, str]] = {
+    "children_cheer": {
+        "en": "[children cheering]",
+        "hi": "[बच्चे उत्साह से चिल्लाते हैं]",
+        "ta": "[குழந்தைகள் ஆரவாரம் செய்கின்றனர்]",
+        "te": "[పిల్లలు ఆనందంగా కేకలు వేస్తున్నారు]",
+        "bn": "[শিশুরা উল্লাস করছে]",
+        "mr": "[मुले आनंदाने ओरडत आहेत]",
+        "ml": "[കുട്ടികൾ ആഹ്ലാദിക്കുന്നു]",
+    },
+    "crowd_cheer": {
+        "en": "[crowd cheering]",
+        "hi": "[भीड़ जयकार करती है]",
+        "ta": "[கூட்டம் ஆரவாரம் செய்கிறது]",
+        "te": "[జనం ఆనందంగా కేకలు వేస్తున్నారు]",
+        "bn": "[ভিড় উল্লাস করছে]",
+        "mr": "[गर्दी जल्लोष करते]",
+        "ml": "[ജനക്കൂട്ടം ആഹ്ലാദിക്കുന്നു]",
+    },
+    "school_bell": {
+        "en": "[school bell rings]",
+        "hi": "[स्कूल की घंटी बजती है]",
+        "ta": "[பள்ளி மணி ஒலிக்கிறது]",
+        "te": "[పాఠశాల గంట మోగుతుంది]",
+        "bn": "[স্কুলের ঘণ্টা বাজছে]",
+        "mr": "[शाळेची घंटा वाजते]",
+        "ml": "[സ്കൂൾ മണി മുഴങ്ങുന്നു]",
+    },
+    "applause": {
+        "en": "[students applauding]",
+        "hi": "[छात्र तालियां बजाते हैं]",
+        "ta": "[மாணவர்கள் கைத்தட்டுகின்றனர்]",
+        "te": "[విద్యార్థులు చప్పట్లు కొడుతున్నారు]",
+        "bn": "[ছাত্ররা হাততালি দিচ্ছে]",
+        "mr": "[विद्यार्थी टाळ्या वाजवत आहेत]",
+        "ml": "[വിദ്യാർത്ഥികൾ കൈയടിക്കുന്നു]",
+    },
+    "chair_scrape": {
+        "en": "[chair scrapes]",
+        "hi": "[कुर्सी घिसटती है]",
+        "ta": "[நாற்காலி இழுக்கும் சத்தம்]",
+        "te": "[కుర్చీ లాగిన శబ్దం]",
+        "bn": "[চেয়ার ঘষার শব্দ]",
+        "mr": "[खुर्ची ओढल्याचा आवाज]",
+        "ml": "[കസേര വലിക്കുന്ന ശബ്ദം]",
+    },
+    "background_chatter": {
+        "en": "[background chatter]",
+        "hi": "[पृष्ठभूमि में बातचीत]",
+        "ta": "[பின்னணி பேச்சு]",
+        "te": "[నేపథ్యంలో మాటలు]",
+        "bn": "[পেছনে কথাবার্তা]",
+        "mr": "[पार्श्वभूमीत गप्पा]",
+        "ml": "[പശ്ചാത്തല സംഭാഷണം]",
+    },
+    "horn_honk": {
+        "en": "[horn honks]",
+        "hi": "[हॉर्न बजता है]",
+        "ta": "[ஹார்ன் ஒலிக்கிறது]",
+        "te": "[హారన్ మోగుతుంది]",
+        "bn": "[হর্ন বাজছে]",
+        "mr": "[हॉर्न वाजतो]",
+        "ml": "[ഹോൺ മുഴങ്ങുന്നു]",
+    },
+    "glass_break": {
+        "en": "[glass breaks]",
+        "hi": "[कांच टूटता है]",
+        "ta": "[கண்ணாடி உடைகிறது]",
+        "te": "[గాజు పగులుతుంది]",
+        "bn": "[কাচ ভাঙছে]",
+        "mr": "[काच फुटते]",
+        "ml": "[ഗ്ലാസ് പൊട്ടുന്നു]",
+    },
+    "laughter": {
+        "en": "[laughter]",
+        "hi": "[हंसी]",
+        "ta": "[சிரிப்பு]",
+        "te": "[నవ్వు]",
+        "bn": "[হাসি]",
+        "mr": "[हशा]",
+        "ml": "[ചിരി]",
+    },
+    "music": {
+        "en": "[music]",
+        "hi": "[संगीत]",
+        "ta": "[இசை]",
+        "te": "[సంగీతం]",
+        "bn": "[সঙ্গীত]",
+        "mr": "[संगीत]",
+        "ml": "[സംഗീതം]",
+    },
+    "alarm": {
+        "en": "[alarm ringing]",
+        "hi": "[अलार्म बजता है]",
+        "ta": "[அலாரம் ஒலிக்கிறது]",
+        "te": "[అలారం మోగుతుంది]",
+        "bn": "[অ্যালার্ম বাজছে]",
+        "mr": "[अलार्म वाजतो]",
+        "ml": "[അലാറം മുഴങ്ങുന്നു]",
+    },
+    "siren": {
+        "en": "[siren wails]",
+        "hi": "[सायरन बजता है]",
+        "ta": "[சைரன் ஒலிக்கிறது]",
+        "te": "[సైరన్ మోగుతుంది]",
+        "bn": "[সাইরেন বাজছে]",
+        "mr": "[सायरेन वाजतो]",
+        "ml": "[സൈറൺ മുഴങ്ങുന്നു]",
+    },
+    "explosion": {
+        "en": "[explosion]",
+        "hi": "[विस्फोट]",
+        "ta": "[வெடிப்பு]",
+        "te": "[పేలుడు]",
+        "bn": "[বিস্ফোরণ]",
+        "mr": "[स्फोट]",
+        "ml": "[സ്ഫോടനം]",
+    },
+    "gunshot": {
+        "en": "[gunshot]",
+        "hi": "[गोली चलती है]",
+        "ta": "[துப்பாக்கிச் சத்தம்]",
+        "te": "[తుపాకీ శబ్దం]",
+        "bn": "[গুলির শব্দ]",
+        "mr": "[गोळीबाराचा आवाज]",
+        "ml": "[വെടിയൊച്ച]",
+    },
+    "door_slam": {
+        "en": "[door slams]",
+        "hi": "[दरवाज़ा ज़ोर से बंद होता है]",
+        "ta": "[கதவு பலமாக மூடப்படுகிறது]",
+        "te": "[తలుపు బలంగా మూసుకుంటుంది]",
+        "bn": "[দরজা জোরে বন্ধ হয়]",
+        "mr": "[दरवाजा जोरात बंद होतो]",
+        "ml": "[വാതിൽ ശക്തമായി അടയുന്നു]",
+    },
+    "phone_ring": {
+        "en": "[phone rings]",
+        "hi": "[फ़ोन बजता है]",
+        "ta": "[தொலைபேசி ஒலிக்கிறது]",
+        "te": "[ఫోన్ మోగుతుంది]",
+        "bn": "[ফোন বাজছে]",
+        "mr": "[फोन वाजतो]",
+        "ml": "[ഫോൺ മുഴങ്ങുന്നു]",
+    },
+    "dog_bark": {
+        "en": "[dog barking]",
+        "hi": "[कुत्ता भौंकता है]",
+        "ta": "[நாய் குரைக்கிறது]",
+        "te": "[కుక్క మొరుగుతుంది]",
+        "bn": "[কুকুর ডাকছে]",
+        "mr": "[कुत्रा भुंकतो]",
+        "ml": "[നായ കുരയ്ക്കുന്നു]",
+    },
+    "impact_sound": {
+        "en": "[sudden sound]",
+        "hi": "[अचानक आवाज़]",
+        "ta": "[திடீர் சத்தம்]",
+        "te": "[అకస్మాత్తుగా శబ్దం]",
+        "bn": "[হঠাৎ শব্দ]",
+        "mr": "[अचानक आवाज]",
+        "ml": "[പെട്ടെന്നുള്ള ശബ്ദം]",
+    },
+    "loud_sound": {
+        "en": "[loud sound]",
+        "hi": "[तेज़ आवाज़]",
+        "ta": "[உரத்த சத்தம்]",
+        "te": "[పెద్ద శబ్దం]",
+        "bn": "[জোরে শব্দ]",
+        "mr": "[मोठा आवाज]",
+        "ml": "[വലിയ ശബ്ദം]",
+    },
+    "sustained_sound": {
+        "en": "[continuous sound]",
+        "hi": "[लगातार आवाज़]",
+        "ta": "[தொடர்ச்சியான சத்தம்]",
+        "te": "[నిరంతర శబ్దం]",
+        "bn": "[অবিরত শব্দ]",
+        "mr": "[सतत आवाज]",
+        "ml": "[തുടർച്ചയായ ശബ്ദം]",
+    },
+}
+
+
+def caption_for(event_id: str, language: str) -> str:
+    """Return a caption label for an event and language."""
+
+    values = LABELS.get(event_id, {})
+    if language in values:
+        return values[language]
+    if "en" in values:
+        return values["en"]
+    return f"[{event_id.replace('_', ' ')}]"
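`caption_for` falls back in three steps: first the requested language, then English, then a bracketed form of the event ID with underscores turned into spaces. The sketch below uses a trimmed glossary (one entry, a hypothetical subset of `LABELS`) to exercise each branch:

```python
# Hypothetical one-entry subset of LABELS; strings copied from labels.py.
LABELS: dict[str, dict[str, str]] = {
    "horn_honk": {"en": "[horn honks]", "hi": "[हॉर्न बजता है]"},
}


def caption_for(event_id: str, language: str) -> str:
    """Requested language, then English, then a bracketed event ID."""
    values = LABELS.get(event_id, {})
    if language in values:
        return values[language]
    if "en" in values:
        return values["en"]
    return f"[{event_id.replace('_', ' ')}]"


print(caption_for("horn_honk", "hi"))   # language hit
print(caption_for("horn_honk", "ta"))   # English fallback
print(caption_for("tabla_solo", "hi"))  # unknown ID fallback
```

Editors should watch for one consequence: an event ID that is missing from the glossary yields an English-style caption in every language.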
diff --git a/main/cc_suggester/decision/rules.py b/main/cc_suggester/decision/rules.py
new file mode 100644
index 0000000..57add09
--- /dev/null
+++ b/main/cc_suggester/decision/rules.py
@@ -0,0 +1,64 @@
+"""Decision priors and ambient penalties."""
+
+from __future__ import annotations
+
+
+EVENT_IMPORTANCE_PRIOR: dict[str, float] = {
+    "glass_break": 0.30,
+    "explosion": 0.35,
+    "gunshot": 0.35,
+    "alarm": 0.30,
+    "siren": 0.28,
+    "school_bell": 0.20,
+    "children_cheer": 0.18,
+    "crowd_cheer": 0.18,
+    "horn_honk": 0.18,
+    "applause": 0.12,
+    "laughter": 0.10,
+    "music": 0.06,
+    "door_slam": 0.15,
+    "phone_ring": 0.14,
+    "dog_bark": 0.08,
+    "chair_scrape": 0.04,
+    "background_chatter": 0.02,
+    "impact_sound": 0.14,
+    "loud_sound": 0.10,
+    "sustained_sound": 0.04,
+}
+
+AMBIENT_PENALTY: dict[str, float] = {
+    "background_chatter": 0.30,
+    "traffic_noise": 0.32,
+    "fan_noise": 0.35,
+    "background_music": 0.24,
+    "music": 0.18,
+    "crowd_murmur": 0.28,
+    "chair_scrape": 0.10,
+    "sustained_sound": 0.18,
+}
+
+HIGH_IMPACT_EVENTS = {
+    "glass_break",
+    "explosion",
+    "gunshot",
+    "alarm",
+    "siren",
+}
+
+
+def importance_prior(event_id: str) -> float:
+    """Return event importance prior."""
+
+    return EVENT_IMPORTANCE_PRIOR.get(event_id, 0.08)
+
+
+def ambient_penalty(event_id: str) -> float:
+    """Return event ambient penalty."""
+
+    return AMBIENT_PENALTY.get(event_id, 0.0)
+
+
+def is_high_impact(event_id: str) -> bool:
+    """Return whether the event is high-impact by default."""
+
+    return event_id in HIGH_IMPACT_EVENTS
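The prior and the ambient penalty enter the decision score as additive terms. Their difference is therefore a fixed offset applied before any audio or visual evidence is considered. IDs such as `traffic_noise` appear only in `AMBIENT_PENALTY`, so they pick up the 0.08 default prior. The sketch below computes that net bias for a subset of the tables (values copied from `rules.py`; `net_bias` is a hypothetical helper):

```python
# Subsets of EVENT_IMPORTANCE_PRIOR and AMBIENT_PENALTY, values copied from rules.py.
PRIOR = {"glass_break": 0.30, "horn_honk": 0.18, "chair_scrape": 0.04, "background_chatter": 0.02}
PENALTY = {"chair_scrape": 0.10, "background_chatter": 0.30, "traffic_noise": 0.32}
DEFAULT_PRIOR = 0.08  # importance_prior() fallback for unlisted IDs


def net_bias(event_id: str) -> float:
    """Prior minus ambient penalty: the score offset before any evidence."""
    return PRIOR.get(event_id, DEFAULT_PRIOR) - PENALTY.get(event_id, 0.0)


for event_id in ["glass_break", "horn_honk", "chair_scrape", "background_chatter", "traffic_noise", "unlisted_event"]:
    print(f"{event_id:20s} {net_bias(event_id):+.2f}")
```

`background_chatter` starts 0.28 below zero. With the 0.52 and 0.34 weights in `scorer.py`, it cannot reach a 0.65 threshold even at full audio and reaction confidence (0.52 + 0.34 - 0.28 = 0.58).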
diff --git a/main/cc_suggester/decision/scorer.py b/main/cc_suggester/decision/scorer.py
new file mode 100644
index 0000000..69bea03
--- /dev/null
+++ b/main/cc_suggester/decision/scorer.py
@@ -0,0 +1,99 @@
+"""Combine audio and visual evidence into caption decisions."""
+
+from __future__ import annotations
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, CaptionSuggestion, ReactionResult
+from cc_suggester.decision.labels import caption_for
+from cc_suggester.decision.rules import ambient_penalty, importance_prior, is_high_impact
+
+
+def decide_captions(
+    audio_events: list[AudioEventCandidate],
+    reactions: list[ReactionResult],
+    config: PipelineConfig,
+) -> list[CaptionSuggestion]:
+    """Create final caption suggestions from audio and visual signals."""
+
+    reaction_by_key = {
+        (reaction.event_id, reaction.start_time, reaction.end_time): reaction
+        for reaction in reactions
+    }
+    suggestions: list[CaptionSuggestion] = []
+
+    for event in audio_events:
+        reaction = reaction_by_key.get((event.event_id, event.start_time, event.end_time))
+        reaction_confidence = reaction.reaction_confidence if reaction else 0.0
+        prior = importance_prior(event.event_id)
+        penalty = ambient_penalty(event.event_id)
+        score = _score(
+            audio_confidence=event.audio_confidence,
+            reaction_confidence=reaction_confidence,
+            prior=prior,
+            penalty=penalty,
+        )
+
+        accepted = score >= config.decision_threshold
+        requires_review = config.review_threshold <= score < config.decision_threshold
+        if is_high_impact(event.event_id) and event.audio_confidence >= 0.70:
+            accepted = True
+            requires_review = False
+
+        reason = _reason_for(event, reaction_confidence, score, accepted, requires_review, penalty)
+        suggestions.append(
+            CaptionSuggestion(
+                event_id=event.event_id,
+                start_time=event.start_time,
+                end_time=event.end_time,
+                audio_confidence=event.audio_confidence,
+                reaction_confidence=reaction_confidence,
+                decision_score=round(score, 3),
+                accepted=accepted,
+                reason=reason,
+                caption_text=caption_for(event.event_id, config.language),
+                language=config.language,
+                requires_review=requires_review,
+                debug_info={
+                    "importance_prior": prior,
+                    "ambient_penalty": penalty,
+                    "high_impact": is_high_impact(event.event_id),
+                    "reaction_signals": reaction.reaction_signals if reaction else {},
+                },
+            )
+        )
+    return suggestions
+
+
+def _score(
+    audio_confidence: float,
+    reaction_confidence: float,
+    prior: float,
+    penalty: float,
+) -> float:
+    raw = (0.52 * audio_confidence) + (0.34 * reaction_confidence) + prior - penalty
+    return max(0.0, min(1.0, raw))
+
+
+def _reason_for(
+    event: AudioEventCandidate,
+    reaction_confidence: float,
+    score: float,
+    accepted: bool,
+    requires_review: bool,
+    penalty: float,
+) -> str:
+    if accepted:
+        if reaction_confidence >= 0.50:
+            return (
+                f"Accepted because {event.event_id} has strong audio confidence "
+                "and visible scene reaction."
+            )
+        return f"Accepted because {event.event_id} is important and audio confidence is high."
+    if requires_review:
+        return (
+            f"Needs review because {event.event_id} is plausible but the combined "
+            f"decision score is borderline ({score:.2f})."
+        )
+    if penalty > 0:
+        return f"Rejected because {event.event_id} appears ambient or low-impact."
+    return f"Rejected because combined audio and reaction evidence is weak ({score:.2f})."
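`decide_captions` computes `score = clamp(0.52 * audio + 0.34 * reaction + prior - penalty, 0, 1)`. It compares the score against the decision and review thresholds, then applies the high-impact override: a high-impact event with audio confidence of at least 0.70 is always accepted. The worked sketch below reproduces that logic outside the package. It assumes thresholds of 0.65 and 0.50, which are the Streamlit slider defaults; the `PipelineConfig` defaults are not shown here. `decide` is a hypothetical helper.

```python
HIGH_IMPACT = {"glass_break", "explosion", "gunshot", "alarm", "siren"}


def score(audio: float, reaction: float, prior: float, penalty: float) -> float:
    """Weighted sum from scorer._score, clamped to [0, 1]."""
    raw = 0.52 * audio + 0.34 * reaction + prior - penalty
    return max(0.0, min(1.0, raw))


def decide(event_id, audio, reaction, prior, penalty, decision=0.65, review=0.50):
    """Return (rounded score, status) using the thresholds assumed above."""
    s = score(audio, reaction, prior, penalty)
    accepted = s >= decision
    needs_review = review <= s < decision
    # High-impact override: strong audio alone is enough.
    if event_id in HIGH_IMPACT and audio >= 0.70:
        accepted, needs_review = True, False
    status = "accepted" if accepted else "review" if needs_review else "rejected"
    return round(s, 3), status


print(decide("horn_honk", 0.80, 0.60, 0.18, 0.0))     # about 0.80, accepted on score
print(decide("chair_scrape", 0.70, 0.10, 0.04, 0.10))  # about 0.338, rejected
print(decide("siren", 0.70, 0.0, 0.28, 0.0))          # about 0.644, accepted by override
print(decide("siren", 0.65, 0.0, 0.28, 0.0))          # about 0.618, review
```

With these numbers, the override only changes the outcome for `siren` at the default threshold. For the other high-impact IDs, audio confidence of 0.70 plus their prior (0.30 to 0.35) already scores at least 0.664. The override matters more when an editor raises the decision threshold.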
diff --git a/main/cc_suggester/output/__init__.py b/main/cc_suggester/output/__init__.py
new file mode 100644
index 0000000..e8f7502
--- /dev/null
+++ b/main/cc_suggester/output/__init__.py
@@ -0,0 +1 @@
+"""Output writers for caption suggestions."""
diff --git a/main/cc_suggester/output/csv_report.py b/main/cc_suggester/output/csv_report.py
new file mode 100644
index 0000000..322f1e3
--- /dev/null
+++ b/main/cc_suggester/output/csv_report.py
@@ -0,0 +1,57 @@
+"""CSV review report export."""
+
+from __future__ import annotations
+
+import csv
+import io
+from pathlib import Path
+
+from cc_suggester.core.types import CaptionSuggestion
+
+
+FIELDNAMES = [
+    "event_id",
+    "start_time",
+    "end_time",
+    "caption_text",
+    "language",
+    "audio_confidence",
+    "reaction_confidence",
+    "decision_score",
+    "accepted",
+    "requires_review",
+    "reason",
+]
+
+
+def render_csv_report(suggestions: list[CaptionSuggestion]) -> str:
+    """Render a reviewer-friendly CSV report."""
+
+    buffer = io.StringIO()
+    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
+    writer.writeheader()
+    for suggestion in suggestions:
+        writer.writerow(
+            {
+                "event_id": suggestion.event_id,
+                "start_time": suggestion.start_time,
+                "end_time": suggestion.end_time,
+                "caption_text": suggestion.caption_text,
+                "language": suggestion.language,
+                "audio_confidence": suggestion.audio_confidence,
+                "reaction_confidence": suggestion.reaction_confidence,
+                "decision_score": suggestion.decision_score,
+                "accepted": suggestion.accepted,
+                "requires_review": suggestion.requires_review,
+                "reason": suggestion.reason,
+            }
+        )
+    return buffer.getvalue()
+
+
+def write_csv_report(suggestions: list[CaptionSuggestion], output_path: Path) -> Path:
+    """Write a reviewer-friendly CSV report."""
+
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(render_csv_report(suggestions), encoding="utf-8", newline="")
+    return output_path
diff --git a/main/cc_suggester/output/json_report.py b/main/cc_suggester/output/json_report.py
new file mode 100644
index 0000000..e2662d2
--- /dev/null
+++ b/main/cc_suggester/output/json_report.py
@@ -0,0 +1,18 @@
+"""JSON report export."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+
+def write_json_report(payload: dict[str, Any], output_path: Path) -> Path:
+    """Write a UTF-8 JSON report."""
+
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(
+        json.dumps(payload, indent=2, ensure_ascii=False, sort_keys=True),
+        encoding="utf-8",
+    )
+    return output_path
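With the default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character as `\uXXXX`, which makes Hindi or Tamil captions unreadable in a text editor. Passing `ensure_ascii=False` keeps the original script. The file then needs an explicit UTF-8 encoding, as `write_json_report` sets, because the platform default encoding may not cover these scripts. A quick comparison:

```python
import json

payload = {"caption_text": "[हंसी]", "language": "hi"}

escaped = json.dumps(payload, sort_keys=True)  # ensure_ascii defaults to True
readable = json.dumps(payload, ensure_ascii=False, sort_keys=True)

print(escaped)
print(readable)
```

Both strings decode to the same object. `sort_keys=True` also keeps key order stable between runs, which makes report diffs easier to review.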
diff --git a/main/cc_suggester/output/review_export.py b/main/cc_suggester/output/review_export.py
new file mode 100644
index 0000000..c42ec67
--- /dev/null
+++ b/main/cc_suggester/output/review_export.py
@@ -0,0 +1,148 @@
+"""Helpers for exporting manually reviewed caption suggestions."""
+
+from __future__ import annotations
+
+from collections.abc import Mapping, Sequence
+import json
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+from cc_suggester.core.types import CaptionSuggestion
+from cc_suggester.output.csv_report import render_csv_report, write_csv_report
+from cc_suggester.output.json_report import write_json_report
+from cc_suggester.output.srt import render_srt, write_srt
+
+
+VALID_REVIEW_STATUSES = {"accepted", "review", "rejected"}
+
+
+@dataclass(frozen=True, slots=True)
+class ReviewExport:
+    """In-memory reviewed export payload for UI download buttons."""
+
+    suggestions: list[CaptionSuggestion]
+    srt_text: str
+    csv_text: str
+    json_text: str
+
+
+def build_review_export(rows: Sequence[Mapping[str, Any]], language: str) -> ReviewExport:
+    """Convert editable review rows into exportable SRT and CSV content."""
+
+    suggestions = suggestions_from_review_rows(rows, language)
+    return ReviewExport(
+        suggestions=suggestions,
+        srt_text=render_srt(suggestions),
+        csv_text=render_csv_report(suggestions),
+        json_text=json.dumps(
+            review_payload(suggestions, language),
+            indent=2,
+            ensure_ascii=False,
+            sort_keys=True,
+        ),
+    )
+
+
+def write_review_exports(rows: Sequence[Mapping[str, Any]], output_dir: Path, language: str) -> dict[str, Path]:
+    """Write reviewed SRT, CSV, and JSON files to a directory."""
+
+    export = build_review_export(rows, language)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    files = {
+        "reviewed_srt": write_srt(export.suggestions, output_dir / f"reviewed_captions.{language}.srt"),
+        "reviewed_csv": write_csv_report(export.suggestions, output_dir / "reviewed_events.csv"),
+    }
+    files["reviewed_json"] = write_json_report(
+        review_payload(export.suggestions, language),
+        output_dir / "reviewed_results.json",
+    )
+    return files
+
+
+def review_payload(suggestions: Sequence[CaptionSuggestion], language: str) -> dict[str, Any]:
+    """Build a JSON-serializable reviewed session payload."""
+
+    return {
+        "language": language,
+        "suggestions": [suggestion.to_dict() for suggestion in suggestions],
+        "summary": {
+            "total": len(suggestions),
+            "accepted": sum(1 for item in suggestions if item.accepted),
+            "review": sum(1 for item in suggestions if item.requires_review),
+            "rejected": sum(1 for item in suggestions if not item.accepted and not item.requires_review),
+        },
+    }
+
+
+def suggestions_from_review_rows(rows: Sequence[Mapping[str, Any]], language: str) -> list[CaptionSuggestion]:
+    """Build caption suggestions from Web UI review rows."""
+
+    suggestions: list[CaptionSuggestion] = []
+    for fallback_index, row in enumerate(rows, start=1):
+        status = _status_for(row)
+        caption_text = _string_for(row, ("caption", "caption_text"), default="").strip()
+        suggestions.append(
+            CaptionSuggestion(
+                event_id=_string_for(row, ("event_id",), default=f"event_{fallback_index}"),
+                start_time=_float_for(row, ("start", "start_time"), default=0.0),
+                end_time=_float_for(row, ("end", "end_time"), default=0.0),
+                audio_confidence=_float_for(row, ("audio", "audio_confidence"), default=0.0),
+                reaction_confidence=_float_for(row, ("reaction", "reaction_confidence"), default=0.0),
+                decision_score=_float_for(row, ("decision", "decision_score"), default=0.0),
+                accepted=status == "accepted",
+                requires_review=status == "review",
+                reason=_reason_for(row, status),
+                caption_text=caption_text,
+                language=language,
+                debug_info={
+                    "editor_status": status,
+                    "review_index": row.get("index", fallback_index),
+                    "source": "review_export",
+                },
+            )
+        )
+    return suggestions
+
+
+def _status_for(row: Mapping[str, Any]) -> str:
+    status = _string_for(row, ("status",), default="").strip().lower()
+    if not status:
+        if bool(row.get("accepted", False)):
+            status = "accepted"
+        elif bool(row.get("requires_review", False)):
+            status = "review"
+        else:
+            status = "rejected"
+    if status not in VALID_REVIEW_STATUSES:
+        valid = ", ".join(sorted(VALID_REVIEW_STATUSES))
+        raise ValueError(f"Unknown review status '{status}'. Expected one of: {valid}.")
+    return status
+
+
+def _reason_for(row: Mapping[str, Any], status: str) -> str:
+    existing = _string_for(row, ("reason",), default="").strip()
+    editor_note = f"Editor marked this event as {status}."
+    if not existing:
+        return editor_note
+    if existing.endswith(editor_note):
+        return existing
+    return f"{existing} {editor_note}"
+
+
+def _string_for(row: Mapping[str, Any], keys: tuple[str, ...], default: str) -> str:
+    for key in keys:
+        if key in row and row[key] is not None:
+            return str(row[key])
+    return default
+
+
+def _float_for(row: Mapping[str, Any], keys: tuple[str, ...], default: float) -> float:
+    for key in keys:
+        if key not in row or row[key] is None:
+            continue
+        try:
+            return float(row[key])
+        except (TypeError, ValueError):
+            return default
+    return default
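`_status_for` gives an explicit `status` column priority and ignores its case and surrounding whitespace. Without one, it falls back to the `accepted` and `requires_review` flags. It rejects any value other than `accepted`, `review`, or `rejected`. The sketch below is a standalone copy of that precedence under the hypothetical name `resolve_status`:

```python
from collections.abc import Mapping
from typing import Any

VALID = {"accepted", "review", "rejected"}


def resolve_status(row: Mapping[str, Any]) -> str:
    """Explicit status first, then boolean flags, else 'rejected'."""
    raw = row.get("status")
    status = "" if raw is None else str(raw).strip().lower()
    if not status:
        if bool(row.get("accepted", False)):
            status = "accepted"
        elif bool(row.get("requires_review", False)):
            status = "review"
        else:
            status = "rejected"
    if status not in VALID:
        raise ValueError(f"Unknown review status '{status}'.")
    return status


print(resolve_status({"status": " Accepted "}))
print(resolve_status({"accepted": False, "requires_review": True}))
print(resolve_status({}))
```

The flag fallback uses `bool()`, so the string `"False"` counts as true. Rows coming straight from the pipeline are unaffected, because their flags are real booleans. Rows rebuilt from `events.csv`, however, would need their flags parsed first.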
diff --git a/main/cc_suggester/output/srt.py b/main/cc_suggester/output/srt.py
new file mode 100644
index 0000000..a51e57f
--- /dev/null
+++ b/main/cc_suggester/output/srt.py
@@ -0,0 +1,46 @@
+"""SRT export."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from cc_suggester.core.types import CaptionSuggestion
+
+
+def render_srt(suggestions: list[CaptionSuggestion]) -> str:
+    """Render accepted caption suggestions as SRT text."""
+
+    accepted = [item for item in suggestions if item.accepted]
+    lines: list[str] = []
+    for index, suggestion in enumerate(accepted, start=1):
+        lines.extend(
+            [
+                str(index),
+                f"{format_srt_time(suggestion.start_time)} --> {format_srt_time(suggestion.end_time)}",
+                suggestion.caption_text,
+                "",
+            ]
+        )
+    return "\n".join(lines)
+
+
+def write_srt(suggestions: list[CaptionSuggestion], output_path: Path) -> Path:
+    """Write accepted caption suggestions to an SRT file."""
+
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(render_srt(suggestions), encoding="utf-8")
+    return output_path
+
+
+def format_srt_time(seconds: float) -> str:
+    """Format seconds as SRT timestamp."""
+
+    # Clamp negatives, then work in whole milliseconds so rounding carries into
+    # seconds, minutes, and hours (59.9996 s becomes 00:01:00,000 rather than
+    # an invalid 00:00:60,000).
+    total_ms = int(round(max(0.0, seconds) * 1000))
+    hours, remainder = divmod(total_ms, 3_600_000)
+    minutes, remainder = divmod(remainder, 60_000)
+    whole_seconds, milliseconds = divmod(remainder, 1000)
+
+    return f"{hours:02d}:{minutes:02d}:{whole_seconds:02d},{milliseconds:03d}"
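SRT cues are numbered from 1 and separated by blank lines. Each timestamp uses the form `HH:MM:SS,mmm`, with a comma before the milliseconds. Only accepted suggestions are written, and numbering stays consecutive over those. The sketch below reproduces that layout, with a hypothetical `Cue` dataclass standing in for `CaptionSuggestion`. It formats timestamps from integer milliseconds, so rounding near a minute boundary carries into the minutes field:

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical stand-in for CaptionSuggestion (only the fields SRT needs)."""

    start_time: float
    end_time: float
    caption_text: str
    accepted: bool = True


def format_srt_time(seconds: float) -> str:
    total_ms = int(round(max(0.0, seconds) * 1000))  # clamp, then integer ms
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(cues: list[Cue]) -> str:
    lines: list[str] = []
    # Rejected cues are skipped, so numbering has no gaps.
    for index, cue in enumerate([c for c in cues if c.accepted], start=1):
        timing = f"{format_srt_time(cue.start_time)} --> {format_srt_time(cue.end_time)}"
        lines.extend([str(index), timing, cue.caption_text, ""])
    return "\n".join(lines)


srt_text = render_srt(
    [
        Cue(1.25, 2.0, "[horn honks]"),
        Cue(3.0, 4.0, "[chair scrapes]", accepted=False),
        Cue(59.9996, 61.5, "[glass breaks]"),
    ]
)
print(srt_text)
```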
diff --git a/main/cc_suggester/translation/__init__.py b/main/cc_suggester/translation/__init__.py
new file mode 100644
index 0000000..1bc0acb
--- /dev/null
+++ b/main/cc_suggester/translation/__init__.py
@@ -0,0 +1 @@
+"""Translation and multilingual label support."""
diff --git a/main/cc_suggester/translation/glossary.py b/main/cc_suggester/translation/glossary.py
new file mode 100644
index 0000000..9e3c8f1
--- /dev/null
+++ b/main/cc_suggester/translation/glossary.py
@@ -0,0 +1,17 @@
+"""Glossary helpers for non-speech caption labels."""
+
+from __future__ import annotations
+
+from cc_suggester.decision.labels import LABELS, caption_for
+
+
+def supported_event_ids() -> list[str]:
+    """Return event IDs available in the curated glossary."""
+
+    return sorted(LABELS)
+
+
+def get_caption(event_id: str, language: str) -> str:
+    """Return a caption from the curated glossary."""
+
+    return caption_for(event_id, language)
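`caption_for` falls back to English without warning when a language is missing. A coverage check over the glossary can therefore catch gaps before they reach an SRT file. The minimal sketch below assumes the seven language codes used in `LABELS`; the actual `SUPPORTED_LANGUAGES` list lives in `core/config.py`, which is not shown here. It also uses a hypothetical `whistle` entry:

```python
# Assumed language set, taken from the keys used in LABELS.
LANGUAGES = ("en", "hi", "ta", "te", "bn", "mr", "ml")


def missing_translations(labels: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each event ID to the languages with no (or blank) caption."""
    gaps: dict[str, list[str]] = {}
    for event_id in sorted(labels):
        missing = [lang for lang in LANGUAGES if not labels[event_id].get(lang, "").strip()]
        if missing:
            gaps[event_id] = missing
    return gaps


sample = {
    "laughter": {"en": "[laughter]", "hi": "[हंसी]", "ta": "[சிரிப்பு]", "te": "[నవ్వు]",
                 "bn": "[হাসি]", "mr": "[हशा]", "ml": "[ചിരി]"},
    "whistle": {"en": "[whistle blows]"},  # hypothetical, English only
}
print(missing_translations(sample))
```

Run against the real `LABELS`, a check like this could sit in `tests/` so that new event IDs are not merged with partial translations.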
diff --git a/main/cc_suggester/ui/__init__.py b/main/cc_suggester/ui/__init__.py
new file mode 100644
index 0000000..4d40324
--- /dev/null
+++ b/main/cc_suggester/ui/__init__.py
@@ -0,0 +1 @@
+"""Web UI clients."""
diff --git a/main/cc_suggester/ui/streamlit_app.py b/main/cc_suggester/ui/streamlit_app.py
new file mode 100644
index 0000000..af7a606
--- /dev/null
+++ b/main/cc_suggester/ui/streamlit_app.py
@@ -0,0 +1,199 @@
+"""Streamlit editor review UI.
+
+Run with:
+    streamlit run cc_suggester/ui/streamlit_app.py
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+import tempfile
+
+import streamlit as st
+
+from cc_suggester.core.config import SUPPORTED_DEVICES, SUPPORTED_LANGUAGES, PipelineConfig
+from cc_suggester.core.errors import CCSuggesterError
+from cc_suggester.core.pipeline import analyze_video
+from cc_suggester.output.review_export import build_review_export
+
+
+def main() -> None:
+    st.set_page_config(
+        page_title="Intelligent CC Suggestion Tool",
+        page_icon="CC",
+        layout="wide",
+    )
+    st.title("Intelligent Closed Caption Suggestion Tool")
+    st.caption("Generate and review meaningful non-speech CC suggestions.")
+
+    with st.sidebar:
+        st.header("Pipeline")
+        uploaded = st.file_uploader("Video file", type=["mp4", "mkv", "mov", "avi", "webm", "wav"])
+        language = st.selectbox("Language", SUPPORTED_LANGUAGES, index=0)
+        device = st.selectbox("Device", SUPPORTED_DEVICES, index=0)
+        audio_backend = st.selectbox("Audio backend", ["mock", "dsp", "yamnet"], index=0)
+        vision_backend = st.selectbox("Vision backend", ["mock", "opencv", "mediapipe"], index=0)
+        decision_threshold = st.slider("Decision threshold", 0.0, 1.0, 0.65, 0.01)
+        review_threshold = st.slider("Review threshold", 0.0, 1.0, 0.50, 0.01)
+        allow_demo_input = st.checkbox("Allow demo/non-video input", value=audio_backend == "mock")
+        run = st.button("Start Caption", type="primary", use_container_width=True)
+
+    if uploaded is None:
+        st.info("Upload a video to begin. Use mock backends for a fast demo, or DSP/OpenCV for real local processing.")
+        return
+
+    input_path = _save_upload(uploaded)
+    left, right = st.columns([1.5, 1.0], gap="large")
+    with left:
+        st.subheader("Video Preview")
+        if input_path.suffix.lower() != ".wav":
+            st.video(str(input_path))
+        else:
+            st.audio(str(input_path))
+
+    if run:
+        config = PipelineConfig(
+            language=language,
+            device=device,
+            audio_backend=audio_backend,
+            vision_backend=vision_backend,
+            output_dir=Path("outputs"),
+            decision_threshold=decision_threshold,
+            review_threshold=review_threshold,
+            allow_demo_input=allow_demo_input,
+        )
+        try:
+            with st.spinner("Analyzing audio and visual reaction signals..."):
+                st.session_state["result"] = analyze_video(input_path, config)
+        except CCSuggesterError as exc:
+            st.error(exc.message)
+            if exc.suggestions:
+                st.markdown("**Suggestions**")
+                for suggestion in exc.suggestions:
+                    st.write(f"- {suggestion}")
+            if exc.details:
+                with st.expander("Debug details"):
+                    st.json(exc.details)
+            return
+
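+    result = st.session_state.get("result")
+    if not result or Path(result.input_path).name != input_path.name:
+        return  # nothing analyzed yet, or the stored result is from a previous upload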
+
+    with right:
+        st.subheader("Run Summary")
+        accepted = [item for item in result.suggestions if item.accepted]
+        review = [item for item in result.suggestions if item.requires_review]
+        rejected = [item for item in result.suggestions if not item.accepted and not item.requires_review]
+        st.metric("Detected events", len(result.audio_events))
+        st.metric("Accepted", len(accepted))
+        st.metric("Needs review", len(review))
+        st.metric("Rejected", len(rejected))
+        st.write(f"Device used: `{result.diagnostics.actual_device}`")
+        if result.diagnostics.warnings:
+            with st.expander("Diagnostics warnings"):
+                for warning in result.diagnostics.warnings:
+                    st.warning(warning)
+
+    st.subheader("Review Suggestions")
+    rows = []
+    for index, suggestion in enumerate(result.suggestions, start=1):
+        status = "accepted" if suggestion.accepted else "review" if suggestion.requires_review else "rejected"
+        row_key = f"{Path(result.input_path).stem}-{index}-{suggestion.event_id}-{suggestion.start_time:.3f}"
+        with st.expander(
+            f"{index}. {suggestion.caption_text} | {suggestion.start_time:.2f}s-{suggestion.end_time:.2f}s | {status}",
+            expanded=index == 1,
+        ):
+            edited = st.text_input(
+                "Caption text",
+                value=suggestion.caption_text,
+                key=f"caption-{row_key}",
+            )
+            c1, c2, c3 = st.columns(3)
+            c1.metric("Audio", f"{suggestion.audio_confidence:.2f}")
+            c2.metric("Reaction", f"{suggestion.reaction_confidence:.2f}")
+            c3.metric("Decision", f"{suggestion.decision_score:.2f}")
+            st.write(suggestion.reason)
+            status_choice = st.radio(
+                "Editor decision",
+                ["accepted", "review", "rejected"],
+                index=["accepted", "review", "rejected"].index(status),
+                horizontal=True,
+                key=f"status-{row_key}",
+            )
+            rows.append(
+                {
+                    "index": index,
+                    "event_id": suggestion.event_id,
+                    "start": suggestion.start_time,
+                    "end": suggestion.end_time,
+                    "caption": edited,
+                    "status": status_choice,
+                    "audio": suggestion.audio_confidence,
+                    "reaction": suggestion.reaction_confidence,
+                    "decision": suggestion.decision_score,
+                    "reason": suggestion.reason,
+                }
+            )
+
+    st.subheader("Exports")
+    st.dataframe(rows, use_container_width=True)
+    export_language = result.suggestions[0].language if result.suggestions else language
+    review_export = build_review_export(rows, export_language)
+    reviewed_accepted = sum(1 for item in review_export.suggestions if item.accepted)
+    reviewed_review = sum(1 for item in review_export.suggestions if item.requires_review)
+    reviewed_rejected = len(review_export.suggestions) - reviewed_accepted - reviewed_review
+    srt_name = f"{Path(result.input_path).stem}.reviewed.{export_language}.srt"
+    csv_name = f"{Path(result.input_path).stem}.reviewed.events.csv"
+    json_name = f"{Path(result.input_path).stem}.reviewed.session.json"
+
+    c1, c2, c3 = st.columns(3)
+    c1.metric("Reviewed accepted", reviewed_accepted)
+    c2.metric("Still needs review", reviewed_review)
+    c3.metric("Rejected", reviewed_rejected)
+
+    export_left, export_middle, export_right = st.columns(3)
+    export_left.download_button(
+        label="Download Reviewed SRT",
+        data=review_export.srt_text.encode("utf-8"),
+        file_name=srt_name,
+        mime="application/x-subrip",
+        type="primary",
+        use_container_width=True,
+    )
+    export_middle.download_button(
+        label="Download Reviewed CSV",
+        data=review_export.csv_text.encode("utf-8"),
+        file_name=csv_name,
+        mime="text/csv",
+        use_container_width=True,
+    )
+    export_right.download_button(
+        label="Download Review Session",
+        data=review_export.json_text.encode("utf-8"),
+        file_name=json_name,
+        mime="application/json",
+        use_container_width=True,
+    )
+
+    with st.expander("Raw pipeline exports"):
+        for name, path in result.files.items():
+            if path.exists():
+                st.download_button(
+                    label=f"Download Original {name.upper()}",
+                    data=path.read_bytes(),
+                    file_name=path.name,
+                    use_container_width=False,
+                )
+
+
+def _save_upload(uploaded) -> Path:
+    temp_dir = Path(tempfile.gettempdir()) / "cc_suggester_uploads"
+    temp_dir.mkdir(parents=True, exist_ok=True)
+    path = temp_dir / Path(uploaded.name).name  # strip any client-supplied directories
+    path.write_bytes(uploaded.getbuffer())
+    return path
+
+
+if __name__ == "__main__":
+    main()
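The review UI collects one dict per suggestion with `start`, `end`, `caption` and `status` keys. A minimal sketch of turning such rows into an ordered, renumbered cue list that an SRT writer could consume. It assumes, following the `write_srt` docstring ("Write accepted caption suggestions"), that only `accepted` rows belong in the SRT; whether `build_review_export` applies exactly this rule is not shown here, and `accepted_cues` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def accepted_cues(rows: list[dict[str, Any]]) -> list[tuple[int, float, float, str]]:
    """Return (cue_number, start, end, caption) for rows the editor accepted."""
    kept = [
        row
        for row in rows
        # Assumption: only "accepted" rows belong in the SRT; captions the
        # editor blanked out are dropped as well.
        if row["status"] == "accepted" and str(row["caption"]).strip()
    ]
    # Sort by time so cue numbers follow playback order, then renumber from 1
    # because SRT cue indices are sequential.
    kept.sort(key=lambda row: (row["start"], row["end"]))
    return [
        (number, row["start"], row["end"], str(row["caption"]).strip())
        for number, row in enumerate(kept, start=1)
    ]
```

Renumbering after filtering matters: if the editor rejects cue 1 of 3, the exported file should still start at cue 1.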
diff --git a/main/cc_suggester/vision/__init__.py b/main/cc_suggester/vision/__init__.py
new file mode 100644
index 0000000..bcd5db8
--- /dev/null
+++ b/main/cc_suggester/vision/__init__.py
@@ -0,0 +1 @@
+"""Visual reaction analysis modules."""
diff --git a/main/cc_suggester/vision/backends/__init__.py b/main/cc_suggester/vision/backends/__init__.py
new file mode 100644
index 0000000..228cbe2
--- /dev/null
+++ b/main/cc_suggester/vision/backends/__init__.py
@@ -0,0 +1,22 @@
+"""Vision backend registry."""
+
+from cc_suggester.vision.backends.base import VisionBackend
+from cc_suggester.vision.backends.mediapipe import MediaPipeVisionBackend
+from cc_suggester.vision.backends.mock import MockVisionBackend
+from cc_suggester.vision.backends.opencv import OpenCvVisionBackend
+
+
+def get_vision_backend(name: str) -> VisionBackend:
+    """Return a visual reaction backend by name."""
+
+    normalized = name.lower().strip()
+    if normalized in {"mock", "demo"}:
+        return MockVisionBackend()
+    if normalized in {"opencv", "cv2"}:
+        return OpenCvVisionBackend()
+    if normalized == "mediapipe":
+        return MediaPipeVisionBackend()
+    raise ValueError(
+        f"Unknown vision backend '{name}'. Available: mock, opencv, mediapipe. "
+        "Planned advanced backends: mmpose, mmaction."
+    )
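The registry above resolves names with an if-chain and a hand-written "Available" list. A table-driven variant is sketched below as one possible alternative design: aliases and factories live in two dicts, and the error message is derived from the table so it cannot drift out of sync. `StubBackend`, `FACTORIES`, `ALIASES` and `get_backend` are hypothetical stand-ins so the sketch runs without OpenCV or MediaPipe installed.

```python
from __future__ import annotations

from typing import Callable


class StubBackend:
    """Stand-in for a real VisionBackend; only records its canonical name."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical registry: canonical name -> zero-argument factory (built lazily).
FACTORIES: dict[str, Callable[[], StubBackend]] = {
    "mock": lambda: StubBackend("mock"),
    "opencv": lambda: StubBackend("opencv"),
    "mediapipe": lambda: StubBackend("mediapipe"),
}
# Extra spellings accepted on the CLI/UI, mapped to canonical names.
ALIASES = {"demo": "mock", "cv2": "opencv"}


def get_backend(name: str) -> StubBackend:
    normalized = name.lower().strip()
    canonical = ALIASES.get(normalized, normalized)
    factory = FACTORIES.get(canonical)
    if factory is None:
        # Derive the "Available" list from the table itself.
        available = ", ".join(sorted(FACTORIES))
        raise ValueError(f"Unknown vision backend '{name}'. Available: {available}.")
    return factory()
```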
diff --git a/main/cc_suggester/vision/backends/base.py b/main/cc_suggester/vision/backends/base.py
new file mode 100644
index 0000000..49122c9
--- /dev/null
+++ b/main/cc_suggester/vision/backends/base.py
@@ -0,0 +1,26 @@
+"""Vision backend interface."""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, ReactionResult, VideoMetadata
+
+
+class VisionBackend(ABC):
+    """Interface implemented by visual reaction analysis backends."""
+
+    name: str
+    requires_valid_media: bool = False
+
+    @abstractmethod
+    def analyze(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        audio_events: list[AudioEventCandidate],
+        config: PipelineConfig,
+    ) -> list[ReactionResult]:
+        """Analyze visible reactions around audio event timestamps."""
diff --git a/main/cc_suggester/vision/backends/mediapipe.py b/main/cc_suggester/vision/backends/mediapipe.py
new file mode 100644
index 0000000..e42b885
--- /dev/null
+++ b/main/cc_suggester/vision/backends/mediapipe.py
@@ -0,0 +1,187 @@
+"""Optional MediaPipe visual reaction backend."""
+
+from __future__ import annotations
+
+import math
+from pathlib import Path
+from typing import Any
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.errors import BackendUnavailableError
+from cc_suggester.core.types import AudioEventCandidate, ReactionResult, VideoMetadata
+from cc_suggester.vision.backends.base import VisionBackend
+
+_MODEL_URL = (
+    "https://storage.googleapis.com/mediapipe-models/"
+    "pose_landmarker/pose_landmarker_lite/float16/latest/"
+    "pose_landmarker_lite.task"
+)
+
+
+def _ensure_model() -> Path:
+    model_path = Path.home() / ".cache" / "cc_suggester" / "pose_landmarker_lite.task"
+    model_path.parent.mkdir(parents=True, exist_ok=True)
+    if not model_path.exists():
+        import urllib.request
+        partial = model_path.with_suffix(".part")  # never cache a truncated download
+        urllib.request.urlretrieve(_MODEL_URL, partial)
+        partial.replace(model_path)
+    return model_path
+
+
+class MediaPipeVisionBackend(VisionBackend):
+    """Estimate pose-based reaction signals around audio events."""
+
+    name = "mediapipe"
+    requires_valid_media = True
+
+    def analyze(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        audio_events: list[AudioEventCandidate],
+        config: PipelineConfig,
+    ) -> list[ReactionResult]:
+        cv2, mp, PoseLandmarker, VisionTaskRunningMode, BaseOptions = _import_dependencies()
+        from mediapipe.tasks.python.vision import PoseLandmarkerOptions
+        # Create the landmarker before opening the video so that a failed model
+        # download or load cannot leave an open VideoCapture handle behind.
+        options = PoseLandmarkerOptions(
+            base_options=BaseOptions(model_asset_path=str(_ensure_model())),
+            running_mode=VisionTaskRunningMode.IMAGE,
+            num_poses=1,
+            min_pose_detection_confidence=0.4,
+            output_segmentation_masks=False,
+        )
+        pose = PoseLandmarker.create_from_options(options)
+        capture = cv2.VideoCapture(str(video_path))
+        if not capture.isOpened():
+            pose.close()
+            raise BackendUnavailableError(
+                message="OpenCV could not open the input video for MediaPipe analysis.",
+                code="mediapipe_video_open_failed",
+                suggestions=[
+                    "Run ccs inspect on the input file.",
+                    "Try --vision-backend opencv to confirm basic video decoding works.",
+                ],
+            )
+        fps = metadata.fps or capture.get(cv2.CAP_PROP_FPS) or 25.0
+        results: list[ReactionResult] = []
+        try:
+            for event in audio_events:
+                frames = _sample_frames(cv2, capture, fps, _event_offsets(event, config))
+                landmarks = [_pose_landmarks(cv2, mp, pose, frame) for frame in frames]
+                landmarks = [item for item in landmarks if item is not None]
+                pose_motion = _landmark_motion(landmarks)
+                head_motion = _head_motion(landmarks)
+                visibility = len(landmarks) / max(1, len(frames))
+                reaction_confidence = round(
+                    max(0.0, min(0.95, (pose_motion * 3.0) + (head_motion * 2.0) + (visibility * 0.08))),
+                    3,
+                )
+                results.append(
+                    ReactionResult(
+                        event_id=event.event_id,
+                        start_time=event.start_time,
+                        end_time=event.end_time,
+                        reaction_confidence=reaction_confidence,
+                        reaction_signals={
+                            "pose_motion": round(pose_motion, 4),
+                            "head_motion": round(head_motion, 4),
+                            "pose_visibility": round(visibility, 4),
+                        },
+                        frames_sampled=len(frames),
+                        vision_backend=self.name,
+                        debug_info={"fps": fps, "landmark_frames": len(landmarks)},
+                    )
+                )
+        finally:
+            pose.close()
+            capture.release()
+        return results
+
+
+def _import_dependencies():
+    try:
+        import cv2  # type: ignore
+        import mediapipe as mp  # type: ignore
+        from mediapipe.tasks.python.vision import PoseLandmarker
+        from mediapipe.tasks.python.vision.core.vision_task_running_mode import VisionTaskRunningMode
+        from mediapipe.tasks.python.core.base_options import BaseOptions
+    except Exception as exc:
+        raise BackendUnavailableError(
+            message="The MediaPipe vision backend requires mediapipe and opencv-python.",
+            code="mediapipe_not_installed",
+            suggestions=[
+                "Install vision dependencies: pip install -r requirements-vision.txt",
+                "Use --vision-backend opencv for a CPU scene-motion baseline.",
+                "Use --vision-backend mock for deterministic demos/tests.",
+            ],
+            details={"error": str(exc)},
+        ) from exc
+    return cv2, mp, PoseLandmarker, VisionTaskRunningMode, BaseOptions
+
+
+def _event_offsets(event: AudioEventCandidate, config: PipelineConfig) -> list[float]:
+    midpoint = (event.start_time + event.end_time) / 2.0
+    return [
+        event.start_time - config.sample_window_before,
+        event.start_time,
+        midpoint,
+        event.end_time,
+        event.end_time + config.sample_window_after,
+    ]
+
+
+def _sample_frames(cv2, capture, fps: float, offsets: list[float]) -> list[Any]:
+    frames: list[Any] = []
+    for seconds in offsets:
+        if seconds < 0:
+            continue
+        capture.set(cv2.CAP_PROP_POS_FRAMES, int(seconds * fps))
+        ok, frame = capture.read()
+        if ok and frame is not None:
+            frames.append(frame)
+    return frames
+
+
+def _pose_landmarks(cv2, mp, pose, frame) -> list[tuple[float, float, float]] | None:
+    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
+    result = pose.detect(mp_image)
+    if not result.pose_landmarks:
+        return None
+    return [(lm.x, lm.y, lm.visibility) for lm in result.pose_landmarks[0]]
+
+
+def _landmark_motion(frames: list[list[tuple[float, float, float]]]) -> float:
+    if len(frames) < 2:
+        return 0.0
+    indices = [0, 11, 12, 15, 16, 23, 24]
+    motions: list[float] = []
+    for previous, current in zip(frames, frames[1:]):
+        distances = []
+        for index in indices:
+            if index >= len(previous) or index >= len(current):
+                continue
+            if previous[index][2] < 0.35 or current[index][2] < 0.35:
+                continue
+            distances.append(_distance(previous[index], current[index]))
+        if distances:
+            motions.append(sum(distances) / len(distances))
+    return max(motions) if motions else 0.0
+
+
+def _head_motion(frames: list[list[tuple[float, float, float]]]) -> float:
+    if len(frames) < 2:
+        return 0.0
+    motions: list[float] = []
+    for previous, current in zip(frames, frames[1:]):
+        if previous[0][2] < 0.35 or current[0][2] < 0.35:
+            continue
+        motions.append(_distance(previous[0], current[0]))
+    return max(motions) if motions else 0.0
+
+
+def _distance(first: tuple[float, float, float], second: tuple[float, float, float]) -> float:
+    return math.sqrt((first[0] - second[0]) ** 2 + (first[1] - second[1]) ** 2)
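The pose-motion score can be checked without MediaPipe or video decoding. A minimal, dependency-free restatement of the `_landmark_motion` rule, assuming landmarks arrive as `(x, y, visibility)` tuples in MediaPipe's normalized coordinates, as `_pose_landmarks` returns them: average the displacement of the visible tracked landmarks per consecutive frame pair, then keep the largest pair. `peak_pose_motion` is an illustrative name.

```python
from __future__ import annotations

import math

Landmark = tuple[float, float, float]  # (x, y, visibility) in normalized image coordinates

# MediaPipe pose indices: nose (0), shoulders (11, 12), wrists (15, 16), hips (23, 24).
TRACKED = (0, 11, 12, 15, 16, 23, 24)
MIN_VISIBILITY = 0.35


def peak_pose_motion(frames: list[list[Landmark]]) -> float:
    """Largest mean displacement of visible tracked landmarks between consecutive frames."""
    peaks: list[float] = []
    for previous, current in zip(frames, frames[1:]):
        steps = [
            math.dist(previous[i][:2], current[i][:2])
            for i in TRACKED
            if i < len(previous)
            and i < len(current)
            and previous[i][2] >= MIN_VISIBILITY
            and current[i][2] >= MIN_VISIBILITY
        ]
        if steps:
            peaks.append(sum(steps) / len(steps))
    # No usable pairs (fewer than two frames, or everything occluded) means no motion.
    return max(peaks, default=0.0)
```

Because coordinates are normalized to the frame, the score does not depend on resolution, but a zoom or a cut between sampled frames will still register as motion.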
diff --git a/main/cc_suggester/vision/backends/mock.py b/main/cc_suggester/vision/backends/mock.py
new file mode 100644
index 0000000..da90e8b
--- /dev/null
+++ b/main/cc_suggester/vision/backends/mock.py
@@ -0,0 +1,45 @@
+"""Deterministic demo visual reaction backend."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import AudioEventCandidate, ReactionResult, VideoMetadata
+from cc_suggester.vision.backends.base import VisionBackend
+
+
+class MockVisionBackend(VisionBackend):
+    """Return plausible reaction scores for classroom-style events."""
+
+    name = "mock"
+
+    def analyze(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        audio_events: list[AudioEventCandidate],
+        config: PipelineConfig,
+    ) -> list[ReactionResult]:
+        return [_reaction_for(event) for event in audio_events]
+
+
+def _reaction_for(event: AudioEventCandidate) -> ReactionResult:
+    reaction_map = {
+        "children_cheer": (0.82, {"raised_hands": 0.89, "face_change": 0.72, "motion_spike": 0.78}),
+        "school_bell": (0.61, {"head_turn": 0.67, "posture_shift": 0.52, "motion_spike": 0.64}),
+        "applause": (0.54, {"hand_motion": 0.79, "face_change": 0.38, "motion_spike": 0.68}),
+        "chair_scrape": (0.39, {"posture_shift": 0.35, "motion_spike": 0.42}),
+        "background_chatter": (0.16, {"ambient_scene": 0.73, "head_turn": 0.09}),
+    }
+    confidence, signals = reaction_map.get(event.event_id, (0.25, {}))
+    return ReactionResult(
+        event_id=event.event_id,
+        start_time=event.start_time,
+        end_time=event.end_time,
+        reaction_confidence=confidence,
+        reaction_signals=signals,
+        frames_sampled=7,
+        vision_backend="mock",
+        debug_info={"source": "deterministic mock backend"},
+    )
diff --git a/main/cc_suggester/vision/backends/opencv.py b/main/cc_suggester/vision/backends/opencv.py
new file mode 100644
index 0000000..5d372d7
--- /dev/null
+++ b/main/cc_suggester/vision/backends/opencv.py
@@ -0,0 +1,115 @@
+"""OpenCV visual reaction backend."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.errors import BackendUnavailableError
+from cc_suggester.core.types import AudioEventCandidate, ReactionResult, VideoMetadata
+from cc_suggester.vision.backends.base import VisionBackend
+
+
+class OpenCvVisionBackend(VisionBackend):
+    """Estimate scene reaction using frame differences around each event."""
+
+    name = "opencv"
+    requires_valid_media = True
+
+    def analyze(
+        self,
+        video_path: Path,
+        metadata: VideoMetadata,
+        audio_events: list[AudioEventCandidate],
+        config: PipelineConfig,
+    ) -> list[ReactionResult]:
+        cv2 = _import_cv2()
+        capture = cv2.VideoCapture(str(video_path))
+        if not capture.isOpened():
+            raise BackendUnavailableError(
+                message="OpenCV could not open the input video.",
+                code="opencv_open_failed",
+                suggestions=[
+                    "Run ccs inspect on the input file.",
+                    "Try re-encoding the video to MP4/H.264.",
+                ],
+            )
+
+        fps = metadata.fps or capture.get(cv2.CAP_PROP_FPS) or 25.0
+        results: list[ReactionResult] = []
+        try:
+            for event in audio_events:
+                frames = _sample_grayscale_frames(
+                    cv2=cv2,
+                    capture=capture,
+                    fps=fps,
+                    offsets=[
+                        event.start_time - config.sample_window_before,
+                        event.start_time,
+                        (event.start_time + event.end_time) / 2.0,
+                        event.end_time,
+                        event.end_time + config.sample_window_after,
+                    ],
+                )
+                motion = _motion_score(cv2, frames)
+                reaction_confidence = round(max(0.0, min(0.95, motion * 3.5)), 3)
+                results.append(
+                    ReactionResult(
+                        event_id=event.event_id,
+                        start_time=event.start_time,
+                        end_time=event.end_time,
+                        reaction_confidence=reaction_confidence,
+                        reaction_signals={
+                            "scene_motion_delta": round(motion, 4),
+                            "frame_difference": round(motion, 4),
+                        },
+                        frames_sampled=len(frames),
+                        vision_backend=self.name,
+                        debug_info={"fps": fps, "method": "grayscale frame difference"},
+                    )
+                )
+        finally:
+            capture.release()
+        return results
+
+
+def _import_cv2():
+    try:
+        import cv2  # type: ignore
+    except Exception as exc:
+        raise BackendUnavailableError(
+            message="The OpenCV vision backend requires opencv-python.",
+            code="opencv_not_installed",
+            suggestions=[
+                "Install vision dependencies: pip install -r requirements-vision.txt",
+                "Use --vision-backend mock for deterministic demos/tests.",
+            ],
+        ) from exc
+    return cv2
+
+
+def _sample_grayscale_frames(cv2, capture, fps: float, offsets: list[float]) -> list[object]:
+    frames: list[object] = []
+    for seconds in offsets:
+        if seconds < 0:
+            continue
+        capture.set(cv2.CAP_PROP_POS_FRAMES, int(seconds * fps))
+        ok, frame = capture.read()
+        if not ok or frame is None:
+            continue
+        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
+        gray = cv2.resize(gray, (160, 90))
+        frames.append(gray)
+    return frames
+
+
+def _motion_score(cv2, frames: list[object]) -> float:
+    if len(frames) < 2:
+        return 0.0
+    scores: list[float] = []
+    for previous, current in zip(frames, frames[1:]):
+        diff = cv2.absdiff(previous, current)
+        mean_score = float(diff.mean()) / 255.0
+        changed_fraction = float((diff > 18).sum()) / float(diff.size)
+        scores.append(max(mean_score * 8.0, changed_fraction * 5.0))
+    return max(scores) if scores else 0.0
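To see how the OpenCV baseline maps pixel changes to a confidence, here is a NumPy-free sketch of the same arithmetic on tiny grayscale frames given as nested lists of 0-255 ints. It reuses the constants from `_motion_score` and `analyze` (pixel threshold 18, weights 8.0 and 5.0, scale 3.5, cap 0.95); `pair_motion` and `reaction_confidence` are illustrative names, and the real backend operates on 160x90 arrays from `cv2.absdiff`.

```python
from __future__ import annotations

Frame = list[list[int]]  # grayscale pixels, 0-255


def pair_motion(previous: Frame, current: Frame, pixel_threshold: int = 18) -> float:
    """Motion score for one frame pair, using the same weights as _motion_score."""
    diffs = [abs(a - b) for row_a, row_b in zip(previous, current) for a, b in zip(row_a, row_b)]
    if not diffs:
        return 0.0
    mean_score = sum(diffs) / len(diffs) / 255.0  # average intensity change, 0..1
    changed_fraction = sum(d > pixel_threshold for d in diffs) / len(diffs)  # share of changed pixels
    return max(mean_score * 8.0, changed_fraction * 5.0)


def reaction_confidence(frames: list[Frame]) -> float:
    """Peak pair score over consecutive frames, scaled and capped like the backend."""
    peak = max((pair_motion(p, c) for p, c in zip(frames, frames[1:])), default=0.0)
    return round(max(0.0, min(0.95, peak * 3.5)), 3)
```

A uniform brightness shift of just 2 grey levels already yields a confidence of about 0.22, and half the frame changing saturates at 0.95, so lighting changes and camera pans are indistinguishable from audience reactions under this baseline.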
diff --git a/main/cc_suggester/vision/frame_sampler.py b/main/cc_suggester/vision/frame_sampler.py
new file mode 100644
index 0000000..618abd2
--- /dev/null
+++ b/main/cc_suggester/vision/frame_sampler.py
@@ -0,0 +1,9 @@
+"""Frame sampling policy for event-aligned visual analysis."""
+
+from __future__ import annotations
+
+
+def sample_offsets(before: float = 1.0, after: float = 1.0) -> list[float]:
+    """Return relative frame offsets around an event start/end window."""
+
+    return [-before, -0.5, 0.0, 0.5, after]
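`sample_offsets` returns relative offsets, while the OpenCV and MediaPipe backends each build absolute sample times inline as pre-roll, start, midpoint, end and post-roll. A minimal sketch of that inline policy as one shared helper; the names are hypothetical, and the upper clip against the media duration is an addition in this sketch (the backends only skip negative times).

```python
from __future__ import annotations


def event_sample_times(
    start: float,
    end: float,
    before: float = 1.0,
    after: float = 1.0,
    duration: float | None = None,
) -> list[float]:
    """Absolute times (s) for pre-roll, start, midpoint, end and post-roll."""
    times = [start - before, start, (start + end) / 2.0, end, end + after]
    # Skip negative times (as the backends do) and, in this sketch, times at or
    # past the end of the media instead of seeking beyond the last frame.
    limit = float("inf") if duration is None else duration
    return [t for t in times if 0.0 <= t < limit]


def to_frame_indices(times: list[float], fps: float) -> list[int]:
    """Truncate to frame indices, as the backends do for CAP_PROP_POS_FRAMES."""
    return [int(t * fps) for t in times]
```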
diff --git a/main/cc_suggester/vision/optical_flow.py b/main/cc_suggester/vision/optical_flow.py
new file mode 100644
index 0000000..da501df
--- /dev/null
+++ b/main/cc_suggester/vision/optical_flow.py
@@ -0,0 +1,14 @@
+"""Placeholder optical flow helpers."""
+
+from __future__ import annotations
+
+
+def describe_planned_signals() -> list[str]:
+    """Return visual motion signals planned for OpenCV implementation."""
+
+    return [
+        "global optical-flow magnitude",
+        "localized motion spike",
+        "pre/post-event motion delta",
+        "camera shake suppression",
+    ]
diff --git a/main/cc_suggester/vision/reactions.py b/main/cc_suggester/vision/reactions.py
new file mode 100644
index 0000000..24c1b3d
--- /dev/null
+++ b/main/cc_suggester/vision/reactions.py
@@ -0,0 +1,13 @@
+"""Reaction scoring helpers."""
+
+from __future__ import annotations
+
+from cc_suggester.core.types import ReactionResult
+
+
+def strongest_signal(reaction: ReactionResult) -> str | None:
+    """Return the strongest named reaction signal, if available."""
+
+    if not reaction.reaction_signals:
+        return None
+    return max(reaction.reaction_signals, key=reaction.reaction_signals.get)
diff --git a/main/configs/default.json b/main/configs/default.json
new file mode 100644
index 0000000..3a364d9
--- /dev/null
+++ b/main/configs/default.json
@@ -0,0 +1,17 @@
+{
+  "language": "en",
+  "device": "auto",
+  "audio_backend": "mock",
+  "vision_backend": "mock",
+  "yamnet_model": null,
+  "yamnet_class_map_path": null,
+  "yamnet_top_k": 5,
+  "audio_threshold": 0.45,
+  "reaction_threshold": 0.35,
+  "decision_threshold": 0.65,
+  "review_threshold": 0.5,
+  "min_event_duration": 0.25,
+  "merge_gap": 0.4,
+  "sample_window_before": 1.0,
+  "sample_window_after": 1.0
+}
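The config carries both a `decision_threshold` (0.65) and a lower `review_threshold` (0.5), and the review UI shows three statuses. A sketch of one plausible reading, stated as an assumption rather than the decision engine's confirmed logic: scores at or above the decision threshold are accepted, scores in the band between the two thresholds go to review, and the rest are rejected. `Thresholds` is a hypothetical helper that reads only those two keys, so it also accepts the full `default.json` text.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    decision: float
    review: float

    @classmethod
    def from_json(cls, text: str) -> "Thresholds":
        raw = json.loads(text)
        thresholds = cls(decision=float(raw["decision_threshold"]), review=float(raw["review_threshold"]))
        # A review band only makes sense below the acceptance cut-off.
        if not 0.0 <= thresholds.review <= thresholds.decision <= 1.0:
            raise ValueError("expected 0 <= review_threshold <= decision_threshold <= 1")
        return thresholds

    def triage(self, decision_score: float) -> str:
        # Assumed three-band policy; the real decision engine may differ.
        if decision_score >= self.decision:
            return "accepted"
        if decision_score >= self.review:
            return "review"
        return "rejected"
```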
diff --git a/main/pyproject.toml b/main/pyproject.toml
new file mode 100644
index 0000000..b8dcaa0
--- /dev/null
+++ b/main/pyproject.toml
@@ -0,0 +1,27 @@
+[build-system]
+requires = ["setuptools>=68"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "cc-suggester"
+version = "0.1.0"
+description = "AI-assisted non-speech closed caption suggestion pipeline."
+readme = "README.md"
+requires-python = ">=3.10"
+authors = [
+  { name = "Planet Read project contributor" }
+]
+dependencies = []
+
+[project.optional-dependencies]
+audio = ["numpy>=1.26", "tensorflow>=2.16", "tensorflow-hub>=0.16"]
+ui = ["streamlit>=1.34"]
+vision = ["opencv-python>=4.8", "mediapipe>=0.10"]
+dev = ["pytest>=8.0"]
+
+[project.scripts]
+ccs = "cc_suggester.cli.app:main"
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["cc_suggester*"]
diff --git a/main/requirements-audio.txt b/main/requirements-audio.txt
new file mode 100644
index 0000000..ab78471
--- /dev/null
+++ b/main/requirements-audio.txt
@@ -0,0 +1,6 @@
+# CPU DSP backend uses only the Python standard library plus ffmpeg.
+#
+# Optional YAMNet semantic backend:
+numpy>=1.26
+tensorflow>=2.16
+tensorflow-hub>=0.16
diff --git a/main/requirements-dev.txt b/main/requirements-dev.txt
new file mode 100644
index 0000000..039d26e
--- /dev/null
+++ b/main/requirements-dev.txt
@@ -0,0 +1 @@
+pytest>=8.0
diff --git a/main/requirements-translate.txt b/main/requirements-translate.txt
new file mode 100644
index 0000000..5dea570
--- /dev/null
+++ b/main/requirements-translate.txt
@@ -0,0 +1 @@
+# Placeholder for IndicTrans2 or other translation backend dependencies.
diff --git a/main/requirements-ui.txt b/main/requirements-ui.txt
new file mode 100644
index 0000000..15743b7
--- /dev/null
+++ b/main/requirements-ui.txt
@@ -0,0 +1 @@
+streamlit>=1.34
diff --git a/main/requirements-vision.txt b/main/requirements-vision.txt
new file mode 100644
index 0000000..2e625e8
--- /dev/null
+++ b/main/requirements-vision.txt
@@ -0,0 +1,2 @@
+opencv-python>=4.8
+mediapipe>=0.10
diff --git a/main/requirements.txt b/main/requirements.txt
new file mode 100644
index 0000000..cc67f4f
--- /dev/null
+++ b/main/requirements.txt
@@ -0,0 +1,2 @@
+# The core pipeline intentionally uses only the Python standard library.
+# Optional backends declare their dependencies in the requirements-*.txt files.
diff --git a/main/scripts/generate_sample_video.py b/main/scripts/generate_sample_video.py
new file mode 100644
index 0000000..1fec783
--- /dev/null
+++ b/main/scripts/generate_sample_video.py
@@ -0,0 +1,130 @@
+"""Generate tiny deterministic sample media for integration testing.
+
+The script uses Python's standard library to create a synthetic WAV file and
+ffmpeg to combine it with a generated test video pattern when ffmpeg is
+available. If ffmpeg is missing but OpenCV is installed, it writes a video-only
+MP4 plus a sidecar WAV file. The sidecar path can be passed to the CLI with
+``--audio-path``.
+
+Usage:
+    python scripts/generate_sample_video.py
+    python scripts/generate_sample_video.py --out tests/fixtures/sample.mp4
+"""
+
+from __future__ import annotations
+
+import argparse
+import math
+import shutil
+import subprocess
+import wave
+from pathlib import Path
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Generate a sample MP4 with audible non-speech events.")
+    parser.add_argument("--out", type=Path, default=Path("tests/fixtures/sample_classroom.mp4"))
+    parser.add_argument("--duration", type=float, default=6.0)
+    args = parser.parse_args()
+
+    args.out.parent.mkdir(parents=True, exist_ok=True)
+    wav_path = args.out.with_suffix(".wav")
+    _write_synthetic_wav(wav_path, duration=args.duration)
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is not None:
+        _write_video_with_ffmpeg(ffmpeg, wav_path, args.out, duration=args.duration)
+        print(f"Generated embedded-audio sample video: {args.out}")
+    else:
+        _write_video_with_opencv(args.out, duration=args.duration)
+        print(f"Generated video-only sample: {args.out}")
+        print("ffmpeg was not found, so use the sidecar WAV with --audio-path.")
+    print(f"Generated source audio: {wav_path}")
+    return 0
+
+
+def _write_synthetic_wav(path: Path, *, duration: float, sample_rate: int = 16000) -> None:
+    samples = []
+    total_samples = int(sample_rate * duration)
+    for index in range(total_samples):
+        seconds = index / sample_rate
+        base = math.sin(2 * math.pi * 180 * seconds) * 450
+        event_one = math.sin(2 * math.pi * 880 * seconds) * 19000 if 1.15 <= seconds <= 1.55 else 0
+        event_two = math.sin(2 * math.pi * 1240 * seconds) * 17000 if 3.25 <= seconds <= 3.70 else 0
+        value = int(max(-32000, min(32000, base + event_one + event_two)))
+        samples.append(value)
+
+    with wave.open(str(path), "wb") as wav:
+        wav.setnchannels(1)
+        wav.setsampwidth(2)
+        wav.setframerate(sample_rate)
+        wav.writeframes(b"".join(sample.to_bytes(2, "little", signed=True) for sample in samples))
+
+
+def _write_video_with_ffmpeg(ffmpeg: str, wav_path: Path, out_path: Path, *, duration: float) -> None:
+    command = [
+        ffmpeg,
+        "-y",
+        "-f",
+        "lavfi",
+        "-i",
+        f"testsrc=size=640x360:rate=25:duration={duration}",
+        "-i",
+        str(wav_path),
+        "-shortest",
+        "-c:v",
+        "mpeg4",
+        "-q:v",
+        "5",
+        "-c:a",
+        "aac",
+        "-pix_fmt",
+        "yuv420p",
+        str(out_path),
+    ]
+    completed = subprocess.run(command, capture_output=True, text=True)
+    if completed.returncode != 0:
+        raise SystemExit(f"ffmpeg failed:\n{completed.stderr}")
+
+
+def _write_video_with_opencv(out_path: Path, *, duration: float) -> None:
+    try:
+        import cv2  # type: ignore
+        import numpy as np  # type: ignore
+    except Exception as exc:
+        raise SystemExit(
+            "Neither ffmpeg nor OpenCV video writing is available. "
+            "Install ffmpeg, or install opencv-python for sidecar fixture generation."
+        ) from exc
+
+    width, height, fps = 640, 360, 25
+    writer = cv2.VideoWriter(
+        str(out_path),
+        cv2.VideoWriter_fourcc(*"mp4v"),
+        fps,
+        (width, height),
+    )
+    if not writer.isOpened():
+        raise SystemExit("OpenCV could not open a VideoWriter for the requested output path.")
+
+    total_frames = int(duration * fps)
+    for frame_index in range(total_frames):
+        t = frame_index / fps
+        frame = np.zeros((height, width, 3), dtype=np.uint8)
+        frame[:, :] = (235, 238, 230)
+        cv2.rectangle(frame, (0, 0), (width, 84), (92, 130, 178), -1)
+        cv2.putText(frame, "Demo classroom scene", (28, 52), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
+        cv2.rectangle(frame, (50, 120), (590, 300), (245, 245, 245), -1)
+        cv2.rectangle(frame, (72, 145), (210, 278), (88, 120, 150), -1)
+        cv2.rectangle(frame, (252, 145), (390, 278), (90, 150, 120), -1)
+        cv2.rectangle(frame, (432, 145), (570, 278), (150, 120, 90), -1)
+        if 1.15 <= t <= 1.55 or 3.25 <= t <= 3.70:
+            cv2.circle(frame, (320, 212), 44, (0, 215, 255), -1)
+            cv2.putText(frame, "SOUND EVENT", (230, 218), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (20, 45, 60), 2)
+        else:
+            cv2.circle(frame, (320, 212), 28, (190, 205, 220), -1)
+        writer.write(frame)
+    writer.release()
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/main/tests/test_config_cli.py b/main/tests/test_config_cli.py
new file mode 100644
index 0000000..828bdac
--- /dev/null
+++ b/main/tests/test_config_cli.py
@@ -0,0 +1,36 @@
+import json
+
+from cc_suggester.cli.app import main
+from cc_suggester.core.config import load_config, merge_config
+
+
+def test_load_and_merge_config(tmp_path):
+    config_path = tmp_path / "config.json"
+    config_path.write_text(
+        json.dumps({"language": "hi", "audio_backend": "mock", "vision_backend": "mock"}),
+        encoding="utf-8",
+    )
+
+    loaded = load_config(config_path)
+    merged = merge_config(loaded, language="ml", device="cpu")
+
+    assert loaded.language == "hi"
+    assert merged.language == "ml"
+    assert merged.device == "cpu"
+
+
+def test_cli_labels_command(capsys):
+    exit_code = main(["labels"])
+    captured = capsys.readouterr()
+
+    assert exit_code == 0
+    assert "Supported languages" in captured.out
+    assert "horn_honk" in captured.out
+
+
+def test_cli_unknown_command_suggests_analyze(capsys):
+    exit_code = main(["analize"])
+    captured = capsys.readouterr()
+
+    assert exit_code == 2
+    assert "Did you mean: analyze?" in captured.err
diff --git a/main/tests/test_dsp_backend.py b/main/tests/test_dsp_backend.py
new file mode 100644
index 0000000..74dd95d
--- /dev/null
+++ b/main/tests/test_dsp_backend.py
@@ -0,0 +1,41 @@
+import math
+import wave
+from pathlib import Path
+
+from cc_suggester.audio.backends.dsp import DspAudioBackend
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.types import VideoMetadata
+
+
+def test_dsp_backend_detects_synthetic_loud_region(tmp_path: Path):
+    wav_path = tmp_path / "synthetic.wav"
+    _write_synthetic_wav(wav_path)
+    backend = DspAudioBackend()
+    config = PipelineConfig(audio_backend="dsp", vision_backend="mock", audio_threshold=0.40, run_dir=tmp_path)
+    metadata = VideoMetadata(path=wav_path, exists=True, has_audio=True, has_video=False, duration=3.0)
+
+    events = backend.detect(wav_path, metadata, config)
+
+    assert events
+    assert events[0].audio_backend == "dsp"
+    assert events[0].audio_confidence >= 0.40
+    assert events[0].start_time < 1.5
+    assert events[0].end_time > 1.0
+
+
+def _write_synthetic_wav(path: Path) -> None:
+    sample_rate = 16000
+    samples = []
+    for index in range(sample_rate * 3):
+        seconds = index / sample_rate
+        if 1.0 <= seconds <= 1.45:
+            value = int(math.sin(2 * math.pi * 880 * seconds) * 18000)
+        else:
+            value = int(math.sin(2 * math.pi * 220 * seconds) * 500)
+        samples.append(value)
+
+    with wave.open(str(path), "wb") as wav:
+        wav.setnchannels(1)
+        wav.setsampwidth(2)
+        wav.setframerate(sample_rate)
+        wav.writeframes(b"".join(sample.to_bytes(2, "little", signed=True) for sample in samples))
diff --git a/main/tests/test_outputs.py b/main/tests/test_outputs.py
new file mode 100644
index 0000000..6449649
--- /dev/null
+++ b/main/tests/test_outputs.py
@@ -0,0 +1,73 @@
+from cc_suggester.core.types import CaptionSuggestion
+from cc_suggester.decision.labels import caption_for
+from cc_suggester.output.csv_report import render_csv_report
+from cc_suggester.output.srt import format_srt_time, write_srt
+
+
+def test_format_srt_time():
+    assert format_srt_time(0) == "00:00:00,000"
+    assert format_srt_time(62.345) == "00:01:02,345"
+
+
+def test_caption_for_known_language():
+    assert caption_for("horn_honk", "hi") == "[हॉर्न बजता है]"
+    assert caption_for("impact_sound", "ml") == "[പെട്ടെന്നുള്ള ശബ്ദം]"
+    assert caption_for("siren", "ta") == "[சைரன் ஒலிக்கிறது]"
+
+
+def test_write_srt_only_accepts_accepted(tmp_path):
+    suggestions = [
+        CaptionSuggestion(
+            event_id="horn_honk",
+            start_time=1.0,
+            end_time=2.0,
+            audio_confidence=0.9,
+            reaction_confidence=0.8,
+            decision_score=0.8,
+            accepted=True,
+            reason="accepted",
+            caption_text="[horn honks]",
+            language="en",
+        ),
+        CaptionSuggestion(
+            event_id="background_chatter",
+            start_time=3.0,
+            end_time=4.0,
+            audio_confidence=0.5,
+            reaction_confidence=0.1,
+            decision_score=0.2,
+            accepted=False,
+            reason="rejected",
+            caption_text="[background chatter]",
+            language="en",
+        ),
+    ]
+    output = tmp_path / "captions.srt"
+    write_srt(suggestions, output)
+    text = output.read_text(encoding="utf-8")
+    assert "[horn honks]" in text
+    assert "[background chatter]" not in text
+
+
+def test_render_csv_report_includes_review_flags():
+    suggestions = [
+        CaptionSuggestion(
+            event_id="school_bell",
+            start_time=10.0,
+            end_time=11.0,
+            audio_confidence=0.7,
+            reaction_confidence=0.4,
+            decision_score=0.6,
+            accepted=False,
+            requires_review=True,
+            reason="borderline",
+            caption_text="[school bell rings]",
+            language="en",
+        )
+    ]
+
+    text = render_csv_report(suggestions)
+
+    assert "school_bell" in text
+    assert "requires_review" in text
+    assert "True" in text
diff --git a/main/tests/test_real_video_integration.py b/main/tests/test_real_video_integration.py
new file mode 100644
index 0000000..2a0ab90
--- /dev/null
+++ b/main/tests/test_real_video_integration.py
@@ -0,0 +1,43 @@
+import subprocess
+import sys
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.media import inspect_video
+from cc_suggester.core.pipeline import analyze_video
+
+
+def test_real_sample_video_inspect_and_analyze(tmp_path: Path):
+    sample_path = tmp_path / "sample_classroom.mp4"
+    sidecar_path = sample_path.with_suffix(".wav")
+    generator = Path(__file__).resolve().parents[1] / "scripts" / "generate_sample_video.py"
+
+    subprocess.run(
+        [sys.executable, str(generator), "--out", str(sample_path)],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+
+    metadata = inspect_video(sample_path)
+    assert metadata.exists
+    assert metadata.has_video is True
+    assert metadata.duration is not None
+
+    result = analyze_video(
+        sample_path,
+        PipelineConfig(
+            language="en",
+            audio_backend="dsp",
+            vision_backend="opencv",
+            output_dir=tmp_path / "outputs",
+            sidecar_audio_path=sidecar_path,
+            audio_threshold=0.40,
+        ),
+    )
+
+    assert result.files["srt"].exists()
+    assert result.files["json"].exists()
+    assert result.artifacts["audio_wav"].exists()
+    assert result.audio_events
+    assert any(suggestion.accepted for suggestion in result.suggestions)
diff --git a/main/tests/test_review_export.py b/main/tests/test_review_export.py
new file mode 100644
index 0000000..3bea189
--- /dev/null
+++ b/main/tests/test_review_export.py
@@ -0,0 +1,72 @@
+import json
+
+import pytest
+
+from cc_suggester.output.review_export import build_review_export, suggestions_from_review_rows
+
+
+def test_review_export_uses_edited_statuses_and_caption_text():
+    rows = [
+        {
+            "index": 1,
+            "event_id": "horn_honk",
+            "start": 1.2,
+            "end": 2.4,
+            "caption": "[edited horn]",
+            "status": "accepted",
+            "audio": 0.9,
+            "reaction": 0.8,
+            "decision": 0.85,
+            "reason": "Pipeline accepted this event.",
+        },
+        {
+            "index": 2,
+            "event_id": "traffic_noise",
+            "start": 5.0,
+            "end": 7.0,
+            "caption": "[traffic]",
+            "status": "rejected",
+            "audio": 0.5,
+            "reaction": 0.1,
+            "decision": 0.2,
+            "reason": "Ambient background noise.",
+        },
+    ]
+
+    export = build_review_export(rows, "en")
+
+    assert len(export.suggestions) == 2
+    assert export.suggestions[0].accepted is True
+    assert export.suggestions[0].caption_text == "[edited horn]"
+    assert export.suggestions[1].accepted is False
+    assert export.suggestions[1].requires_review is False
+    assert "[edited horn]" in export.srt_text
+    assert "[traffic]" not in export.srt_text
+    assert "traffic_noise" in export.csv_text
+    assert json.loads(export.json_text)["summary"]["accepted"] == 1
+
+
+def test_review_export_preserves_review_state():
+    rows = [
+        {
+            "event_id": "school_bell",
+            "start_time": 10,
+            "end_time": 11,
+            "caption_text": "[school bell]",
+            "status": "review",
+        }
+    ]
+
+    suggestions = suggestions_from_review_rows(rows, "hi")
+
+    assert suggestions[0].accepted is False
+    assert suggestions[0].requires_review is True
+    assert suggestions[0].language == "hi"
+    assert suggestions[0].debug_info["editor_status"] == "review"
+
+
+def test_review_export_rejects_unknown_status():
+    rows = [{"event_id": "horn_honk", "status": "maybe"}]
+
+    with pytest.raises(ValueError, match="Unknown review status"):
+        suggestions_from_review_rows(rows, "en")
diff --git a/main/tests/test_vision_pipeline.py b/main/tests/test_vision_pipeline.py
new file mode 100644
index 0000000..7ec8472
--- /dev/null
+++ b/main/tests/test_vision_pipeline.py
@@ -0,0 +1,41 @@
+import subprocess
+import sys
+from pathlib import Path
+
+from cc_suggester.core.config import PipelineConfig
+from cc_suggester.core.pipeline import detect_audio_events, score_visual_reactions
+
+
+def test_score_visual_reactions_from_audio_report(tmp_path: Path):
+    sample_path = tmp_path / "sample_classroom.mp4"
+    sidecar_path = sample_path.with_suffix(".wav")
+    generator = Path(__file__).resolve().parents[1] / "scripts" / "generate_sample_video.py"
+    subprocess.run(
+        [sys.executable, str(generator), "--out", str(sample_path)],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+
+    audio_payload = detect_audio_events(
+        sample_path,
+        PipelineConfig(
+            audio_backend="dsp",
+            sidecar_audio_path=sidecar_path,
+            output_dir=tmp_path / "outputs",
+            audio_threshold=0.40,
+        ),
+    )
+
+    vision_payload = score_visual_reactions(
+        sample_path,
+        Path(audio_payload["files"]["audio_json"]),
+        PipelineConfig(
+            vision_backend="opencv",
+            output_dir=tmp_path / "outputs",
+        ),
+    )
+
+    assert vision_payload["reactions"]
+    assert Path(vision_payload["files"]["vision_json"]).exists()
+    assert vision_payload["reactions"][0]["vision_backend"] == "opencv"
diff --git a/main/tests/test_yamnet_backend.py b/main/tests/test_yamnet_backend.py
new file mode 100644
index 0000000..e249b64
--- /dev/null
+++ b/main/tests/test_yamnet_backend.py
@@ -0,0 +1,38 @@
+from pathlib import Path
+
+from cc_suggester.audio.backends.yamnet import _events_from_scores
+from cc_suggester.audio.label_mapping import normalize_sound_label
+from cc_suggester.core.config import PipelineConfig
+
+
+def test_normalize_sound_label_common_yamnet_classes():
+    assert normalize_sound_label("Vehicle horn, car horn, honking") == "horn_honk"
+    assert normalize_sound_label("Glass") == "glass_break"
+    assert normalize_sound_label("Applause") == "applause"
+    assert normalize_sound_label("Siren") == "siren"
+    assert normalize_sound_label("Speech") is None
+
+
+def test_events_from_yamnet_scores_maps_classes_to_events(tmp_path: Path):
+    scores = [
+        [0.91, 0.10, 0.05],
+        [0.05, 0.82, 0.02],
+        [0.02, 0.06, 0.78],
+    ]
+    class_names = [
+        "Vehicle horn, car horn, honking",
+        "Glass",
+        "Applause",
+    ]
+
+    events = _events_from_scores(
+        scores_array=scores,
+        class_names=class_names,
+        audio_path=tmp_path / "audio.wav",
+        config=PipelineConfig(audio_backend="yamnet", audio_threshold=0.40, yamnet_top_k=2),
+    )
+
+    assert [event.event_id for event in events] == ["horn_honk", "glass_break", "applause"]
+    assert events[0].audio_backend == "yamnet"
+    assert events[1].start_time == 0.48
+    assert events[2].raw_class_name == "Applause"
diff --git a/mockups/hindi.png b/mockups/hindi.png
new file mode 100644
index 0000000..85f310d
Binary files /dev/null and b/mockups/hindi.png differ
diff --git a/mockups/mallu.png b/mockups/mallu.png
new file mode 100644
index 0000000..cdf1c9a
Binary files /dev/null and b/mockups/mallu.png differ
diff --git a/mockups/telugu.png b/mockups/telugu.png
new file mode 100644
index 0000000..fb30d99
Binary files /dev/null and b/mockups/telugu.png differ
diff --git a/mockups/web-ui.html b/mockups/web-ui.html
new file mode 100644
index 0000000..527ba75
--- /dev/null
+++ b/mockups/web-ui.html
@@ -0,0 +1,1435 @@
+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>Intelligent CC Suggestion Tool - Interactive Web UI Mockup</title>
+  <style>
+    /* ── Warm Dark (default) ── */
+    :root {
+      color-scheme: dark;
+      --ink:          #f0e4cc;
+      --ink-strong:   #faf0dc;
+      --muted:        #9a826a;
+      --line:         rgba(255, 190, 110, 0.09);
+      --line-strong:  rgba(255, 190, 110, 0.18);
+      --surface:      rgba(42, 28, 16, 0.72);
+      --surface-hdr:  rgba(36, 24, 13, 0.60);
+      --glass:        rgba(42, 28, 16, 0.55);
+      --gold:         #f59e0b;
+      --gold-dim:     rgba(245, 158, 11, 0.22);
+      --teal:         #34d399;
+      --teal-dim:     rgba(52, 211, 153, 0.18);
+      --rose:         #fb7185;
+      --rose-dim:     rgba(251, 113, 133, 0.20);
+      --amber:        #fb923c;
+      --amber-dim:    rgba(251, 146, 60, 0.20);
+      --slate:        #c4a070;
+      /* legacy alias keys kept for JS-generated markup */
+      --blue:         var(--gold);
+      --blue-soft:    var(--gold-dim);
+      --green:        var(--teal);
+      --green-soft:   var(--teal-dim);
+      --red:          var(--rose);
+      --red-soft:     var(--rose-dim);
+      --shadow:       0 22px 64px rgba(0, 0, 0, 0.55);
+      --shadow-soft:  0 8px 32px rgba(0, 0, 0, 0.38);
+      --radius:       10px;
+      --body-bg:
+        radial-gradient(ellipse at 18% 0%, rgba(245,158,11,.14), transparent 36%),
+        radial-gradient(ellipse at 84% 8%, rgba(180,83,9,.10),   transparent 32%),
+        linear-gradient(155deg, #1e1308 0%, #130d07 52%, #1b1108 100%);
+    }
+
+    /* ── Warm Light (toggled) ── */
+    [data-theme="light"] {
+      color-scheme: light;
+      --ink:          #2d1a09;
+      --ink-strong:   #180e04;
+      --muted:        #7c5a3a;
+      --line:         rgba(130, 72, 20, 0.10);
+      --line-strong:  rgba(130, 72, 20, 0.20);
+      --surface:      rgba(255, 247, 232, 0.82);
+      --surface-hdr:  rgba(255, 248, 236, 0.60);
+      --glass:        rgba(255, 247, 232, 0.66);
+      --gold:         #b45309;
+      --gold-dim:     rgba(180, 83, 9, 0.12);
+      --teal:         #047857;
+      --teal-dim:     rgba(4, 120, 87, 0.12);
+      --rose:         #be123c;
+      --rose-dim:     rgba(190, 18, 60, 0.12);
+      --amber:        #c2410c;
+      --amber-dim:    rgba(194, 65, 12, 0.12);
+      --slate:        #8b5e32;
+      --shadow:       0 22px 64px rgba(80, 36, 8, 0.20);
+      --shadow-soft:  0 8px 32px rgba(80, 36, 8, 0.12);
+      --body-bg:
+        radial-gradient(ellipse at 18% 0%, rgba(245,158,11,.14), transparent 36%),
+        radial-gradient(ellipse at 84% 8%, rgba(180,83,9,.10),   transparent 32%),
+        linear-gradient(155deg, #fdf5e4 0%, #faf0d8 52%, #f7e8cc 100%);
+    }
+
+    * { box-sizing: border-box; }
+
+    html, body { min-height: 100%; }
+
+    body {
+      margin: 0;
+      color: var(--ink);
+      font-family: "Inter", "Noto Sans", ui-sans-serif, system-ui, -apple-system,
+                   BlinkMacSystemFont, "Segoe UI", sans-serif;
+      background: var(--body-bg);
+      transition: background 220ms ease, color 220ms ease;
+    }
+
+    button, select, input { font: inherit; }
+    button, select { cursor: pointer; }
+
+    .app-shell {
+      min-height: 100vh;
+      display: grid;
+      grid-template-rows: auto 1fr auto;
+      gap: 14px;
+      padding: 14px;
+    }
+
+    /* ── Glass utility ── */
+    .glass {
+      background: var(--surface);
+      border: 1px solid var(--line-strong);
+      box-shadow: var(--shadow-soft), inset 0 1px 0 rgba(255, 210, 140, 0.07);
+      backdrop-filter: blur(22px) saturate(1.3);
+      -webkit-backdrop-filter: blur(22px) saturate(1.3);
+      transition: background 220ms ease, border-color 220ms ease;
+    }
+
+    /* ── Topbar ── */
+    .topbar {
+      display: grid;
+      grid-template-columns: minmax(280px, 1fr) auto;
+      align-items: center;
+      gap: 18px;
+      padding: 12px 16px;
+      border-radius: var(--radius);
+    }
+
+    .brand {
+      display: flex;
+      align-items: center;
+      gap: 12px;
+      min-width: 0;
+    }
+
+    .brand-mark {
+      width: 38px;
+      height: 38px;
+      border-radius: 9px;
+      display: grid;
+      place-items: center;
+      background: linear-gradient(135deg, #d97706, #0f766e);
+      color: white;
+      font-weight: 900;
+      letter-spacing: -0.5px;
+      box-shadow: inset 0 1px 0 rgba(255,255,255,.28), 0 8px 22px rgba(217,119,6,.30);
+    }
+
+    h1, h2, h3, p { margin: 0; }
+
+    .brand h1 {
+      font-size: 17px;
+      font-weight: 700;
+      letter-spacing: -0.2px;
+      line-height: 1.2;
+    }
+
+    .brand p {
+      margin-top: 3px;
+      color: var(--muted);
+      font-size: 11.5px;
+    }
+
+    .top-controls {
+      display: flex;
+      align-items: flex-end;
+      gap: 10px;
+      flex-wrap: wrap;
+      justify-content: flex-end;
+    }
+
+    .field {
+      display: grid;
+      gap: 5px;
+    }
+
+    .field label {
+      color: var(--slate);
+      font-size: 9.5px;
+      font-weight: 800;
+      text-transform: uppercase;
+      letter-spacing: 0.6px;
+    }
+
+    select,
+    input[type="text"],
+    input[type="number"] {
+      height: 36px;
+      min-width: 112px;
+      padding: 0 10px;
+      border: 1px solid var(--line-strong);
+      border-radius: 7px;
+      background: rgba(255, 200, 120, 0.07);
+      color: var(--ink);
+      outline: none;
+      transition: border-color 140ms ease, box-shadow 140ms ease;
+    }
+
+    [data-theme="light"] select,
+    [data-theme="light"] input[type="text"],
+    [data-theme="light"] input[type="number"] {
+      background: rgba(255, 255, 255, 0.88);
+    }
+
+    select:focus,
+    input:focus,
+    button:focus-visible {
+      box-shadow: 0 0 0 3px rgba(245, 158, 11, 0.25);
+      border-color: rgba(245, 158, 11, 0.55);
+      outline: none;
+    }
+
+    /* ── Buttons ── */
+    .btn {
+      min-height: 36px;
+      border: 1px solid var(--line-strong);
+      border-radius: 7px;
+      padding: 0 13px;
+      background: rgba(255, 200, 120, 0.08);
+      color: var(--ink);
+      font-weight: 700;
+      font-size: 13px;
+      display: inline-flex;
+      align-items: center;
+      justify-content: center;
+      gap: 8px;
+      transition: transform 130ms ease, box-shadow 130ms ease, background 130ms ease;
+      white-space: nowrap;
+    }
+
+    [data-theme="light"] .btn {
+      background: rgba(255, 255, 255, 0.82);
+    }
+
+    .btn:hover {
+      transform: translateY(-1px);
+      box-shadow: 0 6px 18px rgba(0, 0, 0, 0.22);
+    }
+
+    .btn.primary {
+      color: #fff8ee;
+      background: linear-gradient(135deg, #d97706, #b45309);
+      border-color: rgba(217, 119, 6, 0.50);
+      box-shadow: 0 4px 14px rgba(217,119,6,.25);
+    }
+
+    .btn.primary:hover {
+      box-shadow: 0 8px 22px rgba(217,119,6,.35);
+    }
+
+    .btn.success {
+      color: #ecfdf5;
+      background: linear-gradient(135deg, #059669, #047857);
+      border-color: rgba(5, 150, 105, 0.50);
+      box-shadow: 0 4px 14px rgba(5,150,105,.22);
+    }
+
+    .btn.danger {
+      color: var(--rose);
+      background: var(--rose-dim);
+      border-color: rgba(251, 113, 133, 0.28);
+    }
+
+    /* Theme toggle */
+    .btn.theme-toggle {
+      width: 36px;
+      padding: 0;
+      font-size: 16px;
+    }
+
+    /* ── Workspace ── */
+    .workspace {
+      display: grid;
+      grid-template-columns: 300px minmax(520px, 1fr) 380px;
+      gap: 14px;
+      min-height: 0;
+    }
+
+    .panel {
+      border-radius: var(--radius);
+      overflow: hidden;
+      min-height: 0;
+    }
+
+    .panel-header {
+      min-height: 52px;
+      padding: 13px 16px;
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      gap: 12px;
+      border-bottom: 1px solid var(--line);
+      background: var(--surface-hdr);
+      backdrop-filter: blur(8px);
+    }
+
+    .panel-header h2 {
+      font-size: 13.5px;
+      font-weight: 700;
+      letter-spacing: -0.1px;
+    }
+
+    .panel-body {
+      padding: 16px;
+    }
+
+    .stack {
+      display: grid;
+      gap: 14px;
+    }
+
+    .muted { color: var(--muted); }
+    .tiny  { font-size: 12px; }
+
+    /* ── Tags / badges ── */
+    .tag {
+      display: inline-flex;
+      align-items: center;
+      min-height: 22px;
+      border-radius: 999px;
+      padding: 2px 9px;
+      font-size: 10.5px;
+      font-weight: 800;
+      letter-spacing: 0.2px;
+      white-space: nowrap;
+    }
+
+    .tag.ready,
+    .tag.accepted {
+      color: var(--teal);
+      background: var(--teal-dim);
+    }
+
+    .tag.review {
+      color: var(--amber);
+      background: var(--amber-dim);
+    }
+
+    .tag.rejected {
+      color: var(--muted);
+      background: rgba(130, 100, 70, 0.14);
+    }
+
+    .tag.live {
+      color: var(--gold);
+      background: var(--gold-dim);
+    }
+
+    /* ── Left panel ── */
+    .video-select { width: 100%; }
+
+    .progress-strip {
+      height: 6px;
+      overflow: hidden;
+      border-radius: 999px;
+      background: rgba(245, 158, 11, 0.12);
+    }
+
+    .progress-strip span {
+      display: block;
+      width: 66%;
+      height: 100%;
+      border-radius: inherit;
+      background: linear-gradient(90deg, #d97706, #059669);
+      transition: width 400ms ease;
+    }
+
+    .meta-grid {
+      display: grid;
+      gap: 0;
+      font-size: 12.5px;
+    }
+
+    .meta-row {
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      gap: 12px;
+      padding: 7px 0;
+      border-bottom: 1px solid var(--line);
+    }
+
+    .meta-row strong { text-align: right; }
+
+    .alert {
+      border: 1px solid rgba(251, 113, 133, 0.24);
+      border-left: 4px solid var(--rose);
+      border-radius: 8px;
+      padding: 12px;
+      background: var(--rose-dim);
+      color: var(--rose);
+      display: none;
+      gap: 8px;
+      font-size: 12.5px;
+    }
+
+    .alert.visible { display: grid; }
+
+    /* ── Video panel ── */
+    .video-panel {
+      display: grid;
+      grid-template-rows: auto 1fr auto;
+      min-height: 0;
+    }
+
+    .video-stage {
+      margin: 14px;
+      min-height: 490px;
+      border-radius: 10px;
+      overflow: hidden;
+      position: relative;
+      background-image:
+        linear-gradient(180deg, rgba(10, 5, 0, 0.10), rgba(10, 5, 0, 0.65)),
+        url("https://upload.wikimedia.org/wikipedia/commons/6/69/Classroom_in_India.jpg");
+      background-size: cover;
+      background-position: center;
+      box-shadow: 0 0 0 1px rgba(255, 190, 110, 0.10), 0 12px 40px rgba(0,0,0,.4);
+    }
+
+    .stage-top {
+      position: absolute;
+      left: 14px;
+      right: 14px;
+      top: 14px;
+      display: flex;
+      justify-content: space-between;
+      gap: 10px;
+      align-items: flex-start;
+    }
+
+    .scene-chip,
+    .live-chip {
+      padding: 7px 11px;
+      border-radius: 8px;
+      background: rgba(20, 12, 6, 0.72);
+      color: #faf0dc;
+      border: 1px solid rgba(255, 190, 110, 0.18);
+      box-shadow: 0 8px 24px rgba(0,0,0,.28);
+      font-size: 11.5px;
+      font-weight: 700;
+      backdrop-filter: blur(14px);
+    }
+
+    .live-chip {
+      display: flex;
+      align-items: center;
+      gap: 7px;
+    }
+
+    .pulse {
+      width: 8px;
+      height: 8px;
+      border-radius: 50%;
+      background: var(--teal);
+      animation: pulse 1.8s infinite;
+    }
+
+    @keyframes pulse {
+      0%   { box-shadow: 0 0 0 0   rgba(52, 211, 153, 0.50); }
+      70%  { box-shadow: 0 0 0 9px rgba(52, 211, 153, 0);    }
+      100% { box-shadow: 0 0 0 0   rgba(52, 211, 153, 0);    }
+    }
+
+    .caption-overlay {
+      position: absolute;
+      left: 50%;
+      bottom: 54px;
+      transform: translateX(-50%);
+      max-width: min(620px, 84%);
+      padding: 9px 18px;
+      border-radius: 7px;
+      color: #fff8ee;
+      background: rgba(10, 5, 0, 0.80);
+      font-size: 19px;
+      font-weight: 700;
+      text-align: center;
+      border: 1px solid rgba(255, 190, 110, 0.15);
+      box-shadow: 0 8px 28px rgba(0,0,0,.35);
+    }
+
+    .analysis-card {
+      position: absolute;
+      right: 16px;
+      bottom: 20px;
+      width: min(320px, 46%);
+      padding: 12px;
+      border-radius: 10px;
+      color: #f0e4cc;
+      background: rgba(18, 10, 4, 0.65);
+      backdrop-filter: blur(18px);
+      border: 1px solid rgba(255, 190, 110, 0.18);
+      box-shadow: 0 8px 24px rgba(0,0,0,.30);
+    }
+
+    .analysis-card h3 {
+      margin-bottom: 8px;
+      font-size: 12px;
+      font-weight: 700;
+      color: var(--gold);
+      letter-spacing: 0.3px;
+    }
+
+    .signal-grid {
+      display: grid;
+      grid-template-columns: repeat(3, 1fr);
+      gap: 7px;
+    }
+
+    .signal {
+      padding: 8px;
+      border-radius: 7px;
+      background: rgba(255, 190, 110, 0.10);
+      font-size: 10.5px;
+      border: 1px solid rgba(255, 190, 110, 0.10);
+    }
+
+    .signal strong {
+      display: block;
+      margin-top: 3px;
+      font-size: 15px;
+      color: #f0e4cc;
+    }
+
+    /* ── Transport ── */
+    .transport {
+      display: grid;
+      grid-template-columns: auto auto 1fr auto auto;
+      gap: 9px;
+      align-items: center;
+      padding: 0 14px 14px;
+    }
+
+    .timecode {
+      font-variant-numeric: tabular-nums;
+      color: var(--slate);
+      font-size: 12.5px;
+      font-weight: 700;
+    }
+
+    .timeline {
+      position: relative;
+      height: 32px;
+      border-radius: 8px;
+      background: rgba(255, 190, 110, 0.10);
+      overflow: hidden;
+      box-shadow: inset 0 1px 3px rgba(0,0,0,.18);
+      cursor: pointer;
+      border: 1px solid var(--line);
+    }
+
+    .timeline-fill {
+      position: absolute;
+      inset: 0 auto 0 0;
+      width: 36%;
+      background: linear-gradient(90deg, rgba(217,119,6,.65), rgba(5,150,105,.45));
+    }
+
+    .scrubber {
+      position: absolute;
+      top: 4px;
+      left: calc(36% - 7px);
+      width: 14px;
+      height: 24px;
+      border-radius: 7px;
+      background: var(--ink-strong, #faf0dc);
+      border: 2.5px solid var(--gold);
+      box-shadow: 0 4px 12px rgba(0,0,0,.30);
+      cursor: grab;
+      z-index: 5;
+    }
+
+    .event-marker {
+      position: absolute;
+      top: 6px;
+      width: 9px;
+      height: 20px;
+      border-radius: 999px;
+      border: 2px solid rgba(255, 255, 255, 0.25);
+      box-shadow: 0 3px 10px rgba(0,0,0,.28);
+      z-index: 4;
+    }
+
+    .event-marker.accepted { background: var(--teal); }
+    .event-marker.review   { background: var(--amber); }
+    .event-marker.rejected { background: var(--muted); }
+
+    .event-marker.active {
+      transform: scale(1.28);
+      outline: 3px solid rgba(245, 158, 11, 0.35);
+    }
+
+    /* ── Review panel ── */
+    .review-list {
+      display: grid;
+      gap: 10px;
+      max-height: calc(100vh - 190px);
+      overflow: auto;
+      padding: 16px;
+      padding-right: 10px;
+    }
+
+    .review-list::-webkit-scrollbar { width: 4px; }
+    .review-list::-webkit-scrollbar-track { background: transparent; }
+    .review-list::-webkit-scrollbar-thumb {
+      background: var(--line-strong);
+      border-radius: 99px;
+    }
+
+    .suggestion {
+      border: 1px solid var(--line-strong);
+      border-radius: 9px;
+      padding: 12px;
+      background: rgba(255, 190, 110, 0.05);
+      display: grid;
+      gap: 10px;
+      transition: border 140ms ease, box-shadow 140ms ease, transform 140ms ease;
+      cursor: pointer;
+    }
+
+    [data-theme="light"] .suggestion {
+      background: rgba(255, 255, 255, 0.55);
+    }
+
+    .suggestion:hover {
+      transform: translateY(-1px);
+      box-shadow: 0 8px 22px rgba(0,0,0,.22);
+    }
+
+    .suggestion.active {
+      border-color: rgba(245, 158, 11, 0.65);
+      box-shadow: 0 0 0 3px rgba(245, 158, 11, 0.12), 0 8px 22px rgba(0,0,0,.22);
+    }
+
+    .suggestion-title {
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      gap: 10px;
+    }
+
+    .suggestion-title strong {
+      font-variant-numeric: tabular-nums;
+      font-size: 13px;
+      font-weight: 700;
+    }
+
+    .caption-input {
+      width: 100%;
+      min-width: 0;
+      font-weight: 600;
+      padding-left: 10px;
+    }
+
+    .score-grid {
+      display: grid;
+      grid-template-columns: repeat(3, 1fr);
+      gap: 7px;
+    }
+
+    .score {
+      border-radius: 7px;
+      padding: 8px;
+      background: rgba(255, 190, 110, 0.10);
+      font-size: 10.5px;
+      border: 1px solid var(--line);
+    }
+
+    [data-theme="light"] .score {
+      background: rgba(255, 245, 220, 0.80);
+    }
+
+    .score strong {
+      display: block;
+      margin-top: 3px;
+      color: var(--ink);
+      font-size: 14.5px;
+    }
+
+    .actions {
+      display: flex;
+      gap: 7px;
+      flex-wrap: wrap;
+    }
+
+    /* ── Table ── */
+    .table-panel {
+      border-radius: var(--radius);
+      overflow: hidden;
+    }
+
+    table {
+      width: 100%;
+      border-collapse: collapse;
+      font-size: 12.5px;
+      background: transparent;
+    }
+
+    th, td {
+      padding: 9px 12px;
+      border-bottom: 1px solid var(--line);
+      text-align: left;
+      white-space: nowrap;
+      vertical-align: middle;
+    }
+
+    th {
+      color: var(--slate);
+      background: var(--surface-hdr);
+      font-size: 9.5px;
+      font-weight: 800;
+      text-transform: uppercase;
+      letter-spacing: 0.6px;
+    }
+
+    tr.active-row td {
+      background: rgba(245, 158, 11, 0.08);
+    }
+
+    /* ── Toast ── */
+    .toast {
+      position: fixed;
+      right: 18px;
+      bottom: 18px;
+      max-width: 360px;
+      padding: 12px 15px;
+      border-radius: 9px;
+      color: var(--ink-strong, #f0e4cc);
+      background: rgba(28, 18, 8, 0.94);
+      border: 1px solid rgba(255, 190, 110, 0.18);
+      box-shadow: var(--shadow);
+      transform: translateY(16px);
+      opacity: 0;
+      pointer-events: none;
+      transition: opacity 160ms ease, transform 160ms ease;
+      z-index: 10;
+      font-size: 13px;
+    }
+
+    .toast.visible {
+      opacity: 1;
+      transform: translateY(0);
+    }
+
+    /* ── Modal ── */
+    .modal {
+      position: fixed;
+      inset: 0;
+      display: none;
+      place-items: center;
+      background: rgba(8, 4, 0, 0.55);
+      backdrop-filter: blur(4px);
+      z-index: 20;
+      padding: 24px;
+    }
+
+    .modal.visible { display: grid; }
+
+    .modal-card {
+      width: min(520px, 100%);
+      border-radius: 12px;
+      padding: 20px;
+      background: var(--surface-strong, rgba(42,28,16,.96));
+      border: 1px solid var(--line-strong);
+      box-shadow: var(--shadow);
+      display: grid;
+      gap: 14px;
+    }
+
+    [data-theme="light"] .modal-card {
+      background: rgba(255, 250, 240, 0.97);
+    }
+
+    .modal-card pre {
+      margin: 0;
+      overflow: auto;
+      padding: 13px;
+      border-radius: 8px;
+      color: var(--ink);
+      background: rgba(255, 190, 110, 0.07);
+      border: 1px solid var(--line);
+      font-size: 12px;
+      line-height: 1.6;
+    }
+
+    [data-theme="light"] .modal-card pre {
+      background: rgba(255, 245, 220, 0.90);
+    }
+
+    /* ── Responsive ── */
+    @media (max-width: 1180px) {
+      .workspace { grid-template-columns: 1fr; }
+      .review-list { max-height: none; }
+      .video-stage { min-height: 430px; }
+      .analysis-card { width: min(360px, 80%); }
+    }
+
+    @media (max-width: 760px) {
+      .app-shell { padding: 10px; }
+      .topbar { grid-template-columns: 1fr; }
+      .top-controls { justify-content: stretch; }
+      .top-controls .field,
+      .top-controls .btn,
+      .top-controls select { width: 100%; }
+      .transport { grid-template-columns: 1fr 1fr; }
+      .timeline { grid-column: 1 / -1; }
+      .timecode { grid-column: 1 / -1; }
+      .video-stage { min-height: 360px; }
+      .analysis-card { left: 14px; right: 14px; width: auto; }
+    }
+  </style>
+</head>
+<body>
+  <main class="app-shell">
+    <header class="topbar glass">
+      <section class="brand" aria-label="Product identity">
+        <div class="brand-mark">CC</div>
+        <div>
+          <h1>Intelligent Closed Caption Suggestion Tool</h1>
+          <p>Non-speech sound detection, visual reaction scoring, and editor-reviewed SRT export</p>
+        </div>
+      </section>
+
+      <section class="top-controls" aria-label="Pipeline controls">
+        <div class="field">
+          <label for="device">Device</label>
+          <select id="device">
+            <option value="auto">Auto</option>
+            <option value="cpu">CPU</option>
+            <option value="cuda">GPU</option>
+          </select>
+        </div>
+        <div class="field">
+          <label for="language">Language</label>
+          <select id="language">
+            <option value="en">English</option>
+            <option value="hi" selected>Hindi</option>
+            <option value="ta">Tamil</option>
+            <option value="te">Telugu</option>
+            <option value="bn">Bengali</option>
+            <option value="mr">Marathi</option>
+            <option value="ml">Malayalam</option>
+          </select>
+        </div>
+        <div class="field">
+          <label for="audioBackend">Audio Backend</label>
+          <select id="audioBackend">
+            <option>YAMNet</option>
+            <option>PANNs</option>
+            <option>AST</option>
+            <option>BEATs</option>
+          </select>
+        </div>
+        <div class="field">
+          <label for="visionBackend">Vision Backend</label>
+          <select id="visionBackend">
+            <option>MediaPipe</option>
+            <option>OpenCV only</option>
+            <option>MMPose</option>
+          </select>
+        </div>
+        <button class="btn" id="doctorBtn" type="button">Run Doctor</button>
+        <button class="btn theme-toggle" id="themeBtn" type="button" title="Toggle light / dark">☀</button>
+      </section>
+    </header>
+
+    <section class="workspace">
+      <aside class="panel glass">
+        <div class="panel-header">
+          <h2>Input Video</h2>
+          <span class="tag ready" id="runStatus">Ready</span>
+        </div>
+        <div class="panel-body stack">
+          <div class="field">
+            <label for="videoSelect">Select Video</label>
+            <select id="videoSelect" class="video-select">
+              <option>bangalore_classroom_clip.mp4</option>
+              <option>hindi_learning_session.mov</option>
+              <option>regional_school_activity.mkv</option>
+            </select>
+          </div>
+
+          <button class="btn primary" id="startBtn" type="button">Start Caption</button>
+          <div class="progress-strip" aria-label="Analysis progress"><span id="analysisProgress"></span></div>
+
+          <div class="meta-grid">
+            <div class="meta-row"><span class="muted">Scene</span><strong>Classroom near Bangalore</strong></div>
+            <div class="meta-row"><span class="muted">Duration</span><strong>00:06:42</strong></div>
+            <div class="meta-row"><span class="muted">Resolution</span><strong>2047 x 1372</strong></div>
+            <div class="meta-row"><span class="muted">Audio</span><strong>16 kHz mono</strong></div>
+            <div class="meta-row"><span class="muted">Detected events</span><strong id="detectedCount">5</strong></div>
+            <div class="meta-row"><span class="muted">Accepted</span><strong id="acceptedCount">2</strong></div>
+            <div class="meta-row"><span class="muted">Source</span><strong>Demo classroom video</strong></div>
+          </div>
+
+          <div class="alert" id="gpuAlert">
+            <strong>GPU diagnostic</strong>
+            <span>CUDA was selected but not detected. Retry on CPU or run doctor for setup details.</span>
+            <div class="actions">
+              <button class="btn" id="retryCpuBtn" type="button">Retry CPU</button>
+              <button class="btn" id="copyReportBtn" type="button">Copy Report</button>
+            </div>
+          </div>
+
+          <div class="actions">
+            <button class="btn success" data-export="SRT" type="button">Export SRT</button>
+            <button class="btn" data-export="JSON" type="button">Export JSON</button>
+            <button class="btn" data-export="CSV" type="button">Export CSV</button>
+          </div>
+        </div>
+      </aside>
+
+      <section class="panel glass video-panel">
+        <div class="panel-header">
+          <h2>Video Review</h2>
+          <span class="timecode" id="timecode">00:02:18 / 00:06:42</span>
+        </div>
+
+        <div class="video-stage" aria-label="Classroom video preview mockup">
+          <div class="stage-top">
+            <div class="scene-chip" id="currentEventChip">Current event: children_cheer</div>
+            <div class="live-chip"><span class="pulse"></span><span id="playState">Paused</span></div>
+          </div>
+          <div class="caption-overlay" id="captionOverlay">[children cheering]</div>
+          <div class="analysis-card">
+            <h3>Live signal explanation</h3>
+            <div class="signal-grid">
+              <div class="signal">Audio<strong id="stageAudio">0.91</strong></div>
+              <div class="signal">Reaction<strong id="stageReaction">0.82</strong></div>
+              <div class="signal">Decision<strong id="stageDecision">0.88</strong></div>
+            </div>
+          </div>
+        </div>
+
+        <div class="transport">
+          <button class="btn" id="playBtn" type="button">Play</button>
+          <button class="btn" id="pauseBtn" type="button">Pause</button>
+          <div class="timeline" id="timeline" aria-label="Interactive draggable timeline">
+            <div class="timeline-fill" id="timelineFill"></div>
+            <div class="scrubber" id="scrubber" title="Drag to scrub"></div>
+          </div>
+          <button class="btn" id="prevBtn" type="button">Previous</button>
+          <button class="btn" id="nextBtn" type="button">Next Event</button>
+        </div>
+      </section>
+
+      <aside class="panel glass">
+        <div class="panel-header">
+          <h2>Review SRT Suggestions</h2>
+          <span class="tag review" id="reviewBadge">2 need review</span>
+        </div>
+        <div class="review-list" id="reviewList"></div>
+      </aside>
+    </section>
+
+    <section class="table-panel glass">
+      <table>
+        <thead>
+          <tr>
+            <th>Status</th>
+            <th>Start</th>
+            <th>End</th>
+            <th>Event</th>
+            <th>Audio</th>
+            <th>Reaction</th>
+            <th>Decision</th>
+            <th>Reason</th>
+          </tr>
+        </thead>
+        <tbody id="eventTable"></tbody>
+      </table>
+    </section>
+  </main>
+
+  <div class="toast" id="toast" role="status" aria-live="polite"></div>
+
+  <div class="modal" id="doctorModal" aria-hidden="true">
+    <div class="modal-card glass">
+      <h2>Environment Doctor</h2>
+      <p class="muted">Mock diagnostic report for the proposed CLI/Web UI experience.</p>
+      <pre>ffmpeg: found
+audio backend: YAMNet ready
+vision backend: MediaPipe ready
+torch.cuda.is_available(): False
+selected device: auto
+actual device: cpu
+suggestion: use --device cpu or install CUDA-compatible runtime</pre>
+      <div class="actions">
+        <button class="btn primary" id="closeDoctorBtn" type="button">Close</button>
+        <button class="btn" id="doctorCopyBtn" type="button">Copy Report</button>
+      </div>
+    </div>
+  </div>
+
+  <script>
+    const totalDuration = 402;
+    let currentTime = 138;
+    let playing = false;
+    let selectedIndex = 0;
+    let timer = null;
+
+    const labels = {
+      en: {
+        children_cheer: "[children cheering]",
+        school_bell: "[school bell rings]",
+        applause: "[students applauding]",
+        chair_scrape: "[chair scrapes]",
+        background_chatter: "[background chatter]"
+      },
+      hi: {
+        children_cheer: "[बच्चे उत्साह से चिल्लाते हैं]",
+        school_bell: "[स्कूल की घंटी बजती है]",
+        applause: "[छात्र तालियां बजाते हैं]",
+        chair_scrape: "[कुर्सी घिसटती है]",
+        background_chatter: "[पृष्ठभूमि में बातचीत]"
+      },
+      ta: {
+        children_cheer: "[குழந்தைகள் ஆரவாரம் செய்கின்றனர்]",
+        school_bell: "[பள்ளி மணி ஒலிக்கிறது]",
+        applause: "[மாணவர்கள் கைத்தட்டுகின்றனர்]",
+        chair_scrape: "[நாற்காலி இழுக்கும் சத்தம்]",
+        background_chatter: "[பின்னணி பேச்சு]"
+      },
+      te: {
+        children_cheer: "[పిల్లలు ఆనందంగా కేకలు వేస్తున్నారు]",
+        school_bell: "[పాఠశాల గంట మోగుతుంది]",
+        applause: "[విద్యార్థులు చప్పట్లు కొడుతున్నారు]",
+        chair_scrape: "[కుర్చీ లాగిన శబ్దం]",
+        background_chatter: "[నేపథ్యంలో మాటలు]"
+      },
+      bn: {
+        children_cheer: "[শিশুরা উল্লাস করছে]",
+        school_bell: "[স্কুলের ঘণ্টা বাজছে]",
+        applause: "[ছাত্ররা হাততালি দিচ্ছে]",
+        chair_scrape: "[চেয়ার ঘষার শব্দ]",
+        background_chatter: "[পেছনে কথাবার্তা]"
+      },
+      mr: {
+        children_cheer: "[मुले आनंदाने ओरडत आहेत]",
+        school_bell: "[शाळेची घंटा वाजते]",
+        applause: "[विद्यार्थी टाळ्या वाजवत आहेत]",
+        chair_scrape: "[खुर्ची ओढल्याचा आवाज]",
+        background_chatter: "[पार्श्वभूमीत गप्पा]"
+      },
+      ml: {
+        children_cheer: "[കുട്ടികൾ ആഹ്ലാദിക്കുന്നു]",
+        school_bell: "[സ്കൂൾ മണി മുഴങ്ങുന്നു]",
+        applause: "[വിദ്യാർത്ഥികൾ കൈയടിക്കുന്നു]",
+        chair_scrape: "[കസേര വലിക്കുന്ന ശബ്ദം]",
+        background_chatter: "[പശ്ചാത്തല സംഭാഷണം]"
+      }
+    };
+
+    const events = [
+      {
+        id: "children_cheer",
+        start: 138.1,
+        end: 140.2,
+        status: "accepted",
+        audio: 0.91,
+        reaction: 0.82,
+        decision: 0.88,
+        reason: "Accepted because students raise their hands and cheer in the classroom scene.",
+        caption: "[children cheering]"
+      },
+      {
+        id: "school_bell",
+        start: 174.4,
+        end: 176.0,
+        status: "accepted",
+        audio: 0.86,
+        reaction: 0.61,
+        decision: 0.78,
+        reason: "Accepted because the bell interrupts the class and students visibly shift attention.",
+        caption: "[school bell rings]"
+      },
+      {
+        id: "applause",
+        start: 218.5,
+        end: 221.8,
+        status: "review",
+        audio: 0.74,
+        reaction: 0.54,
+        decision: 0.66,
+        reason: "Needs review because clapping is detected, but the reaction score is moderate.",
+        caption: "[students applauding]"
+      },
+      {
+        id: "chair_scrape",
+        start: 287.2,
+        end: 288.3,
+        status: "review",
+        audio: 0.58,
+        reaction: 0.39,
+        decision: 0.49,
+        reason: "Borderline event; the sound may be a chair scrape but scene impact is unclear.",
+        caption: "[chair scrapes]"
+      },
+      {
+        id: "background_chatter",
+        start: 326.0,
+        end: 333.5,
+        status: "rejected",
+        audio: 0.52,
+        reaction: 0.16,
+        decision: 0.29,
+        reason: "Rejected because the chatter is ambient and does not change the scene.",
+        caption: "[background chatter]"
+      }
+    ];
+
+    const els = {
+      reviewList: document.getElementById("reviewList"),
+      eventTable: document.getElementById("eventTable"),
+      timeline: document.getElementById("timeline"),
+      timelineFill: document.getElementById("timelineFill"),
+      scrubber: document.getElementById("scrubber"),
+      timecode: document.getElementById("timecode"),
+      captionOverlay: document.getElementById("captionOverlay"),
+      currentEventChip: document.getElementById("currentEventChip"),
+      stageAudio: document.getElementById("stageAudio"),
+      stageReaction: document.getElementById("stageReaction"),
+      stageDecision: document.getElementById("stageDecision"),
+      playState: document.getElementById("playState"),
+      toast: document.getElementById("toast"),
+      gpuAlert: document.getElementById("gpuAlert"),
+      device: document.getElementById("device"),
+      language: document.getElementById("language"),
+      reviewBadge: document.getElementById("reviewBadge"),
+      acceptedCount: document.getElementById("acceptedCount"),
+      runStatus: document.getElementById("runStatus"),
+      analysisProgress: document.getElementById("analysisProgress"),
+      doctorModal: document.getElementById("doctorModal")
+    };
+
+    function pad(value) {
+      return String(value).padStart(2, "0");
+    }
+
+    function formatTime(seconds, ms = false) {
+      const totalMs = Math.round(Math.max(0, seconds) * 1000); // whole ms first, so 138.1 does not render as ",099"
+      const h = Math.floor(totalMs / 3600000);
+      const m = Math.floor((totalMs % 3600000) / 60000);
+      const s = Math.floor((totalMs % 60000) / 1000);
+      const milli = totalMs % 1000;
+      if (ms) {
+        return `${pad(h)}:${pad(m)}:${pad(s)},${String(milli).padStart(3, "0")}`;
+      }
+      return `${pad(h)}:${pad(m)}:${pad(s)}`;
+    }
+
+    function showToast(message) {
+      els.toast.textContent = message;
+      els.toast.classList.add("visible");
+      window.clearTimeout(showToast.timeout);
+      showToast.timeout = window.setTimeout(() => els.toast.classList.remove("visible"), 2200);
+    }
+
+    function selectedLanguageLabels() {
+      return labels[els.language.value] || labels.en;
+    }
+
+    function captionFor(event) {
+      return selectedLanguageLabels()[event.id] || event.caption;
+    }
+
+    function statusClass(status) {
+      if (status === "accepted") return "accepted";
+      if (status === "review") return "review";
+      return "rejected";
+    }
+
+    function statusLabel(status) {
+      if (status === "accepted") return "Accepted";
+      if (status === "review") return "Review";
+      return "Rejected";
+    }
+
+    function renderMarkers() {
+      document.querySelectorAll(".event-marker").forEach((marker) => marker.remove());
+      events.forEach((event, index) => {
+        const marker = document.createElement("button");
+        marker.type = "button";
+        marker.className = `event-marker ${statusClass(event.status)}${index === selectedIndex ? " active" : ""}`;
+        marker.style.left = `calc(${(event.start / totalDuration) * 100}% - 5px)`;
+        marker.title = `${event.id}: ${formatTime(event.start, true)}`;
+        marker.setAttribute("aria-label", `Jump to ${event.id}`);
+        marker.addEventListener("click", (e) => {
+          e.stopPropagation();
+          selectEvent(index, true);
+        });
+        els.timeline.appendChild(marker);
+      });
+    }
+
+    function renderReviews() {
+      els.reviewList.innerHTML = "";
+      events.forEach((event, index) => {
+        const card = document.createElement("article");
+        card.className = `suggestion${index === selectedIndex ? " active" : ""}`;
+        card.innerHTML = `
+          <div class="suggestion-title">
+            <strong>${formatTime(event.start, true)} - ${formatTime(event.end, true)}</strong>
+            <span class="tag ${statusClass(event.status)}">${statusLabel(event.status)}</span>
+          </div>
+          <input class="caption-input" type="text" aria-label="Caption text for ${event.id}">
+          <div class="score-grid">
+            <div class="score">Audio<strong>${event.audio.toFixed(2)}</strong></div>
+            <div class="score">Reaction<strong>${event.reaction.toFixed(2)}</strong></div>
+            <div class="score">Decision<strong>${event.decision.toFixed(2)}</strong></div>
+          </div>
+          <p class="muted tiny">${event.reason}</p>
+          <div class="actions">
+            <button class="btn success" data-action="accept" type="button">Accept</button>
+            <button class="btn danger" data-action="reject" type="button">Reject</button>
+            <button class="btn" data-action="jump" type="button">Jump</button>
+          </div>
+        `;
+        const input = card.querySelector("input");
+        card.addEventListener("click", (e) => { if (e.target !== input) selectEvent(index, false); }); // a re-render would destroy the field being clicked
+        input.value = captionFor(event); // set the property, not the attribute, so quotes in a caption cannot break the markup
+        input.addEventListener("input", (e) => {
+          event.caption = selectedLanguageLabels()[event.id] = e.target.value; // keep the edit across re-renders
+          if (index === selectedIndex) els.captionOverlay.textContent = e.target.value;
+        });
+        card.querySelector('[data-action="accept"]').addEventListener("click", (e) => {
+          e.stopPropagation();
+          updateStatus(index, "accepted");
+        });
+        card.querySelector('[data-action="reject"]').addEventListener("click", (e) => {
+          e.stopPropagation();
+          updateStatus(index, "rejected");
+        });
+        card.querySelector('[data-action="jump"]').addEventListener("click", (e) => {
+          e.stopPropagation();
+          selectEvent(index, true);
+        });
+        els.reviewList.appendChild(card);
+      });
+    }
+
+    function renderTable() {
+      els.eventTable.innerHTML = "";
+      events.forEach((event, index) => {
+        const row = document.createElement("tr");
+        row.className = index === selectedIndex ? "active-row" : "";
+        row.innerHTML = `
+          <td><span class="tag ${statusClass(event.status)}">${statusLabel(event.status)}</span></td>
+          <td>${formatTime(event.start, true)}</td>
+          <td>${formatTime(event.end, true)}</td>
+          <td>${event.id}</td>
+          <td>${event.audio.toFixed(2)}</td>
+          <td>${event.reaction.toFixed(2)}</td>
+          <td>${event.decision.toFixed(2)}</td>
+          <td>${event.reason}</td>
+        `;
+        row.addEventListener("click", () => selectEvent(index, true));
+        els.eventTable.appendChild(row);
+      });
+    }
+
+    function updateCounts() {
+      const accepted = events.filter((event) => event.status === "accepted").length;
+      const review = events.filter((event) => event.status === "review").length;
+      els.acceptedCount.textContent = String(accepted);
+      els.reviewBadge.textContent = `${review} need review`;
+    }
+
+    function updateStage() {
+      const event = events[selectedIndex];
+      els.captionOverlay.textContent = captionFor(event);
+      els.currentEventChip.textContent = `Current event: ${event.id}`;
+      els.stageAudio.textContent = event.audio.toFixed(2);
+      els.stageReaction.textContent = event.reaction.toFixed(2);
+      els.stageDecision.textContent = event.decision.toFixed(2);
+      els.timecode.textContent = `${formatTime(currentTime)} / ${formatTime(totalDuration)}`;
+    }
+
+    function updateTimeline() {
+      const percent = Math.min(100, Math.max(0, (currentTime / totalDuration) * 100));
+      els.timelineFill.style.width = `${percent}%`;
+      els.scrubber.style.left = `calc(${percent}% - 7px)`;
+      els.timecode.textContent = `${formatTime(currentTime)} / ${formatTime(totalDuration)}`;
+    }
+
+    function selectEvent(index, jump) {
+      selectedIndex = index;
+      if (jump) {
+        currentTime = events[index].start;
+      }
+      updateStage();
+      updateTimeline();
+      renderMarkers();
+      renderReviews();
+      renderTable();
+    }
+
+    function updateStatus(index, status) {
+      events[index].status = status;
+      selectEvent(index, false);
+      updateCounts();
+      showToast(`${events[index].id} marked as ${status}.`);
+    }
+
+    function setPlaying(next) {
+      playing = next;
+      els.playState.textContent = playing ? "Playing" : "Paused";
+      window.clearInterval(timer);
+      if (playing) {
+        timer = window.setInterval(() => {
+          currentTime += 1;
+          if (currentTime >= totalDuration) {
+            currentTime = totalDuration;
+            setPlaying(false);
+          }
+          const activeIndex = events.findIndex((event) => currentTime >= event.start && currentTime <= event.end + 1);
+          if (activeIndex >= 0 && activeIndex !== selectedIndex) {
+            selectedIndex = activeIndex;
+            updateStage();
+            renderMarkers();
+            renderReviews();
+            renderTable();
+          }
+          updateTimeline();
+        }, 800);
+      }
+    }
+
+    function scrubToClientX(clientX) {
+      const box = els.timeline.getBoundingClientRect();
+      const ratio = Math.min(1, Math.max(0, (clientX - box.left) / box.width));
+      currentTime = ratio * totalDuration;
+      updateTimeline();
+    }
+
+    function initTimelineDrag() {
+      let dragging = false;
+      els.timeline.addEventListener("pointerdown", (event) => {
+        dragging = true;
+        els.timeline.setPointerCapture(event.pointerId);
+        scrubToClientX(event.clientX);
+      });
+      els.timeline.addEventListener("pointermove", (event) => {
+        if (dragging) scrubToClientX(event.clientX);
+      });
+      els.timeline.addEventListener("pointerup", (event) => {
+        dragging = false;
+        els.timeline.releasePointerCapture(event.pointerId);
+        const nearest = events.reduce((bestIndex, item, index) => {
+          const bestDistance = Math.abs(events[bestIndex].start - currentTime);
+          const nextDistance = Math.abs(item.start - currentTime);
+          return nextDistance < bestDistance ? index : bestIndex;
+        }, 0);
+        if (Math.abs(events[nearest].start - currentTime) < 8) {
+          selectEvent(nearest, true);
+        }
+      });
+    }
+
+    document.getElementById("playBtn").addEventListener("click", () => setPlaying(true));
+    document.getElementById("pauseBtn").addEventListener("click", () => setPlaying(false));
+    document.getElementById("nextBtn").addEventListener("click", () => selectEvent((selectedIndex + 1) % events.length, true));
+    document.getElementById("prevBtn").addEventListener("click", () => selectEvent((selectedIndex + events.length - 1) % events.length, true));
+
+    document.getElementById("startBtn").addEventListener("click", () => {
+      els.runStatus.textContent = "Analyzing";
+      els.runStatus.className = "tag live";
+      els.analysisProgress.style.width = "18%";
+      showToast("Running audio detection and visual reaction scoring...");
+      window.setTimeout(() => {
+        els.analysisProgress.style.width = "100%";
+        els.runStatus.textContent = "Ready";
+        els.runStatus.className = "tag ready";
+        showToast("Caption suggestions are ready for review.");
+      }, 900);
+    });
+
+    els.device.addEventListener("change", () => {
+      if (els.device.value === "cuda") {
+        els.gpuAlert.classList.add("visible");
+        showToast("GPU mode requested. Mock diagnostic: CUDA unavailable.");
+      } else {
+        els.gpuAlert.classList.remove("visible");
+        showToast(`Device mode set to ${els.device.value}.`);
+      }
+    });
+
+    document.getElementById("retryCpuBtn").addEventListener("click", () => {
+      els.device.value = "cpu";
+      els.gpuAlert.classList.remove("visible");
+      showToast("Retrying with CPU mode.");
+    });
+
+    document.getElementById("copyReportBtn").addEventListener("click", () => showToast("Diagnostic report copied."));
+    document.getElementById("doctorCopyBtn").addEventListener("click", () => showToast("Doctor report copied."));
+
+    document.getElementById("doctorBtn").addEventListener("click", () => {
+      els.doctorModal.classList.add("visible");
+      els.doctorModal.setAttribute("aria-hidden", "false");
+    });
+
+    document.getElementById("closeDoctorBtn").addEventListener("click", () => {
+      els.doctorModal.classList.remove("visible");
+      els.doctorModal.setAttribute("aria-hidden", "true");
+    });
+
+    els.doctorModal.addEventListener("click", (event) => {
+      if (event.target === els.doctorModal) {
+        els.doctorModal.classList.remove("visible");
+        els.doctorModal.setAttribute("aria-hidden", "true");
+      }
+    });
+
+    els.language.addEventListener("change", () => {
+      renderReviews();
+      updateStage();
+      showToast(`Caption label language changed to ${els.language.options[els.language.selectedIndex].text}.`);
+    });
+
+    document.querySelectorAll("[data-export]").forEach((button) => {
+      button.addEventListener("click", () => {
+        const type = button.getAttribute("data-export");
+        showToast(`${type} export prepared from accepted editor suggestions.`);
+      });
+    });
+
+    // Theme toggle
+    const themeBtn = document.getElementById("themeBtn");
+    function applyTheme(dark) {
+      document.documentElement.setAttribute("data-theme", dark ? "" : "light");
+      themeBtn.textContent = dark ? "☀" : "🌙";
+      themeBtn.title = dark ? "Switch to light mode" : "Switch to dark mode";
+    }
+    themeBtn.addEventListener("click", () => {
+      const isDark = document.documentElement.getAttribute("data-theme") !== "light";
+      applyTheme(!isDark);
+    });
+    // default: dark
+    applyTheme(true);
+
+    initTimelineDrag();
+    renderMarkers();
+    renderReviews();
+    renderTable();
+    updateCounts();
+    selectEvent(0, true);
+  </script>
+</body>
+</html>