🎬 Cinematic Intelligence System

AI Lab Spring 2026 — OEL Lab Project

Overview

This project was developed as part of the OEL AI Lab (Spring 2026). It implements a complete multimodal AI pipeline for video understanding and cinematic trailer generation.

The system performs:

Character detection using YOLO11
ROI-based visual transformation (darkening effect)
High-impact scene analysis on pre-segmented clips
Machine learning-based clip classification
Automated trailer generation
Creepy visual transformation effects
NLP-based cinematic caption generation with TTS audio
Atmospheric background music generation
Final trailer export with mixed audio

The project is implemented using two separate Google Colab notebooks for modular execution.

Academic Integrity Notice

This project is intended for academic learning and evaluation purposes.

The code and methodology may be referenced for learning.
Copying, modifying, or using this project in assignments or submissions requires prior permission from the author.
Unauthorized use without consent is not permitted.

Tech Stack

Python
YOLO11 (Ultralytics)
OpenCV
Scikit-learn
MoviePy / FFmpeg
HuggingFace Transformers (BLIP)
Coqui TTS (tacotron2-DDC)
Pillow
Scipy (WAV audio I/O, pitch shifting, reverb processing)
Matplotlib / NumPy / Pandas

Project Structure

cinematic-intelligence-system/
│
├── task1_character_detection_videos/
├── task2_trailer_input_videos/
├── 01_yolo_detection.ipynb
├── 02_task2_trailer_generation.ipynb
├── LICENSE.txt
├── OEL LAB (AI).pdf
└── README.md

Dataset Description

Task 1

Input: Full-length video files (horizon.mp4, uncharted.mp4, need_for_speed.mp4, lotr.mp4)
Purpose: Character detection and ROI-based darkening transformation

Task 2

Input: Pre-segmented video clips (numbered .mp4 files)
Purpose: Feature extraction, classification, trailer generation, and cinematic post-processing

How to Run (Google Colab)

This project is designed to run entirely in Google Colab.

Open Notebooks

Open:

01_yolo_detection.ipynb
02_task2_trailer_generation.ipynb

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

Install Dependencies

!apt-get install -y ffmpeg fonts-dejavu
!pip install ultralytics opencv-python moviepy transformers scikit-learn matplotlib pandas numpy scipy Pillow TTS

TASK 1 — Character Detection & ROI Processing

Objective

Detect characters in video frames, apply bounding boxes, extract regions of interest (ROIs), apply a darkening transformation, and reconstruct the output video.

Workflow

1. YOLO11 Object Detection

Load yolo11n.pt and run inference on each frame
Detect all objects with confidence ≥ 0.5

2. Bounding Box Visualization

Draw green bounding boxes around all detected objects

3. ROI Extraction & Transformation

Crop the detected region (ROI) from each frame
Apply a darkening effect (pixel values scaled to 50%)
Reinsert the modified ROI back into the frame
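
The ROI steps above can be sketched with plain NumPy slicing. This is a minimal illustration, not the notebook's code: the function name `darken_rois`, the `(x1, y1, x2, y2)` pixel box format, and the clamping of boxes to the frame are assumptions made for the example.

```python
import numpy as np

def darken_rois(frame, boxes, factor=0.5):
    """Return a copy of `frame` with every box region scaled by `factor`.

    `boxes` holds (x1, y1, x2, y2) pixel coordinates (an assumed format).
    """
    out = frame.copy()
    h, w = out.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the frame so slicing never wraps around.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue  # empty or inverted box, nothing to crop
        # Crop the ROI, scale pixel values, and reinsert it.
        roi = out[y1:y2, x1:x2].astype(np.float32)
        out[y1:y2, x1:x2] = np.clip(roi * factor, 0, 255).astype(np.uint8)
    return out

# Tiny synthetic frame: uniform value 200, one box covering columns 1-3, rows 1-2.
frame = np.full((4, 6, 3), 200, dtype=np.uint8)
darkened = darken_rois(frame, [(1, 1, 4, 3)])
```

Note that overlapping boxes would be darkened more than once in this sketch, because each box is processed independently.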

4. Output Video Generation

Save the processed video using OpenCV's VideoWriter

Processed videos:

output_detection.mp4 (horizon)
uncharted_output_detection.mp4
needForSpeed_output_detection.mp4
lotr_output_detection.mp4
horizon(1)_output_detection.mp4

TASK 2 — Intelligent Trailer Generation System

Overview

Task 2 works on pre-segmented clips, where each clip is independently analyzed and ranked for impact. No scene detection or video splitting is performed.

1. Feature Extraction

Each clip is sampled every 5 frames and converted into a feature vector:

Motion score (mean absolute frame difference)
Brightness mean and variance
YOLO11 object count per frame
Scene cut rate (large brightness jumps between frames)
Audio energy and MFCC mean (zero-filled placeholders; audio features are not computed)

Features are saved to features.csv.
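
A rough sketch of the visual features listed above, assuming the frames have already been decoded to grayscale arrays. The function name `clip_features`, the cut threshold of 30 brightness levels, and the choice to take brightness variance over per-frame means are illustrative assumptions; the YOLO11 object count is left out because it needs the detector.

```python
import numpy as np

def clip_features(gray_frames, step=5, cut_threshold=30.0):
    """Compute simple visual features from a list of grayscale frames."""
    # Keep every `step`-th frame, as in the sampling described above.
    sampled = [np.asarray(f, dtype=np.float32) for f in gray_frames[::step]]
    if not sampled:
        raise ValueError("clip has no frames")
    means = np.array([f.mean() for f in sampled])
    if len(sampled) > 1:
        # Motion: mean absolute difference between consecutive sampled frames.
        diffs = [np.abs(b - a).mean() for a, b in zip(sampled, sampled[1:])]
        motion = float(np.mean(diffs))
        # Cut rate: share of frame pairs with a large jump in mean brightness.
        cut_rate = float(np.mean(np.abs(np.diff(means)) > cut_threshold))
    else:
        motion, cut_rate = 0.0, 0.0
    return {
        "motion": motion,
        "brightness_mean": float(means.mean()),
        "brightness_var": float(means.var()),
        "cut_rate": cut_rate,
        "audio_energy": 0.0,  # zero-filled placeholder
        "mfcc_mean": 0.0,     # zero-filled placeholder
    }

# 11 synthetic frames; only indices 0, 5 and 10 are sampled.
values = [10, 0, 0, 0, 0, 20, 0, 0, 0, 0, 100]
frames = [np.full((2, 2), v, dtype=np.uint8) for v in values]
features = clip_features(frames)
```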

2. Classification Model

Labels are generated automatically using a weighted composite impact score:

Motion (40%), Objects (35%), Scene cuts (15%), Brightness variance (10%)
Median split → +1 (high-impact) / -1 (low-impact)
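
The labelling rule can be sketched as follows. The weights come from the description above; the min-max normalisation across clips, the feature key names, and the tie rule (a score equal to the median is labelled -1) are assumptions made for this example.

```python
from statistics import median

# Weights as described above; the dictionary keys are hypothetical column names.
WEIGHTS = {"motion": 0.40, "objects": 0.35, "cut_rate": 0.15, "brightness_var": 0.10}

def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def impact_labels(clips):
    """Return (scores, labels) with labels in {+1, -1} from a median split."""
    normalised = {k: min_max([c[k] for c in clips]) for k in WEIGHTS}
    scores = [
        sum(WEIGHTS[k] * normalised[k][i] for k in WEIGHTS)
        for i in range(len(clips))
    ]
    cutoff = median(scores)
    labels = [1 if s > cutoff else -1 for s in scores]
    return scores, labels

clips = [
    {"motion": 0, "objects": 0, "cut_rate": 0, "brightness_var": 0},
    {"motion": 10, "objects": 4, "cut_rate": 1, "brightness_var": 100},
    {"motion": 5, "objects": 1, "cut_rate": 0, "brightness_var": 50},
    {"motion": 8, "objects": 3, "cut_rate": 1, "brightness_var": 20},
]
scores, labels = impact_labels(clips)
```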

Model used: Logistic Regression (L2-regularised, with StandardScaler)

Stratified K-Fold cross-validation (up to 5 folds)
Saved as impact_model.pkl and scaler.pkl
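
One way to read "up to 5 folds" is to cap the number of folds at the size of the smaller class, since stratified splitting needs at least one sample of each class per fold. The sketch below makes that assumption. It also bundles the scaler and classifier into a single scikit-learn pipeline, which differs from the separate `impact_model.pkl` and `scaler.pkl` files mentioned above; the function name and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_impact_model(X, y, max_folds=5):
    """Cross-validate and fit a scaled logistic regression on +1 / -1 labels."""
    y = np.asarray(y)
    smallest_class = min(int(np.sum(y == 1)), int(np.sum(y == -1)))
    n_splits = min(max_folds, smallest_class)
    if n_splits < 2:
        raise ValueError("need at least 2 samples in each class")
    # LogisticRegression applies L2 regularisation by default.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_scores = cross_val_score(model, X, y, cv=cv)
    model.fit(X, y)  # final fit on all clips
    return model, fold_scores

# Synthetic stand-in for features.csv: 20 clips, 4 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = np.where(X[:, 0] > np.median(X[:, 0]), 1, -1)
model, fold_scores = evaluate_impact_model(X, y)
```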

3. Trailer Generation (5 Clips)

Clips ranked by composite score (motion + objects)
Top 5 clips selected using a greedy diversity constraint (at least 2 index positions apart)
Clips ordered low → high impact for narrative suspense build
Merged using MoviePy → saved as trailer.mp4
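
The selection and ordering steps above can be sketched as a small greedy loop. The function name `select_diverse` and the example scores are hypothetical; "at least 2 index positions apart" is interpreted here as an absolute index difference of 2 or more.

```python
def select_diverse(scores, k=5, min_gap=2):
    """Pick up to k clip indices by descending score, keeping picks spaced apart."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Accept a clip only if it is far enough from every clip already chosen.
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Order low -> high impact for the suspense build.
    return sorted(chosen, key=lambda i: scores[i])

scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6, 0.5, 0.3, 0.4, 0.95]
order = select_diverse(scores)
```

In this example clips 1, 6 and 8 are skipped despite their scores because each sits next to an already selected clip, so the final order is `[7, 5, 3, 0, 9]`.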

4. Creepy Visual Transformation

Applied per-frame to trailer.mp4 → output: creepy_trailer.mp4

Effects include:

Face darkening (upper 40% of person bounding box shadowed)
Red glowing eyes (overlaid at estimated eye positions)
Random blood drip lines (30% probability per person detection)
Background desaturation (everything outside person boxes goes grayscale)
Cinematic vignette (edge darkening)
Brightness flicker (random ±15–20% per frame)
Glitch shift (6% probability, horizontal pixel band displacement)
Fog overlay (15% probability per frame)
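
Three of the frame-wide effects (vignette, brightness flicker, glitch shift) can be approximated with NumPy alone. This is a sketch under assumptions: the function names, the vignette strength of 0.6, the ±15% flicker range, and the glitch band size are illustrative choices, and the person-specific effects are omitted because they depend on YOLO detections.

```python
import numpy as np

def vignette_mask(h, w, strength=0.6):
    """Per-pixel brightness multiplier: about 1.0 at the centre, darker at the corners."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.sqrt(((ys - cy) / max(cy, 1.0)) ** 2 + ((xs - cx) / max(cx, 1.0)) ** 2)
    dist = dist / max(float(dist.max()), 1e-6)  # normalise so corners reach 1.0
    return 1.0 - strength * dist ** 2

def apply_frame_effects(frame, rng, glitch_prob=0.06):
    """Apply vignette, flicker and an occasional glitch shift to an HxWx3 uint8 frame."""
    h, w = frame.shape[:2]
    out = frame.astype(np.float32)
    out *= vignette_mask(h, w)[..., None]  # edge darkening
    out *= rng.uniform(0.85, 1.15)         # random brightness flicker
    out = np.clip(out, 0, 255).astype(np.uint8)
    if rng.random() < glitch_prob:
        # Displace a horizontal band of rows sideways.
        band = max(1, h // 10)
        y0 = int(rng.integers(0, h - band + 1))
        shift = int(rng.integers(1, max(2, w // 8)))
        out[y0:y0 + band] = np.roll(out[y0:y0 + band], shift, axis=1)
    return out

rng = np.random.default_rng(0)
frame = np.full((20, 30, 3), 200, dtype=np.uint8)
styled = apply_frame_effects(frame, rng)
```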

5. NLP Caption Generation

Key frames are sampled from each selected clip and captioned using BLIP (Salesforce/blip-image-captioning-base).

Captions are transformed into cinematic horror-style text via:

Word-level substitution dictionary (e.g. "man" → "figure", "walks" → "lurks")
Random creepy prefix (e.g. "They were warned...")
Random creepy suffix (e.g. "Something watches.")

Captions are rendered as text overlays onto creepy_trailer.mp4 → captioned_trailer.mp4.
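
The caption transformation can be sketched as a dictionary lookup plus a random prefix and suffix. The entries below are only the examples quoted above; the real dictionary and phrase lists are presumably longer. The function name `horrorify` and the way the sentence is assembled are assumptions.

```python
import random

# Only the examples given above; the full lists are not shown in this README.
SUBSTITUTIONS = {"man": "figure", "walks": "lurks"}
PREFIXES = ["They were warned..."]
SUFFIXES = ["Something watches."]

def horrorify(caption, rng=random):
    """Turn a plain BLIP-style caption into a horror-flavoured line."""
    # Word-level substitution; words without an entry pass through unchanged.
    words = [SUBSTITUTIONS.get(w.lower(), w) for w in caption.split()]
    body = " ".join(words).rstrip(".")
    if body:
        body = body[0].upper() + body[1:] + "."
    parts = [rng.choice(PREFIXES), body, rng.choice(SUFFIXES)]
    return " ".join(p for p in parts if p)

line = horrorify("a man walks down a street")
```

With the single-entry lists above, `line` is `They were warned... A figure lurks down a street. Something watches.`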

6. TTS Audio & Background Music

Each caption is converted to speech, then processed with:

Pitch-shifting (lower, creepier tone)
Reverb effect
Whisper-style amplitude envelope

Atmospheric background music is generated programmatically:

Dark sine-wave drone (40Hz, 60Hz, 90Hz, 135Hz layers)
Heartbeat pulse at ~62 BPM

Voice clips and drone are mixed and saved as final_audio.wav.
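
A minimal sketch of the programmatic music bed, assuming a 22050 Hz sample rate, equal-weight sine layers, and a heartbeat modelled as a short decaying 50 Hz thump. None of these details are stated above, so treat the constants and function names as placeholders.

```python
import numpy as np

SAMPLE_RATE = 22050  # assumed sample rate

def drone(duration_s, freqs=(40.0, 60.0, 90.0, 135.0), sr=SAMPLE_RATE):
    """Sum of sine layers at the given frequencies, scaled to stay within [-1, 1]."""
    t = np.arange(int(duration_s * sr)) / sr
    layers = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return layers / len(freqs)

def heartbeat(duration_s, bpm=62.0, sr=SAMPLE_RATE):
    """Decaying low thump repeated once per beat."""
    t = np.arange(int(duration_s * sr)) / sr
    phase = t % (60.0 / bpm)  # time since the start of the current beat
    return np.sin(2 * np.pi * 50.0 * phase) * np.exp(-phase * 25.0)

def mix(*tracks, peak=0.9):
    """Sum tracks over their common length and normalise to the given peak."""
    n = min(len(tr) for tr in tracks)
    total = sum(tr[:n] for tr in tracks)
    loudest = float(np.max(np.abs(total)))
    return total if loudest == 0.0 else total * (peak / loudest)

bed = mix(drone(4.0), 0.8 * heartbeat(4.0))
pcm = (bed * 32767).astype(np.int16)  # 16-bit PCM samples
```

The `pcm` array could then be written with `scipy.io.wavfile.write("drone_bed.wav", SAMPLE_RATE, pcm)`; processed voice clips would be added as further tracks in `mix`.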

7. Final Export

Video and audio are merged using FFmpeg:

captioned_trailer.mp4 + final_audio.wav → FINAL_TRAILER_PRODUCTION.mp4
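
The merge can be driven from Python with `subprocess`. The exact FFmpeg options used in the notebook are not shown here, so the flags below are one plausible choice: copy the video stream, encode the audio as AAC, and stop at the shorter input. The helper names are hypothetical.

```python
import subprocess

def build_merge_command(video, audio, output):
    """Build an ffmpeg command that muxes `audio` onto `video`."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",     # video stream from the first input
        "-map", "1:a:0",     # audio stream from the second input
        "-c:v", "copy",      # keep the video stream as-is
        "-c:a", "aac",       # encode the WAV audio as AAC
        "-shortest",         # stop when the shorter stream ends
        output,
    ]

def merge(video, audio, output):
    """Run the merge; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_merge_command(video, audio, output), check=True)

cmd = build_merge_command(
    "captioned_trailer.mp4", "final_audio.wav", "FINAL_TRAILER_PRODUCTION.mp4"
)
```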

Outputs

Generated outputs include:

FINAL_TRAILER_PRODUCTION.mp4 — final trailer with captions and audio
creepy_trailer.mp4 — visually transformed trailer
trailer.mp4 — raw assembled trailer (5 clips)
features.csv — extracted clip features and labels
creepy_music.wav — intensity-driven atmospheric music (per-clip)
drone_bed.wav — background drone layer mixed under voice
impact_model.pkl / scaler.pkl — trained classification model
YOLO detection output videos (Task 1)

Outputs Location (Google Drive)

All outputs are stored externally due to large file sizes:

🔗 Google Drive Link: https://drive.google.com/drive/folders/1LcevU8MvltBCN0XaVQAyoz4Z5IRDgusD?usp=drive_link

Pipeline Overview

Input Videos / Clips
        ↓
Task 1: YOLO Detection + ROI Darkening
        ↓
Output Videos (per input file)
        ↓
Task 2: Feature Extraction (visual + object count)
        ↓
Logistic Regression Classification (+1 / -1)
        ↓
Top 5 High-Impact Clip Selection (diversity-aware)
        ↓
Trailer Assembly (narrative order: low → high impact)
        ↓
Creepy Visual Transformations (YOLO-guided, per-frame)
        ↓
BLIP Caption Generation + Horror NLP Transform
        ↓
Caption Overlay onto Video
        ↓
TTS Voice Generation + Atmospheric Music
        ↓
FFmpeg Audio/Video Merge
        ↓
FINAL_TRAILER_PRODUCTION.mp4

Authors

Romaisa | Maham Anjum | Malaika

AI Lab - Spring 2026 | BS Artificial Intelligence

Final Note

This system demonstrates a full pipeline combining computer vision, machine learning, NLP, and audio synthesis for intelligent cinematic trailer generation.
