🎬 Cinematic Intelligence System

AI Lab Spring 2026 — OEL Lab Project


Overview

This project was developed as part of the OEL AI Lab (Spring 2026). It implements a complete multimodal AI pipeline for video understanding and cinematic trailer generation.

The system performs:

  • Character detection using YOLO11
  • ROI-based visual transformation (darkening effect)
  • High-impact scene analysis on pre-segmented clips
  • Machine learning-based clip classification
  • Automated trailer generation
  • Creepy visual transformation effects
  • NLP-based cinematic caption generation with TTS audio
  • Atmospheric background music generation
  • Final trailer export with mixed audio

The project is split across two Google Colab notebooks, one per task, so that each task can be run independently.


Academic Integrity Notice

This project is intended for academic learning and evaluation purposes.

  • The code and methodology may be referenced for learning.
  • Copying, modifying, or using this project in assignments or submissions requires prior permission from the author.
  • Unauthorized use without consent is not permitted.

Tech Stack

  • Python
  • YOLO11 (Ultralytics)
  • OpenCV
  • scikit-learn
  • MoviePy / FFmpeg
  • Hugging Face Transformers (BLIP)
  • Coqui TTS (tacotron2-DDC)
  • Pillow
  • SciPy (WAV audio I/O, pitch shifting, reverb processing)
  • Matplotlib / NumPy / Pandas

Project Structure

cinematic-intelligence-system/
│
├── task1_character_detection_videos/
├── task2_trailer_input_videos/
├── 01_yolo_detection.ipynb
├── 02_task2_trailer_generation.ipynb
├── OEL LAB (AI).pdf
└── README.md

Dataset Description

Task 1

  • Input: Full-length video files (horizon.mp4, uncharted.mp4, need_for_speed.mp4, lotr.mp4)
  • Purpose: Character detection and ROI-based darkening transformation

Task 2

  • Input: Pre-segmented video clips (numbered .mp4 files)
  • Purpose: Feature extraction, classification, trailer generation, and cinematic post-processing

How to Run (Google Colab)

This project is designed to run entirely in Google Colab.


Open Notebooks

Open:

  • 01_yolo_detection.ipynb
  • 02_task2_trailer_generation.ipynb

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

Install Dependencies

!pip install ultralytics opencv-python moviepy transformers scikit-learn matplotlib pandas numpy scipy Pillow TTS
!apt-get install -y ffmpeg fonts-dejavu

TASK 1 — Character Detection & ROI Processing

Objective

Detect characters in video frames, draw bounding boxes around them, extract the regions of interest (ROIs), apply a darkening transformation to each ROI, and reconstruct the output video.


Workflow

1. YOLO11 Object Detection

  • Load yolo11n.pt and run inference on each frame
  • Detect all objects with confidence ≥ 0.5

2. Bounding Box Visualization

  • Draw green bounding boxes around all detected objects

3. ROI Extraction & Transformation

  • Crop the detected region (ROI) from each frame
  • Apply a darkening effect (pixel values scaled to 50%)
  • Reinsert the modified ROI back into the frame
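
The crop, darken, and reinsert steps above can be sketched with plain NumPy slicing. This is a minimal illustration, not the notebook's code: the function name `darken_roi` and the clamping of the box to the frame are assumptions, and the box is expected as `(x1, y1, x2, y2)` pixel coordinates.

```python
import numpy as np

def darken_roi(frame, box, factor=0.5):
    """Return a copy of `frame` with the pixels inside `box` scaled by `factor`.

    frame: H x W x 3 uint8 image (OpenCV-style array).
    box:   (x1, y1, x2, y2) pixel coordinates, as a detector would report them.
    """
    h, w = frame.shape[:2]
    # Clamp the box to the frame so slicing never wraps around or overflows.
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)

    out = frame.copy()
    roi = out[y1:y2, x1:x2].astype(np.float32)           # crop the ROI
    out[y1:y2, x1:x2] = (roi * factor).astype(np.uint8)  # darken and reinsert
    return out
```

With `factor=0.5`, a pixel value of 200 inside the box becomes 100, while pixels outside the box are left unchanged.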

4. Output Video Generation

  • Save the processed video using OpenCV's VideoWriter

Processed videos:

  • output_detection.mp4 (horizon)
  • uncharted_output_detection.mp4
  • needForSpeed_output_detection.mp4
  • lotr_output_detection.mp4
  • horizon(1)_output_detection.mp4

TASK 2 — Intelligent Trailer Generation System

Overview

Task 2 works on pre-segmented clips, where each clip is independently analyzed and ranked for impact. No scene detection or video splitting is performed.


1. Feature Extraction

Each clip is sampled every 5 frames and converted into a feature vector:

  • Motion score (mean absolute frame difference)
  • Brightness mean and variance
  • YOLO11 object count per frame
  • Scene cut rate (large brightness jumps between frames)
  • Audio energy and MFCC mean (both columns are zero-filled)

Features are saved to features.csv.
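
The visual features listed above could be computed roughly as follows. This is a sketch under stated assumptions: the function name `clip_features`, the `cut_threshold` value, and the choice to take the variance over per-frame mean brightness are all illustrative rather than taken from the notebook. The YOLO object count is omitted here because it requires the detector.

```python
import numpy as np

def clip_features(gray_frames, cut_threshold=30.0):
    """Compute simple visual features from sampled grayscale frames.

    gray_frames:   non-empty sequence of H x W arrays (one per sampled frame).
    cut_threshold: hypothetical brightness jump that counts as a scene cut.
    """
    frames = np.asarray(gray_frames, dtype=np.float32)
    brightness = frames.mean(axis=(1, 2))  # mean brightness per frame

    if len(frames) < 2:
        motion, cut_rate = 0.0, 0.0
    else:
        # Motion score: mean absolute difference between consecutive samples.
        motion = float(np.abs(np.diff(frames, axis=0)).mean())
        # Scene cut rate: fraction of transitions with a large brightness jump.
        jumps = np.abs(np.diff(brightness))
        cut_rate = float((jumps > cut_threshold).mean())

    return {
        "motion": motion,
        "brightness_mean": float(brightness.mean()),
        "brightness_var": float(brightness.var()),
        "cut_rate": cut_rate,
    }
```

To match the sampling described above, the caller would pass every fifth frame of a clip, for example `clip_features(all_gray_frames[::5])`.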


2. Classification Model

Labels are generated automatically using a weighted composite impact score:

  • Motion (40%), Objects (35%), Scene cuts (15%), Brightness variance (10%)
  • Median split → +1 (high-impact) / -1 (low-impact)
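
A possible implementation of the weighted score and median split is sketched below. The weights come from the list above; the min-max normalisation of each feature before weighting, the dictionary keys, and the function names are assumptions made for illustration.

```python
import numpy as np

# Weights from the list above; the key names are hypothetical.
WEIGHTS = {"motion": 0.40, "objects": 0.35, "cut_rate": 0.15, "brightness_var": 0.10}

def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    v = np.asarray(values, dtype=np.float64)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def impact_labels(features):
    """features: dict of equal-length lists (one entry per clip), keyed like WEIGHTS."""
    score = sum(w * min_max(features[name]) for name, w in WEIGHTS.items())
    # Median split: clips scoring strictly above the median are high-impact (+1).
    labels = np.where(score > np.median(score), 1, -1)
    return score, labels
```

In this sketch a clip that scores exactly at the median is labelled -1; the notebook may break such ties differently.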

Model used: Logistic Regression (L2-regularised, with StandardScaler)

  • Stratified K-Fold cross-validation (up to 5 folds)
  • Saved as impact_model.pkl and scaler.pkl
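
The training and validation step might look like the sketch below. It bundles the scaler and classifier into one scikit-learn pipeline for brevity, which differs from the separate `impact_model.pkl` and `scaler.pkl` files described above. The function name and the rule for capping the fold count by the smaller class size are assumptions about what "up to 5 folds" means.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cross_validate_impact_model(X, y, max_folds=5, seed=0):
    """Cross-validate a scaled logistic regression on clip features.

    X: (n_clips, n_features) array; y: labels in {+1, -1}.
    Each class needs at least two clips for stratified folds to work.
    """
    y = np.asarray(y)
    # Stratified folds need a sample of each class in every fold,
    # so cap the fold count by the size of the smaller class.
    _, counts = np.unique(y, return_counts=True)
    n_folds = min(max_folds, int(counts.min()))

    # LogisticRegression applies an L2 penalty by default.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    return model.fit(X, y), scores
```

The returned `scores` array holds one accuracy value per fold, and the returned model is refitted on all clips.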

3. Trailer Generation (5 Clips)

  • Clips ranked by composite score (motion + objects)
  • Top 5 clips selected using a greedy diversity constraint (at least 2 index positions apart)
  • Clips ordered from low to high impact to build narrative suspense
  • Merged using MoviePy → saved as trailer.mp4
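
The selection and ordering logic above can be illustrated with a short greedy routine. This is a sketch, not the notebook's code: the function name `select_trailer_clips` and its parameters are hypothetical, and "at least 2 index positions apart" is read as an absolute index difference of 2 or more.

```python
def select_trailer_clips(scores, k=5, min_gap=2):
    """Pick up to k clip indices greedily by score, keeping picks min_gap apart.

    scores: list of impact scores, one per clip, in original clip order.
    Returns the chosen indices ordered from lowest to highest score.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Skip a clip that sits too close to one already selected.
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Low -> high impact, so the trailer builds toward its strongest clip.
    return sorted(chosen, key=lambda i: scores[i])
```

If the diversity constraint rules out too many candidates, this version simply returns fewer than `k` clips.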

4. Creepy Visual Transformation

Applied per-frame to trailer.mp4 → output: creepy_trailer.mp4

Effects include:

  • Face darkening (upper 40% of person bounding box shadowed)
  • Red glowing eyes (overlaid at estimated eye positions)
  • Random blood drip lines (30% probability per person detection)
  • Background desaturation (everything outside person boxes goes grayscale)
  • Cinematic vignette (edge darkening)
  • Brightness flicker (random ±15–20% per frame)
  • Glitch shift (6% probability, horizontal pixel band displacement)
  • Fog overlay (15% probability per frame)
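
Two of the effects above, background desaturation and the vignette, are sketched here in NumPy. The function names, the `strength` parameter, and the use of a plain channel average for the gray value are assumptions; the notebook may use OpenCV routines and different constants.

```python
import numpy as np

def desaturate_background(frame, person_boxes):
    """Turn everything outside the given (x1, y1, x2, y2) boxes grayscale."""
    # Plain channel average as the gray value (an assumption; OpenCV's
    # cvtColor uses weighted luminance instead).
    gray = frame.mean(axis=2, keepdims=True).astype(np.uint8)
    out = np.repeat(gray, 3, axis=2)
    for x1, y1, x2, y2 in person_boxes:
        out[y1:y2, x1:x2] = frame[y1:y2, x1:x2]  # restore colour inside the box
    return out

def vignette(frame, strength=0.6):
    """Darken pixels in proportion to their squared distance from the centre."""
    h, w = frame.shape[:2]
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    dist2 = (xs ** 2 + ys ** 2) / 2.0  # 0 at the centre, 1 at the corners
    mask = 1.0 - strength * dist2
    return (frame * mask[..., None]).astype(np.uint8)
```

Because the vignette mask depends only on the frame size, it could be computed once and reused for every frame of the trailer.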

5. NLP Caption Generation

Key frames are sampled from each selected clip and captioned using BLIP (Salesforce/blip-image-captioning-base).

Captions are transformed into cinematic horror-style text via:

  • Word-level substitution dictionary (e.g. "man" → "figure", "walks" → "lurks")
  • Random creepy prefix (e.g. "They were warned...")
  • Random creepy suffix (e.g. "Something watches.")
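
A minimal sketch of this transformation is shown below. Only the example substitutions, prefix, and suffix quoted above are included; the notebook's actual dictionary and phrase lists are presumably larger, and the function name `horrorize` is hypothetical.

```python
import random

# Illustrative entries only, taken from the examples above.
SUBSTITUTIONS = {"man": "figure", "walks": "lurks"}
PREFIXES = ["They were warned..."]
SUFFIXES = ["Something watches."]

def horrorize(caption, rng=random):
    """Swap words via the dictionary, then wrap the caption in a prefix and suffix."""
    words = [SUBSTITUTIONS.get(w.lower(), w) for w in caption.split()]
    body = " ".join(words)
    body = body[0].upper() + body[1:] if body else body
    return f"{rng.choice(PREFIXES)} {body}. {rng.choice(SUFFIXES)}"
```

Passing a seeded `random.Random` instance as `rng` makes the prefix and suffix choice reproducible between runs.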

Captions are rendered as text overlays onto creepy_trailer.mp4 → captioned_trailer.mp4.


6. TTS Audio & Background Music

Each caption is converted to speech, then processed with:

  • Pitch-shifting (lower, creepier tone)
  • Reverb effect
  • Whisper-style amplitude envelope
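
The three voice effects could be approximated with NumPy as in the sketch below. These are deliberately simple stand-ins, not the notebook's SciPy-based processing: the pitch drop is a naive resample that also lengthens the clip, the reverb is a handful of decaying echoes, and the "whisper" envelope is assumed to be a fade-in and fade-out. All function names and default values are illustrative.

```python
import numpy as np

def lower_pitch(samples, factor=0.8):
    """Naive pitch drop by resampling: the clip also gets 1/factor times longer."""
    n_out = int(len(samples) / factor)
    src_pos = np.arange(n_out) * factor  # fractional read positions
    return np.interp(src_pos, np.arange(len(samples)), samples)

def add_reverb(samples, sample_rate, delay_s=0.05, decay=0.4, taps=4):
    """Add a few progressively quieter delayed copies of the signal."""
    delay = int(delay_s * sample_rate)
    out = np.zeros(len(samples) + delay * taps)
    out[:len(samples)] += samples
    for k in range(1, taps + 1):
        out[k * delay:k * delay + len(samples)] += samples * decay ** k
    return out

def whisper_envelope(samples, fade_fraction=0.2):
    """Fade a non-empty clip in and out so it starts and ends softly."""
    n = len(samples)
    fade = max(1, int(n * fade_fraction))
    env = np.ones(n)
    env[:fade] = np.linspace(0.0, 1.0, fade)
    env[-fade:] = np.linspace(1.0, 0.0, fade)
    return samples * env
```

The functions take and return mono floating-point sample arrays, so they can be chained in the order listed above.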

Atmospheric background music is generated programmatically:

  • Dark sine-wave drone (40Hz, 60Hz, 90Hz, 135Hz layers)
  • Heartbeat pulse at ~62 BPM
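
One way to synthesize such a track is sketched below. The drone frequencies and the 62 BPM rate come from the list above; the sample rate, the 50 Hz thump, its decay rate, and the mixing levels are illustrative assumptions, as is the function name.

```python
import numpy as np

def drone_with_heartbeat(duration_s, sample_rate=22050, bpm=62):
    """Synthesize a layered sine drone with a decaying low thump on each beat."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate

    # Drone: equal-weight sum of the four sine layers.
    drone = sum(np.sin(2 * np.pi * f * t) for f in (40.0, 60.0, 90.0, 135.0)) / 4.0

    # Heartbeat: a 50 Hz burst that restarts every beat and decays quickly.
    # The burst frequency and decay rate are assumptions.
    beat_period = 60.0 / bpm
    phase = np.mod(t, beat_period)  # seconds since the last beat
    thump = np.sin(2 * np.pi * 50.0 * phase) * np.exp(-phase * 12.0)

    mix = 0.6 * drone + 0.4 * thump
    peak = np.max(np.abs(mix))
    # Normalise to a peak of 0.9 to leave headroom for the voice track.
    return (mix / peak * 0.9).astype(np.float32) if peak > 0 else mix.astype(np.float32)
```

The resulting float32 array can be written to disk with `scipy.io.wavfile.write`, which accepts floating-point sample data.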

Voice clips and drone are mixed and saved as final_audio.wav.


7. Final Export

Video and audio are merged using FFmpeg:

captioned_trailer.mp4 + final_audio.wav → FINAL_TRAILER_PRODUCTION.mp4
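
A merge of this kind could be driven from Python as in the sketch below. The exact FFmpeg options used in the notebook are not stated, so the choice to copy the video stream, encode the audio as AAC, and stop at the shorter stream is an assumption. The helper names are hypothetical.

```python
import subprocess

def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg argument list that muxes one video and one audio file."""
    return [
        "ffmpeg", "-y",      # overwrite the output if it exists
        "-i", video_path,    # input 0: captioned video
        "-i", audio_path,    # input 1: mixed audio track
        "-map", "0:v:0",     # take video from the first input
        "-map", "1:a:0",     # take audio from the second input
        "-c:v", "copy",      # keep the video stream as encoded
        "-c:a", "aac",       # encode the WAV audio to AAC for MP4
        "-shortest",         # stop at the shorter of the two streams
        output_path,
    ]

def merge(video_path, audio_path, output_path):
    """Run the merge; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)
```

For the files named above, the call would be `merge("captioned_trailer.mp4", "final_audio.wav", "FINAL_TRAILER_PRODUCTION.mp4")`.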

Outputs

Generated outputs include:

  • FINAL_TRAILER_PRODUCTION.mp4 — final trailer with captions and audio
  • creepy_trailer.mp4 — visually transformed trailer
  • trailer.mp4 — raw assembled trailer (5 clips)
  • features.csv — extracted clip features and labels
  • creepy_music.wav — intensity-driven atmospheric music (per-clip)
  • drone_bed.wav — background drone layer mixed under voice
  • impact_model.pkl / scaler.pkl — trained classification model
  • YOLO detection output videos (Task 1)

Outputs Location (Google Drive)

All outputs are stored externally due to large file sizes:

🔗 Google Drive Link: https://drive.google.com/drive/folders/1LcevU8MvltBCN0XaVQAyoz4Z5IRDgusD?usp=drive_link


Pipeline Overview

Input Videos / Clips
        ↓
Task 1: YOLO Detection + ROI Darkening
        ↓
Output Videos (per input file)
        ↓
Task 2: Feature Extraction (visual + object count)
        ↓
Logistic Regression Classification (+1 / -1)
        ↓
Top 5 High-Impact Clip Selection (diversity-aware)
        ↓
Trailer Assembly (narrative order: low → high impact)
        ↓
Creepy Visual Transformations (YOLO-guided, per-frame)
        ↓
BLIP Caption Generation + Horror NLP Transform
        ↓
Caption Overlay onto Video
        ↓
TTS Voice Generation + Atmospheric Music
        ↓
FFmpeg Audio/Video Merge
        ↓
FINAL_TRAILER_PRODUCTION.mp4

Authors

Romaisa | Maham Anjum | Malaika

AI Lab - Spring 2026 | BS Artificial Intelligence


Final Note

This system demonstrates a full pipeline combining computer vision, machine learning, NLP, and audio synthesis for intelligent cinematic trailer generation.
