ecfromthedc/RT-vision-core

RT-Vision-Core: Video Transcription for Obsidian

A command-line tool that transcribes YouTube videos and Instagram posts/reels into Obsidian-compatible markdown notes using OpenAI's Whisper speech recognition model.

Features

  • YouTube Support: Download and transcribe any YouTube video
  • Instagram Support: Download and transcribe Instagram posts and reels
  • AI Transcription: Uses OpenAI Whisper for high-quality transcription
  • Multiple Languages: Auto-detects the language, or lets you specify it manually
  • Obsidian Integration: Creates formatted markdown notes ready for your vault
  • Timestamps: Optional timestamp support for detailed notes
  • Batch Processing: Process multiple URLs in one command
  • Flexible Models: Choose from 5 Whisper models based on speed vs accuracy needs

Installation

Prerequisites

  • Python 3.8 or higher
  • ffmpeg (for audio/video processing)

Install ffmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org or use:

winget install ffmpeg

Setup

  1. Clone the repository:
git clone <repository-url>
cd RT-vision-core
  2. Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Configure (optional):
cp .env.example .env
# Edit .env to configure Whisper model, language, and Obsidian vault path

Usage

Basic Usage

Transcribe a YouTube video:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Transcribe an Instagram post/reel:

python main.py https://www.instagram.com/p/ABC123/

Process multiple URLs:

python main.py "https://www.youtube.com/watch?v=video1" "https://www.instagram.com/reel/xyz/"

Advanced Options

Include timestamps in transcription:

python main.py --timestamps "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Use a different Whisper model:

python main.py --model medium "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Specify language (skip auto-detection):

python main.py --language es "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Command Line Options

  • urls: One or more YouTube or Instagram URLs (required)
  • --timestamps: Include timestamps in the transcription
  • --model {tiny,base,small,medium,large}: Whisper model to use (default: base)
  • --language LANG: Language code for transcription (e.g., en, es, fr); auto-detected if omitted

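The options listed above map naturally onto Python's argparse module. The following is only a minimal sketch of how such a parser could be declared; the function name build_parser is hypothetical, and the real main.py may be organized differently.

```python
import argparse

# Model names as listed under "Command Line Options".
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Transcribe YouTube/Instagram URLs into Obsidian notes.",
    )
    # One or more positional URLs are required.
    parser.add_argument("urls", nargs="+", help="YouTube or Instagram URLs")
    # Boolean flag: present means timestamps are included.
    parser.add_argument("--timestamps", action="store_true",
                        help="Include timestamps in the transcription")
    parser.add_argument("--model", choices=WHISPER_MODELS, default="base",
                        help="Whisper model to use (default: base)")
    # None here stands for "let Whisper auto-detect the language".
    parser.add_argument("--language", metavar="LANG", default=None,
                        help="Language code, e.g. en, es, fr")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.urls, args.timestamps, args.model, args.language)
```

Restricting --model with choices makes argparse reject an unknown model name before any download starts.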
Whisper Models

Choose based on your needs:

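| Model  | Speed | Accuracy | Memory | Best For           |
|--------|-------|----------|--------|--------------------|
| tiny   | ⚡⚡⚡⚡⚡ | ⭐⭐       | ~1 GB  | Quick drafts       |
| base   | ⚡⚡⚡⚡  | ⭐⭐⭐      | ~1 GB  | Balanced (default) |
| small  | ⚡⚡⚡   | ⭐⭐⭐⭐     | ~2 GB  | Good quality       |
| medium | ⚡⚡    | ⭐⭐⭐⭐⭐    | ~5 GB  | High accuracy      |
| large  | ⚡     | ⭐⭐⭐⭐⭐    | ~10 GB | Best quality       |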
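If you would rather pick a model from a memory budget than by eye, the approximate figures in the table can drive a small helper. This is an illustrative sketch only: largest_model_that_fits is a hypothetical function that is not part of the project, and the memory values are the rough numbers from the table, not measurements.

```python
# Approximate memory needs in GB, copied from the table above.
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Models ordered from fastest/least accurate to slowest/most accurate.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def largest_model_that_fits(available_gb: float) -> str:
    """Return the most accurate model whose approximate footprint fits."""
    fitting = [m for m in MODEL_ORDER if MODEL_MEMORY_GB[m] <= available_gb]
    if not fitting:
        raise ValueError(f"No Whisper model fits in {available_gb} GB")
    # The last entry is the largest model that still fits the budget.
    return fitting[-1]


if __name__ == "__main__":
    print(largest_model_that_fits(4))
```

The result can be passed straight to the --model option.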

Output

Transcriptions are saved as markdown files in the transcripts/ directory with the following format:

---
title: Video Title
source: https://www.youtube.com/watch?v=...
date: 2025-11-10
language: en
channel: Channel Name
platform: YouTube
tags:
  - transcription
  - en
---

# Video Title

## Metadata

- **Source:** https://www.youtube.com/watch?v=...
- **Channel:** Channel Name
- **Date:** 2025-11-10
- **Duration:** 15 minutes
- **Language:** en

## Description

Original video description...

## Transcription

Transcribed text goes here...
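To make the layout above concrete, here is a rough sketch of how a note with this shape could be assembled from a metadata dictionary. The function render_note and the dictionary keys are assumptions made for illustration; the project's obsidian_formatter.py may work differently. Values are written into the front matter unquoted, so a title containing a colon would need YAML quoting in a real implementation.

```python
def render_note(meta: dict, description: str, transcription: str) -> str:
    """Assemble a markdown note in the documented layout (hypothetical helper)."""
    lines = [
        # YAML front matter read by Obsidian.
        "---",
        f"title: {meta['title']}",
        f"source: {meta['source']}",
        f"date: {meta['date']}",
        f"language: {meta['language']}",
        f"channel: {meta['channel']}",
        f"platform: {meta['platform']}",
        "tags:",
        "  - transcription",
        f"  - {meta['language']}",
        "---",
        "",
        # Human-readable body.
        f"# {meta['title']}",
        "",
        "## Metadata",
        "",
        f"- **Source:** {meta['source']}",
        f"- **Channel:** {meta['channel']}",
        f"- **Date:** {meta['date']}",
        f"- **Duration:** {meta['duration']}",
        f"- **Language:** {meta['language']}",
        "",
        "## Description",
        "",
        description,
        "",
        "## Transcription",
        "",
        transcription,
    ]
    return "\n".join(lines) + "\n"
```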

Obsidian Integration

If you set OBSIDIAN_VAULT_PATH in your .env file, notes will be automatically copied to your Obsidian vault:

OBSIDIAN_VAULT_PATH=/path/to/your/obsidian/vault
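The copy step can be pictured roughly as follows. This sketch assumes OBSIDIAN_VAULT_PATH is already present in the process environment (for example, loaded from .env by a library such as python-dotenv, which is an assumption about the setup), and copy_to_vault is a hypothetical name, not necessarily what the project uses.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def copy_to_vault(note_path: Path) -> Optional[Path]:
    """Copy a note into the vault if OBSIDIAN_VAULT_PATH is set.

    Returns the destination path, or None when no vault is configured.
    """
    vault = os.environ.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        # No vault configured: the note stays in transcripts/ only.
        return None
    vault_dir = Path(vault).expanduser()
    vault_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file timestamps along with the content.
    return Path(shutil.copy2(note_path, vault_dir / note_path.name))
```

Returning None when the variable is unset keeps the vault copy optional, matching the "if you set" wording above.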

Project Structure

RT-vision-core/
├── main.py                    # CLI entry point
├── config.py                  # Configuration settings
├── youtube_downloader.py      # YouTube video/audio download
├── instagram_downloader.py    # Instagram post/reel download
├── transcriber.py             # Whisper transcription
├── obsidian_formatter.py      # Markdown formatting
├── requirements.txt           # Python dependencies
├── .env.example              # Example environment config
├── downloads/                # Temporary download directory
└── transcripts/              # Output markdown files

Examples

YouTube Video with Timestamps

python main.py --timestamps --model small "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

This will create a note with timestamped segments:

## Transcription

**[00:00]** Welcome to this video about...
**[00:15]** In this tutorial, we'll cover...
**[00:45]** First, let's talk about...
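Timestamped lines like these can be produced from per-segment start times. The sketch below assumes each segment is a dictionary with "start" (seconds) and "text" keys, which is the shape of the entries under "segments" in the result of openai-whisper's transcribe(); the helper names are hypothetical. Minutes are not wrapped into hours, so a segment at 1 h 2 min 5 s would render as [62:05].

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a start time in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_segments(segments: Iterable[Mapping]) -> str:
    """Render one bold-timestamped markdown line per segment."""
    return "\n".join(
        f"**[{format_timestamp(seg['start'])}]** {seg['text'].strip()}"
        for seg in segments
    )


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "text": " Welcome to this video about..."},
        {"start": 15.2, "text": " In this tutorial, we'll cover..."},
    ]
    print(render_segments(demo))
```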

Instagram Reel in Spanish

python main.py --language es https://www.instagram.com/reel/ABC123/

Batch Processing

python main.py \
  "https://www.youtube.com/watch?v=video1" \
  "https://www.youtube.com/watch?v=video2" \
  "https://www.instagram.com/p/post1/" \
  "https://www.instagram.com/reel/reel1/"
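A batch that mixes both platforms implies that each URL is routed to the matching downloader. One way to sketch that routing is to inspect the hostname; detect_platform is a hypothetical helper and the set of accepted hosts is an assumption, so the project may recognize more or fewer URL forms.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (assumed, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def detect_platform(url: str) -> str:
    """Return "YouTube" or "Instagram" based on the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" the same way.
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "YouTube"
    if host == "instagram.com":
        return "Instagram"
    raise ValueError(f"Unsupported URL: {url}")


if __name__ == "__main__":
    for u in ["https://www.youtube.com/watch?v=video1",
              "https://www.instagram.com/reel/reel1/"]:
        print(u, "->", detect_platform(u))
```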

Troubleshooting

Common Issues

  1. ffmpeg not found: Install ffmpeg using the instructions above
  2. Out of memory: Use a smaller Whisper model (tiny or base)
  3. Instagram download fails: Some private or restricted posts may not be accessible
  4. Slow transcription: Use a smaller model or ensure you have GPU support
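For the first issue, you can confirm whether ffmpeg is visible on your PATH before starting a long run. This is a small standalone check built on the standard library; executable_available is a hypothetical helper and not part of the project.

```python
import shutil


def executable_available(name: str) -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if executable_available("ffmpeg"):
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it using the instructions above")
```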

GPU Acceleration

For faster transcription, install PyTorch with CUDA support:

# For NVIDIA GPUs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
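After installing, a quick way to see whether PyTorch can actually use the GPU is to query torch.cuda.is_available(). The sketch below falls back to "cpu" when PyTorch is not installed; pick_device is a hypothetical helper, and how the project itself selects a device is not stated here.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # Imported lazily so the check works without PyTorch.
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Transcription device:", pick_device())
```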

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.

Acknowledgments

Support

For issues and questions, please open an issue on GitHub.

About

We are optimizing Rising Tides workflows by building automations for social media, Discord, and our Trello.
