RT-Vision-Core: Video Transcription for Obsidian

A powerful tool to transcribe YouTube videos and Instagram posts/reels into Obsidian-compatible markdown notes using OpenAI's Whisper AI.

Features

YouTube Support: Download and transcribe any YouTube video
Instagram Support: Download and transcribe Instagram posts and reels
AI Transcription: Uses OpenAI Whisper for high-quality transcription
Multiple Languages: Auto-detects language or specify manually
Obsidian Integration: Creates formatted markdown notes ready for your vault
Timestamps: Optional timestamp support for detailed notes
Batch Processing: Process multiple URLs in one command
Flexible Models: Choose from 5 Whisper models based on speed vs accuracy needs

Installation

Prerequisites

Python 3.8 or higher
ffmpeg (for audio/video processing)

Install ffmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org or use:

winget install ffmpeg

Setup

Clone the repository:

git clone <repository-url>
cd RT-vision-core

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configure (optional):

cp .env.example .env
# Edit .env to configure Whisper model, language, and Obsidian vault path

Usage

Basic Usage

Transcribe a YouTube video:

python main.py https://www.youtube.com/watch?v=dQw4w9WgXcQ

Transcribe an Instagram post/reel:

python main.py https://www.instagram.com/p/ABC123/

Process multiple URLs:

python main.py https://www.youtube.com/watch?v=video1 https://www.instagram.com/reel/xyz/

Advanced Options

Include timestamps in transcription:

python main.py --timestamps https://www.youtube.com/watch?v=dQw4w9WgXcQ

Use a different Whisper model:

python main.py --model medium https://www.youtube.com/watch?v=dQw4w9WgXcQ

Specify language (skip auto-detection):

python main.py --language es https://www.youtube.com/watch?v=dQw4w9WgXcQ

Command Line Options

urls: One or more YouTube or Instagram URLs (required)
--timestamps: Include timestamps in the transcription
--model {tiny,base,small,medium,large}: Whisper model to use (default: base)
--language LANG: Language code for transcription (e.g., en, es, fr)

Whisper Models

Choose based on your needs:

Model	Speed	Accuracy	Memory	Best For
tiny	⚡⚡⚡⚡⚡	⭐⭐	~1 GB	Quick drafts
base	⚡⚡⚡⚡	⭐⭐⭐	~1 GB	Balanced (default)
small	⚡⚡⚡	⭐⭐⭐⭐	~2 GB	Good quality
medium	⚡⚡	⭐⭐⭐⭐⭐	~5 GB	High accuracy
large	⚡	⭐⭐⭐⭐⭐	~10 GB	Best quality

Output

Transcriptions are saved as markdown files in the transcripts/ directory with the following format:

---
title: Video Title
source: https://www.youtube.com/watch?v=...
date: 2025-11-10
language: en
channel: Channel Name
platform: YouTube
tags:
  - transcription
  - en
---

# Video Title

## Metadata

- **Source:** https://www.youtube.com/watch?v=...
- **Channel:** Channel Name
- **Date:** 2025-11-10
- **Duration:** 15 minutes
- **Language:** en

## Description

Original video description...

## Transcription

Transcribed text goes here...

Obsidian Integration

If you set OBSIDIAN_VAULT_PATH in your .env file, notes will be automatically copied to your Obsidian vault:

OBSIDIAN_VAULT_PATH=/path/to/your/obsidian/vault

Project Structure

RT-vision-core/
├── main.py                    # CLI entry point
├── config.py                  # Configuration settings
├── youtube_downloader.py      # YouTube video/audio download
├── instagram_downloader.py    # Instagram post/reel download
├── transcriber.py             # Whisper transcription
├── obsidian_formatter.py      # Markdown formatting
├── requirements.txt           # Python dependencies
├── .env.example              # Example environment config
├── downloads/                # Temporary download directory
└── transcripts/              # Output markdown files

Examples

YouTube Video with Timestamps

python main.py --timestamps --model small https://www.youtube.com/watch?v=dQw4w9WgXcQ

This will create a note with timestamped segments:

## Transcription

**[00:00]** Welcome to this video about...
**[00:15]** In this tutorial, we'll cover...
**[00:45]** First, let's talk about...

Instagram Reel in Spanish

python main.py --language es https://www.instagram.com/reel/ABC123/

Batch Processing

python main.py \
  https://www.youtube.com/watch?v=video1 \
  https://www.youtube.com/watch?v=video2 \
  https://www.instagram.com/p/post1/ \
  https://www.instagram.com/reel/reel1/

Troubleshooting

Common Issues

ffmpeg not found: Install ffmpeg using instructions above
Out of memory: Use a smaller Whisper model (tiny or base)
Instagram download fails: Some private or restricted posts may not be accessible
Slow transcription: Use a smaller model or ensure you have GPU support

GPU Acceleration

For faster transcription, install PyTorch with CUDA support:

# For NVIDIA GPUs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.

Acknowledgments

OpenAI Whisper for transcription
yt-dlp for YouTube downloads
Instaloader for Instagram downloads

Support

For issues and questions, please open an issue on GitHub.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RT-Vision-Core: Video Transcription for Obsidian

Features

Installation

Prerequisites

Install ffmpeg

Setup

Usage

Basic Usage

Advanced Options

Command Line Options

Whisper Models

Output

Obsidian Integration

Project Structure

Examples

YouTube Video with Timestamps

Instagram Reel in Spanish

Batch Processing

Troubleshooting

Common Issues

GPU Acceleration

Contributing

License

Acknowledgments

Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SETUP.md		SETUP.md
config.py		config.py
instagram_downloader.py		instagram_downloader.py
main.py		main.py
obsidian_formatter.py		obsidian_formatter.py
requirements.txt		requirements.txt
transcriber.py		transcriber.py
youtube_downloader.py		youtube_downloader.py

Folders and files

Latest commit

History

Repository files navigation

RT-Vision-Core: Video Transcription for Obsidian

Features

Installation

Prerequisites

Install ffmpeg

Setup

Usage

Basic Usage

Advanced Options

Command Line Options

Whisper Models

Output

Obsidian Integration

Project Structure

Examples

YouTube Video with Timestamps

Instagram Reel in Spanish

Batch Processing

Troubleshooting

Common Issues

GPU Acceleration

Contributing

License

Acknowledgments

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages