Multimodal extractors for video, image, audio, text & PDF — turn any file into searchable vector embeddings (SigLIP, Gemini, E5, CLAP, ArcFace).
Updated Jun 16, 2026 - Python
Reverse video search using TimeSformer transformer embeddings and FAISS vector indexing. Upload a video, retrieve visually similar clips from UCF-101 in milliseconds.
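The retrieval step described above can be illustrated with a minimal sketch. It assumes each clip has already been reduced to a fixed-length embedding vector and ranks clips by cosine similarity using a brute-force scan in pure Python. This is not the repository's code: the names `normalize`, `top_k`, and the three-dimensional toy vectors are hypothetical, and a real system would likely use a model such as TimeSformer for the embeddings and a FAISS index in place of the loop.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Return the k (clip_id, score) pairs most similar to the query embedding."""
    q = normalize(query)
    scored = []
    for clip_id, emb in index.items():
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, clip_id))
    # Highest cosine similarity first.
    scored.sort(reverse=True)
    return [(clip_id, score) for score, clip_id in scored[:k]]


# Hypothetical toy embeddings; real video embeddings have hundreds of dimensions.
index = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.9, 0.1, 0.0],
}

results = top_k([1.0, 0.05, 0.0], index, k=2)
for clip_id, score in results:
    print(f"{clip_id}: {score:.4f}")
```

A brute-force scan like this costs time linear in the number of clips. A dedicated vector index stores the same normalized vectors but is built to answer such queries quickly on large collections.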