Image Captioning with Occlusion Analysis | ViT-GPT2 | SmolVLM | BERT
Updated May 3, 2025 - Jupyter Notebook
Flask-based AI app that summarizes surveillance videos using Whisper (audio), ViT-GPT2 (frame captions), and Groq LLM (narratives). Produces both general and law enforcement-style summaries.
AI-powered image captioning using InceptionV3+LSTM and ViT-GPT2 models. Trained on Flickr8k dataset with interactive Streamlit interface.
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.
Developed an image captioning system using the BLIP model to generate detailed, context-aware captions. Achieved an average BLEU score of 0.72, providing rich descriptions that enhance accessibility and inclusivity.
A Chrome extension that takes input images and generates captions for them.
NLP and Computer Vision prototype for smart-glasses visual assistance using ViT-GPT2 image captioning and text-to-speech.