Smart-Navigation-System-for-Visually-Impaired

An AI-powered assistive navigation system for visually impaired individuals.

This project analyzes visual scenes using computer vision and Vision-Language Models to provide real-time audio guidance. The system detects objects in the environment, generates descriptive captions of the scene, and converts them into speech to help visually impaired users understand their surroundings.

The system integrates object detection (YOLOv8), image captioning (BLIP), and text-to-speech (gTTS) in a multi-step pipeline. Users can upload images, process videos, or use a live webcam to receive spoken navigation guidance.


Key Features

1. Intelligent Scene Understanding

Object Detection (YOLOv8): Detects navigation-relevant objects such as people, vehicles, chairs, and other obstacles in real time.

Image Captioning (BLIP): Generates natural language descriptions of the scene using a Vision-Language Model.

Navigation Alerts: Combines detected objects and scene captions to produce meaningful navigation guidance.
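The "combine" step can be sketched in plain Python. This is an illustrative helper, not the repo's actual modules/navigation.py logic: the function name `build_alert` and the priority list are assumptions.

```python
# Hypothetical sketch of the navigation-alert step: merge YOLOv8 labels
# with a BLIP caption into one spoken-guidance sentence.

# Objects that matter most for navigation (assumed priority order).
PRIORITY = ("person", "car", "bus", "bicycle", "chair")

def build_alert(detected_objects, caption):
    """Combine detected object labels and a scene caption into guidance text."""
    # Priority objects first, then everything else, without duplicates.
    ordered = [o for o in PRIORITY if o in detected_objects]
    ordered += [o for o in detected_objects if o not in ordered]
    if ordered:
        alert = "Caution: " + ", ".join(ordered) + " ahead. "
    else:
        alert = "No obstacles detected. "
    return alert + "Scene: " + caption

print(build_alert(["chair", "person"], "a person sitting in an office"))
# → Caution: person, chair ahead. Scene: a person sitting in an office
```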


2. Multi-Modal Input Support

The system supports multiple input sources:

Image Upload: Analyze static images to generate scene descriptions.

Video Processing: Extract frames from videos and analyze each scene.

Live Webcam Navigation: Provide real-time environmental descriptions.
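For the video path, analyzing every frame is wasteful; a common approach is to sample one frame every few seconds. The sketch below assumes stride-based sampling with OpenCV (the repo's modules/video_processing.py may differ); cv2 is imported lazily so the sampling logic itself has no dependency.

```python
# Hypothetical frame-sampling logic for the video input path.

def sample_indices(total_frames, fps, seconds_between=2.0):
    """Return the frame indices to analyse, one every `seconds_between` seconds."""
    stride = max(1, int(round(fps * seconds_between)))
    return list(range(0, total_frames, stride))

def extract_frames(video_path, seconds_between=2.0):
    """Yield (index, frame) pairs from a video file (requires opencv-python)."""
    import cv2  # lazy import: only needed when a real video is processed
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_indices(total, fps, seconds_between))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in wanted:
            yield index, frame
        index += 1
    cap.release()

print(sample_indices(total_frames=10, fps=2.0, seconds_between=2.0))  # → [0, 4, 8]
```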


3. Audio Guidance

Text-to-Speech Conversion (gTTS): Convert generated scene descriptions into spoken guidance.

Accessibility Support: Designed to help visually impaired individuals understand their surroundings through audio feedback.
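The text-to-speech step can be wrapped in a few lines. The repo's modules/tts.py presumably does something similar with gTTS; the function names and the whitespace clean-up here are assumptions.

```python
# Sketch of the text-to-speech step around gTTS.
import re

def clean_for_speech(text):
    """Collapse whitespace and make sure the sentence ends with a period."""
    text = re.sub(r"\s+", " ", text).strip()
    return text if text.endswith(".") else text + "."

def speak(text, out_path="guidance.mp3", lang="en"):
    """Save spoken guidance as an MP3 (requires the gTTS package)."""
    from gtts import gTTS  # lazy import: only needed when audio is generated
    gTTS(text=clean_for_speech(text), lang=lang).save(out_path)
    return out_path
```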


4. Interactive Web Interface

Gradio Interface: Provides an easy-to-use web application where users can:

  • Upload images
  • Upload videos
  • Use a live webcam

The interface displays:

  • Detected objects
  • Scene descriptions
  • Audio guidance
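A minimal Gradio tab for the image-upload flow might look like this. The layout, labels, and the placeholder `analyze` function are assumptions; the repo's app.py may wire things differently, and a real version would call the detection and captioning modules.

```python
# Minimal sketch of a Gradio interface for the image-upload flow.

def format_objects(objects):
    """Render the detection list for the UI text box."""
    return ", ".join(objects) if objects else "no objects detected"

def analyze(image):
    # Placeholder analysis: a real version would call the YOLOv8 and
    # BLIP modules here instead of returning fixed values.
    objects, caption = ["person"], "a person walking on a sidewalk"
    return format_objects(objects), caption

def build_interface():
    import gradio as gr  # lazy import so the helpers stay dependency-free
    return gr.Interface(
        fn=analyze,
        inputs=gr.Image(type="pil"),
        outputs=[gr.Textbox(label="Detected objects"),
                 gr.Textbox(label="Scene description")],
        title="Smart Navigation System",
    )

# build_interface().launch()  # serves on http://127.0.0.1:7860 by default
```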

Website Overview

(Screenshot of the Gradio web interface)

Quick Start

Clone the repository

git clone https://github.com/BhaveshBhakta/Smart-Navigation-Stick-Using-VLM.git
cd Smart-Navigation-Stick-Using-VLM

Install dependencies

pip install -r requirements.txt

YOLOv8 model weights

The YOLOv8 weights are downloaded automatically the first time the application runs; no manual installation step is needed.


Run the application

python app.py

Open the web interface

After running the application, open the local Gradio interface:

http://127.0.0.1:7860

High-Level Architecture

(High-level architecture diagram)

System Workflow

  1. Capture visual input from images, videos, or webcam.
  2. Detect objects in the environment using YOLOv8.
  3. Generate scene descriptions using the BLIP Vision-Language Model.
  4. Combine detected objects and captions to create navigation guidance.
  5. Convert navigation text into audio using Google Text-to-Speech.
  6. Deliver spoken guidance to the user.
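The six steps above can be sketched as one function. Passing the model calls in as callables is my own structuring choice (it keeps the pipeline testable without loading any models); the repo's actual wiring may differ.

```python
# The workflow steps, sketched as a single dependency-injected pipeline.

def run_pipeline(frame, detect, caption, speak):
    """frame -> (guidance text, audio path)."""
    objects = detect(frame)            # step 2: YOLOv8 object detection
    description = caption(frame)       # step 3: BLIP scene caption
    if objects:                        # step 4: combine into guidance
        guidance = "Ahead: " + ", ".join(objects) + ". " + description
    else:
        guidance = description
    audio_path = speak(guidance)       # steps 5-6: gTTS conversion + delivery
    return guidance, audio_path

# Stub usage (real code would pass the YOLOv8 / BLIP / gTTS wrappers):
text, path = run_pipeline(
    "frame",
    detect=lambda f: ["bus"],
    caption=lambda f: "A bus is stopping at the curb.",
    speak=lambda t: "guidance.mp3",
)
print(text)  # → Ahead: bus. A bus is stopping at the curb.
```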

Project Structure

Smart-Navigation-System
│
├── app.py
├── modules
│   ├── caption.py
│   ├── detection.py
│   ├── navigation.py
│   ├── tts.py
│   ├── video_processing.py
│   └── webcam_processing.py
│
├── training
│   ├── train_blip.py
│   ├── evaluate_model.py
│   └── dataset_loader.py
│
├── dataset
│
├── requirements.txt
└── README.md

Roadmap & Future Work

Distance Estimation

Integrate depth estimation to determine how far objects are from the user.
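Once a depth estimate is available, turning it into a spoken warning is simple. This is a hypothetical sketch; the metre thresholds are illustrative assumptions, not values from the project.

```python
# Hypothetical mapping from an estimated distance to a spoken warning.

def distance_warning(label, distance_m):
    """Turn an object label plus estimated distance into a warning phrase."""
    if distance_m < 1.0:
        level = "very close"
    elif distance_m < 3.0:
        level = "close"
    else:
        level = "far away"
    return f"{label} {level}, about {distance_m:.0f} metres ahead"

print(distance_warning("bus", 2.4))  # → bus close, about 2 metres ahead
```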

Mobile Deployment

Convert the system into a mobile application for real-world use.

Improved Scene Understanding

Use advanced Vision-Language Models such as InstructBLIP or LLaVA for richer descriptions.

Edge Deployment

Optimize models for deployment on edge devices such as Raspberry Pi or smart glasses.
