An AI-powered assistive navigation system for visually impaired individuals.
This project analyzes visual scenes using computer vision and Vision-Language Models to provide real-time audio guidance. The system detects objects in the environment, generates descriptive captions of the scene, and converts this information into speech so that visually impaired users can understand their surroundings.
The system integrates object detection (YOLOv8), image captioning (BLIP), and text-to-speech (gTTS) in a multi-step pipeline. Users can upload images, process videos, or use a live webcam to receive spoken navigation guidance.
Object Detection (YOLOv8): Detects important objects such as people, vehicles, chairs, buses, and obstacles in real time.
Image Captioning (BLIP): Generates natural language descriptions of the scene using a Vision-Language Model.
Navigation Alerts: Combines detected objects and scene captions to produce meaningful navigation guidance.
The system supports multiple input sources:
Image Upload: Analyze static images to generate scene descriptions.
Video Processing: Extract frames from videos and analyze each scene.
Live Webcam Navigation: Provide real-time environmental descriptions.
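For video input, analyzing every frame is usually wasteful; a common approach is to sample one frame per fixed time interval and run only those through the pipeline. A minimal sketch of the sampling step (the function name and parameters are illustrative, not taken from the project's `video_processing.py`):

```python
def frames_to_sample(total_frames: int, fps: float, interval_s: float = 1.0) -> list[int]:
    """Return the frame indices to analyze, one per `interval_s` seconds.

    A stride of at least 1 frame is enforced so very short clips still
    yield at least one sampled frame.
    """
    stride = max(1, round(fps * interval_s))
    return list(range(0, total_frames, stride))
```

For a 10-second clip at 30 fps this selects frames 0, 30, 60, ..., 270; each selected frame is then treated like a static image by the detection and captioning stages.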
Text-to-Speech Conversion (gTTS): Convert generated scene descriptions into spoken guidance.
Accessibility Support: Designed to help visually impaired individuals understand their surroundings through audio feedback.
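The Navigation Alerts step can be understood as text composition: merge the detector's object labels with the BLIP caption into a single sentence to be spoken aloud. A pure-Python sketch (the function name and phrasing are illustrative assumptions, not the project's exact logic):

```python
from collections import Counter

def compose_guidance(objects: list[str], caption: str) -> str:
    """Combine detected object labels and a scene caption into one alert string.

    Duplicate labels are counted (e.g. "2 person") so the spoken
    message stays short even in crowded scenes.
    """
    if not objects:
        return f"Scene: {caption}."
    counts = Counter(objects)
    parts = [f"{n} {label}" if n > 1 else label for label, n in counts.items()]
    return f"Ahead: {', '.join(parts)}. Scene: {caption}."
```

For example, `compose_guidance(["person", "person", "car"], "a busy street")` yields `"Ahead: 2 person, car. Scene: a busy street."`, which is then handed to the text-to-speech stage.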
Gradio Interface: Provides an easy-to-use web application where users can:
- Upload images
- Upload videos
- Use a live webcam
The interface displays:
- Detected objects
- Scene descriptions
- Audio guidance
```shell
git clone https://github.com/BhaveshBhakta/Smart-Navigation-Stick-Using-VLM.git
cd Smart-Navigation-Stick-Using-VLM
pip install -r requirements.txt
```
YOLOv8 will automatically download the model weights when running for the first time.
```shell
python app.py
```
After running the application, open the local Gradio interface:
`http://127.0.0.1:7860`
- Capture visual input from images, videos, or webcam.
- Detect objects in the environment using YOLOv8.
- Generate scene descriptions using the BLIP Vision-Language Model.
- Combine detected objects and captions to create navigation guidance.
- Convert navigation text into audio using Google Text-to-Speech.
- Deliver spoken guidance to the user.
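The steps above can be glued together as one small pipeline function. In the sketch below the heavy models are passed in as plain callables, so the control flow can be shown (and tested) without loading YOLOv8, BLIP, or gTTS; all names and signatures are illustrative:

```python
from typing import Callable

def run_pipeline(
    frame,                                  # raw image, e.g. a numpy array
    detect: Callable[[object], list[str]],  # YOLOv8 wrapper -> object labels
    caption: Callable[[object], str],       # BLIP wrapper   -> scene caption
    speak: Callable[[str], bytes],          # gTTS wrapper   -> audio bytes
) -> tuple[str, bytes]:
    """Run one frame through detection, captioning, and speech synthesis."""
    objects = detect(frame)
    description = caption(frame)
    guidance = f"Detected: {', '.join(objects) or 'nothing'}. {description}"
    return guidance, speak(guidance)
```

In the real application, `detect` would wrap an `ultralytics` YOLO model, `caption` a BLIP model, and `speak` gTTS; any stand-ins with the same signatures work for experimentation.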
```
Smart-Navigation-System
│
├── app.py
├── modules
│   ├── caption.py
│   ├── detection.py
│   ├── navigation.py
│   ├── tts.py
│   ├── video_processing.py
│   └── webcam_processing.py
│
├── training
│   ├── train_blip.py
│   ├── evaluate_model.py
│   └── dataset_loader.py
│
├── dataset
│
├── requirements.txt
└── README.md
```
Distance Estimation
Integrate depth estimation to determine how far objects are from the user.
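Once a per-object distance estimate is available (for instance from a monocular depth model such as MiDaS), distances could be bucketed into coarse spoken zones rather than read out as raw numbers. A sketch of the bucketing step only; the thresholds and zone names are illustrative assumptions:

```python
def distance_zone(distance_m: float) -> str:
    """Map an estimated distance in metres to a coarse spoken warning zone."""
    if distance_m < 1.0:
        return "very close"
    if distance_m < 3.0:
        return "near"
    if distance_m < 8.0:
        return "ahead"
    return "far"
```

A chair estimated at 0.8 m would then be announced as "very close", which is more useful to a listener than a numeric readout.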
Mobile Deployment
Convert the system into a mobile application for real-world use.
Improved Scene Understanding
Use advanced Vision-Language Models such as InstructBLIP or LLaVA for richer descriptions.
Edge Deployment
Optimize models for deployment on edge devices such as Raspberry Pi or smart glasses.