Jranii/ai_assistant

AI-Enabled Real-Time Voice Transcription & Reflexive Question Generator

---

📌 Project Overview

This project is an AI-powered healthcare consultation assistant built using:

  • Flask
  • Hugging Face Transformers
  • OpenAI Whisper
  • Microsoft Phi-2
  • Google FLAN-T5
  • OCR + Speech Technologies

The system simulates a doctor-patient consultation workflow where:

✅ Doctor asks medical questions
✅ Patient responds using voice
✅ Speech is converted into text in real time
✅ AI analyzes the responses
✅ AI generates contextual follow-up questions
✅ A final medical summary is generated automatically


✨ Core Features


🎙️ 1. Real-Time Voice Transcription

Uses the OpenAI Whisper model to:

  • Capture microphone audio
  • Detect language automatically
  • Convert speech into text

Supported Languages

  • English
  • Hindi

🤖 2. AI Reflexive Question Generation

Uses the Microsoft Phi-2 transformer model to:

  • Analyze patient responses
  • Understand conversation context
  • Generate intelligent medical follow-up questions

Example

Doctor:
Do you smoke?

Patient:
Yes, occasionally.

AI Follow-Up:
How many cigarettes do you smoke per day?
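
A minimal sketch of how a prompt for such a follow-up might be assembled before it is handed to Phi-2. The function name `build_followup_prompt` and the prompt wording are assumptions made for illustration; they are not taken from modules/nlu.py.

```python
def build_followup_prompt(history, question, answer):
    """Assemble a text prompt asking the model for one follow-up question.

    `history` is a list of (speaker, text) pairs from earlier turns.
    """
    lines = ["You are a doctor conducting a medical interview."]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"Doctor: {question}")
    lines.append(f"Patient: {answer}")
    lines.append("Ask one short, relevant follow-up question.")
    # Ending on the speaker label nudges the model to continue as the doctor.
    lines.append("Doctor:")
    return "\n".join(lines)


prompt = build_followup_prompt([], "Do you smoke?", "Yes, occasionally.")
print(prompt)
```

The returned string would then be tokenized and passed to the model; the generation call itself is left out because its parameters are not documented here.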


🩺 3. Medical Consultation Flow

The consultation supports:

  • Structured medical interview flow
  • Dynamic AI-generated follow-up questions
  • Conversation history tracking
  • Automated consultation state management

📝 4. AI Medical Summary Generation

Uses the Google FLAN-T5 model to:

  • Analyze the complete conversation transcript
  • Generate concise medical summaries
  • Highlight possible risks and important observations

🖼️ 5. OCR Image Upload Support

Users can upload:

  • Medical reports
  • Prescriptions
  • Lab reports

The system:

  • Extracts text using OCR (Tesseract)
  • Processes extracted text
  • Generates AI summaries automatically

🔊 6. Text-to-Speech Support

Doctor questions are converted into audio using gTTS.

Benefits:

  • Audio-based interaction
  • Accessibility support
  • Natural conversational experience

🛠️ Technologies Used


Backend

  • Python
  • Flask

AI / NLP Models

  • OpenAI Whisper
  • Microsoft Phi-2
  • Google FLAN-T5

AI Libraries

  • Hugging Face Transformers
  • PyTorch

Speech & Audio

  • gTTS
  • SpeechRecognition

OCR

  • pytesseract
  • Pillow (PIL)

Frontend

  • HTML
  • CSS
  • JavaScript
  • Web Speech API

📂 Project Structure

AI_Assistant/                          # Root project folder containing the complete AI healthcare assistant application
│
├── app.py                             # Main Flask backend application handling routes, API calls, conversation flow, TTS, OCR, and frontend communication
│
├── config.py                          # Centralized configuration file for environment variables, logging setup, and application-level settings
│
├── medical_script.json                # Structured predefined medical questionnaire used during consultation flow
│
├── requirements.txt                   # List of all Python dependencies/packages required to run the project
│
├── README.md                          # Complete project documentation, setup guide, workflow explanation, and developer handover notes
│
├── assets/                            # Stores README screenshots/images
│   └── homepage.png                   # Homepage screenshot used in README
│
├── uploads/                           # Stores uploaded images temporarily for OCR processing (created automatically during runtime)
│
├── question.mp3                       # Generated doctor audio question file using gTTS (runtime generated)
│
├── consultation_summary.json          # Final generated consultation summary and transcript output (runtime generated)
│
├── modules/                           # Folder containing all core AI/NLP/business logic modules
│   │
│   ├── __init__.py                    # Marks 'modules' directory as a Python package
│   │
│   ├── nlu.py                         # AI follow-up question generation using Phi-2
│   │
│   ├── state_manager.py               # Conversation state and history management
│   │
│   ├── summary.py                     # Medical summary generation using FLAN-T5
│   │
│   └── transcription.py               # Whisper-based speech-to-text transcription
│
└── templates/                         # Flask HTML templates
    │
    └── index.html                     # Frontend chat interface and browser speech logic

📸 Project Screenshot

Homepage


📄 File-by-File Explanation


📌 app.py

Main Flask backend application.

Responsibilities

  • Initializes Flask server
  • Handles frontend routes
  • Manages API endpoints
  • Starts consultation
  • Processes patient responses
  • Generates doctor audio
  • Handles image uploads
  • Performs OCR processing

Main Routes

  • /
  • /start
  • /send_response
  • /upload_image

📌 config.py

Stores application configuration settings.

Contains

  • Environment variables
  • Logging configuration
  • Timeout values
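
A minimal sketch of what a configuration module along these lines could look like. The setting names `REQUEST_TIMEOUT_SECONDS` and `LOG_LEVEL`, their defaults, and the `configure_logging` helper are hypothetical; the actual config.py may be organized differently.

```python
import logging
import os

# Hypothetical setting names and defaults, read from environment variables.
REQUEST_TIMEOUT_SECONDS = int(os.environ.get("REQUEST_TIMEOUT_SECONDS", "30"))
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO").upper()


def configure_logging(level_name=LOG_LEVEL):
    """Configure root logging and return the numeric level that was chosen.

    Unknown level names fall back to INFO instead of raising.
    """
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level


configure_logging()
logging.getLogger("ai_assistant").info(
    "timeout is %s seconds", REQUEST_TIMEOUT_SECONDS
)
```

Keeping these values in one module means the rest of the application imports settings instead of reading the environment directly.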

📌 medical_script.json

Contains predefined medical interview questions.

Example Structure

{
  "modules": [
    {
      "name": "General Health",
      "questions": [
        "How are you feeling today?",
        "Do you have fever?"
      ]
    }
  ]
}
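
A short sketch of how a script with this structure could be flattened into a question queue. The function name `load_question_queue` is an assumption, and the JSON is inlined as a string only to keep the example self-contained; the application would read medical_script.json from disk.

```python
import json
from collections import deque

SCRIPT_TEXT = """
{
  "modules": [
    {
      "name": "General Health",
      "questions": ["How are you feeling today?", "Do you have fever?"]
    }
  ]
}
"""


def load_question_queue(script_text):
    """Flatten every module's questions into one FIFO queue, in file order."""
    script = json.loads(script_text)
    queue = deque()
    for module in script.get("modules", []):
        queue.extend(module.get("questions", []))
    return queue


queue = load_question_queue(SCRIPT_TEXT)
print(queue.popleft())  # the first scripted question
```

A `deque` is used because questions are consumed from the front, which a list does less efficiently.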

📌 modules/nlu.py

Natural Language Understanding module.

Responsibilities

  • Load Microsoft Phi-2 model
  • Analyze patient responses
  • Generate contextual follow-up questions

📌 modules/state_manager.py

Responsible for:

  • Conversation state management
  • Conversation history tracking
  • Question tracking

Acts as the memory manager of the system.
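
To illustrate the three responsibilities listed above, here is a minimal in-memory sketch. The class name `ConversationState` and its fields are hypothetical and are not taken from modules/state_manager.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConversationState:
    """In-memory record of one consultation (hypothetical structure)."""

    pending: List[str] = field(default_factory=list)  # questions not yet asked
    history: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    current: Optional[str] = None  # question awaiting an answer

    def next_question(self):
        """Pop the next pending question, or return None when none remain."""
        self.current = self.pending.pop(0) if self.pending else None
        return self.current

    def record_answer(self, answer):
        """Store the answer against the question currently being asked."""
        if self.current is None:
            raise RuntimeError("no question is awaiting an answer")
        self.history.append((self.current, answer))
        self.current = None


state = ConversationState(
    pending=["How are you feeling today?", "Do you have fever?"]
)
state.next_question()
state.record_answer("Tired")
```

A follow-up generated by the model could be inserted at the front of `pending` so that it is asked before the next scripted question.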


📌 modules/summary.py

Uses the FLAN-T5 model to:

  • Generate final consultation summaries
  • Highlight important medical risks
  • Produce concise reports

📌 modules/transcription.py

Handles:

  • Speech-to-text conversion
  • Microphone audio capture
  • Language detection
  • Whisper transcription pipeline

NOTE:
The current web version mainly relies on the browser's speech recognition (Web Speech API).
This module is primarily useful for CLI/local transcription workflows.


📌 templates/index.html

Frontend user interface.

Contains

  • Chat interface
  • Speech buttons
  • Audio playback
  • AJAX communication with Flask backend

⚙️ Prerequisites

Before running the project, ensure the following are installed.


1️⃣ Python

Required Version

  • Python 3.9+

Verify Installation

python --version

2️⃣ pip

Python package manager.

Verify

pip --version

3️⃣ Virtual Environment Support

Verify

python -m venv --help

4️⃣ Modern Browser

Recommended

  • Google Chrome

Reason

Better support for:

  • Web Speech API
  • Microphone access
  • Audio playback

5️⃣ Working Microphone

Required for:

  • Real-time speech input

🚀 Installation Setup


STEP 1: Open Project Folder

Open a terminal inside the project directory.

cd AI_Assistant

STEP 2: Create Virtual Environment

Windows

python -m venv venv

macOS / Linux

python3 -m venv venv

Purpose

  • Creates isolated Python environment
  • Prevents dependency conflicts
  • Keeps project dependencies separate

STEP 3: Activate Virtual Environment

Windows

venv\Scripts\activate

macOS / Linux

source venv/bin/activate

After activation, the terminal prompt may show:

(venv)

STEP 4: Install Dependencies

pip install -r requirements.txt --upgrade

This installs:

  • Flask
  • Transformers
  • Torch
  • gTTS
  • pytesseract
  • Pillow
  • SpeechRecognition
  • etc.

NOTE:
The first installation may take a while because packages such as PyTorch are large. Model weights are downloaded separately the first time the application runs.


STEP 5: Run Application

python app.py

Expected Output

* Running on http://127.0.0.1:5000

STEP 6: Open Browser

Open:

http://127.0.0.1:5000

🔄 FULL APPLICATION FLOW (VERY DETAILED)

This section explains EXACTLY what happens internally from the moment the developer saves files and runs the project.


🧠 COMPLETE SYSTEM FLOW


STEP 1: Developer Saves Files

Developer presses:

CTRL + S

Files saved:

app.py
config.py
modules/nlu.py
modules/summary.py
modules/state_manager.py
modules/transcription.py
templates/index.html
medical_script.json

STEP 2: Application Starts

Command:

python app.py

Python starts executing:

app.py

Once the module-level code has run, the server is started from the entry-point guard:

if __name__ == "__main__":

STEP 3: Flask Server Initializes

Inside:

app = Flask(__name__)

Flask initializes:

  • Web server
  • API routing
  • Frontend rendering system

STEP 4: All Modules Load

Files loaded:

  • config.py
  • modules/nlu.py
  • modules/summary.py
  • modules/state_manager.py

Libraries loaded:

  • Flask
  • torch
  • transformers
  • gTTS
  • pytesseract

STEP 5: Browser Opens Application

User opens:

http://127.0.0.1:5000

Browser sends request to Flask backend.


STEP 6: Frontend Loads

Flask renders:

templates/index.html

Browser displays:

  • Start Consultation button
  • Chat interface
  • Speak button
  • Upload image option

STEP 7: User Starts Consultation

Frontend sends POST request to:

/start

Backend function executed:

start_conversation()

STEP 8: Medical Script Loads

File used:

medical_script.json

Questions are loaded into a queue.


STEP 9: Doctor Question Generated

Question:

  • displayed on frontend
  • converted to audio using gTTS

Generated file:

question.mp3

STEP 10: User Speaks

Browser activates microphone using:

  • Web Speech API

Speech is converted into text.


STEP 11: Backend Receives Response

Frontend sends response to:

/send_response

Backend:

  • saves conversation history
  • checks stop phrases
  • generates next question
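
The three backend steps above can be sketched as follows. The stop phrases, the function names, and the returned action labels are all hypothetical; the actual list and logic in app.py are not shown in this README.

```python
import re

# Hypothetical stop phrases; the backend's real list is not documented here.
STOP_PHRASES = ("stop", "end consultation", "that's all")


def is_stop_phrase(response, phrases=STOP_PHRASES):
    """Return True when the whole response, once normalized, is a stop phrase."""
    # Lowercase, replace punctuation with spaces, then collapse whitespace.
    normalized = re.sub(r"[^a-z' ]+", " ", response.lower())
    normalized = " ".join(normalized.split())
    return normalized in phrases


def handle_response(history, question, response):
    """Save the turn, then decide whether to summarize or keep asking."""
    history.append({"question": question, "answer": response})
    return "summary" if is_stop_phrase(response) else "next_question"


history = []
action = handle_response(history, "Do you have fever?", "Stop!")
```

Matching the whole response instead of a substring avoids ending the consultation on answers such as "I can't stop coughing".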

STEP 12: AI Reflexive Question Generation

File responsible:

modules/nlu.py

Model used:

Microsoft Phi-2

AI analyzes:

  • patient response
  • context
  • medical intent

Then generates intelligent follow-up questions.


STEP 13: Consultation Continues

Loop repeats:

Speak
→ Convert Speech
→ Analyze Response
→ Generate Follow-Up
→ Ask Next Question
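
A simplified sketch of this loop, with speech capture and the model call replaced by plain callables. `run_consultation`, `get_answer`, `make_followup`, and the `max_followups` limit are assumptions for illustration only.

```python
def run_consultation(questions, get_answer, make_followup, max_followups=1):
    """Drive the ask/answer/follow-up loop and return the transcript.

    `get_answer` stands in for speech capture plus transcription, and
    `make_followup` stands in for the Phi-2 call; both are placeholders.
    """
    transcript = []
    for question in questions:
        pending = [question]
        followups_used = 0
        while pending:
            asked = pending.pop(0)
            answer = get_answer(asked)
            transcript.append((asked, answer))
            # Cap follow-ups per scripted question so the loop always ends.
            if followups_used < max_followups:
                followup = make_followup(asked, answer)
                if followup:
                    pending.append(followup)
                    followups_used += 1
    return transcript


answers = {
    "Do you smoke?": "Yes, occasionally.",
    "How many cigarettes do you smoke per day?": "Two or three.",
}
transcript = run_consultation(
    ["Do you smoke?"],
    get_answer=answers.get,
    make_followup=lambda q, a: (
        "How many cigarettes do you smoke per day?" if a.startswith("Yes") else None
    ),
)
```

The cap on follow-ups is a design choice of this sketch: without it, a model that always returns a question would keep the loop running indefinitely.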

STEP 14: Final Summary Generation

File responsible:

modules/summary.py

Model used:

Google FLAN-T5

Generates:

  • medical summary
  • risks
  • important observations

STEP 15: Final Result Displayed

Frontend displays:

  • consultation summary
  • transcript
  • completion message
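
The project structure lists consultation_summary.json as the runtime output holding the summary and transcript. A minimal sketch of writing such a file is shown here; the function name `save_consultation` and the JSON field names are assumptions, since the actual file layout is not documented.

```python
import json
import tempfile
from pathlib import Path


def save_consultation(path, summary, transcript):
    """Write the summary and transcript to a JSON file and return the payload."""
    payload = {
        "summary": summary,
        "transcript": [
            {"question": question, "answer": answer}
            for question, answer in transcript
        ],
    }
    text = json.dumps(payload, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
    return payload


# Written to a temporary directory here so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "consultation_summary.json"
    save_consultation(
        out, "Occasional smoker.", [("Do you smoke?", "Yes, occasionally.")]
    )
    saved = json.loads(out.read_text(encoding="utf-8"))
```

`ensure_ascii=False` keeps non-English answers, such as Hindi responses, readable in the saved file.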

🖼️ IMAGE UPLOAD FLOW


STEP 1: User Uploads Image

Examples:

  • Prescription
  • Lab report
  • Medical document

STEP 2: Flask Receives Image

Route executed:

/upload_image

STEP 3: OCR Executes

Libraries used:

  • pytesseract
  • Pillow (PIL)

The system extracts text from the uploaded image.
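
Raw OCR output (for example, the string returned by `pytesseract.image_to_string`) usually contains stray whitespace and blank lines. Below is a small sketch of a cleanup step that could run before summarization; the helper name `clean_ocr_text` is hypothetical and the project may process the text differently.

```python
import re


def clean_ocr_text(raw):
    """Normalize OCR output: trim lines, collapse runs of spaces, drop blank lines."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


raw = "  Paracetamol   500 mg \n\n\n Twice  daily\tafter food \n"
cleaned = clean_ocr_text(raw)
print(cleaned)
```

Line breaks are preserved because prescriptions and lab reports often carry meaning in their line structure.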


STEP 4: AI Summary Generated

Extracted text sent to:

modules/summary.py

FLAN-T5 generates a concise medical summary.


STEP 5: Summary Displayed

Frontend shows:

  • extracted text
  • generated summary
  • insights

📌 Important Notes


⏳ Initial Startup May Be Slow

Reason:

  • AI models download from Hugging Face
  • Large model weights load into memory

⚡ GPU Recommended

For better performance:

  • NVIDIA GPU with CUDA recommended
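
A small sketch of choosing the device at startup and falling back to the CPU when CUDA is unavailable. The helper name `pick_device` is an assumption; the import is guarded only so the snippet also runs where PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing; nothing can run on a GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running models on: {device}")
```

The resulting string can be passed to a model's `.to(device)` call so the same code runs on machines with and without a GPU.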

🌐 Browser Recommendation

Recommended

  • Google Chrome

Avoid

  • Older browsers with limited speech API support

🚀 Future Improvements

Possible enhancements:

  • Database integration
  • Authentication system
  • Multi-user support
  • PDF report generation
  • Medical entity extraction
  • Emotion detection
  • Doctor dashboard
  • Cloud deployment
  • Streaming transcription
  • Real-time multilingual translation

👨‍💻 Author Notes

This project demonstrates:

✅ AI + Healthcare integration
✅ Real-time NLP systems
✅ Transformer model pipelines
✅ Voice-enabled conversational AI
✅ End-to-end AI application architecture

Suitable for:

  • AI/ML portfolios
  • NLP projects
  • Healthcare AI demos
  • Research prototypes
  • Internship showcases

⭐ If You Like This Project

Please consider:

  • Starring the repository
  • Forking the project
  • Contributing improvements
