AI-Enabled Real-Time Voice Transcription & Reflexive Question Generator

---

📌 Project Overview

This project is an AI-powered healthcare consultation assistant built using:

Flask
Hugging Face Transformers
OpenAI Whisper
Microsoft Phi-2
Google FLAN-T5
OCR + Speech Technologies

The system simulates a doctor-patient consultation workflow where:

✅ Doctor asks medical questions
✅ Patient responds using voice
✅ Speech is converted into text in real time
✅ AI analyzes the patient's responses
✅ AI generates contextual follow-up questions
✅ A final medical summary is generated automatically

✨ Core Features

🎙️ 1. Real-Time Voice Transcription

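Uses the OpenAI Whisper model to:

Capture microphone audio
Detect language automatically
Convert speech into text

In the web interface, live speech is captured with the browser's Web Speech API; the Whisper pipeline in modules/transcription.py is used mainly for CLI/local transcription.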

Supported Languages

English
Hindi

🤖 2. AI Reflexive Question Generation

Uses the Microsoft Phi-2 transformer model to:

Analyze patient responses
Understand conversation context
Generate intelligent medical follow-up questions

Example

Doctor:
Do you smoke?

Patient:
Yes, occasionally.

AI Follow-Up:
How many cigarettes do you smoke per day?

🩺 3. Medical Consultation Flow

The consultation supports:

Structured medical interview flow
Dynamic AI-generated follow-up questions
Conversation history tracking
Automated consultation state management

📝 4. AI Medical Summary Generation

Uses the Google FLAN-T5 model to:

Analyze the complete conversation transcript
Generate concise medical summaries
Highlight possible risks and important observations

🖼️ 5. OCR Image Upload Support

Users can upload:

Medical reports
Prescriptions
Lab reports

The system:

Extracts text using OCR (Tesseract)
Processes extracted text
Generates AI summaries automatically

🔊 6. Text-to-Speech Support

Doctor questions are converted into audio using gTTS.

Benefits:

Audio-based interaction
Accessibility support
Natural conversational experience

🛠️ Technologies Used

Backend

Python
Flask

AI / NLP Models

OpenAI Whisper
Microsoft Phi-2
Google FLAN-T5

AI Libraries

Hugging Face Transformers
PyTorch

Speech & Audio

gTTS
SpeechRecognition

OCR

pytesseract
Pillow (PIL)

Frontend

HTML
CSS
JavaScript
Web Speech API

📂 Project Structure

AI_Assistant/                          # Root project folder containing the complete AI healthcare assistant application
│
├── app.py                             # Main Flask backend application handling routes, API calls, conversation flow, TTS, OCR, and frontend communication
│
├── config.py                          # Centralized configuration file for environment variables, logging setup, and application-level settings
│
├── medical_script.json                # Structured predefined medical questionnaire used during consultation flow
│
├── requirements.txt                   # List of all Python dependencies/packages required to run the project
│
├── README.md                          # Complete project documentation, setup guide, workflow explanation, and developer handover notes
│
├── assets/                            # Stores README screenshots/images
│   └── homepage.png                   # Homepage screenshot used in README
│
├── uploads/                           # Stores uploaded images temporarily for OCR processing (created automatically during runtime)
│
├── question.mp3                       # Generated doctor audio question file using gTTS (runtime generated)
│
├── consultation_summary.json          # Final generated consultation summary and transcript output (runtime generated)
│
├── modules/                           # Folder containing all core AI/NLP/business logic modules
│   │
│   ├── __init__.py                    # Marks 'modules' directory as a Python package
│   │
│   ├── nlu.py                         # AI follow-up question generation using Phi-2
│   │
│   ├── state_manager.py               # Conversation state and history management
│   │
│   ├── summary.py                     # Medical summary generation using FLAN-T5
│   │
│   └── transcription.py               # Whisper-based speech-to-text transcription
│
└── templates/                         # Flask HTML templates
    │
    └── index.html                     # Frontend chat interface and browser speech logic

📸 Project Screenshot

Homepage screenshot: assets/homepage.png

📄 File-by-File Explanation

📌 `app.py`

Main Flask backend application.

Responsibilities

Initializes Flask server
Handles frontend routes
Manages API endpoints
Starts consultation
Processes patient responses
Generates doctor audio
Handles image uploads
Performs OCR processing

Main Routes

/
/start
/send_response
/upload_image
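The /send_response route receives the patient's transcribed answer from the frontend. A minimal sketch of validating that request body before it reaches the AI modules, assuming a JSON payload shaped like {"response": "..."} (the key name and the function name parse_send_response are hypothetical, not taken from app.py):

```python
import json
from typing import Any, Dict


def parse_send_response(body: bytes) -> str:
    """Validate a /send_response JSON body and return the patient's text.

    Assumes a payload like {"response": "..."}; the key name is hypothetical.
    """
    try:
        payload: Dict[str, Any] = json.loads(body or b"{}")
    except json.JSONDecodeError as exc:
        raise ValueError("Request body is not valid JSON") from exc
    # Reject non-object payloads and missing/blank answers.
    text = payload.get("response") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Missing or empty 'response' field")
    return text.strip()


# In a Flask view this could be called as parse_send_response(request.get_data()).
print(parse_send_response(b'{"response": "Yes, occasionally."}'))
```

Keeping the validation in a plain function makes it testable without starting the Flask server.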

📌 `config.py`

Stores application configuration settings.

Contains

Environment variables
Logging configuration
Timeout values
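A minimal sketch of how config.py could centralize these settings, assuming hypothetical environment variable names (AI_ASSISTANT_LOG_LEVEL, AI_ASSISTANT_TIMEOUT, AI_ASSISTANT_UPLOAD_FOLDER); the project's actual names and defaults may differ:

```python
import logging
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Application-level settings (field names are illustrative assumptions)."""
    log_level: str
    request_timeout: float
    upload_folder: str


def load_settings() -> Settings:
    # Read environment variables, falling back to defaults when unset.
    return Settings(
        log_level=os.environ.get("AI_ASSISTANT_LOG_LEVEL", "INFO").upper(),
        request_timeout=float(os.environ.get("AI_ASSISTANT_TIMEOUT", "30")),
        upload_folder=os.environ.get("AI_ASSISTANT_UPLOAD_FOLDER", "uploads"),
    )


def configure_logging(settings: Settings) -> logging.Logger:
    # Unknown level names fall back to INFO; basicConfig does nothing
    # if the root logger already has handlers.
    logging.basicConfig(
        level=getattr(logging, settings.log_level, logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("ai_assistant")


settings = load_settings()
logger = configure_logging(settings)
```

Other modules would then import settings and logger from config instead of reading os.environ themselves.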

📌 `medical_script.json`

Contains predefined medical interview questions.

Example Structure

{
  "modules": [
    {
      "name": "General Health",
      "questions": [
        "How are you feeling today?",
        "Do you have fever?"
      ]
    }
  ]
}
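Given the structure above, a minimal sketch of flattening the script into a first-in, first-out question queue (the function name load_question_queue is hypothetical; the project's loader may differ):

```python
import json
from collections import deque
from typing import Deque, Tuple


def load_question_queue(raw_json: str) -> Deque[Tuple[str, str]]:
    """Flatten the scripted modules into a FIFO queue of (module name, question)."""
    script = json.loads(raw_json)
    queue: Deque[Tuple[str, str]] = deque()
    for module in script.get("modules", []):
        name = module.get("name", "Unnamed module")
        for question in module.get("questions", []):
            queue.append((name, question))
    return queue


EXAMPLE = """
{
  "modules": [
    {
      "name": "General Health",
      "questions": ["How are you feeling today?", "Do you have fever?"]
    }
  ]
}
"""

questions = load_question_queue(EXAMPLE)
module_name, first_question = questions.popleft()
print(module_name, "->", first_question)  # General Health -> How are you feeling today?
```

In the application the input would come from the file instead, for example load_question_queue(Path("medical_script.json").read_text(encoding="utf-8")) with pathlib.Path.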

📌 `modules/nlu.py`

Natural Language Understanding module.

Responsibilities

Load Microsoft Phi-2 model
Analyze patient responses
Generate contextual follow-up questions
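A minimal sketch of the follow-up step: build a prompt from recent conversation turns, call a text generator, and keep only the first question it produces. The prompt loosely follows the "Instruct: ... Output:" format suggested on the Phi-2 model card; the prompt wording, function names, and the injected generate callable are assumptions, not the project's nlu.py:

```python
from typing import Callable, List, Tuple

# (speaker, text) pairs, e.g. ("Doctor", "Do you smoke?")
History = List[Tuple[str, str]]


def build_follow_up_prompt(history: History, max_turns: int = 6) -> str:
    # Keep only the most recent turns so the prompt stays short.
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in history[-max_turns:])
    return (
        "Instruct: You are assisting a doctor. Based on the conversation below, "
        "ask ONE short, relevant medical follow-up question.\n"
        f"{dialogue}\n"
        "Output:"
    )


def extract_question(generated: str, prompt: str) -> str:
    # Text-generation pipelines usually echo the prompt; strip it if present.
    text = generated[len(prompt):] if generated.startswith(prompt) else generated
    # Keep only the first line, cut just after the first question mark.
    first_line = text.strip().splitlines()[0] if text.strip() else ""
    if "?" in first_line:
        first_line = first_line[: first_line.index("?") + 1]
    return first_line.strip()


def generate_follow_up(history: History, generate: Callable[[str], str]) -> str:
    prompt = build_follow_up_prompt(history)
    return extract_question(generate(prompt), prompt)


# A real generator could wrap Phi-2 (assumes transformers is installed; downloads the model):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="microsoft/phi-2")
# generate = lambda p: pipe(p, max_new_tokens=40)[0]["generated_text"]

def fake_generate(prompt: str) -> str:
    # Stand-in that mimics a model echoing the prompt and rambling on.
    return prompt + " How many cigarettes do you smoke per day?\nDoctor: ..."


history = [("Doctor", "Do you smoke?"), ("Patient", "Yes, occasionally.")]
print(generate_follow_up(history, fake_generate))  # How many cigarettes do you smoke per day?
```

Injecting the generator keeps the prompt and post-processing logic testable without loading the model.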

📌 `modules/state_manager.py`

Responsible for:

Conversation state management
Conversation history tracking
Question tracking

Acts as the memory manager of the system.
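A minimal sketch of such a memory manager, assuming a single in-memory object per consultation (the class and field names are hypothetical, not the actual state_manager.py):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConversationState:
    """In-memory record of one consultation (field names are illustrative)."""
    history: List[Tuple[str, str]] = field(default_factory=list)
    asked_questions: List[str] = field(default_factory=list)
    current_question: Optional[str] = None
    finished: bool = False

    def ask(self, question: str) -> None:
        # Track the question as the current prompt and in the transcript.
        self.current_question = question
        self.asked_questions.append(question)
        self.history.append(("Doctor", question))

    def record_answer(self, answer: str) -> None:
        self.history.append(("Patient", answer))

    def already_asked(self, question: str) -> bool:
        # Case-insensitive check to avoid repeating a follow-up question.
        asked = {q.strip().lower() for q in self.asked_questions}
        return question.strip().lower() in asked

    def transcript(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.history)


state = ConversationState()
state.ask("Do you smoke?")
state.record_answer("Yes, occasionally.")
print(state.transcript())
```

A single object like this serves one consultation at a time; multi-user support would need one state per session.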

📌 `modules/summary.py`

Uses FLAN-T5 model to:

Generate final consultation summaries
Highlight important medical risks
Produce concise reports
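FLAN-T5 checkpoints are usually run with fairly short inputs (around 512 tokens), so a long transcript may need to be split before summarization. A minimal sketch, assuming a rough word-based split, a hypothetical prompt wording, and an injected generate callable (the google/flan-t5-base checkpoint in the comments is also an assumption):

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    # Rough word-based split; a tokenizer-based count would be more precise.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_transcript(
    transcript: str, generate: Callable[[str], str], max_words: int = 300
) -> str:
    # Summarize each chunk separately and join the partial summaries.
    partial = []
    for chunk in chunk_words(transcript, max_words):
        prompt = (
            "Summarize this doctor-patient consultation. "
            "List symptoms, possible risks and important observations.\n\n" + chunk
        )
        partial.append(generate(prompt).strip())
    return "\n".join(partial)


# A real generator could wrap FLAN-T5 (assumes transformers + torch are installed):
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
# model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
# def generate(prompt):
#     ids = tok(prompt, return_tensors="pt", truncation=True, max_length=512).input_ids
#     return tok.decode(model.generate(ids, max_new_tokens=128)[0], skip_special_tokens=True)

print(summarize_transcript("Doctor: Do you smoke? Patient: Yes, occasionally.", lambda p: "Occasional smoker."))
```

For very long consultations the joined partial summaries could be summarized once more to produce a single short report.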

📌 `modules/transcription.py`

Handles:

Speech-to-text conversion
Microphone audio capture
Language detection
Whisper transcription pipeline

NOTE:
The current web version mainly uses the browser's speech recognition (Web Speech API).
This module is primarily useful for CLI/local transcription workflows.
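The openai-whisper package's transcribe() returns a dictionary that includes the recognized "text" and the detected "language" code. A minimal sketch of enforcing the English/Hindi restriction on that result (the helper name and the choice to reject other languages are assumptions, not the project's transcription.py):

```python
from typing import Dict, Tuple

SUPPORTED_LANGUAGES = {"en": "English", "hi": "Hindi"}


def normalize_transcription(result: Dict) -> Tuple[str, str]:
    """Return (clean_text, language_name) from a Whisper-style result dict.

    Raises ValueError if the detected language is not English or Hindi.
    """
    language = result.get("language", "")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language detected: {language!r}")
    text = " ".join(result.get("text", "").split())  # collapse extra whitespace
    return text, SUPPORTED_LANGUAGES[language]


# Real usage (assumes openai-whisper is installed; "recording.wav" is a placeholder path):
# import whisper
# model = whisper.load_model("base")
# text, language = normalize_transcription(model.transcribe("recording.wav"))

print(normalize_transcription({"text": " Yes,  occasionally. ", "language": "en"}))
```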

📌 `templates/index.html`

Frontend user interface.

Contains

Chat interface
Speech buttons
Audio playback
AJAX communication with Flask backend

⚙️ Prerequisites

Before running the project, ensure the following are installed.

1️⃣ Python

Required Version

Python 3.9+

Verify Installation

python --version

2️⃣ pip

Python package manager.

Verify

pip --version

3️⃣ Virtual Environment Support

Verify

python -m venv --help

4️⃣ Modern Browser

Reason

Better support for:

Web Speech API
Microphone access
Audio playback

5️⃣ Working Microphone

Required for:

Real-time speech input

🚀 Installation Setup

STEP 1: Open Project Folder

Open a terminal inside the project directory.

cd AI_Assistant

STEP 2: Create Virtual Environment

Windows

python -m venv venv

macOS / Linux

python3 -m venv venv

Purpose

Creates isolated Python environment
Prevents dependency conflicts
Keeps project dependencies separate

STEP 3: Activate Virtual Environment

Windows

venv\Scripts\activate

macOS / Linux

source venv/bin/activate

After activation, the terminal prompt may show:

(venv)

STEP 4: Install Dependencies

pip install -r requirements.txt --upgrade

This installs:

Flask
Transformers
Torch
gTTS
pytesseract
Pillow
SpeechRecognition
etc.

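NOTE:
The first installation may take a while because packages such as PyTorch are large. The AI model weights themselves are downloaded separately, when the models are first loaded.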

STEP 5: Run Application

python app.py

Expected Output

* Running on http://127.0.0.1:5000

STEP 6: Open Browser

Open:

http://127.0.0.1:5000

🔄 FULL APPLICATION FLOW (VERY DETAILED)

This section explains EXACTLY what happens internally, from the moment the developer saves the files and runs the project.

🧠 COMPLETE SYSTEM FLOW

STEP 1: Developer Saves Files

Developer presses:

CTRL + S

Files saved:

app.py
config.py
modules/nlu.py
modules/summary.py
modules/state_manager.py
modules/transcription.py
templates/index.html
medical_script.json

STEP 2: Application Starts

Command:

python app.py

Python starts executing:

app.py

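Python executes app.py from top to bottom: the imports run first, then module-level code such as app = Flask(__name__), and finally the block under:

if __name__ == "__main__":

which starts the Flask development server.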

STEP 3: Flask Server Initializes

Inside:

app = Flask(__name__)

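This creates the Flask application object, which provides:

URL routing for pages and API endpoints
Template rendering for the frontend

The development web server itself starts later, when the __main__ block runs.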

STEP 4: All Modules Load

Files loaded:

config.py
modules/nlu.py
modules/summary.py
modules/state_manager.py

Libraries loaded:

Flask
torch
transformers
gTTS
pytesseract

STEP 5: Browser Opens Application

User opens:

http://127.0.0.1:5000

The browser sends a GET request to the / route of the Flask backend.

STEP 6: Frontend Loads

Flask renders:

templates/index.html

Browser displays:

Start Consultation button
Chat interface
Speak button
Upload image option

STEP 7: User Starts Consultation

The frontend sends a POST request to:

/start

Backend function executed:

start_conversation()

STEP 8: Medical Script Loads

File used:

medical_script.json

The questions are loaded into a queue.

STEP 9: Doctor Question Generated

The question is:

displayed on frontend
converted to audio using gTTS

Generated file:

question.mp3

STEP 10: User Speaks

Browser activates microphone using:

Web Speech API

The patient's voice is converted into text.

STEP 11: Backend Receives Response

The frontend sends the transcribed response to:

/send_response

The backend:

saves the response to the conversation history
checks for stop phrases
generates the next question
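A minimal sketch of the stop-phrase check, assuming case-insensitive, whole-word matching. The phrase list and function name are illustrative assumptions; the project's actual stop phrases are not documented here:

```python
import re
from typing import Iterable

# Illustrative list only; replace with the phrases the application actually uses.
DEFAULT_STOP_PHRASES = ("end consultation", "stop consultation", "that's all", "thank you doctor")


def is_stop_phrase(response: str, stop_phrases: Iterable[str] = DEFAULT_STOP_PHRASES) -> bool:
    # Lower-case, replace punctuation (except apostrophes) with spaces, collapse whitespace.
    normalized = re.sub(r"[^\w\s']", " ", response.lower())
    normalized = " ".join(normalized.split())
    for phrase in stop_phrases:
        # Word boundaries prevent matches inside longer words.
        if re.search(rf"\b{re.escape(phrase)}\b", normalized):
            return True
    return False


print(is_stop_phrase("Thank you, doctor!"))   # True
print(is_stop_phrase("I have a headache."))   # False
```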

STEP 12: AI Reflexive Question Generation

File responsible:

modules/nlu.py

Model used:

Microsoft Phi-2

The AI analyzes:

the patient's response
the conversation context
the medical intent

It then generates a contextual follow-up question.

STEP 13: Consultation Continues

Loop repeats:

Speak
→ Convert Speech
→ Analyze Response
→ Generate Follow-Up
→ Ask Next Question
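The loop needs a rule for when to stop following up and move on to the next scripted question; otherwise the model could keep producing follow-ups indefinitely. A minimal sketch, assuming a cap of one follow-up per scripted question (the cap and the function name choose_next_question are hypothetical):

```python
from collections import deque
from typing import Deque, Optional, Tuple


def choose_next_question(
    scripted: Deque[str],
    follow_up: Optional[str],
    follow_ups_used: int,
    max_follow_ups: int = 1,
) -> Tuple[Optional[str], int]:
    """Pick the next question and return it with the updated follow-up counter.

    A generated follow-up is preferred until max_follow_ups have been asked
    for the current scripted question; then the next scripted question is used.
    Returns (None, 0) when the script is exhausted, i.e. time to summarize.
    """
    if follow_up and follow_ups_used < max_follow_ups:
        return follow_up, follow_ups_used + 1
    if scripted:
        return scripted.popleft(), 0
    return None, 0


script = deque(["Do you smoke?", "Do you have fever?"])
question, used = choose_next_question(script, None, 0)
print(question)  # Do you smoke?
question, used = choose_next_question(script, "How many cigarettes do you smoke per day?", used)
print(question)  # How many cigarettes do you smoke per day?
question, used = choose_next_question(script, "Which brand do you smoke?", used)
print(question)  # Do you have fever?  (follow-up cap reached)
question, used = choose_next_question(script, None, used)
print(question)  # None: script exhausted, generate the summary
```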

STEP 14: Final Summary Generation

File responsible:

modules/summary.py

Model used:

Google FLAN-T5

Generates:

medical summary
risks
important observations

STEP 15: Final Result Displayed

Frontend displays:

consultation summary
transcript
completion message
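The summary and transcript are also written to consultation_summary.json. A minimal sketch of that step; the field names (created_at, summary, transcript) and the function name are assumptions, since the actual file layout is not documented here:

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple


def save_consultation(
    summary: str,
    history: List[Tuple[str, str]],
    path: Path = Path("consultation_summary.json"),
) -> Path:
    # Field names below are illustrative; the project's file layout may differ.
    record = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "summary": summary,
        "transcript": [{"speaker": speaker, "text": text} for speaker, text in history],
    }
    # ensure_ascii=False keeps Hindi (Devanagari) text readable in the file.
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```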

🖼️ IMAGE UPLOAD FLOW

STEP 1: User Uploads Image

Examples:

Prescription
Lab report
Medical document

STEP 2: Flask Receives Image

Route executed:

/upload_image

STEP 3: OCR Executes

Libraries used:

pytesseract
Pillow (PIL)

The system extracts text from the uploaded image.
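Raw OCR output is often noisy, so a small cleanup pass before summarization can help. A minimal sketch, assuming the Tesseract binary plus the pytesseract and Pillow packages are installed (the helper names and cleanup rules are hypothetical, not the project's code):

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output before summarization."""
    text = raw.replace("\x0c", "")          # Tesseract output often ends with a form feed
    text = re.sub(r"-\n(?=\w)", "", text)   # re-join words hyphenated across lines
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)  # drop empty lines


def extract_text(image_path: str) -> str:
    # Imported lazily so clean_ocr_text works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return clean_ocr_text(pytesseract.image_to_string(image))


print(clean_ocr_text("Blood  pres-\nsure: 120/80\n\n\x0c"))  # Blood pressure: 120/80
```

Re-joining hyphenated line breaks can wrongly merge genuinely hyphenated words such as "follow-up", so this rule is a trade-off. For Hindi documents, pytesseract.image_to_string(image, lang="eng+hin") can be used if the Hindi language data for Tesseract is installed.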

STEP 4: AI Summary Generated

The extracted text is sent to:

modules/summary.py

FLAN-T5 generates a concise medical summary.

STEP 5: Summary Displayed

Frontend shows:

extracted text
generated summary
insights

📌 Important Notes

⏳ Initial Startup May Be Slow

Reason:

AI models are downloaded from Hugging Face on first run
Large model weights are loaded into memory

⚡ GPU Recommended

For better performance:

An NVIDIA GPU with CUDA is recommended

🌐 Browser Recommendation

Avoid

Older browsers with limited speech API support

🚀 Future Improvements

Possible enhancements:

Database integration
Authentication system
Multi-user support
PDF report generation
Medical entity extraction
Emotion detection
Doctor dashboard
Cloud deployment
Streaming transcription
Real-time multilingual translation

👨‍💻 Author Notes

This project demonstrates:

✅ AI + Healthcare integration
✅ Real-time NLP systems
✅ Transformer model pipelines
✅ Voice-enabled conversational AI
✅ End-to-end AI application architecture

Suitable for:

AI/ML portfolios
NLP projects
Healthcare AI demos
Research prototypes
Internship showcases

⭐ If You Like This Project

Please consider:

Starring the repository
Forking the project
Contributing improvements
