This project is an AI-powered healthcare consultation assistant built using:
- Flask
- Hugging Face Transformers
- OpenAI Whisper
- Microsoft Phi-2
- Google FLAN-T5
- OCR + Speech Technologies
The system simulates a doctor-patient consultation workflow where:
- Doctor asks medical questions
- Patient responds using voice
- Speech converts into text in real time
- AI analyzes responses intelligently
- AI generates contextual follow-up questions
- Final medical summary is automatically generated
Uses OpenAI Whisper model to:
- Capture microphone audio
- Detect language automatically
- Convert speech into text
Supported languages:
- English
- Hindi
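The transcription step above could look roughly like the sketch below. It assumes the `openai-whisper` package (which also needs ffmpeg); the `base` model size and the names `describe_language` and `transcribe_file` are illustrative assumptions, not taken from the project code.

```python
# Languages named in this README, keyed by the codes Whisper reports.
SUPPORTED_LANGUAGES = {"en": "English", "hi": "Hindi"}


def describe_language(code: str) -> str:
    """Map a Whisper language code to a display name, or flag it as unsupported."""
    return SUPPORTED_LANGUAGES.get(code, f"unsupported ({code})")


def transcribe_file(audio_path: str, model_size: str = "base") -> dict:
    """Transcribe an audio file and report the detected language."""
    import whisper  # lazy import: requires `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_size)  # model size is an assumption
    result = model.transcribe(audio_path)   # language is auto-detected when not given
    return {
        "text": result["text"].strip(),
        "language": describe_language(result["language"]),
    }
```

Keeping the language lookup separate from the model call makes it easy to reject unsupported languages before the text reaches the follow-up generator.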
Uses Microsoft Phi-2 Transformer Model to:
- Analyze patient responses
- Understand conversation context
- Generate intelligent medical follow-up questions
Doctor:
Do you smoke?
Patient:
Yes occasionally.
AI Follow-Up:
How many cigarettes do you smoke per day?
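One way to produce a follow-up like the one above is to turn the conversation history into a prompt and pass it to a text-generation model. The sketch below assumes the Hugging Face `text-generation` pipeline with the `microsoft/phi-2` checkpoint; the prompt wording and the function names are hypothetical and may differ from what `modules/nlu.py` actually does.

```python
def build_followup_prompt(history: list[tuple[str, str]]) -> str:
    """Turn (speaker, text) turns into an instruction prompt for a causal LM."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "You are assisting a doctor during a consultation.\n"
        "Conversation so far:\n"
        f"{transcript}\n"
        "Ask one short, relevant follow-up question.\n"
        "Follow-up question:"
    )


def generate_followup(history, model_name: str = "microsoft/phi-2",
                      max_new_tokens: int = 40) -> str:
    """Generate a single follow-up question (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy dependency

    generator = pipeline("text-generation", model=model_name)
    output = generator(
        build_followup_prompt(history),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    # Keep the first generated line so the result stays a single question.
    return output[0]["generated_text"].strip().split("\n")[0]
```

For the exchange above, `history` would be `[("Doctor", "Do you smoke?"), ("Patient", "Yes occasionally.")]`.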
The consultation supports:
- Structured medical interview flow
- Dynamic AI-generated follow-up questions
- Conversation history tracking
- Automated consultation state management
Uses Google FLAN-T5 model to:
- Analyze complete conversation transcript
- Generate concise medical summaries
- Highlight possible risks and important observations
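A minimal sketch of the summary step, assuming the `google/flan-t5-base` checkpoint loaded through `AutoTokenizer` and `AutoModelForSeq2SeqLM`. The README does not state which FLAN-T5 size is used, and the prompt text and function names here are illustrative only.

```python
def build_summary_prompt(history: list[tuple[str, str]]) -> str:
    """Prefix the full transcript with a summarization instruction."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "Summarize this doctor-patient consultation. "
        "Mention possible risks and important observations.\n\n" + transcript
    )


def summarize(history, model_name: str = "google/flan-t5-base",
              max_new_tokens: int = 128) -> str:
    """Generate a concise summary (downloads the model on first use)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Truncate long transcripts to an assumed 512-token input budget.
    inputs = tokenizer(build_summary_prompt(history), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```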
Users can upload:
- Medical reports
- Prescriptions
- Lab reports
The system:
- Extracts text using OCR (Tesseract)
- Processes extracted text
- Generates AI summaries automatically
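The OCR steps above can be sketched as follows, assuming `pytesseract` with the Tesseract binary installed and Pillow for image loading. The helper names `clean_ocr_text` and `extract_text` are hypothetical, and the cleanup rule is just one plausible way to "process extracted text".

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse whitespace runs and drop empty lines from raw OCR output."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text(image_path: str) -> str:
    """Run Tesseract on an uploaded image and return cleaned text."""
    from PIL import Image  # lazy imports: third-party dependencies
    import pytesseract     # also requires the Tesseract binary on the system

    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image)
    return clean_ocr_text(raw)
```

The cleaned text can then be handed to the summarizer in place of a conversation transcript.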
Doctor questions are converted into audio using gTTS.
Benefits:
- Audio-based interaction
- Accessibility support
- Natural conversational experience
Technologies used:
- Python
- Flask
- OpenAI Whisper
- Microsoft Phi-2
- Google FLAN-T5
- Hugging Face Transformers
- PyTorch
- gTTS
- SpeechRecognition
- pytesseract
- Pillow (PIL)
- HTML
- CSS
- JavaScript
- Web Speech API
AI_Assistant/ # Root project folder containing the complete AI healthcare assistant application
├── app.py # Main Flask backend application handling routes, API calls, conversation flow, TTS, OCR, and frontend communication
├── config.py # Centralized configuration file for environment variables, logging setup, and application-level settings
├── medical_script.json # Structured predefined medical questionnaire used during consultation flow
├── requirements.txt # List of all Python dependencies/packages required to run the project
├── README.md # Complete project documentation, setup guide, workflow explanation, and developer handover notes
├── assets/ # Stores README screenshots/images
│   └── homepage.png # Homepage screenshot used in README
├── uploads/ # Stores uploaded images temporarily for OCR processing (created automatically during runtime)
├── question.mp3 # Generated doctor audio question file using gTTS (runtime generated)
├── consultation_summary.json # Final generated consultation summary and transcript output (runtime generated)
├── modules/ # Folder containing all core AI/NLP/business logic modules
│   ├── __init__.py # Marks 'modules' directory as a Python package
│   ├── nlu.py # AI follow-up question generation using Phi-2
│   ├── state_manager.py # Conversation state and history management
│   ├── summary.py # Medical summary generation using FLAN-T5
│   └── transcription.py # Whisper-based speech-to-text transcription
└── templates/ # Flask HTML templates
    └── index.html # Frontend chat interface and browser speech logic
app.py is the main Flask backend application:
- Initializes Flask server
- Handles frontend routes
- Manages API endpoints
- Starts consultation
- Processes patient responses
- Generates doctor audio
- Handles image uploads
- Performs OCR processing
Routes:
- /
- /start
- /send_response
- /upload_image
config.py stores application configuration settings:
- Environment variables
- Logging configuration
- Timeout values
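A configuration module of this kind might look like the sketch below. The environment variable names (`APP_DEBUG`, `LOG_LEVEL`, `REQUEST_TIMEOUT_SECONDS`), their defaults, and the logger name are assumptions made for illustration; the real `config.py` may use different ones.

```python
import logging
import os


def load_settings(env=None) -> dict:
    """Read settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "debug": env.get("APP_DEBUG", "false").lower() == "true",
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        "request_timeout": float(env.get("REQUEST_TIMEOUT_SECONDS", "30")),
    }


def configure_logging(level_name: str) -> logging.Logger:
    """Set up root logging once and return an application logger."""
    logging.basicConfig(
        level=getattr(logging, level_name, logging.INFO),  # unknown names fall back to INFO
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("ai_assistant")
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real environment.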
medical_script.json contains predefined medical interview questions:
{
  "modules": [
    {
      "name": "General Health",
      "questions": [
        "How are you feeling today?",
        "Do you have fever?"
      ]
    }
  ]
}
modules/nlu.py is the Natural Language Understanding module:
- Load Microsoft Phi-2 model
- Analyze patient responses
- Generate contextual follow-up questions
modules/state_manager.py is responsible for:
- Conversation state management
- Conversation history tracking
- Question tracking
Acts as the memory manager of the system.
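As a rough illustration of such a memory manager, the sketch below tracks the scripted questions, the current position, and the question/answer history. The class and method names are hypothetical and are not taken from `modules/state_manager.py`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsultationState:
    """Holds the scripted questions, the current position, and the Q/A history."""
    questions: list[str]
    history: list[dict] = field(default_factory=list)
    index: int = 0

    def current_question(self) -> Optional[str]:
        """Return the pending question, or None once the script is exhausted."""
        return self.questions[self.index] if self.index < len(self.questions) else None

    def record_answer(self, answer: str) -> None:
        """Store the answer to the pending question and advance to the next one."""
        question = self.current_question()
        if question is None:
            raise RuntimeError("No question is pending")
        self.history.append({"question": question, "answer": answer})
        self.index += 1

    @property
    def finished(self) -> bool:
        return self.index >= len(self.questions)
```

Keeping all mutable consultation data in one object makes it straightforward to serialize the history later for the final summary.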
modules/summary.py uses the FLAN-T5 model to:
- Generate final consultation summaries
- Highlight important medical risks
- Produce concise reports
modules/transcription.py handles:
- Speech-to-text conversion
- Microphone audio capture
- Language detection
- Whisper transcription pipeline
NOTE:
Current web version mainly uses browser speech recognition.
This module is primarily useful for CLI/local transcription workflows.
templates/index.html provides the frontend user interface:
- Chat interface
- Speech buttons
- Audio playback
- AJAX communication with Flask backend
Before running the project, ensure the following are installed.
- Python 3.9+
python --version
Python package manager.
pip --version
python -m venv --help
- Google Chrome
Better support for:
- Web Speech API
- Microphone access
- Audio playback
Required for:
- Real-time speech input
Open terminal inside project directory.
cd AI_Assistant
Windows:
python -m venv venv
macOS/Linux:
python3 -m venv venv
- Creates isolated Python environment
- Prevents dependency conflicts
- Keeps project dependencies separate
Windows:
venv\Scripts\activate
macOS/Linux:
source venv/bin/activate
After activation, the terminal prompt may show:
(venv)
pip install -r requirements.txt --upgrade
This installs:
- Flask
- Transformers
- Torch
- gTTS
- pytesseract
- Pillow
- SpeechRecognition
- etc.
NOTE:
First installation may take time because AI models are large.
python app.py
* Running on http://127.0.0.1:5000
Open:
http://127.0.0.1:5000
This section explains EXACTLY what happens internally from the moment the developer saves files and runs the project.
Developer presses:
CTRL + S
Files saved:
app.py
config.py
modules/nlu.py
modules/summary.py
modules/state_manager.py
modules/transcription.py
templates/index.html
medical_script.json
Command:
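python app.py
Python starts executing:
app.py
Python runs the module from top to bottom; the block under this guard runs because the file is executed directly:
if __name__ == "__main__":
Inside app.py:
app = Flask(__name__)
Flask initializes: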
- Web server
- API routing
- Frontend rendering system
Files loaded:
config.py
modules/nlu.py
modules/summary.py
modules/state_manager.py
Libraries loaded:
- Flask
- torch
- transformers
- gTTS
- pytesseract
User opens:
http://127.0.0.1:5000
Browser sends request to Flask backend.
Flask renders:
templates/index.html
Browser displays:
- Start Consultation button
- Chat interface
- Speak button
- Upload image option
Frontend sends POST request to:
/start
Backend function executed:
start_conversation()
File used:
medical_script.json
Questions are loaded into queue.
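Loading the script into a queue can be sketched with the standard library alone. The JSON shape follows the `medical_script.json` sample shown earlier in this README; the function name `load_question_queue` and the use of `collections.deque` are assumptions.

```python
import json
from collections import deque

# Same structure as the medical_script.json sample in this README.
SAMPLE_SCRIPT = """
{
  "modules": [
    {"name": "General Health",
     "questions": ["How are you feeling today?", "Do you have fever?"]}
  ]
}
"""


def load_question_queue(script_text: str) -> deque:
    """Flatten every module's questions into one first-in, first-out queue."""
    script = json.loads(script_text)
    return deque(q for module in script["modules"] for q in module["questions"])


queue = load_question_queue(SAMPLE_SCRIPT)
first_question = queue.popleft()  # the question asked first
```

A deque gives constant-time removal from the front, which matches asking the questions in script order.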
Question:
- displayed on frontend
- converted to audio using gTTS
Generated file:
question.mp3
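Generating the audio file could look like the sketch below, which assumes the `gTTS` package and network access (gTTS calls an online text-to-speech service). The function name and the empty-text check are illustrative additions.

```python
def synthesize_question(text: str, out_path: str = "question.mp3",
                        lang: str = "en") -> str:
    """Convert a doctor question to speech and save it as an MP3 file."""
    if not text.strip():
        raise ValueError("Question text must not be empty")

    from gtts import gTTS  # lazy import: requires `pip install gTTS` and internet access

    gTTS(text=text, lang=lang).save(out_path)
    return out_path
```

Because each question overwrites the same `question.mp3`, the frontend may need to bypass browser caching when reloading the audio.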
Browser activates microphone using:
- Web Speech API
Voice converts into text.
Frontend sends response to:
/send_response
Backend:
- saves conversation history
- checks stop phrases
- generates next question
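The three backend steps above can be sketched as a plain function, independent of Flask. The stop phrases, the function names, and the shape of the returned dictionary are hypothetical; in the real application the next question may come from Phi-2 rather than from a fixed list.

```python
# Hypothetical stop phrases; the project's actual list is not shown in this README.
STOP_PHRASES = {"stop", "end consultation", "that is all"}


def is_stop_request(answer: str) -> bool:
    """Return True when the whole answer is a stop phrase (case-insensitive)."""
    normalized = answer.strip().lower().strip(" .!?")
    return normalized in STOP_PHRASES


def handle_response(history: list, pending: list, question: str, answer: str) -> dict:
    """Save the turn, check for a stop phrase, then pick the next question."""
    history.append({"question": question, "answer": answer})
    if is_stop_request(answer) or not pending:
        return {"done": True, "next_question": None}
    return {"done": False, "next_question": pending.pop(0)}
```

Matching the whole normalized answer, rather than a substring, avoids ending the consultation when a word like "stop" appears inside a longer reply.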
File responsible:
modules/nlu.py
Model used:
Microsoft Phi-2
AI analyzes:
- patient response
- context
- medical intent
Then generates intelligent follow-up questions.
Loop repeats:
Speak
↓ Convert Speech
↓ Analyze Response
↓ Generate Follow-Up
↓ Ask Next Question
File responsible:
modules/summary.py
Model used:
Google FLAN-T5
Generates:
- medical summary
- risks
- important observations
Frontend displays:
- consultation summary
- transcript
- completion message
Examples:
- Prescription
- Lab report
- Medical document
Route executed:
/upload_image
Libraries used:
- pytesseract
- Pillow (PIL)
System extracts text from uploaded image.
Extracted text sent to:
modules/summary.py
FLAN-T5 generates concise medical summary.
Frontend shows:
- extracted text
- generated summary
- insights
The first startup can be slow because:
- AI models download from Hugging Face
- Large model weights load into memory
For better performance:
- NVIDIA GPU with CUDA recommended
- Google Chrome is the recommended browser
- Older browsers have limited speech API support
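To act on the GPU recommendation above, the models can be placed on CUDA when it is available and on the CPU otherwise. A minimal sketch, assuming PyTorch; the function name `pick_device` is illustrative.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # lazy import so the check degrades gracefully without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```

The resulting string can be passed to `model.to(device)` when loading Phi-2 or FLAN-T5.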
Possible enhancements:
- Database integration
- Authentication system
- Multi-user support
- PDF report generation
- Medical entity extraction
- Emotion detection
- Doctor dashboard
- Cloud deployment
- Streaming transcription
- Real-time multilingual translation
This project demonstrates:
- AI + Healthcare integration
- Real-time NLP systems
- Transformer model pipelines
- Voice-enabled conversational AI
- End-to-end AI application architecture
Suitable for:
- AI/ML portfolios
- NLP projects
- Healthcare AI demos
- Research prototypes
- Internship showcases
Please consider:
- Starring the repository
- Forking the project
- Contributing improvements
