46 changes: 46 additions & 0 deletions avatar/streaming/.env.sample
@@ -0,0 +1,46 @@
# Azure Speech - Resource Configuration
SPEECH_REGION = ""
SPEECH_KEY = ""

# Azure Speech - Text to Speech Avatar Configuration
AVATAR_CHARACTER = ""
AVATAR_STYLE = ""
IS_CUSTOM_AVATAR = ""
AZURE_OPENAI_SYSTEM_PROMPT = "You are an AI language model integrated with a Text-to-Speech system.
Please provide all responses in plain text without any markdown formatting or special symbols like #, *, _.
Avoid using headings, bullet points, or any other markdown syntax.
Your responses should be suitable for direct verbal communication.
Your name is Lisa, an AI assistant that helps people find information.
Always provide responses that are concise and conversational, strictly limiting responses to 50 words (in one paragraph) and suitable for verbal delivery within 15 seconds.
If a response would exceed this limit, summarize the key points to fit within it and steer the user to ask more details instead.
If a user's question falls outside available data or context, respond with, 'I can't help with that specific query' or similar message.
Again, make sure that the responses are concisely and accurately summarized under 50 words."

# Azure Speech - Text to Speech Voice Configuration
TTS_VOICE = ""
# CUSTOM_VOICE_ENDPOINT=""
# PERSONAL_VOICE_SPEAKER_PROFILE=""

# Azure OpenAI - Resource Configuration
AZURE_OPENAI_ENDPOINT = ""
AZURE_OPENAI_API_KEY = ""
AZURE_OPENAI_DEPLOYMENT_NAME = ""

# Azure Search - Resource Configuration (optional, only required for 'on your data' scenario)
COGNITIVE_SEARCH_ENDPOINT = ""
COGNITIVE_SEARCH_API_KEY = ""
COGNITIVE_SEARCH_INDEX_NAME = ""

# CSS Variables (Landscape)
WEBPAGE_BACKGROUND_LANDSCAPE = "https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/static/image/landscape.png?raw=true"
WEBPAGE_CHAT_FONTCOLOR_LANDSCAPE = "#EEE"
BUTTON_COLOR_LANDSCAPE = "#3E66BA"
BUTTON_HOVER_LANDSCAPE = "#28a745"
BUTTON_ICON_COLOR_LANDSCAPE = "#FFF"

# CSS Variables (Portrait)
WEBPAGE_BACKGROUND_PORTRAIT = "https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/static/image/portrait.png?raw=true"
WEBPAGE_CHAT_FONTCOLOR_PORTRAIT = "#EEE"
BUTTON_COLOR_PORTRAIT = "#3E66BA"
BUTTON_HOVER_PORTRAIT = "#28a745"
BUTTON_ICON_COLOR_PORTRAIT = "#FFF"
3 changes: 3 additions & 0 deletions avatar/streaming/.gitignore
@@ -0,0 +1,3 @@
/.venv
/__pycache__
.env
159 changes: 159 additions & 0 deletions avatar/streaming/README.md
@@ -0,0 +1,159 @@
# Transforming Digital Interactions with Hyper-Realistic Custom Avatars and Custom Neural Voices

This solution combines the Azure Text-to-Speech Custom Avatar real-time API with Custom Neural Voice to render a talking avatar with lifelike expressions and movements, synchronized to synthesized speech. Responses are generated by Azure OpenAI and can optionally be grounded in your own data through Retrieval Augmented Generation (RAG) with Azure AI Search, so answers stay accurate and contextually aware. Typical applications range from customer support to educational tools.

---

## Prerequisites

Ensure the following Azure services are deployed before running this project:

1. **Azure Speech Service**:
   - Provides Text-to-Speech (TTS) and Speech-to-Text (STT).
2. **Azure OpenAI Service**:
   - Generates natural language responses using GPT models.
3. **Azure AI Search Service**: _(optional, if using your own data)_
   - Retrieves contextual data through the "Bring Your Own Data" feature of Azure OpenAI.
   - To set this up, follow the [Azure OpenAI "use your data" quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython-new&pivots=programming-language-studio).
4. **Azure Storage Account**: _(optional, if using your own data)_
   - Stores the customer-provided data that the search service indexes.

---

## Setup Instructions

### Step 1: Clone the Repository
Clone the repository to your local environment:

```
git clone https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar.git
cd Azure-Text-To-Speech-Avatar
```

### Step 2: Install Dependencies
Install required Python packages using:

```
pip install -r requirements.txt
```

### Step 3: Configure Environment Variables
Create a `.env` file in the project root, starting from the `.env.sample` template, and set the following environment variables:

#### Azure Speech Configuration
```
SPEECH_REGION = "<Azure Speech Region, e.g. westus2>"
SPEECH_KEY = "<Azure Speech API Key>"
```

#### Avatar Configuration
```
AVATAR_CHARACTER="<Avatar Character Name>"
AVATAR_STYLE="<Avatar Style>"
IS_CUSTOM_AVATAR="<True/False>"
```

#### Neural Voice Configuration
```
TTS_VOICE="<Name of the TTS voice>"
CUSTOM_VOICE_ENDPOINT="<Optional: Endpoint for your Custom Neural Voice>"
PERSONAL_VOICE_SPEAKER_PROFILE="<Optional: Speaker profile ID for Personal Neural Voice>"
```

#### Azure OpenAI Configuration
```
AZURE_OPENAI_ENDPOINT="<Azure OpenAI Endpoint>"
AZURE_OPENAI_API_KEY="<Azure OpenAI API Key>"
AZURE_OPENAI_DEPLOYMENT_NAME="<Deployment Name>"
AZURE_OPENAI_SYSTEM_PROMPT="<System Prompt - update as needed>"
```

#### Azure AI Search Configuration (Optional)
```
COGNITIVE_SEARCH_ENDPOINT="<Azure AI Search Endpoint>"
COGNITIVE_SEARCH_API_KEY="<Azure Search API Key>"
COGNITIVE_SEARCH_INDEX_NAME="<Search Index Name>"
```
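
For the optional "on your data" scenario, these three values usually end up in a `data_sources` entry of the Azure OpenAI chat completions request. The following is a minimal sketch of that mapping, not this project's actual code: the helper name `build_search_data_source` is hypothetical, and the payload shape assumes the `azure_search` data source format of the Azure OpenAI On Your Data API.

```python
import os
from typing import Mapping, Optional


def build_search_data_source(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return an `azure_search` data source entry, or None if search is not configured.

    Hypothetical helper: the dictionary layout is an assumption based on the
    Azure OpenAI On Your Data request format.
    """
    endpoint = env.get("COGNITIVE_SEARCH_ENDPOINT", "").strip()
    api_key = env.get("COGNITIVE_SEARCH_API_KEY", "").strip()
    index_name = env.get("COGNITIVE_SEARCH_INDEX_NAME", "").strip()

    # The search settings are optional: skip retrieval unless all three are set.
    if not (endpoint and api_key and index_name):
        return None

    return {
        "type": "azure_search",
        "parameters": {
            "endpoint": endpoint,
            "index_name": index_name,
            "authentication": {"type": "api_key", "key": api_key},
        },
    }
```

Returning `None` when any value is empty keeps the plain chat scenario working with the blank defaults from `.env.sample`.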

#### Webpage Customization
For customizing the UI:
```
WEBPAGE_BACKGROUND_LANDSCAPE="<URL to Landscape Background>"
WEBPAGE_CHAT_FONTCOLOR_LANDSCAPE="#EEE"
BUTTON_COLOR_LANDSCAPE="#3E66BA"
BUTTON_HOVER_LANDSCAPE="#28a745"
BUTTON_ICON_COLOR_LANDSCAPE="#FFF"

WEBPAGE_BACKGROUND_PORTRAIT="<URL to Portrait Background>"
WEBPAGE_CHAT_FONTCOLOR_PORTRAIT="#EEE"
BUTTON_COLOR_PORTRAIT="#3E66BA"
BUTTON_HOVER_PORTRAIT="#28a745"
BUTTON_ICON_COLOR_PORTRAIT="#FFF"
```
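
Because `.env.sample` ships with empty values, it can help to check the configuration before starting the app. The sketch below is illustrative only: `REQUIRED_SETTINGS`, `missing_settings`, and `parse_flag` are hypothetical names, and the choice of which variables count as required is an assumption drawn from the non-optional sections above. It also assumes the variables have already been loaded into the process environment.

```python
import os
from typing import List, Mapping

# Assumption: the non-optional variables from the sections above are required.
REQUIRED_SETTINGS = (
    "SPEECH_REGION",
    "SPEECH_KEY",
    "AVATAR_CHARACTER",
    "AVATAR_STYLE",
    "TTS_VOICE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """List required variables that are unset or blank, in declaration order."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def parse_flag(value: str) -> bool:
    """Interpret a string such as IS_CUSTOM_AVATAR="True" as a boolean."""
    return value.strip().lower() in {"true", "1", "yes"}
```

Calling `missing_settings()` at startup and failing fast on a non-empty result reports a blank key by name instead of letting it surface later as an authentication error.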

#### Set the welcome message
To change the welcome message, edit lines 267 and 268 of `static/js/chat.js`.

---

## Running the Application

1. **Start the Flask Application**:

Run the following command to launch the web app:
```
python -m flask run -h 0.0.0.0 -p 5000
```

2. **Access the Web Interface (Landscape Orientation)**:

Open your browser and navigate to:
```
http://localhost:5000/chat
```

3. **Access the Web Interface (Portrait Orientation)**:

Open your browser and navigate to:
```
http://localhost:5000/portrait
```

4. **Initialize the Avatar Session**:
   - Click the first button **(Start Avatar Session)** to establish a connection with Azure TTS Avatar services.
   - If successful, you will see a live avatar video.

5. **Interact with the Avatar**:
   - Click the second button **(Start Microphone)** to enable speech input (make sure you allow microphone access in your browser).
   - Speak your query, or type it using the **Chat** button.
   - The avatar will respond with synchronized audio and video.
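
As a quick smoke test of the first three steps, the two pages can be probed from Python using only the standard library. This is a sketch under assumptions: the helper names `page_urls` and `is_reachable` are hypothetical, and the host and port simply mirror the `flask run` command above.

```python
import urllib.error
import urllib.request
from typing import Dict


def page_urls(host: str = "localhost", port: int = 5000) -> Dict[str, str]:
    """Build the landscape and portrait page URLs for a running instance."""
    base = f"http://{host}:{port}"
    return {"landscape": f"{base}/chat", "portrait": f"{base}/portrait"}


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

With the Flask app running, `{name: is_reachable(url) for name, url in page_urls().items()}` shows which of the two pages is being served. It does not exercise the avatar session itself, which still needs a browser.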

---

## Additional Features

- **Interrupt Speech**:
Use the **"Stop Speaking"** button to halt the avatar mid-sentence.

- **Clear Chat History**:
Reset the session by clicking the **"Clear Chat History"** button.

- **Close Avatar Session**:
End the avatar interaction with the **"Close Avatar Session"** button.

---

## Screenshots

### Landscape Mode
![Landscape Mode](https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/assets/landscape.png?raw=true)

### Portrait Mode
![Portrait Mode](https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/assets/portrait.png?raw=true)

---

## Adaptation
This implementation is adapted from the sample tutorial code provided by Microsoft. For more details, refer to the [original tutorial](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/js/browser/avatar).

---