Azure TTS Custom Avatar with Custom Neural Voice (CNV) Accelerator under Avatar/streaming #40

Open

LSTii wants to merge 22 commits into Azure:main from LSTii:Azure-TTS-Custom-Avatar-with-Custom-Neural-Voice

avatar/streaming/.env.sample

@@ -0,0 +1,46 @@
+    # Azure Speech - Resource Configuration
+    SPEECH_REGION = ""
+    SPEECH_KEY = ""
+    # Azure Speech - Text to Speech Avatar Configuration
+    AVATAR_CHARACTER = ""
+    AVATAR_STYLE = ""
+    IS_CUSTOM_AVATAR = ""
+    AZURE_OPENAI_SYSTEM_PROMPT = "You are an AI language model integrated with a Text-to-Speech system.
+    Please provide all responses in plain text without any markdown formatting or special symbols like #, *, _.
+    Avoid using headings, bullet points, or any other markdown syntax.
+    Your responses should be suitable for direct verbal communication.
+    Your name is Lisa, an AI assistant that helps people find information.
+    Always provide responses that are concise and conversational, strictly limiting responses to 50 words (in one paragraph) and suitable for verbal delivery within 15 seconds.
+    If a response would exceed this limit, summarize the key points to fit within it and steer the user to ask more details instead.
+    If a user's question falls outside available data or context, respond with, 'I can't help with that specific query' or similar message.
+    Again, make sure that the responses are concisely and accurately summarized under 50 words."
+    # Azure Speech - Text to Speech Voice Configuration
+    TTS_VOICE = ""
+    # CUSTOM_VOICE_ENDPOINT=""
+    # PERSONAL_VOICE_SPEAKER_PROFILE=""
+    # Azure OpenAI - Resource Configuration
+    AZURE_OPENAI_ENDPOINT = ""
+    AZURE_OPENAI_API_KEY = ""
+    AZURE_OPENAI_DEPLOYMENT_NAME = ""
+    # Azure Search - Resource Configuration (optional, only required for 'on your data' scenario)
+    COGNITIVE_SEARCH_ENDPOINT = ""
+    COGNITIVE_SEARCH_API_KEY = ""
+    COGNITIVE_SEARCH_INDEX_NAME = ""
+    # CSS Variables (Landscape)
+    WEBPAGE_BACKGROUND_LANDSCAPE = "https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/static/image/landscape.png?raw=true"
+    WEBPAGE_CHAT_FONTCOLOR_LANDSCAPE = "#EEE"
+    BUTTON_COLOR_LANDSCAPE = "#3E66BA"
+    BUTTON_HOVER_LANDSCAPE = "#28a745"
+    BUTTON_ICON_COLOR_LANDSCAPE = "#FFF"
+    # CSS Variables (Portrait)
+    WEBPAGE_BACKGROUND_PORTRAIT = "https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/static/image/portrait.png?raw=true"
+    WEBPAGE_CHAT_FONTCOLOR_PORTRAIT = "#EEE"
+    BUTTON_COLOR_PORTRAIT = "#3E66BA"
+    BUTTON_HOVER_PORTRAIT = "#28a745"
+    BUTTON_ICON_COLOR_PORTRAIT = "#FFF"
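
The settings in `.env.sample` could be checked before the app starts, so that an empty key fails early rather than at the first request. The following is a minimal sketch, not part of this PR. It assumes the variables have already been loaded into the process environment (however the app reads `.env`), and the split into required and optional variables is a guess based on the comments in the sample file. The names `REQUIRED_VARS`, `SEARCH_VARS`, `parse_bool`, and `validate_config` are hypothetical.

```python
import os

# Assumption: settings the app cannot run without. The PR does not state
# which variables are mandatory, so adjust this list as needed.
REQUIRED_VARS = (
    "SPEECH_REGION",
    "SPEECH_KEY",
    "AVATAR_CHARACTER",
    "TTS_VOICE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)

# Optional group, only needed for the "on your data" scenario.
SEARCH_VARS = (
    "COGNITIVE_SEARCH_ENDPOINT",
    "COGNITIVE_SEARCH_API_KEY",
    "COGNITIVE_SEARCH_INDEX_NAME",
)


def is_set(env, name):
    """True if the variable exists and is not blank."""
    return bool(env.get(name, "").strip())


def parse_bool(value):
    """Interpret strings such as "True" or "false" from a .env file."""
    return value.strip().lower() in {"1", "true", "yes"}


def validate_config(env):
    """Return a list of readable problems; an empty list means no issue found."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not is_set(env, name)]
    search_set = [name for name in SEARCH_VARS if is_set(env, name)]
    # Flag a half-filled search section, which is easy to miss.
    if search_set and len(search_set) < len(SEARCH_VARS):
        missing = sorted(set(SEARCH_VARS) - set(search_set))
        problems.append(
            "Azure AI Search is partially configured; missing: " + ", ".join(missing)
        )
    return problems


if __name__ == "__main__":
    for problem in validate_config(os.environ):
        print("config:", problem)
    print("custom avatar:", parse_bool(os.environ.get("IS_CUSTOM_AVATAR", "")))
```

Returning a list of problems rather than raising on the first one lets a user fix every missing value in a single pass.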

avatar/streaming/.gitignore

@@ -0,0 +1,3 @@
+    /.venv
+    /__pycache__
+    .env

avatar/streaming/README.md

@@ -0,0 +1,159 @@
+    # Transforming Digital Interactions with Hyper-Realistic Custom Avatars and Custom Neural Voices
+    This solution combines the Azure Text-to-Speech Custom Avatar real-time API with Custom Neural Voice to render lifelike avatars whose expressions and movements accompany synthesized speech. The avatars support conversational scenarios ranging from customer support to educational tools. Responses are generated with Azure OpenAI and can be grounded in your own data through Retrieval Augmented Generation (RAG) with Azure AI Search.
+    ---
+    ## Pre-requisites
+    Ensure the following Azure services are deployed before running this project:
+    1. **Azure Speech Service**:
+       - For Text-to-Speech (TTS) and Speech-to-Text (STT) functionality.
+    2. **Azure OpenAI Service**:
+       - For natural language response generation using GPT models.
+    3. **Azure AI Search Service**: _(optional, if using your own data)_
+       - For contextual data retrieval using the "On Your Data" feature of Azure OpenAI.
+       - To set this up, follow the [Azure OpenAI On Your Data quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython-new&pivots=programming-language-studio).
+    4. **Azure Storage Account**: _(optional, if using your own data)_
+       - To store customer-provided data for the search service.
+    ---
+    ## Setup Instructions
+    ### Step 1: Clone the Repository
+    Clone the repository to your local environment:
+    ```
+    git clone https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar.git
+    cd Azure-Text-To-Speech-Avatar
+    ```
+    ### Step 2: Install Dependencies
+    Install the required Python packages:
+    ```
+    pip install -r requirements.txt
+    ```
+    ### Step 3: Configure Environment Variables
+    Create a `.env` file in the project root and set the environment variables listed below.
+    **Start from the `.env.sample` template so that no variable is missed.**
+    #### Azure Speech Configuration
+    ```
+    SPEECH_REGION = "<Azure Speech Region, e.g. westus2>"
+    SPEECH_KEY = "<Azure Speech API Key>"
+    ```
+    #### Avatar Configuration
+    ```
+    AVATAR_CHARACTER="<Avatar Character Name>"
+    AVATAR_STYLE="<Avatar Style>"
+    IS_CUSTOM_AVATAR="<True/False>"
+    ```
+    #### Neural Voice Configuration
+    ```
+    TTS_VOICE="<Name of the TTS voice>"
+    CUSTOM_VOICE_ENDPOINT="<Optional: Endpoint for your Custom Neural Voice>"
+    PERSONAL_VOICE_SPEAKER_PROFILE="<Optional: Speaker profile ID for Personal Neural Voice>"
+    ```
+    #### Azure OpenAI Configuration
+    ```
+    AZURE_OPENAI_ENDPOINT="<Azure OpenAI Endpoint>"
+    AZURE_OPENAI_API_KEY="<Azure OpenAI API Key>"
+    AZURE_OPENAI_DEPLOYMENT_NAME="<Deployment Name>"
+    AZURE_OPENAI_SYSTEM_PROMPT="<System Prompt - update as needed>"
+    ```
+    #### Azure AI Search Configuration (Optional)
+    ```
+    COGNITIVE_SEARCH_ENDPOINT="<Azure AI Search Endpoint>"
+    COGNITIVE_SEARCH_API_KEY="<Azure Search API Key>"
+    COGNITIVE_SEARCH_INDEX_NAME="<Search Index Name>"
+    ```
+    #### Webpage Customization
+    To customize the UI, set:
+    ```
+    WEBPAGE_BACKGROUND_LANDSCAPE="<URL to Landscape Background>"
+    WEBPAGE_CHAT_FONTCOLOR_LANDSCAPE="#EEE"
+    BUTTON_COLOR_LANDSCAPE="#3E66BA"
+    BUTTON_HOVER_LANDSCAPE="#28a745"
+    WEBPAGE_BACKGROUND_PORTRAIT="<URL to Portrait Background>"
+    WEBPAGE_CHAT_FONTCOLOR_PORTRAIT="#EEE"
+    BUTTON_COLOR_PORTRAIT="#3E66BA"
+    BUTTON_HOVER_PORTRAIT="#28a745"
+    ```
+    #### Set the welcome message
+    To change the welcome message, edit lines 267 and 268 of `static/js/chat.js`.
+    ---
+    ## Running the Application
+    1. **Start the Flask Application**:
+       Run the following command to launch the web app:
+       ```
+       python -m flask run -h 0.0.0.0 -p 5000
+       ```
+    2. **Access the Web Interface (Landscape Orientation)**:
+       Open your browser and navigate to:
+       ```
+       http://localhost:5000/chat
+       ```
+    3. **Access the Web Interface (Portrait Orientation)**:
+       Open your browser and navigate to:
+       ```
+       http://localhost:5000/portrait
+       ```
+    4. **Initialize the Avatar Session**:
+       - Click the first button **(Start Avatar Session)** to establish a connection with Azure TTS Avatar services.
+       - If successful, you will see a live avatar video.
+    5. **Interact with the Avatar**:
+       - Click the second button **(Start Microphone)** to enable speech input (allow microphone access when your browser asks for it).
+       - Speak, or type queries using the **Chat** button.
+       - The avatar will respond with synchronized audio and video.
+    ---
+    ## Additional Features
+    - **Interrupt Speech**:
+      Use the **"Stop Speaking"** button to halt the avatar mid-sentence.
+    - **Clear Chat History**:
+      Reset the session by clicking the **"Clear Chat History"** button.
+    - **Close Avatar Session**:
+      End the avatar interaction with the **"Close Avatar Session"** button.
+    ---
+    ## Screenshots
+    ### Landscape Mode
+    ![Landscape Mode](https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/assets/landscape.png?raw=true)
+    ### Portrait Mode
+    ![Portrait Mode](https://github.com/aadrikasingh/Azure-Text-To-Speech-Avatar/blob/main/assets/portrait.png?raw=true)
+    ---
+    ## Adaptation
+    This implementation is adapted from the sample tutorial code provided by Microsoft. For more details, refer to the [original tutorial](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/js/browser/avatar).
+    ---
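
Once Flask is running, the two pages named in "Running the Application" can be smoke-tested from a script. The following is a rough sketch using only the Python standard library, not part of this PR. It assumes the default port 5000 and the `/chat` and `/portrait` routes from the README, and that a healthy page answers with HTTP 200. The function name `check_routes` and its `opener` parameter are hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # port taken from the README's flask command
ROUTES = ("/chat", "/portrait")     # landscape and portrait pages


def check_routes(base_url=BASE_URL, routes=ROUTES, opener=urlopen, timeout=5):
    """Return {route: True/False}, where True means the page answered with 200."""
    results = {}
    for route in routes:
        url = base_url.rstrip("/") + route
        try:
            with opener(url, timeout=timeout) as response:
                results[route] = response.status == 200
        except (URLError, OSError):
            # Covers connection refused, timeouts, and HTTP error statuses.
            results[route] = False
    return results
```

Calling `check_routes()` with no arguments targets the local server. The `opener` parameter exists only so the function can be exercised without a network connection.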
