Interactive Kiosk AI Chatbot

Version: 2022   Release Date: 03/22/2022  

Last Updated: 05/04/2022

Overview


The Interactive Kiosk AI Chatbot is a development framework containing automatic speech recognition (ASR), text to speech (TTS), and natural language processing (NLP) microservices. This reference implementation (RI) leverages the deep learning algorithms and models contained in the Intel® Distribution of OpenVINO™ toolkit.

Use the framework to:

  • Build and deploy a kiosk designed for a banking use case.
  • Develop for your own use case. 

Customize the RI by replacing microservice components with your own.

Select Configure & Download to download the reference implementation.

Configure & Download


Screenshot of the Sample Web UI Dashboard


Time to Complete: Approximately 2-3 hours

Programming Language: Python 3*

Software Dependencies:

  • Docker* 
  • Docker Compose* 
  • Intel® Distribution of OpenVINO™ toolkit 
  • Rasa Open Source* 
  • Open Bank Project* (OBP*) 
  • Portainer* 

Other Requirements: Some software packages require root permissions. 


Target System Requirements

  • Ubuntu* 18.04 LTS or 20.04.2 LTS 
  • 8th Generation (and above) Intel® Core™ Processor, 32GB RAM and 100GB of free space on HDD 
  • Wired (USB) microphone: mic array, headset, conference device, etc.
  • Wired speaker: headset, headphones, conference device, etc.
  • Network with speed greater than 60 Mbps to support speech

Wired Microphone and Speaker Device(s) Required
The software uses:
- A microphone/recording device for recording voice audio
- A speaker/playback device for playing audio

The software has been tested with these devices: 
- Seeed Studio ReSpeaker Mic Array v2.0* 
- Plantronics Blackwire 3220 Series* 
- Maono AU-A04 Condenser Microphone Kit* 
- Logitech H340 USB PC Headset with Noise Canceling Mic* 

The recording and playback devices may be separate devices or reside on the same device, such as a headset with a microphone. The software may work with other wired microphones, mic arrays, speakers, and headset combinations. Wireless headsets are not supported. 


Learning Objectives

Use the reference implementation to: 

  • Integrate an end-to-end speech and audio framework of microservices
  • Incorporate a deep learning microservice architecture into a retail banking proof-of-concept 
  • Leverage components of Intel® Distribution of OpenVINO™ toolkit 

Customize, expand, or replace microservices provided in the RI to extend the capabilities. 

Helpful Background Knowledge

It is helpful to have a working knowledge of:  

  • Machine learning technologies and deep learning models for ASR, TTS, NLP, and Rasa Open Source 
  • Docker and associated commands 
  • Authentication methods, such as JSON Web Tokens (JWT) 

It is possible to install and use the reference implementation without this background knowledge. 

How It Works

The Interactive Kiosk AI Chatbot consists of a microservice framework with deep learning algorithms. Table 2 lists the microservices and models included. 

Table 2: Reference Implementation Software Components
Automatic Speech Recognition (ASR)
Function: Converts speech to text using four pre-trained deep learning models.
Supported models:

  • Kaldi*
  • Mozilla DeepSpeech*
  • QuartzNet*
  • Facebook Hugging Face wav2vec2-base-960h*

Natural Language Processing (NLP)
Function: Classifies the input text based on keywords and corresponding intents, and makes software calls based on the classified intent. For bank-related intents, the NLP makes and receives calls to and from Open Bank Project (OBP).

Text to Speech (TTS)
Function: Converts text to speech. 
Supported models:

  • forward tacotron*
  • melgan*

For more details, refer to Text to Speech Python* Demo.

Audio Ingestion
Function: Receives the audio in WAV file format and sends it to the ASR. The RI publishes on the specified ZeroMQ* port by topic.
For more information, see WAVE File Ingestion (Default).

Audio Ingestion2
Function: Records audio from the microphone and requires user interaction. The RI uses the ReSpeaker SDK*.
For more information, see Live Speech Ingestion.

AuthZ
Function: Checks user credentials at login. The RI uses OBP, which requires users to register and obtain credentials.

 

In addition to ASR, NLP, and TTS, the RI uses:  

  • ZeroMQ: A data bus for brokerless asynchronous messaging and exchange of data across multiple microservices. The RI supports the publish–subscribe data distribution pattern for the exchange of messages by the microservices over ZeroMQ.  
  • Open Bank Project (OBP): A sandbox that allows users to experiment with banking APIs outside a production environment. The RI uses a cloud-based OBP sandbox, but users can also deploy the OBP server locally. 
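
The snippet below is a minimal sketch of this publish-subscribe pattern using the pyzmq package (pip install pyzmq). The port number and single-process layout are illustrative only and are not the RI's actual settings; in the RI, the publisher and subscriber run in separate microservice containers.

    import time
    import zmq

    ctx = zmq.Context()

    # Subscriber side: filter on the 'text' topic (as the NLP does for ASR output).
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5555")          # hypothetical port
    sub.setsockopt(zmq.SUBSCRIBE, b"text")

    # Publisher side: bind and send a topic-prefixed message (as the ASR does).
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5555")
    time.sleep(0.5)                              # allow the subscription to propagate
    pub.send_multipart([b"text", b"good morning"])

    topic, payload = sub.recv_multipart()
    print(topic.decode(), payload.decode())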

NOTE: Replace Microservices in Production Environments
In a production environment, replace the placeholder microservices with production-grade components. For example, replace the OBP developer sandbox, which is intended as a placeholder for customized and/or proprietary banking APIs.

Software Flow: A Banking Query Example

The RI responds to some user greetings, basic conversational speech, and a few banking-related requests/queries. Figure 1 illustrates the software flow through the architecture components.


Figure 1: High-level  Architecture and Software Flow

 


 

The numbered steps below correspond to the numbered software flow in Figure 1. The software flow assumes:

  • A wake word: a word that alerts the RI to start listening for speech. The current wake word is ReSpeaker (ree-spee-kr). To change the wake word, see Modify Wake Word.
  • Subscription to bus topics: a string of characters representing data categories (e.g., ASR subscribes to audio) on the ZeroMQ bus.  
  • Post-authentication: user credentials already verified.
  • Banking-related speech: query or request related to banking. When a user makes a comment or request unrelated to banking (e.g., “Good morning”), steps 7 and 8 in Figure 1 do not occur.
  • OBP access: online sandbox, not installed locally.  
  1. A user says, “[Wake word][pause one second], I would like to know my account balance.” 
  2. Audio input from the microphone is passed to audio ingestion. 
  3. Audio ingestion publishes the audio buffer to ZeroMQ with the topic audio. 
  4. The ASR receives the audio buffers in chunks. ASR converts speech to text using pre-trained deep learning models.  
  5. The ASR publishes the text on the ZeroMQ databus with the topic text. 
  6. The NLP receives the text and classifies it by intent (groups of keywords the NLP recognizes).  
  7. The NLP makes a corresponding REST API call to the OBP server to query the account balance.  
  8. OBP server responds to the REST API call with the account balance. 
  9. The NLP processes and publishes the REST API call response on the ZeroMQ databus with the topic nlp. 
  10. The TTS converts the text to waveform audio file (wav) format.  
  11. The TTS publishes the text and wav on ZeroMQ with the topic tts. 
  12. The TTS output is sent to the speaker. The RI replies, “Balance for the savings account ending with ____ is ____ dollars.” 
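
The toy sketch below illustrates steps 6 through 9 only; it is not the RI's actual NLP microservice. It subscribes to the text topic, performs a naive keyword match instead of a real intent classifier, and republishes a canned reply on the nlp topic. It assumes the pyzmq package, and the ports are hypothetical.

    import zmq

    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5555")          # hypothetical ASR output port
    sub.setsockopt(zmq.SUBSCRIBE, b"text")

    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5556")                     # hypothetical NLP output port

    while True:
        _, text = sub.recv_multipart()
        utterance = text.decode().lower()
        if "balance" in utterance:               # naive stand-in for intent classification
            reply = "Balance for the savings account ending with **** is **** dollars."
        else:
            reply = "Welcome to the future of banking."
        pub.send_multipart([b"nlp", reply.encode()])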

​NOTE: Online Sandbox
In Get Started: Steps 1-3, the OBP is assumed to be running via the online sandbox.

What You'll Do

The application presents two run methods (types of ingestion), described in Table 3.

Here is a sneak preview of what you’ll do for both types of ingestion in Get Started Steps 1-3 and Tutorials:

  • Step 1: Connect the Devices: connect recording and playback devices.
  • Step 2: Install the Reference Implementation: download the software package, set configuration, and build and deploy the source code for both types of ingestion. 
  • Step 3: Run the Application: interact with your microphone and speaker to experience both WAV file and live speech ingestion with a scripted conversation scenario. 
  • Tutorials: complete tutorials to replace the ASR model and modify the NLP. 

 

Table 3: Run Methods
WAVE File Ingestion (Default)
Description: A quick test of the RI with sample audio files to illustrate how it responds to speech. Recommended for first-time deployment.
Input: The software uses supplied sample audio files (audio0.wav-audio3.wav). No user input is required; input is supplied with the WAV files.
Output:
  • Listen via speaker.
  • Read log files.

Live Speech Ingestion
Description: An interactive test of the RI that involves speaking words and phrases into a microphone.
Input: User must speak into a microphone.
Output:
  • Listen via speaker.
  • Read log files.

 

NOTE: OBP Credentials Required for Get Started
OBP credentials are used during Step 2: Configure and Build.
For credential creation instructions, see Open Bank Project Credentials.  

Get Started

Step 1: Connect the Devices

Before installing the application, connect input and output devices on a Linux* host (i.e., the system on which the RI is running). 

  1. On the Linux host, connect a wired microphone and a wired speaker.  
  2. Open Settings with the nine-button menu, bottom right of the screen, and choose Sound. If a sound application is running, it will appear in this dialog. Close any sound applications (e.g., media players, YouTube*). 

Identify Audio Devices 

After connecting the devices, use the aplay command to check if they are recognized as available recording and playback options. 

To list the devices:

  1. Open a terminal and run the command:
    aplay -l

    Figure 2: Audio Device List

  2. Note the name of connected, available devices. In Figure 2, the example name:device pairs are:

    ArrayUAC10: SeeedStudio ReSpeaker 4 Mic Array*
    USB: Jabra* Speak 410* USB

    The list of available recording and playback devices in the configuration will contain the names ArrayUAC10 and USB.

    NOTE: Device Recognition Problems
    If connected devices do not appear in the aplay list, try other devices or Restart the ALSA Firmware. 
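
Optionally, cross-check the aplay output from Python. The short sketch below assumes the sounddevice package is installed (pip install sounddevice) and simply prints the recording and playback devices visible to the system:

    import sounddevice as sd

    # Prints an indexed list of input (recording) and output (playback) devices.
    print(sd.query_devices())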

Step 2: Install the Reference Implementation 

Check Prerequisites 

The RI requires additional software, which is listed in Table 4 below. If these applications are already installed, move on to the next section.  

To check prerequisites: 

  1. Open another terminal. Determine if prerequisites are installed with version commands:
    docker --version 
    docker-compose --version  
    git --version 
    curl --version 
  2. Install any prerequisites not installed with the instructions and links in Table 4:
    Table 4: Prerequisites

    git (required): Install git:
      sudo apt-get install git
    curl (required): Install curl:
      sudo apt install curl
    Docker Engine (required): See Install Docker Engine on Ubuntu. 
      To eliminate the need for prefacing commands with sudo, add your user to the docker group after Docker installation as described in Modify Subgroup. 
    Docker Compose (required): See Install Docker Compose.
    Portainer (optional): See Portainer.

    NOTE: Document Does Not Contain Portainer Instructions
    For information about Portainer, see the Portainer documentation: View container logs.

Download

To obtain the software package: 

  1. Select Configure & Download.

    NOTE: Link Issues
    If there is an issue opening the link with a Chrome* browser, try clearing the browser history or use Firefox*.

  2. Select download options:
    Version or Tag: 1.5 
    Target System: Ubuntu 18.04 LTS or Ubuntu 20.04 LTS 
    Distribution: Download Recommended Configuration
  3. Select Download.  
  4. When the License Agreement appears, review the Edge Software License and choose Accept. The file interactive_kiosk_ai_chatbot.zip will begin downloading.
  5. A product key will be issued and displayed at the top left of the page and sent to the email address used to log in. Copy and save the product key.

    NOTE: Product Key
    The product key is sent once and will not change upon the next installation.

Prepare and Unzip

After the download, the product key page displays Steps 1-3. 

NOTE: User Permissions
- User permissions may require prefacing the commands below with sudo. 
- To avoid using sudo, see Modify Subgroup.

To start:

  1. Prepare the system as described in Step 1.  
  2. Run the Step 2 pip install command as directed. This command determines system dependencies.  
  3. Extract the downloaded file: 
    unzip interactive_kiosk_ai_chatbot.zip 

     

  4. Navigate to the interactive_kiosk_ai_chatbot directory: 
    cd interactive_kiosk_ai_chatbot 

     

  5. Change permission of the executable edgesoftware file:
    chmod 755 edgesoftware

     

Install

The installation process will ask for a product key and configuration details before the build starts.

To start the installation: 

  1. Run Step 3 on the product key page, as directed, or run the installation command below: 
    ./edgesoftware install

     

  2. Enter the product key when prompted, as in Figure 3.

    Figure 3: Product Key


     
  3. The RI verifies dependencies and system requirements and retrieves an audio device list as seen in Figure 3. 

Configure and Build

A first time build and install may take up to 45 minutes to complete. 

To set the configuration options:

  1. Use output from aplay to choose from the list of available recording and playback devices when prompted with Choose a recording device and a playback device to test. Enter the device name for the recording and playback devices and then press Enter.

    Figure 4: Recording and Playback Devices

  2. To test device compatibility, press Enter to begin recording. The application records for 20 seconds. The recording will play automatically after the 20 seconds have elapsed.
     a. If you hear playback, the RI has found your device. Follow the prompts and enter y. Continue with the next step.
     b. If you indicate you don’t hear playback with n, the installation process will stop. Connect another device and go back to the start of installation, Install. 
     c. If you don’t have another device or the device(s) you’ve tried are not recognized, see Restart the ALSA Firmware and go back to the start of installation, Install. 
  3. Set the ingestion type. Wave ingestion (1) is recommended for first-time deployment. 
  4. Set the ASR model. Quartznet (1) is recommended for first-time deployment.

    Figure 5: Ingestion Type, ASR Type, and OBP Banking Credentials

  5. For OBP-related configuration, enter the OBP credentials: USERNAME, PASSWORD, and APIKEY. See Open Bank Project Credentials.
  6. The configuration uses the host system’s proxy.
  7. The build will begin after CONFIGURATION END displays.

    WARNING: Adjust Volume
     If the RI is configured for WAVE file ingestion, the audio starts as soon as the message Starting Chatbot Services appears. Adjust the volume of the speaker to hear the audio responses of the chatbot services. 


    ===Check for Success===
    Check for the following output:


    Figure 6: Successful Build


    See Troubleshooting for help with error messages.

Step 3: Run the Application

WAVE File Ingestion (Default)

Ingestion starts as soon as the software installation has concluded. The software cycles through the audio input files and plays responses to the audio queries until you stop the software or switch run methods.

You’ll hear these NLP responses: 

  • “Welcome to the future of banking.”
  • “The balance for the saving account ending in ____ is ____ [unit of currency].”
  • “You will find one near ____ Road.”
  • “You have two savings accounts with our bank.”

REMINDER: No Speech Input Required for WAVE Ingestion


To listen: 

  1. Listen to output through the speaker device you chose. You will hear the NLP responses listed above. 

To read the log files: 

  1. Open two terminal windows to see input and output speech in the log files. 
  2. In a terminal(s), list the container IDs:
    sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"

     

  3. View the log by running docker logs with the NLP container ID:
    docker logs -f <NLP container-ID>

    The NLP log lists both the user’s input and the bot’s output.

    Log Output Example:
    User: may i know my account balance
    Bot: The balance for the savings account ending with **** is ****.**** [unit of currency].

     
  4. Repeat step three in the other terminal with the ASR container ID:
    docker logs -f <ASR container-ID>

    Log Output Example:
    Quartznet Inferred Output: ' good morning '

    NOTE: Additional Usage Tips:
    - To use Portainer to view log files, see Portainer documentation View container logs.
    - To learn more about the edgesoftware installation script, see Command Line Interface.

Live Speech Ingestion

The application listens for speech input as soon as installation concludes. The instructions below outline how to speak to the application, read responses, or listen to responses through a speaker. Stop and restart the RI to switch between ingestion configurations, integrate your own audio files, or modify the ASR model. 

NOTE: Restarting
- If the RI is already running as WAVE File Ingestion, stop and restart (Steps 1 and 2). If the RI hasn't been started, perform Step 2.
- Restarting does not rebuild images. Starting and stopping may take up to two minutes to complete as the application stops and cleans up Docker processes.

  1. Uninstall (stop): 
    ./edgesoftware uninstall -a

     

  2. Install (start): 
    ./edgesoftware install

     

  3. To configure live speech ingestion, enter 2 when prompted for ingestion type. 

    NOTE: Ambient Noise
    If you hear output before speaking, the RI has picked up on ambient noise, or the software is still configured for WAVE File Ingestion. See Troubleshooting.

To speak:

  1. Start by waking the application with the wake word, a word that alerts the RI to start listening for speech. The wake word is ReSpeaker (ree-spee-kr).

    Example:
    “[ReSpeaker][pause 1 second], good morning.”

To listen: 

  1. Listen to output through the speaker device you chose. For possible NLP responses, see Table 5 below.

To read the log files and check for a response: 

  1. Open two terminal windows. 
  2. In a terminal(s), list the container IDs:
    sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"

     

  3. View the log by running docker logs with the NLP container ID:
    docker logs -f <NLP container-ID>

     

  4. Repeat step three in the other terminal with the ASR container ID:
    docker logs -f <ASR container-ID>

    NOTE: Additional Usage Tips:
    - To use Portainer to view log files, see Portainer documentation View container logs.
    - To learn more about the edgesoftware installation script, see Command Line Interface.

  5. Try other recognized phrases as listed in Table 5:
    Table 5: Possible Speech Input and Outputs for Testing
    SPEECH INPUT EXAMPLE OUTPUT
    [Wake word] [pause 1 second], good morning. Welcome to the future of banking.
    [Wake word] [pause 1 second], I would like to know my account details. You have two savings accounts with our bank.
    [Wake word] [pause 1 second], I would like to know my account balance. The balance for the savings account ending in ____ is ____ [unit of currency].
    [Wake word] [pause 1 second], where is the nearest cash machine. You will find one near ___ Road.
    [Wake word] [pause 1 second], goodbye. Goodbye. Thank you for banking with us. See you soon.

     

Tutorials

Add a Keyword in the NLP

The instructions in this section describe how to add a keyword to an intent in the NLP. For a quick guide to NLP terminology, see Modify NLP.

To add (or delete) a new keyword:

  1. Open ~nlp/rasa_api_server/data/nlu.md.
  2. Find intent:greet:
    ## intent:greet
    - hey
    - hello
    - hi
    - hi there
    - HI
    - hi!
    - Hi
    - good morning
    - good evening
    - hey there
    - start
    - open session
    - good afternoon
    - how does this work

     

  3. Add a new keyword or multiple keywords to the greet intent on a new line starting with a “- “. Below are some potential keywords.
    Examples:
    ## intent:greet
    - How do you do?
    - Howdy.
    - Hiya.

    NOTE: Sentence Punctuation and Capitalization Not Necessary

  4. Stop the chatbot stack if already deployed with the command:
    ./edgesoftware uninstall -a

     

  5. Delete the action server image if it is running with the command: 
    docker rmi <action_server Image ID>
  6. Deploy the stack with the command: 
    ./edgesoftware install

    NOTE: Image Rebuild Times
    Rebuilding any RI image may take several minutes.

  7. Test the new keyword as input by speaking the keyword.

Replace the ASR Model

For information about the supported models, see US English ASR Models. 


To change the model in the RI:

  1. Stop the reference implementation if it’s running:
    ./edgesoftware uninstall -a 

     

  2. Deploy again:
    ./edgesoftware install

     

  3. Choose the desired model when prompted. 
  4. Test the model with WAV file and live speech ingestion to determine how well it performs. 

Summary and Next Steps

In Get Started and Tutorials, you learned to:

  • Install and build a framework of microservices for wave and speech ingestion.  
  • Interact with your microphone and speaker to experience both WAV file and live speech ingestion in a defined conversation scenario.
  • Complete a tutorial to replace the ASR Model and modify the NLP.

Expand the RI with:

  • a digital human interface (DHI): a personality-based interface, usually involving a rendered avatar, that can be added to the framework
  • a text-based interface: an interface similar to direct messaging applications featuring text-based exchanges

For more about the model details, model training, and other reference topics, see Reference.

Reference

The Command Line Interface (CLI)

To stop and start the RI:

  • Uninstall (stop): 
    ./edgesoftware uninstall -a

     

  • Install (start):
    ./edgesoftware install

    NOTE: Reinstall Time: Approximately 1-2 Minutes 
    - Starting and stopping may take several minutes to complete as the application stops and cleans up Docker processes. 
    - The reinstall takes approximately 1-2 minutes if no images have been deleted.

To see the available commands:

  • Display usage:
    ./edgesoftware

     

To see module IDs:

  • List:
    ./edgesoftware list -d

     

Feature Support

NOTE: Retrain Models and Replace Components in Production Environments 
- Retrain models for other languages and accents. To understand model accuracy factors, see US English ASR Models. 
- Replace both OBP and Rasa Open Source in a production environment. 

The ASR supports the following models:

  • Kaldi
  • Mozilla DeepSpeech
  • Quartznet
  • Facebook Hugging Face

The first three models in the list above are distributed as part of Intel® Distribution of OpenVINO™ toolkit. All models are trained for US English.

The NLP uses:

  • Rasa Open Source: A developer playground for building automated text and voice-based conversation software.
  • Open Bank Project APIs: The Open Bank Project is a developer sandbox for testing banking API integration. Supported APIs include checking the account balance, finding available cash machines, and listing accounts.

The TTS supports the following models: 

  • forward tacotron
  • melgan

This feature produces .wav and .txt content that can be used to integrate a custom Digital Human Interface (DHI).

US English ASR Models

Table 6 provides model details about accuracy as well as links to more information.
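
For reference, word error rate is conventionally computed against a reference transcript as WER = (S + D + I) / N, where S is the number of substituted words, D the number of deleted words, I the number of inserted words, and N the number of words in the reference transcript; lower values are better.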

Table 6: Model Details
Kaldi
Word error rate (WER): 10.5% for input as in the trained set of the LibriSpeech corpus
Accuracy factors:
  • Accent and intonation of speaker
  • Age of speaker (under-representation of children’s speech)
  • Noise level
  • Microphone used
  • Distance of the speaker to the microphone
The model cannot be used in production due to the limited training dataset and domain.
To learn more: Kaldi LibriSpeech

DeepSpeech
Word error rate (WER): 5.9%
Accuracy factors:
  • Best in low-noise environments
  • Bias toward US male accents
To learn more: DeepSpeech

QuartzNet
Word error rate (WER): LibriSpeech: 3.79%; Dev-other: 10.05%
Trained on six datasets:
  • LibriSpeech
  • Mozilla Common Voice (validated clips from en_1488h_2019-12-10)
  • WSJ
  • Fisher
  • Switchboard
  • NSC Singapore English
To learn more: QuartzNet

Hugging Face
Word error rate (WER): 3.4%
Hours of training:
  • 16 kHz sampled speech audio
  • 960 hours of LibriSpeech
To learn more: Hugging Face

 

Open Bank Project Credentials

The RI uses Open Bank Project (OBP) as the banking server and authentication method.  

Follow the steps below to register, obtain a consumer key, and generate a token on the OBP server. This enables the Interactive Kiosk AI Chatbot to make API calls.

Register with Open Bank Project

  1. Register at Open Bank Project. 
  2. Fill out the registration form and choose Sign Up.
    The form requires:
  • First name
  • Last name
  • Email address
  • Username
  • Password

NOTE: Save the username and password. 

  3. Check the email address you used above for confirmation. 

Generate Consumer API Key

  1. To generate a consumer API key, choose Get API key from the top menu of Open Bank Project. Log on when prompted.
  2. From the Application Type pulldown, choose Public. 
  3. Fill in the remaining fields and click Register consumer. 
  4. Save the consumer API key.
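
The sketch below shows one way to exchange these credentials for a token programmatically using OBP's DirectLogin scheme. It assumes the Python requests package; the endpoint and header format follow the public OBP sandbox documentation and may differ by OBP version, so confirm them against the OBP documentation. Replace the placeholder values with the credentials created above.

    import requests

    USERNAME = "your-username"          # from Register with Open Bank Project
    PASSWORD = "your-password"
    CONSUMER_KEY = "your-consumer-key"  # from Generate Consumer API Key

    auth_header = (
        f'DirectLogin username="{USERNAME}",'
        f'password="{PASSWORD}",consumer_key="{CONSUMER_KEY}"'
    )
    resp = requests.post(
        "https://apisandbox.openbankproject.com/my/logins/direct",
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
    )
    print(resp.status_code, resp.json())  # a successful call returns a JSON token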

Create a Sandbox Account

To create a new sandbox account with OBP in the desired currency (e.g., USD or EUR):

  1. Copy the URL below to your browser’s URL textbox: https://apisandbox.openbankproject.com/create-sandbox-account.
  2. Enter the following: 
  • Bank: Bank-of-Pune
  • Desired Account ID: alphanumeric string of randomized characters. 
  • Desired Account Currency: EUR
  • Desired Initial Balance: 1000
  3. Save Bank and Desired Account ID for verifying later.
  4. Choose Create Account.

    ===Check for Success===
    You will see the message below if a test bank account was created:
    Account _______ has been created at bank chase with an initial balance of 1000.00 [unit of currency].

Check for Sandbox Account Balance

To see account balance: 

  1. Log on at Open Bank Project. 
  2. Choose API Explorer.
  3. Choose Get Account Balances.
  4. Insert valid values for: 
  • Bank: Bank-of-pune in our example in the previous section.
  • Accounts: Desired Account ID from the previous section (refresh this list by selecting it). 
  • Views: Auditor
  5. Choose Get and examine the results displayed below the Get button.
    Example: 
    {
      "account_id":"12345678910111213",
      "bank_id":"bank-of-pune",
      "account_routings":[],
      "label":"",
      "balances":[{
        "type":"OpeningBooked",
        "currency":"EUR",
        "amount":"1000.00"
      }]
    }
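
For a programmatic equivalent of this API Explorer query, a hedged sketch follows. It assumes the requests package and a DirectLogin token (see Open Bank Project Credentials); the API version and path are illustrative and should be confirmed in the API Explorer, and the IDs below mirror the example values above.

    import requests

    TOKEN = "your-directlogin-token"
    BANK_ID = "bank-of-pune"
    ACCOUNT_ID = "12345678910111213"
    VIEW_ID = "Auditor"

    url = (f"https://apisandbox.openbankproject.com/obp/v4.0.0/"
           f"banks/{BANK_ID}/accounts/{ACCOUNT_ID}/{VIEW_ID}/balances")
    resp = requests.get(url, headers={"Authorization": f'DirectLogin token="{TOKEN}"'})
    print(resp.json())  # expect a payload similar to the example shown above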

     

Modify NLP

A basic understanding of Rasa is necessary to add more banking APIs or replace the default banking server, OBP, with a custom banking server solution. The following tutorial demonstrates a simple modification to the NLP that does not require in-depth knowledge.

Below is a table of terminology used in the instructions for the tutorial Add a Keyword in the NLP.

Table 7: Rasa Terminology
Keywords: Words or phrases that are significant in a verbal or text interaction. Example:
“Hello there!” “List account for _____”. 
Intents: Groups of keywords. Example: 
## intent:greet
- hello there
- hey
- hiya
Actions: Source code that executes to support banking interactions and plays TTS responses.
Stories: Relationships between intents and actions.
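
As a hedged illustration of the Stories row, a story in the Rasa 1.x Markdown format (the same format as the nlu.md excerpts in the tutorial) might look like the snippet below. The action name utter_greet is hypothetical; the RI may map the greet intent to a custom action instead.

    ## greet path
    * greet
      - utter_greet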


Figure 7 illustrates the relationships of the files in the NLP.


Figure 7: Files in the NLP


To learn more about Rasa Open Source and its terminology, see the Rasa Open Source documentation.

Modify the Wake Word

The wake word (or wake-up word) activates listening in the RI. Preface each interaction by reciting the wake word followed by a pause and then the greeting or request. Currently the default wake word is ReSpeaker (pronounced ree-spee-kr).

Example:
"[ReSpeaker] [pause], good afternoon."


The word chatbot is another wake word that has been validated. Choose wake words carefully as background noise can distort the sound of the word. Wake words that sound like other words perform poorly.


To modify:

  1. Stop the reference implementation if it is running:
    ./edgesoftware uninstall -a

     

  2. Open the file compose/docker-compose-frontend-respeaker.yml in the archive (i.e., zip file). Do not unzip the archived file. 
  3. In the section services:environment, modify the wake word by setting WAKE_UP_WORD under the audio-ingestion2 service. 
  4. Save the archive.
  5. Deploy the reference implementation:
     ./edgesoftware install
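
Before editing, it can help to confirm where the wake word is currently set. The search sketch below only assumes the package directory created in Prepare and Unzip and the WAKE_UP_WORD variable named in step 3; it prints matching lines and changes nothing.

    from pathlib import Path

    # Search the extracted package for the WAKE_UP_WORD setting
    # (adjust the directory name if your layout differs).
    for path in Path("interactive_kiosk_ai_chatbot").rglob("*.yml"):
        for line in path.read_text(errors="ignore").splitlines():
            if "WAKE_UP_WORD" in line:
                print(f"{path}: {line.strip()}")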

Train the Models

DeepSpeech ASR

To retrain the ASR’s DeepSpeech model:

  1. To understand how to re-train DeepSpeech’s model, refer to the DeepSpeech documentation Training Your Own Model.
  2. For pre-trained open-source demo models, refer to Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models. 
  3. To convert the model generated by Step 1 to Intermediate Representation, use the tutorial Convert TensorFlow* DeepSpeech Model to the Intermediate Representation. 
  4. Update the path for the IR generated in Step 3 in ‘inference:model_xml’ and ‘inference:model_bin’ in ‘~asr_deepspeech/src/model/deepspeech.cfg’.
  5. Build and restart the DeepSpeech ASR container.

Kaldi ASR

To retrain the pre-trained Kaldi model:

  1. Download the appropriate model.
  2. Extract the zip file.
  3. Refer to the steps in the Readme file to retrain model or fine-tune the model.
  4. Convert the model generated by Step 3 to intermediate representation (IR). Use the tutorial Converting a Kaldi* Model.
  5. Update the path for the IR generated in Step 4 in acousticModelFName, outSymsFName, fsmFName, and featureTransform in ~asr_kaldi/src/model/speech_lib.cfg.
  6. Build and restart the Kaldi ASR container.

Facebook Hugging Face ASR

Find the fine-tuning and new model training procedures for the Hugging Face ASR at wav2vec 2.0. 

QuartzNet ASR

For the procedure to train the QuartzNet ASR model, refer to Building Speech Recognition Models for Global Languages with the Mozilla Common Voice Dataset and NVIDIA NeMo.

TTS Models

To retrain the tacotron model, refer to forward-tacotron (composite). 

Create WAV Files

The RI accepts recorded audio files for ingestion. The files must meet these requirements:

  • Format: .wav
  • Sampling rate: 16 kHz
  • Recording channel: Mono

Use audio software, such as Audacity*, to create the files.
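
Alternatively, a conforming WAV file can be recorded directly from Python. The sketch below assumes the sounddevice and soundfile packages are installed and records five seconds of 16 kHz mono audio from the default microphone:

    import sounddevice as sd
    import soundfile as sf

    RATE = 16000       # 16 kHz sampling rate, as required by the RI
    SECONDS = 5

    audio = sd.rec(int(SECONDS * RATE), samplerate=RATE, channels=1, dtype="int16")
    sd.wait()                                # block until the recording finishes
    sf.write("query1.wav", audio, RATE)      # mono, 16 kHz, .wav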

For the section WAVE File Ingestion, create separate audio files for each entry described in the table below. 

Table 8: List of WAV Files
NAME THE FILE SPEAK AND RECORD
query1.wav "[wake word] Good morning."
query2.wav "[wake word] Where is the nearest cash machine?"
query3.wav "[wake word] List my accounts."
query4.wav "[wake word] What is my account balance."
query7.wav "[wake word] Goodbye."


To create a wav file:

  1. Set the sampling rate by choosing 16 kHz in the Project Rate dropdown. See Figure 8.
  2. Set the recording channel to Mono in the Recording channel dropdown.
  3. To record, click the record button. Speak into the mic. 
  4. Click the stop button.
  5. Save the file in WAV file format.
  6. Copy the files into the ./compose/audio path and restart the deployment.

    Figure 8: Create a WAV file with Audacity*

Troubleshooting

Restart the ALSA Firmware

  1. If the recording test in Install fails, restart the Advanced Linux Sound Architecture (ALSA) firmware:
    pulseaudio -k && sudo alsa force-reload

     

  2. Rebuild the application. See Install.

    NOTE: If the build fails after restarting the ALSA firmware, the device is not supported. 

Check Configuration

To check which type of ingestion is running:

  1. Verify all images are built:
    docker images

    The order in which Docker lists the images may vary. 

    REMINDER: RI Images (11)
    - action_server 
    - rasa 
    - nlp_app 
    - deepspeech_asr 
    - kaldi_asr 
    - audio-ingester2 
    - audio-ingester 
    - tts 
    - authzap 
    - huggingface_asr 
    - quartznet_asr

  2. Determine if the RI is configured for wave file ingestion or live speech ingestion by listing the containers:
    docker ps

    The audio ingester container name indicates the type of configuration:

    Table 9: Status of Configuration
    AUDIO INGESTER CONTAINER STATUS OF CONFIGURATION
    chatfrontend_wave_ingestion_1 WAVE File Ingestion
     
    chatfrontend_audio_ingestion_1  Live Speech Ingestion

     

  3. If you don’t see the correct status, refer to Docker Logs.

    NOTE: Failure - Check Logs
    Installation failure messages will be available in the log file: /var/log/esb-cli/Interactive_Kiosk_AI_Chatbot_vx.x.x/Interactive_Kiosk_AI_Chatbot/output.log.

Modify Subgroup

To run Docker commands without prefacing them with sudo: 

  1. Add your user to the docker subgroup after Docker has been installed:
    sudo usermod -aG docker $USER

     

  2. Logout and login again (or restart).

This enables running the reference implementation without the sudo command.

Clean Up Stack

To clean up the running chatbot stack:

  1. Run the command:
    ./edgesoftware uninstall -a

     

  2. List the image IDs: 
    docker images

    REMINDER: RI Images (11)
    - action_server 
    - rasa 
    - nlp_app 
    - deepspeech_asr 
    - kaldi_asr 
    - audio-ingester2 
    - audio-ingester 
    - tts 
    - authzap 
    - huggingface_asr 
    - quartznet_asr

  3. Delete all the RI images with the command:
    docker rmi <Image ID>

     

Build Issues

If you experience a build issue, check the log file at /var/log/esb-cli/Interactive_Kiosk_AI_Chatbot_v0.8.6/Interactive_Kiosk_AI_Chatbot/install.log.

Failed to Install

Audio Device Incompatibility 
If the audio device is incompatible with the RI, installation produces the following error message:

Error Message
Failed to install Interactive_Kiosk_AI_Chatbot. ('', '[ERROR]: Plug in another audio device and start the installation again.', '')

To fix:

  1. Try connecting another device.
  2. Or try reinstalling the RI. See Install.

Missing Component

If Docker Engine or Docker Compose is not installed, the RI will produce the following error message:

Error Message
Failed to install Interactive_Kiosk_AI_Chatbot. ('', 'Some components of Interactive Kiosk AI Chatbot failed to start successfully: chatfrontend_wave_ingestion_1 chatfrontend_tts_1  ', '')

To identify the missing component:

  1. Run the version command of each application:
    a. Docker Engine
    docker -v

    b. Docker Compose
    docker-compose -v


    If a component is not installed, the system will produce the following error message: 

    [component]: command not found

     

  2. Install the missing component:
    a. Docker Engine
    b. Docker Compose
  3. To reinstall the RI, see Install.

Invalid Credentials

If invalid OBP credentials are supplied, the RI will produce the following error in the authz log: 

Error Message:
DATE TIME - INFO  - [main.py:102 - main                 ] - Starting the Authz Server...
DATE TIME - INFO  - [main.py:91 - create_login_session ] - Creating new Login
https://apisandbox.openbankproject.com
Could not obtain token

  1. To check the OBP credential acceptance, view the authz log by following the instructions in Docker Logs. 
  2. To reinstall the RI, see Install.

Docker Logs

Microservices produce log files. 

To see logs:

  1. In a terminal, list the container IDs:
    sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"

     

  2. View the log by running docker logs with the container ID:
    docker logs -f <container-ID>

     

To see the input and output of the RI, use the container ID for NLP and ASR. For a description of the log file content, see Table 10.

Table 10: Container Files
CONTAINER LOG FILE USE IT TO CONFIRM
ASR Speech the ASR Recognizes
NLP Speech Output
AUTHZ OBP credentials

 

 
Example of ASR log output (user's speech input):
Quartznet Inferred Output: ' good morning '

Example of NLP log output (user's input and bot's output):
User: may i know my account balance
Bot: The balance for the savings account ending with **** is ****.**** Rupees


For more information about log levels, see Set Log Levels below.

Set Log Levels

There are two log level values: 

Table 11: Log Levels
LOG LEVELS DESCRIPTION PERFORMANCE
info succinct output  Default: Best for performance.
debug verbose output Impacts performance.

 

To change log levels:

  1. Modify files for various microservice components. Find log levels in these files:
    Table 12: Configuration Files

    compose/docker-compose-backend.yml
      Microservices or components: authz, asr_speech, nlp_app, rasa_action

    compose/docker-compose-frontend.yml (for WAVE File Ingestion)
      Microservices or components: wave_ingestion, tts

    compose/docker-compose-frontend-respeaker.yml (for Live Speech Ingestion)
      Microservices or components: audio_ingestion, tts

     

  2. Locate the LOG_LEVEL value per microservice and change it to the desired reporting level, info or debug. 
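
To quickly see every LOG_LEVEL setting before and after a change, a small search sketch follows; it assumes the compose files from Table 12 live under ./compose and that the variable is spelled LOG_LEVEL, as described above.

    from pathlib import Path

    # Print each LOG_LEVEL line found in the compose files.
    for path in Path("compose").glob("docker-compose-*.yml"):
        for line in path.read_text().splitlines():
            if "LOG_LEVEL" in line:
                print(f"{path}: {line.strip()}")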

Docker Pull Limit Issue

Exceeding the pull rate limit produces an error: 

Error Message
ERROR: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

In the event of this error, log in with a Docker premium account.

Example:

docker login

 

Product and Performance Information

1

Performance varies by use, configuration, and other factors. Learn more at www.Intel.cn/PerformanceIndex.