Interactive Kiosk AI Chatbot
Version: 2022
Release Date: 03/22/2022
Last Updated: 05/04/2022
Overview
The Interactive Kiosk AI Chatbot is a development framework containing automatic speech recognition (ASR), text to speech (TTS), and natural language processing (NLP) microservices. This reference implementation (RI) leverages the deep learning algorithms and models contained in the Intel® Distribution of OpenVINO™ toolkit.
Use the framework to:
- Build and deploy a kiosk designed for a banking use case.
- Develop for your own use case.
Customize the RI by replacing microservice components with your own.
Select Configure & Download to download the reference implementation.
TIME TO COMPLETE | SOFTWARE |
---|---|
Approximately 2-3 hours | Docker Compose*, Intel® Distribution of OpenVINO™ toolkit, Rasa Open Source*, Open Bank Project* (OBP*), Portainer* |
Other Requirements
Some software packages require root permissions.
Target System Requirements
- Ubuntu* 18.04 LTS or 20.04.2 LTS
- 8th Generation (or later) Intel® Core™ processor, 32 GB RAM, and 100 GB of free space on HDD
- Wired (USB) microphone: mic array, headset, conference device, etc.
- Wired speaker: headset, headphones, conference device, etc.
- Network speed greater than 60 Mbps to support speech
Wired Microphone and Speaker Device(s) Required
The software uses:
- A microphone/recording device for recording voice audio
- A speaker/playback device for playing audio
The software has been tested with these devices:
- Seeed Studio ReSpeaker Mic Array v2.0*
- Plantronics Blackwire 3220 Series*
- Maono AU-A04 Condenser Microphone Kit*
- Logitech H340 USB PC Headset with Noise Canceling Mic*
The recording and playback devices may be separate devices or reside on the same device, such as a headset with a microphone. The software may work with other wired microphones, mic arrays, speakers, and headset combinations. Wireless headsets are not supported.
Learning Objectives
Use the reference implementation to:
- Integrate an end-to-end speech and audio framework of microservices
- Incorporate a deep learning microservice architecture into a retail banking proof-of-concept
- Leverage components of Intel® Distribution of OpenVINO™ toolkit
Customize, expand, or replace microservices provided in the RI to extend the capabilities.
Helpful Background Knowledge
It is helpful to have a working knowledge of:
- Machine learning technologies and deep learning models for ASR, TTS, NLP, and Rasa Open Source
- Docker and associated commands
- Authentication methods, such as JSON Web Tokens (JWT)
It is possible to install and use the reference implementation without this background knowledge.
How It Works
The Interactive Kiosk AI Chatbot consists of a microservice framework with deep learning algorithms. Table 2 lists the microservices and models included.
Table 2: Microservices and Models

MICROSERVICE | FUNCTION | MODELS & COMPONENTS |
---|---|---|
Automatic Speech Recognition (ASR) | Converts speech to text using four pre-trained deep learning models. | Supported models: Kaldi, Mozilla DeepSpeech, Quartznet, and Hugging Face. See US English ASR Models. |
Natural Language Processing (NLP) | Classifies the input text based on keywords and corresponding intents, and makes software calls based on the classified intent. | For bank-related intents, the NLP makes and receives calls to and from Open Bank Project (OBP). |
Text to Speech (TTS) | Converts text to speech. | Supported models: ForwardTacotron and MelGAN. For more details, refer to Text to Speech Python* Demo. |
Audio Ingestion | Receives audio in WAV file format and sends it to the ASR. For more information, see WAVE File Ingestion (Default). | The RI publishes on the specified ZeroMQ* port by topic. |
Audio Ingestion2 | Records audio from a microphone and requires user interaction. For more information, see Live Speech Ingestion. | The RI uses the ReSpeaker SDK*. |
AuthZ | Checks user credentials at login. | The RI uses OBP, which requires users to register and obtain credentials. |
In addition to ASR, NLP, and TTS, the RI uses:
- ZeroMQ: A data bus for brokerless asynchronous messaging and the exchange of data across multiple microservices. The RI uses the publish–subscribe data distribution pattern for the exchange of messages by the microservices over ZeroMQ (see the sketch after this list).
- Open Bank Project (OBP): A sandbox that allows users to experiment with banking APIs outside a production environment. The RI uses a cloud-based OBP sandbox, but users can also deploy the OBP server locally.
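Below is a minimal, self-contained sketch of the publish–subscribe pattern described above, using pyzmq. The port, payload, and single-process layout are illustrative assumptions; the RI's actual ports, topics, and process boundaries are defined in its compose files.

import time
import zmq

ctx = zmq.Context()

# Publisher side, e.g., an ASR service publishing recognized text.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")

# Subscriber side, e.g., the NLP subscribing to the "text" topic.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5555")
sub.setsockopt(zmq.SUBSCRIBE, b"text")

time.sleep(0.5)  # allow the subscription to propagate (the slow-joiner problem)

pub.send_multipart([b"text", b"may i know my account balance"])
topic, payload = sub.recv_multipart()
print(topic.decode(), "->", payload.decode())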
NOTE: Replace Microservices in Production Environments
For example, replace the OBP developer sandbox, which is intended as a placeholder for customized and/or proprietary banking APIs.
Software Flow: A Banking Query Example
The RI responds to some user greetings, basic conversational speech, and a few banking-related requests/queries. Figure 1 illustrates the software flow through the architecture components.
Figure 1: High-level Architecture and Software Flow
The numbered steps below correspond to the numbered software flow in Figure 1. The software flow assumes:
- A wake word: a word that alerts the RI to start listening for speech. The current wake word is ReSpeaker (ree-spee-kr). To change the wake word, see Modify Wake Word.
- Subscription to bus topics: a string of characters representing data categories (e.g., ASR subscribes to audio) on the ZeroMQ bus.
- Post-authentication: user credentials already verified.
- Banking-related speech: query or request related to banking. When a user makes a comment or request unrelated to banking (e.g., “Good morning”), steps 7 and 8 in Figure 1 do not occur.
- OBP access: online sandbox, not installed locally.
1. A user says, “[Wake word][pause one second], I would like to know my account balance.”
2. Audio input from the microphone is passed to audio ingestion.
3. Audio ingestion publishes the audio buffer to ZeroMQ with the topic audio.
4. The ASR receives the audio buffers in chunks and converts speech to text using pre-trained deep learning models.
5. The ASR publishes the text on the ZeroMQ databus with the topic text.
6. The NLP receives the text and classifies it by intent (groups of keywords the NLP recognizes).
7. The NLP makes a corresponding REST API call to the OBP server to query the account balance.
8. The OBP server responds to the REST API call with the account balance.
9. The NLP processes the response and publishes it on the ZeroMQ databus with the topic nlp.
10. The TTS converts the text to waveform audio file (WAV) format.
11. The TTS publishes the text and WAV on ZeroMQ with the topic tts.
12. The TTS output is sent to the speaker. The RI replies, “Balance for the savings account ending with ____ is ____ dollars.”
NOTE: Online Sandbox
In Get Started: Steps 1-3, the OBP is assumed to be running via the online sandbox.
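For context, below is a sketch of the kind of REST call the NLP makes to the OBP server in steps 7 and 8 above. The endpoint path, view name, and token placeholder are assumptions for illustration; use the OBP API Explorer to confirm the exact routes your sandbox exposes.

import requests

OBP = "https://apisandbox.openbankproject.com"
# DirectLogin token obtained as described in Open Bank Project Credentials.
headers = {"Authorization": 'DirectLogin token="<your-token>"'}

# Hypothetical balance query for a sandbox account; the path is an assumption.
url = f"{OBP}/obp/v4.0.0/banks/bank-of-pune/accounts/<account-id>/owner/account"
response = requests.get(url, headers=headers)
print(response.json())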
What You'll Do
The application presents two run methods (two types of ingestion), described in Table 3.
Here is a sneak preview of what you’ll do for both types of ingestion in Get Started Steps 1-3 and Tutorials:
- Step 1: Connect the Devices: connect recording and playback devices.
- Step 2: Install the Reference Implementation: download the software package, set configuration, and build and deploy the source code for both types of ingestion.
- Step 3: Run the Application: interact with your microphone and speaker to experience both WAV file and live speech ingestion with a scripted conversation scenario.
- Tutorials: complete tutorials to replace the ASR model and modify the NLP.
Table 3: Run Methods

RUN METHODS | DESCRIPTION | INPUT | OUTPUT |
---|---|---|---|
WAVE File Ingestion (Default) | A quick test of the RI with sample audio files to illustrate how it responds to speech. Recommended for first-time deployment. | The software uses supplied sample audio files (audio0.wav-audio3.wav). No user input is required. | NLP responses played through the speaker; input and output text in the container logs. |
Live Speech Ingestion | An interactive test of the RI that involves speaking words and phrases into a microphone. | User must speak into a microphone. | NLP responses played through the speaker; input and output text in the container logs. |
NOTE: OBP Credentials Required for Get Started
OBP credentials are used during Step 2: Configure and Build.
For credential creation instructions, see Open Bank Project Credentials.
Get Started
Step 1: Connect the Devices
Before installing the application, connect input and output devices on a Linux* host (i.e., the system on which the RI is running).
- On the Linux host, connect a wired microphone and a wired speaker.
- Open Settings with the nine-button menu at the bottom right of the screen and choose Sound. If a sound application is running, it appears in this dialog. Close any sound applications (e.g., media players, YouTube*).
Identify Audio Devices
After connecting the devices, use the aplay command to check if they are recognized as available recording and playback options.
To list the devices:
- Open a terminal and run the command:
aplay -l
Figure 2: Audio Device List
- Note the name of connected, available devices. In Figure 2, the example name:device pairs are:
ArrayUAC10: SeeedStudio ReSpeaker 4 Mic Array*
USB: Jabra* Speak 410* USB
The list of available recording and playback devices in the configuration will contain the names ArrayUAC10 and USB.
NOTE: Device Recognition Problems
If connected devices do not appear in the aplay list, try other devices or Restart the ALSA Firmware.
Step 2: Install the Reference Implementation
Check Prerequisites
The RI requires the additional software listed in Table 4 below. If these applications are already installed, move on to the next section.
To check prerequisites:
- Open another terminal. Determine if the prerequisites are installed with version commands:
docker --version
docker-compose --version
git --version
curl --version
- Install any prerequisites not installed with the instructions and links in Table 4:
Table 4: Prerequisites

PREREQUISITES | INSTALLATION INSTRUCTIONS |
---|---|
git (required) | Install git: sudo apt-get install git |
curl (required) | Install curl: sudo apt install curl |
Docker Engine (required) | See Install Docker Engine on Ubuntu. To eliminate the need for prefacing commands with sudo, add your user to the docker group after Docker installation as described in Modify Subgroup. |
Docker Compose (required) | See Install Docker Compose. |
Portainer (optional) | See Portainer. |

NOTE: Document Does Not Contain Portainer Instructions
For information about Portainer, see the Portainer documentation View container logs.
Download
To obtain the software package:
- Select Configure & Download.
NOTE: Link Issues
If there is an issue opening the link with a Chrome* browser, try clearing the browser history or use Firefox*.
- Select the download options:
Version or Tag: 1.5
Target System: Ubuntu 18.04 LTS or Ubuntu 20.04 LTS
Distribution: Download Recommended Configuration
- Select Download.
- When the License Agreement appears, review the Edge Software License and choose Accept. The file interactive_kiosk_ai_chatbot.zip will begin downloading.
- A product key will be issued, displayed at the top left of the page, and sent to the email address used to log in. Copy and save the product key.
NOTE: Product Key
The product key is sent once and will not change upon the next installation.
Prepare and Unzip
After the download, the product key page displays Steps 1-3.
NOTE: User Permissions
- User permissions may require prefacing the commands below with sudo.
- To avoid using sudo, see Modify Subgroup.
To start:
- Prepare the system as described in Step 1.
- Run the Step 2 pip install command as directed. This command determines system dependencies.
- Extract the downloaded file:
unzip interactive_kiosk_ai_chatbot.zip
- Navigate to the interactive_kiosk_ai_chatbot directory:
cd interactive_kiosk_ai_chatbot
- Change permission of the executable edgesoftware file:
chmod 755 edgesoftware
Install
The installation process will ask for a product key and configuration details before the build starts.
To start the installation:
- Run Step 3 on the product key page, as directed, or run the installation command below:
./edgesoftware install
- Enter the product key when prompted, as in Figure 3:
Figure 3: Product Key
- The RI verifies dependencies and system requirements and retrieves an audio device list as seen in Figure 3.
Configure and Build
A first time build and install may take up to 45 minutes to complete.
To set the configuration options:
- When prompted with Choose a recording device and a playback device to test, use the aplay output to choose from the list of available recording and playback devices. Enter the device name for the recording device and the playback device, and then press Enter.
Figure 4: Recording and Playback Devices
- To test device compatibility, press Enter to begin recording. The application records for 20 seconds and plays the recording back automatically after the 20 seconds have elapsed.
a. If you hear the playback, the RI has found your device. Follow the prompts and enter y. Continue with the next step.
b. If you indicate with n that you don’t hear the playback, the installation process stops. Connect another device and go back to the start of installation, Install.
c. If you don’t have another device or the device(s) you’ve tried are not recognized, see Restart the ALSA Firmware and go back to the start of installation, Install.
- Set the ingestion type. Wave ingestion (1) is recommended for first-time deployment.
- Set the ASR model. Quartznet (1) is recommended for first-time deployment.
Figure 5: Ingestion Type, ASR Type, and OBP Banking Credentials
- For OBP-related configuration, enter the OBP credentials: USERNAME, PASSWORD, and APIKEY. See Open Bank Project Credentials.
- The configuration uses the host system’s proxy.
- The build will begin after CONFIGURATION END displays.
WARNING: Adjust Volume
If the RI is configured for WAVE file ingestion, the audio starts as soon as the message Starting Chatbot Services appears. Adjust the volume of the speaker to hear the audio responses of the chatbot services.
Check for Success
Check for the following output:
Figure 6: Successful Build
See Troubleshooting for help with error messages.
Step 3: Run the Application
WAVE File Ingestion (Default)
Ingestion starts as soon as the software installation has concluded. The software cycles through the audio input files and plays responses to the audio queries until you stop the software or switch run methods.
You’ll hear these NLP responses:
- “Welcome to the future of banking.”
- “The balance for the savings account ending in ____ is ____ [unit of currency].”
- “You will find one near ____ Road.”
- “You have two savings accounts with our bank.”
REMINDER: No Speech Input Required for WAVE Ingestion
To listen:
- Listen to output through the speaker device you chose. You will hear the NLP responses listed above.
To read the log files:
1. Open two terminal windows to see input and output speech in the log files.
2. In a terminal, list the container IDs:
sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"
3. View the log by running docker logs with the NLP container ID:
docker logs -f <NLP container-ID>
The NLP log lists both the user’s input and the bot’s output.
Log Output Example:
User: may i know my account balance
Bot: The balance for the savings account ending with **** is ****.**** [unit of currency].
4. Repeat step 3 in the other terminal with the ASR container ID:
docker logs -f <ASR container-ID>
Log Output Example:
Quartznet Inferred Output: ' good morning '
NOTE: Additional Usage Tips
- To use Portainer to view log files, see Portainer documentation View container logs.
- To learn more about the edgesoftware installation script, see Command Line Interface.
Live Speech Ingestion
The application listens for speech input as soon as installation concludes. The instructions below describe how to speak to the application and how to read or listen to its responses. Stop and restart the RI to switch between ingestion configurations, integrate your own audio files, or modify the ASR model.
NOTE: Restarting
- If the RI is already running as WAVE File Ingestion, stop and restart (Steps 1 and 2). If the RI hasn't been started, perform Step 2.
- Restarting does not rebuild images. Starting and stopping may take up to two minutes to complete as the application stops and cleans up Docker processes.
1. Uninstall (stop):
./edgesoftware uninstall -a
2. Install (start):
./edgesoftware install
3. To configure live speech ingestion, enter 2 when prompted for the ingestion type.
NOTE: Ambient Noise
If you hear output before speaking, the RI has picked up on ambient noise, or the software is still configured for WAVE File Ingestion. See Troubleshooting.
To speak:
- Start by waking the application with the wake word, a word that alerts the RI to start listening for speech. The wake word is ReSpeaker (ree-spee-kr).
Example:
“[ReSpeaker][pause 1 second], good morning.”
To listen:
- Listen to output through the speaker device you chose. For possible NLP responses, see Table 5 below.
To read the log files and check for a response:
1. Open two terminal windows.
2. In a terminal, list the container IDs:
sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"
3. View the log by running docker logs with the NLP container ID:
docker logs -f <NLP container-ID>
4. Repeat step 3 in the other terminal with the ASR container ID:
docker logs -f <ASR container-ID>
NOTE: Additional Usage Tips
- To use Portainer to view log files, see Portainer documentation View container logs.
- To learn more about the edgesoftware installation script, see Command Line Interface.
5. Try other recognized phrases as listed in Table 5:
Table 5: Possible Speech Input and Outputs for Testing

SPEECH INPUT | EXAMPLE OUTPUT |
---|---|
[Wake word] [pause 1 second], good morning. | Welcome to the future of banking. |
[Wake word] [pause 1 second], I would like to know my account details. | You have two savings accounts with our bank. |
[Wake word] [pause 1 second], I would like to know my account balance. | The balance for the savings account ending in ____ is ____ [unit of currency]. |
[Wake word] [pause 1 second], where is the nearest cash machine? | You will find one near ____ Road. |
[Wake word] [pause 1 second], goodbye. | Goodbye. Thank you for banking with us. See you soon. |
Tutorials
Add a Keyword in the NLP
The instructions in this section describe how to add a keyword to an intent in the NLP. For a quick guide to NLP terminology, see Modify NLP.
To add (or delete) a new keyword:
- Open ~nlp/rasa_api_server/data/nlu.md.
- Find intent:greet:
## intent:greet
- hey
- hello
- hi
- hi there
- HI
- hi!
- Hi
- good morning
- good evening
- hey there
- start
- open session
- good afternoon
- how does this work
- Add a new keyword or multiple keywords to the greet intent, each on a new line starting with “- ”. Below are some potential keywords.
Examples:
## intent:greet
- How do you do?
- Howdy.
- Hiya.
NOTE: Sentence Punctuation and Capitalization Not Necessary
- Stop the chatbot stack, if already deployed, with the command:
./edgesoftware uninstall -a
- Delete the action_server image, if present, with the command:
docker image rmi <action_server Image ID>
- Deploy the stack with command:
./edgesoftware install
NOTE: Image Rebuild Times
Rebuilding any RI image may take several minutes.
- Test the new keyword as input by speaking the keyword.
Replace the ASR Model
For information about the supported models, see US English ASR Models.
To change the model in the RI:
- Stop the reference implementation if it’s running:
./edgesoftware uninstall -a
- Deploy again:
./edgesoftware install
- Choose the desired model when prompted.
- Test the model with WAV file and live speech ingestion to determine how well it performs.
Summary and Next Steps
In Get Started and Tutorials, you learned to:
- Install and build a framework of microservices for wave and speech ingestion.
- Interact with your microphone and speaker to experience both WAV file and live speech ingestion in a defined conversation scenario.
- Complete a tutorial to replace the ASR Model and modify the NLP.
Expand the RI with:
- a digital human interface (DHI): a personality-based interface, usually involving a rendered avatar, that can be added to the framework
- a text-based interface: an interface similar to direct messaging applications featuring text-based exchanges
For more about the model details, model training, and other reference topics, see Reference.
Reference
The Command Line Interface (CLI)
To stop and start the RI:
- Uninstall (stop):
./edgesoftware uninstall -a
- Install (start):
./edgesoftware install
NOTE: Reinstall Time: Approximately 1-2 Minutes
- Starting and stopping may take several minutes to complete as the application stops and cleans up Docker processes.
- The reinstall takes approximately 1-2 minutes if no images have been deleted.
To see the available commands:
- Display usage:
./edgesoftware
To see module IDs:
- List:
./edgesoftware list -d
Feature Support
NOTE: Retrain Models and Replace Components in Production Environments
- Retrain models for other languages and accents. To understand model accuracy factors, see US English ASR Models.
- Replace both OBP and Rasa Open Source in a production environment.
The ASR supports the following models:
- Kaldi
- Mozilla DeepSpeech
- Quartznet
- Facebook Hugging Face
The first three models in the list above are distributed as part of Intel® Distribution of OpenVINO™ toolkit. All models are trained for US English.
The NLP uses:
- Rasa Open Source: A developer playground for building automated text and voice-based conversation software.
- Open Bank Project APIs: The Open Bank Project is a developer sandbox for testing banking API integration. Supported APIs include checking the account balance, finding an available cash machine, and listing accounts.
The TTS supports the following models:
- ForwardTacotron
- MelGAN
This feature produces .wav and .txt content that can be used to integrate a custom Digital Human Interface (DHI).
US English ASR Models
The table below provides model details about accuracy as well as links for more information.
MODEL | WORD ERROR RATE (WER) | ACCURACY FACTORS | TO LEARN MORE |
---|---|---|---|
Kaldi | 10.5% for input as in the trained set of the Librispeech Corpus | | Kaldi Librispeech |
DeepSpeech | 5.9% | | DeepSpeech |
QuartzNet | LibriSpeech: 3.79%; Dev-other: 10.05% | Trained on six datasets | QuartzNet |
Huggingface | 3.4% | Hours of training: | Hugging Face |
Open Bank Project Credentials
The RI uses Open Bank Project (OBP) as the banking server and authentication method.
Follow the steps below to register, obtain a consumer key, and generate a token on the OBP server. This enables the Interactive Kiosk AI Chatbot to make API calls.
Register with Open Bank Project
- Register at Open Bank Project.
- Fill out the registration form and choose Sign Up.
The form requires:
- First name
- Last name
- Email address
- Username
- Password
NOTE: Save username and password.
- Check the email address you used above for confirmation.
Generate Consumer API Key
- To generate a consumer API key, choose Get API key from the top menu of Open Bank Project. Log on when prompted.
- From the Application Type pulldown, choose Public.
- Fill in the remaining fields and click Register consumer.
- Save the consumer API key.
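As a quick check that the credentials work, the sketch below obtains a DirectLogin token using the username, password, and consumer API key created above. The header format and path follow OBP's DirectLogin mechanism, but verify both against the OBP documentation before relying on them.

import requests

response = requests.post(
    "https://apisandbox.openbankproject.com/my/logins/direct",
    headers={
        "Content-Type": "application/json",
        "Authorization": (
            'DirectLogin username="<username>", '
            'password="<password>", '
            'consumer_key="<consumer API key>"'
        ),
    },
)
print(response.json())  # a successful response contains a "token" field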
Create a Sandbox Account
To create a new sandbox account with the OBP with desired currency (e.g., USD or EUR):
- Copy the URL below into your browser’s address bar: https://apisandbox.openbankproject.com/create-sandbox-account.
- Enter the following:
- Bank: Bank-of-Pune
- Desired Account ID: alphanumeric string of randomized characters.
- Desired Account Currency: EUR
- Desired Initial Balance: 1000
- Save Bank and Desired Account ID for verifying later.
- Choose Create Account.
Check for Success
You will see the message below if a test bank account was created:
Account _______ has been created at bank _______ with an initial balance of 1000.00 [unit of currency].
Check for Sandbox Account Balance
To see account balance:
- Log on at Open Bank Project.
- Choose API Explorer.
- Choose Get Account Balances.
- Insert valid values for:
- Bank: Bank-of-Pune, as used in the previous section.
- Accounts: the Desired Account ID from the previous section (refresh this list by selecting it).
- Views: Auditor
- Choose Get and examine the results displayed below the Get button.
Example:

{
  "account_id": "12345678910111213",
  "bank_id": "bank-of-pune",
  "account_routings": [],
  "label": "",
  "balances": [{
    "type": "OpeningBooked",
    "currency": "EUR",
    "amount": "1000.00"
  }]
}
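If you need the balance programmatically, the short sketch below extracts the amount and currency from a response shaped like the example above (structure copied from that example).

import json

response_text = """{
  "account_id": "12345678910111213",
  "bank_id": "bank-of-pune",
  "account_routings": [],
  "label": "",
  "balances": [{
    "type": "OpeningBooked",
    "currency": "EUR",
    "amount": "1000.00"
  }]
}"""

balance = json.loads(response_text)["balances"][0]
print(balance["amount"], balance["currency"])  # 1000.00 EUR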
Modify NLP
A basic understanding of Rasa is necessary to add more banking APIs or replace the default banking server, OBP, with a custom banking server solution. The following tutorial demonstrates a simple modification to the NLP that does not require in-depth knowledge.
Below is a table of terminology used in the instructions in the Tutorial Add a Keyword in NLP.
Figure 7 illustrates the relationships of files in the NLP.
Figure 7: Files in the NLP
To learn more about Rasa Open Source and its terminology, see
- Overview: Introduction to Rasa Open Source
- Terminology: Rasa Playground
- Contextual Chatbots: Rasa Docs: Build Contextual Chatbots and AI assistants with our open source conversational AI framework
Modify the Wake Word
The wake word (or wake-up word) activates listening in the RI. Preface each interaction by reciting the wake word followed by a pause and then the greeting or request. Currently the default wake word is ReSpeaker (pronounced ree-spee-kr).
Example:
"[ReSpeaker] [pause], good afternoon."
The word chatbot is another wake word that has been validated. Choose wake words carefully as background noise can distort the sound of the word. Wake words that sound like other words perform poorly.
To modify:
- Stop the reference implementation if it is running:
./edgesoftware uninstall -a
- Open the file compose/docker-compose-frontend-respeaker.yml in the archive (i.e., zip file). Do not unzip the archived file.
- In the section services:environment, modify the wake word by setting WAKE_UP_WORD under the audio-ingestion2 service.
- Save the archive.
- Deploy the reference implementation:
./edgesoftware install
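For orientation, the edited section of compose/docker-compose-frontend-respeaker.yml might look like the excerpt below. The surrounding keys are assumptions for illustration; WAKE_UP_WORD is the only setting this procedure changes, and chatbot is the alternative wake word validated above.

services:
  audio-ingestion2:
    environment:
      - WAKE_UP_WORD=chatbot   # replaces the default wake word, ReSpeaker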
Train the Models
DeepSpeech ASR
To retrain the ASR’s DeepSpeech model:
- To understand how to re-train DeepSpeech’s model, refer to the DeepSpeech documentation Training Your Own Model.
- For pre-trained open-source demo models, refer to Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models.
- To convert the model generated by Step 1 to Intermediate Representation, use the tutorial Convert TensorFlow* DeepSpeech Model to the Intermediate Representation.
- Update the path of the IR generated in Step 3 in inference:model_xml and inference:model_bin in ~asr_deepspeech/src/model/deepspeech.cfg.
- Build and restart the DeepSpeech ASR container.
Kaldi ASR
To retrain the pre-trained Kaldi model:
- Download the appropriate model.
- Extract the zip file.
- Refer to the steps in the Readme file to retrain or fine-tune the model.
- Convert the model generated by Step 3 to intermediate representation (IR). Use the tutorial Converting a Kaldi* Model.
- Update the path for the IR generated in Step 4 in acousticModelFName, outSymsFName, fsmFName, and featureTransform in ~asr_kaldi/src/model/speech_lib.cfg.
- Build and restart the Kaldi ASR container.
Facebook Hugging Face ASR
Find the fine-tuning and new-model training procedures for the Hugging Face ASR at wav2vec.
QuartzNet ASR
For the procedure to train the QuartzNet ASR model, refer to Building Speech Recognition Models for Global Languages with the Mozilla Common Voice Dataset and NVIDIA NeMo.
TTS Models
To retrain the ForwardTacotron model, refer to forward-tacotron (composite).
Create WAV Files
The RI accepts recorded audio files for ingestion. The files must meet these requirements:
- Format: .wav
- Sampling rate: 16 kHz
- Recording channel: Mono
Use audio software, such as Audacity*, to create the files.
For the section WAVE File Ingestion, create separate audio files for each entry described in the table below.
NAME THE FILE | SPEAK AND RECORD |
---|---|
query1.wav | "[wake word] Good morning." |
query2.wav | "[wake word] Where is the nearest cash machine?" |
query3.wav | "[wake word] List my accounts." |
query4.wav | "[wake word] What is my account balance?" |
query7.wav | "[wake word] Goodbye." |
To create a wav file:
- Set the sampling rate by choosing 16 kHz in the Project Rate dropdown. See Figure 8.
- Set the recording channel to Mono in the Recording channel dropdown.
- To record, click the record button. Speak into the mic.
- Click the stop button.
- Save file in WAV file format.
- Copy the files into the ./compose/audio path and restart the deployment.
Figure 8: Create a WAV file with Audacity*
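To confirm a recording meets the requirements before ingestion, the minimal check below uses Python's standard wave module; the file name is an example from the table above.

import wave

with wave.open("query1.wav", "rb") as w:
    assert w.getframerate() == 16000, "sampling rate must be 16 kHz"
    assert w.getnchannels() == 1, "recording channel must be mono"
    seconds = w.getnframes() / w.getframerate()
    print(f"OK: {seconds:.1f} seconds of 16 kHz mono audio")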
Troubleshooting
Restart the ALSA Firmware
- If the recording test in Install fails, restart the Advanced Linux Sound Architecture (ALSA) firmware:
pulseaudio -k && sudo alsa force-reload
- Rebuild the application. See Install.
NOTE: If the build fails after restarting the ALSA firmware, the device is not supported.
Check Configuration
To check which type of ingestion is running:
- Verify all images are built:
docker images
The order in which Docker lists the images may vary.
REMINDER: RI Images (11)
- action_server
- rasa
- nlp_app
- deepspeech_asr
- kaldi_asr
- audio-ingester2
- audio-ingester
- tts
- authzap
- huggingface_asr
- quartznet_asr
- Determine if the RI is configured for wave file ingestion or live speech ingestion by listing the containers:
docker ps
The audio ingester container name indicates the type of configuration:
Table 9: Status of Configuration

AUDIO INGESTER CONTAINER | STATUS OF CONFIGURATION |
---|---|
chatfrontend_wave_ingestion_1 | WAVE File Ingestion |
chatfrontend_audio_ingestion_1 | Live Speech Ingestion |

- If you don’t see the correct status, refer to Docker Logs.
NOTE: Failure - Check Logs
Installation failure messages will be available in the log file: /var/log/esb-cli/Interactive_Kiosk_AI_Chatbot_vx.x.x/Interactive_Kiosk_AI_Chatbot/output.log.
Modify Subgroup
To run Docker commands without prefacing them with sudo:
- Add your user to the docker subgroup after Docker has been installed:
sudo usermod -aG docker $USER
- Logout and login again (or restart).
This enables running the reference implementation without the sudo command.
Clean Up Stack
To clean up the running Conversational AI chatbot stack:
- Run the command:
./edgesoftware uninstall -a
- List the image IDs:
docker images
REMINDER: RI Images (11)
- action_server
- rasa
- nlp_app
- deepspeech_asr
- kaldi_asr
- audio-ingester2
- audio-ingester
- tts
- authzap
- huggingface_asr
quartznet_asr
- Delete all the RI images with the command:
docker rmi <Image ID>
Build Issues
If you experience a build issue, check the log file at /var/log/esb-cli/Interactive_Kiosk_AI_Chatbot_v0.8.6/Interactive_Kiosk_AI_Chatbot/install.log.
Failed to Install
Audio Device Incompatibility
If the audio device is incompatible with the RI, installation produces the following error message:
Error Message
Failed to install Interactive_Kiosk_AI_Chatbot. ('', '[ERROR]: Plug in another audio device and start the installation again.', '')
To fix:
- Try connecting another device.
- Or try reinstalling the RI. See Install.
Missing Component
If Docker Engine or Docker Compose is not installed, the RI produces the following error message:
Error Message
Failed to install Interactive_Kiosk_AI_Chatbot. ('', 'Some components of Interactive Kiosk AI Chatbot failed to start successfully: chatfrontend_wave_ingestion_1 chatfrontend_tts_1 ', '')
To identify the missing component:
- Run the version command of each application:
a. Docker Engine:
docker -v
b. Docker Compose:
docker-compose -v
If a component is not installed, the system produces the following error message:
[component]: command not found
- Install the missing component:
a. Docker Engine
b. Docker Compose
- To reinstall the RI, see Install.
Invalid Credentials
If invalid OBP credentials are supplied, the RI will produce the following error in the authz log:
Error Message:
DATE TIME - INFO - [main.py:102 - main ] - Starting the Authz Server...
DATE TIME - INFO - [main.py:91 - create_login_session ] - Creating new Login
https://apisandbox.openbankproject.com
Could not obtain token
- To check OBP credential acceptance, view the authz log by following the instructions in Docker Logs.
- To reinstall the RI, see Install.
Docker Logs
Microservices produce log files.
To see logs:
- In a terminal, list the container IDs:
sudo docker ps --format "table {{.Image}}\t{{.Status}}\t{{.ID}}"
- View the log by running docker logs with the container ID:
docker logs -f <container-ID>
To see the input and output of the RI, use the container ID for NLP and ASR. For a description of the log file content, see Table 10.
CONTAINER LOG FILE | USE IT TO CONFIRM |
---|---|
ASR | Speech the ASR Recognizes |
NLP | Speech Output |
AUTHZ | OBP credentials |
Example of ASR log output (user's speech input):
Quartznet Inferred Output: ' good morning '
Example of NLP log output (user's input and bot's output):
User: may i know my account balance
Bot: The balance for the savings account ending with **** is ****.**** Rupees
For more information about:
- The purpose of each microservice, see How it Works.
- Running the application, return to WAV File Ingestion or Live Speech Ingestion.
- Models, see Train the Models.
Set Log Levels
There are two log level values:
LOG LEVELS | DESCRIPTION | PERFORMANCE |
---|---|---|
info | succinct output | Default: Best for performance. |
debug | verbose output | Impacts performance. |
To change log levels:
- Modify files for various microservice components. Find log levels in these files:
Table 12: Configuration Files

CONFIGURATION FILES | MICROSERVICE OR COMPONENTS |
---|---|
compose/docker-compose-backend.yml | authz, asr_speech, nlp_app, rasa_action |
compose/docker-compose-frontend.yml | wave_ingestion, tts. For WAVE File Ingestion. |
compose/docker-compose-frontend-respeaker.yml | audio_ingestion, tts. For Live Speech Ingestion. |
- Locate the LOG_LEVEL value per microservice and change it to the desired reporting level, info or debug.
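As an illustration, the relevant fragment of compose/docker-compose-backend.yml might look like the excerpt below; the surrounding structure is an assumption, and LOG_LEVEL is the only value this step changes.

services:
  nlp_app:
    environment:
      - LOG_LEVEL=debug   # verbose output; "info" (default) is best for performance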
Docker Pull Limit Issue
Exceeding the pull rate limit produces an error:
Error Message
ERROR: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
If you encounter this error, log in with a Docker premium account.
Example:
docker login
Product and Performance Information
Performance varies by use, configuration, and other factors. Learn more at www.Intel.cn/PerformanceIndex.