Cognitive IVR, Conversational AI-IVR, Conversational AI Automation, Cognitive Engagement and Natural Language IVR seem to be all the rage lately, and for good reason: these technologies offer many benefits. Yet most of them are still prone to challenges, and the different terminology can be confusing. For most purposes, we consider them all basically the same in concept, and we built Cognitive Voice Automation Suite (CVA) to use natural language to engage a customer in an interactive and automated fashion.

The promise of a Conversational IVR is huge; however, creating transformational Customer Experiences with natural language can still be a challenge. There are many points of failure, and if one piece of the process fails, everything downstream suffers.


Speech-to-Text is the first breakpoint in any voice application. Speech-to-Text software creates a transcript of the customer’s conversation in real time. This transcript serves as input to the dialogue manager. There are several solid options for this, including Google’s Speech-to-Text, Amazon Transcribe and Microsoft Cognitive Services. Yet as good as they are, they are still prone to inaccuracies. Thus, the Natural Language Understanding models in your Cognitive IVR need to be trained to anticipate failures during the transcription process.

As an example, capturing a customer's address is often problematic. Even the best Speech-to-Text engine might create a transcript that reads something like “1735 Pendleton ever knew” instead of “1735 Pendleton Avenue”, which of course would make no sense to a typical bot. However, in the Xaqt Conversational AI-Assurance process, our engineers train the natural language engine to handle these types of nuances so that they can be properly resolved. Additionally, addresses are queried against Xaqt's Address resolution server in real time to further improve accuracy and ensure that the correct address is captured and verified back to the customer. In the event that an address is handled incorrectly, a Conversational Quality Assurance Analyst listens to the call recording to understand why the call failed and trains the model for the next caller.
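To make the transcript-repair idea concrete, here is a minimal Python sketch. It is not Xaqt's implementation: the `MISHEARD_SUFFIXES` table, the `KNOWN_SUFFIXES` list and the `repair_street_suffix` function are hypothetical names invented for illustration, and a production system would rely on a trained model and an address resolution service rather than a lookup table.

```python
import difflib

# Hypothetical table of phrases a speech engine might emit in place of a
# street suffix. "ever knew" is the example from the text; the second
# entry is purely illustrative.
MISHEARD_SUFFIXES = {
    "ever knew": "Avenue",
    "have a new": "Avenue",
}

# Hypothetical list of valid suffixes used for fuzzy matching.
KNOWN_SUFFIXES = ["Avenue", "Street", "Boulevard", "Drive", "Road", "Lane"]


def repair_street_suffix(transcript: str) -> str:
    """Repair a trailing street suffix in a transcribed address."""
    lowered = transcript.lower()
    # Step 1: replace a known mis-transcription at the end of the transcript.
    for heard, suffix in MISHEARD_SUFFIXES.items():
        if lowered.endswith(heard):
            return transcript[: len(transcript) - len(heard)] + suffix
    # Step 2: fuzzy-match the last word against the known suffixes.
    words = transcript.split()
    if not words:
        return transcript
    match = difflib.get_close_matches(
        words[-1].title(), KNOWN_SUFFIXES, n=1, cutoff=0.8
    )
    if match:
        words[-1] = match[0]
    return " ".join(words)


print(repair_street_suffix("1735 Pendleton ever knew"))
```

A repaired string like this would still need to be checked against an address database and confirmed with the caller, as described above.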

Addresses aren't the only problematic inputs for natural language IVRs either. Different dialects and accents can be just as hard to understand. As such, we have a team of people monitoring interactions and continually training our natural language IVR models. This approach improves both Customer Experience and the effectiveness of the Conversational IVR.

Even a one percent improvement in effectiveness can translate to significant outcomes over time.

Natural Language Understanding and Dialogue Management

Capturing an accurate transcript of a customer’s request is only the first step in creating a Conversational AI-based customer interaction. Once a transcript is generated, either through speech-to-text or a chatbot, it gets handed off to the Natural Language Understanding (NLU) engine.

The NLU engine is where the customer's request gets interpreted by the Cognitive IVR. The call center bot parses the transcript and deciphers the caller's Intent. An intent, in conversational AI terms, is the reason for the customer's call or what they want to accomplish. Maybe they have a question they need answered or need to open a support ticket.

Once the caller's Intent is established and the appropriate entities are extracted and slotted, that information gets structured and handed off to the dialogue manager, which determines how to process it. Think of the dialogue manager, or conversation manager, as the core set of instructions for determining what action to take or how to respond to the customer. This could include requesting information from a knowledge base to answer the customer's question, or passing customer information to a support system or CRM to look up information for the customer, such as the status of their order.
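The hand-off from NLU to the dialogue manager can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `NluResult` structure, the intent names `order_status` and `open_ticket`, and the handler functions are hypothetical and do not reflect any particular product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NluResult:
    """Structured output of the NLU step: one intent plus its filled slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def check_order_status(slots: Dict[str, str]) -> str:
    # Stand-in for a CRM or order-system lookup.
    return f"Looking up the status of order {slots['order_id']}."


def open_ticket(slots: Dict[str, str]) -> str:
    # Stand-in for a call to a support system.
    return f"Opening a support ticket about {slots['topic']}."


# Hypothetical mapping from intent name to the action that handles it.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "order_status": check_order_status,
    "open_ticket": open_ticket,
}


def dialogue_manager(result: NluResult) -> str:
    """Pick an action for the intent, or ask the caller for what is missing."""
    handler = HANDLERS.get(result.intent)
    if handler is None:
        return "Sorry, I didn't catch that. Could you rephrase?"
    try:
        return handler(result.slots)
    except KeyError as missing:
        # A required slot was not extracted, so prompt the caller for it.
        slot_name = missing.args[0].replace("_", " ")
        return f"Can you tell me your {slot_name}?"


print(dialogue_manager(NluResult("order_status", {"order_id": "A123"})))
```

Keeping the intent-to-action mapping in one table means a new intent can be supported by adding a handler, without touching the dispatch logic.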

Text-to-Speech and Call Control

Once the AI bot has taken the appropriate action and determined how to respond to the customer, it will create a transcript that needs to be read back to the caller. That transcript then gets handed off to the Text-to-Speech engine to generate the voice response.

In theory, this should be a straightforward process. The natural language bot receives a script and then processes and synthesizes the speech. In practice, though, there are many ways this process can fail and create a poor customer experience.

For example, reading back numbers can be problematic. Let's say you need to confirm the address that the caller provided. In the previous example of 1735 Pendleton Avenue, the bot will have several options for speaking 1735 to the caller:

* One-thousand-seven-hundred-thirty-five Pendleton Avenue (this format is usually the default)

* Seventeen-Thirty-Five Pendleton Avenue

* One-Seven-Three-Five Pendleton Avenue

This is where Speech Synthesis Markup Language (SSML) comes into play. SSML allows you to create a set of instructions for the call center bot on the specifics of how you would like the transcript read and pronounced to the caller. It does this through a set of standardized tags, much like HTML. In the address example, you would create a tag around the street number and specify in which format you would like it spoken. SSML even allows for custom pronunciation, intonation and punctuation.
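As a rough illustration of the address example, the Python sketch below wraps the street number in an SSML `say-as` tag. The function names and the style labels `cardinal`, `pairs` and `digits` are hypothetical. The `interpret-as` values `cardinal` and `characters` are commonly supported by text-to-speech engines, but the accepted values and the exact spoken result vary by vendor, so check your engine's documentation before relying on them.

```python
from xml.sax.saxutils import escape


def street_number_ssml(number: str, style: str) -> str:
    """Wrap a street number in say-as tags for the requested reading style."""
    if style == "cardinal":
        # Read as a whole quantity, e.g. "one thousand seven hundred thirty-five".
        return f'<say-as interpret-as="cardinal">{number}</say-as>'
    if style == "pairs":
        # Split "1735" into "17" and "35" so each half is read as its own
        # number. Halves with a leading zero (such as "05") need extra care.
        halves = [number[:-2], number[-2:]] if len(number) > 2 else [number]
        return " ".join(
            f'<say-as interpret-as="cardinal">{half}</say-as>' for half in halves
        )
    if style == "digits":
        # Read one character at a time, e.g. "one seven three five".
        return f'<say-as interpret-as="characters">{number}</say-as>'
    raise ValueError(f"unknown style: {style}")


def address_prompt(number: str, street: str, style: str = "pairs") -> str:
    """Build a complete SSML confirmation prompt for an address."""
    spoken_number = street_number_ssml(number, style)
    return (
        f"<speak>You said {spoken_number} {escape(street)}, "
        "is that right?</speak>"
    )


print(address_prompt("1735", "Pendleton Avenue", style="digits"))
```

The street name is passed through `escape` so that characters such as `&` do not break the markup.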

In summary, the raw transcript tells your text-to-speech engine what to say, and the SSML tags tell it how to say it. Xaqt's Cognitive Automation Suite gives you full control to bring the right personality to your call center bot. We also take the complexity out of the process, so you don't have to hire your own software developers.

The transcript also needs to include instructions for the ACD or voice platform on what to do after it responds to the customer. We refer to these as Call Control Instructions, and they could include parameters that capture more information from the customer, tell the ACD where to route the call, or even end the call if it was successful. Cognitive Voice Automation Suite enables call control across any phone system.
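One way to picture Call Control Instructions is as a small payload that travels alongside the spoken response. The Python sketch below is only an assumption about what such a payload might look like: the `BotResponse` structure, its field names and the action names `collect`, `transfer` and `hangup` are hypothetical and are not taken from any specific ACD or voice platform.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical set of actions a voice platform might accept.
VALID_ACTIONS = {"collect", "transfer", "hangup"}


@dataclass
class BotResponse:
    """What to say, plus what the phone system should do afterwards."""
    ssml: str
    action: str
    transfer_target: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject instructions the platform would not know how to execute.
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
        if self.action == "transfer" and not self.transfer_target:
            raise ValueError("a transfer needs a target queue or number")


def to_payload(response: BotResponse) -> str:
    """Serialize the response for hand-off to the voice platform."""
    return json.dumps(asdict(response), sort_keys=True)


print(to_payload(BotResponse("<speak>Thanks, goodbye.</speak>", "hangup")))
```

Validating the instruction before it leaves the bot keeps a malformed routing request from reaching the phone system mid-call.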

Context is Critical in Automating Voice Interactions

Whereas most vendors on the market have placed their focus on natural language understanding and chatbots only, we built Cognitive Voice Automation Suite with knowledge and business integrations at the core. In other words, not only can our voicebots converse with a customer, but they can do so intelligently and be smart enough to take the appropriate automated action for each customer.

Take a transit agency, for example. People may call wanting to know where their bus is, or why it's late. With a traditional speech IVR, you can program the voice inputs and outputs, but it will lack the required knowledge to actually answer the question - "where's my bus?" However, with Xaqt's integrated data analytics platform at the core, not only do we know where the bus is, but we also know where it's going and when it will get there. This enables an entirely new set of use cases and interaction types. We call this context making, and it plays out across any scenario in your enterprise where a customer interaction requires access to knowledge. And, as new data is added to our cognitive system, it can answer more questions and do more things.

The ability to leverage Human-Assisted AI and Human-in-the-Loop to train deeply in domains with context is critical to success in voice automation. In the Transit example, existing solutions force a caller to speak or enter route numbers and Stop IDs that might not be known or readily available to the person on the phone. What the caller really wants to say is, "I'm going from Charlestown to Kendall, when will my bus be here and when will I arrive?" In order to translate that into something a call center bot can process, it has to know that Charlestown has two bus stops that the caller could be at, and that the Charlestown to Harvard Square run is Route 86, and then identify the current buses on that route that could be the one they're looking for. Or, maybe the caller says, "I'm standing in front of Zume's coffee shop, and I'm going to Kendall." Again, the Cognitive IVR has to relate Zume's coffee shop to the appropriate bus stop, and also know that the caller is referring to Kendall Square and not a Kindle device. These scenarios can be configured up front, and then through continuous monitoring and context building, the transit bot gets smarter, addresses more and more customer interactions, and improves the caller experience along the way.
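The place-to-stop resolution described above can be sketched as a simple lookup. This is a toy illustration only: the stop IDs, the `STOPS_BY_PLACE` and `ALIASES` tables and both functions are made up for the example and do not represent real transit data or Xaqt's context engine.

```python
from typing import Dict, List

# Made-up stop IDs, keyed by the places a caller might mention.
STOPS_BY_PLACE: Dict[str, List[str]] = {
    "charlestown": ["stop-101", "stop-102"],
    "zume's coffee shop": ["stop-101"],
    "kendall": ["stop-200"],
}

# Alternate phrasings and likely mis-transcriptions mapped to a known place.
ALIASES: Dict[str, str] = {
    "kendall square": "kendall",
    "kindle": "kendall",
}


def resolve_stops(phrase: str) -> List[str]:
    """Return the candidate stop IDs for a place the caller mentioned."""
    key = phrase.strip().lower()
    key = ALIASES.get(key, key)
    return STOPS_BY_PLACE.get(key, [])


def needs_clarification(phrase: str) -> bool:
    """True when a place maps to several stops and the bot should ask which."""
    return len(resolve_stops(phrase)) > 1


print(resolve_stops("Kindle"))
print(needs_clarification("Charlestown"))
```

In practice, each unresolved or ambiguous phrase is a candidate for the human review loop, which adds new places and aliases over time.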

As another example, you may have a voice application or call center bot that's really good at automating calls from citizens who need to report a pothole. But what happens when a caller needs to know when trash day is, or wants to know when the construction on their street will end? For most call center bot platforms, that's an entirely new set of challenges to solve. However, with Xaqt's context engine, we provide the tools to quickly build up your knowledge base, connect the underlying systems, and implement the end-to-end process automation to handle new customer interactions. This approach enables the deployment of highly specialized voicebots and automations and provides a quality experience for customers.

Basically, the more data you connect, the more questions your bot will be able to answer. And, the more business applications you connect, the more processes your bot can automate. This serves as the foundation for Robotic Process Automation. By starting small, you can have some quick wins and scale as you add more data and processes.