Building the Platform to Power Voice AI Experiences

Peter Cahill

Today, I’m excited to formally announce Voysis, the complete voice AI platform. Our platform enables companies to create their own voice ‘intelligence’ (think Alexa or Siri) but specialized in a specific brand’s products and services.

I’ll dive deeper into Voysis later in this post, but first I’d like to share how I started working in voice technology R&D.

Finding my way to voice

For almost six decades, researchers have worked to advance human-computer interaction through voice. In 2003, I was drawn to the study of text-to-speech because of how challenging it is. While computers are often used to reduce data or to perform repetitive tasks, text-to-speech is a creative problem.

A text-to-speech system takes a string of text as input, perhaps with punctuation, and must decide not just which words to say but how to say them. At a very high level, a typical input might be 100 characters of text, around 100 bytes, and from that the system must produce about 3 seconds of audio, which could be around 288,000 bytes depending on the sampling rate. So the task is to convert highly abstract data into something far more verbose, and if any of the output is slightly off, a native listener can often spot it with ease. The task is difficult for human speakers as well: to deliver speech consistently in controllable styles, people need to be trained as voice actors, and even then it’s still hard. Then consider the added complexity of different voices, languages, accents, and so on.
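To make the size gap concrete, this short Python sketch reproduces the arithmetic. The post does not state the audio format; the sketch assumes uncompressed 16-bit mono PCM, under which a 48 kHz sampling rate gives exactly 288,000 bytes for 3 seconds. The 100-character input is a placeholder, and `pcm_bytes` is a hypothetical helper written for this illustration.

```python
def pcm_bytes(seconds: int, sample_rate_hz: int,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio (assumed format: 16-bit mono)."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Placeholder 100-character ASCII input; in UTF-8 each ASCII char is 1 byte.
text = "x" * 100
text_bytes = len(text.encode("utf-8"))

# Output size grows with the sampling rate; the input size does not change.
for rate in (16_000, 22_050, 48_000):
    audio = pcm_bytes(3, rate)
    print(f"{rate} Hz: {audio} bytes, {audio / text_bytes:.0f}x the input")
```

At 48 kHz the output is 2,880 times larger than the input, and even at 16 kHz it is 960 times larger, which is the expansion from "highly abstract" to "much more verbose" data described above.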

After working on text-to-speech for many years, I moved into a much broader range of speech technologies while on the faculty at University College Dublin: speech recognition, voice activity detection, natural language understanding, and more.

The rise of voice experiences with merit

The rate of progress in both AI and speech technology is faster now than at any time in the past 20 years, and the pace is still increasing. In part, this is driven by a widespread belief that voice technologies can power superior interfaces and experiences, which in turn attracts more talent and more funding to the field. Platform companies’ systems, such as Siri and Amazon Alexa, are driving a rapid change in consumer behavior and making the mass market aware of some of the benefits of voice and conversational interfaces. Over the next twelve months, we’ll see continued demand for voice interfaces as more and more end users see how these interfaces can help them with their daily tasks.

While general voice assistants are great at a range of tasks, including setting timers, turning switches on and off, and playing music, they’re still a long way from natural interaction, where a user shouldn’t need to know how to talk to them or what they can do. Most of these assistants are strategically positioned to become a first point of contact: much like a browser or Google search, the assistant is where end users go by default, and from there they access other apps or websites.

As a side effect, a user who wants to talk to an app often needs to leave it and access voice functionality through an assistant application. The assistant ultimately hijacks the direct relationship the consumer has with the app. At the same time, these assistants are closed platforms, where the functionality that can be added is very limited, even with third-party plugins and skills.

This is not how we think it should be. At Voysis, we want to enable consumers to take their existing relationships with apps and websites to the next level, through direct voice integrations that offer features relevant to the vertical in question.

Voysis: The complete voice AI platform

Our platform enables companies to create their own voice ‘intelligence’ (think Alexa or Siri) but specialized in a specific brand’s products and services. Through both an API and an SDK, the Voysis platform can, in essence, automatically create an Alexa with deep domain knowledge tailored to any company.

This is no easy task. All of the large platform corporations, such as Google, Amazon, and Apple, despite their seemingly limitless engineering and compute resources, ended up acquiring small companies to obtain much of the speech and language technology that today forms the bedrock of their voice technology stacks, including their respective generic voice assistants. There is a clear history of smaller, singularly focused teams succeeding in this domain where the tech giants themselves have come up short.

Unlike existing voice platforms, which were built from traditional IVR (interactive voice response) components for automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS), Voysis was designed from day one as a complete voice AI platform. As a result, we don’t chain several black boxes together to form a system. Instead, we focus on end-to-end modelling and avoid hard decisions wherever possible, so that no single stage has to commit to one answer before the later stages can weigh in. This is where our logo comes from: a transparent, fluid, breathing circle, the complete opposite of a chain of rigid black boxes. (The creation of our logo, brand mark, and voice icons is a blog post for another day!)
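As a toy illustration of what a hard decision can cost, this Python sketch compares two ways of passing an ASR n-best list to a downstream model. Everything in it is hypothetical (the transcripts, the probabilities, `domain_score`, and the colour catalogue), and it is not a description of how the Voysis engine works. It shows n-best rescoring, one simple way to defer a decision at the ASR/NLU boundary; end-to-end modelling goes further by learning across the whole chain.

```python
from typing import List, Optional, Tuple

# Hypothetical ASR n-best list: (transcript, ASR probability).
NBEST: List[Tuple[str, float]] = [
    ("show me read shoes", 0.45),
    ("show me red shoes", 0.40),
    ("snow me red shoes", 0.15),
]

# Hypothetical brand catalogue used by the downstream model.
CATALOGUE_COLOURS = {"red", "blue", "black"}


def domain_score(text: str) -> float:
    """Toy downstream model: how plausible is this text as a product query?"""
    return 0.9 if CATALOGUE_COLOURS & set(text.split()) else 0.1


def extract_colour(text: str) -> Optional[str]:
    """Toy slot filler: return a catalogue colour mentioned in the text, if any."""
    found = CATALOGUE_COLOURS & set(text.split())
    return next(iter(found), None)


def hard_pipeline(nbest: List[Tuple[str, float]]) -> Tuple[str, Optional[str]]:
    # Black-box chaining: ASR commits to its 1-best before NLU sees anything.
    best_text, _ = max(nbest, key=lambda h: h[1])
    return best_text, extract_colour(best_text)


def soft_pipeline(nbest: List[Tuple[str, float]]) -> Tuple[str, Optional[str]]:
    # Keep every hypothesis with its score and decide only after combining
    # the ASR probability with the downstream model's score.
    best_text, _ = max(nbest, key=lambda h: h[1] * domain_score(h[0]))
    return best_text, extract_colour(best_text)


print(hard_pipeline(NBEST))  # ('show me read shoes', None)
print(soft_pipeline(NBEST))  # ('show me red shoes', 'red')
```

The hard pipeline commits to the acoustically most likely transcript and loses the colour the user asked for; the soft pipeline keeps the alternatives alive, so the downstream score (0.40 × 0.9 versus 0.45 × 0.1) can break the near-tie.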

Our platform is powered by our proprietary deep learning engine, which was built from the ground up specifically for modelling speech and language tasks. In addition, we’ve built a full set of natural language capabilities into our platform, which currently supports 16 languages, including Mandarin Chinese, Russian, Arabic and many European languages.

We focus on deep integrations: for a specific vertical or use case, the Voysis platform powers fully voice-enabled solutions that deliver real utility and purpose. These solutions are then made available through a ‘self-serve’ model, so that any brand in that vertical can simply push its data through our platform and instantly have a voice assistant that it can integrate directly into its app or website.

I’m also excited to announce our $8M Series A financing, led by Polaris Partners. We will use this capital to further grow our R&D efforts and to bring the platform to market. To support that, we have also recently opened an office in Boston, MA.
