Text to speech (TTS) technologies convert written text into spoken audio and are becoming increasingly important in daily life, not only as an aid for people with visual impairments but also to power video games, language learning and translation apps, voice assistants (like Amazon Alexa and Google Home), and to produce content for voice apps.
The good news is that you don’t have to invest huge amounts of money in research and development to benefit from text to speech technology: because it is so widespread, there are plenty of APIs (Application Programming Interfaces) that let developers connect to TTS services and add speech synthesis to their products.
Which text to speech API to choose
In such a rich landscape, the difficulty is deciding which text to speech API is the best one for you.
When choosing, you should weigh several things: the cost (there are free services, but TTS services typically charge per character), the output quality (whether you need a truly realistic, human-like voice), how difficult the service is to set up and, if needed, to integrate with other APIs or systems, and the features it offers (e.g. how many languages it supports and how much text it can handle).
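Since pricing is usually per character, a rough monthly estimate is easy to compute before committing to a provider. The sketch below is only an illustration: the `PricingPlan` class, the provider names, the free allowances and the prices are all hypothetical placeholders, so replace them with the figures from each provider's current pricing page.

```python
from dataclasses import dataclass


@dataclass
class PricingPlan:
    """Hypothetical pricing plan; the numbers used below are placeholders, not real quotes."""
    name: str
    free_chars_per_month: int      # characters included at no charge each month
    usd_per_million_chars: float   # price for characters beyond the free allowance


def monthly_cost(plan: PricingPlan, chars_per_month: int) -> float:
    """Return the estimated monthly bill in USD for a given character volume."""
    # Only the characters above the free allowance are billed.
    billable = max(0, chars_per_month - plan.free_chars_per_month)
    return billable / 1_000_000 * plan.usd_per_million_chars


if __name__ == "__main__":
    # Placeholder plans, assumed only for the sake of the example.
    plans = [
        PricingPlan("provider-a", free_chars_per_month=1_000_000, usd_per_million_chars=16.0),
        PricingPlan("provider-b", free_chars_per_month=0, usd_per_million_chars=4.0),
    ]
    volume = 3_000_000  # characters you expect to convert each month
    for plan in plans:
        print(f"{plan.name}: ${monthly_cost(plan, volume):.2f} per month")
```

Running the same volume through every plan you are considering makes it clear when a generous free tier stops being the cheaper option.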
As you can imagine, the big players did not underestimate the potential of this field and are developing their own APIs.
Google Text to Speech API, for example, offers more than 180 voices and can generate a really human-like synthetic voice with its WaveNet voices (WaveNet models are so good partly because they have been trained on raw audio samples of actual humans speaking). You can choose between several voices that differ by language, gender, and accent (for some languages). Of course, Google Text to Speech is well integrated with the other Google services.
Amazon Polly API, part of Amazon Web Services, offers 12 months of free access. It is based on deep learning technologies and features a news narration style and a variety of languages. It is easy to set up and use, and after the free plan you are charged according to the number of characters you convert to speech.
Watson Text to Speech API by IBM lets you customize pronunciation, and it offers a Lite plan, which is free. It also works together with Watson Speech to Text and with Watson Assistant: a powerful combination!
Microsoft Azure also offers a free starter account with text to speech services. You can choose from more than 200 voices in more than 50 languages, and you can also add emotional nuances (emphatic, blissful, etc.) and tones of voice (customer care, news reading, etc.).
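Speaking styles like these are typically requested through SSML (Speech Synthesis Markup Language) rather than plain text. The following is a minimal sketch that only builds the SSML string; it does not call any service. The `build_styled_ssml` helper is hypothetical, and the voice name `en-US-JennyNeural` and the style `cheerful` are assumptions to be checked against the Azure documentation, since not every voice supports every style.

```python
from xml.sax.saxutils import escape, quoteattr


def build_styled_ssml(text: str, voice: str, style: str, lang: str = "en-US") -> str:
    """Wrap text in SSML that asks for a speaking style (hypothetical helper)."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        # The express-as element carries the requested style.
        f'<mstts:express-as style={quoteattr(style)}>'
        f'{escape(text)}'  # escape &, < and > so the document stays valid XML
        '</mstts:express-as></voice></speak>'
    )


if __name__ == "__main__":
    # Voice and style names are assumptions; check which ones your account offers.
    print(build_styled_ssml("Thanks for calling, how can I help?",
                            voice="en-US-JennyNeural", style="cheerful"))
```

The resulting string would then be sent to the service as the synthesis input, using whichever SDK or REST endpoint you have chosen.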
There are also text to speech tools that are not for general use but are more targeted, like Natural Reader, which is mainly for personal use (reading aloud documents, PDFs, web pages and the like), Capti Voice, which is aimed at the education world, Voice Dream Reader, which is optimized for mobile, and many more.
So, with an offering this varied and abundant, you have to concentrate on what your needs really are, what systems and programming languages you are already using, and what resources you can put into the project, and then make an informed choice.
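One simple way to make that choice explicit is a weighted score: rate each candidate on the criteria that matter to you, weight the criteria by importance, and compare the totals. The sketch below assumes a 1 to 5 rating scale; the `rank_services` function, the service names, the ratings and the weights are all made up for illustration and are not an evaluation of any real product.

```python
def rank_services(scores: dict, weights: dict) -> list:
    """Return (service, weighted average) pairs, best first.

    scores maps each service name to its per-criterion ratings;
    weights maps each criterion to its relative importance.
    """
    total_weight = sum(weights.values())
    ranked = []
    for name, ratings in scores.items():
        weighted = sum(ratings[c] * w for c, w in weights.items()) / total_weight
        ranked.append((name, weighted))
    # Highest weighted average first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder ratings on a 1 to 5 scale, assumed only for the example.
    scores = {
        "service-a": {"cost": 5, "quality": 2, "integration": 3},
        "service-b": {"cost": 2, "quality": 5, "integration": 4},
    }
    weights = {"cost": 1, "quality": 3, "integration": 2}
    for name, value in rank_services(scores, weights):
        print(f"{name}: {value:.2f}")
```

Changing the weights shows how sensitive the ranking is to your priorities: a project on a tight budget and one that needs the most natural voice can reasonably end up with different winners.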