Google Cloud updates AI-powered speech tools for corporations

by Brett Harper

Google Cloud on Thursday announced it is updating its Text-to-Speech product with more voices and more languages. Google has also improved the quality of its Speech-to-Text transcription tools and is bringing some of its features into general availability. The updates should help developers build new voice applications that can reach millions more people and function more accurately.

For Text-to-Speech, Google has doubled the number of voices since its last update in August. It has added support for seven new languages or variants, including Danish, Portuguese/Portugal, Russian, Polish, Slovakian, Ukrainian, and Norwegian Bokmål, all in beta. The product now supports a total of 21 languages.


Across those new languages, Google has added 31 new WaveNet voices and 24 new standard voices. Google says it now supports a total of 106 voices. WaveNet is a deep neural network for generating raw audio, creating voices that are more natural-sounding than standard text-to-speech voices. The technology was created by DeepMind, the AI company Google acquired in 2014.

“Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and easier than is typical in the industry,” Google product manager Dan Aharon said in a blog post.

Google’s primary competition for Text-to-Speech services is Amazon Web Services’ Polly, which, according to its website, currently supports 58 voices.

In addition to adding new voices, Google’s Text-to-Speech Device Profiles feature is now generally available. This lets customers optimize audio playback for different types of hardware, such as headphones for media applications like podcasts.
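As a rough illustration of how a device profile might be requested, the sketch below builds the JSON body for a Text-to-Speech synthesis call. It assumes the REST API's `audioConfig.effectsProfileId` field and the `headphone-class-device` profile ID; the helper name `build_synthesis_request`, the sample text, and the voice name are illustrative choices, and no network request is made.

```python
import json

# Assumed endpoint of the Cloud Text-to-Speech REST API; shown for context only,
# this sketch never sends a request.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesis_request(text, voice_name, language_code, device_profile):
    """Return a JSON-serializable body for a synthesis call (hypothetical helper)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device profiles are passed as a list of effects profile IDs.
            "effectsProfileId": [device_profile],
        },
    }


body = build_synthesis_request(
    text="Welcome back to the show.",
    voice_name="en-US-Wavenet-D",  # example WaveNet voice name (assumption)
    language_code="en-US",
    device_profile="headphone-class-device",  # assumed profile ID for headphones
)
print(json.dumps(body, indent=2))
```

Sending this body would additionally require authentication and an HTTP client, which are omitted here.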

Meanwhile, for Speech-to-Text, Google is bringing its premium video model and its enhanced phone model into general availability; both were rolled out in beta last year. The video model, based on technology similar to what YouTube uses for automatic captioning, now has 64 percent fewer transcription errors, Google announced. The enhanced phone model now has 62 percent fewer errors.
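For readers wondering how a developer might choose between these models, here is a minimal sketch of a Speech-to-Text recognition config. It assumes the REST API's `model` and `useEnhanced` fields and the model identifiers `video` and `phone_call`; the helper `build_recognition_config` is hypothetical and nothing is sent over the network.

```python
def build_recognition_config(model, language_code="en-US", use_enhanced=False):
    """Return a recognition config dict for the chosen model (hypothetical helper)."""
    config = {"languageCode": language_code, "model": model}
    if use_enhanced:
        # Assumed flag that requests the enhanced variant of a model.
        config["useEnhanced"] = True
    return config


# Premium video model, for audio taken from video content.
video_config = build_recognition_config("video")

# Enhanced phone model, for telephony audio.
phone_config = build_recognition_config("phone_call", use_enhanced=True)

print(video_config)
print(phone_config)
```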

Google was able to improve the models by requiring customers who used the premium services to share usage data through data logging. Starting now, customers can use the enhanced phone model without opting in to data sharing, while those who opt in will pay a lower rate. Prices are also lower for all premium video model customers, and those who opt in to data sharing get an additional discount.

Google is also announcing the general availability of multi-channel recognition, which helps the Speech-to-Text API distinguish between multiple audio channels. This is useful for scenarios involving multiple people, such as meeting analytics.
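To make the multi-channel idea concrete, the sketch below builds a recognition config that asks for per-channel transcription and then groups transcripts by channel. It assumes the REST API's `audioChannelCount` and `enableSeparateRecognitionPerChannel` config fields and the `channelTag` field on results; both helper functions and the sample response are invented for illustration and are not real API output.

```python
from collections import defaultdict


def build_multichannel_config(channel_count, language_code="en-US"):
    """Return a config requesting separate recognition per channel (hypothetical helper)."""
    return {
        "languageCode": language_code,
        "audioChannelCount": channel_count,
        "enableSeparateRecognitionPerChannel": True,
    }


def transcripts_by_channel(response):
    """Group the top transcript of each result by its channel tag."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no transcript
        grouped[result.get("channelTag", 0)].append(alternatives[0]["transcript"])
    return dict(grouped)


# Illustrative response shape only; the transcripts are made up.
sample_response = {
    "results": [
        {"channelTag": 1, "alternatives": [{"transcript": "Hi, thanks for joining."}]},
        {"channelTag": 2, "alternatives": [{"transcript": "Happy to be here."}]},
        {"channelTag": 1, "alternatives": [{"transcript": "Let's review the agenda."}]},
    ]
}

config = build_multichannel_config(channel_count=2)
by_channel = transcripts_by_channel(sample_response)
print(by_channel)
```

With one speaker per channel, as in a two-party call recording, the grouped output reads as a per-speaker transcript, which is the kind of input meeting analytics would start from.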
