Google Cloud updates AI-powered speech tools for enterprises

by Brett Harper

Google Cloud on Thursday announced it is updating its Text-to-Speech product with more voices and more languages. Google has also improved the quality of its Speech-to-Text transcription tools and is bringing some of its features into general availability. The updates should help developers build new voice applications that can reach millions more people and function more accurately.

For Text-to-Speech, Google has doubled the number of voices available since its last update in August. It has added support for seven new languages or variants, all in beta: Danish, Portuguese/Portugal, Russian, Polish, Slovakian, Ukrainian and Norwegian Bokmål. The product now supports a total of 21 languages.


Across those new languages, Google has added 31 new WaveNet voices and 24 new standard voices. Google says it now supports a total of 106 voices.

WaveNet is a deep neural network for generating raw audio, which creates voices that are more natural-sounding than standard text-to-speech voices. The technology was created by DeepMind, the AI company Google acquired in 2014.

“Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and easier than is typical in the industry,” Google product manager Dan Aharon said in a blog post.

Google’s primary competition for Text-to-Speech services is Amazon Web Services’ Polly, which according to its website currently supports 58 voices.

In addition to adding new voices, Google’s Text-to-Speech Device Profiles feature is now generally available. This lets customers optimize audio playback on different types of hardware, such as headphones for media applications like podcasts.
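
For developers, a device profile is selected per request. The following is a minimal sketch of a request body for the Text-to-Speech REST `text:synthesize` method, assuming the `effectsProfileId` field and the `headphone-class-device` profile ID from Google's public API documentation rather than from this announcement. The helper name `build_synthesize_request` is hypothetical, and the sketch only builds the JSON payload; it does not send it.

```python
import json


def build_synthesize_request(text, language_code="en-US",
                             profile="headphone-class-device"):
    """Build a JSON body for a synthesize call that applies one device profile.

    Field names are assumed from the public REST reference; this helper is
    illustrative and not part of any Google client library.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device profiles are passed as a list of profile IDs.
            "effectsProfileId": [profile],
        },
    }


body = build_synthesize_request("Welcome back to the show.")
print(json.dumps(body, indent=2))
```

A podcast app might use the headphone profile shown here, while an application playing through a different class of speaker would pass a different profile ID.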

Meanwhile, for Speech-to-Text, Google is bringing into general availability its premium models for video and enhanced phone, which were rolled out in beta last year. The video model, which is based on technology similar to what YouTube uses for automatic captioning, now has 64 percent fewer transcription errors, Google said. The enhanced phone model now has 62 percent fewer errors.

Google was able to improve the models by requiring customers who used the premium services to share usage data via data logging. Starting now, customers can use the enhanced phone model without opting into data sharing, while those who opt in will pay a lower rate. Prices are also lower for all premium video model customers, and those who opt into data sharing will get an additional discount.
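
As a rough illustration of how a developer might choose between these models, the sketch below builds a Speech-to-Text recognition config. It assumes the `model` and `useEnhanced` fields and the `video` and `phone_call` model names from the public Speech-to-Text REST reference, not from the announcement itself. The helper `build_recognition_config` is hypothetical, and the data logging opt-in is not covered here.

```python
def build_recognition_config(model, use_enhanced=False, language_code="en-US"):
    """Build a recognition config dict that selects a premium model.

    Model names and field names are assumptions taken from the public REST
    reference; this helper is illustrative only.
    """
    if model not in {"video", "phone_call"}:
        raise ValueError(f"unsupported model for this sketch: {model!r}")
    config = {"languageCode": language_code, "model": model}
    if use_enhanced:
        # Ask for the enhanced variant of the chosen model.
        config["useEnhanced"] = True
    return config


video_config = build_recognition_config("video")
phone_config = build_recognition_config("phone_call", use_enhanced=True)
print(video_config)
print(phone_config)
```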

Google is also announcing the general availability of multi-channel recognition, which helps the Speech-to-Text API distinguish between multiple audio channels. This is useful in scenarios involving multiple people, such as meeting analytics.
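
To make the multi-channel idea concrete, here is a minimal sketch that builds a recognition config for multi-channel audio and groups transcripts by channel. The `audioChannelCount`, `enableSeparateRecognitionPerChannel` and `channelTag` fields are assumed from the public Speech-to-Text REST reference. Both helper functions and the sample results are hypothetical, made up purely for illustration.

```python
from collections import defaultdict


def build_multichannel_config(channel_count, sample_rate_hertz=16000,
                              language_code="en-US"):
    """Build a recognition config that transcribes each channel separately."""
    if channel_count < 2:
        raise ValueError("multi-channel recognition needs at least two channels")
    return {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
        "audioChannelCount": channel_count,
        "enableSeparateRecognitionPerChannel": True,
    }


def transcripts_by_channel(results):
    """Group the top transcript of each result under its channel tag."""
    grouped = defaultdict(list)
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            grouped[result["channelTag"]].append(alternatives[0]["transcript"])
    return dict(grouped)


# Invented sample shaped like a two-channel response, not real API output.
sample_results = [
    {"channelTag": 1, "alternatives": [{"transcript": "Shall we start with the budget?"}]},
    {"channelTag": 2, "alternatives": [{"transcript": "Yes, I have the numbers."}]},
    {"channelTag": 1, "alternatives": [{"transcript": "Great, go ahead."}]},
]

config = build_multichannel_config(2)
by_channel = transcripts_by_channel(sample_results)
print(by_channel)
```

With one speaker recorded per channel, a per-channel grouping like this gives a simple starting point for meeting analytics such as counting each participant's contributions.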
