Google Develops “TACOTRON” - a human-like text-to-speech AI - India Vel

Monday 1 January 2018

Artificial intelligence has been growing rapidly in recent years and has become the talk of the town. Many researchers and scientists believe that AI will be the defining technology of the future, shaping nearly every field around the globe.



As a next step in AI, Google has developed a text-to-speech system that articulates speech much like a human does.

The system, called “Tacotron 2”, produces AI-generated computer speech that almost matches a human voice.

Google CEO Sundar Pichai announced at the Google I/O developer conference that the company is shifting its focus from mobile-first to AI-first. Google has launched several AI-based products and features, such as Google Lens, Smart Reply for Gmail and Google Assistant for iPhone.

According to the published paper, the system first generates a spectrogram (specifically, a mel-scale spectrogram) from the text. A spectrogram is a visual representation of how the speech should sound.
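The following minimal Python sketch makes the idea of a spectrogram concrete. It assumes a plain short-time Fourier transform. The signal is cut into overlapping, windowed frames, and each frame is turned into a list of frequency magnitudes. The function names, frame length and hop size are illustrative choices and do not come from the paper. Tacotron 2 predicts mel-scale spectrograms, so a real pipeline would also pass each frame through a mel filterbank and apply log compression. This sketch leaves those steps out and includes only the common HTK-style Hz-to-mel formula, to show what the mel scale is.

```python
import cmath
import math


def hann_window(length):
    # Periodic Hann window: tapers each frame's edges to reduce spectral leakage.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame.

    Uses a naive O(n^2) DFT per frame. This is slow, but it keeps the
    sketch dependency-free. frame_len and hop are illustrative defaults.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            total = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(total))
        frames.append(row)
    return frames


def hz_to_mel(freq_hz):
    # HTK-style mel formula: roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


if __name__ == "__main__":
    sample_rate = 8000
    # A pure 1 kHz tone, 1024 samples long.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
    spec = magnitude_spectrogram(tone)
    bin_hz = sample_rate / 256
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames; peak at", peak_bin * bin_hz, "Hz")
```

For a pure tone, the loudest bin in every frame sits at the tone's frequency. A speech spectrogram shows many such bands shifting over time.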

That spectrogram is then fed into a modified version of Google's existing WaveNet model. WaveNet converts it into an audio waveform and brings AI closer than ever to convincingly mimicking human speech. The algorithm can learn different voices and even generates artificial breaths.
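WaveNet generates audio using stacks of dilated causal convolutions. The minimal Python sketch below illustrates two ideas behind this design; it is not Google's implementation. First, each output sample depends only on current and past inputs (causality). Second, doubling the dilation at every layer lets a short stack of layers look far back into the signal. The function names are hypothetical. The kernel size of 2 and the schedule of 30 layers with dilation 2**(k % 10) are assumptions modeled on the WaveNet-style setup. The real model also uses gated activations, residual connections and conditioning on the spectrogram, all of which are omitted here.

```python
def dilated_causal_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before the start of the signal are treated as zero, so y[t]
    never depends on x[t + 1], x[t + 2], ... (the layer is causal).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(dilations, kernel_size=2):
    # Each layer reaches (kernel_size - 1) * dilation samples further into the past.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: 30 layers in 3 cycles, dilation doubling from 1 to 512 per cycle.
ASSUMED_DILATIONS = [2 ** (k % 10) for k in range(30)]


if __name__ == "__main__":
    # Push a single impulse through three layers with dilations 1, 2 and 4.
    signal = [1.0] + [0.0] * 11
    for d in (1, 2, 4):
        signal = dilated_causal_conv(signal, [1.0, 1.0], d)
    print(signal)
    print("receptive field:", receptive_field(ASSUMED_DILATIONS), "samples")
```

With dilations 1, 2 and 4, the impulse spreads over exactly 8 samples, which matches the receptive field the helper computes. Under the assumed 30-layer schedule, the same arithmetic gives a view of a few thousand past samples from only 30 layers.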

The researchers report a Mean Opinion Score (MOS) of 4.53 for the model, comparable to a MOS of 4.58 for professionally recorded human speech.
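A Mean Opinion Score is the average of listeners' ratings on a 1-to-5 scale, usually reported with a confidence interval. The short Python sketch below shows that arithmetic. The function name and the sample ratings are hypothetical and are not the paper's data. The interval uses a normal approximation, which is a common choice but not the only one.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    ratings: individual listener scores on a 1-5 scale (higher is better).
    The interval uses the normal approximation 1.96 * s / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    ratings = [5, 4, 5, 4, 5, 4.5, 5, 4]
    mos, hw = mean_opinion_score(ratings)
    print(f"MOS = {mos:.2f} +/- {hw:.2f}")
```

When two scores are as close as 4.53 and 4.58, whether the gap matters is best judged against their confidence intervals. This is why papers usually report the intervals alongside the scores.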

"Tacotron 2" can detect from context the difference between the noun "desert" and the verb "desert," as well as the noun "present" and the verb "present," and alter its pronunciation accordingly.

It can place emphasis on capitalized words and apply the proper inflection when asking a question rather than making a statement.

Still, Google's developers have not revealed much about how far along the system is.


Based on the paper, it is highly probable that, in the audio samples Google published, "gen" indicates speech generated by Tacotron 2 and "gt" indicates real human speech. ("GT" likely stands for "ground truth," a machine learning term that basically means "the real deal.")
