Beginner's version
Speech synthesis lets a machine imitate human speech: a passage of text goes in, and a passage of spoken audio comes out.
By analogy, when a piece of content forms in the machine's "brain", or when it is shown a paragraph, it has to work out how the words should be read aloud:
- Break the text down to get the duration and frequency of each phoneme, much as we sometimes break a written word into its radicals or affixes to work out its pronunciation.
- Know which combinations of characters form a word, and say them in a way that is easy for humans to understand.
- While speaking, combine a person's speaking habits, pronunciation features, accent, and so on to produce a voice with clearly human characteristics. (Google has built a machine voice that sounds remarkably like a human.)
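The three steps above can be sketched as a toy text-analysis front end. This is only an illustration under loud assumptions: the tiny `LEXICON`, the fixed vowel and consonant durations, the falling pitch contour, and the function names `text_to_phonemes` and `add_prosody` are all made up for this example and do not describe how any real speech synthesizer works.

```python
# Hypothetical hand-written lexicon: word -> phoneme symbols (ARPAbet-style).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
VOWELS = {"AH", "OW", "ER"}


def text_to_phonemes(text):
    """Steps 1 and 2: split the text into known words, then into phonemes."""
    phonemes = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word not in LEXICON:
            raise KeyError(f"word not in toy lexicon: {word!r}")
        phonemes.extend(LEXICON[word])
    return phonemes


def add_prosody(phonemes, base_pitch_hz=120.0, rate=1.0):
    """Step 3: attach a duration (ms) and pitch (Hz) to every phoneme.

    base_pitch_hz and rate stand in for speaker characteristics. The rules
    are assumptions: vowels last 120 ms, consonants 60 ms, and pitch falls
    linearly by 20% from the first phoneme to the last.
    """
    n = len(phonemes)
    plan = []
    for i, phoneme in enumerate(phonemes):
        base_ms = 120 if phoneme in VOWELS else 60
        duration_ms = round(base_ms / rate)
        pitch_hz = round(base_pitch_hz * (1.0 - 0.2 * i / max(n - 1, 1)), 1)
        plan.append((phoneme, duration_ms, pitch_hz))
    return plan


if __name__ == "__main__":
    for phoneme, duration_ms, pitch_hz in add_prosody(text_to_phonemes("Hello world.")):
        print(f"{phoneme:>2}  {duration_ms:>3} ms  {pitch_hz:>5} Hz")
```

A real system would replace each of these hand-written rules with models learned from recorded speech, and would then pass the resulting plan to a component that generates the actual waveform.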
Just as people do when they learn, machines need a large number of voice clips as "listening material" to learn pronunciation. They also have to learn linguistic rules, such as grammar and prosody, so that, like a human, they can use tone and context to express meaning beyond the literal words. Otherwise the machine's speech sounds stiff, emotionless, and incoherent (like the dialogue of robot characters in some anime or games).
Speech synthesis is currently used mainly in reading software, navigation software, question-answering dialogue systems, and so on.
Baidu Encyclopedia version
Speech synthesis is a technique for producing artificial speech by mechanical and electronic means. Text-to-speech (TTS) technology is a branch of speech synthesis: it converts text that is generated by a computer or entered from outside into audible, fluent spoken Chinese output.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.