What is speech recognition? What is its value, and what are its technical principles? This article answers common questions about speech recognition.
What is Speech Recognition Technology (ASR)?
To hold a dialogue with people, a machine needs to implement three steps, corresponding to the work of the "ears", the "brain", and the "mouth". For the machine to understand human speech, speech recognition technology (ASR) is indispensable.
Speech recognition has become a very common technology that people use regularly in their daily lives:
- Apple users have probably used Siri, which is a typical application of speech recognition.
- WeChat has a "voice-to-text" function, which also uses speech recognition.
- The recently popular smart speakers are products built around speech recognition.
- Most newer car models come with voice control, which is also speech recognition.
Speech recognition technology
Speech recognition can be broken down into four stages: input, encoding, decoding, and output.
How does speech recognition work?
First of all, sound itself is a kind of wave, which is why audio is often represented as a waveform.
Recognition then proceeds in the following steps:
- After signal processing, the audio is split into frames (at the millisecond level), and each segment of the waveform is converted into a multi-dimensional vector based on the characteristics of the human ear.
- Each frame is recognized as a state (which can be understood as an intermediate unit, smaller than a phoneme).
- The states are combined into phonemes (usually 3 states = 1 phoneme).
- Finally, the phonemes are combined into words (for example, dà jiā hǎo) and concatenated into sentences. In this way, speech is converted into text.
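The steps above can be sketched in Python as follows. This is only a minimal illustration under several assumptions that are not from the article: the 25 ms frame length and 10 ms hop, the two-number toy feature (real systems typically use richer, ear-inspired features such as MFCCs), the (phoneme, index) state labels, and the small lexicon are all hypothetical. The state sequence is written by hand here; in a real recognizer it would be predicted from the feature vectors by an acoustic model.

```python
import math


def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut a waveform into overlapping, millisecond-level frames.

    frame_ms and hop_ms are assumed values, not taken from the article.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


def toy_features(frame):
    """Turn one frame into a small vector: [log energy, zero-crossing rate].

    This is a stand-in for the ear-inspired features a real system would use.
    """
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


def states_to_phonemes(frame_states):
    """Merge repeated frame-level states, then group every 3 states into 1 phoneme."""
    collapsed = [
        s for i, s in enumerate(frame_states)
        if i == 0 or s != frame_states[i - 1]
    ]
    # Each state is a (phoneme, index) pair, so the phoneme is the first element.
    return [collapsed[i][0] for i in range(0, len(collapsed) - 2, 3)]


def phonemes_to_words(phonemes, lexicon):
    """Greedy longest-match lookup of phoneme sequences in a pronunciation lexicon."""
    max_len = max(len(key) for key in lexicon)
    words, i = [], 0
    while i < len(phonemes):
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in lexicon:
                words.append(lexicon[key])
                i += n
                break
        else:
            raise ValueError(f"no lexicon entry starting at phoneme {i}")
    return words


# Step 1: one second of a 440 Hz tone at 16 kHz, split into frames and vectors.
sample_rate = 16000
wave = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
frames = split_into_frames(wave, sample_rate)
vectors = [toy_features(f) for f in frames]

# Steps 2-4: hand-written frame states (some repeated across frames) -> phonemes -> words.
frame_states = []
for ph in ["d", "a", "j", "i", "a", "h", "a", "o"]:
    frame_states += [(ph, 1), (ph, 1), (ph, 2), (ph, 3), (ph, 3)]

lexicon = {("d", "a"): "dà", ("j", "i", "a"): "jiā", ("h", "a", "o"): "hǎo"}
phonemes = states_to_phonemes(frame_states)
print(" ".join(phonemes_to_words(phonemes, lexicon)))  # prints "dà jiā hǎo"
```

The greedy lexicon lookup is chosen only to keep the sketch short; production decoders usually search over many candidate word sequences and score them with a language model.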
Baidu Encyclopedia and Wikipedia