What is speech recognition? What is its value, and what are its technical principles? This article answers common questions about speech recognition.
What is Speech Recognition Technology (ASR)?
For a machine to hold a dialogue with people, it has to carry out three steps, corresponding to the work of the "ears", the "brain", and the "mouth". The first of these, hearing and understanding human speech, is inseparable from speech recognition technology (ASR).
Speech recognition has become a very common technology that everyone often uses in their daily lives:
- Apple users will have tried Siri, which is a typical speech recognition application.
- WeChat has a "voice-to-text" function, which also uses speech recognition.
- The recently popular smart speakers are products with speech recognition at their core.
- Most newer cars offer voice control, which is also speech recognition.
Speech recognition technology
Broken down, speech recognition consists of four stages: input, encoding, decoding, and output.
How does speech recognition work?
First of all, sound itself is a kind of wave, which is why a segment of audio is usually drawn as a waveform.
Recognition then proceeds in these steps:
- After signal processing, the audio is split into frames (each a few milliseconds long), and each frame of the waveform is converted into a multi-dimensional vector according to the characteristics of the human ear.
- Each frame is identified as a state (this can be understood as an intermediate unit that is smaller than a phoneme).
- The states are combined into phonemes (usually 3 states = 1 phoneme).
- Finally, the phonemes are combined into words (for example, dà jiā hǎo), and the words are concatenated into sentences. In this way speech is converted into text.
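The steps above can be sketched in a few lines of Python. This is a toy illustration only, not how a production recognizer works: the 25 ms frame length, 10 ms step, the `phoneme_index` state labels, and the tiny `LEXICON` are all assumptions made for the example, and the per-frame state labels are written by hand rather than predicted by an acoustic model.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut a list of samples into overlapping frames (frame_ms long, stepping hop_ms)."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def collapse_repeats(frame_states):
    """Merge runs of identical per-frame state labels into a single state."""
    collapsed = []
    for state in frame_states:
        if not collapsed or collapsed[-1] != state:
            collapsed.append(state)
    return collapsed


def states_to_phonemes(states, states_per_phoneme=3):
    """Group every 3 states into 1 phoneme. Labels look like 'd_1', 'd_2', 'd_3'.

    Trailing states that do not fill a complete group are ignored.
    """
    phonemes = []
    for i in range(0, len(states) - states_per_phoneme + 1, states_per_phoneme):
        group = states[i:i + states_per_phoneme]
        names = {s.rsplit("_", 1)[0] for s in group}
        if len(names) != 1:
            raise ValueError(f"states {group} do not belong to one phoneme")
        phonemes.append(names.pop())
    return phonemes


# Hypothetical pronunciation lexicon: phoneme sequence -> word (tones ignored).
LEXICON = {
    ("d", "a"): "dà",
    ("j", "ia"): "jiā",
    ("h", "ao"): "hǎo",
}


def phonemes_to_words(phonemes, lexicon):
    """Greedy longest-match lookup of phoneme sequences in the lexicon."""
    max_len = max(len(key) for key in lexicon)
    words = []
    i = 0
    while i < len(phonemes):
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in lexicon:
                words.append(lexicon[key])
                i += n
                break
        else:
            raise ValueError(f"no word found at phoneme position {i}")
    return words


# Hand-written per-frame state labels standing in for an acoustic model's output.
frame_states = [
    "d_1", "d_1", "d_2", "d_3", "a_1", "a_2", "a_2", "a_3",
    "j_1", "j_2", "j_3", "ia_1", "ia_2", "ia_3", "ia_3",
    "h_1", "h_2", "h_3", "ao_1", "ao_2", "ao_3",
]
states = collapse_repeats(frame_states)
phonemes = states_to_phonemes(states)
print(" ".join(phonemes_to_words(phonemes, LEXICON)))  # prints: dà jiā hǎo
```

In a real system each frame would first be turned into a feature vector, and the mapping from frames to states and from states to words would be chosen by statistical models rather than by fixed grouping and dictionary lookup; the sketch only mirrors the frame, state, phoneme, word hierarchy described in the list.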
Baidu Encyclopedia and Wikipedia
Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the words in human speech into computer-readable input such as keystrokes, binary codes, or character sequences. It differs from speaker identification and speaker verification, which attempt to identify or confirm who is speaking rather than the words being spoken.
Speech recognition is an interdisciplinary subfield of computational linguistics that develops methods and technologies enabling computers to recognize spoken language and translate it into text. It is also known as Automatic Speech Recognition (ASR), Computer Speech Recognition, or Speech to Text (STT). It combines knowledge and research from linguistics, computer science, and electrical engineering.
Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes that person's specific voice and uses it to fine-tune recognition of their speech, thereby improving accuracy. A system that does not use training is called "speaker-independent". A system that uses training is called "speaker-dependent".
Isn't speech to text TTS, not STT? Is STT what Wikipedia wrote??
STT does not appear in the article...
Speech to text is STT, so there is no problem. TTS is text-to-speech. Here STT is another name for ASR.
The layout and ideas of the article are very good! Very convenient to learn.
Thanks for sharing!