Convolutional neural networks (CNNs) are already very powerful, so why do we need RNNs?
This article explains the unique value of RNNs in an easy-to-understand way: processing sequence data. It also covers the shortcomings of RNNs and the variant algorithms that address them.
Finally, it introduces the practical value and typical usage scenarios of RNNs.
Why do we need RNNs? What is their unique value?
In CNNs and most other common algorithms, input and output are in one-to-one correspondence: one input produces one output, and different inputs have no connection to each other.
But in some scenarios, a single input is not enough!
To fill in a blank like the one below, picking any single preceding word is not enough. We need to know not only all the previous words, but also the order in which they appear.
Scenarios like this, which require processing "sequence data" (a series of interdependent data points), are what RNNs are designed to solve.
Typical examples of sequence data:
- The text content of an article
- The audio content of speech
- Price movements in the stock market
The reason an RNN can effectively process sequence data lies in its special operating principle. Let me introduce the basic operating principle of RNNs.
The basic principle of RNN
The structure of a traditional neural network is relatively simple: input layer, hidden layer, output layer. As shown below:
The biggest difference between an RNN and a traditional neural network is that at each step, the previous output is carried into the next hidden layer and trained together with the current input. As shown below:
Let's look at a concrete case to see how an RNN works:
Suppose we need to judge the intent of a user's utterance (asking the weather, asking the time, setting an alarm, ...). The user says "what time is it?" We first segment this sentence into words:
Then we feed the words into the RNN in order. First we use "what" as the input to the RNN and get the output "01".
Next, we input "time" into the RNN and get the output "02".
Notice that in this step, when we input "time", the output of the previous input "what" also has an effect (half of the hidden layer is black).
By analogy, all previous inputs have an impact on future outputs; you can see that the circular hidden layer contains all the previous colors. As shown below:
When we judge the intent, we only need the output of the last step, "05", as shown below:
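The process above can be sketched in a few lines of numpy. This is a minimal, illustrative forward pass only (the weights are random, the four one-hot "words" stand in for "what time is it", and `hidden_size` is an arbitrary choice), not a trained model:

```python
import numpy as np

np.random.seed(0)

# Hypothetical setup: 4 tokens ("what", "time", "is", "it"), one-hot encoded.
x_seq = np.eye(4)  # one row per token

hidden_size = 3
Wxh = np.random.randn(hidden_size, 4) * 0.1            # input-to-hidden weights
Whh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden (the "memory" path)
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # initial hidden state
outputs = []
for x in x_seq:
    # Each step mixes the current input with the PREVIOUS hidden state,
    # so every earlier word leaves a trace in later outputs.
    h = np.tanh(Wxh @ x + Whh @ h + b)
    outputs.append(h.copy())

# For intent classification we would feed only the LAST output to a classifier.
last_output = outputs[-1]
print(last_output.shape)  # (3,)
```

The key design point is the `Whh @ h` term: it is what distinguishes this loop from running the same feed-forward network four times on four independent inputs.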
The shortcomings of RNNs are also obvious
From the example above, we can see that short-term memory has a large impact (the orange region) while long-term memory has only a small impact (the black and green regions). This is the short-term memory problem of RNNs.
- RNNs have a short-term memory problem and cannot handle very long input sequences
- Training an RNN is computationally expensive
To address the short-term memory problem of RNNs, optimization algorithms based on the RNN were later developed. Let's briefly introduce them.
RNN optimization algorithms
From RNN to LSTM (Long Short-Term Memory networks)
An RNN follows a rigid logic: the later an input arrives, the larger its influence, and the earlier it arrives, the smaller its influence. This logic cannot be changed.
The biggest change LSTM makes is to break this rigid logic and replace it with a flexible one: retain only the important information.
Simply put: focus on what matters!
For example, suppose we quickly skim the following passage:
After skimming it, we may only remember the following key points:
LSTM works like this kind of focused reading. It can retain the "important information" in longer sequence data while ignoring the unimportant parts, which solves the short-term memory problem of RNNs.
We won't go into the technical implementation here. If you are interested, see the detailed introduction "Long Short-Term Memory networks – LSTM".
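To give a flavor of how LSTM "keeps the important and forgets the unimportant", here is a sketch of a single LSTM step in numpy. The function name, parameter layout, and the random smoke test are assumptions made for this illustration; real implementations differ in detail but use the same three gates plus a candidate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b each stack four gate parameter sets
    (forget, input, candidate, output): shapes (4*H, D), (4*H, H), (4*H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0*H:1*H])   # forget gate: what to drop from long-term memory
    i = sigmoid(z[1*H:2*H])   # input gate: what new information to store
    g = np.tanh(z[2*H:3*H])   # candidate memory content
    o = sigmoid(z[3*H:4*H])   # output gate: what to expose this step
    c = f * c_prev + i * g    # cell state: the long-term memory line
    h = o * np.tanh(c)        # hidden state: the short-term output
    return h, c

# Tiny smoke run with random parameters (D = input size, H = hidden size).
np.random.seed(1)
D, H = 4, 3
W = np.random.randn(4 * H, D) * 0.1
U = np.random.randn(4 * H, H) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in np.eye(D):  # feed four one-hot inputs in order
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The line `c = f * c_prev + i * g` is the heart of it: because the cell state is updated additively through a learned forget gate rather than squashed through tanh at every step, important information can survive many steps.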
From LSTM to GRU
The Gated Recurrent Unit (GRU) is a variant of LSTM. It retains LSTM's ability to focus on important information and forget unimportant information, without losing that information during long-range propagation.
The GRU mainly simplifies and adjusts the LSTM model, which can save a lot of training time when the data set is large.
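The simplification is visible in code: a GRU step has two gates instead of three and no separate cell state, so it carries fewer parameters per unit. As before, this is an illustrative sketch with assumed names and random weights, mirroring the LSTM sketch above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W, U, b):
    """One GRU step: update and reset gates only, and no separate cell
    state. Shapes: W (3*H, D), U (3*H, H), b (3*H,)."""
    H = h_prev.shape[0]
    zx = W @ x + b       # input contributions to (update, reset, candidate)
    zh = U @ h_prev      # hidden-state contributions to the same three parts
    u = sigmoid(zx[0:H] + zh[0:H])          # update gate: keep old vs take new
    r = sigmoid(zx[H:2*H] + zh[H:2*H])      # reset gate: how much history to use
    n = np.tanh(zx[2*H:] + r * zh[2*H:])    # candidate state
    return (1 - u) * n + u * h_prev         # blend old and new state

np.random.seed(2)
D, H = 4, 3
W = np.random.randn(3 * H, D) * 0.1
U = np.random.randn(3 * H, H) * 0.1
b = np.zeros(3 * H)
h = np.zeros(H)
for x in np.eye(D):
    h = gru_step(x, h, W, U, b)
print(h.shape)  # (3,)
```

Compared with the LSTM sketch, the parameter matrices shrink from four stacked blocks to three, which is where the training-time savings on large data sets come from.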
RNN applications and usage scenarios
Wherever sequence data needs to be processed, RNNs can be used. NLP is a typical application area.
Text generation: similar to the fill-in-the-blank question above, given the context, predict the word that goes in the blank.
Machine translation: translation is also a typical sequence problem, as the order of words directly affects the result.
Speech recognition: determine the corresponding text from the input audio.
Image captioning: given a picture, generate a description of its content. This is often a combination of an RNN and a CNN.
Video tagging: break the video into frames, then use image captioning to describe the content of each frame.
The unique value of an RNN is that it can effectively process sequence data, for example: article text, voice audio, stock price movements...
It can process sequence data because earlier inputs in the sequence affect later outputs, which is equivalent to having a "memory". But RNNs have a serious short-term memory problem: long-range data has little impact, even when it carries important information.
As a result, variant algorithms such as LSTM and GRU appeared, built on the RNN. These variants share several key features:
- They can effectively retain long-term information
- They keep the important information and "forget" the unimportant information
A few typical applications of RNNs:
- Text generation
- Speech recognition
- Machine translation
- Image captioning
- Video tagging
Sources: Baidu Encyclopedia and Wikipedia