Convolutional neural networks (CNNs) are already very powerful, so why do we need RNNs?
This article explains the unique value of RNNs in an easy-to-understand way: processing sequence data. It also covers the shortcomings of RNNs and the variant algorithms that address them.
Finally, it introduces the practical value and usage scenarios of RNNs.
Why do we need RNNs? What is their unique value?
Most common algorithms, including convolutional neural networks (CNNs), map inputs to outputs one to one: each input produces one output, and different inputs are independent of each other.
But in some scenarios, a single input is not enough!
To fill in the blank below, it is not enough to look at any one of the preceding words. We need to know not only all the previous words, but also the order in which they appear.
Scenarios like this, which require processing "sequence data" (a series of interdependent data points), are what RNNs are designed to solve.
Typical examples of sequence data:
- Text content in the article
- Audio content in speech
- Price trend in the stock market
- ……
RNNs can process sequence data effectively mainly because of the special way they operate. Let me introduce the basic working principle of RNNs.
The basic principle of RNNs
The structure of a traditional neural network is relatively simple: input layer, hidden layer, output layer. As shown below:
The biggest difference between an RNN and a traditional neural network is that at each step, the previous step's output is carried into the hidden layer and processed together with the current input. As shown below:
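To make this concrete, here is a minimal sketch of one RNN step in Python with NumPy. The weight names and the tanh activation are common conventions, not something specified in this article; the point to notice is that the new hidden state depends on both the current input and the previous hidden state.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step.

    The new hidden state mixes the current input x_t with the
    previous hidden state h_prev, which is how earlier inputs
    keep influencing later outputs.
    """
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)
```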
Let's look at a concrete case to see how an RNN works:
Suppose we need to judge the intent behind a user's utterance (asking about the weather, asking the time, setting an alarm, ...). The user says "What time is it?" We first segment this sentence into words:
Then we feed the words into the RNN in order. First, we use "what" as the input and get the output "01".
Next, we feed "time" into the RNN and get the output "02".
Notice that when we input "time", the output from the previous input "what" also has an effect (half of the hidden layer is black).
By analogy, all previous inputs influence the future outputs; you can see that the round hidden layer contains all the previous colors. As shown below:
When we judge the intent, we only need the output "05" of the last step, as shown in the figure below:
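As a hedged end-to-end sketch of this example, the loop below feeds the segmented words in order and reads the intent from the last hidden state only. The toy sizes, the random "embeddings", and the softmax readout are all illustrative assumptions; a real model would learn these weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, purely for illustration: 4-dim word vectors,
# 8-dim hidden state, 3 intents (weather / time / alarm).
n_in, n_hidden, n_intents = 4, 8, 3
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
W_out = rng.normal(size=(n_intents, n_hidden))

# Stand-in vectors for the segmented sentence "what time is it ?"
tokens = ["what", "time", "is", "it", "?"]
embed = {w: rng.normal(size=n_in) for w in tokens}

h = np.zeros(n_hidden)                 # hidden state starts empty
for w in tokens:                       # feed the words in order
    # each step sees all earlier words through h
    h = np.tanh(W_x @ embed[w] + W_h @ h + b)

logits = W_out @ h                     # only the last output ("05") is used
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the 3 intents
print(probs)
```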
The shortcomings of RNNs are also obvious
Through the example above, we can see that recent inputs have a large influence (such as the orange region), while earlier inputs have little influence (such as the black and green regions). This is the short-term memory problem of RNNs:
- RNNs have a short-term memory problem and cannot handle very long input sequences
- Training an RNN is computationally expensive
Because of the short-term memory problem, optimization algorithms built on top of RNNs appeared later. Let me briefly introduce them.
RNN optimization algorithms
From RNN to LSTM (Long Short-Term Memory networks)
An RNN follows a rigid logic: the later an input arrives, the greater its influence, and the earlier it arrives, the smaller its influence. This logic cannot be changed.
The biggest change LSTM makes is to break that rigid logic in favor of a flexible one: retain only the important information.
Put simply: it focuses on what matters!
For example, let's quickly read the following passage:
After skimming it, we may only remember the following key points:
LSTM works like this kind of focusing: it can keep the "important information" across longer sequences and ignore what is unimportant. This solves the short-term memory problem of RNNs.
The specific technical implementation is not covered here. If you are interested, take a look at the detailed introduction "Long Short-Term Memory networks – LSTM".
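If you just want to see an LSTM in action without implementing its gates by hand, here is a minimal sketch using PyTorch's built-in nn.LSTM (the sizes and the random input are illustrative assumptions). Note that it carries a separate cell state alongside the hidden state; that cell state is the "long-term memory" that lets it preserve important information across long sequences.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8)  # toy sizes, for illustration

# A fake embedded sentence: 5 time steps, batch of 1, 4-dim word vectors.
x = torch.randn(5, 1, 4)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # (5, 1, 8): the hidden state at every step
print(h_n.shape)     # (1, 1, 8): the final hidden state
print(c_n.shape)     # (1, 1, 8): the final cell state, the "long-term memory"
```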
From LSTM to GRU
The Gated Recurrent Unit (GRU) is a variant of LSTM. It retains LSTM's ability to focus on important information and forget what is unimportant, so that important information is not lost during long-range propagation.
GRU mainly simplifies and adjusts the LSTM model, which saves a lot of training time when the data set is large.
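The simplification shows up directly in the parameter count: at the same sizes, a GRU has three weight blocks (two gates plus the candidate state) where an LSTM has four, so roughly a quarter fewer parameters to train. A small sketch, with illustrative sizes:

```python
import torch.nn as nn

def n_params(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)

print("LSTM parameters:", n_params(lstm))  # four weight blocks
print("GRU parameters: ", n_params(gru))   # three weight blocks, ~25% fewer
```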
RNN applications and usage scenarios
RNNs can be used wherever the processing of sequence data is involved. NLP is a typical application scenario.
Text generation: similar to the fill-in-the-blank question above; given the context, predict which word belongs in the blank.
Machine translation: translation is also a typical sequence problem, since the order of words directly affects the result.
Speech recognition: determine the corresponding text from input audio.
Image captioning: given a picture, generate a description of what it contains. This is often a combination of an RNN and a CNN.
Video tagging: the video is split into frames, and image captioning is then used to describe the content of each frame.
Final Thoughts
The unique value of RNNs is that they can effectively process sequence data, for example article text, speech audio, or stock price trends.
They can process sequence data because earlier inputs in a sequence also affect the later outputs, which is equivalent to having a "memory function". But RNNs have a serious short-term memory problem: long-range inputs have little influence on the output (even when they carry important information).
Therefore, variant algorithms such as LSTM and GRU appeared, built on top of RNNs. These variants share two main features:
- They can effectively retain long-range information
- They keep important information and choose to "forget" what is unimportant
A few typical applications of RNNs:
- Text generation
- Speech recognition
- Machine translation
- Image captioning
- Video tagging
Baidu Encyclopedia + Wikipedia
A Recurrent Neural Network (RNN) is a class of neural networks that takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all its nodes (recurrent units) in a chain.
Research on recurrent neural networks began in the 1980s and 1990s, and they developed into an important deep learning algorithm in the early 21st century. Bidirectional recurrent neural networks (Bi-RNN) and Long Short-Term Memory networks (LSTM) are common recurrent neural networks.
Recurrent neural networks have memory, parameter sharing, and Turing completeness, so they can learn the nonlinear characteristics of sequences efficiently. They have important applications in Natural Language Processing (NLP), such as speech recognition, language modeling, and machine translation. They are also used for various time-series predictions, or combined with Convolutional Neural Networks (CNNs) to handle computer vision problems.
A recurrent neural network (RNN) is a type of neural network in which the connections between nodes form a directed graph along a sequence, which allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process input sequences. This makes them suitable for tasks such as unsegmented, connected handwriting recognition or speech recognition.
The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure: one with a finite impulse response and the other with an infinite impulse response. Both classes exhibit temporal dynamic behavior. A finite-impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite-impulse recurrent network is a directed cyclic graph that cannot be unrolled.
Both finite-impulse and infinite-impulse recurrent networks can have additional stored states, and the storage can be directly controlled by the neural network. The storage can also be replaced by another network or graph if it incorporates time delays or feedback loops. Such controlled states are called gated states or gated memory, and are part of Long Short-Term Memory networks (LSTMs) and gated recurrent units.