Understanding RNNs in a single article

Convolutional neural networks (CNNs) are already powerful, so why do we still need RNNs?

This article explains, in plain terms, the unique value of RNNs: processing sequence data. It also covers the shortcomings of RNNs and the variant algorithms that address them.

Finally, it introduces the practical applications and usage scenarios of RNNs.


Why do we need RNNs? What is their unique value?

Most ordinary algorithms, including CNNs, map one input to one output: each input produces one output, and different inputs are independent of each other.

Most algorithms have a one-to-one correspondence between input and output

But in some scenarios, a single input is not enough!

To fill in the blank below, picking any one of the preceding words won't do. We need to know not only all the preceding words but also their order.

Processing of sequence data

Scenarios like this, which require processing "sequence data: a series of interdependent data points", are exactly what RNNs are designed to handle.

Typical examples of sequence data:

  1. Text in the article
  2. Audio content in speech
  3. Price trends in the stock market
  4. …

The reason RNNs can effectively process sequence data lies in their special operating principle, which we introduce next.


The basic principle of RNN

The structure of a traditional neural network is relatively simple: input layer → hidden layer → output layer. As shown below:

Traditional neural network

The biggest difference between an RNN and a traditional neural network is that, at each step, the previous hidden state is carried into the next hidden layer and trained together with the current input. As shown below:

RNN difference
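This recurrence can be sketched in a few lines. Below is a toy scalar version (a real RNN layer uses learned weight matrices; the constants `w_x` and `w_h` here are made up purely for illustration):

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    # The new hidden state mixes the current input with the previous
    # hidden state, so every earlier input keeps influencing later steps.
    return math.tanh(w_x * x + w_h * h_prev)

h = 0.0  # initial hidden state
for x in [1.0, 0.0, -1.0]:  # a short input sequence
    h = rnn_step(x, h)
```

Because `h` is fed back in at every step, its value after the last step depends on the entire sequence — this is exactly the "memory" the figures illustrate.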

Let's use a specific case to see how RNN works:

Suppose we need to judge the intent behind a user's utterance (asking the weather, asking the time, setting an alarm…), and the user says "what time is it?" We first segment the sentence:

Tokenize the input

Then we feed the tokens into the RNN in order. First, "what" goes in as the input and we get output "01":

Enter what, and get output 01

Next, we input "time" into the RNN and get output "02".

In this process, we can see that when "time" is input, the earlier "what" still has an effect (half of the hidden layer is black).

By the same logic, all previous inputs affect future outputs: you can see that the circular hidden layer contains all the previous colors. As shown below:

The embodiment of RNN's "memory" effect on previous inputs

When judging the intent, we only need the output "05" of the last step, as shown in the figure below:

The output of the last layer of RNN is what we ultimately want
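The walkthrough above can be mimicked in code. The token values below are made-up stand-ins for word embeddings, not learned numbers; the point is only that the hidden state after the last token (output "05") depends on every earlier token:

```python
import math

# Hypothetical per-token values, standing in for learned word embeddings.
embedding = {"what": 0.9, "time": 0.7, "is": 0.1, "it": 0.2, "?": 0.5}

h = 0.0
outputs = []
for token in ["what", "time", "is", "it", "?"]:
    # Each step folds the current token into the running hidden state.
    h = math.tanh(0.4 * embedding[token] + 0.7 * h)
    outputs.append(h)

final = outputs[-1]  # only the last output ("05") feeds the intent classifier
```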

The disadvantages of RNN are also obvious

Color distribution in hidden layers

From the example above, we can see that recent inputs have a large effect (the orange areas) while distant inputs have only a small effect (the black and green areas). This is the short-term memory problem of RNNs.

  1. RNN has a short-term memory problem and cannot handle very long input sequences
  2. Training an RNN is computationally expensive
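The short-term memory problem can be made concrete with a quick numerical sketch (again using made-up scalar weights): change only the first input of a long sequence and see how little the final hidden state moves.

```python
import math

def run(sequence, w_x=0.5, w_h=0.5):
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
    return h

long_seq = [1.0] + [0.3] * 30  # first element carries the "important" signal
altered  = [0.0] + [0.3] * 30  # identical, except for the first element

# The influence of step 1 shrinks by roughly w_h (times tanh's slope)
# at every step, so after 30 steps it has all but vanished.
diff = abs(run(long_seq) - run(altered))
```

However important the first input was, by the end of the sequence its effect on the output is vanishingly small.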

Because of this short-term memory problem, optimization algorithms based on RNN later appeared. Here is a brief introduction.


RNN's optimization algorithm

From RNN to LSTM (Long Short-Term Memory networks)

RNN follows a rigid logic: the later an input arrives, the greater its influence; the earlier it arrives, the smaller its influence. This logic cannot be changed.

The biggest change LSTM makes is to break this rigid logic and replace it with a flexible one: retain only the important information.

To put it simply: grasp the key points!

From RNN's sequential logic to LSTM's focus logic

For example, let's quickly read the following paragraph:

Read this passage quickly

When we finish reading quickly, we may only remember the following important points:


LSTM works like this kind of skimming: it keeps the "important information" in a long sequence of data and ignores what is unimportant. This solves the short-term memory problem of RNNs.

The specific technical details are not expanded here. If you are interested, see the detailed introduction "Long Short-Term Memory Network (LSTM)".
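As a rough sketch of what "keeping only important information" means mechanically, here is a scalar LSTM step with made-up constant weights (a real LSTM learns a weight matrix for each gate):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev):
    f = sigmoid(0.5 * x + 0.5 * h_prev)    # forget gate: how much old memory to keep
    i = sigmoid(0.6 * x + 0.4 * h_prev)    # input gate: how much new info to store
    g = math.tanh(0.9 * x + 0.1 * h_prev)  # candidate new memory
    c = f * c_prev + i * g                 # cell state: the long-term memory track
    o = sigmoid(0.7 * x + 0.3 * h_prev)    # output gate: how much memory to reveal
    h = o * math.tanh(c)                   # new hidden state
    return h, c

h, c = 0.0, 0.0
for x in [1.0, 0.0, -1.0]:
    h, c = lstm_step(x, h, c)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`) rather than being squashed through `tanh` at every step, important information can survive across many steps.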


From LSTM to GRU

The Gated Recurrent Unit (GRU) is a variant of LSTM. It retains LSTM's ability to focus on important information and forget the unimportant, so that information is not lost during long-range propagation.

GRU mainly made some simplifications and adjustments on the model of LSTM

GRU mainly simplifies and adjusts the LSTM model, which saves a lot of training time when the dataset is relatively large.
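The simplification can be sketched as a toy scalar GRU step (made-up constant weights, for illustration only): GRU merges LSTM's forget and input gates into a single update gate and drops the separate cell state, which is where the training savings come from.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev):
    z = sigmoid(0.5 * x + 0.5 * h_prev)          # update gate: mix old vs. new
    r = sigmoid(0.6 * x + 0.4 * h_prev)          # reset gate: how much history to use
    g = math.tanh(0.9 * x + 0.1 * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * g            # interpolate old and new state

h = 0.0
for x in [1.0, 0.0, -1.0]:
    h = gru_step(x, h)
```

With two gates instead of three and no separate cell state, each step needs fewer parameters and less computation than an LSTM step.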


Applications and usage scenarios of RNN

RNNs can be used wherever sequence data needs to be processed. NLP is a typical application scenario.

Application and usage scenarios of RNN

Text generation: similar to the fill-in-the-blank question above — given the context, predict what word goes in the blank.

Machine translation: translation is also a typical sequence problem; the order of words directly affects the result.

Speech recognition: determine the corresponding text from input audio.

Generating image descriptions: similar to describing a picture in words — given an image, describe its content. This is often a combination of RNN and CNN.

Generate image description

Video tagging: the video is broken down into frames, and image descriptions are then generated for the content of each frame.



The unique value of RNNs is that they can effectively process sequence data. For example: article text, speech audio, stock price trends…

An RNN can process sequence data because earlier inputs in the sequence affect later outputs, which amounts to having a "memory" function. However, RNNs have a serious short-term memory problem: long-range data has little impact on the output, even when it is important information.

So based on RNN, there are variant algorithms such as LSTM and GRU. These variant algorithms have several characteristics:

  1. They can effectively retain long-term information
  2. They select important information to keep, and "forget" what is unimportant

Several typical applications of RNN are as follows:

  1. Text generation
  2. Speech recognition
  3. Machine translation
  4. Generating image descriptions
  5. Video tagging


Baidu Encyclopedia + Wikipedia

Baidu Encyclopedia version

A Recurrent Neural Network (RNN) is a class of recursive neural networks that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and connects all nodes (recurrent units) in a chain.

Research on recurrent neural networks began in the 1980s and 1990s, and they developed into important deep learning algorithms in the early twenty-first century. Bidirectional RNNs (Bi-RNN) and Long Short-Term Memory networks (LSTM) are common recurrent neural networks.

Recurrent neural networks have memory, parameter sharing, and Turing completeness, so they can learn the nonlinear characteristics of sequences with high efficiency. They have important applications in natural language processing (NLP), such as speech recognition, language modeling, and machine translation; they are also used in various kinds of time-series forecasting, or in conjunction with convolutional neural networks (CNNs).


Wikipedia version

A recurrent neural network (RNN) is a type of neural network in which connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence. Unlike feed-forward neural networks, RNNs can use their internal state (memory) to process input sequences. This makes them suitable for tasks such as unsegmented, connected handwriting recognition or speech recognition.

The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure, one of finite impulse and the other of infinite impulse. Both classes exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Both finite-impulse and infinite-impulse recurrent networks can have additional stored states, and the storage can be under direct control of the neural network. The storage can also be replaced by another network or graph if it incorporates time delays or has feedback loops. Such controlled states are referred to as gated states or gated memory, and are part of Long Short-Term Memory networks (LSTM) and gated recurrent units.
