Understanding Recurrent Neural Networks (RNN) in One Article

Convolutional neural networks (CNNs) are already very powerful, so why do we need RNNs?

This article explains, in an easy-to-understand way, the unique value of RNN: processing sequence data. It also covers some shortcomings of RNN and the variant algorithms built on it.

Finally, it introduces the practical applications and usage scenarios of RNN.


Why do we need RNN? What is its unique value?

In convolutional neural networks (CNNs) and most other common algorithms, inputs and outputs correspond one to one: one input produces one output, and there is no connection between different inputs.

[Figure: Most algorithms have a one-to-one correspondence between input and output]

But in some scenarios, one input is not enough!

To fill in the blank below, looking at any single preceding word is not enough. We need to know not only all the preceding words but also the order in which they appear.

[Figure: Processing of sequence data]

Scenarios like this, which involve "sequence data" (a series of interdependent data points), are what RNN is used for.

Several typical kinds of sequence data:

  1. Text content in the article
  2. Audio content in speech
  3. Price trend in the stock market
  4. ……

RNN can process sequence data effectively mainly because of the special way it operates. The basic operating principle of RNN is introduced below.


The basic principle of RNN

A traditional neural network has a relatively simple structure: input layer, hidden layer, output layer, as shown below:

[Figure: Traditional neural network]

The biggest difference between an RNN and a traditional neural network is that, at every step, the output of the previous step is carried into the hidden layer of the next step and trained together with the new input, as shown below:

[Figure: How RNN differs from a traditional neural network]
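The idea of carrying the previous step into the hidden layer can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own implementation: the function name rnn_step, the tanh activation, and the toy weight values are all assumptions chosen for the example.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """Compute the new hidden state from the current input and the previous hidden state."""
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w * x_j for w, x_j in zip(W_xh[i], x))       # contribution of the current input
        total += sum(w * h_k for w, h_k in zip(W_hh[i], h_prev))  # contribution carried over from the previous step
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical toy weights: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.6]]
b_h = [0.0, 0.0]

x = [1.0, 0.0]
h_without_history = rnn_step(x, [0.0, 0.0], W_xh, W_hh, b_h)
h_with_history = rnn_step(x, [0.9, -0.9], W_xh, W_hh, b_h)

print(h_without_history)
print(h_with_history)
```

The same input x produces two different hidden states, because the second call also receives a non-zero state from an earlier step. A traditional feedforward layer would correspond to dropping the W_hh term.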

Let's take a look at a specific case to see how RNN works:

Suppose we need to determine the intent behind what a user says (asking about the weather, asking the time, setting an alarm...). The user says "what time is it?" We first need to segment this sentence into words:

[Figure: Word segmentation]

Then we feed the words into the RNN in order. First we use "what" as the input of the RNN and get the output "01".

[Figure: Inputting "what" produces the output "01"]

Then, we input "time" to the RNN network in order and get the output "02".

Notice that when we input "time", the output from the previous word "what" also has an effect (half of the hidden layer is black).

By analogy, all of the previous inputs have an impact on later outputs. You can see that the circular hidden layer contains all of the previous colors, as shown below:

[Figure: How RNN "remembers" the previous inputs]

When we judge the intention, we only need the output "05" of the last layer, as shown in the figure below:

[Figure: The output of the last layer of the RNN is what we ultimately want]
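The walk-through above can be sketched as a loop that feeds the words in one at a time and keeps only the last state for the intent decision. This is a toy sketch under several assumptions: the five-token segmentation, the intent labels, the one-hot encoding, and the random untrained weights are all hypothetical, so the predicted intent is arbitrary.

```python
import math
import random

def rnn_step(x, h_prev, W_xh, W_hh):
    """One recurrent step: mix the current input with the previous hidden state."""
    return [
        math.tanh(
            sum(w * x_j for w, x_j in zip(W_xh[i], x))
            + sum(w * h_k for w, h_k in zip(W_hh[i], h_prev))
        )
        for i in range(len(W_hh))
    ]

def one_hot(index, size):
    """Encode a word index as a vector with a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(size)]

tokens = ["what", "time", "is", "it", "?"]            # assumed segmentation
vocab = {tok: i for i, tok in enumerate(tokens)}
intents = ["ask_weather", "ask_time", "set_alarm"]    # hypothetical labels

# Random, untrained weights: a real model would learn these from data.
rng = random.Random(0)
hidden_size = 4
W_xh = [[rng.uniform(-1, 1) for _ in range(len(vocab))] for _ in range(hidden_size)]
W_hh = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
W_hy = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(len(intents))]

def run_rnn(words):
    """Feed the words in order and return the hidden state after every step."""
    h = [0.0] * hidden_size
    states = []
    for word in words:
        h = rnn_step(one_hot(vocab[word], len(vocab)), h, W_xh, W_hh)
        states.append(h)
    return states

states = run_rnn(tokens)
final_state = states[-1]  # plays the role of output "05"

# Score each intent from the final state only and pick the highest score.
scores = [sum(w * h_k for w, h_k in zip(row, final_state)) for row in W_hy]
predicted_intent = intents[scores.index(max(scores))]
print(predicted_intent)

# Same words in a different order give a different final state.
shuffled_state = run_rnn(["it", "is", "what", "time", "?"])[-1]
print(final_state != shuffled_state)
```

The last comparison illustrates why word order matters: the final state depends on the whole sequence, not just on which words appeared.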

The shortcomings of RNN are also obvious

[Figure: Color distribution in the hidden layer]

The example above shows that short-term memory has a large influence (such as the orange region), while long-term memory has only a small influence (such as the black and green regions). This is the short-term memory problem of RNN.

  1. RNN has short-term memory problems and cannot handle very long input sequences
  2. Training RNN requires significant cost
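The short-term memory effect can be observed numerically. The sketch below assumes a one-unit RNN with hand-picked weights (w_h and w_x are hypothetical values, not trained ones). It nudges either the first or the last input of a short sequence and measures how much the final state moves.

```python
import math

def final_state(inputs, w_h=0.5, w_x=1.0):
    """Run a one-unit RNN over the inputs and return the last hidden state."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h

steps = 10
baseline = [0.5] * steps

# Change only the earliest input.
nudge_first = list(baseline)
nudge_first[0] += 0.1

# Change only the most recent input.
nudge_last = list(baseline)
nudge_last[-1] += 0.1

effect_of_first = abs(final_state(nudge_first) - final_state(baseline))
effect_of_last = abs(final_state(nudge_last) - final_state(baseline))

print(effect_of_first, effect_of_last)
```

With these settings the change to the first input barely moves the final state, while the same change to the last input has a clearly visible effect. Each extra step shrinks the trace of the early input further, which is why very long sequences are hard for a plain RNN.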

Because of this short-term memory problem, optimized algorithms based on RNN appeared later. They are briefly introduced below.


RNN optimization algorithm

From RNN to LSTM (Long Short-Term Memory network)

RNN follows a rigid logic: the later an input arrives, the greater its influence, and the earlier it arrives, the smaller its influence. This logic cannot be changed.

The biggest change LSTM makes is to break this rigid logic and switch to a flexible one: retain only the important information.

Simply put: focus on the key points!

[Figure: From RNN's sequence logic to LSTM's focus logic]

For example, let's quickly read the following passage:

[Figure: Read this paragraph quickly]

When we finish reading quickly, we may only remember the following points:


LSTM works much like this kind of highlighting. It can retain the "important information" in a longer sequence and ignore the unimportant information, which addresses the short-term memory problem of RNN.

The technical details are not covered here. If you are interested, see the detailed introduction to LSTM: "Long Short-Term Memory Network – LSTM".
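As a rough illustration of "keeping important information", here is a toy sketch of a single-unit LSTM step in its commonly used formulation (forget, input, and output gates plus a cell state). None of it comes from this article: the parameter names and the hand-set bias values are assumptions, and in a trained LSTM the gates would be driven by learned weights rather than fixed biases.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate: how much old memory to keep
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate: how much new content to write
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate: how much memory to expose
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate content
    c = f * c_prev + i * g                                      # updated cell state (the "memory")
    h = o * math.tanh(c)                                        # updated hidden state
    return h, c

def make_params(b_f, b_i):
    """Hypothetical hand-set parameters; only the gate biases differ between runs."""
    names = ["w_f", "u_f", "b_f", "w_i", "u_i", "b_i",
             "w_o", "u_o", "b_o", "w_g", "u_g", "b_g"]
    p = {name: 0.0 for name in names}
    p["w_g"] = 1.0
    p["b_f"] = b_f
    p["b_i"] = b_i
    return p

def memory_after(params, inputs):
    """Start with a value already stored in the cell and see what is left at the end."""
    h, c = 0.0, 1.0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return c

unimportant_inputs = [0.3, -0.2, 0.1, -0.4] * 5  # 20 steps of "noise"

kept = memory_after(make_params(b_f=8.0, b_i=-8.0), unimportant_inputs)   # forget gate open, input gate closed
lost = memory_after(make_params(b_f=-8.0, b_i=-8.0), unimportant_inputs)  # forget gate closed

print(kept, lost)
```

When the forget gate stays close to 1 and the input gate close to 0, the stored value survives all 20 steps almost unchanged. When the forget gate is close to 0, it is erased almost immediately. Training is what lets the network decide, input by input, which of the two behaviors to use.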


From LSTM to GRU

The Gated Recurrent Unit (GRU) is a variant of LSTM. It keeps LSTM's ability to focus on key points and forget unimportant information, so that important information is not lost during long-term propagation.

[Figure: GRU makes some simplifications and adjustments to the LSTM model]

GRU mainly makes some simplifications and adjustments to the LSTM model, which can save a lot of time when the training data set is large.
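One way to see the simplification is to compare the textbook formulations: a GRU has an update gate, a reset gate, and a candidate state, while an LSTM has three gates plus a candidate. The sketch below is an illustration under assumptions, not this article's method: the parameter names and layer sizes are hypothetical, some texts swap the roles of z and 1 - z, and library implementations may count biases differently.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a single-unit GRU in a commonly used formulation."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])                 # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])                 # reset gate
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])    # candidate state
    return (1.0 - z) * h_prev + z * h_cand                                   # blend old state and candidate

def weights_per_block(input_size, hidden_size):
    """Input weights, recurrent weights, and one bias vector for a single gate or candidate."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size

def lstm_param_count(input_size, hidden_size):
    return 4 * weights_per_block(input_size, hidden_size)  # forget, input, output gates + candidate

def gru_param_count(input_size, hidden_size):
    return 3 * weights_per_block(input_size, hidden_size)  # update, reset gates + candidate

input_size, hidden_size = 128, 256  # hypothetical layer sizes
lstm_params = lstm_param_count(input_size, hidden_size)
gru_params = gru_param_count(input_size, hidden_size)

print(lstm_params, gru_params, gru_params / lstm_params)
```

Under this counting a GRU layer has three quarters of the parameters of an LSTM layer of the same size, and it keeps a single state instead of separate hidden and cell states. Fewer parameters means less computation per step, which is consistent with the time savings on large data sets.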


RNN application and usage scenarios

RNN can be used wherever sequence data needs to be processed. NLP is a typical application scenario.

[Figure: RNN application and usage scenarios]

Text generation: Similar to the fill-in-the-blank question above. Given the context, predict which word belongs in the blank.

Machine translation: Translation is also a typical sequence problem, and the order of the words directly affects the result of the translation.

Speech Recognition: Determine what the corresponding text is based on the input audio.

Generate image description: Similar to describing a picture in words. Given a picture, produce a description of its content. This is often done by combining RNN and CNN.

[Figure: Generate image description]

Video tag: The video is broken up into individual pictures, and image descriptions are then used to describe the content of each picture.


Final Thoughts

The unique value of RNN is that it can effectively process sequence data. For example: article content, voice audio, stock price trends...

RNN can process sequence data because earlier inputs in the sequence also affect later outputs, which is equivalent to having a "memory function". But RNN has a serious short-term memory problem: inputs from far back have little influence, even when they carry important information.

Therefore, variant algorithms such as LSTM and GRU have appeared based on RNN. These variant algorithms have several main features:

  1. Long-term information can be effectively retained
  2. Important information is selected and kept, while unimportant information is "forgotten"

A few typical applications of RNN are as follows:

  1. Text generation
  2. Speech Recognition
  3. Machine translation
  4. Generate image description
  5. Video tag


Baidu Encyclopedia + Wikipedia

Baidu Encyclopedia version

A recurrent neural network (RNN) is a class of recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and has all of its nodes (recurrent units) connected in a chain.

Research on recurrent neural networks began in the 1980s and 1990s, and they developed into one of the important deep learning algorithms in the early 21st century. Bidirectional recurrent neural networks (Bidirectional RNN, Bi-RNN) and Long Short-Term Memory networks (LSTM) are common recurrent neural networks.

Recurrent neural networks have memory, parameter sharing, and Turing completeness, so they can learn the nonlinear characteristics of sequences efficiently. They have important applications in Natural Language Processing (NLP), such as speech recognition, language modeling, and machine translation. They are also used for various kinds of time series prediction, or combined with convolutional neural networks (Convolutional Neural Network, CNN) to handle computer vision problems.


Wikipedia version

A recurrent neural network (RNN) is a class of neural network in which the connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs. This makes them suitable for tasks such as unsegmented, connected handwriting recognition or speech recognition.

The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure, one of which is finite impulse and the other infinite impulse. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Both finite impulse and infinite impulse recurrent networks can have additional stored state, and the storage can be under the direct control of the neural network. The storage can also be replaced by another network or graph if that incorporates time delays or has feedback loops. Such controlled states are called gated state or gated memory, and are part of long short-term memory networks (LSTM) and gated recurrent units.
