What is LSTM?
Long short-term memory networks, usually called LSTMs, are a special kind of RNN that can learn long-term dependencies. They were proposed by Hochreiter and Schmidhuber (1997) and were refined and popularized by many people in subsequent work. LSTMs perform very well on a wide variety of problems and are now widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods is practically their default behavior, not something they struggle to learn!
All recurrent neural networks have the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
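The repeating module of a standard RNN can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `rnn_step`, the weight values, and the sizes (hidden size 2, input size 1) are assumptions chosen for the example.

```python
import math

def rnn_step(x, h_prev, W, b):
    """One step of a standard RNN: h_t = tanh(W . [h_prev, x] + b)."""
    z = h_prev + x  # concatenate previous hidden state and current input
    return [math.tanh(sum(w * v for w, v in zip(row, z)) + b_i)
            for row, b_i in zip(W, b)]

# Hypothetical weights: 2 hidden units, 1 input feature (3 columns per row).
W = [[0.5, -0.3, 0.8],
     [0.1, 0.4, -0.6]]
b = [0.0, 0.1]

# The same module (same W and b) is applied at every time step of the chain.
h = [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):
    h = rnn_step(x, h, W, b)
```

The only thing carried from one step to the next is the hidden state `h`, which is squashed through tanh every time.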
LSTMs also have this chain-like structure, but the repeating module is built differently. Instead of a single neural network layer, there are four, interacting in a very particular way.
Don't worry about the details. We will walk through the LSTM diagram step by step later. For now, let's get familiar with the notation we will use.
In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. Pink circles represent pointwise operations, such as vector addition, and yellow boxes represent learned neural network layers. Lines merging denote concatenation, while a line forking denotes that its content is copied and the copies go to different locations.
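The diagram's notation maps onto very simple vector operations. The sketch below uses plain Python lists as vectors; the helper names `pointwise_add`, `pointwise_mul`, and `concat` are made up for this illustration.

```python
def pointwise_add(a, b):
    """Pointwise operation: add matching elements of two vectors."""
    return [x + y for x, y in zip(a, b)]

def pointwise_mul(a, b):
    """Pointwise operation: multiply matching elements of two vectors."""
    return [x * y for x, y in zip(a, b)]

def concat(a, b):
    """Lines merging: join two vectors end to end."""
    return a + b

h_prev = [0.2, -0.1]   # hypothetical previous output
x_t = [1.0]            # hypothetical current input

merged = concat(h_prev, x_t)   # [0.2, -0.1, 1.0]

# A line forking: the same content is copied and sent to two places.
to_layer_a = list(merged)
to_layer_b = list(merged)

summed = pointwise_add([1.0, 2.0], [0.5, 0.5])   # [1.5, 2.5]
scaled = pointwise_mul([2.0, 4.0], [0.5, 0.0])   # [1.0, 0.0]
```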
The core idea of LSTM
The key to LSTMs is the cell state, the horizontal line running across the top of the diagram.
The cell state is a bit like a conveyor belt. It runs straight down the entire chain with only a few minor linear interactions, so information can easily flow along it unchanged.
The LSTM can remove information from the cell state or add information to it, carefully regulated by structures called "gates."
Gates are a way to optionally let information through. They consist of a sigmoid neural network layer and a pointwise multiplication.
The sigmoid layer outputs values between 0 and 1, describing how much of each component should be let through. A value of 0 means "let nothing through," and a value of 1 means "let everything through."
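A gate can be sketched as a sigmoid followed by a pointwise multiplication. In a real LSTM the sigmoid inputs come from a learned layer; here they are hand-picked (an assumption for this example) to show the two extremes and the midpoint. The name `apply_gate` is hypothetical.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(gate_logits, values):
    """Scale each value by a sigmoid gate between 0 and 1."""
    gate = [sigmoid(z) for z in gate_logits]
    return [g * v for g, v in zip(gate, values)]

values = [2.0, -3.0, 5.0]
logits = [-20.0, 0.0, 20.0]   # gate nearly closed, half open, nearly open

out = apply_gate(logits, values)
# out[0] is close to 0 (blocked), out[1] is -1.5 (half passed),
# out[2] is close to 5 (passed almost unchanged)
```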
An LSTM has three such gates to protect and control the cell state.
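Putting the pieces together, one LSTM step can be sketched as below. The text above does not name the gates. The names used here (forget, input, output) and the tanh candidate layer follow the commonly used LSTM formulation, and the parameter layout, the helper names, and all weight values are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, z):
    """Compute W . z + b for a weight matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + b_i for row, b_i in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step: four layers (three sigmoid gates, one tanh layer)."""
    z = h_prev + x  # concatenate previous output and current input
    f = [sigmoid(v) for v in affine(p["Wf"], p["bf"], z)]      # forget gate
    i = [sigmoid(v) for v in affine(p["Wi"], p["bi"], z)]      # input gate
    o = [sigmoid(v) for v in affine(p["Wo"], p["bo"], z)]      # output gate
    cand = [math.tanh(v) for v in affine(p["Wc"], p["bc"], z)] # candidate values
    # Cell state update: only pointwise multiply and add (the "conveyor belt").
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, cand)]
    # Output: a gated, squashed view of the cell state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

hidden, inputs = 2, 1

def constant_layer(value):
    """Hypothetical helper: a weight matrix filled with one value."""
    return [[value] * (hidden + inputs) for _ in range(hidden)]

params = {
    "Wf": constant_layer(0.1),  "bf": [1.0, 1.0],
    "Wi": constant_layer(0.2),  "bi": [0.0, 0.0],
    "Wo": constant_layer(-0.1), "bo": [0.0, 0.0],
    "Wc": constant_layer(0.3),  "bc": [0.0, 0.0],
}

h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x, h, c, params)
```

Note that the cell state `c` is never passed through a learned layer directly. It is only scaled by the forget gate and added to, which is what lets information travel along it largely unchanged.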
If you are interested in the detailed technical principles, you can check out this article: "Illustrated Guide to LSTM's and GRU's: A step by step explanation"
Baidu Encyclopedia + Wikipedia