Long Short-Term Memory network - LSTM

What is LSTM?

Long Short-Term Memory networks - often referred to as LSTMs - are a special kind of RNN capable of learning long-term dependencies. They were proposed by Hochreiter and Schmidhuber (1997), and were refined and popularized by many people in subsequent work. LSTMs perform very well on a large variety of problems and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.

In a standard RNN, the repeating module contains only a single tanh layer
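To make that concrete, here is a minimal NumPy sketch of such a repeating module (the function and weight names are illustrative, not from the original article):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One step of a vanilla RNN: the entire repeating module is a
    single tanh layer applied to the concatenated [h_prev, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    return np.tanh(W @ concat + b)

# Toy usage: hidden size 3, input size 2, a sequence of 4 inputs.
rng = np.random.default_rng(0)
W, b, h = rng.normal(size=(3, 5)), np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(4, 2)):
    h = rnn_step(x_t, h, W, b)
```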

LSTMs also have this chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four, interacting in a very special way.

The repeating module of an LSTM contains four interacting neural network layers

Don't worry about the details yet; we will walk through the LSTM diagram step by step later. For now, let's familiarize ourselves with the notation we will be using.

Meaning of different symbols

In the diagram above, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, such as vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied, with the copies going to different locations.
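As a toy illustration of this legend (the arrays here are hypothetical, just to show the operations):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

a + b                   # pointwise operation (pink circle): array([4., 6.])
np.concatenate([a, b])  # two lines merging: array([1., 2., 3., 4.])
c = a.copy()            # a line forking: the same vector sent to two places
```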


The core idea of LSTM

The key to LSTMs is the cell state, the horizontal line running along the top of the diagram.

The cell state is a bit like a conveyor belt. It runs straight down the entire chain, with only a few minor linear interactions, so it is very easy for information to flow along it unchanged.

The cell state is the horizontal line running along the top of the diagram

The LSTM can remove information from the cell state or add information to it, carefully regulated by structures called "gates."

Gates are a way to optionally let information through. They are composed of a sigmoid neural network layer and a pointwise multiplication operation.

Gates let the LSTM add or remove information from the cell state

The sigmoid layer outputs numbers between 0 and 1, describing how much of each component should be let through. A value of 0 means "let nothing through," while a value of 1 means "let everything through."
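A minimal sketch of a single gate under these assumptions (x, W, and b are illustrative placeholders): a sigmoid layer produces a vector of values in (0, 1) that scales, pointwise, whatever flows past it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=4)        # information arriving at the gate
W = rng.normal(size=(4, 4))   # the gate's learned sigmoid layer
b = np.zeros(4)

gate = sigmoid(W @ x + b)     # every entry lies strictly between 0 and 1
passed = gate * x             # pointwise multiply: 0 blocks, 1 lets through
```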

An LSTM has three of these gates - a forget gate, an input gate, and an output gate - to protect and control the cell state.
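Putting the pieces together, here is a sketch of the standard LSTM update, showing the four learned layers from the diagram above and the three gates just mentioned (the parameter names Wf, Wi, Wg, Wo and the dict layout are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell.

    `p` is assumed to be a dict holding one (W, b) pair per layer:
    f = forget gate, i = input gate, g = candidate values, o = output gate.
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(p["Wf"] @ z + p["bf"])  # forget gate: what to erase from the cell state
    i = sigmoid(p["Wi"] @ z + p["bi"])  # input gate: what new information to store
    g = np.tanh(p["Wg"] @ z + p["bg"])  # candidate values to add to the cell state
    o = sigmoid(p["Wo"] @ z + p["bo"])  # output gate: what part of the state to expose
    c_t = f * c_prev + i * g            # the "conveyor belt": mostly linear updates
    h_t = o * np.tanh(c_t)              # new hidden state
    return h_t, c_t
```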

If you are interested in the detailed technical principles, you can check out the article "Illustrated Guide to LSTM's and GRU's: A step by step explanation".

Baidu Encyclopedia + Wikipedia

Baidu Encyclopedia version

The original Long Short-Term Memory (LSTM) paper was published in 1997. Thanks to its unique design, LSTM is well suited to processing and predicting important events separated by very long intervals and delays in a time series.

LSTM usually performs better than standard recurrent neural networks and Hidden Markov Models (HMMs), for example on unsegmented continuous handwriting recognition. In 2009, an artificial neural network model built with LSTM won the ICDAR handwriting recognition competition. LSTM is also commonly used for automatic speech recognition; in 2013 it achieved a record 17.7% error rate on the TIMIT natural speech database. As a nonlinear model, LSTM can serve as a complex nonlinear unit for constructing larger deep neural networks.


Wikipedia version

Long short-term memory (LSTM) units are units of a recurrent neural network (RNN). An RNN composed of LSTM units is often referred to as an LSTM network (or simply LSTM). A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.

LSTM networks are well suited to classifying, processing, and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models, and other sequence learning methods in numerous applications.
