Understanding Encoder-Decoder and Seq2Seq in One Article

Encoder-Decoder is a model framework in the field of NLP. It is widely used in tasks such as machine translation and speech recognition.

This article details Encoder-Decoder, Seq2Seq, and their upgrades.

To learn more about NLP-related content, please visit the NLP topic, where a free 59-page NLP document is available for download.



What is Encoder-Decoder?

The Encoder-Decoder model is primarily a concept in the NLP world. It is not a special algorithm, but a general term for a class of algorithms. Encoder-Decoder is a generic framework in which different algorithms can be used to solve different tasks.

The Encoder-Decoder framework is a good illustration of a core idea of machine learning:

Transforming real-world problems into mathematical problems and solving real-world problems by solving mathematical problems.

The Encoder's role is to "transform a real-world problem into a mathematical problem".

Encoder turns real problems into math problems

The Decoder's role is to "solve the mathematical problem and translate the solution back into the real world".

Decoder solves mathematical problems and transforms them into real-world solutions

Linking the two stages together and expressing them in a generic diagram gives:

Graphic Encoder-Decoder

Regarding the Encoder-Decoder, there are 2 points to be explained:

  1. Regardless of the lengths of the input and output, the length of the intermediate "vector c" is fixed (this is also its defect, which will be explained in detail below)
  2. Different encoders and decoders can be chosen depending on the task (each can be an RNN, but is usually one of its variants, LSTM or GRU)

As long as it conforms to the above framework, it can be collectively referred to as the Encoder-Decoder model. Speaking of the Encoder-Decoder model, a term is often mentioned - Seq2Seq.
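To make the framework concrete, here is a minimal sketch in plain NumPy. All weights and sizes are illustrative (an untrained toy, not a real model): the encoder folds a variable-length input into one fixed-length vector c, and the decoder unrolls from c for however many steps the output needs.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8   # size of the fixed-length "vector c"
emb = 4      # toy embedding size

# Encoder: fold a variable-length input sequence into one fixed vector c
W_xh = rng.normal(size=(emb, hidden)) * 0.1
W_hh = rng.normal(size=(hidden, hidden)) * 0.1

def encode(xs):
    h = np.zeros(hidden)
    for x in xs:                          # one step per input token
        h = np.tanh(x @ W_xh + h @ W_hh)  # simple RNN update
    return h                              # the final state h is "vector c"

# Decoder: unroll from c for a chosen number of output steps
W_hy = rng.normal(size=(hidden, emb)) * 0.1

def decode(c, steps):
    h, outputs = c, []
    for _ in range(steps):
        h = np.tanh(h @ W_hh)
        outputs.append(h @ W_hy)
    return outputs

src = rng.normal(size=(6, emb))   # e.g. 6 input tokens
c = encode(src)
out = decode(c, steps=3)          # e.g. 3 output tokens

print(c.shape)   # (8,) -- fixed length regardless of input length
print(len(out))  # 3    -- output length chosen independently
```

Note that c has the same shape whether the input has 6 tokens or 600; this is exactly the fixed-length bottleneck discussed below.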


What is Seq2Seq?

Seq2Seq (short for Sequence-to-Sequence), as the name literally suggests, takes one sequence as input and produces another sequence as output. The most important aspect of this structure is that the lengths of the input sequence and the output sequence are variable. For example, in the following picture:

As shown above: six Chinese characters are input and three English words are output; the lengths of the input and output differ.

The origin of Seq2Seq

Before the Seq2Seq framework was proposed, deep neural networks had already achieved very good results in image classification and similar problems. In the problems they are good at solving, the input and output can usually be represented as fixed-length vectors; if the length varies slightly, zero-padding is applied.
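The zero-padding trick mentioned above can be sketched in a few lines; this is a hypothetical toy example with made-up token IDs, not tied to any particular library:

```python
import numpy as np

def pad_to(seqs, length, pad_id=0):
    """Pad variable-length ID sequences with a pad token to a fixed length."""
    return np.array([s + [pad_id] * (length - len(s)) for s in seqs])

batch = [[5, 2, 9], [7, 1], [3, 8, 4, 6]]  # three sequences, lengths 3, 2, 4
padded = pad_to(batch, length=4)
print(padded)
# [[5 2 9 0]
#  [7 1 0 0]
#  [3 8 4 6]]
```

This works when lengths vary only slightly, but for truly open-ended sequences (a sentence can be arbitrarily long) padding everything to a worst-case length is wasteful, which motivates Seq2Seq.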

However, many important problems, such as machine translation, speech recognition, and automatic dialogue, are expressed as sequences whose lengths are not known in advance. How to break through this limitation of deep neural networks so that they could adapt to these scenarios became a research hotspot around 2013, and the Seq2Seq framework came into being.

Relationship between "Seq2Seq" and "Encoder-Decoder"

Seq2Seq (which emphasizes the purpose) does not refer to one specific method: any model that satisfies the goal of "input a sequence, output a sequence" can be called a Seq2Seq model.

The specific methods used by Seq2Seq basically all fall within the scope of the Encoder-Decoder model (which emphasizes the method).

To sum up:

  • Seq2Seq belongs to the broad category of Encoder-Decoder
  • Seq2Seq emphasizes the purpose, Encoder-Decoder emphasizes the method


What are the applications of Encoder-Decoder?

Some applications of Encoder-Decoder

Machine translation, chatbots, poetry generation, code completion, article summarization (text-to-text)

"Text-to-text" is the most typical application; the lengths of its input and output sequences may differ greatly.

Google's paper on machine translation using Seq2Seq: "Sequence to Sequence Learning with Neural Networks"

Seq2Seq application: machine translation

Speech recognition (audio-to-text)

Speech recognition also has strong sequential characteristics and is well suited to the Encoder-Decoder model.

Google's paper on speech recognition using Seq2Seq: "A Comparison of Sequence-to-Sequence Models for Speech Recognition"

Seq2Seq application: speech recognition

Image description generation (image-to-text)

Colloquially, this is "describing a picture": the machine extracts features from an image and then expresses them in words. This application combines computer vision and NLP.

Paper on description generation: "Sequence to Sequence – Video to Text"

Seq2Seq application: image description generation


Defects of Encoder-Decoder

As mentioned above, there is only one "vector c" between the Encoder and the Decoder, and the length of c is fixed.

For ease of understanding, let's use a "compression-decompression" analogy:

Compressing an 800×800 pixel image down to 100KB still looks fairly clear, but compressing a 3000×3000 pixel image down to 100KB will look blurry.

Disadvantages of Encoder-Decoder: Loss of information when input is too long

Encoder-Decoder has a similar problem: when the input is too long, some information is lost.


Attention solves the problem of information loss

The attention mechanism was introduced to solve this problem of information loss on long inputs.

The characteristic of the Attention model is that the Encoder no longer encodes the entire input sequence into a single fixed-length "intermediate vector c", but into a sequence of vectors. The Encoder-Decoder model with Attention looks like this:

Graphical attention

In this way, each generated output can make full use of the information carried by the whole input sequence. This method has achieved very good results in translation tasks.
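The core computation can be sketched in a few lines. This is a minimal dot-product attention example with random (untrained) vectors, just to show the mechanics: the decoder scores every encoder state, turns the scores into weights via softmax, and builds a per-step context vector instead of relying on a single fixed c.

```python
import numpy as np

# Sketch: keep all encoder states and build a different context vector
# for each decoder step via attention weights (illustrative values only).
rng = np.random.default_rng(1)
enc_states = rng.normal(size=(6, 8))   # one 8-dim state per input token
dec_state = rng.normal(size=(8,))      # current decoder hidden state

scores = enc_states @ dec_state                  # dot-product alignment scores
weights = np.exp(scores) / np.exp(scores).sum()  # softmax -> attention weights
context = weights @ enc_states                   # weighted sum of encoder states

print(weights.shape)   # (6,) -- one weight per input token, summing to 1
print(context.shape)   # (8,)
```

Because a fresh context vector is computed at every decoder step, no single fixed-length vector has to carry the entire input, which is exactly how attention sidesteps the bottleneck described above.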

Attention is an important topic in its own right. To learn more about it, check out "Read the Attention (essential principle + 3 big advantages + 5 big types)".