Encoder-Decoder is a model framework in the field of NLP. It is widely used in tasks such as machine translation and speech recognition.
This article details Encoder-Decoder, Seq2Seq, and their upgrades.
What is Encoder-Decoder?
The Encoder-Decoder model is primarily a concept in the NLP world. It is not a specific algorithm, but a general term for a class of algorithms. Encoder-Decoder is a generic framework in which different algorithms can be used to solve different tasks.
The Encoder-Decoder framework is a good illustration of the core idea of machine learning:
Transform real-world problems into mathematical problems, then solve the real-world problem by solving the mathematical one.
The Encoder's role is to "transform the real problem into a mathematical problem."

The Decoder's role is to "solve the mathematical problem and turn it into a real-world solution."
Connecting these two stages in a general diagram looks like this:
Regarding the Encoder-Decoder, there are two points worth noting:
- Regardless of the length of the input and output, the length of the intermediate "vector c" is fixed (this is also its drawback, as explained below)
- Different encoders and decoders can be chosen depending on the task (they can be plain RNNs, but are usually RNN variants such as LSTM or GRU)
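The two points above can be made concrete with a toy sketch. This is an illustrative NumPy implementation with random, untrained weights (the hidden size `d` and the simple tanh RNN cells are assumptions, not part of any specific paper): the encoder compresses any-length input into one fixed-size "vector c", and the decoder unrolls from c for however many steps we choose.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (assumed, for illustration only)

# Toy RNN cell: h' = tanh(W h + U x). Weights are random -- no training here.
W_enc, U_enc = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
W_dec, U_dec = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

def encode(xs):
    """Compress a variable-length input sequence into one fixed vector c."""
    h = np.zeros(d)
    for x in xs:
        h = np.tanh(W_enc @ h + U_enc @ x)
    return h  # the fixed-length "vector c" -- same size for any input length

def decode(c, steps):
    """Unroll the decoder from c; each step feeds back its own output."""
    h, y, outputs = c, np.zeros(d), []
    for _ in range(steps):
        h = np.tanh(W_dec @ h + U_dec @ y)
        y = h
        outputs.append(y)
    return outputs

short = [rng.normal(size=d) for _ in range(3)]
long_seq = [rng.normal(size=d) for _ in range(30)]
# Regardless of input length, c has the same fixed size:
assert encode(short).shape == encode(long_seq).shape == (d,)
# The output length is chosen independently of the input length:
assert len(decode(encode(long_seq), 5)) == 5
```

Note how a 30-step input and a 3-step input both squeeze into the same 8-dimensional c; this fixed bottleneck is exactly the drawback discussed later.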
As long as a model conforms to the above framework, it can be called an Encoder-Decoder model. When speaking of the Encoder-Decoder model, another term is often mentioned: Seq2Seq.
What is Seq2Seq?
Seq2Seq (short for Sequence-to-Sequence), as the name implies, takes one sequence as input and outputs another sequence. The most important feature of this structure is that the lengths of the input and output sequences are variable. For example, in the following picture:
As shown above: 6 Chinese characters are input and 3 English words are output; the input and output have different lengths.
The origin of Seq2Seq
Before the Seq2Seq framework was proposed, deep neural networks had achieved very good results on problems such as image classification. In the problems they are good at, the input and output can usually be represented as fixed-length vectors; if the length varies slightly, zero-padding is used.
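The zero-padding operation mentioned above can be sketched in a few lines (the token ids and the fixed length of 4 are made-up illustration values): sequences of different lengths are padded with zeros so the whole batch fits one fixed shape.

```python
import numpy as np

def pad_to(seq, length, pad_value=0):
    """Zero-pad (or truncate) a token-id sequence to a fixed length."""
    return (seq + [pad_value] * (length - len(seq)))[:length]

# Three "sentences" of different lengths become one fixed-shape batch:
batch = [[5, 2, 9], [7, 1], [4, 4, 4, 4]]
padded = np.array([pad_to(s, 4) for s in batch])
assert padded.shape == (3, 4)
assert padded[1].tolist() == [7, 1, 0, 0]  # short sequence, padded with zeros
```

This trick only works when lengths vary slightly around a known maximum; for machine translation or dialogue, where output length is unknown in advance, it breaks down, which is what motivated Seq2Seq.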
However, many important problems, such as machine translation, speech recognition, and automatic dialogue, are expressed as sequences whose lengths are not known in advance. How to break through this limitation of deep neural networks so they could adapt to these scenarios became a research hotspot around 2013, and the Seq2Seq framework came into being.
Relationship between "Seq2Seq" and "Encoder-Decoder"
Seq2Seq (which emphasizes the purpose) does not refer to a specific method. Any model that satisfies "input a sequence, output a sequence" can be called a Seq2Seq model.

The specific methods used by Seq2Seq basically fall within the scope of the Encoder-Decoder model (which emphasizes the method).
To sum up:
- Seq2Seq belongs to the broad category of Encoder-Decoder
- Seq2Seq emphasizes the purpose, Encoder-Decoder emphasizes the method
What are the applications of Encoder-Decoder?
Machine translation, chatbots, poetry generation, code completion, article summarization (text to text)

"Text to text" is the most typical application; the lengths of its input and output sequences may differ considerably.
Google's paper on machine translation using Seq2Seq: "Sequence to Sequence Learning with Neural Networks"
Speech recognition (audio-text)
Speech recognition also has strong sequence characteristics and is more suitable for the Encoder-Decoder model.
Google's paper on speech recognition using Seq2Seq: "A Comparison of Sequence-to-Sequence Models for Speech Recognition"
Image description generation (picture-text)
Popularly described as "looking at a picture and talking about it," the machine extracts image features and then expresses them in words. This application combines computer vision and NLP.
Paper on image/video description generation: "Sequence to Sequence – Video to Text"
Defects of Encoder-Decoder
As mentioned above: there is only one "vector c" between the Encoder and the Decoder to convey information, and the length of c is fixed.
For ease of understanding, we can use a "compression-decompression" analogy:

Compressing an 800×800-pixel image to 100 KB still looks fairly clear, but compressing a 3000×3000-pixel image to 100 KB will look blurry.
Encoder-Decoder has a similar problem: when the input is too long, some information is lost.
Attention solves the problem of information loss
The Attention mechanism was designed to solve this problem of information being lost when the input is too long.
The characteristic of the Attention model is that the Encoder no longer encodes the entire input sequence into a single fixed-length "intermediate vector c", but instead encodes it into a sequence of vectors. The Encoder-Decoder model with Attention looks like this:
In this way, when each output is generated, it is possible to make full use of the information carried by the input sequence. And this method has achieved very good results in the translation task.
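The core of this mechanism can be sketched in NumPy. This example assumes dot-product scoring (one common choice among several; the article does not specify which): at each decoding step, every encoder state is scored against the current decoder state, the scores are normalized with softmax, and the weighted sum of encoder states becomes a per-step context vector, so no single fixed c has to carry everything.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Dot-product attention (assumed scoring choice): score each encoder
    state against the decoder state, normalize with softmax, and return
    the weighted sum of encoder states as this step's context vector."""
    scores = encoder_states @ decoder_state   # one score per input position
    weights = softmax(scores)                 # attention distribution
    return weights @ encoder_states, weights  # context vector, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))   # 6 encoder states (e.g. one per input token)
s = rng.normal(size=8)        # current decoder hidden state
c, w = attention_context(H, s)
assert np.isclose(w.sum(), 1.0)  # weights form a probability distribution
assert c.shape == (8,)           # context size is fixed, but it is recomputed
                                 # from all encoder states at every step
```

Because a fresh context vector is computed per output step, each output can draw on whichever input positions are most relevant to it, rather than relying on one compressed summary.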
Attention is an important topic. To learn more about it, read "Attention (essential principles + 3 advantages + 5 types)".