Understanding Word2vec in one article

Word2vec is one of the Word Embedding methods in the NLP field. It is the process of turning words into "computable", "structured" vectors. This article explains the principles, advantages, and disadvantages of Word2vec.

This approach was mainstream before 2018, but since the emergence of BERT and GPT-2 it is no longer the best-performing method.



What is Word2vec?

What is Word Embedding?

Before explaining Word2vec, we first need to explain Word Embedding. It converts "non-computable", "unstructured" words into "computable", "structured" vectors.

This step "transforms a real-world problem into a mathematical problem", which is a critical step in artificial intelligence.

To learn more, see this article: "A word to understand word embedding (compared with other text representations + 2 mainstream algorithms)"

word embedding: transform unstructured data into structured data

Turning a real problem into a mathematical problem is only the first step; the mathematical problem still has to be solved afterward. So the Word Embedding model itself is not what matters. What matters is the result it produces: the word vectors, which are used directly in subsequent tasks.


What is Word2vec?

Word2vec is one of the Word Embedding methods. It was a new word embedding method proposed by Google's Mikolov in 2013.

The position of Word2vec within NLP as a whole is shown in the following figure:

word2vec's position in nlp

Before Word2vec, there were already some Word Embedding methods, but they were not mature and had not been applied at large scale.

The training modes and usage of Word2vec are described in detail below.


2 training modes of Word2vec

CBOW (Continuous Bag-of-Words Model) and Skip-gram (Continuous Skip-gram Model) are two training modes of Word2vec. Here's a simple explanation:


CBOW predicts the current word from its context. It is like removing one word from a sentence and asking you to guess what that word is.

CBOW predicts the current value by context
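
The CBOW idea can be illustrated with a minimal sketch that only builds the training examples (context words as input, the removed word as the target). The function name `cbow_pairs`, the `window` parameter, and the sample sentence are assumptions made for illustration; this is not the original Word2vec implementation and it does not train anything.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for a CBOW-style task."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context = up to `window` words on each side of position i.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


# Hypothetical example sentence.
sentence = "the cat sits on the mat".split()
for context, target in cbow_pairs(sentence, window=2):
    print(context, "->", target)
```

Each printed line is one "fill in the missing word" example: the list on the left is what the model sees, and the word on the right is what it has to guess.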


Skip-gram uses the current word to predict its context. It is like giving you one word and asking you to guess which words may appear before and after it.

Skip-gram uses the current word to predict the context
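
The Skip-gram direction can be sketched the same way: each word is paired with every neighbor inside a window, and the model's job is to predict the neighbor from the word. The function name `skipgram_pairs` and the `window` parameter are assumptions for illustration only.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center_word, context_word) pairs for a Skip-gram-style task."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical example sentence.
for center, context in skipgram_pairs("the cat sits".split(), window=1):
    print(center, "->", context)
```

Compared with the CBOW setup, the input and output are swapped: one input word produces several training examples, one per neighboring word.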



To increase speed, Word2vec often uses 2 acceleration methods:

  1. Negative Sampling
  2. Hierarchical Softmax

The acceleration methods are not explained in detail here; interested readers can look them up.
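
For orientation only, here is a rough sketch of why negative sampling is cheaper. Instead of scoring every word in the vocabulary, the loss for one training pair uses the true context word plus a few "negative" words. The helper names and toy vectors are assumptions for illustration, and the negatives are passed in by hand rather than sampled from a word-frequency distribution.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one (center, context) pair with k negative words.

    Only k + 1 dot products are computed, instead of one per vocabulary word.
    """
    # Push the true pair's score up.
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    # Push each negative word's score down.
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss


# Hypothetical 2-dimensional vectors.
center = [1.0, 0.0]
context = [2.0, 0.0]
negatives = [[-2.0, 0.0], [0.0, 1.0]]
print(negative_sampling_loss(center, context, negatives))
```

The loss is small when the true context word scores high against the center word and the negative words score low.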


Advantages and disadvantages of Word2vec

It should be noted that Word2vec is a product of the previous generation (before 2018). Since 2018, the best results are no longer obtained with Word Embedding methods, so Word2vec is no longer used when top performance is the goal.


Advantages:

  1. Because Word2vec takes context into account, it works better than earlier Embedding methods (though not as well as the methods that appeared from 2018 onward)
  2. It uses fewer dimensions than earlier Embedding methods, so it is faster
  3. It is versatile and can be used in a variety of NLP tasks

Word2vec works well in similarity calculations
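
Similarity between word vectors is commonly measured with cosine similarity. The sketch below uses small hand-written vectors that are invented for illustration; real Word2vec vectors are learned from a corpus and typically have many more dimensions. The function names are also assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


# Hypothetical 3-dimensional vectors, not real trained embeddings.
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}
print(most_similar("king", vectors))
```

With these toy values, "king" is closer to "queen" than to "apple", which is the kind of relationship trained word vectors are expected to capture.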

Disadvantages:

  1. Because each word maps to exactly one vector, polysemous words cannot be handled.
  2. Word2vec is a static method: although it is versatile, it cannot be dynamically optimized for a specific task.

Word2vec can't solve the problem of polysemous words
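
The polysemy limitation follows directly from the one-to-one lookup. In the sketch below, the embedding table and its values are hypothetical; the point is only that "bank" receives the same vector whether it appears next to "river" or next to "account".

```python
# Hypothetical static embedding table: one fixed vector per word.
embeddings = {
    "river": [0.9, 0.1],
    "bank": [0.4, 0.5],
    "account": [0.1, 0.8],
}


def embed(tokens, table):
    """Look up each token independently; the surrounding words are ignored."""
    return [table[token] for token in tokens]


sentence_a = "river bank".split()
sentence_b = "bank account".split()

bank_in_a = embed(sentence_a, embeddings)[sentence_a.index("bank")]
bank_in_b = embed(sentence_b, embeddings)[sentence_b.index("bank")]

# Same vector in both sentences, even though the meanings differ.
print(bank_in_a == bank_in_b)
```

Because the lookup never sees the surrounding words, no choice of table values can give the two senses of "bank" different vectors.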


Baidu Encyclopedia

Baidu Encyclopedia version

Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and must guess the words in adjacent positions. Under the bag-of-words assumption in word2vec, the order of the words is not important. After training, the word2vec model can map each word to a vector that represents the relationships between words; this vector is the hidden layer of the neural network.
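
The statement that the word vector "is the hidden layer" can be made concrete for a single input word: multiplying a one-hot input by the input weight matrix selects one row of that matrix. The vocabulary, the matrix `W`, and the helper names below are assumptions made for illustration, and the weights are hand-written rather than trained.

```python
vocab = ["the", "cat", "sits"]

# Hypothetical input weight matrix: one row per word, 2 hidden units.
W = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def hidden_layer(x, weights):
    """Compute x (1 x V) times weights (V x N) with plain loops."""
    n_hidden = len(weights[0])
    return [
        sum(x[i] * weights[i][j] for i in range(len(weights)))
        for j in range(n_hidden)
    ]


idx = vocab.index("cat")
h = hidden_layer(one_hot(idx, len(vocab)), W)

# The hidden activation equals the row of W for "cat": that row is the word vector.
print(h, h == W[idx])
```

This is why, after training, the rows of the input weight matrix can be read off and used as the word vectors.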
