Understanding Generative Adversarial Networks (GANs) in One Article

Generative adversarial networks (GANs) have been a very popular unsupervised learning algorithm over the last two years. They can generate highly realistic photos, images, and even videos, and the photo-processing software on our phones already uses them.

This article covers the design motivation, basic principles, 10 typical algorithms, and 13 practical applications of generative adversarial networks (GANs).


The design motivation behind GANs

In a word, the motivation for designing GANs is automation.

From manual feature extraction to automatic feature extraction

As mentioned in "Understanding Deep Learning in One Article (Concepts + Advantages and Disadvantages + Typical Algorithms)", the most distinctive and powerful aspect of deep learning is its ability to learn feature extraction on its own.

The core difference between traditional machine learning and deep learning

The enormous computing power of machines can solve many problems that cannot be solved by hand. Once feature extraction is automated, both learning ability and adaptability improve.

From manually judging generated results to automatic judgment and optimization

As mentioned in the "Supervised learning" article, a training set requires a large amount of manually labeled data, which is costly and inefficient. Manually judging the quality of generated results has the same problems of high cost and low efficiency.

A GAN can complete this process automatically and keep optimizing it, which is both efficient and cheap. How does a GAN achieve this automation? The next section explains the principle.


Basic principles of generative adversarial networks (GANs)

Plain-language version

There is a very good explanation on Zhihu that everyone should be able to follow:

Suppose a city has poor public order, and before long it is full of thieves. Some of these thieves may be master burglars, while others have no skill at all. If the city starts to restore law and order and suddenly launches a "campaign" against crime, the police resume patrolling the streets, and a batch of "unskilled" thieves is soon caught. Only the unskilled thieves are caught at this point because the police are not very skilled either. After this batch of low-level thieves has been caught, it is hard to say how much public order has improved, but the average skill level of the city's thieves has clearly risen a great deal.

Strict policing raises the thieves' skill level

The police then start training their own crime-solving skills and go after the increasingly rampant thieves. As these professional repeat offenders are arrested, the police develop special skills of their own: they can quickly pick suspicious people out of a crowd and step forward to question them. Life gets hard for the thieves, because now that the police are so much better, anyone who still behaves as furtively as before is quickly caught.

The police keep improving their skills, and more thieves are caught

To avoid arrest, the thieves work hard to look less "suspicious". But for every step the thieves take, the police take a bigger one, constantly improving their ability to tell thieves apart from innocent members of the public. Through this ongoing "exchange" and "sparring" between police and thieves, the thieves become extremely careful: their stealing skills are superb and they behave exactly like ordinary people. The police, in turn, develop "piercing eyes" and can spot suspicious people at once and bring them under control in time. In the end, we get the strongest thieves and the strongest police at the same time.

In the end, we get both the strongest thieves and the strongest police


Technical version

A generative adversarial network (GAN) consists of two important parts:

  1. Generator: generates data (in most cases, images) with the goal of "fooling" the discriminator.
  2. Discriminator: judges whether an image is real or machine-generated, with the goal of catching the "fake data" made by the generator.

A GAN consists of a generator and a discriminator

The following describes the process in detail:

Stage 1: fix "discriminator D" and train "generator G"

We take a reasonably good discriminator, have "generator G" keep producing "fake data", and hand that data to "discriminator D" to judge.

At first, "generator G" is still weak, so its fakes are easily spotted.

With continued training, however, the skills of "generator G" keep improving until it finally fools "discriminator D".

At this point, "discriminator D" is essentially guessing blindly: the probability that it judges a sample to be fake is 50%.

Fix the discriminator, train the generator
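
The 50% figure in stage 1 can be made precise. As a sketch, assuming the objective used in the original GAN formulation (the symbols V, p_data, p_g, and p_z are introduced here for illustration and are not used elsewhere in this article):

```latex
\[
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
\]
\[
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
\quad\Longrightarrow\quad
D^{*}(x) = \frac{1}{2} \ \text{ when } p_g = p_{\text{data}}
\]
```

Here D(x) is the probability the discriminator assigns to x being real, p_data is the distribution of real data, and p_g is the distribution of generated samples. For a fixed generator, D* is the best possible discriminator. Once the generated distribution matches the real one, even the best discriminator can only output 1/2 for every sample, which is the blind-guessing state described above.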

Stage 2: fix "generator G" and train "discriminator D"

Once stage 1 is complete, further training of "generator G" is pointless. We now fix "generator G" and start training "discriminator D".

"Discriminator D" improves its discrimination ability through continued training until it can accurately identify all the fake images.

At this point, "generator G" can no longer fool "discriminator D".

Fix the generator, train the discriminator

Repeat stage 1 and stage 2

Through repeated cycles, both "generator G" and "discriminator D" become stronger and stronger.

In the end we get a very good "generator G", which we can use to generate the images we want.

The practical applications section below shows many "stunning" examples.

Training in a loop, both sides get stronger and stronger
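
The alternation between the two stages can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not how image GANs are built: the "real" data are 1-D numbers drawn from N(4, 1), the generator is a linear map a*z + b, the discriminator is a logistic regression, and each stage is a single gradient step instead of training to convergence. All function names (generator_step, discriminator_step, train) and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def generate(g, z):
    """Generator G: turn noise z into fake samples (assumed linear map)."""
    return g["a"] * z + g["b"]

def discriminate(d, x):
    """Discriminator D: estimated probability that x is real."""
    return sigmoid(d["w"] * x + d["c"])

def generator_step(d, g, z, lr=0.05):
    """Stage 1: D is fixed, G takes one gradient step to fool D."""
    p_fake = discriminate(d, generate(g, z))
    # Non-saturating generator loss: -mean(log D(G(z)))
    grad_a = np.mean(-(1.0 - p_fake) * d["w"] * z)
    grad_b = np.mean(-(1.0 - p_fake) * d["w"])
    return {"a": g["a"] - lr * grad_a, "b": g["b"] - lr * grad_b}

def discriminator_step(d, g, real, z, lr=0.05):
    """Stage 2: G is fixed, D takes one gradient step to catch fakes."""
    fake = generate(g, z)
    p_real, p_fake = discriminate(d, real), discriminate(d, fake)
    # Discriminator loss: -mean(log D(real)) - mean(log(1 - D(fake)))
    grad_w = np.mean(-(1.0 - p_real) * real) + np.mean(p_fake * fake)
    grad_c = np.mean(-(1.0 - p_real)) + np.mean(p_fake)
    return {"w": d["w"] - lr * grad_w, "c": d["c"] - lr * grad_c}

def train(steps=2000, batch=64, seed=0):
    """Repeat stage 1 and stage 2 in a loop."""
    rng = np.random.default_rng(seed)
    g = {"a": 1.0, "b": 0.0}   # generator parameters
    d = {"w": 0.0, "c": 0.0}   # discriminator parameters
    for _ in range(steps):
        g = generator_step(d, g, rng.normal(size=batch))
        real = rng.normal(4.0, 1.0, size=batch)   # assumed "real" data
        d = discriminator_step(d, g, real, rng.normal(size=batch))
    return g, d

if __name__ == "__main__":
    g, d = train()
    print("generator parameters:", g)
    print("discriminator parameters:", d)
```

Each step function returns new parameters for one network and leaves the other untouched, which makes the "fix one, train the other" structure explicit. Training of this kind can oscillate, so the sketch makes no claim about where the parameters end up.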

If you are interested in the detailed technical principles of GANs, see the following two articles:

"A Beginner's Guide to Generative Adversarial Networks (GANs), with Code"

"A Long-Form Explanation of How Generative Adversarial Networks (GANs) Work (20-Minute Read)"


Advantages and disadvantages of GANs

3 advantages

  1. Better modeling of the data distribution (sharper, clearer images).
  2. In theory, GANs can train any kind of generator network. Other frameworks require the generator network to have a specific functional form, such as a Gaussian output layer.
  3. No need for repeated Markov chain sampling, no inference during learning, and no complicated variational lower bound, which avoids the difficulty of approximating intractable probabilities.

2 disadvantages

  1. Hard to train and unstable. The generator and the discriminator need to stay well synchronized, but in practice D often converges while G diverges. Training D and G requires careful design.
  2. Mode collapse. During training the generator may miss modes of the data: it starts to degenerate and keeps producing the same sample points, and learning cannot continue.
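
Mode collapse can be checked for in a toy setting where the modes of the real data are known. The following is a minimal sketch, assuming 1-D data and known mode centers. The function mode_coverage, its radius parameter, and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mode_coverage(samples, mode_centers, radius=0.5):
    """Fraction of known modes with at least one sample within `radius`."""
    samples = np.asarray(samples, dtype=float).reshape(-1, 1)
    centers = np.asarray(mode_centers, dtype=float).reshape(1, -1)
    hit = np.abs(samples - centers) <= radius   # shape: (n_samples, n_modes)
    return hit.any(axis=0).mean()

# Assumed real data with three modes, at -4, 0 and 4.
centers = [-4.0, 0.0, 4.0]
healthy = [-4.1, 0.2, 3.9, -3.8, 0.1]     # samples spread over all modes
collapsed = [0.1, -0.1, 0.05, 0.0, 0.2]   # samples stuck on a single mode

print(mode_coverage(healthy, centers))
print(mode_coverage(collapsed, centers))
```

The healthy samples reach all three modes, while the collapsed samples reach only one of the three. A coverage well below 1 on data with known modes is one simple sign that the generator keeps producing the same points.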

Extended reading: "Why Are Generative Adversarial Networks So Hard to Train?" (this article requires a strong mathematical background).


10 typical GAN algorithms

There are hundreds of GAN algorithms, and research on GANs has grown exponentially. Hundreds of papers on adversarial networks are now published every month.

The following figure shows the number of papers published on GANs per month:

Papers on GANs are growing exponentially

If you are interested in GAN algorithms, you can find almost all of them in the "GANs Zoo". We have selected 10 of the more representative algorithms; technical readers can look at their papers and code.

Algorithm   Paper           Code
GAN         Paper address   Code address
DCGAN       Paper address   Code address
CGAN        Paper address   Code address
CycleGAN    Paper address   Code address
CoGAN       Paper address   Code address
ProGAN      Paper address   Code address
WGAN        Paper address   Code address
SAGAN       Paper address   Code address
BigGAN      Paper address   Code address

The list above is compiled from "Generative Adversarial Networks – The Story So Far". The original article gives a brief explanation of each algorithm; take a look if you are interested.


13 practical applications of GANs

GANs do not seem as intuitive as "speech recognition" or "text mining", but their applications have already entered our lives. Here are some practical applications of GANs.

Generating image datasets

Training artificial intelligence requires large datasets, and collecting and labeling all of them by hand is very expensive. GANs can generate some datasets automatically, providing low-cost training data.

Example of vector arithmetic on GAN-generated faces


Generating face photos

Generating face photos is an application most people are familiar with, but what the generated photos are used for needs careful thought, because such face photos still sit on the edge of the law.

Examples of the progress in GAN capabilities from 2014 to 2017


Generating photos and cartoon characters

GANs can generate not only human faces but also other kinds of photos, and even cartoon characters.

Photos generated by GANs

Cartoon characters generated by GANs


Image-to-image conversion

Simply put, this converts one kind of image into another, almost like applying a magic filter. For example:

  • Convert sketches into photos
  • Convert satellite photos into Google Maps images
  • Convert photos into oil paintings
  • Convert daytime scenes into night scenes

Example of using pix2pix to turn a sketch into a color photo

GAN applications: photo to oil painting, horse to zebra, winter to summer, photo to Google Maps


Text-to-image conversion

The 2016 paper "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" demonstrates the use of GANs, specifically StackGAN, to generate realistic photos from textual descriptions of simple objects such as birds and flowers.

Examples of textual descriptions of birds and the photos StackGAN generated from them


Semantic-image-to-photo conversion

The 2017 paper "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs" demonstrates the use of a conditional GAN to generate realistic images from a semantic image or sketch as input.

Example of a semantic image and the cityscape photo a GAN generated from it


Automatically generating human models

The 2017 paper "Pose Guided Person Image Generation" shows that images of human models can be generated automatically in new poses.

New model poses generated by a GAN


Photos to emojis

GANs can automatically generate matching emojis from face photos.

Examples of celebrity photos and the emojis a GAN generated from them


Photo editing

GANs can be used to generate specific edits of a photo, such as changing hair color, changing facial expression, or even changing gender.

Edit photos with IcGAN


Predicting appearance at different ages

Given a face photo, a GAN can predict what you will look like at different ages.

Examples of face photos at different apparent ages generated with GANs


Improving photo resolution to make photos sharper

Give a GAN a photo and it can generate a higher-resolution version, making the photo sharper.

A GAN increases the resolution of the original photo, making it sharper


Photo repair

If an area of a photo is damaged (for example, painted over or erased), a GAN can repair the area and restore it to its original state.

With part of the middle of the photo masked out, a GAN can repair it well


Automatically generating 3D models

Given multiple 2D images taken from different angles, a GAN can generate a 3D model.

From 2D images to a 3D chair model



Baidu Encyclopedia + Wikipedia

Baidu Encyclopedia version

Generative adversarial networks (GANs) are a deep learning model and one of the most promising methods in recent years for unsupervised learning on complex distributions. The model produces fairly good output through the mutual game-playing learning of (at least) two modules in the framework: the generative model and the discriminative model. In the original GAN theory, G and D are not required to be neural networks; they only need to be functions that can fit the corresponding generation and discrimination. In practice, however, deep neural networks are generally used as G and D. A good GAN application needs a good training method; otherwise the output may be unsatisfactory because of the freedom of the neural network model.


Wikipedia version

Generative adversarial networks (GANs) are a class of artificial intelligence algorithms for unsupervised machine learning, implemented by a system of two neural networks competing against each other in a zero-sum game framework. They were introduced by Ian Goodfellow and others in 2014. This technique can generate photographs that look at least superficially authentic to human observers, with many realistic characteristics (although in tests people can tell real from generated in many cases).


