In recent years, we have witnessed the birth of countless CNNs. These networks have become so deep that it is very difficult to visualize the entire model. We stop keeping track of them and treat them as black-box models.

Fine, maybe you don't. But if you are guilty of this too, then you have come to the right place! This article is a visualization of 10 common CNN architectures, hand-picked by yours truly. These illustrations provide a more compact view of the entire model, without having to scroll down a few times just to see the softmax layer. Apart from these images, I have also added some notes on how the architectures evolved over time: from 5 to 50 convolutional layers, from plain convolutional layers to modules, from 2-3 towers to 32 towers, from 7⨉7 to 5⨉5. But more on these later.

By 'common', I mean those models whose pre-trained weights are usually shared by deep learning libraries (such as TensorFlow, Keras and PyTorch) for users to use, as well as models that are usually taught in classes. Some of these models have shown success in competitions such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The 10 architectures discussed here and the years in which their papers were published.
The 6 architectures discussed here whose pre-trained weights are provided by Keras. Adapted from a table in the Keras documentation.

The motivation for writing this article is that there aren't many blogs and articles out there with these compact visualizations (if you know of any, please share them with me). So I decided to write one for our reference. For this purpose, I have read the papers and the code (mostly from TensorFlow and Keras) to come up with these visualizations.

Here, I want to add that the many CNN architectures we see in the wild are the result of many things: improved computer hardware, the ImageNet competition, solving specific tasks, new ideas and so on. Christian Szegedy, a researcher at Google, once mentioned that:

"[Most of this] progress is not just the result of more powerful hardware, larger datasets and bigger models, but mainly a consequence of new ideas, algorithms and improved network architectures." (Szegedy et al., 2014)

Now let's move on to discuss these beasts and see how the network architecture improves over time!

A note on the visualizations
Note that I have excluded information such as the number of convolution filters, padding, stride, loss, and flattening operations from the illustrations.

Content (sorted by publication year)

  1. LeNet-5
  2. AlexNet
  3. VGG-16
  4. Inception-v1
  5. Inception-v3
  6. ResNet-50
  7. Xception
  8. Inception-v4
  9. Inception-ResNets
  10. ResNeXt-50


1. LeNet-5 (1998)

Figure 1: LeNet-5 architecture, based on their paper.

LeNet-5 is one of the simplest architectures. It has 2 convolutional layers and 3 fully connected layers (hence the "5": it is very common for the names of neural networks to be derived from the number of convolutional and fully connected layers they have). The average-pooling layer as we know it now was called a subsampling layer, and it had trainable weights (which is not the current practice when designing CNNs). This architecture has about 60,000 parameters.
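To see where a figure of roughly 60,000 comes from, here is a minimal sketch that counts the weights and biases layer by layer. It assumes the simplified LeNet-5 that is common in modern re-implementations (a 32×32 grayscale input, 5×5 kernels, fully connected feature maps and pooling without weights), so the total differs slightly from the original network, which used a sparse connection table and trainable subsampling. The helper names are my own.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel for a square convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit for a fully connected layer."""
    return n_in * n_out + n_out


# Assumption: simplified LeNet-5 with weight-free 2x2 pooling after each convolution.
# 32x32 -> conv 5x5 -> 28x28 -> pool -> 14x14 -> conv 5x5 -> 10x10 -> pool -> 5x5
lenet5_layers = {
    "conv1": conv_params(5, 1, 6),
    "conv2": conv_params(5, 6, 16),
    "fc1": dense_params(16 * 5 * 5, 120),
    "fc2": dense_params(120, 84),
    "fc3": dense_params(84, 10),
}

lenet5_total = sum(lenet5_layers.values())
for name, count in lenet5_layers.items():
    print(f"{name}: {count}")
print("total parameters:", lenet5_total)
```

Most of the parameters sit in the first fully connected layer, a pattern that shows up again in AlexNet and VGG-16.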

⭐️ What's novel?

This architecture has become the standard "template": stacking convolution and pooling layers, and ending the network with one or more fully connected layers.


2. AlexNet (2012)

Figure 2: AlexNet architecture, based on their paper.

With 60M parameters, AlexNet has 8 layers: 5 convolutional and 3 fully connected. AlexNet just stacked a few more layers onto LeNet-5. At the time of publication, the authors pointed out that their architecture was "one of the largest convolutional neural networks to date on the subsets of ImageNet."

⭐️ What's novel?

  1. They were the first to implement rectified linear units (ReLUs) as activation functions.
  2. Overlapping pooling in CNNs.
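The two ideas above can be sketched in a few lines of plain Python. This is a one-dimensional toy, not AlexNet's actual layers: `relu` and `max_pool_1d` are illustrative helpers of my own, and the window of 3 with a stride of 2 mirrors the overlapping setting described in the AlexNet paper. Pooling overlaps whenever the stride is smaller than the window.

```python
def relu(values):
    """Rectified linear unit: negative activations are clipped to zero."""
    return [max(0, v) for v in values]


def max_pool_1d(values, window, stride):
    """Max pooling over a 1-D signal without padding."""
    last_start = len(values) - window
    return [max(values[i:i + window]) for i in range(0, last_start + 1, stride)]


signal = relu([1, 5, -2, 4, 7, 6, 0])

# Non-overlapping: each value belongs to at most one window.
non_overlapping = max_pool_1d(signal, window=2, stride=2)

# Overlapping: stride < window, so neighbouring windows share a value.
overlapping = max_pool_1d(signal, window=3, stride=2)

print("after ReLU:     ", signal)
print("non-overlapping:", non_overlapping)
print("overlapping:    ", overlapping)
```

In the overlapping case the value 7 is seen by two neighbouring windows, which is exactly what "overlapping" means here.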


3. VGG-16 (2014)

Figure 3: VGG-16 architecture, based on their paper.

By now, you will have noticed that CNNs are getting deeper and deeper. This is because the most straightforward way of improving the performance of deep neural networks is by increasing their size (Szegedy et al.). The folks at the Visual Geometry Group (VGG) invented VGG-16, which has 13 convolutional layers and 3 fully connected layers, carrying on the ReLU tradition from AlexNet. Again, this network just stacks more layers onto AlexNet. It has 138M parameters and takes up about 500MB of storage space 😱. They also designed a deeper variant, VGG-19.

⭐️ What's novel?

  1. As mentioned in their abstract, the contribution of the paper is the design of deeper networks (roughly twice as deep as AlexNet).
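The 138M and ~500MB figures can be checked with a little arithmetic. The sketch below assumes the standard 16-layer configuration (3×3 convolutions with biases, a 224×224 input that is reduced to 7×7×512 before the first fully connected layer, 1000 output classes) and 32-bit floats for storage. The variable names are my own.

```python
# (in_channels, out_channels) of the 13 convolutional layers, all with 3x3 kernels.
vgg16_convs = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (inputs, outputs) of the 3 fully connected layers.
vgg16_dense = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in vgg16_convs)
dense_total = sum(n_in * n_out + n_out for n_in, n_out in vgg16_dense)
vgg16_total = conv_total + dense_total

# Assumption: every parameter is stored as a 4-byte float.
size_mib = vgg16_total * 4 / 2**20

print("convolutional parameters:  ", conv_total)
print("fully connected parameters:", dense_total)
print("total parameters:          ", vgg16_total)
print(f"approximate size: {size_mib:.0f} MiB")
```

Note how the three fully connected layers account for the vast majority of the parameters, even though 13 of the 16 layers are convolutional.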



4. Inception-v1 (2014)

Figure 4: Inception-v1 architecture. This CNN has two auxiliary networks (which are discarded at inference time). The architecture is based on Figure 3 in the paper.

This 22-layer architecture with 5M parameters is called Inception-v1. Here, the Network In Network (see Appendix) approach is heavily used, as mentioned in the paper. This is done by means of 'Inception modules'. The design of an Inception module is a product of research on approximating sparse structures (read the paper to learn more!). Each module presents 3 ideas:

  1. Having parallel towers of convolutions with different filters, followed by concatenation, captures different features at 1×1, 3×3 and 5×5, thereby "clustering" them. This idea is motivated by Arora et al. in the paper Provable bounds for learning some deep representations, which suggests a layer-by-layer construction in which one should analyze the correlation statistics of the last layer and cluster them into groups of units with high correlation.
  2. 1×1 convolutions are used for dimensionality reduction to remove computational bottlenecks.
  3. 1×1 convolutions add nonlinearity within a convolution (based on the Network In Network paper).

The authors also introduced two auxiliary classifiers to encourage discrimination in the lower stages of the classifier, to increase the gradient signal that gets propagated back, and to provide additional regularization. The auxiliary networks (the branches connected to the auxiliary classifiers) are discarded at inference time.
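To make the second idea concrete (1×1 convolutions as dimensionality reduction), here is a rough multiplication count for a 5×5 convolution with and without a 1×1 reduction in front of it. The sizes (a 28×28 feature map, 192 input channels reduced to 16, then 32 output filters) are illustrative assumptions rather than values taken from this article, and the sketch assumes stride 1 with padding that preserves the spatial size.

```python
def conv_multiplications(height, width, kernel, in_ch, out_ch):
    """Multiplications for a stride-1 convolution whose output is height x width."""
    return height * width * kernel * kernel * in_ch * out_ch


H, W = 28, 28                          # assumed feature map size
in_ch, reduce_ch, out_ch = 192, 16, 32  # assumed channel counts

# 5x5 convolution applied directly to all input channels.
direct = conv_multiplications(H, W, 5, in_ch, out_ch)

# 1x1 convolution that first shrinks the channels, followed by the 5x5 convolution.
with_reduction = (
    conv_multiplications(H, W, 1, in_ch, reduce_ch)
    + conv_multiplications(H, W, 5, reduce_ch, out_ch)
)

print("direct 5x5:        ", direct)
print("1x1 then 5x5:      ", with_reduction)
print(f"reduction factor:   {direct / with_reduction:.1f}x")
```

Under these assumptions the reduced tower needs roughly a tenth of the multiplications, which is why the expensive 3×3 and 5×5 towers are preceded by 1×1 convolutions.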

It is worth noting that "the main hallmark of this architecture is the improved utilization of the computing resources inside the network."

The names of the modules (Stem and Inception) were not used for this version of Inception until its later versions, i.e. Inception-v4 and Inception-ResNets. I have added them here for easy comparison.

⭐️ What's novel?

  1. Building networks using dense modules/blocks. Instead of stacking convolutional layers, we stack modules or blocks, within which are convolutional layers. Hence the name Inception (with reference to the 2010 sci-fi movie Inception starring Leonardo DiCaprio).


  • Paper: Going Deeper with Convolutions
  • Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Google, University of Michigan, University of North Carolina
  • Published in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)


5. Inception-v3 (2015)

Figure 5: Inception-v3 architecture. This CNN has an auxiliary network (which is discarded at inference time). *Note: All convolutional layers are followed by batch normalization and ReLU activation. The architecture is based on their GitHub code.

Inception-v3 is the successor to Inception-v1, with 24M parameters. Wait, where is Inception-v2? Don't worry about it: it is an earlier prototype of v3, so it is very similar to v3 but not commonly used. When the authors came out with Inception-v2, they ran many experiments on it and recorded some successful tweaks. Inception-v3 is the network that incorporates these tweaks (tweaks to the optimizer and the loss function, and adding batch normalization to the auxiliary layers in the auxiliary network).

The motivation for Inception-v2 and Inception-v3 is to avoid representational bottlenecks (this means drastically reducing the input dimensions of the next layer) and to compute more efficiently by using factorization methods.

The names of the modules (Stem, Inception-A, Inception-B, etc.) were not used for this version of Inception until its later versions, i.e. Inception-v4 and Inception-ResNets. I have added them here for easy comparison.

⭐️ What's novel?

  1. Among the first designers to use batch normalization (not reflected in the diagram above for simplicity).

✨ What's improved from the previous version, Inception-v1?

  1. Factorizing n×n convolutions into asymmetric convolutions: 1×n and n×1 convolutions
  2. Factorizing 5×5 convolutions into two 3×3 convolution operations
  3. Replacing 7×7 with a series of 3×3 convolutions
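A quick way to see why these factorizations help is to count the weights each option needs while checking that the receptive field stays the same. The sketch below counts weights per input/output channel pair, assuming the channel count is held constant between the stacked layers and ignoring biases. The helper names are my own.

```python
def weights_per_channel_pair(kernel_shapes):
    """Total kernel weights for a stack of (height, width) kernels."""
    return sum(h * w for h, w in kernel_shapes)


def receptive_field(kernel_shapes):
    """Receptive field (height, width) of stacked stride-1 convolutions."""
    rf_h = 1 + sum(h - 1 for h, _ in kernel_shapes)
    rf_w = 1 + sum(w - 1 for _, w in kernel_shapes)
    return rf_h, rf_w


options = {
    "5x5": [(5, 5)],
    "two 3x3": [(3, 3), (3, 3)],
    "7x7": [(7, 7)],
    "three 3x3": [(3, 3), (3, 3), (3, 3)],
    "1x7 then 7x1": [(1, 7), (7, 1)],
}

for name, shapes in options.items():
    print(f"{name:>13}: {weights_per_channel_pair(shapes):2d} weights, "
          f"receptive field {receptive_field(shapes)}")
```

Two 3×3 convolutions cover the same 5×5 window with 18 weights instead of 25, and the asymmetric 1×7 plus 7×1 pair covers a 7×7 window with 14 weights instead of 49.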


6. ResNet-50 (2015)

Figure 6: ResNet-50 architecture, based on keras-team's GitHub code.

Yes, this is the answer to the question you saw at the top of the article.

From the past few CNNs, we have seen nothing but an increasing number of layers in the design, with better performance being achieved. But "with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly." The folks at Microsoft Research addressed this problem with ResNet, using skip connections (also known as shortcut connections or residuals) while building deeper models.

ResNet is one of the early adopters of batch normalization (the batch norm paper authored by Ioffe and Szegedy was submitted to ICML in 2015). Shown above is ResNet-50, with 26M parameters.

The basic building blocks of ResNets are the conv and identity blocks. Because they look alike, you might simplify ResNet-50 like this (don't quote me on this!):
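The skip connection itself is a very small idea: the block computes some function F(x) and adds the input x back before the final activation. Below is a toy sketch in plain Python that uses small dense maps as a stand-in for the convolutions and leaves out batch normalization, so it illustrates the principle rather than ResNet's actual block. All helper names are my own.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two linear maps with a ReLU in between."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) is zero, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

This is the intuition behind why stacking more of these blocks should not hurt: a block that learns nothing useful can fall back to the identity mapping.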

⭐️ What's novel?

  1. Popularized skip connections (they weren't the first to use skip connections).
  2. Designing even deeper CNNs (up to 152 layers) without compromising the model's generalization power
  3. Among the first to use batch normalization.


7. Xception (2016)

Figure 7: Xception architecture, based on keras-team's GitHub code. Depthwise separable convolutions are denoted by 'conv sep'.

Xception is an adaptation of Inception, where the Inception modules have been replaced with depthwise separable convolutions. It has roughly the same number of parameters as Inception-v3 (23M).

Xception takes the Inception hypothesis to an eXtreme (hence the name). What is the Inception hypothesis again? Thankfully, this is explicitly mentioned in the paper (thanks, François!).

  • First, cross-channel (or cross-feature map) correlations are captured by 1×1 convolutions.
  • Consequently, spatial correlations within each channel are captured by the regular 3×3 or 5×5 convolutions.

Taking this idea to the extreme means performing a 1×1 convolution across the channels, then performing a 3×3 convolution on each output channel separately. This is equivalent to replacing the Inception module with depthwise separable convolutions.

⭐️ What's novel?

  1. Introduced a CNN based entirely on depthwise separable convolution layers.
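To get a feel for what this separation buys, the sketch below compares the number of weights in a regular 3×3 convolution with a depthwise separable one. It uses the order commonly found in library implementations (a per-channel spatial convolution followed by a 1×1 convolution), ignores biases, and uses illustrative channel counts of my own choosing.

```python
def standard_conv_weights(kernel, in_ch, out_ch):
    """Every output channel looks at every input channel through a k x k filter."""
    return kernel * kernel * in_ch * out_ch


def separable_conv_weights(kernel, in_ch, out_ch):
    """Depthwise (spatial) step followed by a pointwise (1x1) step."""
    depthwise = kernel * kernel * in_ch   # one k x k filter per input channel
    pointwise = in_ch * out_ch            # 1x1 convolution that mixes channels
    # If the 1x1 step came first, as in the description above, the depthwise
    # term would be kernel * kernel * out_ch instead.
    return depthwise + pointwise


in_ch, out_ch = 128, 256  # assumed channel counts, for illustration only

standard = standard_conv_weights(3, in_ch, out_ch)
separable = separable_conv_weights(3, in_ch, out_ch)

print("standard 3x3 conv: ", standard)
print("separable 3x3 conv:", separable)
print(f"ratio: {standard / separable:.1f}x fewer weights")
```

The spatial and cross-channel correlations are handled by two cheap steps instead of one expensive one, which is the Inception hypothesis in its extreme form.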



8. Inception-v4 (2016)

Figure 8: Inception-v4 architecture. This CNN has an auxiliary network (which is discarded at inference time). *Note: All convolutional layers are followed by batch normalization and ReLU activation. The architecture is based on their GitHub code.

The folks from Google strike again with Inception-v4, with 43M parameters. Again, this is an improvement on Inception-v3. The main differences are the Stem group and some minor changes in the Inception-C module. The authors also "made uniform choices for the Inception blocks for each grid size." They also mentioned that residual connections can significantly improve training speed.

All in all, note that Inception-v4 works better because of the increased model size.

✨ What's improved from the previous version, Inception-v3?

  1. Change in the Stem module.
  2. Adding more Inception modules.
  3. Uniform choices of Inception-v3 modules, meaning the same number of filters is used for every module.


9. Inception-ResNet-V2 (2016)

Figure 9: Inception-ResNet-V2 architecture. *Note: All convolutional layers are followed by batch normalization and ReLU activation. The architecture is based on their GitHub code.

In the same paper as Inception-v4, the same authors also introduced Inception-ResNets, a family consisting of Inception-ResNet-v1 and Inception-ResNet-v2. The latter member of the family has 56M parameters.

✨ What's improved from the previous version, Inception-v3?

  1. Converting Inception modules to Residual Inception blocks.
  2. Adding more Inception modules.
  3. Adding a new type of Inception module (Inception-A) after the Stem module.


10. ResNeXt-50 (2017)

Figure 10: ResNeXt architecture, based on their paper.

If you are thinking about ResNets, yes, they are related. ResNeXt-50 has 25M parameters (ResNet-50 has 25.5M). What is different about ResNeXts is the addition of parallel towers/branches/paths within each module, indicated above by "total 32 towers".

⭐️ What's novel?

  1. Scaling up the number of parallel towers ("cardinality") within a module (well, I mean, this has already been explored by the Inception network...)
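The reason the parameter counts of ResNeXt-50 and ResNet-50 end up so close is that the 32 towers are narrow. Here is a sketch that counts the weights of a single bottleneck block both ways, with the towers expressed as a grouped 3×3 convolution. The sizes (256 input channels, a width of 64 for the ResNet block, 32 towers of width 4 for the ResNeXt block) follow the example template in the ResNeXt paper, not this article. Biases and batch normalization are ignored, and the helper names are my own.

```python
def grouped_conv_weights(kernel, in_ch, out_ch, groups=1):
    """Weights of a convolution split into `groups` independent towers."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return kernel * kernel * (in_ch // groups) * (out_ch // groups) * groups


def bottleneck_weights(channels, width, groups):
    """1x1 reduce -> 3x3 (possibly grouped) -> 1x1 expand."""
    return (
        grouped_conv_weights(1, channels, width)
        + grouped_conv_weights(3, width, width, groups)
        + grouped_conv_weights(1, width, channels)
    )


resnet_block = bottleneck_weights(channels=256, width=64, groups=1)
resnext_block = bottleneck_weights(channels=256, width=32 * 4, groups=32)

print("ResNet bottleneck: ", resnet_block)
print("ResNeXt bottleneck:", resnext_block)
```

Both blocks come out at roughly 70k weights, so cardinality is increased without making the model noticeably bigger.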


Appendix: Network In Network (2014)

Recall that in a convolution, the value of a pixel is a linear combination of the weights in a filter and the current sliding window. The authors proposed that instead of this linear combination, we use a mini neural network with 1 hidden layer. This is what they coined Mlpconv. So what we are dealing with here is a (simple 1-hidden-layer) network in a (convolutional neural) network.

This idea of Mlpconv is likened to 1×1 convolutions, and became the main feature of the Inception architectures.

⭐️ What's novel?

  1. MLP convolutional layers, 1×1 convolutions
  2. Global average pooling (taking the average of each feature map and feeding the resulting vector into the softmax layer)
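To see why a 1×1 convolution behaves like a tiny fully connected layer, note that it applies the same weight matrix to the channel vector at every pixel. Here is a minimal sketch in plain Python, with a made-up 2×2 feature map and made-up weights. Stacking a couple of these with ReLUs in between gives something in the spirit of Mlpconv.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: H x W x C_in nested lists.
    weights:     C_out rows, each holding C_in values (no bias, for simplicity).
    """
    return [
        [[sum(w * c for w, c in zip(row, pixel)) for row in weights] for pixel in line]
        for line in feature_map
    ]


# A 2x2 feature map with 3 channels per pixel (illustrative values).
fmap = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
# Weights that map 3 input channels to 2 output channels.
weights = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
]

out = conv1x1(fmap, weights)
for line in out:
    print(line)
```

The spatial size stays at 2×2 while the channel count drops from 3 to 2, which is the dimensionality reduction that the Inception modules rely on.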
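Global average pooling, as described in the second point, can be sketched as follows: each feature map collapses to a single number, and the resulting vector goes straight into a softmax. The feature map values are made up for illustration, and the helper names are my own.

```python
import math


def global_average_pool(feature_maps):
    """feature_maps: list of C maps, each an H x W nested list. Returns C averages."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Two 2x2 feature maps, one per class (illustrative values).
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]

pooled = global_average_pool(feature_maps)
probabilities = softmax(pooled)

print("pooled vector:", pooled)
print("probabilities:", probabilities)
```

Since there are no weights involved, this replaces the parameter-heavy fully connected layers seen at the end of earlier networks.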


  • Paper: Network In Network
  • Authors: Min Lin, Qiang Chen, Shuicheng Yan. National University of Singapore
  • arXiv preprint, 2013

Let's show them here again for easy reference:











Neural network visualization resources

Here are some resources to visualize your neural network:

Similar articles

CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more...

A Simple Guide to the Versions of the Inception Network


I have used the papers mentioned above for reference. Besides those, here are some other resources I used for this article:

Implementations of deep learning models from the Keras team

Lecture Notes on Convolutional Neural Network Architectures: from LeNet to ResNet

Review: NIN, Network In Network (Image Classification)

Did you find any errors in the visualizations? What else do you think I should include? Leave me a comment below!

This article was reposted from Towards Data Science.