The Convolutional Neural Network (CNN) excels at image processing. It is inspired by the human visual nervous system.
CNN has two major strengths:
- It can effectively reduce large amounts of data to a much smaller amount
- It can effectively retain image features, in line with the principles of image processing
CNN is now widely used in face recognition, autonomous driving, photo-editing apps such as Meitu XiuXiu, security, and many other fields.
What problems does CNN solve?
Before the advent of CNN, images were a difficult problem for artificial intelligence, for two reasons:
- The amount of data an image requires is too large, leading to high cost and low efficiency.
- It is difficult to retain the original features during digitization, leading to low accuracy in image processing.
Here is a closer look at these two problems:
The amount of data to process is too large
An image is made up of pixels, and each pixel carries color information.
Today a picture is easily 1000×1000 pixels or more, and each pixel needs 3 parameters (R, G, B) to represent its color.
Processing a 1000×1000-pixel image therefore means handling 3 million values!
Processing this much data is very resource-intensive, and that is just one image of modest size!
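As a quick sanity check of the arithmetic above (my own sketch, not from the article's figures):

```python
# A 1000x1000 RGB image carries 3 values (R, G, B) per pixel.
width, height, channels = 1000, 1000, 3
n_values = width * height * channels
print(n_values)  # 3000000 raw input values for a single image
```

Scaling this to a first fully connected layer would multiply the count again by the number of neurons, which is exactly the cost CNN is designed to avoid.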
The first problem CNN solves is to "simplify the complex": it reduces a huge number of parameters to a small number, and only then does the processing.
More importantly, in most scenarios this does not affect the result. For example, shrinking a 1000-pixel image to 200 pixels does not stop the naked eye from telling whether the picture shows a cat or a dog, and the same is true for the machine.
Retain image features
The traditional way of digitizing a picture simplifies it, as in the process shown in the figure below:
If a position with a circle is recorded as 1 and one without as 0, then different positions of the same circle produce completely different data representations. Visually, however, the content (essence) of the image has not changed; only its position has.
So when we move an object within the image, the parameters derived in the traditional way change dramatically. This does not meet the requirements of image processing.
CNN solves this problem: it preserves the image's features in a way that mimics vision. Even when the image is flipped, rotated, or otherwise transformed, it can still recognize similar images effectively.
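A toy illustration of this position sensitivity, using a hypothetical 4×4 binary image (the grid and shape are my own example, not the article's figure):

```python
import numpy as np

# The same 2x2 "shape", drawn in two corners of a 4x4 binary image.
a = np.zeros((4, 4), dtype=int)
a[0:2, 0:2] = 1  # shape in the top-left corner
b = np.zeros((4, 4), dtype=int)
b[2:4, 2:4] = 1  # identical shape in the bottom-right corner

# Flattened pixel vectors: identical content, yet zero overlap.
print(int(np.dot(a.flatten(), b.flatten())))  # 0
```

To a model that only sees the raw flattened vectors, these two identical shapes look completely unrelated; a convolution kernel sliding over every position would respond to both.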
So how is a convolutional neural network implemented? Before exploring the principles of CNN, let's first look at how human vision works.
Human visual principle
Many research results in deep learning are inseparable from the study of the brain's cognitive principles, especially the principles of vision.
The 1981 Nobel Prize in Physiology or Medicine was awarded to David Hubel (a Canadian-born American neurobiologist), Torsten Wiesel, and Roger Sperry. The main contribution of the first two was discovering how the visual system processes information: the visual cortex is hierarchical.
Human vision works as follows: it starts with raw signal intake (light entering through the pupils), followed by preliminary processing (certain cells in the cerebral cortex detect edges and orientations), then abstraction (the brain determines that the object in front of the eyes is round), and then further abstraction (the brain determines that the object is a balloon). Below is an example of this process for human face recognition:
For different objects, human vision recognizes them through this same kind of hierarchical processing:
We can see that the lowest-level features are largely similar, namely various edges. Moving upward, more features specific to the object class can be extracted (wheels, eyes, torsos, and so on), and at the top these high-level features are finally combined into the corresponding whole image, enabling humans to accurately distinguish between different objects.
A natural question follows: can we imitate this characteristic of the human brain by building a multi-layer neural network, where the lower layers recognize primitive image features, successive layers combine lower-level features into higher-level ones, and the top layer finally performs the classification?
The answer is yes, and this is the source of inspiration for many deep learning algorithms, including CNN.
Convolutional Neural Network - the basic principles of CNN
A typical CNN consists of three parts:
- Convolution layer
- Pooling layer
- Fully connected layer
In brief:
The convolutional layer extracts local features from the image; the pooling layer drastically reduces the parameter magnitude (dimensionality reduction); and the fully connected layer, similar to a traditional neural network, outputs the desired result.
The explanation below is simplified for ease of understanding and omits many technical details. If you are interested in the detailed principles, you can watch the video "Convolutional neural network".
Convolution - extracting features
The convolutional layer works as shown in the figure below, using a convolution kernel to scan the entire picture:
We can understand this process as using a filter (the convolution kernel) to filter small regions of the image and obtain feature values for those regions.
In practice there are usually multiple convolution kernels. Each kernel can be thought of as representing one image pattern: if an image patch produces a large value when convolved with a kernel, the patch is considered very close to that pattern. If we design 6 kernels, we are effectively assuming the image contains 6 underlying texture patterns, that is, that the image can be drawn from these 6 basic patterns. Below are examples of 25 different convolution kernels:
Summary: the convolutional layer extracts local features from the image by filtering with convolution kernels, similar to the feature extraction in human vision described above.
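The scanning operation described above can be sketched in a few lines of Python with NumPy. The image, the kernel values, and the `conv2d` helper here are illustrative assumptions, not the article's actual figure:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation: slide the kernel over the image
    and record the weighted sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 6x6 image with a vertical edge (dark left half, bright right half)...
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# ...and a simple vertical-edge kernel (one of many possible filters).
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

fmap = conv2d(image, kernel)
print(fmap)  # a column of 2s marks where the edge sits; everything else is 0
```

The feature map lights up only where the kernel's pattern (here, a dark-to-bright transition) appears, regardless of where in the image that pattern is.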
Pooling layer (downsampling) - data dimensionality reduction to avoid overfitting
The pooling layer is simply downsampling, which can greatly reduce the dimensionality of the data. The process is as follows:
In the figure above, the original map is 20×20. We downsample it with a 10×10 window, finally producing a 2×2 feature map.
The reason is that even after convolution the image is still large (because the kernels are small), so downsampling is performed to reduce the data dimensionality.
Summary: The pooling layer can reduce the data dimension more effectively than the convolutional layer. This can not only greatly reduce the amount of computation, but also effectively avoid overfitting.
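Max pooling, one common form of this downsampling, can be sketched as follows. The 20×20 map and 10×10 window mirror the numbers in the text, while the helper function and the values themselves are illustrative:

```python
import numpy as np

def max_pool(fmap, size):
    """Non-overlapping max pooling: keep only the strongest
    response in each size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

# As in the text: a 20x20 map pooled with a 10x10 window gives 2x2.
fmap = np.arange(400, dtype=float).reshape(20, 20)
pooled = max_pool(fmap, 10)
print(pooled.shape)  # (2, 2)
```

Each output cell summarizes a 10×10 region by its single strongest activation, shrinking the data by a factor of 100 while keeping the most salient responses.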
Fully connected layer - output
This is the final step: the data processed by the convolutional and pooling layers is fed into the fully connected layer to obtain the final desired result.
Only after the dimensionality reduction done by the convolutional and pooling layers can the fully connected layer "run" at all; otherwise the amount of data would be too large, making computation costly and inefficient.
A typical CNN is not just the three-layer structure mentioned above but a multi-layer structure, such as LeNet-5, whose structure is shown below:
Convolutional layer → Pooling layer → Convolutional layer → Pooling layer → Convolutional layer → Fully connected layer
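A minimal end-to-end forward pass in this spirit (convolution → pooling → convolution → pooling → flatten → fully connected) can be sketched with NumPy. All sizes and weights below are toy assumptions with random, untrained parameters, not the real LeNet-5:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid cross-correlation of a 2-D input with a 2-D kernel."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def max_pool(x, s):
    """Non-overlapping s x s max pooling via reshape."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

# Toy 28x28 grayscale input and random (untrained) parameters.
x = rng.standard_normal((28, 28))
k1 = rng.standard_normal((5, 5))
k2 = rng.standard_normal((5, 5))

h = max_pool(np.maximum(conv2d(x, k1), 0), 2)  # 28 -> 24 -> 12
h = max_pool(np.maximum(conv2d(h, k2), 0), 2)  # 12 -> 8 -> 4
flat = h.flatten()                             # 16 features
W = rng.standard_normal((10, flat.size))
scores = W @ flat                              # one score per class
print(scores.shape)  # (10,)
```

Note how the spatial size shrinks at every stage, so by the time the fully connected layer runs it only sees 16 values instead of 784, which is exactly the "dimensionality reduction first" idea described above.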
Having covered the basic principles of CNN, let's look at its practical applications.
What are the practical applications of CNN?
CNN is very good at processing images, and since video is a sequence of superimposed images, it is also good at processing video content. Here are some of its more mature applications:
Image classification, retrieval
Image classification is a relatively basic application that can save a great deal of labor cost by classifying images effectively. For images in certain specific domains, classification accuracy can reach 95%+, which already makes it a highly usable application.
Typical scene: image search...
Target location detection
It can locate the target within an image and determine its position and size.
Typical scenarios: Autonomous driving, security, medical...
Target segmentation
Put simply, this is pixel-level classification.
It can distinguish foreground from background at the pixel level, and more advanced versions can identify targets and classify them.
Typical scene: Meitu Xiuxiu, video post-processing, image generation...
Face recognition
Face recognition has become a very popular application and is already widely used in many fields.
Typical scene: security, finance, life...
Skeletal recognition
Skeletal recognition means identifying the body's key skeletal points and tracking skeletal movements.
Typical scenarios: security, movies, image and video generation, games...
Today we introduced the value, basic principles, and application scenarios of CNN. To summarize:
The value of CNN:
- Ability to effectively reduce the amount of large data to a small amount of data (without affecting the results)
- Ability to preserve the characteristics of the image, similar to the human visual principle
The basic principle of CNN:
- Convolutional layer – mainly extracts local features from the image
- Pooling layer – mainly reduces the dimensionality of the data, which also effectively avoids overfitting
- Fully connected layer – outputs the desired result according to the task
Practical application of CNN:
- Image classification, retrieval
- Target location detection
- Target segmentation
- Face recognition
- Skeletal recognition