
Feature Engineering

Understanding feature engineering in one article

Feature engineering is an important part of the machine learning workflow: it "translates" raw data into a form the model can understand.

This article introduces the basic concepts and importance of feature engineering, along with a 4-step method for evaluating its performance.

The importance of feature engineering

Everyone has heard two classic quotes from American computer scientist Peter Norvig:

A simple model based on a large amount of data is better than a complex model based on a small amount of data.

This sentence illustrates the importance of the amount of data.

More data beats clever algorithms, but better data beats more data.

This sentence is about the importance of feature engineering.

Therefore, extracting the greatest possible value from the data at hand is exactly what feature engineering sets out to do.

A 2016 survey found that data scientists spend 80% of their time acquiring, cleaning, and organizing data, and less than 20% actually building machine learning pipelines. The breakdown is as follows:

80% of the work of a data scientist is spent on acquiring, cleaning and organizing data

  • Building training sets: 3%
  • Cleaning and organizing data: 60%
  • Collecting data sets: 19%
  • Mining data for patterns: 9%
  • Refining algorithms: 5%
  • Other: 4%

PS: Cleaning and organizing data is also the task data scientists find least enjoyable. Interested readers can see the original article:

Data source: "Data Scientists Spend Most of Their Time Cleaning Data"

What is feature engineering

Let's first take a look at the position of feature engineering in the machine learning process:

The place of feature engineering in the machine learning process

As the figure above shows, feature engineering sits between the raw data and the features. Its task is to "translate" raw data into features.

Features: numerical representations of the raw data that a machine learning model can use directly.

Feature engineering is the process of transforming data into features that better represent the business logic, thereby improving the performance of machine learning.

This may not be easy to grasp at first. In fact, feature engineering is very similar to cooking:

We buy the ingredients, wash and cut them, and then cook them to our own taste to produce a delicious meal.

Feature engineering is very similar to cooking

In the above example:

The ingredients are like the raw data.

Washing, cutting, and cooking are like feature engineering.

The delicious meal at the end is the features.

Humans eat processed food because it is safer and tastier. A machine learning model is similar: raw data cannot be fed to the model directly; it must be cleaned, organized, and transformed before the model can "digest" the resulting features.
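Concretely, this "clean, organize, transform" step might look like the toy sketch below, which turns raw records into purely numeric features via one-hot encoding and min-max scaling. The field names and values are invented for illustration.

```python
# Hypothetical raw records (field names invented for this example).
raw_records = [
    {"city": "Beijing", "age": 25, "income": 8000},
    {"city": "Shanghai", "age": 32, "income": 12000},
    {"city": "Beijing", "age": 41, "income": 15000},
]

def engineer(records):
    """Turn raw records into numeric feature vectors."""
    cities = sorted({r["city"] for r in records})
    ages = [r["age"] for r in records]
    incomes = [r["income"] for r in records]

    def scale(xs):
        # Min-max scale a numeric column to [0, 1].
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    scaled_age, scaled_income = scale(ages), scale(incomes)
    features = []
    for i, r in enumerate(records):
        # Categorical column -> one-hot vector.
        one_hot = [1.0 if r["city"] == c else 0.0 for c in cities]
        features.append(one_hot + [scaled_age[i], scaled_income[i]])
    return features

print(engineer(raw_records))
```

Each record becomes a fixed-length vector of numbers: the form a model can consume directly.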

In addition to converting raw data into features, there are two important points that are easily overlooked:

Point 1: Better representation of business logic

Feature engineering can be said to be a mathematical expression of business logic.

The purpose of using machine learning is to solve a specific business problem. The same raw data can be turned into features in many ways; we should choose the ones that "better represent the business logic" and therefore solve the problem better, rather than simply the easiest ones.

Point 2: Improve machine learning performance

Performance means shorter time and lower cost. Even the same model will perform differently under different feature engineering, so we should choose the feature engineering that yields the best performance.

4 steps to evaluate feature engineering performance

Business evaluation of feature engineering is important, but the methods are varied and differ from business to business.

Here we introduce only performance evaluation, which is relatively general.

4 steps to evaluate feature engineering performance

  1. Before applying any feature engineering, obtain the baseline performance of the machine learning model
  2. Apply one or more feature engineering techniques
  3. For each technique, obtain a performance metric and compare it with the baseline
  4. If the improvement exceeds a chosen threshold, the feature engineering is considered beneficial and is applied to the machine learning pipeline

For example: if the baseline accuracy is 40% and a feature engineering technique raises it to 76%, the relative change is 90%:

(76% − 40%) / 40% = 90%
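The threshold check in step 4 and the arithmetic of this example can be sketched in a few lines; the 5% threshold below is an arbitrary illustrative choice, not a recommendation from the article.

```python
def relative_improvement(baseline, after):
    """Relative change of a performance metric versus the baseline."""
    return (after - baseline) / baseline

def is_beneficial(baseline, after, threshold=0.05):
    """Step 4: accept the feature engineering only if the relative
    improvement exceeds the chosen threshold."""
    return relative_improvement(baseline, after) > threshold

print(relative_improvement(0.40, 0.76))  # ≈ 0.9, the 90% from the example
print(is_beneficial(0.40, 0.76))
```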

Final Thoughts

Feature engineering is the most time-consuming work in the machine learning process, and it is also one of the most important tasks.

Feature engineering definition: the process of transforming data into features that better represent the business logic, thereby improving the performance of machine learning.

Two easily overlooked key points of feature engineering:

  1. Better representation of business logic
  2. Improve machine learning performance

The 4 steps of feature engineering performance evaluation:

  1. Before applying any feature engineering, obtain the baseline performance of the machine learning model
  2. Apply one or more feature engineering techniques
  3. For each technique, obtain a performance metric and compare it with the baseline
  4. If the improvement exceeds a chosen threshold, the feature engineering is considered beneficial and is applied to the machine learning pipeline

Capsule neural network

Background

Geoffrey Hinton is one of the pioneers of deep learning and a co-inventor of classic neural network algorithms such as backpropagation. He and his team proposed a new neural network built on structures called capsules, and published a dynamic routing algorithm between capsules for training capsule networks.

Research problem

Traditional CNNs have shortcomings (explained in detail below). To address them, Hinton proposed a network more effective for image processing, the capsule network, which retains the advantages of CNNs while preserving information that CNNs lose, such as relative position and angle, thereby improving recognition accuracy.

Research motivation

CNN flaws

CNNs focus on detecting important features in image pixels. Consider a simple face-detection task: a face consists of an oval face shape, two eyes, a nose, and a mouth. Under the CNN principle, a strong response is triggered as long as these objects are present, so the spatial relationships among them matter little.
As the figure below shows, the picture on the right is not a face, yet it contains all the objects a face needs. A CNN is therefore very likely to activate its "face" judgment from the objects alone and reach the wrong conclusion.
Re-examine how a CNN works: high-level features are weighted sums of combinations of low-level features. The activations of one layer are multiplied by the weights of the next layer's neurons, summed, and passed through a nonlinear activation function. In such an architecture, the positional relationships between high-level and low-level features become blurred. CNNs work around this by enlarging the receptive field of subsequent convolution kernels through max pooling (or strided convolutions), but in my view max pooling discards information, some of it important.

Inverse graphics

Computer graphics constructs a visual image from a hierarchical representation of geometric data. This structure takes the relative positions of objects into account: the relative position and orientation of geometric objects are represented by matrices, and rendering software accepts these representations as input and turns them into images on the screen.
Hinton was inspired by this and argued that the brain does exactly the opposite of rendering, which he called inverse graphics: from the visual information received by the eyes, the brain parses a hierarchical representation of the world and tries to match it against learned patterns stored in the brain. Notably, the brain's representation of objects does not depend on the viewing angle.
The question, then, is how to model these hierarchical relationships inside a neural network. In computer graphics, the relationships between three-dimensional objects can be represented by poses, whose essence is translation and rotation. Hinton proposed that preserving the hierarchical pose relationships between object parts is important for correctly classifying and recognizing objects. The capsule network incorporates these relative relationships, represented numerically as 4-dimensional pose matrices. With pose information, a model can easily recognize that what it sees is something it has seen before, merely from a different viewpoint. As the figure below shows, the human eye easily recognizes the Statue of Liberty from different angles, which is hard for a CNN; a capsule network that aggregates pose information can identify the Statue of Liberty even from an unfamiliar angle.

Capsule network advantages

  • Because the capsule network captures pose information, it can learn good representations from a small amount of data, a big improvement over CNNs. For example, to recognize handwritten digits the human brain needs dozens to hundreds of examples, while a CNN needs tens of thousands of samples to train well, which is plainly brute force.
  • Its way of "thinking" is closer to the human brain, and it better models the hierarchical relationships of knowledge represented inside the neural network. The intuition behind capsules is simple and elegant.

Cons of Capsule Network

  • Current implementations of capsule networks are much slower than other modern deep learning models (likely due to the iterative updates of the coupling coefficients on top of the convolution layers), and improving training efficiency is a major challenge.

Research content

What is a capsule

Hinton et al. describe the concept of a capsule in "Transforming Auto-encoders" as follows:

Instead of aiming for viewpoint invariance in the activities of "neurons" (which use a single scalar output to summarize the activities of a local pool of replicated feature detectors), artificial neural networks should use local "capsules" that perform some fairly complex internal computations on their inputs and then encapsulate the results into a small vector of highly informative outputs. Each capsule learns to recognize an implicitly defined visual entity over a limited domain of viewing conditions and deformations, and it outputs both the probability that the entity is present within its limited domain and a set of "instantiation parameters" that may include the precise pose, lighting conditions, and deformation of the visual entity relative to an implicitly defined canonical version of that entity. When the capsule is working properly, the probability of the visual entity being present is locally invariant: it does not change as the entity moves over the appearance manifold within the limited domain covered by the capsule. The instantiation parameters, however, are "equivariant": as the viewing conditions change and the entity moves over the appearance manifold, the instantiation parameters change by a corresponding amount, because they represent the intrinsic coordinates of the entity on the appearance manifold.

In simple terms, this can be understood as:
  • An artificial neuron outputs a single scalar. A convolutional network uses convolution kernels, and the results the same kernel computes over each region of a two-dimensional matrix are stacked together to form the output of the convolution layer.
  • Max pooling is used to approximate viewpoint invariance: because max pooling scans regions of the two-dimensional matrix and keeps only the largest value in each region, it satisfies the activity invariance we want (slightly adjusting the input leaves the output unchanged). In other words, if the object we want to detect shifts slightly in the input image, the model can still detect it.
  • But the pooling layer loses valuable information and ignores the relative spatial relationships between encoded features. That is why we should use capsules: a capsule encapsulates all the important information about the state of the feature it detects in a vector (whereas a neuron outputs a scalar).
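To make the "vector instead of scalar" idea concrete, here is a small NumPy sketch of the squash nonlinearity from the CapsNet paper, which compresses a capsule vector's length into [0, 1) so that length can encode existence probability, without changing the vector's direction. The sample vectors are arbitrary.

```python
import numpy as np

def squash(v, eps=1e-9):
    """Squash nonlinearity: ||out|| = ||v||^2 / (1 + ||v||^2),
    direction unchanged."""
    sq_norm = np.sum(v ** 2, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * v

long_vec = np.array([3.0, 4.0])     # length 5 -> squashed length near 1
short_vec = np.array([0.03, 0.04])  # length 0.05 -> squashed length near 0
print(np.linalg.norm(squash(long_vec)))   # ≈ 0.96
print(np.linalg.norm(squash(short_vec)))  # ≈ 0.0025
```

Long vectors keep a length close to 1 ("entity almost certainly present"), short vectors are driven toward 0.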
The comparison of capsules and artificial neurons is as follows:

Inter-capsule dynamic routing algorithm

A low-level capsule needs to decide how to send its output vector to the high-level capsules. It does so by multiplying its output vector by a scalar weight c_ij and sending the result to high-level capsule j as input. About the weights c_ij, note that:
  • the weights are non-negative scalars;
  • for each low-level capsule i, the weights c_ij sum to 1;
  • for each low-level capsule i, the number of weights equals the number of high-level capsules;
  • the weights are determined by an iterative dynamic routing algorithm.
A low-level capsule sends its output to the high-level capsules that "agree" with it. The algorithm's pseudocode is shown below.
The weight update can be understood intuitively from the figure below.
The outputs of two high-level capsules are shown as the purple vectors v_1 and v_2; the orange vectors represent input received from one low-level capsule, and the black vectors represent input received from the other low-level capsules. On the left, the purple output v_1 and the orange input û_1|1 point in opposite directions: they are dissimilar, their dot product is negative, and the routing coefficient c_11 decreases on update. On the right, the purple output v_2 and the orange input û_2|1 point in the same direction: they are similar, so the routing coefficient c_12 increases on update. Repeating this process over all high-level capsules and all their inputs yields a set of routing coefficients that best matches the outputs of the low-level capsules with the outputs of the high-level capsules.
How many routing iterations should be used? The paper tested a range of values on the MNIST and CIFAR datasets and reached the following conclusions:
  • more iterations tend to lead to overfitting;
  • 3 iterations are recommended in practice.
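The routing loop described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the paper's exact implementation; the capsule counts, dimensions, and random toy input are hypothetical.

```python
import numpy as np

def squash(v, eps=1e-9):
    # Shrink vector length into [0, 1) without changing direction.
    sq = np.sum(v ** 2, axis=-1, keepdims=True)
    return sq / (1.0 + sq) * v / np.sqrt(sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: prediction vectors, shape (num_low, num_high, dim)."""
    num_low, num_high, dim = u_hat.shape
    b = np.zeros((num_low, num_high))  # routing logits, start uniform
    for _ in range(num_iters):
        # Softmax over high-level capsules: the coefficients c_ij are
        # non-negative and sum to 1 for each low-level capsule i.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each high-level capsule j.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)  # high-level capsule outputs
        # "Agreement" (dot product of prediction and output) updates
        # the logits, so similar directions get larger coefficients.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))  # 6 low capsules, 2 high, dim 4
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)             # (2, 4) (6, 2)
```

Note how the update rule mirrors the intuition above: a negative dot product shrinks c_ij, a positive one grows it.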

Overall framework

CapsNet consists of two parts, an encoder and a decoder. The first 3 layers form the encoder and the last 3 the decoder:
  • First layer: convolution layer
  • Second layer: PrimaryCaps layer
  • Third layer: DigitCaps layer
  • Fourth layer: first fully connected layer
  • Fifth layer: second fully connected layer
  • Sixth layer: third fully connected layer

Encoder

The encoder takes a 28 × 28 MNIST digit image as input and encodes it as a 16-dimensional vector of instantiation parameters.

Convolution layer

  • Input: 28 × 28 image (monochrome)
  • Output: 20 × 20 × 256 tensor
  • Kernels: 256 kernels of size 9 × 9 × 1, stride 1
  • Activation function: ReLU
PrimaryCaps layer (32 capsules)
  • Input: 20 × 20 × 256 tensor
  • Output: 6 × 6 × 8 × 32 tensor (32 capsules in total)
  • Kernels: 8 kernels of size 9 × 9 × 256 per capsule, stride 2
DigitCaps layer (10 capsules)
  • Input: 6 × 6 × 8 × 32 tensor
  • Output: 16 × 10 matrix
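The shapes above follow from the standard output-size formula for a valid convolution; a quick sketch to verify (note the PrimaryCaps convolution must use stride 2 for the arithmetic to work out):

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

# Conv layer: 28x28 input, 9x9 kernels, stride 1 -> 20x20 (x256 channels)
assert conv_out(28, 9, 1) == 20
# PrimaryCaps: 20x20 input, 9x9 kernels, stride 2 -> 6x6 spatial grid,
# giving 6*6*32 = 1152 primary capsules of dimension 8
assert conv_out(20, 9, 2) == 6
num_primary_capsules = 6 * 6 * 32
print(num_primary_capsules)  # 1152
# DigitCaps: each of the 10 digit capsules outputs a 16-dimensional vector
```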

Loss function

Decoder

The decoder takes the 16-dimensional vector from the correct DigitCap and learns to decode it into an image of the digit (during training only the correct DigitCap's vector is used; the incorrect ones are ignored). The decoder acts as a regularizer: it takes the output of the correct DigitCap as input and reconstructs a 28 × 28 pixel image, with the loss being the Euclidean distance between the reconstructed image and the input image. This forces the capsules to learn features useful for reconstructing the original image; the closer the reconstruction is to the input, the better. An example of reconstructed images is shown below.
First fully connected layer
  • Input: 16 × 10 matrix
  • Output: 512-dimensional vector
Second fully connected layer
  • Input: 512-dimensional vector
  • Output: 1024-dimensional vector
Third fully connected layer
  • Input: 1024-dimensional vector
  • Output: 784-dimensional vector

Original link:
https://www.cnblogs.com/CZiFan/p/9803067.html

Artificial intelligence technology has great potential in epidemic prevention

Recently, the sudden outbreak of novel coronavirus pneumonia (NCP) caught people off guard. To fight the epidemic more effectively, researchers have brought many technologies to bear, and artificial intelligence has become one of the powerful weapons in this battle: it has assisted epidemic prevention and control, image analysis, auxiliary diagnosis, and vaccine development.