A decision tree is a logically simple machine learning algorithm; it gets its name from its tree structure.
This article introduces the basic concepts of decision trees, the 3 steps of decision tree learning, 3 typical decision tree algorithms, and 10 advantages and disadvantages of decision trees.
What is a decision tree?
A decision tree is an algorithm for solving classification problems. For background on classification and regression problems, see 《Supervised learning's 2 tasks: regression and classification》.
The decision tree algorithm uses a tree structure and reaches its final classification through layered reasoning. A decision tree consists of the following elements:
- Root node: contains the complete set of samples
- Internal node: corresponding feature attribute test
- Leaf node: represents the result of the decision
During prediction, an attribute value is tested at each internal node of the tree; the test result determines which branch to enter, until a leaf node is reached and the classification result is obtained.
This is a supervised learning algorithm based on if-then-else rules, and these rules are learned from the training data rather than written by hand.
Decision trees are among the simplest machine learning algorithms. They are easy to implement, highly interpretable, fully in line with human intuitive reasoning, and widely applied.
The above is abstract, so consider a practical example. A bank uses a machine learning algorithm to decide whether to issue a loan to a customer. For this purpose, it examines two indicators: the customer's annual income and whether the customer owns property. Your leader asks you to implement this algorithm, and you realize that a simple chain of if-else judgments will do, and quickly complete the task.
First check the customer's annual income: if it is greater than 20 million, the loan can be issued; otherwise, continue. Then check whether the customer owns property: if so, the loan can be issued; otherwise, it cannot.
The decision tree for this example is shown below:
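As a minimal sketch, the two-step rule above can be written as hand-coded if-else logic (the threshold and feature names follow the example; the function name is illustrative):

```python
def approve_loan(annual_income, owns_property):
    """Return True if the loan should be granted, following the example's rules."""
    if annual_income > 20_000_000:   # root node: annual income test
        return True
    elif owns_property:              # internal node: property test
        return True
    else:                            # leaf node: reject
        return False

print(approve_loan(25_000_000, False))  # True  (high income)
print(approve_loan(100_000, True))      # True  (owns property)
print(approve_loan(100_000, False))     # False (neither)
```

A trained decision tree produces exactly this kind of rule chain, except that the split features and thresholds are learned from data rather than chosen by hand.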
3 steps for decision tree learning
Feature selection
Feature selection determines which features are used to make splits. In the training data set, each sample may have many attributes, and different attributes may matter more or less. The purpose of feature selection is therefore to screen out the features most correlated with the classification result, that is, the features with the strongest discriminative power.
The criterion commonly used for feature selection is information gain.
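Information gain can be computed directly from Shannon entropy. The following is a small self-contained sketch (function names are illustrative): the gain of a feature is the entropy of the labels minus the entropy remaining after splitting on that feature.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(D) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain(D, A) = H(D) - sum over values v of |D_v|/|D| * H(D_v)."""
    total = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

# Toy data: a perfectly informative feature vs. a useless one.
labels  = ["yes", "yes", "no", "no"]
good    = ["a", "a", "b", "b"]   # separates the classes exactly
useless = ["a", "b", "a", "b"]   # tells us nothing about the class
print(information_gain(good, labels))     # 1.0
print(information_gain(useless, labels))  # 0.0
```

A feature with strong classification ability yields a large gain; a feature independent of the label yields a gain near zero.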
Decision tree generation
After feature selection, generation starts from the root node: compute the information gain of every feature at that node, select the feature with the largest information gain as the node's splitting feature, and create a child node for each of that feature's values. The same procedure is then applied to each child node, until the information gain becomes small or no features remain.
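The recursive generation procedure just described can be sketched in an ID3 style as follows. This is a self-contained illustration, not a production implementation: samples are dicts of nominal attributes, and the helper names are my own.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def _gain(rows, labels, feature):
    """Information gain of splitting `rows` on `feature`."""
    n = len(labels)
    conditional = 0.0
    for v in set(r[feature] for r in rows):
        sub = [y for r, y in zip(rows, labels) if r[feature] == v]
        conditional += len(sub) / n * _entropy(sub)
    return _entropy(labels) - conditional

def build_tree(rows, labels, features, min_gain=1e-6):
    # Stop at a majority-class leaf when the node is pure,
    # no features remain, or the best gain is negligible.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: _gain(rows, labels, f))
    if _gain(rows, labels, best) < min_gain:
        return Counter(labels).most_common(1)[0][0]
    node = {best: {}}
    for v in set(r[best] for r in rows):
        keep = [i for i, r in enumerate(rows) if r[best] == v]
        node[best][v] = build_tree([rows[i] for i in keep],
                                   [labels[i] for i in keep],
                                   [f for f in features if f != best])
    return node

rows = [{"income": "high", "property": "no"},
        {"income": "low",  "property": "yes"},
        {"income": "low",  "property": "no"}]
labels = ["loan", "loan", "no-loan"]
print(build_tree(rows, labels, ["income", "property"]))
```

Each internal node of the resulting nested dict stores a splitting feature; each leaf stores a class label, mirroring the element list given earlier.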
Decision tree pruning
The main purpose of pruning is to combat "overfitting": it reduces the risk of overfitting by proactively removing some branches.
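As one concrete illustration, scikit-learn's `DecisionTreeClassifier` supports post-pruning through minimal cost-complexity pruning via its `ccp_alpha` parameter. This sketch assumes scikit-learn is installed, and the alpha value 0.02 is an arbitrary choice for demonstration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree grows until its leaves are pure (prone to overfitting).
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Cost-complexity pruning removes branches whose contribution is too small.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_tr, y_tr)

print(full.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree has fewer nodes; in practice `ccp_alpha` would be chosen by cross-validation rather than fixed by hand.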
3 typical decision tree algorithms
ID3 (Iterative Dichotomiser 3)
ID3 was the first decision tree algorithm proposed; it uses information gain to select features.
C4.5
C4.5 is an improved version of ID3. Instead of using information gain directly, it introduces the "information gain ratio" as its criterion for selecting features.
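The gain ratio divides the information gain by the split information, i.e., the entropy of the feature's own value distribution. A minimal self-contained sketch (function names are mine):

```python
import math
from collections import Counter

def _entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """C4.5's criterion: information gain / split information."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        sub = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(sub) / n * _entropy(sub)
    gain = _entropy(labels) - conditional
    split_info = _entropy(feature_values)  # penalises many-valued features
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
# An ID-like feature (unique value per sample) has maximal gain but also
# maximal split information, so its gain ratio stays modest.
print(gain_ratio(["1", "2", "3", "4"], labels))  # 0.5
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
```

This is precisely how the gain ratio counteracts information gain's bias toward attributes with many distinct values.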
CART (Classification and Regression Tree)
This algorithm can be used for classification as well as regression problems. The CART algorithm replaces the information entropy model with the Gini index.
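The Gini index measures impurity as the probability that two samples drawn from a node belong to different classes: Gini(D) = 1 - Σ p_k². A short sketch:

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))    # 0.5 (maximally mixed, 2 classes)
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
```

CART chooses the split that most reduces this index; unlike entropy, it needs no logarithms, which makes it slightly cheaper to compute.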
Advantages and disadvantages of decision trees
Advantages:
- Decision trees are easy to understand and interpret, can be analysed visually, and their rules are easy to extract;
- Both nominal and numerical data can be handled;
- Well suited to samples with missing attributes;
- Able to handle irrelevant features;
- Fast at classifying test data sets;
- Produce feasible and effective results on large data sources in a relatively short time.
Disadvantages:
- Prone to overfitting (random forests can greatly reduce overfitting);
- Tends to ignore correlations between attributes in the dataset;
- For data with imbalanced class sizes, different criteria lead to different attribute-selection biases when splitting: the information gain criterion prefers attributes with many values (typical of ID3), while the gain ratio criterion (used by C4.5) prefers attributes with fewer values. For this reason C4.5 does not split on the gain ratio directly but applies a heuristic rule. Any criterion based on information gain shares this drawback;
- When computing information gain, the ID3 algorithm is biased toward attributes with more values.