If you don't know what an ML model is, take a look at this article.

Taking machine learning courses and reading articles about machine learning does not necessarily tell you which model to use. They give you an intuition for how these models work, which may still leave you unable to choose the right model for your problem.

At the beginning of my ML journey, when solving a problem I would simply try many models and keep the most effective one. I still do this, but I now follow some best practices for choosing a machine learning model, learned from experience, intuition, and colleagues. These practices make things easier, and they are what I have collected here.

I will tell you which machine learning model to use based on the nature of the problem, and I will try to explain some concepts along the way.

## Classification

First, suppose you have a classification problem: predicting the class of a given input.

Keep in mind the number of classes you will classify the input into, because some classifiers do not support multi-class prediction; they only support 2-class (binary) prediction.
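To make the binary-vs-multi-class point concrete, here is a minimal sketch (scikit-learn assumed; the dataset and wrapper choice are illustrative): a fundamentally binary classifier such as an SVM can be extended to a multi-class problem with a one-vs-rest wrapper.

```python
# Sketch: handling a multi-class problem with a binary-style classifier.
# scikit-learn assumed; iris is just a convenient 3-class toy dataset.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
print(len(set(y)))  # 3 classes, so a purely binary classifier needs a strategy

# One-vs-rest trains one binary SVM per class and picks the highest score.
# (SVC can also handle multi-class internally; the wrapper makes the
# strategy explicit.)
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:2]))
```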

### Accurate but slow

- Nonlinear SVM (for notes on using SVM kernels, see the end of this section)
- Random forest
- Neural networks (require a large number of data points)
- Gradient-boosted trees (similar to random forests, but more prone to overfitting)

### Fast

- Interpretable models: **Decision tree** and **Logistic regression**
- Non-interpretable models: **Linear SVM** and **Naive Bayes**
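As a sketch of the try-several-models workflow with the fast classifiers above (scikit-learn assumed; the dataset and hyperparameters are illustrative, not a recommendation):

```python
# Sketch: cross-validating the "fast" classifiers above on one dataset.
# scikit-learn assumed; swap in your own X, y.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic regression": LogisticRegression(max_iter=5000),
    "linear SVM": LinearSVC(dual=False),
    "naive Bayes": GaussianNB(),
}
# 5-fold cross-validated accuracy for each model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: accuracy {acc:.3f}")
```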

### Note: SVM kernel usage (from Andrew Ng's course)

- Use a linear kernel when the number of features is greater than the number of observations.
- When the number of observations is greater than the number of features, use a Gaussian kernel.
- If the number of observations is greater than 50k, speed may be an issue with a Gaussian kernel, so you may want to use a linear kernel instead.
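The heuristic above can be sketched as a small helper (the thresholds come directly from the rules above; the function name is my own):

```python
# Sketch of Andrew Ng's kernel heuristic: pick the kernel from the data shape.
# "observations" = rows, "features" = columns; thresholds follow the rules above.
def pick_svm_kernel(n_observations: int, n_features: int) -> str:
    if n_features > n_observations:
        return "linear"  # more features than observations
    if n_observations > 50_000:
        return "linear"  # Gaussian kernel too slow at this scale
    return "rbf"         # Gaussian kernel ("rbf" in scikit-learn)

print(pick_svm_kernel(200, 10_000))   # e.g. text data -> "linear"
print(pick_svm_kernel(5_000, 30))     # -> "rbf"
print(pick_svm_kernel(200_000, 30))   # -> "linear"
```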

## Regression

Suppose you have a regression problem: predicting a continuous value, such as the price of a house given its size, number of rooms, etc.

### Accurate but slow

- Random forest
- Neural networks (require a large number of data points)
- Gradient-boosted trees (similar to random forests, but more prone to overfitting)

### Fast

- Decision tree
- Linear regression
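As a sketch of the same comparison for regression (scikit-learn assumed; the diabetes dataset stands in for the house-price example, and linear regression stands in as the fast baseline, an assumption on my part):

```python
# Sketch: accurate-but-slow regressors vs a fast linear baseline.
# scikit-learn assumed; the diabetes toy dataset is illustrative.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
models = {
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
    "linear regression": LinearRegression(),
}
# 5-fold cross-validated R^2 for each model.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```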

## Clustering

Suppose you have a clustering problem: dividing the data into k groups based on their features, so that objects in the same group are similar to some degree.

**Hierarchical clustering** (also known as **hierarchical cluster analysis** or **HCA**) is a cluster analysis method that seeks to build a hierarchy of clusters. Hierarchical clustering strategies generally fall into two types:

- **Agglomerative**: a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- **Divisive**: a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
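A minimal sketch of the agglomerative (bottom-up) strategy, using SciPy (the two toy blobs are illustrative):

```python
# Sketch: agglomerative ("bottom-up") hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated blobs as a toy dataset (20 points each).
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

Z = linkage(X, method="ward")                     # bottom-up merge history
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(sorted(set(labels)))                        # -> [1, 2]
```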

**Non-hierarchical clustering:**

- DBSCAN (no need to specify k, the number of clusters)
- K-means
- Gaussian mixture models
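A sketch contrasting k-means (you supply k) with DBSCAN (the cluster count is discovered), scikit-learn assumed; the blobs and DBSCAN parameters are illustrative:

```python
# Sketch: k-means (k given up front) vs DBSCAN (k inferred from density).
# scikit-learn assumed; eps/min_samples below are illustrative choices.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(len(set(km.labels_)))  # 3 clusters, because we asked for 3

db = DBSCAN(eps=0.8, min_samples=5).fit(X)
# Label -1 marks noise points, not a cluster.
n_found = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print(n_found)               # DBSCAN discovers the cluster count itself
```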

If you are clustering **categorical data**, use **k-modes**.

## Dimensionality reduction

Use **Principal Component Analysis (PCA)**.

**PCA** can be thought of as fitting an *n*-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If an axis of the ellipsoid is small, the variance along that axis is also small, and by omitting that axis and its corresponding principal component from our representation of the dataset, we lose only a small amount of information.
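A minimal PCA sketch (scikit-learn assumed; the iris dataset is illustrative), showing how much variance the kept axes carry:

```python
# Sketch: PCA keeping only the high-variance axes of the fitted "ellipsoid".
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)      # keep the 2 largest axes
print(pca.explained_variance_ratio_)  # variance carried by each kept axis
# On iris the first two components carry ~98% of the variance,
# so dropping the other two axes loses little information.
print(pca.explained_variance_ratio_.sum())
X_2d = pca.transform(X)               # 4 features reduced to 2
print(X_2d.shape)                     # (150, 2)
```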

If you want to do **topic modeling** (described below), you can use **singular value decomposition** (**SVD**) or **latent Dirichlet allocation** (**LDA**); use **LDA** in the case of probabilistic topic modeling.

**Topic modeling** is a statistical technique for discovering the abstract "topics" that occur in a collection of documents. It is a commonly used text-mining tool for uncovering hidden semantic structures in a body of text.
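A minimal LDA topic-modeling sketch (scikit-learn assumed; the four-document corpus is a toy example):

```python
# Sketch: probabilistic topic modeling with LDA on a tiny toy corpus.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold shares and stocks",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row: how strongly a document belongs to each of the 2 topics.
doc_topics = lda.transform(counts)
print(doc_topics.shape)  # (4, 2)
```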

I hope choosing a model is easier for you now; I will update this article based on your feedback and on what I learn from further experimentation.

I will leave you with two great summaries.

This article was translated from towardsdatascience.
