This article is reproduced from the WeChat public account QbitAI (量子位). Original address

If you have a huge amount of data but **no labels**, what do you do?

**Unsupervised learning** is the branch of machine learning responsible for making sense of this "no ground truth" data, and it can be a confusing one.

This article covers what unsupervised learning is, how it **fundamentally differs** from other machine learning algorithms, what difficulties come up in practice, and where to start reading about it.

## What is unsupervised learning?

The easiest way to understand it is to think of an algorithm as taking an **exam**. **Every question on the paper has a standard answer**, and your score depends on how close your answers come to those standard answers. But if the questions have no answers at all, how do you grade yourself?

Now move this picture over to machine learning. A traditional data set comes with **labels** (the standard answers), and the logic is "**X leads to Y**". For example, suppose we want to know whether people with more Twitter followers have higher incomes. The input is the follower count, the output is the income, and we try to find the relationship between the two.
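As a sketch of that supervised setup, here is a least-squares line fit in plain Python; the follower and income numbers below are invented purely for illustration:

```python
# Fit income = slope * followers + intercept by ordinary least squares.
# The follower/income numbers are made up purely for illustration.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

followers = [100, 500, 1000, 5000, 10000]   # X: number of followers
income    = [30, 32, 35, 55, 80]            # Y: income in $1000s (made up)

slope, intercept = fit_line(followers, income)
print(f"income ≈ {slope:.4f} * followers + {intercept:.2f}")
```

This only works because we have both X and Y; the whole point of what follows is what to do when the Y column is missing.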

Each data point is a star on a chart, and machine learning draws a line through those stars to explain the relationship between input and output. But in unsupervised learning, **there is no such thing as an output**.

All we can analyze is the input, the follower count. There is no income, no Y. It's like holding the exam paper with no answer key.

In fact, it's not necessarily true that Y doesn't exist; perhaps we simply can't obtain the income data. But that doesn't matter. What matters is that you **don't need to draw a line between X and Y**, and **don't need to find the relationship between them**.

So what is the **goal** of unsupervised learning? If we only have inputs and no outputs, what can we do?

## What can unsupervised learning do?

### Clustering

Every industry needs to understand its users: who are they? What drives them to make a purchase?

Usually, users can be **divided into groups** according to certain criteria. These criteria can be as simple as age or gender, or as complex as a full user persona or a purchase journey. Unsupervised learning can complete this task for us **automatically**.

A clustering algorithm runs through our data and picks out a few **natural clusters**. Taking users as the example, one cluster might be thirty-year-old artists, another might be multi-millionaires with a dog at home. We can **choose the number of clusters** ourselves, which lets us adjust the **granularity** of the groups.

There are several clustering methods available:

· **K-Means clustering** divides all data points into K mutually exclusive groups. The hard part is **choosing the value of K**.

· **Hierarchical clustering** divides all data points into groups, then into **subgroups**, and so on, forming a **tree** like a family genealogy. For example, first group users by age, then subdivide each group by other criteria.

· **Probabilistic clustering** assigns all data points to groups by **probability**. **K-Means** is actually a special case of it, the one where every probability is either **0 or 1** - which is why this method is also affectionately called "fuzzy K-Means".

There is no essential difference in how you use these methods in practice; the code for any of them can be just a few lines.
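The original article showed its code as an image, which is missing here. As a stand-in, here is a minimal K-Means sketch written from scratch in plain Python; in practice you would more likely call a library implementation such as scikit-learn's `KMeans`:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-Means (Lloyd's algorithm): returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # pick k initial centers
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels

# Toy data with two obvious natural clusters.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

The output is exactly what the next paragraph describes: the data points, each with its assigned group.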

The output of any clustering algorithm is the set of **data points** together with their assigned **groups**. It is then up to us to judge what that output means, i.e. what the algorithm has actually found. The beauty of data science is that **output plus human interpretation** is what produces value.

### Data Compression

Over the past decade, the computing power and storage capacity of our devices have grown enormously. Even so, we still have reasons today to **keep data sets as small and efficient as possible**: let the algorithm run on only the data it needs, without wasteful training.

Unsupervised learning can do this through a technique called **dimensionality reduction**.

The "**dimensions**" here are the number of columns in the data set. The idea behind the method comes from information theory: assuming **much of the data is redundant**, a part of the data set can represent the whole.

In practice, this means combining parts of the data set in somewhat mysterious ways while still conveying its meaning. The **two most commonly used dimensionality reduction methods** are:

· **Principal component analysis** (PCA) finds the **linear combinations** that convey most of the variance in the data set.

· **Singular value decomposition** (SVD) factorizes the data matrix into **three smaller matrices**.

These two methods, along with more sophisticated dimensionality reduction techniques, use concepts from **linear algebra** to decompose a matrix into a more digestible form that still carries the information.
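As a small sketch of the SVD version of this idea (assuming NumPy is available): build a data matrix whose columns are mostly redundant, decompose it into the three smaller matrices, and keep only the top few singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
# A 100 x 20 data matrix that is secretly low-rank (rank 3) plus a bit of
# noise, i.e. most of its 20 columns are redundant.
basis = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 20))
X = basis @ mixing + 0.01 * rng.normal(size=(100, 20))

# SVD: X = U @ diag(s) @ Vt, the "three smaller matrices".
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top k singular values -> a compressed version of X.
k = 3
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(f"relative reconstruction error with k={k}: {err:.4f}")
```

Three columns' worth of information reconstructs the full 20-column matrix almost exactly, which is precisely the redundancy argument from the paragraph above.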

Dimensionality reduction can play a very important role in machine learning. Take **images** as an example: in computer vision, **an image is a huge data set**, and training on raw images is laborious. If you can shrink the training data, the model runs faster. That is also why PCA and SVD are common tools for image preprocessing.

## Unsupervised deep learning

It is no surprise that unsupervised learning has expanded its territory into neural networks and deep learning. The field is still very young, but it already has pioneers such as the **autoencoder**.

The logic behind the autoencoder is similar to that of the data compression algorithms above: find a representation that reflects the characteristics of the original data set. Like any neural network, an autoencoder uses **weights** to transform the input into the desired output. But here the output is not something different from the input - the output is simply a more compact representation of the input itself.
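As a toy sketch of that idea (assuming NumPy is available), here is a linear autoencoder that squeezes 10-dimensional inputs through a 2-dimensional bottleneck and learns by gradient descent to reproduce its own input:

```python
import numpy as np

rng = np.random.default_rng(1)
# Data that truly lives in 2 dimensions, embedded in 10.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 10))

# Encoder W1 (10 -> 2) and decoder W2 (2 -> 10); linear, for simplicity.
W1 = 0.1 * rng.normal(size=(10, 2))
W2 = 0.1 * rng.normal(size=(2, 10))

def loss(X, W1, W2):
    """Mean squared error between the input and its reconstruction."""
    R = X @ W1 @ W2 - X
    return (R ** 2).mean()

lr = 0.01
initial = loss(X, W1, W2)
for _ in range(500):
    H = X @ W1                   # code: compact representation of the input
    R = H @ W2 - X               # reconstruction error
    G = 2 * R / R.size           # gradient of the loss w.r.t. the output
    gW2 = H.T @ G                # backprop through the decoder
    gW1 = X.T @ (G @ W2.T)       # backprop through the encoder
    W1 -= lr * gW1
    W2 -= lr * gW2
final = loss(X, W1, W2)
print(f"reconstruction MSE: {initial:.4f} -> {final:.4f}")
```

Real autoencoders add nonlinear activations and more layers, but the shape of the idea is the same: the target output is the input itself, so no labels are needed.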

In computer vision, autoencoders are used in **image recognition** algorithms. They have since extended their reach into other areas such as sound and speech recognition.

## What are the practical difficulties?

Beyond finding the right algorithm and hardware, unsupervised learning comes with a mysterious temperament: **you never know whether the task is finished**.

In supervised learning, we set a group of **metrics** to guide decisions while debugging a model. Metrics such as precision and recall tell us how accurate the current model is, and we adjust its parameters to optimize it. If the score is low, we keep tuning.
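For reference, precision and recall are just simple ratios over the model's hits and misses; a minimal illustration with hypothetical counts:

```python
def precision_recall(tp, fp, fn):
    """Precision: of everything predicted positive, how much was right.
    Recall: of everything actually positive, how much was found."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.67
```

Computing these requires knowing the true labels - which is exactly what unsupervised learning does not have.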

With unsupervised learning, however, the data is unlabeled, and it is hard to define a reasonable set of metrics. Take clustering: how do you know whether K-Means produced a good grouping (for example, whether the value of K was appropriate)? There is no standard answer; we may need a bit of creativity.

"Can unsupervised learning be used in my work?" is a question people often ask. It depends on the specific problem. Take user grouping again: clustering is only effective when your users really do fall into **natural clusters**.

Despite the risks, the best test may be to put an unsupervised model into the **real world** and see what happens: **compare** the pipeline with clustering against the one without, and see whether clustering leads to more useful information.

Of course, researchers are also trying to design unsupervised learning algorithms that come with (relatively) objective evaluation criteria. So where can you find an example of that?

## Comment

But you never did finish explaining that Twitter-followers input/output example? So sad.