This article is reproduced from the public account Qubit (original address).

What do you do if you have a lot of data but no standard answers?

Unsupervised learning is a rather confusing category of machine learning algorithms, and it is responsible for handling exactly this kind of data with no ground truth.

This article covers what unsupervised learning is, how it differs in essence from other machine learning algorithms, what difficulties arise when using it, and some recommended reading.

What is unsupervised learning?

The easiest way to understand it is to think of the algorithm as an exam. Each question on the paper has a corresponding answer, and your score depends on how close your answers are to the standard answers. But if the questions have no answers, how do you grade yourself?

Now carry this over to machine learning. Traditional data sets have labels (the equivalent of standard answers), and the logic is "X leads to Y". For example, suppose we want to know whether people with more followers on Twitter have higher incomes. The input is the number of followers, the output is the income, and we try to find the relationship between the two sets of data.

Each star is a data point, and machine learning draws a line that fits those points in order to explain the relationship between input and output. But in unsupervised learning, there is no output at all.
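For contrast, the supervised "draw a line" step can be sketched as a least-squares fit. This is only an illustration: the follower counts and incomes are invented toy numbers, and `fit_line` is a hypothetical helper, not something from the original article.

```python
# Hypothetical toy data, invented for illustration only.
followers = [100, 200, 300, 400, 500]   # input X
income = [12.0, 19.0, 31.0, 42.0, 48.0]  # output Y (the "standard answers")


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(followers, income)
print(f"income is roughly {slope:.3f} * followers + {intercept:.2f}")
```

The fit needs both columns. Remove `income` and there is nothing to fit against, which is exactly the unsupervised situation.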

What we have to do is analyze the input, the number of followers, with no income, that is, no Y. It is like having only the exam questions and no answers.

In fact, it is not necessarily that Y does not exist. Maybe we simply cannot obtain the income data. But that does not matter. What matters is that we do not need to draw a line between X and Y, and do not need to find the relationship between them.

So what is the aim of unsupervised learning? If there is only input and no output, what should we do?

Unsupervised learning

Clustering

Every industry needs to understand its users: who are they, and what prompts them to make a purchase decision?

Usually, users can be divided into groups according to certain criteria. These criteria can be as simple as age or gender, or as complex as a user persona, such as the purchase process. Unsupervised learning can help us complete this task automatically.

A clustering algorithm runs through our data and finds a few natural clusters. Taking users as an example, one group might be 30-year-old artists, and another might be multi-millionaires who keep dogs at home. We can choose the number of clusters ourselves, which lets us adjust the granularity of the groups.

There are several clustering methods available:

· K-Means clustering divides all data points into K mutually exclusive groups. The difficulty lies in choosing the value of K.

· Hierarchical clustering divides all data points into groups, and those groups into subgroups in the same way, forming a tree much like a family tree. For example, first group users by age, then subdivide each group according to other criteria.

· Probabilistic clustering groups all data points according to probabilities. K-Means is in fact a special form of it, namely the case where the probability is always 0 or 1. That is why this clustering method is also affectionately called "fuzzy K-Means".

In use, these methods do not differ in any essential way, and the code for each can be quite short.
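As a rough illustration of how short such code can be, here is a minimal K-Means sketch in plain Python. It is not the article's original code: the `kmeans` function, the toy points, and the choice to initialize with the first K points (made only so the result is reproducible) are all assumptions for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal K-Means: returns (labels, centers) for a list of coordinate tuples."""
    # Assumption: deterministic initialization from the first k points.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers stopped moving
        centers = new_centers
    return labels, centers


# Hypothetical toy data: two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

Changing `k` is how the granularity of the groups is adjusted. Real projects would more likely use a library implementation with a better initialization scheme.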

The output of any clustering algorithm is the full set of data points and the group each one belongs to. It is then up to us to judge what the output means, or what the algorithm has actually found. The beauty of data science is that output plus human interpretation produces value.

Data Compression

Over the past decade, the computing power and storage capacity of devices have grown a great deal. Even so, we still have good reason to keep data sets as small and as efficient as possible. That means letting the algorithm run only on the data it needs, without doing excess training.

Unsupervised learning can do this with a method called dimensionality reduction.

The "dimension" in dimensionality reduction is the number of columns in the data set. The idea behind the method is the same as in information theory: assume that much of the data in the set is redundant, so that a part of it can represent the whole data set.

In practical applications, we need to combine parts of the data set in some seemingly mysterious way so that they convey meaning. Here are the two dimensionality reduction methods we use most often:

· Principal component analysis (PCA) finds the linear combinations that capture most of the variation in the data set.
· Singular value decomposition (SVD) decomposes the data matrix into three smaller matrices.

These two methods, as well as other more complex dimensionality reduction methods, use concepts from linear algebra to decompose the matrix into a more digestible form that conveys the information more easily.
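The two methods above can be sketched together, since PCA is commonly computed through an SVD of the centered data. This is a minimal sketch assuming NumPy is available. The data set is synthetic, with a deliberately redundant third column, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 rows, 3 columns, where column 3 is almost
# the sum of columns 1 and 2 (that is, largely redundant).
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + 0.01 * rng.normal(size=200)
data = np.column_stack([base[:, 0], base[:, 1], third])

# PCA works on centered data.
centered = data - data.mean(axis=0)

# SVD: centered = U @ diag(S) @ Vt, the "three matrices".
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance captured by each principal component.
explained = S**2 / np.sum(S**2)

# Keep the top 2 components: each row of Vt is one linear combination of columns.
k = 2
reduced = centered @ Vt[:k].T   # shape (200, 2)
restored = reduced @ Vt[:k]     # back to 3 columns, approximately

print(explained)
print(reduced.shape)
```

Because the third column carries almost no new information, two components retain nearly all of the variance, which is the redundancy assumption in action.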

Dimensionality reduction can play a very important role in machine learning algorithms. Take images as an example: in computer vision, a single image is a huge data set and is laborious to train on. If the training data can be shrunk, the model can run faster. This is why PCA and SVD are common tools for image preprocessing.
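To give a feel for the image case, here is a hedged sketch of a low-rank approximation using a truncated SVD. It assumes NumPy. The 64x64 "image" is synthetic (a smooth gradient plus slight noise) rather than a real photo, and the rank `k = 5` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic grayscale "image": smooth gradient plus a little noise.
ramp = np.linspace(0.0, 1.0, 64)
image = np.outer(ramp, ramp) + 0.01 * rng.normal(size=(64, 64))

U, S, Vt = np.linalg.svd(image, full_matrices=False)

# Keep only the k largest singular values and their vectors.
k = 5
approx = (U[:, :k] * S[:k]) @ Vt[:k]

original_numbers = image.size             # 64 * 64 = 4096 values
compressed_numbers = k * (64 + 64 + 1)    # 5 * 129 = 645 values
relative_error = np.linalg.norm(image - approx) / np.linalg.norm(image)

print(original_numbers, compressed_numbers, relative_error)
```

How much a real image can be shrunk this way depends on its content. Smooth images compress well, while noisy or highly detailed ones need a larger `k`.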

Unsupervised deep learning

It is no surprise that unsupervised learning has extended its territory into neural networks and deep learning. The field is still very young, but it already has pioneers such as the autoencoder.

The logic behind the autoencoder is similar to that of data compression algorithms: use a subset to reflect the characteristics of the original data set. Like any neural network, an autoencoder uses weights to convert the input into the desired output. Here, though, input and output are not two different things: the network learns to reproduce its input by way of a lighter representation of it.

In computer vision, autoencoders are used in image recognition algorithms. They have now extended their reach to more areas, such as sound and speech recognition.
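The idea can be sketched with a tiny linear autoencoder trained by plain gradient descent. This is a toy under stated assumptions, not a practical model: it uses NumPy only, has no nonlinearity or bias terms, runs on synthetic data, and uses a learning rate and step count chosen by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples with 3 columns driven by only 2 factors.
latent = rng.normal(size=(200, 2))
mix = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
X = latent @ mix

hidden = 2  # size of the compact code (the bottleneck)
W_enc = 0.1 * rng.normal(size=(3, hidden))   # encoder weights
W_dec = 0.1 * rng.normal(size=(hidden, 3))   # decoder weights


def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error: the target is the input itself."""
    return np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1))


initial_loss = loss(X, W_enc, W_dec)
lr = 0.01
n = len(X)
for _ in range(2000):
    code = X @ W_enc                # compressed representation
    residual = code @ W_dec - X     # reconstruction minus input
    grad_dec = (2.0 / n) * code.T @ residual
    grad_enc = (2.0 / n) * X.T @ residual @ W_dec.T
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = loss(X, W_enc, W_dec)
code = X @ W_enc  # 2 numbers per sample instead of 3
print(initial_loss, final_loss, code.shape)
```

No labels appear anywhere: the training signal is how well the input can be rebuilt from the smaller code.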

What are the practical difficulties?

Beyond finding the right algorithms and hardware, unsupervised learning comes with a mysterious quality of its own: you cannot tell whether the task is finished.

In supervised learning, we define a set of criteria for making decisions about model tuning. Metrics such as precision and recall tell us how accurate the current model is, and we can then adjust the parameters to optimize it. If the score is low, we keep tuning.

However, the data in unsupervised learning is unlabeled, so it is hard to settle on a reasonable set of metrics. Take clustering as an example: how do you know whether the grouping that K-Means produced is any good (for instance, whether the value of K is suitable)? With no standard, we may need a bit of creativity.
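One heuristic sometimes used for this, which the article itself does not prescribe, is the silhouette coefficient. It compares how close each point is to its own group with how close it is to the nearest other group, using no labels at all. The sketch below is a plain-Python illustration. The `silhouette` function and the toy points are assumptions for the example, and the function expects at least two clusters.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient, in [-1, 1]; higher means tighter, better-separated groups."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, label in zip(points, labels) if label == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data and two candidate groupings of it.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
sensible = [0, 1, 0, 1, 0, 1]   # nearby points grouped together
arbitrary = [0, 0, 0, 1, 1, 1]  # groups that mix distant points

print(silhouette(points, sensible), silhouette(points, arbitrary))
```

A score like this can help compare different values of K, but it only measures geometric separation. Whether the groups mean anything still needs human judgment.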

"Can unsupervised learning be used in my work?" is a question people often ask. The answer depends on the specific problem. Taking user grouping as an example again, clustering is effective only when your users really do fall into natural clusters.

Although it carries some risk, the best way to test may be to put an unsupervised model into the real world and see what happens: compare an approach that uses clustering with one that does not, and see whether clustering yields more useful information.

Of course, researchers are also trying to write unsupervised learning algorithms that come with (relatively) objective criteria. So where are the examples?