Understanding Unsupervised Learning in One Article

Unsupervised learning is one of the learning methods in machine learning. This article explains its basic concepts and describes the specific scenarios where unsupervised learning can be used.

Finally, it uses examples to illustrate the two main types of unsupervised learning, clustering and dimensionality reduction, along with two specific algorithms for each.


What is unsupervised learning?

Unsupervised learning is one of the training methods (learning methods) used in machine learning:

Unsupervised learning is a branch of machine learning

Unsupervised learning is easiest to understand in comparison with supervised learning:

  1. Supervised learning is a purposeful training method: you know what you will get. Unsupervised learning is a training method without a clear purpose: you cannot know in advance what the results will be.
  2. Supervised learning requires labeled data; unsupervised learning does not.
  3. Supervised learning can measure its results because it has clear goals; the results of unsupervised learning are almost impossible to quantify.

Supervised Learning vs Unsupervised Learning

A brief summary:

Unsupervised learning is a training method in machine learning. It is essentially a statistical method: a way of training that can discover underlying structure in unlabeled data.

It has three main characteristics:

  1. Unsupervised learning has no clear purpose
  2. Unsupervised learning does not need labeled data
  3. The effects of unsupervised learning are hard to quantify

This explanation is abstract, so let us look at some practical application scenarios. These real cases show where the value of unsupervised learning lies.


Use cases for unsupervised learning

Discover anomalous data with unsupervised learning

Case 1: Anomaly detection

Many illegal activities involve money laundering. The behavior of accounts engaged in money laundering differs from that of ordinary users. But how does it differ?

Analyzing this manually is costly and complicated. Instead, we can group users by their behavioral features, which makes it easier to find users with abnormal behavior, and then analyze in depth how their behavior differs and whether it amounts to money laundering.

With unsupervised learning, we can quickly group behaviors. Although we do not know what each group means, the grouping lets us quickly set aside normal users and analyze abnormal behavior in a more targeted way.
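As a rough illustration of "setting aside normal users", here is a minimal sketch. It assumes each user has already been given a cluster label by some clustering algorithm, and it assumes that abnormal behavior tends to end up in very small clusters. Neither assumption comes from the case above, and the function name and the 5% threshold are hypothetical.

```python
from collections import Counter


def flag_small_clusters(labels, max_share=0.05):
    """Return indices of users whose cluster holds at most `max_share` of all users.

    Assumption for illustration only: rare clusters are the ones worth a closer look.
    """
    sizes = Counter(labels)
    total = len(labels)
    rare = {label for label, size in sizes.items() if size / total <= max_share}
    return [i for i, label in enumerate(labels) if label in rare]
```

The flagged users are only candidates for in-depth analysis; the grouping itself does not say whether their behavior is illegal.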


Segment users with unsupervised learning

Case 2: User segmentation

This is very valuable for advertising platforms. Besides segmenting users by dimensions such as gender, age, and geographic location, we can also group users by their behavior.

Segmenting users along many dimensions makes ad placement more targeted and effective.


Make recommendations to users with unsupervised learning

Case 3: Recommendation system

Everyone has heard the "beer + diapers" story, an example of recommending related products based on users' buying behavior.

For example, when you shop on Taobao, Tmall, or JD.com, the platform recommends related products based on your browsing behavior. Some of these recommendations come from clustering, an unsupervised learning technique: the system finds users with similar purchase behavior and recommends the products that those users like most.
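The idea of "recommend what similar users like most" can be sketched in a few lines. This is only an illustration under stated assumptions: users are assumed to be clustered already (`labels` maps each user to a cluster), `purchases` maps each user to a set of items, and the function name is hypothetical. Real recommendation systems are far more involved.

```python
from collections import Counter


def recommend(user, labels, purchases, top_n=1):
    """Suggest items popular among users in the same cluster that `user` has not bought."""
    # Peers are the other users that fell into the same cluster.
    peers = [u for u in labels if labels[u] == labels[user] and u != user]
    # Count how many peers bought each item.
    counts = Counter(item for u in peers for item in purchases[u])
    owned = purchases[user]
    ranked = [item for item, _ in counts.most_common() if item not in owned]
    return ranked[:top_n]
```

For example, if a user who bought only diapers shares a cluster with two users who both bought beer, beer comes out as the top suggestion.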


Two common types of unsupervised learning algorithms

The two common types of algorithms are clustering and dimensionality reduction.


Clustering: In short, it is an automatic grouping method. In supervised learning (classification), you know exactly what each class is; in clustering, you do not know in advance what each resulting cluster means.

Dimensionality reduction: Dimensionality reduction looks a lot like compression. It reduces the complexity of the data while preserving as much of the relevant structure as possible.


"Clustering Algorithm" K-means clustering

K-means clustering automatically divides the data into K groups, where the number of groups K is chosen in advance.

The steps for K-means clustering are as follows:

  1. Define K centroids. Initially these centroids are random (there are also more effective algorithms for initializing them).
  2. Find the nearest centroid and update the cluster assignments. Each data point is assigned to the cluster whose centroid is closest to it. The measure of "closeness" here is a hyperparameter, usually the Euclidean distance.
  3. Move each centroid to the center of its cluster. The new position of each centroid is the mean position of all data points in its cluster.

Repeat steps 2 and 3 until the centroid positions no longer change significantly between iterations (that is, until the algorithm converges).
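The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes the points are tuples of numbers, uses plain random initialization and Euclidean distance, and the names `kmeans`, `max_iter`, `tol`, and `seed` are chosen here for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick K initial centroids at random from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once no centroid moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids
```

Calling `kmeans(points, 2)` returns one cluster label per point plus the final centroids. Because the initial centroids are random, a different `seed` can give a different result on data that is not clearly separated.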

The process is as follows:

K-means clustering process


"Clustering Algorithm" Hierarchical Clustering

Hierarchical clustering is more suitable if you do not know how many clusters the data should be divided into. It builds a multi-level nested grouping, similar to a tree structure.

Hierarchical clustering

The steps of hierarchical clustering are as follows:

  1. Start with N clusters, one for each data point.
  2. Merge the two clusters that are closest to each other into one. Now you have N-1 clusters.
  3. Recalculate the distances between the clusters.
  4. Repeat steps 2 and 3 until a single cluster containing all N data points remains.
  5. Choose the number of clusters by drawing a horizontal line across the tree; the branches it cuts are the clusters.
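These steps can be sketched as follows. The steps do not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between the closest pair of members), which is only one of several common choices. It also stops merging once `n_clusters` groups remain, which for single linkage should give the same groups as building the whole tree and cutting it at that level. The function names are hypothetical.

```python
import math


def cluster_distance(a, b, points):
    """Single linkage (an assumption): distance between the closest pair of members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, n_clusters):
    # Step 1: start with N clusters, one per data point (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []  # record of (cluster_a, cluster_b, distance), i.e. the tree
    while len(clusters) > n_clusters:
        # Steps 2 and 3: recompute all pairwise cluster distances and
        # merge the closest pair.
        pairs = [
            (cluster_distance(clusters[a], clusters[b], points), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        dist, a, b = min(pairs)
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters, merges
```

Recomputing every pairwise distance in each round is simple but slow, so this version is only suitable for small data sets.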


"Dimensionality Reduction Algorithm" Principal Component Analysis-PCA

Principal component analysis transforms many indicators into a few composite indicators.

Principal component analysis is often used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. This is done by keeping the low-order principal components and ignoring the higher-order ones. The low-order components often retain the most important aspects of the data.

Transformation steps:

  1. Compute the covariance matrix S (this is non-standardized PCA; standardized PCA computes the correlation coefficient matrix C instead).
  2. Compute the eigenvectors e1, e2, ..., eN and the corresponding eigenvalues of the covariance matrix S (or C), indexed by t = 1, 2, ..., N.
  3. Project the data onto the space spanned by the eigenvectors, using the principal component analysis formula, where the BV value is the value of the corresponding dimension in the original sample.
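The three steps can be sketched with NumPy. This is a minimal illustration of the covariance-matrix variant, assuming the rows of `X` are samples and the columns are features; the function name `pca` and its return values are chosen here for the example and do not reproduce the article's formula.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top `n_components` principal components."""
    X_centered = X - X.mean(axis=0)
    # Step 1: covariance matrix S of the features (columns).
    S = np.cov(X_centered, rowvar=False)
    # Step 2: eigenvalues and eigenvectors of the symmetric matrix S.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    # eigh returns eigenvalues in ascending order; reorder them descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Step 3: project onto the space spanned by the leading eigenvectors.
    components = eigenvectors[:, :n_components]
    return X_centered @ components, eigenvalues, components
```

Each eigenvalue is the variance of the data along its eigenvector, so keeping the components with the largest eigenvalues keeps the directions that contribute most to the variance. Dividing each centered column by its standard deviation before step 1 would give the correlation-matrix variant instead.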


"Dimensionality Reduction Algorithm" Singular Value Decomposition – SVD

Singular value decomposition is an important matrix factorization in linear algebra. It generalizes eigendecomposition to arbitrary matrices and has important applications in signal processing and statistics.

To learn more about singular value decomposition, see Wikipedia.
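As a rough sketch of how SVD can be used for dimensionality reduction, the example below keeps only the k largest singular values. It assumes NumPy and a matrix `A` whose rows are samples; the function name `truncated_svd` is hypothetical.

```python
import numpy as np


def truncated_svd(A, k):
    """Return the rank-k approximation of A and k-dimensional coordinates for its rows."""
    # A = U @ diag(s) @ Vt, with the singular values in s sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Each row of A expressed in k coordinates instead of A.shape[1].
    reduced = U[:, :k] * s[:k]
    return A_k, reduced
```

Unlike eigendecomposition, this works for any rectangular matrix. `A_k` has the same shape as `A`, while `reduced` is the compressed representation with k columns.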


Generative models and GAN

The simplest goal of unsupervised learning is to train an algorithm to generate its own data examples. The model should not simply reproduce the data it was trained on, however; that would be mere memorization.

It must build a model of the underlying class from the data: not a photo of one specific horse or rainbow, but the collection of all pictures of horses and rainbows; not a specific utterance from a specific speaker, but the general distribution of spoken utterances.

The guiding principle of generative models is that being able to construct a convincing data example is the strongest evidence of understanding it. As the physicist Richard Feynman said: "What I cannot create, I do not understand."

For images, the most successful generative model so far is the Generative Adversarial Network (GAN). It consists of two networks, a generator and a discriminator, which are responsible for forging pictures and for telling real pictures from fake ones, respectively.

GAN-generated image

The generator produces images with the aim of inducing the discriminator to believe they are real, while the discriminator is rewarded for spotting the fake pictures.

The images a GAN generates start out messy and random and are refined over many iterations into more realistic images, some of which cannot be distinguished from real photos. More recently, Nvidia's GauGAN can also generate pictures from user sketches.
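The opposing goals of the two networks can be written as two loss functions. The article gives no formulas, so the sketch below assumes the commonly used binary cross-entropy form with the non-saturating generator loss. Here `d_real` and `d_fake` stand for the discriminator's estimated probability that a real image and a generated image, respectively, are real; the function names are hypothetical, and the networks themselves are left out.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when the discriminator rates real images near 1 and generated ones near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Low when the discriminator is fooled into rating generated images near 1."""
    return -math.log(d_fake)
```

Training alternates between lowering the first loss by updating the discriminator and lowering the second by updating the generator, which is what drives the generated images to become more realistic over many iterations.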


Baidu Encyclopedia and Wikipedia

Baidu Encyclopedia version

A common problem in real life is the lack of sufficient prior knowledge, which makes it difficult to label categories manually or makes manual labeling too expensive. Naturally, we want computers to do this work for us, or at least to help. Solving pattern recognition problems based on training samples whose categories are unknown (unlabeled) is called unsupervised learning.



Wikipedia version

Unsupervised learning is a branch of machine learning that learns from test data that has not been labeled, classified, or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. Alternatives include supervised learning and reinforcement learning. A central application of unsupervised learning is in the field of density estimation in statistics, [1] although unsupervised learning encompasses many other domains that involve summarizing and explaining features of the data.



Further reading