In the previous article, unsupervised learning was introduced as a set of statistical tools for situations where you have a set of features but no target. This tutorial therefore differs from the others, because we cannot make predictions.

Instead, we will use k-means clustering to perform color quantization on an image.

Then we will use PCA to reduce the dimensionality of a dataset so that it can be visualized.

The complete notebook is available here.

Fire up your Jupyter notebook and let's go!

Setup

Before we start any implementation, we will import a few libraries that will come in handy later:
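A minimal sketch of such a setup cell, assuming NumPy and Matplotlib are the general-purpose libraries used in the rest of the walkthrough (the exact list is an assumption):

```python
# Assumed general-purpose imports for the notebook
import numpy as np               # array manipulation
import matplotlib.pyplot as plt  # plotting and image display

# In a Jupyter notebook you may also want the `%matplotlib inline` magic
# so that figures render directly below each cell.
```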

Unlike in previous tutorials, we don't import any external datasets. Instead, we will use data that ships with the scikit-learn library.

Color quantization - k-means clustering

In short, color quantization is a technique that reduces the number of distinct colors used in an image. This is especially useful for compressing images while preserving their overall appearance.

First, we import the following libraries:
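A plausible version of these imports, assuming the walkthrough relies on scikit-learn's `KMeans`, its bundled sample images, and the `shuffle` utility for subsampling pixels (the last one is an assumption):

```python
from sklearn.cluster import KMeans               # the clustering model
from sklearn.datasets import load_sample_image   # bundled sample images
from sklearn.utils import shuffle                # assumed: used to subsample pixels
```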

Note that we imported load_sample_image, which gives access to a small sample dataset containing only two images. We will use one of them to perform color quantization.

Let's display the image we will use for this exercise:
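A short sketch of loading and displaying the image. It assumes the picture is `china.jpg`, one of the two sample images shipped with scikit-learn; the figure size and title are illustrative choices:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image

# Assumption: china.jpg is the sample image used in this exercise
china = load_sample_image("china.jpg")  # uint8 array, shape (height, width, 3)

plt.figure(figsize=(8, 6))
plt.imshow(china)
plt.axis("off")
plt.title("Original image")
plt.show()
```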

You should see:

The original image

Color quantization involves several steps.

First, we need to reshape the image into a 2D matrix, with one row per pixel and one column per color channel:
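The reshape could look roughly like this. The sketch assumes the `china.jpg` sample image and rescales pixel values to floats in [0, 1], a common convention that is not stated in the text:

```python
import numpy as np
from sklearn.datasets import load_sample_image

# Assumption: china.jpg is the sample image; convert uint8 values to floats in [0, 1]
china = np.array(load_sample_image("china.jpg"), dtype=np.float64) / 255

# Flatten the (height, width, channels) image into (n_pixels, channels)
h, w, d = china.shape
image_array = china.reshape(h * w, d)

print(image_array.shape)
```

Each row of `image_array` is now a single pixel described by its red, green and blue values, which is the layout k-means expects.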

Then, we train a k-means model to group the pixel colors into 64 clusters, so that the image ends up using only 64 distinct colors:
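One way to sketch the training step is shown below. Fitting on a random subsample of 1,000 pixels, the fixed random seed, and `n_init=10` are assumptions made here to keep the example fast and reproducible:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image
from sklearn.utils import shuffle

# Assumption: china.jpg is the sample image being quantized
china = np.array(load_sample_image("china.jpg"), dtype=np.float64) / 255
h, w, d = china.shape
image_array = china.reshape(h * w, d)

n_colors = 64

# Assumption: fit on a random subsample of 1,000 pixels to keep training fast
image_sample = shuffle(image_array, random_state=0, n_samples=1000)
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(image_sample)

# Assign every pixel of the full image to its nearest cluster center
labels = kmeans.predict(image_array)
```

The 64 rows of `kmeans.cluster_centers_` act as the reduced color palette, and `labels` records which palette entry each pixel maps to.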

Next, we build a helper function that reconstructs the image using the specified number of colors:
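A minimal sketch of such a helper follows. The name `recreate_image` and its signature are hypothetical; it simply looks up each pixel's cluster center and restores the image shape:

```python
import numpy as np


def recreate_image(codebook, labels, h, w):
    """Rebuild an (h, w, channels) image from a color palette and per-pixel labels.

    codebook: array of shape (n_colors, channels), e.g. the k-means cluster centers
    labels:   array of shape (h * w,), the palette index of each pixel
    """
    return codebook[labels].reshape(h, w, -1)


# Tiny illustration with a two-color palette and a 2x2 image
palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
tiny = recreate_image(palette, np.array([0, 1, 1, 0]), 2, 2)
print(tiny.shape)
```

With a fitted k-means model, the palette would be its `cluster_centers_` attribute and the labels would come from `predict` on the flattened image.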

Finally, we can visualize how the image looks with 64 colors and how it compares to the original:
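Pulling the steps together, a side-by-side comparison might be produced as sketched below. The `quantize` function is a hypothetical convenience wrapper, and the sample image, subsample size and seed are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image
from sklearn.utils import shuffle


def quantize(image, n_colors, n_samples=1000, seed=0):
    """Return a copy of `image` that uses at most `n_colors` distinct colors."""
    h, w, d = image.shape
    pixels = image.reshape(h * w, d)
    # Fit on a random subsample of pixels to keep training fast
    sample = shuffle(pixels, random_state=seed, n_samples=n_samples)
    kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(sample)
    # Replace every pixel with the center of the cluster it belongs to
    labels = kmeans.predict(pixels)
    return kmeans.cluster_centers_[labels].reshape(h, w, d)


# Assumption: china.jpg is the sample image being quantized
china = np.array(load_sample_image("china.jpg"), dtype=np.float64) / 255
quantized = quantize(china, n_colors=64)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
titles = ("Original image", "Quantized image (64 colors)")
for ax, img, title in zip(axes, (china, quantized), titles):
    ax.imshow(img)
    ax.set_title(title)
    ax.axis("off")
plt.show()
```

Changing `n_colors` in the call to `quantize` is enough to experiment with coarser or finer palettes.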

Original image with 96,615 colors
Reconstructed image with 64 colors

Of course, we can see some differences, but overall the image is well preserved! Feel free to experiment with different numbers of clusters. For example, if you specify 10 colors, you get the following:

Reconstructed image with 10 colors

Dimensionality reduction - PCA

In this exercise, we will use PCA to reduce the dimensionality of a dataset so that we can easily visualize it.

So let's import the iris dataset from scikit-learn:
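Loading the dataset might look like this. The variable names `X` and `y` are illustrative:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # 150 samples, 4 features
y = iris.target  # species label (0, 1 or 2) for each sample

print(iris.feature_names)
print(X.shape)
```

The labels in `y` are not used by PCA itself; they are only handy later for coloring the points in a plot.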

Now we will compute the first two principal components and look at the proportion of variance that each one explains:
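A minimal sketch of this step using scikit-learn's `PCA`. Fitting on the raw, unscaled features is an assumption, since the text does not mention standardization:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Keep only the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # projected data, shape (150, 2)

# Fraction of the total variance explained by each component
print(pca.explained_variance_ratio_)
```

The entries of `explained_variance_ratio_` are fractions between 0 and 1, so multiply them by 100 to read them as percentages.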

You should see that the first principal component explains 92% of the variance, and the second explains 5%. This means that just two components are enough to explain 97% of the variance in the dataset!

Now we can use these two components to plot the data in two dimensions:
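The 2D plot could be drawn along these lines. Coloring the points by species, the axis labels and the figure size are illustrative choices rather than part of the original walkthrough:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X_2d = PCA(n_components=2).fit_transform(iris.data)

fig, ax = plt.subplots(figsize=(8, 6))
# One scatter call per species so that each gets its own color and legend entry
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    ax.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name)
ax.set_xlabel("First principal component")
ax.set_ylabel("Second principal component")
ax.legend()
plt.show()
```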

You get:


That's it! You now know how to implement k-means and PCA! Again, keep in mind that unsupervised learning is difficult because there is no error metric for evaluating how well an algorithm performed. Moreover, these techniques are typically used for exploratory data analysis before moving on to supervised learning.

This article was reposted from Towards Data Science (original address).