"Building a deep learning model in a few minutes? Training takes hours! I don't even have a good enough machine." Countless people in data science say that they always try to avoid building depth on their own machines. Learning model.

In fact, you don't need to work for Google or another large technology company to work with deep learning datasets. You can build your own neural network from scratch in minutes, without renting Google's servers. It is no longer just a dream: Fast.ai students trained a model on the ImageNet dataset in just 18 minutes. This article will walk through a similar model-building process.

Deep learning is a very broad field, so in this article we will focus on solving an image classification project. We will also use a very simple deep learning architecture that still achieves a respectable accuracy.

You can treat the Python code in this article as a reference for building your own image classification models. Once you have a good grasp of the concepts, go ahead and try your hand at competitions!

What is image classification?

Observe the following picture:

You will immediately recognize it as a (luxury) car. Take a step back and analyze how you came to this conclusion: you were shown an image and you classified its category (a car, in this case). In short, that is all image classification is.

Image classification is the task of assigning a given image to one of n possible categories. Inspecting and classifying images manually is a very tedious process, and when we face a large number of images, say 10,000 or even 100,000, the task becomes almost impossible. How useful would it be if the entire process could be automated and the images quickly labeled with their corresponding classes?

Self-driving cars are a good example of image classification in the real world. To enable autonomous driving, we can build an image classification model that recognizes various objects on the road, such as cars, people, and other moving objects. We will see a few more use cases later in this article; there are many more applications beyond these.

Now that we have a handle on the subject, let's dive into how an image classification model is built, what its prerequisites are, and how to implement it in Python.

Structure of image data

In order to solve an image classification problem, our data needs to be in a specific format. We will see this in a couple of sections, but before then, keep these instructions in mind.

You need two folders, one for the training set and one for the test set. The training set folder contains a .csv file and an image folder:

  • The .csv file contains the names of all the training images and their corresponding labels
  • The image folder contains all the training images

The .csv file in the test set is different from the one in the training set: it contains the names of all the test images, but not their labels. Can you guess why? Our model will be trained on the images in the training set, and label predictions will be made on the images in the test set.

If your data is not in the format described above, you need to convert it accordingly (otherwise the prediction will be wrong and useless).
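For example, a layout consistent with the loading code used later in this article might look like the following (the file and folder names here are purely illustrative):

train.csv          # columns: id, label
train/             # folder containing the training images, named by id
    1.png
    2.png
    ...
test.csv           # column: id only (no labels)
test/              # folder containing the test images
    ...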

Breaking down the model-building process

Before delving into the Python code, let's take a moment to understand how image classification models are usually designed. We can roughly divide the process into 4 stages, each of which takes a certain share of the overall time:

  1. Loading and preprocessing the data - 30% of the time
  2. Defining the model architecture - 10% of the time
  3. Training the model - 50% of the time
  4. Evaluating performance - 10% of the time

Now let's explain each of these steps in more detail. This part is important, because no model gets built in one pass: you will need to come back after each iteration, fine-tune the steps, and run it again. A solid understanding of the basic concepts goes a long way toward speeding up the entire process.

Phase 1: Data loading and preprocessing

As far as deep learning models are concerned, data is gold. Your image classification model has a far better chance of performing well if there are a large number of images in the training set. Also, the shape of the data varies according to the architecture/framework we use.

It is therefore highly recommended that you read "The Basics of Image Processing in Python" to understand more about how preprocessing works with image data, but we don't need to go that far here. To understand how our model handles unseen data (before exposing it to the test set), we need to create a validation set. This is done by partitioning the training data.

In short, the model is trained on the training set and validated on the validation set. Once we are satisfied with its performance on the validation set, we can use it to make predictions on the test data.

Time required for this step: about 2-3 minutes.

Phase 2: Define the model architecture

This is another key step in our deep learning model-building process. We have to define what the model looks like, which requires answering questions such as:

  • How many convolutional layers are needed?
  • What is the activation function of each layer?
  • How many hidden units are there in each layer?

There are many more such questions, not listed here one by one. These are essentially the model's hyperparameters, and they play a significant role in determining how accurate the predictions will be.

How do we decide on these values? Good question! A good idea is to choose them based on existing research. Another is to keep trying values until you find the best fit, but this can be a very time-consuming process; a minimal sketch of that trial-and-error loop follows below.
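As an illustration, here is a minimal sketch that trains one small model per candidate value (the number of convolutional filters, in this case) and keeps the best. The data is randomly generated and purely illustrative; in practice you would use your real training and validation sets.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.utils import to_categorical

# dummy stand-ins for the real training/validation split
X_tr = np.random.rand(200, 28, 28, 1)
y_tr = to_categorical(np.random.randint(0, 10, 200), num_classes=10)
X_val = np.random.rand(50, 28, 28, 1)
y_val = to_categorical(np.random.randint(0, 10, 50), num_classes=10)

best_acc, best_filters = 0.0, None
for n_filters in [16, 32, 64]:  # candidate hyperparameter values
    model = Sequential()
    model.add(Conv2D(n_filters, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
    model.fit(X_tr, y_tr, epochs=1, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)  # accuracy on held-out data
    if acc > best_acc:
        best_acc, best_filters = acc, n_filters
print('best filter count:', best_filters)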

Time required for this step: about 1 minute.

Phase 3: Train the model

To train the model, we need:

  • Training images and their corresponding correct labels
  • Validation images and their corresponding correct labels (we use these labels only to validate the model, not during the training phase)

We also specify the number of epochs in this step. To start with, we will train the model for 10 epochs (you can change the number of epochs later).

Time required for this step: since training requires the model to learn structure from the data, we need about 5 minutes to complete this step.

It's time to make predictions!

Phase 4: Evaluate the performance of the model

Finally, we load the test data (images) and run the same preprocessing steps on it. We then predict the classes of these images with the trained model.

Time required for this step: no more than 1 minute.

Setting up the problem statement and understanding the data

We will take on a very cool challenge to understand image classification: build a model that classifies a given set of images according to the apparel (shirts, trousers, shoes, socks, etc.). This is actually a problem faced by many e-commerce retailers, which makes it an even more interesting computer vision problem.

This challenge is called "Identify the Apparels" and is one of the practice problems on the DataHack platform.

We have a total of 70,000 images (each 28 x 28 pixels): 60,000 in the training set and 10,000 in the test set. The training images are pre-labeled with the clothing type, across 10 classes in total. The test images, of course, have no labels. The challenge is to identify the type of clothing in all the test images.

We will build the model on Google Colab, since it provides a free GPU to train our model with.

Steps to build an image classification model

It's time to fire up those Python skills. We've finally arrived at the hands-on part!

  1. Set up Google Colab
  2. Import the libraries
  3. Load and preprocess the data (3 minutes)
  4. Create a validation set
  5. Define the model structure (1 minute)
  6. Train the model (5 minutes)
  7. Make predictions (1 minute)

Let's go step by step:

Step 1: Set up Google Colab

Because we are importing the data from a Google Drive link, we need to add a few lines of code to our Google Colab notebook. Create a new Python 3 notebook and write the following code block:

!pip install PyDrive

This will install PyDrive. Now we will import some necessary libraries:

import os

from pydrive.auth import GoogleAuth

from pydrive.drive import GoogleDrive

from google.colab import auth

from oauth2client.client import GoogleCredentials

Next we will create a drive variable to access Google Drive:

auth.authenticate_user()

gauth = GoogleAuth()

gauth.credentials = GoogleCredentials.get_application_default()

drive = GoogleDrive(gauth)

To download the dataset, we will use the ID of the uploaded file on Google Drive:

download = drive.CreateFile({'id': '1BZOv422XJvxFUnGh-0xVeSvgFgqVY45q'})

Replace "id" in the above code with the id of the file. Now we will download this file and unzip it:

download.GetContentFile('train_LbELtWX.zip')

!unzip train_LbELtWX.zip
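If you later want to load a file of your own, the id is the long token inside the Drive share link. A hypothetical example (the id and filename below are placeholders):

# share link format: https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
download = drive.CreateFile({'id': '<FILE_ID>'})
download.GetContentFile('your_file.zip')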

These code blocks must be run every time the notebook is started. 

Step 2: Import the libraries needed in the model building phase.

import keras

from keras.models import Sequential

from keras.layers import Dense, Dropout, Flatten

from keras.layers import Conv2D, MaxPooling2D

from keras.utils import to_categorical

from keras.preprocessing import image

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split


from tqdm import tqdm

Step 3: Recall the preprocessing steps we discussed earlier. We will repeat those steps here after loading the data.

train = pd.read_csv('train.csv')

Next, we will read all the training images, store them in a list, and finally convert the list into a numpy array.

# We have grayscale images, so while loading the images we will keep grayscale=True, if you have RGB images, you should set grayscale as False

train_image = []

for i in tqdm(range(train.shape[0])):

    img = image.load_img('train/'+train['id'][i].astype('str')+'.png', target_size=(28,28,1), grayscale=True)

    img = image.img_to_array(img)

    img = img/255

    train_image.append(img)

X = np.array(train_image)

Since this is a multi-class classification problem (10 classes), we will one-hot encode the target variable.

y=train['label'].values
y = to_categorical(y)
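To see what this encoding produces, here is a quick illustrative example (the label values are made up):

from keras.utils import to_categorical
import numpy as np
print(to_categorical(np.array([0, 2, 1]), num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]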

Step 4: Create a validation set from the training data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
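One optional refinement: scikit-learn's train_test_split accepts a stratify argument, which keeps the class proportions the same in both splits. A sketch using the raw labels from our csv:

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2, stratify=train['label'].values)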

Step 5: Define the model structure

We will create a simple architecture with two convolutional layers, one dense hidden layer, and an output layer.

model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=(28,28,1)))

model.add(Conv2D(64, (3, 3), activation='relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(128, activation='relu'))

model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))

Next, compile the created model.

model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
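Before training, it can be worth verifying the layer shapes and parameter counts:

model.summary()  # the Flatten layer should report 9216 features (12 x 12 x 64)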

Step 6: Train the model.

In this step, we train the model on the images in the training set and validate it using (you guessed it) the validation set.

model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
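Since matplotlib was imported earlier but not yet used, here is a sketch for plotting the training curves. It assumes you assign the return value of model.fit() above to a variable named history; note that the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones.

# history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key], label='train')
plt.plot(history.history['val_' + acc_key], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()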

Step 7: Make predictions!

We will first repeat the steps we performed on the training data: load the test images, then predict their classes, here using the model.predict_classes() function.

download = drive.CreateFile({'id': '1KuyWGFEpj7Fr2DgBsW8qsWvjqEzfoJBY'})

download.GetContentFile('test_ScVgIM0.zip')

!unzip test_ScVgIM0.zip

Import the test file:

test = pd.read_csv('test.csv')

Now, read in and store all the test images:

test_image = []

for i in tqdm(range(test.shape[0])):

    img = image.load_img('test/'+test['id'][i].astype('str')+'.png', target_size=(28,28,1), grayscale=True)

    img = image.img_to_array(img)

    img = img/255

    test_image.append(img)

test = np.array(test_image)

# making predictions

prediction = model.predict_classes(test)
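Note that predict_classes() exists on Sequential models only in older Keras releases; if your version has removed it, the equivalent is to take the argmax of the predicted probabilities:

prediction = np.argmax(model.predict(test), axis=1)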

We will also create a submission file to upload to the DataHack platform page (to see how our results fare on the leaderboard).

download = drive.CreateFile({'id': '1z4QXy7WravpSj-S4Cs9Fk8ZNaX-qh5HF'})

download.GetContentFile('sample_submission_I5njJSF.csv')

# creating submission file

sample = pd.read_csv('sample_submission_I5njJSF.csv')

sample['label'] = prediction

sample.to_csv('sample_cnn.csv', header=True, index=False)

Download this sample_cnn.csv file and upload it to the contest page to have your results generated and checked against the leaderboard. This will give you a baseline solution for getting started with any image classification problem!

You can try hyperparameter tuning and regularization techniques to further improve the performance of the model.

Take on another challenge

Let's test what we've learned on a different dataset. In this section we will tackle the "Identify the Digits" practice problem; go ahead and download its dataset. Before continuing, try to solve the problem yourself: you already have the tools, you just need to apply them! Come back here to check your results, or if you get stuck at any point.

In this challenge, we need to identify the digit in a given image. There are 70,000 images in total: 49,000 labeled images in the training set and 21,000 in the test set (the test images are unlabeled). We need to identify/predict the classes of these unlabeled images.

Ready to get started? Excellent! Create a new Python 3 notebook and run the following code:

# Setting up Colab

!pip install PyDrive

import os

from pydrive.auth import GoogleAuth

from pydrive.drive import GoogleDrive

from google.colab import auth

from oauth2client.client import GoogleCredentials

auth.authenticate_user()

gauth = GoogleAuth()

gauth.credentials = GoogleCredentials.get_application_default()

drive = GoogleDrive(gauth)

# Replace the id and filename in the below codes

download = drive.CreateFile({'id': '1ZCzHDAfwgLdQke_GNnHp_4OheRRtNPs-'})

download.GetContentFile('Train_UQcUa52.zip')

!unzip Train_UQcUa52.zip

# Importing libraries

import keras

from keras.models import Sequential

from keras.layers import Dense, Dropout, Flatten

from keras.layers import Conv2D, MaxPooling2D

from keras.utils import to_categorical

from keras.preprocessing import image

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split


from tqdm import tqdm

train = pd.read_csv('train.csv')

# Reading the training images

train_image = []

for i in tqdm(range(train.shape[0])):

    img = image.load_img('Images/train/'+train['filename'][i], target_size=(28,28,1), grayscale=True)

    img = image.img_to_array(img)

    img = img/255

    train_image.append(img)

X = np.array(train_image)

# Creating the target variable

y=train['label'].values

y = to_categorical(y)

# Creating validation set

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)

# Define the model structure

model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=(28,28,1)))

model.add(Conv2D(64, (3, 3), activation='relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(128, activation='relu'))

model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))

# Compile the model

model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])

# Training the model

model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

download = drive.CreateFile({'id': '1zHJR6yiI06ao-UAh_LXZQRIOzBO3sNDq'})

download.GetContentFile('Test_fCbTej3.csv')

test_file = pd.read_csv('Test_fCbTej3.csv')

test_image = []

for i in tqdm(range(test_file.shape[0])):

    img = image.load_img('Images/test/'+test_file['filename'][i], target_size=(28,28,1), grayscale=True)

    img = image.img_to_array(img)

    img = img/255

    test_image.append(img)

test = np.array(test_image)

prediction = model.predict_classes(test)

download = drive.CreateFile({'id': '1nRz5bD7ReGrdinpdFcHVIEyjqtPGPyHx'})

download.GetContentFile('Sample_Submission_lxuyBuB.csv')

sample = pd.read_csv('Sample_Submission_lxuyBuB.csv')

sample['filename'] = test_file['filename']

sample['label'] = prediction

sample.to_csv('sample.csv', header=True, index=False)

Submit this file on the practice problem page and you will get a pretty good accuracy number. It's a good start, but there is always room for improvement. Keep adjusting the hyperparameter values and see if you can improve on this basic model.
