In recent years, deep learning has revolutionized computer vision. With learning resources available everywhere, anyone can pick up the latest techniques in days (or even hours) and apply them to their own field. As deep learning becomes more common, an important question is how to apply it creatively in different fields.

Today, deep learning in computer vision has solved many problems such as image classification, object detection, and image segmentation. In these areas, deep neural networks deliver excellent performance.

Even if your data is not visual, you can still take advantage of these vision models, and especially the power of CNNs. All you have to do is transform your data from a non-visual domain into images, and then apply a model trained on images to your data. In theory, any data with local correlations can be processed by a convolutional network, so you may be surprised to find how well this approach works.

In this article, I will briefly introduce three cases showing how companies have creatively applied vision deep learning models to non-visual domains. In all three cases, the basic method is to convert the non-visual problem into an image classification problem and then solve it with a deep learning model.

Case 1: Oil Industry

Beam pumps are commonly used in the oil industry to extract oil or natural gas from underground. They are driven by an engine connected to a walking beam, which converts the engine's rotational motion into the vertical reciprocating motion of a sucker rod that draws oil to the surface.

A beam pump, also known as a pumping unit.


As a complex mechanical system, beam pumps are prone to failure. To aid in diagnosis, a dynamometer measuring the load on the rod is installed. The measurements are plotted as a dynamometer pump card, shown in the figure below, which records the load over the engine's rotation cycle.

Dynamometer card


When a beam pump fails, the shape of the dynamometer card changes. Expert technicians are usually brought in to examine the card, determine where the problem occurred, and propose a solution. This process is very time consuming, and only highly specialized people can diagnose problems effectively.

On the other hand, the process looks like a good candidate for full automation. Classical machine learning systems have been tried on this problem before, but the results were poor, with accuracy of only about 60%.

Baker Hughes, one of the many oilfield services companies, took an innovative approach and applied deep learning to this problem. They first converted the dynamometer cards into images and used them as input to a pre-trained ImageNet model. The results were very exciting: just fine-tuning a pre-trained image classification model on the new data instantly raised accuracy from 60% to 93%, and after further optimization of the model they pushed accuracy to 97%.
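Baker Hughes has not published training code, so the following is only a minimal sketch of the general recipe described above: take an ImageNet-pretrained backbone (ResNet50 here, an arbitrary choice), replace its classification head, and fine-tune on the card images. The directory `card_images/train` and the number of failure classes are made up for illustration.

```python
from tensorflow import keras

NUM_CLASSES = 10  # hypothetical number of failure modes

# Dynamometer-card images rendered to disk (placeholder path).
train_ds = keras.utils.image_dataset_from_directory(
    "card_images/train", image_size=(224, 224), batch_size=32)

# ImageNet-pretrained backbone, frozen for the first fine-tuning stage.
base = keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
x = keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)  # keep batch-norm layers in inference mode
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)  # train the new head first

# Then unfreeze the backbone and continue at a much lower learning rate.
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```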

An example of Baker Hughes using the system. The image on the left is the input; the image on the right shows a real-time classification of the failure mode. The whole system runs on a portable device, and the inference time is shown in the lower right corner.

Baker Hughes not only achieved higher accuracy than previous classical machine learning methods; beam pump technicians also no longer need to spend large amounts of time diagnosing problems, and repairs can begin immediately when a machine fails.

To find out more about this case, you can:

  • Read a paper describing this work: https://www.knepublishing.com/index.php/KnE-Engineering/article/download/3083/6587
  • Or watch the video: https://v.qq.com/x/page/h08318aglac.html

Case 2: Online Fraud Detection

Computer users have unique patterns and habits when using a computer: the way you move your mouse while browsing a web page or the way you type on your keyboard while composing an e-mail is unique to you.

In this particular case, Splunk solved the problem of classifying users based on how they use a computer mouse. If a system can uniquely identify users from their mouse usage patterns, it can be used for fraud detection. Imagine this situation: a fraudster steals someone's login name and password, then uses them to log in and shop at an online store. Because each person uses a computer mouse in a unique way, the system can easily detect the anomaly, prevent the fraudulent transaction, and notify the real account owner.

All mouse activity can be collected with special JavaScript code that records mouse events every 5-10 milliseconds, so the data for each user contains roughly 5,000-10,000 data points per page. This raises two challenges: first, each user generates a large amount of data; second, different users' datasets contain different numbers of data points. This is very inconvenient, because sequences of varying length usually require more complex deep learning architectures.

The solution is to convert each user's mouse activity on each web page into a single image. In each image, mouse movements are represented by a line whose color encodes the mouse speed, and left and right clicks are represented by green and red circles. Processing the raw data this way solves both problems: first, all images have the same size; second, image-based deep learning models can now be used on this data.

In each of the figures, the mouse motion is represented as a line, the color of the line represents the mouse speed; the left click is represented as a green circle, and the right click is represented as a red circle.


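Splunk's post does not include the rendering code, but the encoding described above is straightforward to approximate. The sketch below assumes mouse events arrive as `(timestamp_ms, x, y, button)` tuples, a made-up format for illustration, and draws a speed-colored polyline with clicks marked as green and red circles:

```python
import numpy as np
import matplotlib.pyplot as plt

def events_to_image(events, path="mouse.png"):
    """Render (timestamp_ms, x, y, button) mouse events to a fixed-size
    image; button is None, "left", or "right"."""
    t = np.array([e[0] for e in events], dtype=float)
    xy = np.array([(e[1], e[2]) for e in events], dtype=float)

    # Speed of each segment between consecutive samples (pixels per ms),
    # normalized to [0, 1] so it can index a colormap.
    dist = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    speed = dist / np.maximum(np.diff(t), 1e-6)
    speed = speed / max(speed.max(), 1e-6)

    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)  # 224x224 px
    for (x0, y0), (x1, y1), s in zip(xy[:-1], xy[1:], speed):
        ax.plot([x0, x1], [y0, y1], color=plt.cm.viridis(s), linewidth=1)
    for _, x, y, button in events:
        if button == "left":
            ax.scatter(x, y, s=25, facecolors="none", edgecolors="green")
        elif button == "right":
            ax.scatter(x, y, s=25, facecolors="none", edgecolors="red")
    ax.invert_yaxis()  # screen coordinates grow downward
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```

Every session rendered this way yields an image of identical dimensions, which is exactly what fixes the variable-length problem.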
Splunk built a deep learning system for user classification using TensorFlow + Keras, and they conducted two experiments:

Group classification of users of a financial services website: regular customers versus non-customers accessing similar pages. They used a relatively small training set containing only 2,000 images. After training for just 16 minutes on a modified architecture based on VGG16, the system recognized the two categories with an accuracy of more than 80%.

Individual user classification. The task is to predict, for a given user, whether activity comes from that user or from an impostor. Again the training set was very small, only 360 images; the model was again based on VGG16, but adjusted slightly to prevent overfitting on such a small dataset. After 3 minutes of training it reached an accuracy of about 78%. Considering how challenging the task is, this result is quite exciting.
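The post does not spell out exactly which adjustments were made, so the following is only a plausible sketch: freeze the pretrained VGG16 convolutional base and add dropout and weight decay to a small new head, the standard defenses against overfitting on a few hundred images.

```python
from tensorflow import keras

# ImageNet-pretrained VGG16 base, frozen so that the tiny 360-image
# dataset only has to train the small classification head.
base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = keras.Sequential([
    base,
    keras.layers.Dropout(0.5),  # regularization against overfitting
    keras.layers.Dense(64, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
    keras.layers.Dense(1, activation="sigmoid"),  # genuine user vs. impostor
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```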

For more information, read the full article on this system and the experiments: https://www.splunk.com/blog/2017/04/18/deep-learning-with-splunk-and-tensorflow-for-security-catching-the-fraudster-in-neural-networks-with-behavioral-biometrics.html

Case 3: Acoustic detection of whales

In this example, Google used convolutional neural networks to analyze audio recordings and detect humpback whales. This is very useful for humpback whale research, such as tracking the movements of individual whales, studying the properties of their songs, counting whales, and so on. What is interesting here is not the purpose of the research, but how the data was preprocessed so that a convolutional neural network could be used.

The way to convert audio data into an image is to use a spectrogram. A spectrogram is a visual representation of the frequencies in an audio signal as they change over time.

An example: a spectrogram of a male voice saying "nineteenth century".


After converting the acoustic data into spectrograms, Google researchers trained a model using the ResNet-50 architecture. The trained model achieved:

  • 90% precision: 90% of the audio clips classified as whale sounds are actually whale sounds;
  • 90% recall: given a recording that contains a whale sound, there is a 90% chance it is labeled as such.
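In standard terms, with TP, FP, and FN denoting true positives, false positives, and false negatives, these two metrics are defined as:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$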

This result is impressive and will greatly contribute to the study of whales.

Let's switch the focus from whales to what you can do when working with audio data. When creating a spectrogram, you can choose which frequencies to use, depending on the type of audio data. Human speech, humpback whale songs, and recordings of industrial equipment will likely call for different frequency ranges, because in each case the important information sits in different frequency bands, and you must rely on domain knowledge to choose the parameters. For example, if you are working with human speech data, your first choice should be Mel-frequency cepstral coefficients (MFCCs).

There is currently some great software for working with audio. Librosa (https://librosa.github.io/librosa/) is a free Python library for audio analysis that can generate spectrograms on the CPU. If you are developing in TensorFlow and want to compute spectrograms on the GPU, that is also possible (https://www.tensorflow.org/api_guides/python/contrib.signal#Computing_spectrograms).
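As a minimal sketch of that workflow (the file name `whale.wav` is a placeholder), computing a mel spectrogram, plus MFCCs for comparison, with librosa looks like this:

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

# Load an audio file; sr=None keeps the recording's native sample rate.
y, sr = librosa.load("whale.wav", sr=None)

# Mel spectrogram: frequency axis warped to the perceptual mel scale.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # convert power to decibels

# MFCCs, the usual representation for human speech.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Render the spectrogram as an image that a CNN could consume.
fig, ax = plt.subplots()
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
fig.savefig("spectrogram.png")
```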

For more information on how Google uses the humpback whale data, check out the Google AI blog post: https://ai.googleblog.com/2018/10/acoustic-detection-of-humpback-whales.html.

In summary, the general approach outlined in this article follows two steps: first find a way to convert your data into images, then use a pre-trained convolutional network or train one from scratch. The first step is harder than the second, because it requires you to think creatively about how your data can be turned into images. I hope the examples provided here help you solve your own problem.