Creating an excellent machine learning system is an art.
There are many factors to consider when building an excellent machine learning system, but what often happens is that we as data scientists worry about only certain parts of the project.
Have we ever considered how we will deploy the model once we have one?
I have seen many machine learning projects, and many of them are doomed to fail because they have no production plan from the beginning.
This article is about the process required for a successful ML project, one that actually makes it into production.
1. Establish a baseline at the beginning
You don't actually need to build a model to get a baseline result.
Suppose we will use RMSE as the evaluation metric for a time series model. We evaluate the model on the test set and the RMSE is 3.64.
Is 3.64 a good RMSE? How do we know? We need a baseline RMSE.
This may come from the model currently used for the same task, or from some very simple heuristic. For time series models, a common baseline is the persistence forecast: simply predict the previous day's value for the next day.
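As a concrete sketch, the persistence (last-value) baseline can be evaluated in a few lines; the `daily_sales` numbers below are made up for illustration:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    assert len(y_true) == len(y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def persistence_forecast(series):
    """Naive baseline: predict each day as the previous day's value."""
    return series[:-1]

# Hypothetical daily series; the baseline is evaluated on days 1..n-1.
daily_sales = [112, 118, 115, 120, 123, 121, 126]
preds = persistence_forecast(daily_sales)
actuals = daily_sales[1:]
baseline_rmse = rmse(actuals, preds)
print(f"Persistence baseline RMSE: {baseline_rmse:.2f}")
```

Any candidate model should beat this number before we celebrate its test-set RMSE.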
Or take an image classification task: sample 1,000 labeled examples and have humans classify them. Human accuracy can be your baseline. If humans cannot achieve more than 70% prediction accuracy on a task, then once your model reaches a similar level, you can consider automating the process.
Lesson: Before you create a model, know what result you are aiming for. Setting unrealistic expectations will only disappoint you and your customers.
2. Continuous integration is the way forward
You have now created the model. Its performance is better than the baseline/current model on the local test dataset. Should we move forward?
We have two choices -
- Enter an endless loop of further improving our model.
- Test our model in a production environment, get more insight into what could go wrong, and then keep improving it through continuous integration.
I am a fan of the second approach. In his excellent third course, Structuring Machine Learning Projects, in the Coursera Deep Learning Specialization, Andrew Ng says:
“Don't start off trying to design and build the perfect system. Instead, build and train a basic system quickly, perhaps in just a few days. Even if the basic system is a far cry from the ‘best' system you can build, it is valuable to examine how the basic system functions: you will quickly find clues that show you the most promising directions in which to invest your time.”
Completion is more important than perfection.
Lesson: If your new model is better than the model currently in production, or better than the baseline, it makes no sense to wait; go live.
3. Your model may go into production
Is your model better than the baseline? It performs better on the local test dataset, but will it really work well in production?
To test the validity of the assumption that your model is better than the existing one, you can set up an A/B test. Some users (the test group) see predictions from your model, while other users (the control group) see predictions from the previous model.
In fact, this is the right way to deploy the model. You may find that your model is not as good as it seems.
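A minimal way to split traffic for such a test is to hash each user ID into a stable bucket, so a user always sees predictions from the same model; the `salt` value and user IDs below are hypothetical:

```python
import hashlib

def ab_bucket(user_id: str, test_fraction: float = 0.1,
              salt: str = "model-v2-rollout") -> str:
    """Deterministically assign a user to 'test' (new model) or 'control' (old model).

    Hashing (salt + user_id) keeps the assignment stable across requests,
    while the salt lets a new experiment reshuffle the buckets.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1) and compare to the fraction.
    fraction = int(digest[:8], 16) / 0x100000000
    return "test" if fraction < test_fraction else "control"

# Route each incoming request to the appropriate model.
for uid in ["alice", "bob", "carol"]:
    print(uid, "->", ab_bucket(uid))
```

Because the assignment is a pure function of the user ID, no extra state needs to be stored to keep the experiment consistent.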
Being wrong is not the mistake; the mistake is failing to anticipate that we will be wrong.
It's hard to point out the real reason why the model performs poorly in a production environment, but some of the reasons may be:
- The data arriving in real time may be very different from the training data.
- Your preprocessing pipeline may not be set up correctly.
- You may not be measuring performance correctly.
- Perhaps there is a bug in your implementation.
Lesson: Don't go straight into full production. A/B testing is always a good way forward. Keep a fallback ready (perhaps the old model). Something unpredictable can always break.
4. Your model may never even make it to production
Suppose I created an impressive ML model that achieves 90% accuracy, but it takes about 10 seconds to return a prediction.
Is that acceptable? Maybe for some use cases, but probably not for most.
In the past, many Kaggle competition winners ended up creating monster ensembles to take the top spots on the leaderboard. A particularly striking example is the ensemble that won Kaggle's Otto classification challenge.
Another example is the Netflix million-dollar recommendation engine challenge: due to the engineering costs involved, the Netflix team never ended up using the winning solution.
So how do we make models that are both accurate and light on the machine?
This is where the concept of teacher-student models, or knowledge distillation, comes in. In knowledge distillation, we train a smaller student model on the outputs of a larger, already-trained teacher model.
Here, we take the soft labels/probabilities from the teacher model and use them as training data for the student model.
The key is that the teacher outputs class probabilities, "soft labels" rather than "hard labels". For example, a fruit classifier might say "Apple 0.9, Pear 0.1" instead of "Apple 1.0, Pear 0.0". Why? Because these soft labels are more informative than the original labels: they tell the student that, yes, this particular apple is a little bit like a pear. The student model can often get very close to the teacher's level, even with 1-2 orders of magnitude fewer parameters! - source
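The soft labels described above are usually produced with a temperature-scaled softmax, and the student is trained to match them; this is a minimal sketch in plain Python, with invented fruit-classifier logits for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_targets(teacher_logits, temperature=3.0):
    """Soft labels from the teacher that the student is trained to match."""
    return softmax(teacher_logits, temperature)

def soft_cross_entropy(soft_targets, student_logits, temperature=3.0):
    """Cross-entropy between the teacher's soft labels and the student's softened output."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, student_probs))

# Hypothetical fruit-classifier logits over the classes [apple, pear].
teacher_logits = [4.0, 1.0]
soft = distillation_targets(teacher_logits, temperature=3.0)
print("teacher soft labels:", [round(p, 3) for p in soft])
loss = soft_cross_entropy(soft, student_logits=[2.0, 0.5], temperature=3.0)
print("distillation loss:", round(loss, 4))
```

In practice this distillation loss is minimized (often combined with a hard-label loss) while updating the student's parameters; the temperature of 3.0 here is just an illustrative choice.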
Lesson: Sometimes we don't have much compute available at prediction time, so we want a lighter model. For such use cases we can build simpler models or try knowledge distillation.
5. Maintenance and feedback loop
The world is not constant, and neither are your model's weights.
The world around us changes rapidly, and an approach that applied two months ago may no longer apply today. In a sense, the model we build is a reflection of the world, and if the world changes, our model should reflect that change.
Model performance typically decreases over time.
Therefore, from the very beginning we must plan how the model will be upgraded as part of a maintenance cycle.
The frequency of this cycle depends entirely on the business problem you are trying to solve. In an ad prediction system, where users are fickle and new purchase patterns keep emerging, the frequency needs to be very high. In a review sentiment analysis system, the frequency can be lower, since language doesn't change its structure that quickly.
I also want to stress the importance of feedback loops in machine learning systems. Suppose your dog-versus-cat classifier predicts, with low confidence, that a particular image is a dog. What can we learn from these low-confidence examples? We can send them for manual review and use the resulting labels to retrain the model. In this way, we train the classifier on exactly the instances it is uncertain about.
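One simple form of this feedback loop is to route low-confidence predictions into a human review queue; the 0.7 threshold and image names below are hypothetical choices for illustration:

```python
def needs_human_review(probabilities, threshold=0.7):
    """Flag a prediction for manual review when the model is not confident.

    `probabilities` is the model's class-probability vector for one example;
    if the top probability falls below `threshold`, a human should label it,
    and the labeled example can later be added to the retraining set.
    """
    return max(probabilities) < threshold

# Hypothetical dog-vs-cat predictions: [p(dog), p(cat)] per image.
predictions = {
    "img_1": [0.95, 0.05],  # confident: skip review
    "img_2": [0.55, 0.45],  # uncertain: send to a human
    "img_3": [0.30, 0.70],  # just confident enough at threshold 0.7
}
review_queue = [name for name, probs in predictions.items()
                if needs_human_review(probs)]
print("send to human review:", review_queue)
```

The human-verified labels collected this way become training data precisely where the model is weakest.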
Lesson: When thinking about production, you need a plan for using feedback to maintain and improve the model.
These are the important things I think about before putting a model into production.
Although this is not an exhaustive list of things to consider, and the list of things that can go wrong never is, it will hopefully give you food for thought the next time you create a machine learning system.
This article is republished from Medium; original address.