When I was in college, there was an ice cream shop nearby, and I went to check it out with a few friends. We walked in and it looked completely normal: they had all the usual flavors, such as mint and chocolate. However, at the end of the counter, they had a flavor called "Broccoli Surprise". Being a naturally curious person, I had to try it. I asked the server behind the counter for a sample. It was white with a few green specks, sweet and creamy. I was confused: there was no hint of broccoli at all. So I asked, "What's the surprise?" "No broccoli," she answered with a smile.

Machine learning (ML) also has surprises. One of the biggest misconceptions about deploying ML within an organization concerns where the difficulty and the value actually lie.

Integrating ML into your business workflow can be divided into five activities:

Define key performance indicators – Key performance indicators (KPIs) allow us to measure and discuss what we are trying to improve. Common KPIs include customer retention, revenue generation, and employee turnover. Setting KPIs is a key step in machine learning because they ultimately guide the optimization toward high-performing models.

Data collection – Collect the data that will be used to train ML algorithms. True, if you lack data, you can use ML models produced by others. However, the business considerations in that case are similar to those for other SaaS products, so we leave it out of scope here.

Infrastructure – ML infrastructure includes various kinds of software: data management, annotation tools, and model training and testing environments. This infrastructure is an upfront investment, but it lets you iterate on and improve models and data sets more effectively.

Optimize ML algorithm – Here we consider which model to use for a given data set and problem, how much training data is needed, how many layers the neural network should have, and how to tune the hyperparameters. There are a great many choices.

Integration – Getting an ML model to work in a vacuum is a great achievement, but it is not until the model is integrated into a real workflow that it starts to have a real business impact. Integration is the process of building the pipelines and structures that move information and data seamlessly between users and computers.

Based on many conversations with companies interested in deploying machine learning, the common perception is that most of the effort goes into optimizing the ML algorithm.

There are several possible reasons for this:

  • For most practitioners, optimizing the ML model is the biggest "unknown" in the stack, so it's easy to imagine it is more complex and time-consuming than it actually is.
  • Availability heuristic: since ML algorithms and optimization get more coverage in the literature and the media, people tend to think they play a larger role in the implementation process than they actually do.


When I talked to experienced practitioners who have built and scaled these ML systems within Google, I heard a very different story. Based on these conversations, optimizing the ML algorithm takes far less relative effort than expected, while data collection, building infrastructure, and integration each take more work. The gap between expectations and reality is wide.

Define KPI – Once we deploy a data-driven system, we spend less time and fewer organizational resources on selecting KPIs, because there is a constant flow of data feedback. This eliminates the need for proxy KPIs. Since good ML depends on good data, we must have a good collection pipeline.

Data collection – Data collection is almost always an underestimated part of starting an ML project. A previous article describes some of the factors to consider when building a data collection and processing strategy.

Infrastructure – Building infrastructure, which is mainly a software engineering task rather than an "ML task", is one of the most time-consuming parts of most projects.

Optimize ML algorithm – Training and optimizing ML models almost always takes less time and effort than expected, for two reasons. First, performance is a strong function of the data you have; the benefits of tuning the algorithm are dwarfed by the benefits of cleaning up the data. Second, tools for optimizing ML algorithms (such as AutoML) make it easier and faster to train and optimize models on labeled or unlabeled data.

Integration – Integration is another underestimated part of the ML deployment process. Error and exception handling, redundancy, and the shift from a static product to one that iterates continuously all raise software, product, and engineering challenges. Think of all the technical debt hidden in your training data!
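To make the point about error and exception handling concrete, here is a minimal Python sketch of one common integration pattern: wrapping a model call so that a failure or an unusable output falls back to a default value instead of breaking the workflow. The names `predict_with_fallback` and `flaky_model` are hypothetical and exist only for this illustration; they are not from any particular library or from the systems discussed above.

```python
from typing import Callable, Sequence, Tuple


def predict_with_fallback(
    predict: Callable[[Sequence[float]], float],
    features: Sequence[float],
    fallback: float,
) -> Tuple[float, bool]:
    """Call the model; return (value, used_fallback).

    The fallback is used when the model raises or returns NaN.
    """
    try:
        value = predict(features)
    except Exception:
        # Any model failure is absorbed here so the caller keeps working.
        return fallback, True
    if value != value:  # NaN is the only float that is not equal to itself
        return fallback, True
    return value, False


def flaky_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a real model: averages its inputs."""
    if not features:
        raise ValueError("empty feature vector")
    return sum(features) / len(features)


# Normal input goes through the model; bad input gets the default.
ok_value, ok_flag = predict_with_fallback(flaky_model, [1.0, 3.0], 0.0)
bad_value, bad_flag = predict_with_fallback(flaky_model, [], 0.5)
```

Returning a flag alongside the value lets the surrounding system log how often the fallback fires, which is one simple way to notice that the model or its input data needs attention.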

- -

ML actually has two surprises.

First, many companies misjudge which parts of the ML implementation process will be difficult. Advances in tools and techniques are rapidly changing ML optimization, at a pace unmatched by the software infrastructure for robust data collection and management. Like the broccoli ice cream, there is usually not that much ML in an end-to-end ML system.

Second, the path to implementing ML (asking about your customers' problems, building infrastructure to collect, interpret, and process the data, and so on) is valuable whether or not ML is actually implemented. Not all problems have ML-driven solutions, but many do, and even those that don't will benefit from the journey.

This article was republished from Medium.