Background

This article is part of a larger study of how product managers integrate machine learning into their products (see the other articles below), conducted by Brian Polidori and myself as MBA students at the University of California, Berkeley, with the help of our advisor, Vince Law.

The study aims to understand how product managers design, plan, and build products that support machine learning (ML). To that end, we interviewed 15 product development experts from various technology companies. Of the 15 companies represented, 14 have a market value above 10 billion, 11 are publicly listed, 6 are B2C, and 9 are B2B.

The Product Manager's Guide to ML series:

How to manage machine learning models

Product managers have to weigh trade-offs when building products that support machine learning (ML). Different product use cases call for different ML models, so learning the core principles of managing ML models is a key product manager skill.

Balancing precision and recall

Every ML model is wrong some of the time. When building a product that supports ML, you therefore need to find the right balance between false positives and false negatives for your specific use case.

Another way to think about this balance is precision and recall. Precision is the percentage of retrieved instances that are true positives, and recall is the ratio of true positives retrieved to the total number of actual positives. The relationship is called a balance because precision and recall are generally inversely related (all else being equal).

While only a few respondents described this balance in terms of precision and recall, many stressed the importance of finding the right balance for a particular use case. Google Photos is a great example.
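To make the two definitions concrete, here is a minimal sketch that computes precision and recall from a list of true labels and model predictions. The function name and the toy data are assumptions made for illustration; they do not come from the study.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks a positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when nothing is retrieved or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): 4 actual positives, 3 items retrieved by the model.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three retrieved items are correct (precision of about 0.67), while only two of the four actual positives were found (recall of 0.50).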

When users enter a search in Google Photos, they want to narrow down the number of images so that it is easier to find the one they want. The goal is to reduce noise, so the cost of a false positive is limited (the alternative is looking through all their photos), but the cost of a false negative is high (the photo they want may never surface). Google Photos therefore prioritizes high recall over high precision. As a result, Google Photos search results often include images unrelated to the search.

At the other end of the precision and recall spectrum, one of our respondents gave an example in which their company (a social network) focuses on high precision. The use case is deciding how to rank and recommend the content shown in a user's personal feed. After running some experiments, the company found that users are more likely to churn when the recommended content is not relevant to them.

While this result may seem obvious, the nuance is that churn was highest among users on mobile devices. Because screen real estate is limited, recommended content takes up a large part of the screen, so if the recommendations are not very precise, the user is more likely to churn.

Based on our interviews and research, we created the following general guidelines as a starting point for considering priorities.

Key points

When to prioritize recall over precision:

  • Anomaly detection
  • Compliance
  • Fraud identification

When to prioritize precision over recall:

  • Limited space (for example, mobile devices, small screen sections)
  • Recommendations
  • Content moderation
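In practice, one common lever for shifting this priority is the score threshold above which a model's output counts as a positive. The sketch below uses made-up scores and labels (assumptions for illustration only, not data from the interviews) to show how a low threshold favors recall and a high threshold favors precision.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is treated as a positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and ground-truth labels (1 = relevant).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Low threshold: catches every positive but lets noise through (recall first).
recall_first = precision_recall_at(0.35, scores, labels)
# High threshold: only confident results are shown (precision first).
precision_first = precision_recall_at(0.85, scores, labels)

print("recall-first   :", recall_first)
print("precision-first:", precision_first)
```

A fraud or compliance product would sit closer to the low-threshold setting, while a mobile recommendation slot would sit closer to the high-threshold one.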

Accuracy threshold

A core part of an ML strategy is the minimum level of accuracy (the share of predictions the model gets right; see the precision and recall section) required to achieve product-market fit. A good proxy for product-market fit is value creation: the more value created (for the company and for users), the better the product-market fit. The question therefore becomes: what is the relationship between different levels of accuracy and value creation?

The answer to this question depends on the details of the specific use case. So let's look at two examples.

Self-driving cars

Initially, when the self-driving ML model has very low accuracy, zero (or even negative) value is created, because the human driver cannot rely on the system at all. At some point (A), there is a step function: the ML model becomes accurate enough that human drivers can start relying on the system in certain highly constrained environments, such as highways, where driving conditions are more uniform. Once this level of accuracy is reached, significant value (product-market fit) is created for that particular use case.

After point A, however, the system reaches a slight plateau, where increases in accuracy do not bring corresponding increases in value creation, because a human driver still needs to be involved part of the time. For example, a car may be able to park itself some of the time, but it cannot handle high-density city streets or know exactly where along the curb to drop you off. Essentially, the ML model has not yet reached the minimum accuracy threshold for the next use case: fully autonomous driving.

Once the ML model reaches the minimum accuracy threshold for fully autonomous driving, there is a paradigm shift to a completely new use case that does not involve a human driver at all. At this point, the product team can start thinking about fundamental product changes for the new use case, such as adding an entertainment console or creating a fleet.

Google Photos Search

Unlike self-driving cars, Google Photos image search has no large step function in the relationship between accuracy and value.

When a user searches for a particular photo, the user types a word to create a filter and reduce the total number of photos that need to be viewed. Even at very low precision (point A), the ML model can still create value by reducing the total number of photos the user needs to view (assuming the ML model has a low false-negative rate).

As accuracy continues to increase and more and more irrelevant photos are filtered out, the rate of value creation increases. Somewhere around point B, as accuracy keeps increasing, the marginal return in value creation diminishes. This is because the filter has already eliminated most of the other photos, and users can quickly and easily view all the results without scrolling.

Each use case will have a different accuracy-to-value-creation graph. You should therefore think carefully about the characteristics of your particular use case to determine (1) whether there are critical accuracy thresholds and (2) which product changes are required above those thresholds.

Key questions about accuracy thresholds

  • What is the minimum accuracy threshold that my use case requires?
  • Is machine learning being used to augment a manual process?
  • As the share of human involvement decreases, how do the functional requirements of the product change?
  • When can you completely remove humans from the process?
  • If the accuracy reaches a certain level, will the entire use case change?
  • Is value creation bounded by zero? In other words, is the maximum value that can be created reached by reducing something to zero, such as reducing fraud to zero?

Exploration and exploitation

One of the main challenges in some machine learning problems is the balance between exploration and exploitation. This problem is usually illustrated with the multi-armed bandit casino scenario. Imagine a gambler in front of a row of "one-armed bandit" slot machines. The gambler can try multiple machines to find the one with the highest payout (exploration), or the gambler can pick one machine and try to work out an optimal strategy for winning on it (exploitation).

Definitions, in terms of the metric being maximized (usually something like click-through rate):

  • Exploration: testing various options somewhat randomly in an attempt to find the global maximum.
  • Exploitation: optimizing within the existing problem space to find a local maximum.

This trade-off is common in ML-enabled recommendation products. For example, Nordstrom's shopping app has limited screen space in which to promote products to users. If it shows the user a new product or brand that the user has never viewed, Nordstrom is exploring to see whether a higher global maximum exists for that user. This gives Nordstrom more information about the user's previously unknown preferences (and may uncover false negatives), but too many false positives can hurt the user experience.

On the other hand, in a purely exploitative approach, Nordstrom would show users only products and brands they have already viewed or purchased. This approach helps Nordstrom optimize based only on what it already knows about the user (the true positives).

This is not a solved machine learning problem; many of the companies we interviewed are constantly optimizing and adjusting their models to find the right balance. The most likely solution is an ML model that combines exploration and exploitation techniques.
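One simple, widely taught way to mix the two behaviors is an epsilon-greedy policy: with a small probability the system explores a random option, and otherwise it exploits the option with the best observed result so far. The sketch below is only an illustration of that idea under stated assumptions; the click-through rates, the function name, and the parameter values are hypothetical, and it is not the method used by any company in the study.

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, rounds=5000, seed=0):
    """Simulate an epsilon-greedy policy over options with hidden click-through rates.

    Returns (pulls, rewards): how often each option was shown and how many
    clicks it collected.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n
    rewards = [0.0] * n

    def observed_rate(arm):
        # Options that were never shown are treated as having a rate of 0.0.
        return rewards[arm] / pulls[arm] if pulls[arm] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: show a random option
        else:
            arm = max(range(n), key=observed_rate)  # exploit: best option so far
        clicked = rng.random() < true_rates[arm]  # simulated user response
        pulls[arm] += 1
        rewards[arm] += 1.0 if clicked else 0.0
    return pulls, rewards


# Hypothetical click-through rates for three products.
rates = [0.02, 0.05, 0.10]
pulls, rewards = epsilon_greedy(rates, epsilon=0.1)
greedy_pulls, _ = epsilon_greedy(rates, epsilon=0.0)
print("epsilon=0.1:", pulls)
print("epsilon=0.0:", greedy_pulls)
```

With `epsilon=0.0` this sketch never leaves the first option it tries, because untried options are scored as 0.0 and ties go to the first option. That is the local-maximum risk of pure exploitation; a larger `epsilon` spends more impressions on options that may turn out to be irrelevant.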

Key points

  • Exploration is the best way to learn whether you have false negatives, but over-exploring can lead to many false positives.
  • Exploitation is the safe approach, but it is limiting and, over time, can lead to a less desirable local maximum.

This article was reposted from Medium.