At Dialexa, we work with businesses and startups to design, build and deploy successful data-driven products from scratch. At the heart of data-driven products is an intelligent engine that uses data to automate decision making.
Take, for example, a platform for bidding on online ad placements. It requires some manual input: the ads to display, optional hard bid limits, and configurable aggressiveness factors. The system itself can be driven by a machine learning agent that places bids, monitors each ad's click rate, and possibly adjusts itself online to optimize its bidding.
Product-centric data science and machine learning bring a whole new set of challenges that typical data science projects are not constrained by. In a product-centric project, data scientists work with multidisciplinary teams of designers, software engineers, and product owners to ensure that their models are aligned with business goals, built within system constraints, and delivered within an agile time frame.
Over the years, we have seen common patterns across multiple projects and accumulated techniques for reducing risk, delivering value quickly, and developing reliable plans for the future.
Here are some tips for those who want to build data-driven products.
Get and analyze data as early as possible
Successful machine learning and data science products live and die by their data. In Kaggle competitions and some areas of academic research, the data is clean, accessible, trustworthy, and rich enough to train a model. Industrial data, however, is often unformatted, noisy, and subject to strict governance.
One of the biggest challenges we face is cutting through red tape to get the right data set. Businesses have a treasure trove of data in their warehouses, but a lot can stand between your team and that data: multiple departments (legal, IT, governance) that must approve the transfer, potential negotiations to purchase access rights, and data engineers who must complete data contracts before anything is handed over to your team.
Without access to the data, data scientists can only speculate about what they could do with it. It is not safe to assume that the data is ready for modeling, or even that it contains the signal required to reach the target KPI. Getting data early lets the team give quick feedback before heading down a modeling path that may not be feasible.
We recommend starting each data-driven product with a short “value proving” phase. In this phase a small team obtains the required data, establishes a baseline with an initial naive model, and sets achievable model KPIs based on that baseline. It is a low-risk way to validate the problem you are solving with few resources.
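As a rough illustration of what an initial naive model can look like, the sketch below always predicts the mean of the training targets and measures its mean absolute error. The function names and the numbers are made up for illustration; the point is only that any later model has to beat this score to justify its complexity.

```python
from statistics import mean


def fit_mean_baseline(train_targets):
    """Return a 'model' that ignores its input and predicts the training mean."""
    constant = mean(train_targets)
    return lambda _features: constant


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up data, purely for illustration.
train_y = [100.0, 120.0, 80.0, 100.0]
test_X = [{"beds": 1}, {"beds": 2}]
test_y = [90.0, 130.0]

baseline = fit_mean_baseline(train_y)
baseline_mae = mean_absolute_error(test_y, [baseline(x) for x in test_X])
print(f"Baseline MAE to beat: {baseline_mae}")
```

A KPI such as "improve on the baseline error by some agreed margin" is then grounded in the data the team actually has, not in a guess.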
Understand end users
When you build a product, you are really building a tool that solves a problem for its end users. Users work and interact with products in many different ways. In data-driven products, the focus is on how users receive recommendations from the model and how they give feedback for the model to learn from. To build a successful data-driven product, it is important to first understand how your users plan to interact with the product, what they want to see from the model, what control they have over its output, and how they provide feedback to the system.
The web application space has refined the design process for successful products by integrating a research phase. This phase typically involves building personas and understanding users through empathy mapping and user interviews. The output of this phase is interface designs that the product owner can trust and that the engineering (and data science!) team can execute on.
At Dialexa, we have successfully injected data-centric prompts and questions into these tools to gain insight into the models that users actually need. These new data points give data scientists metrics, model architecture requirements, and many new features they may never have considered!
A good example of an intelligent product feature is AirBnB's listing price recommendation. Some important elements of this feature that could be discovered during the research phase are:
- It is only a suggestion; the user controls the final price
- It shows the main factors behind the suggested price
- It lets users give direct feedback on the pricing model
This feature is not perfect and has been criticized for how it prices listings, among other complaints. These issues can be addressed by going back to understand the users' concerns. I believe that, based on that feedback, many aspects of this feature could be improved. One way to gain user trust is to invest in a model that outputs confidence intervals with its decisions. This may cost some accuracy, but as long as accuracy remains acceptable, the end user may be more satisfied with the feature as a whole.
Understand your model
Just like end users, models also need love. Models do not stand alone. The entire team must be on the same page so that engineers can write the supporting software, designers can wireframe the UI, and stakeholders can set their delivery expectations. Before building a model or model-based feature, it is vital to get the team aligned by gathering requirements.
One of the methods we use comes from our research and design team. We have adapted the user empathy map to empathize with model-based features; a separate article describes this process in depth. The point of the exercise is to get the team to think about the feature and write down the following:
- Sense: What data and variables does the model need?
- Do: What are the model's outputs and the actions it takes?
- Say: How does the user know the reason for the model's decision?
- Think: What hard and fast rules must this feature follow?
- Feel: How do we know that the feature is doing what we expect?
These are our interpretations of the categories, and they have worked well for our teams. Categories like "say" and "feel" can be particularly hard to wrap your head around. We get the team thinking in the right direction by providing examples from similar features. For example, some notes for the AirBnB price suggestion tool might be:
- Location data for rented units
- Listing date
- Recommended price
- A range of preferred prices
- Similar listings in the area
- Pricing factor breakdown
- Cannot go below the minimum break-even price
- Are there any legal considerations?
- Direct user feedback from the tool
- Does the user set a price within the suggested range?
The output of this session is a shared understanding and a set of explicit requirements for everyone involved in the feature. At a high level, data scientists can begin to design a model architecture, engineers can plan for new data sources and API endpoints, designers can wireframe components, and product owners know exactly what will be delivered. What an incredible exercise!
Start simple and scale up complexity
On a product team, other team members' work often depends on the data scientists' work. Back-end engineers cannot effectively develop and test the software that supports a model until they can access it. In addition, product owners may want to roll out features for beta testing as soon as possible rather than wait for the team to optimize the model.
The first and most important action to take in this situation is to communicate with your team. Document the expected inputs and outputs, and agree on when the model will be ready. This should be a high-level refresher after the model empathy map! The next option to consider is to use no machine learning model at all, or to greatly simplify the approach.
For experienced machine learning engineers, one of the hardest realities of working on a product team is that machine learning is a means to an end, not an end in itself. As a machine learning engineer who loves to read about and understand the frontier developments in the field, I find this hard to write. But in practice, most features are successfully supported by simple models or even heuristics. You don't need deep learning to solve every problem.
Quickly deploy a simple model, or at least the model's interface, to unblock the rest of the team and get their gears turning. Engineers can safely start building against the model, and stakeholders can monitor the model's KPIs in the product and turn it on once they are acceptable.
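One lightweight way to pin down a model's interface before the model exists is to write the expected inputs and outputs as typed data structures with a placeholder implementation behind them. The sketch below is only an illustration for a price suggestion feature: the class names, fields, and fixed values are hypothetical, chosen to mirror the empathy-map notes above, not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class PriceRequest:
    """Inputs the model is expected to receive (hypothetical fields)."""
    latitude: float
    longitude: float
    listing_date: date


@dataclass(frozen=True)
class PriceResponse:
    """Outputs the rest of the product can rely on (hypothetical fields)."""
    suggested_price: float
    price_low: float
    price_high: float
    model_version: str


class PriceModel(Protocol):
    """Any model, simple or advanced, must satisfy this interface."""
    def suggest(self, request: PriceRequest) -> PriceResponse: ...


class PlaceholderModel:
    """Stand-in that returns a fixed answer so other teams can integrate early."""

    def suggest(self, request: PriceRequest) -> PriceResponse:
        return PriceResponse(
            suggested_price=100.0,
            price_low=80.0,
            price_high=120.0,
            model_version="placeholder-0",
        )


model: PriceModel = PlaceholderModel()
response = model.suggest(PriceRequest(32.78, -96.80, date(2020, 1, 1)))
print(response)
```

Because back-end and front-end code depend only on the request and response shapes, the placeholder can later be swapped for a heuristic or a trained model without changing the callers.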
Let's take the AirBnB price suggestion model again. After defining the full feature, the team can build and deploy a quick heuristic engine that uses the average listing price in the surrounding area. Engineers can develop against the heuristic-based model and hide it behind a feature flag, ready to be turned on in production. At the same time, the data science team, subject matter experts, and product owners can work together to iterate on the model until it is ready, and then release it to end users.
This process has worked wonders for us. We have been able to iterate quickly toward more and more advanced models, test models in a production-like environment, and release model-driven features safely.
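A minimal sketch of the heuristic-behind-a-flag idea is shown below, assuming the nearby listing prices have already been looked up. The in-memory flag dictionary and the function names are hypothetical stand-ins for whatever feature flag service and data access layer a real product would use.

```python
from statistics import mean

# Hypothetical in-memory flag store; a real product would use a flag service.
FEATURE_FLAGS = {"price_suggestion": False}


def heuristic_price(nearby_prices):
    """Suggest the average price of nearby listings, or None if there are none."""
    if not nearby_prices:
        return None
    return round(mean(nearby_prices), 2)


def suggest_price(nearby_prices, flags=FEATURE_FLAGS):
    """Return a suggestion only when the feature flag is switched on."""
    if not flags.get("price_suggestion", False):
        return None  # Feature stays hidden from end users.
    return heuristic_price(nearby_prices)


nearby = [90.0, 110.0, 100.0]
hidden = suggest_price(nearby)                               # flag off
visible = suggest_price(nearby, {"price_suggestion": True})  # flag on
print(hidden, visible)
```

When a more advanced model is ready, only `heuristic_price` needs to be replaced; the flag check and the callers stay the same.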
These are just a few of the many techniques and processes our team uses to deliver data-driven products. We have learned many more lessons along the way, and we are eager to share them and help other product teams put them into practice.
This article was reposted from Medium.