This article is reposted from "The 7 Steps of the Data Science Lifecycle – Applying AI in Business".
The full text is machine-translated; it is not entirely fluent, but this does not affect overall understanding.
Artificial intelligence is not IT, and the use of artificial intelligence is hardly the same as the use of traditional software solutions.
Whereas traditional software is deterministic, AI is probabilistic.
Coaxing value out of data with algorithms is challenging and often time-consuming. Leaders and executives of AI projects who are not technical do not need to know how to clean data, write Python, or adjust for algorithmic drift, but they do need to understand the experimental process that subject matter experts and data scientists go through to find the value in data.
Last week we introduced the three phases of AI deployment, and this week we delve deeper into the seven steps of the data science life cycle itself, along with the aspects of the process that non-technical project leaders should understand. The model we use to explore the data science life cycle (shown below) is directly inspired by IBM's Cross-Industry Standard Process for Data Mining (CRISP-DM) model. Our model differs only slightly, placing less emphasis on technical nuance and more on business context. For the remainder of this article, we refer to the steps outlined in the figure below:
Unlike the three linear phases of deployment (pilot, incubation, and deployment), the steps of the data science life cycle cycle fairly quickly, often jumping from one step to another in order to iterate on a model or work toward a successful result. Steps 1 and 2 (business understanding and data understanding) and steps 4 and 5 (data preparation and modeling) usually happen concurrently, so they are not even listed linearly.
The steps in the data science life cycle can be thought of as sequential, but this rough order is not always followed strictly in a real deployment.
For example, while preparing data, the team may decide to loop back to business understanding in order to request additional budget (e.g., because the data requires intensive, time-consuming cleanup and more staff) or to clarify the business outcome. Similarly, in the evaluation step, the team may return to data understanding or re-evaluate the project plan before the solution is actually deployed.
As with the three phases of deployment, we will illustrate the following steps using two example companies:
Example 1 - An e-commerce company adopting a product recommendation engine. The company sees an opportunity to increase shopping cart value and improve the on-site user experience, especially for existing customers with a history of purchases and activity.
Example 2 - A manufacturing company adopting a predictive analytics application. This company has a strong digital infrastructure and aims to use its existing data streams to detect faults and errors in the manufacturing process before they happen.
1. Business understanding
- Goal: Determine the business objective of the project and the resources allocated to achieving it. Ask: "What results are we pursuing?", "Is AI really the right tool for the job?", and "What is the measurable, strategic value of this potential AI initiative?"
- Challenges: Finding opportunities that are both reasonable and accessible for the company. Not overestimating what AI can do. Accepting the long iteration times, and the key skills and capabilities the company must have, in order to integrate AI into the enterprise.
- Possible people involved:
  - Senior leadership
  - Chief data scientist
  - Project manager
  - Functional subject matter expert
Example 1 - An e-commerce company using a product recommendation engine. Discuss the various options the company has for driving growth and profitability: should a recommendation engine take priority over the other options? For a marketing project like this, what do we need to understand about our customers and their buying behavior?
Example 2 - A manufacturing company using a predictive analytics application. Determine how the predictive model's success will be measured. Consider which machines warrant this kind of predictive maintenance: which risks and breakdowns are most expensive for the company, and can we focus on those first?
2. Data understanding
- Goal: Determine the accessibility and potential value of the data. Ask: "Can we achieve our business goals with our existing data assets?" and "Are there challenges with this data, or opportunities to use it in new ways, that affect our desired business outcome?"
- Challenges: Getting at the value of the data. Having subject matter experts and data scientists review the data together to determine how it should be accessed, how it should be improved, and which features are likely to have the most value for the business outcome.
- Possible people involved:
  - Chief data scientist
  - Project manager
  - Functional subject matter expert
Example 1 - An e-commerce company using a product recommendation engine. Assess the quality of the customer purchase behavior data. Does the data tell a coherent story? Are we confident that a customer account represents one person, or are multiple family members (of different ages, priorities, genders, and preferences) shopping on one account, which makes things more complicated?
Example 2 - A manufacturing company using a predictive analytics application. Review the existing data sources from the manufacturing equipment. Are time-series and telemetry data from similar machines captured and stored in the same way? Can we be sure the data is reliable? Where is it least reliable, and can we mitigate the factors that degrade it?
3. Assess project needs
- Goal: Determine the requirements and resources needed to continue the project. These may include additional budget, additional staff training, bringing other subject matter experts onto the cross-functional project team, or access to new data systems.
- Challenges: Getting senior leadership to accept the inevitably complex and changing requirements of real AI projects (especially at companies with no previous hands-on data science experience).
- Possible people involved:
  - Senior leadership
  - Chief data scientist
  - Project manager
  - Functional subject matter expert
Example 1 - An e-commerce company using a product recommendation engine. The cross-functional team assigned to the project may decide that it needs access to more historical data, along with resources to clean and organize it. The team may also determine, given the ROI opportunities in different parts of the business, that it wants to apply the recommendation engine to two very specific product categories (rather than to every product in the catalog), and it may request access to subject matter experts from those parts of the business.
Example 2 - A manufacturing company using a predictive analytics application. The team determines the number and types of sensors it plans to install on various equipment, as well as the specific subject matter experts needed to properly set up, interpret, and understand these new data streams in order to run a successful proof of concept (PoC).
4. Data preparation
- Goal: Access, clean, and harmonize the data. Use feature engineering to identify and extract the meaningful aspects of the data corpus. Determine the feasibility of the project based on the available data.
- Challenges: Having data scientists and business leaders talk frankly about the challenges and costs of organizing data, which are usually substantial (especially at older companies, or companies with little or no practical data science experience). Acknowledging that if the quantity or quality of the data is insufficient, the project may not be feasible.
- Possible people involved:
  - Senior leadership
  - Chief data scientist
  - Data science team
  - Functional subject matter expert
Example 1 - An e-commerce company using a product recommendation engine. The team cleans and unifies historical data and determines the specific format that new data must take in order to feed the recommendation engine. Data scientists and subject matter experts work together to identify the features in the purchase and user behavior data that they believe matter most for the initial training of their model.
Example 2 - A manufacturing company using a predictive analytics application. The data science team works closely with engineers and machinists to determine the most important telemetry signals (heat, vibration) for the equipment on which they plan to place sensors. An initial data set is then collected, analyzed, and combined in time series with the existing data streams from the central manufacturing software. The sensor data and core system data are reformatted or reorganized so that they can be used to train the model.
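As a rough illustration of what "combined in time series" can mean in Example 2, the sketch below attaches to each sensor reading the most recent record from the central software at or before that reading's timestamp. It is only a minimal sketch: the function name `align_latest`, the tuple layout, and the sample values are assumptions made for illustration, not part of the original article or any specific manufacturing system.

```python
from bisect import bisect_right

def align_latest(sensor_rows, system_rows):
    """Attach to each sensor reading the latest system record at or before it.

    Both inputs are lists of (timestamp_seconds, value) tuples sorted by
    timestamp. Returns a list of (timestamp, sensor_value, system_value),
    where system_value is None if no system record exists yet.
    """
    system_times = [t for t, _ in system_rows]
    aligned = []
    for t, sensor_value in sensor_rows:
        # Number of system records whose timestamp is <= t
        i = bisect_right(system_times, t)
        system_value = system_rows[i - 1][1] if i > 0 else None
        aligned.append((t, sensor_value, system_value))
    return aligned

# Hypothetical vibration readings (mm/s) from a new sensor
vibration = [(5, 0.21), (65, 0.24), (125, 0.90)]
# Hypothetical machine states logged by the central manufacturing software
machine_state = [(0, "idle"), (60, "running"), (120, "running")]

training_rows = align_latest(vibration, machine_state)
```

Each row in `training_rows` now pairs a sensor value with the machine state in effect at that moment, which is the kind of unified record a model can be trained on. Real data would also need handling for clock differences and missing readings.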
5. Modeling
- Goal: Establish the relationship between inputs and outputs, and iterate on the data and algorithms to arrive at business value.
- Challenges: Looping back to data preparation, data understanding, and business understanding during iteration. Bringing in subject matter experts to contribute to the model's hypotheses and to practical training.
- Possible people involved:
  - Chief data scientist
  - Data science team
  - Functional subject matter expert
  - Project manager
Example 1 - An e-commerce company using a product recommendation engine. With the team's agreed success metrics in mind, the data science team tests new product recommendations in the specific product categories of interest. Feedback from team members, and possibly from a small group of users, is used to calibrate for improved cart value and conversion rate. New features in the data are introduced, or weighted differently, to achieve the desired results.
Example 2 - A manufacturing company using a predictive analytics application. The team works with past repair and breakdown data, along with the new telemetry data, to predict which machines are more likely to fail. This may require a longer time frame, or a relatively large number of machines in the initial test, in order to capture enough instances of machines needing repair, because only those events can inform the model's predictive power.
6. Evaluation
- Goal: Determine whether our data assets and models are capable of delivering the desired business outcome. This often requires many loops back to steps 1, 2, 3, 4, or 5, as hypotheses are refuted and new ideas emerge.
- Challenges: Working through the difficulties of evaluation and settling on robust, quantifiable criteria for measuring success (benchmarks are hard to determine). Involving senior leadership and subject matter experts in a robust evaluation to ensure confidence in deployment.
- Possible people involved:
  - Senior leadership
  - Chief data scientist
  - Project manager
  - Functional subject matter expert
Example 1 - An e-commerce company using a product recommendation engine. Over time, the team measures its new product recommendations against the previous product listings or recommendation methods. In this evaluation phase, data scientists and subject matter experts jointly determine what seems to be working, what is not, and how to adjust the recommendation model, the data, or the user experience to better drive the desired results (higher cart value, higher conversion of users into customers).
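The comparison described in Example 1 can be sketched as a side-by-side calculation of the two success metrics, conversion rate and average cart value. This is an illustrative sketch only: the `summarize` helper and the session totals are hypothetical, and a session total of 0 is assumed to mean that no purchase was made.

```python
def summarize(sessions):
    """Return (conversion_rate, average_cart_value) for a non-empty list of
    per-session cart totals. A total of 0 is treated as no purchase."""
    purchases = [total for total in sessions if total > 0]
    conversion_rate = len(purchases) / len(sessions)
    average_cart_value = sum(purchases) / len(purchases) if purchases else 0.0
    return conversion_rate, average_cart_value

# Hypothetical cart totals, one number per user session
previous_method = [0, 0, 40, 0, 60]
new_recommendations = [0, 50, 70, 0, 60]

old_rate, old_cart = summarize(previous_method)
new_rate, new_cart = summarize(new_recommendations)

# Positive values mean the new recommendations did better on that metric
rate_lift = new_rate - old_rate
cart_lift = new_cart - old_cart
```

With so few sessions the difference proves nothing; in practice the team would need far more traffic and a statistical test before trusting the lift.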
Example 2 - A manufacturing company using a predictive analytics application. The cross-functional team evaluates the predictive model's recommendations to determine whether they are significantly better or worse than previous methods. In the early proof-of-concept or incubation stages this may be qualitative (i.e., do we believe our previous methods would have detected this kind of equipment failure?), while in an actual deployment the metrics will be quantitative (i.e., how many breakdowns occur per month? How much uptime is lost per month for X-class machines? What is the false alarm rate of the predictive maintenance system?).
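The quantitative questions in Example 2 can be made concrete with a small calculation. The sketch below assumes one record per machine per month, and it defines the false alarm rate as the share of alerts that were not followed by a real failure; other definitions exist (for example, false alerts divided by all failure-free months), so the team should agree on one. The function name and the sample records are hypothetical.

```python
def evaluation_metrics(records):
    """records: list of (alert_raised, failure_occurred) booleans,
    one entry per machine-month."""
    true_alerts = sum(1 for alert, failed in records if alert and failed)
    false_alerts = sum(1 for alert, failed in records if alert and not failed)
    missed_failures = sum(1 for alert, failed in records if not alert and failed)

    alerts = true_alerts + false_alerts
    failures = true_alerts + missed_failures
    return {
        # Share of alerts that were not followed by a failure
        "false_alarm_rate": false_alerts / alerts if alerts else 0.0,
        # Share of real failures that the system flagged in advance
        "detection_rate": true_alerts / failures if failures else 0.0,
        # Total failures observed in the period
        "failures": failures,
    }

# Hypothetical observations for six machine-months
records = [
    (True, True), (True, False), (False, False),
    (False, True), (True, True), (False, False),
]
metrics = evaluation_metrics(records)
```

Tracking the same figures for the previous maintenance method gives the team a baseline against which "significantly better or worse" can be judged.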
7. Deployment
- Goal: Successfully integrate the AI model or application into existing business processes. Ultimately, it must deliver a business outcome.
- Challenges: Training employees to use the new AI application. Maintaining the model continuously so that it keeps running and adapts to change.
- Possible people involved:
  - Chief data scientist
  - Data science team
  - Project manager
Example 1 - An e-commerce company using a product recommendation engine.
- Phase 2: Incubation deployment: The recommendation engine has been fully tested in a sandbox environment with feedback from internal team members, and it is now integrated into one part of the e-commerce website, where 15% of users are shown AI-generated recommendations instead of the previous recommendations.
- Phase 3: Full deployment: The recommendation system is integrated into the website and is the default experience on every web interface where the team believes it can deliver value. A monitoring system is in place to calibrate the results and findings of the new system, and regular meetings and diagnostics are held to keep the system running and improving.
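One common way to expose a fixed share of users, such as the 15% in the incubation phase above, is to assign each user to a stable bucket derived from a hash of their ID. The article does not say how the company does this, so the sketch below is an assumption, and the function name and user IDs are hypothetical.

```python
import hashlib

def sees_new_recommendations(user_id: str, exposure_percent: int = 15) -> bool:
    """Deterministically decide whether a user gets AI-generated recommendations.

    The same user_id always lands in the same bucket, so a user's experience
    does not flip between visits.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < exposure_percent

# Hypothetical user IDs, used to check the exposed share is close to 15%
user_ids = [f"user-{n}" for n in range(10_000)]
exposed = sum(sees_new_recommendations(u) for u in user_ids)
share = exposed / len(user_ids)
```

Raising `exposure_percent` toward 100 moves the same mechanism from incubation to full deployment, and users already exposed stay exposed because their buckets do not change.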
Example 2 - A manufacturing company using a predictive analytics application.
- Phase 2: Incubation deployment: The predictive maintenance system has been integrated into one part of the workflow on the production floor. A small group of machinists and engineers, some of whom may not belong to the cross-functional AI team, now use and respond to the new system under the AI team's guidance.
- Phase 3: Full deployment: Predictive maintenance is integrated into the manufacturing workflow and is the default process for every machining function where the AI team believes it can deliver value (areas that were tested in the PoC and incubation phases). A monitoring system is in place to calibrate the results and findings of the new system, and regular meetings and diagnostics are held to keep the system running and improving.