This series of articles examines one question from several perspectives: "Do I need AI for my business? Can I use AI?"
The perspective in this installment: data.
Underlying logic: data-driven
The rule-based era
Before artificial intelligence became popular, the products everyone used were "rule-based."
We summarize the rules and let the computer execute them automatically. Many problems in daily work and life are handled by rules, for example:
- The rule behind an Excel formula is: perform the calculation on the selected range
- The rule behind messaging is: deliver the content to the recipient
- The rule behind an official account feed is: push the content of the accounts you follow
The benefit of the rule-based approach: everyone knows what result to expect under what circumstances, so everything is predictable.
But the rule-based approach also has a major drawback: for many problems it is difficult (or even impossible) to summarize effective rules.
The age of data-based AI
As of this writing (2019), the core underlying logic of artificial intelligence is that it is "based on data."
Problems that rules solve well should of course be solved with rules, because rules are cheap and highly interpretable. However, many problems have no valid rules. This is where the value of artificial intelligence shows.
The "data-based" method is, simply put: find patterns in massive amounts of data. These patterns are very abstract and cannot be summarized into concrete rules. For example:
- Show the machine a lot of photos of cats and dogs, and it gains the ability to "differentiate cats and dogs"
- Give the machine a large number of articles translated between Chinese and English, and it gains the ability to do "Chinese-English translation"
- Give the machine a huge number of articles, and it can even gain the ability to "write articles"
The benefit of the data-based approach: as long as there is enough high-quality data, the machine can learn a given skill, and the more data it has, the more capable it becomes.
But the data-based approach also has an obvious disadvantage: the machine can only tell you "what", not "why".
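The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration, not how a real image classifier works: the single "weight in kg" feature, the 8 kg cutoff, the example values, and all function names are assumptions made up for this sketch.

```python
from statistics import mean


def rule_based_label(weight_kg):
    """Rule-based: a person chose the 8 kg cutoff by hand (hypothetical rule)."""
    return "dog" if weight_kg > 8.0 else "cat"


def learn_centroids(examples):
    """Data-based: derive one average weight per label from labeled examples."""
    by_label = {}
    for weight_kg, label in examples:
        by_label.setdefault(label, []).append(weight_kg)
    return {label: mean(weights) for label, weights in by_label.items()}


def data_based_label(weight_kg, centroids):
    """Predict the label whose learned average is closest to the input."""
    return min(centroids, key=lambda label: abs(centroids[label] - weight_kg))


# Made-up labeled data: (weight in kg, label).
EXAMPLES = [
    (3.5, "cat"), (4.2, "cat"), (5.0, "cat"),
    (12.0, "dog"), (20.0, "dog"), (30.0, "dog"),
]

centroids = learn_centroids(EXAMPLES)
print(rule_based_label(15.0))             # decided by the hand-written rule
print(data_based_label(15.0, centroids))  # decided by what the data showed
```

In the rule-based function the reason for the answer is written in the code. In the data-based version the behavior comes entirely from the examples, so changing the data changes the answer without anyone editing a rule.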
Extended reading:
"Why does natural language processing move from rules to statistical methods?"
《人工智能》 ("Artificial Intelligence") has a detailed introduction to this history
To use AI, you need to know the data pyramid
The "data-based" logic was explained above; what supports that logic is the data itself.
Without data, nothing can be based on data. So if you want to use artificial intelligence, you need to consider three data elements of your business scenario:
- Data available
- Comprehensive data
- More data
These three form a pyramid. You first need "data available" before it makes sense to talk about "comprehensive data", and only once the data is comprehensive does "more data" matter.
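The bottom-up order of the pyramid can be written as a small check that stops at the first level that is not met. This is a minimal sketch; the function name and the dictionary format for the answers are assumptions made for illustration.

```python
from typing import Optional

# Pyramid levels, listed from the base upward.
PYRAMID = ("data available", "comprehensive data", "more data")


def first_unmet_level(answers: dict) -> Optional[str]:
    """Walk the pyramid from the base and return the first level that is not met.

    Returns None when every level is satisfied. A level missing from
    `answers` is treated as not met.
    """
    for level in PYRAMID:
        if not answers.get(level, False):
            return level
    return None


# Example: data exists, but key factors are not covered yet.
answers = {"data available": True, "comprehensive data": False, "more data": True}
print(first_unmet_level(answers))  # comprehensive data
```

Because the levels are checked in order, a large volume of data does not help if an earlier level fails.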
Data available
To solve the problem, you need to have "data related to the problem." Take the example mentioned above:
Show the machine a lot of photos of cats and dogs, and it has the ability to "differentiate cats and dogs."
The data required here is not only the photos themselves but also a label saying whether each photo shows a cat or a dog, as shown in the following figure:
So, the questions you need to consider are:
- What are the factors that affect my problem?
- Are these factors digitized? If not, can they be digitized?
- Is this data available? Is it expensive to obtain? Is it worth the cost?
Comprehensive data
If we could see only 10% of a photo, it would be very difficult to tell whether it shows a cat or a dog, as shown below:
When we can see 50% of the photo, we can make a guess.
When we can see 100% of the photo, we are confident.
People are like this, and so are machines. If you don't let me see the whole picture, how can I analyze it?
So when you want to use artificial intelligence to solve a practical problem, you need to analyze the problem carefully:
- What are the influencing factors? Is there corresponding data?
- Are the factors with data sufficiently comprehensive?
- Is data missing for any key factor?
More data
Let's stay with distinguishing cats from dogs. There may be more than 40 breeds of cats and nearly 200 breeds of dogs. Photos can also be taken from different angles, against different backgrounds, and in different lighting, which gives countless combinations.
To distinguish cats from dogs effectively, you need a large number of varied photos. Kaggle (a highly regarded AI competition website) hosts a lot of training data for telling cats from dogs, most of it on the order of tens of thousands of photos (10,000+ cats and 10,000+ dogs).
How much is enough?
Even a very simple task such as distinguishing cats from dogs needs thousands of examples. More complex tasks require millions or even billions. How much you need depends on the complexity of the problem you want to solve, the choice of model, and the results you expect.
However, one principle does not change: the more data, the better!
Case analysis
Suppose you own a gaming company and want to use artificial intelligence to increase game revenue. Is that feasible when evaluated from a data perspective?
E-commerce platforms use recommendation algorithms to get shoppers to spend more, so in theory building a recommendation algorithm into a game can get players to spend more as well.
The essence of the recommendation algorithm is: mining user needs and recommending products that match the needs to users.
Applied to a game, this means: mine the player's needs and spending power, and recommend items that match those needs at an appropriate price.
Step 1: Is the data available?
Games are a highly digitized field, but even so, some factors are not digitized. For example:
- Some gamers chat and interact in WeChat groups. The game has no access to this data.
- A wife finds out that her husband has been flirting with girls in the game and makes him uninstall it. What happens outside the game can sometimes affect the game.
- There is no data on players' psychological activity ("This event has a great discount, but I have to hold back! Otherwise I'll be eating instant noodles for another week.")
Is what remains enough? The next step, the comprehensiveness analysis, answers that.
"Data availability" may look like a silly question, but many industries are barely digitized, and for them it is not a simple question at all.
Step 2: Is the data comprehensive enough?
To judge the needs and spending power of players, there are roughly the following influencing factors:
- User attributes
  - Player attributes (age, gender, geographic location...)
  - Character attributes (level, equipment, number of remaining diamonds...)
- Behavioral data
  - Game behavior (what was bought, how they play, which dungeons they played...)
  - Consumer behavior (time spent on the event page, what was bought, how much was spent...)
  - Player interaction (who they teamed up with, who they fought, who they joined events with...)
  - Chat data (with whom, what was said, in-game and out-of-game)
  - Mental activities (what they want, what they like, whether topping up feels worth it...)
- Product attributes
  - Item price
  - Item role
  - Item features
  - Conditions of purchase
Still drawing on the experience of e-commerce, Amazon and Alibaba have shown the following:
Even without "chat data", "psychological activity data", and "data from outside the e-commerce platform", it is still possible to mine user needs effectively and stimulate consumption.
A game has not only the power to recommend but also the power to set prices, so it can further stimulate consumption by lowering prices. So comprehensiveness is OK.
PS: When evaluating comprehensiveness, the data does not need to be 100% comprehensive in theory; it only needs to reach a usable level. Whether that level has been reached can usually only be judged by looking for precedents.
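The comprehensiveness check for this case can be sketched as a small inventory of factors. The `has_data` flags follow Step 1 above (out-of-game chat and mental activities are not available); the `is_key` flags and the function name are assumptions made for illustration, not something the case itself specifies.

```python
# Each entry: (factor, has_data, is_key).
# is_key is an assumed judgment about which factors the recommendation needs most.
FACTORS = [
    ("player attributes", True, True),
    ("character attributes", True, True),
    ("game behavior", True, True),
    ("consumer behavior", True, True),
    ("player interaction", True, False),
    ("out-of-game chat data", False, False),
    ("mental activities", False, False),
    ("product attributes", True, True),
]


def coverage_report(factors):
    """Summarize which factors lack data and whether any key factor is missing."""
    missing = [name for name, has_data, _ in factors if not has_data]
    missing_key = [
        name for name, has_data, is_key in factors if is_key and not has_data
    ]
    covered = sum(1 for _, has_data, _ in factors if has_data)
    return {
        "coverage": covered / len(factors),
        "missing": missing,
        "missing_key": missing_key,
    }


report = coverage_report(FACTORS)
print(report["coverage"])     # share of factors that have data
print(report["missing"])      # factors with no data at all
print(report["missing_key"])  # an empty list means no key factor is missing
```

Under these assumed flags the coverage is below 100%, yet no key factor is missing, which matches the point that the data only has to reach a usable level.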
Step 3: Is there enough data?
Recommendation systems are a special case: their requirements for the amount of data are very flexible. There are many ways to handle the cold-start problem when data is scarce, and as the amount of data grows, the algorithm becomes more effective.
A new user who has just downloaded Taobao still gets recommendations, and the more they use the app, the more reliable the recommendations become.
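One common way to handle cold start is to fall back to overall popularity until a user has enough history, then switch to personalized scoring. The sketch below uses a simple co-purchase overlap heuristic; it is not Taobao's or any real platform's method, and the purchase log, item names, `min_history` threshold, and function names are all made up for illustration.

```python
from collections import Counter


def popular_items(logs, k, exclude=()):
    """Return the k most purchased items overall, skipping anything in `exclude`."""
    counts = Counter(item for items in logs.values() for item in items)
    ranked = [item for item, _ in counts.most_common() if item not in exclude]
    return ranked[:k]


def recommend(user, logs, k=2, min_history=2):
    """Recommend k items, using popularity when the user has too little history."""
    history = set(logs.get(user, []))
    if len(history) < min_history:
        # Cold start: nothing to personalize on yet.
        return popular_items(logs, k, exclude=history)

    # Score unseen items by how much their buyers overlap with this user.
    scores = Counter()
    for other, items in logs.items():
        if other == user:
            continue
        overlap = len(history & set(items))
        for item in set(items) - history:
            scores[item] += overlap
    ranked = [item for item, score in scores.most_common() if score > 0]

    # Top up with popular items if personalization found too few candidates.
    for item in popular_items(logs, k, exclude=history):
        if len(ranked) >= k:
            break
        if item not in ranked:
            ranked.append(item)
    return ranked[:k]


# Made-up purchase log: player -> items bought.
LOGS = {
    "ann": ["sword", "shield", "potion"],
    "bob": ["sword", "potion", "mount"],
    "cid": ["potion", "skin"],
    "eve": ["sword", "shield"],
    "new": [],
}

print(recommend("new", LOGS))  # popularity fallback for a brand-new player
print(recommend("eve", LOGS))  # personalized from players with similar purchases
```

The brand-new player still receives recommendations, and once a player has purchase history the output starts to reflect players with similar purchases.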
PS: When evaluating whether you have enough data, try to get advice from experienced technical people.
So after evaluating from the three data angles, the idea of "increasing game revenue through recommendation algorithms" appears feasible.
Final Thoughts
"Data" is arguably the most important dimension when assessing whether artificial intelligence technology can be used.
When making specific evaluations, think of the following 3 questions:
- Is the data available?
- Is the data comprehensive?
- Is there enough data?
All three questions must be answered "yes" at the same time before the idea can be considered "apparently feasible."
There are many other things to consider when assessing whether to use artificial intelligence. This series will continue to be updated; follow my official account to see everything:
Official account: Xiaoqiang-me
Extended reading:
"7 steps for machine learning"
"Six steps of data collection, lay the foundation of machine learning model"
"The 6 most common problems in AI datasets (with solutions)"