“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.”

– Andrew Ng

Most people reading this article are probably familiar with machine learning and the algorithms used to classify or predict outcomes from data. However, it is important to understand that machine learning is not the answer to every question. Given how useful machine learning is, it can be hard to accept that sometimes it is not the best solution to a problem.

In this article, my goal is to convince readers that sometimes machine learning is the right solution, and sometimes it is the wrong solution.


Machine learning is a subset of artificial intelligence that has revolutionized the world as we know it over the past decade. The information explosion has led to the collection of massive amounts of data, especially by large companies such as Facebook and Google. This volume of data, coupled with the rapid development of processor power and computer parallelization, has made it relatively easy to obtain and study large data sets.

Today, hype about machine learning and artificial intelligence is everywhere. This is perhaps justified, given that the potential of the field is huge. The number of AI consulting agencies has soared in the past few years and, according to a report from Indeed, the number of jobs related to AI increased by 100% between 2015 and 2018.

As of December 2018, Forbes found that 47% of businesses had at least one AI capability in their business processes. According to a report by Deloitte, the penetration rates of enterprise software with built-in AI and of cloud-based AI development services are expected to reach 87% and 83%, respectively. These numbers are impressive; if you plan to change careers soon, AI seems like a good choice.

So this all seems great, right? Companies are happy and, presumably, consumers are happy too; otherwise, the companies would not be using AI.

It is great, and I am a big fan of machine learning and AI. However, there are times when using machine learning is simply unnecessary or does not make sense, and other times when implementing it can get you into trouble.


Limitation 1-Ethics

It's easy to understand why machine learning has had such a profound impact on the world. What is less clear is exactly what its capabilities are and, perhaps more importantly, what its limitations are. Yuval Noah Harari famously coined the term “dataism”, which refers to a putative new stage of civilization we are entering, in which we trust algorithms and data more than our own judgment and logic.

Although you may find this idea laughable, remember the last time you went on vacation and followed the instructions of a GPS rather than your own reading of a map. Did you question the judgment of the GPS? People have literally driven into lakes because they blindly followed the instructions of their GPS.

The idea of trusting data and algorithms more than our own judgment has its pros and cons. Obviously, we benefit from these algorithms, otherwise we would not be using them in the first place. They allow us to automate processes by making informed decisions using available data. However, sometimes this means replacing someone's job with an algorithm, which has ethical ramifications. In addition, if something goes wrong, who do we blame?

The most frequently discussed case is self-driving cars: how do we choose how the vehicle should react in the event of a fatal collision? In the future, will we have to select which ethical framework we want our self-driving car to follow when we buy it?

If my self-driving car kills someone on the road, whose fault is it?

Although these are fascinating questions, they are not the main purpose of this article. Clearly, however, machine learning cannot tell us anything about which normative values we should accept, that is, how we ought to act in the world in a given situation. As David Hume famously put it, one cannot derive an “ought” from an “is”.


Limitation 2-Deterministic problems

This is a limitation that I personally have had to deal with. My field of expertise is environmental science, which relies heavily on computational modeling and on sensors and IoT devices.

Machine learning is incredibly powerful for sensors: it can be used to help calibrate and correct a sensor when it is connected to other sensors that measure environmental variables such as temperature, pressure, and humidity. The correlations between the signals from these sensors can be used to develop self-calibration procedures, which is a hot research topic in the field of atmospheric chemistry.
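To make the calibration idea concrete, here is a minimal sketch in Python. It assumes a hypothetical low-cost sensor whose error grows linearly with temperature and which sits next to a trusted reference instrument. The data are synthetic, and the drift coefficients, noise level and function names are illustrative assumptions, not values from any real instrument. A simple least-squares fit stands in for the more elaborate models used in practice.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(predicted, truth):
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


# Synthetic co-located measurements (all values are made up for the demo).
rng = random.Random(0)
temperature = [rng.uniform(5.0, 35.0) for _ in range(200)]
reference = [rng.uniform(20.0, 80.0) for _ in range(200)]
# Assumed drift: the raw reading is off by 0.8 * T - 4.0 plus sensor noise.
raw = [ref + 0.8 * t - 4.0 + rng.gauss(0.0, 0.5)
       for ref, t in zip(reference, temperature)]

# Learn how the sensor error depends on temperature...
errors = [r - ref for r, ref in zip(raw, reference)]
slope, intercept = fit_line(temperature, errors)


# ...and subtract the predicted error from each raw reading.
def calibrate(raw_value, temp):
    return raw_value - (slope * temp + intercept)


corrected = [calibrate(r, t) for r, t in zip(raw, temperature)]
print("MAE before calibration:", round(mean_abs_error(raw, reference), 2))
print("MAE after calibration: ", round(mean_abs_error(corrected, reference), 2))
```

A real deployment would use several covariates (pressure, humidity) and would check the fit on data the model has not seen, but the principle is the same: the correlation between co-located signals is used to correct the sensor.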

However, things get more interesting when it comes to computational modeling.

Running computer models that simulate global weather, emissions from the planet, and the transport of these emissions is computationally expensive. In fact, it is so expensive that a research-level simulation can take weeks even on a supercomputer.

Good examples are MM5 and WRF, numerical weather prediction models that are used for climate research and for giving you the weather forecast on the morning news. Wondering what weather forecasters do all day? They run and study these models.

Running weather models is fine, but now that we have machine learning, can we use it to obtain our weather forecasts instead? Can we take data from satellites and weather stations and use a basic predictive algorithm to tell whether it will rain tomorrow?

The answer is, surprisingly, yes. If we know the air pressure around a region, the moisture content of the air, the wind speed, and the same variables at neighboring points, it becomes possible to train, for example, a neural network. But at what cost?

Using a neural network with a thousand inputs to determine whether it will rain in Boston tomorrow is possible. However, doing so misses the entire physics of the weather system.

Machine learning is stochastic, not deterministic.

A neural network does not understand Newton's second law, or that density cannot be negative: it has no physical constraints.

However, this may not be a long-term limitation. Many researchers are considering adding physical constraints to neural networks and other algorithms to use them for such purposes.
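The missing-constraint problem can be shown with a toy example. The sketch below uses a straight-line fit as a stand-in for a neural network, and an idealised exponential air-density profile (sea-level density of 1.225 kg/m^3 and a scale height of 8.5 km, both assumptions chosen for the demo) as the "physics". The unconstrained fit extrapolates to a negative density. Fitting the logarithm of density instead is one simple way to build positivity into the model. It is not the specific method any of the researchers mentioned above use.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative training data: air density (kg/m^3) between 0 and 10 km,
# generated from an idealised exponential profile (an assumption).
altitudes_km = [float(h) for h in range(0, 11)]
densities = [1.225 * math.exp(-h / 8.5) for h in altitudes_km]

# Unconstrained model: a straight line fitted directly to density.
a, b = fit_line(altitudes_km, densities)


def predict_unconstrained(h_km):
    return a * h_km + b


# Constrained model: fit a line to log(density) and exponentiate, so the
# prediction is positive by construction.
c, d = fit_line(altitudes_km, [math.log(rho) for rho in densities])


def predict_positive(h_km):
    return math.exp(c * h_km + d)


for h in (5.0, 40.0):
    print(f"{h:5.1f} km  unconstrained={predict_unconstrained(h):8.4f}  "
          f"positive-by-construction={predict_positive(h):8.4f}")
```

Inside the training range both models look reasonable. The difference only shows up when the model is asked about conditions it never saw, which is exactly where a physical constraint earns its keep.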


Limitation 3-Data

This is the most obvious limitation. If you feed a model poor data, it can only give you poor results. This can manifest itself in two ways: lack of data, and lack of good data.

Lack of data

Many machine learning algorithms require large amounts of data before they begin to give useful results. A good example is the neural network. Neural networks are data-hungry machines that require copious amounts of training data. The larger the architecture, the more data is needed to produce viable results. Reusing data is a bad idea, and data augmentation is useful to some extent, but having more data is always the preferred solution.

If you can get the data, use it.

Lack of good data

Despite appearances, this is not the same as the point above. Imagine that you think you can cheat by generating ten thousand fake data points to feed to your neural network. What happens when you put them in?

It will train itself, and then, when you test it on an unseen data set, it will not perform well. You had the data, but the quality of the data was not up to scratch.

In the same way, a lack of good features can cause your algorithm to perform poorly, and a lack of good ground truth data can also limit the capabilities of your model. No company is going to implement a machine learning model that performs worse than human-level error.

Similarly, a model trained on data from one situation may not apply as well to a second situation. The best example of this I have found so far is breast cancer prediction.

Mammography databases contain a lot of images, but they suffer from one problem that has caused significant issues in recent years: almost all of the X-rays are from white women. This may not sound like a big deal, but black women have been shown to be 42% more likely to die from breast cancer, due to a wide range of factors that may include differences in detection and access to healthcare. Training an algorithm primarily on white women therefore adversely affects black women in this case.

What is needed in this specific case is a larger number of X-rays of black patients in the training database, more features relevant to the cause of this 42% increased likelihood, and a more equitable algorithm obtained by stratifying the data set along the relevant axes.
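Stratifying a data set along an axis can be sketched as follows. The record layout, the "group" field and the 900/100 imbalance are hypothetical and chosen only for illustration. The function splits each group separately, so the test set contains every group in the same proportion as the full data set and per-group performance can be reported.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records so each group defined by `key` keeps the same test share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        # Keep at least one test example per group, however small the group.
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def group_counts(records, key):
    counts = defaultdict(int)
    for record in records:
        counts[record[key]] += 1
    return dict(counts)


# Hypothetical, imbalanced data set: 900 images from group "A", 100 from "B".
records = ([{"id": i, "group": "A"} for i in range(900)]
           + [{"id": 900 + i, "group": "B"} for i in range(100)])

train, test = stratified_split(records, key="group")
print("train:", group_counts(train, "group"))
print("test: ", group_counts(test, "group"))
```

Stratification makes the imbalance visible and measurable, but it does not create data that was never collected. The shortage of images from the under-represented group still has to be fixed at the source.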

If you are skeptical of this or would like to know more, I recommend reading this article.


Limitation 4-Misuse

Related to the second limitation discussed earlier, there is purported to be a “crisis of machine learning in academic research”, in which people blindly use machine learning to try to analyze systems that are either deterministic or stochastic in nature.

For the reasons discussed in Limitation 2, applying machine learning to a deterministic system will succeed, but the algorithm will not be learning the relationship between the two variables, and it will not know when it is violating the laws of physics. We simply gave the system some inputs and outputs and told it to learn the relationship. Like someone translating word for word out of a dictionary, the algorithm will only appear to have a facile grasp of the underlying physics.

For stochastic (random) systems, things are a little less obvious. The crisis of machine learning for stochastic systems manifests itself in two ways:

  • P-hacking
  • Scope of analysis

P-hacking

When you have access to large data sets, which may have hundreds, thousands, or even millions of variables, it is not too difficult to find a statistically significant result (given that the level of statistical significance required for most scientific research is p < 0.05). This often leads to spurious correlations being found, usually through p-hacking (looking through mountains of data until a correlation showing a statistically significant result is found). These are not true correlations; they are just responses to the noise in the measurements.

This has resulted in individuals “fishing” through large data sets for statistically significant correlations and passing them off as true correlations. Sometimes this is an innocent mistake (in which case the scientist should be better trained), but other times it is done to increase the number of papers a researcher has published. Even in academia, competition is fierce, and people will do anything to improve their metrics.
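How easily noise produces "significant" results can be checked with a small simulation. The sketch below generates a target and 1,000 candidate features that are all pure random noise, then counts how many features correlate with the target at p < 0.05. The sample size, the number of features and the function names are arbitrary choices for the demo. The critical value 2.048 is the two-sided 5% point of Student's t distribution with 28 degrees of freedom.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_samples=30, n_features=1000, t_critical=2.048, seed=0):
    """Count pure-noise features that look 'significant' against a noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        r = pearson_r(feature, target)
        # t statistic for a correlation coefficient, n_samples - 2 degrees of freedom
        t = r * math.sqrt((n_samples - 2) / (1.0 - r * r))
        if abs(t) > t_critical:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 1000 pure-noise features pass p < 0.05")
```

Roughly one feature in twenty is expected to pass the threshold by chance alone, so anyone who tests enough variables will always find something to report. Correcting for multiple comparisons, or confirming the result on fresh data, are the usual defences.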

Scope of analysis

The scope of the analysis is inherently different in machine learning than in statistical modeling: statistical modeling is confirmatory in nature, whereas machine learning is exploratory in nature.

We can think of confirmatory analysis and models as the kind of thing someone does in a Ph.D. program or in a research field. Imagine that you are working with an advisor and trying to develop a theoretical framework to study some real-world system. The system has a set of predefined features that influence it, and, after carefully designing experiments and developing hypotheses, you can run tests to determine the validity of your hypotheses.

Exploratory analysis, on the other hand, lacks many of the qualities associated with confirmatory analysis. In fact, with truly massive amounts of data and information, confirmatory approaches break down completely because of the sheer volume of data. In other words, it is simply not possible to carefully lay out a finite set of testable hypotheses in the presence of hundreds, let alone thousands or millions, of features.

Therefore, broadly speaking, machine learning algorithms and approaches are best suited to exploratory predictive modeling and classification with massive amounts of data and computationally complex features. Some will contend that they can be used on “small” data, but why would you do so when classic multivariate statistical methods are so much more informative?

ML is a field that, to a large extent, addresses problems derived from information technology, computer science, and so on; these can be both theoretical and applied problems. As such, it is related to fields such as physics, mathematics, probability, and statistics, but ML is really a field unto itself, unencumbered by the concerns raised in the other disciplines. Many of the solutions that ML experts and practitioners come up with are painfully mistaken... but they get the job done.


Limitation 5-Interpretability

Interpretability is one of the main problems with machine learning. An AI consulting firm pitching to a company that only uses traditional statistical methods can be stopped dead if the company does not see the model as interpretable. If you cannot convince your client that you understand how the algorithm came to its decision, how likely are they to trust you and your expertise?

As bluntly stated in “Business Data Mining - A Machine Learning Perspective”:

“A business manager is more likely to accept the [machine learning method] recommendations if the results are explained in business terms”

These models can be rendered powerless unless they can be interpreted, and the process of human interpretation follows rules that go well beyond technical prowess. For this reason, interpretability is a paramount quality that machine learning methods should aim to achieve if they are to be applied in practice.

The blossoming -omics sciences (genomics, proteomics, metabolomics, and the like), in particular, have become a main target for machine learning researchers precisely because they depend on large and non-trivial databases. However, despite their apparent success, their methods suffer from a lack of interpretability.


Summary and Peter Voss's list

Although it is undeniable that artificial intelligence has opened up a wealth of promising opportunities, it has also led to the emergence of a mindset that is best described as “AI solutionism”. This is the philosophy that, given enough data, machine learning algorithms can solve all of humanity's problems.

As I hope I have made clear in this article, there are limitations that, at least for the time being, prevent this from being the case. A neural network can never tell us how to be a good person and, at least for now, does not understand Newton's laws of motion or Einstein's theory of relativity. There are also fundamental limitations grounded in the underlying theory of machine learning, called computational learning theory, which are primarily statistical. We have also discussed issues related to the scope of the analysis and the dangers of p-hacking, which can lead to spurious conclusions. Finally, there are problems with the interpretability of results, which can hurt businesses that are unable to convince clients and investors that their methods are accurate and reliable.

Although I have covered some of the most important limitations of AI only very broadly in this article, I will finish with a list published in an article by Peter Voss in October 2016, which sets out the limitations of AI more comprehensively. Although current mainstream techniques can be very powerful in narrow domains, they typically have some or all of the constraints on his list, which I quote here:

  • Every narrow application requires special training
  • Need a lot of hand-crafted, structured training data
  • Learning must generally be supervised: training data must be labeled
  • Requires lengthy offline/batch training
  • Don't learn incrementally or interactively in real time
  • Poor transfer learning, module reusability and integration
  • Systems are opaque, making them very hard to debug
  • Performance cannot be audited or guaranteed at the "long tail"
  • They encode correlations, not causal or ontological relationships
  • Do not encode entities or spatial relationships between entities
  • Handle only very narrow aspects of natural language
  • Not suitable for advanced, symbolic reasoning or planning

Having said that, machine learning and artificial intelligence will continue to revolutionize industry and will only become more prevalent in the coming years. While I recommend that you make the most of machine learning and AI, I also recommend that you keep in mind the limitations of the tools you use. After all, nothing is perfect.

This article was reposted from Towards Data Science.