This article is reproduced from the WeChat public account 将门创投 (original link).
We are familiar with what artificial intelligence can do, but its weaknesses and limitations deserve just as much attention. A driverless car, for example, will encounter many real-world road situations that never appeared in its training. How to handle these cases, where reality and training do not match, has become a major problem facing researchers.
Recently, researchers at MIT and Microsoft developed a new model for identifying the "blind spots" of intelligent systems: situations where what an autonomous driving system learned in training does not match the real world. Engineers can use the model to identify these blind spots and improve the driving system's handling of special situations, raising the safety of the system as a whole.
The artificial intelligence systems of driverless cars are extensively trained in virtual simulation environments and on large data sets to cope with every situation that may occur on the road. But cars still sometimes make surprising mistakes in the real world, because for certain unanticipated events the car should respond correctly but does not.
Consider a driverless car trained only on a typical data set: it may be unable to distinguish a white van from an ambulance with flashing lights, because ambulances are missing from the training set (where such vehicles are usually just labeled as vans). When an ambulance passes with its siren on, the car does not recognize it and does not know that it should slow down and pull over, creating a series of unpredictable traffic hazards. The same problem arises on roads shared with police cars, fire engines, and even school buses. Add delivery scooters, bicycles weaving on either side, and pedestrians everywhere, and the driverless system simply cannot handle road conditions this complicated.
To solve this problem, the researchers proposed a new training method that trains and refines unmanned systems more deeply. First, as before, an artificial intelligence system is built through simulation training. Then, when the system runs in the real world, a human closely monitors its behavior; whenever the system makes, or is about to make, a mistake, the human intervenes promptly and provides feedback. The researchers then combine the original training data with the human feedback data and use machine learning techniques to produce a blind-spot recognition model that indicates precisely where the system needs human input to learn the correct behavior.
The researchers validated this approach with video games, in which a human corrects, in simulation, the path a game character learns. The next step is to combine the model with traditional training and testing methods for automated learning systems that require human feedback, such as autonomous vehicles and robots. The model helps an automated system better understand what it does not know: often the simulated training it received does not match events in the real world, and the system makes mistakes that can cause accidents. This model uses human input to bridge the gap between simulation and the real world in a safe way.
Some traditional training methods do provide human feedback during real-world tests, but only to update the system's actions; they do not identify the system's blind spots. The newly proposed model first trains the artificial intelligence system in simulation, where it learns a "policy" that maps each situation to the best action it can take. The system is then deployed in the real world, and whenever it behaves incorrectly, a human sends a corrective signal.
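The simulate-then-monitor loop above can be sketched in a few lines. Everything here is an illustrative assumption: the state names, actions, and the `human_feedback` mapping are hypothetical, not the authors' implementation.

```python
# A policy learned in simulation: maps each observed situation to an action.
# (Hypothetical states/actions for illustration only.)
sim_policy = {
    "clear_road": "maintain_speed",
    "truck_ahead": "maintain_speed",  # learned from truck-labeled training data
    "pedestrian": "brake",
}

def run_with_oversight(policy, observations, human_feedback):
    """Execute the simulated policy in the real world; record a signal
    whenever the human overseer's action disagrees with the policy's."""
    signals = []
    for state in observations:
        action = policy.get(state, "maintain_speed")
        # human_feedback maps state -> the action the human actually takes
        correct = human_feedback.get(state, action)
        signals.append((state, action == correct))  # True = acceptable
    return signals

# An ambulance looks like a truck to the policy, but the human pulls over:
feedback = {"truck_ahead": "maintain_speed", "ambulance": "pull_over"}
obs = ["clear_road", "truck_ahead", "ambulance"]
print(run_with_oversight(sim_policy, obs, feedback))
# -> [('clear_road', True), ('truck_ahead', True), ('ambulance', False)]
```

The key point is that the output is not a corrected action but a per-situation acceptability signal, which later stages aggregate into blind-spot estimates.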
Humans can provide this data in several ways, for example through "demonstrations" and "corrections". In a demonstration, the human acts as they would in the real world while the system observes and compares the human's actions with what it would have done in the same situation. In the driverless-car case, if the car's planned route deviates from what the human intends, the human takes manual control and the system logs a signal. By observing which of its behaviors match or diverge from the human's, the system learns which actions are acceptable and which are not.
Alternatively, the human can correct the system while monitoring it at work in the real world. A driver sits in the driver's seat while the self-driving car follows its planned route. If the car drives correctly, the human does not intervene; if it does not, the human retakes control, and the system records a signal that its action in that particular situation was improper.
Once the human feedback has been gathered, the system compiles a database of situations. A single situation can receive many different signals, so each situation may carry multiple labels, some marking its behavior acceptable and others unacceptable. For example, a self-driving car may have driven past large trucks many times without slowing down or stopping, and this was accepted. But when an ambulance that looks exactly like one of those trucks approaches and the car again fails to slow down or pull aside, it receives a feedback signal that its behavior was inappropriate.
At this point the system holds signals from humans that appear contradictory: sometimes driving past a large truck without slowing is fine, while in the seemingly identical situation, with the truck replaced by an ambulance, the same action is wrong. The system notices that it erred but still does not know why. Having collected all these apparently contradictory signals, the next step is to integrate the information and ask: given these mixed signals, how likely is the system to make a mistake the next time it encounters this situation?
The ultimate goal of the new model is to flag these ambiguous situations as blind spots. But this is not simply a matter of counting the acceptable and unacceptable actions that occur in each situation. If, say, the system acts correctly nine times out of ten when it encounters an ambulance, a simple count would mark the situation as a non-blind spot. Because inappropriate actions are far rarer than appropriate ones, such a system would eventually learn to predict that no situation is a blind spot, which would be extremely dangerous in practice.
To address this, the researchers used the Dawid-Skene method, a machine learning algorithm commonly used to handle label noise in crowdsourced data. The algorithm takes as input the database of situations, each carrying a set of noisy "acceptable" and "unacceptable" labels. It aggregates all of this data and uses probabilistic calculations to identify label patterns that predict blind spots and label patterns that predict non-blind spots. Using this information, it outputs, for each situation, an aggregated "blind spot" or "non-blind spot" label together with a confidence level. Notably, even when acceptable actions were taken in 90% of the recorded cases, the algorithm can still learn to flag the rare unacceptable situations as blind spots. Finally, the algorithm produces a "heat map" in which every situation from the system's original training is ranked by blind-spot probability, from low to high.
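The aggregation step can be sketched as a simplified, single-annotator variant of Dawid-Skene expectation-maximization over binary labels. The function name, the initial noise rates, and the toy data below are all assumptions for illustration; the paper's actual formulation is richer.

```python
def dawid_skene_binary(counts, iters=50):
    """Simplified one-annotator Dawid-Skene EM.
    counts: list of (n_acceptable, n_unacceptable) signals per situation.
    Returns the estimated P(blind spot) for each situation."""
    prior = 0.5        # P(a situation is a blind spot), re-estimated below
    p_un_blind = 0.9   # P("unacceptable" signal | blind spot), initial guess
    p_un_safe = 0.1    # P("unacceptable" signal | non-blind spot), initial guess
    post = [0.5] * len(counts)
    for _ in range(iters):
        # E-step: posterior blind-spot probability per situation
        for i, (na, nu) in enumerate(counts):
            l_blind = prior * (p_un_blind ** nu) * ((1 - p_un_blind) ** na)
            l_safe = (1 - prior) * (p_un_safe ** nu) * ((1 - p_un_safe) ** na)
            post[i] = l_blind / (l_blind + l_safe)
        # M-step: re-estimate the prior and the two label-noise rates
        prior = sum(post) / len(post)
        num_b = sum(p * nu for p, (na, nu) in zip(post, counts))
        den_b = sum(p * (na + nu) for p, (na, nu) in zip(post, counts))
        num_s = sum((1 - p) * nu for p, (na, nu) in zip(post, counts))
        den_s = sum((1 - p) * (na + nu) for p, (na, nu) in zip(post, counts))
        p_un_blind = min(max(num_b / den_b, 1e-6), 1 - 1e-6)
        p_un_safe = min(max(num_s / den_s, 1e-6), 1 - 1e-6)
    return post

# Ten situations: nine are always handled acceptably; the last (the
# "ambulance" case) draws mostly "unacceptable" signals.
data = [(10, 0)] * 9 + [(3, 7)]
probs = dawid_skene_binary(data)
heat_map = sorted(range(len(probs)), key=lambda i: probs[i])  # low -> high
```

Unlike the majority vote, the EM posterior separates the rare mixed-signal situation from the consistently safe ones, and sorting situations by posterior gives exactly the low-to-high "heat map" described above.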
When the system is deployed in the real world, it can use this learned model to act more cautiously and intelligently: if the model predicts with high probability that the current state is a blind spot, the system can consult a human about how to respond, and thus act more safely.
For more detailed technical information, see the researchers' recent AAAI-19 paper: