This article is reproduced from the public account AI Technology Review.

AI Technology Review: At present, most people's experience of chatbots is still at the stage of teasing Microsoft Xiaoice (Xiao Bing), and it is easy to notice that Xiaoice pays little attention to how one turn of the conversation relates to the next. Moreover, the common view is that chatbots cannot really understand what humans are saying, and cannot discuss matters with humans and work toward a shared goal.

However, the chatbots that Facebook's artificial intelligence research organization FAIR has just open-sourced and published can negotiate and bargain with humans. Through supervised learning plus reinforcement learning, these chatbots can not only learn the correspondence between words and meaning, but also form strategies for their own goals and reach agreement with others.

The following is a detailed introduction compiled by AI Technology Review from the FAIR article.

Every day, we negotiate with other people almost without noticing. We discuss which TV channel to watch, persuade the children at home to eat their vegetables, or bargain when buying things. What these situations have in common is that they all require complex communication and reasoning skills that are rarely seen in computers.

So far, chatbot research has produced systems that can hold short conversations and complete simple tasks such as booking a restaurant. But it is still difficult for a bot to hold a meaningful dialogue with a person, because that requires the bot to combine its understanding of the dialogue with its knowledge of the world, and then generate a sentence that helps it reach its goal.

Today, FAIR researchers have open-sourced and published chatbots with a new capability: negotiation.

People with different goals come into conflict and then, through negotiation, reach a compromise that both sides accept. Researchers have now shown that chatbots can do the same. Chatbots with different goals (implemented as end-to-end trained neural networks) can reach joint decisions or common goals with other chatbots or with humans in one-on-one negotiation.

Task: multi-issue bargaining

FAIR researchers studied negotiation on a multi-issue bargaining task. Two agents are shown the same set of objects (for example, 2 books, 1 hat, and 3 basketballs) and must negotiate how to divide those objects between themselves.

Each agent has its own value function, which represents how much the agent cares about each type of object (for example, each basketball might be worth 3 points to agent 1). Just as in real life, neither agent knows the other agent's value function exactly and can only infer it from the dialogue (if the other party says they want the basketballs, then basketballs are probably worth more to them).

FAIR researchers designed many negotiation scenarios of this kind, always constructed so that the two agents cannot both get their best possible deal at the same time. In addition, if the negotiation is rejected (or if no agreement is reached after 10 rounds of dialogue), both agents receive 0 points. Simply put, negotiating is essential, and negotiating a better deal yields a higher score.
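
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `score` and `split_is_valid` are hypothetical, and the value tables are made-up numbers; only the object pool and the "0 points without a deal" rule come from the text.

```python
from typing import Dict, Optional

Items = Dict[str, int]


def score(values: Items, allocation: Optional[Items]) -> int:
    """Points an agent earns: sum of value * count, or 0 if there is no deal."""
    if allocation is None:  # negotiation rejected, or the round limit was hit
        return 0
    return sum(values[item] * count for item, count in allocation.items())


def split_is_valid(pool: Items, mine: Items, theirs: Items) -> bool:
    """Every object in the pool must go to exactly one of the two agents."""
    return all(mine.get(k, 0) + theirs.get(k, 0) == n for k, n in pool.items())


# Object pool from the article's example.
pool = {"book": 2, "hat": 1, "basketball": 3}

# Hypothetical private value functions (assumed numbers, for illustration only).
values_a = {"book": 0, "hat": 1, "basketball": 3}  # agent A cares about basketballs
values_b = {"book": 4, "hat": 2, "basketball": 0}  # agent B cares about books

# One possible agreement: A takes the basketballs, B takes the books and the hat.
deal_a = {"book": 0, "hat": 0, "basketball": 3}
deal_b = {"book": 2, "hat": 1, "basketball": 0}

assert split_is_valid(pool, deal_a, deal_b)
print(score(values_a, deal_a))  # 9
print(score(values_b, deal_b))  # 10
print(score(values_a, None))    # 0, because no agreement was reached
```

In this toy setup both agents want the single hat, so they cannot both obtain their maximum score, which mirrors the conflict the task is designed to create.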

Dialogue Deduction (Dialog Rollouts)

Negotiation is both a linguistic problem and a reasoning problem. Participants must first form an intention and then express it in words. Such conversations contain elements of both cooperation and competition, which requires agents to understand and form long-term plans, and then to generate utterances that serve their goals.

To build dialogue agents with this kind of long-term planning ability, FAIR researchers introduced a core technical innovation that they call "dialog rollouts".

If a chatbot can build a mental model of its interlocutor and "think ahead", anticipating the directions a conversation may take, it can steer away from uninformative, confusing, or frustrating exchanges and toward successful ones.

Specifically, FAIR developed dialog rollouts as a new technique in which an agent simulates a future conversation all the way to its end, so that it can choose the utterance with the highest expected future reward.

Similar approaches have been used in game environments, but this is the first time one has been applied to language, where the number of possible actions is much larger. To improve efficiency, the researchers first generate a small set of candidate utterances. Then, to estimate how successful each candidate is likely to be, they repeatedly simulate the complete remainder of the dialogue for each one. The model's predictions are accurate enough that this technique significantly improves negotiation in the following ways:

  • Negotiating harder: The new agents hold longer conversations with humans and are in turn slower to accept a deal. Humans sometimes walk away without an agreement, whereas the model in this experiment keeps negotiating until it reaches a successful outcome.
  • Intelligent maneuvers: In some cases the agent initially feigns interest in an item that is of little value to it, so that it can later "compromise" by conceding that item. This is a negotiating tactic that people often use. The behavior was not designed by the researchers; the agent discovered it on its own while trying to achieve its goals.
  • Producing novel sentences: Although neural network models tend to repeat sentences from the training data, the study shows that the model can generate sentences of its own when necessary.
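
The candidate-then-simulate procedure described above can be sketched as follows. This is a minimal illustration, not FAIR's implementation: `rollout_choose`, `toy_simulate`, and the `TOY_OUTCOMES` table are hypothetical names and numbers. In the real system the simulation would sample the rest of the dialogue from the trained model, whereas here a toy table of agreement probabilities stands in for it.

```python
import random
from typing import Callable, Dict, List, Tuple


def rollout_choose(
    candidates: List[str],
    simulate: Callable[[str, random.Random], float],
    num_rollouts: int,
    rng: random.Random,
) -> str:
    """Return the candidate with the highest average simulated final reward."""

    def expected_reward(utterance: str) -> float:
        # Simulate the rest of the dialogue several times and average the reward.
        total = sum(simulate(utterance, rng) for _ in range(num_rollouts))
        return total / num_rollouts

    return max(candidates, key=expected_reward)


# Toy stand-in for the learned dialogue model (assumed numbers):
# utterance -> (probability the partner agrees, reward if they agree).
TOY_OUTCOMES: Dict[str, Tuple[float, float]] = {
    "I want everything": (0.2, 10.0),
    "I take the basketballs, you take the rest": (0.9, 9.0),
    "You can have it all": (1.0, 0.0),
}


def toy_simulate(utterance: str, rng: random.Random) -> float:
    """Play one imaginary dialogue to the end and return the final reward."""
    p_agree, reward = TOY_OUTCOMES[utterance]
    return reward if rng.random() < p_agree else 0.0


best = rollout_choose(list(TOY_OUTCOMES), toy_simulate, num_rollouts=200,
                      rng=random.Random(0))
print(best)
```

With these toy numbers the compromise offer has the highest expected reward (0.9 × 9 = 8.1, versus 0.2 × 10 = 2.0 for the greedy offer and 0 for giving everything away), so averaging over enough rollouts favors it over the greedy opening.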

Building and evaluating a negotiation dataset

To train negotiating agents and carry out large-scale quantitative evaluation, the FAIR team used crowdsourcing to collect a dataset of negotiation conversations between people. Each participant was shown a set of objects and a value for each one, and then discussed with a partner how to divide the objects between them. The researchers then used these conversations to train a recurrent neural network (RNN) that negotiates by imitating human behavior. At any point in a conversation, the model tries to guess what a human would say in that situation.

Unlike previous work on goal-oriented dialogue, the models were trained entirely "end to end" from human language and decisions, which means the approach can easily be adapted to other tasks.

To go beyond simply imitating humans, FAIR researchers then had the model optimize for achieving the goals of the negotiation. To do this, they had the model negotiate against itself for thousands of rounds and used reinforcement learning to reward it when it reached a good outcome. To prevent the algorithm from developing a language of its own, the model was trained at the same time to produce humanlike language.

To evaluate the negotiating agents, FAIR had them chat online with humans. Most previous work either avoided conversations with real people or studied less challenging domains, because handling the variety of human language requires training complex models.

Interestingly, in the FAIR experiments, most people did not realize that they were talking to a bot rather than a real person, which indicates that the bots had learned to hold fluent English conversations in this domain. FAIR's best negotiating agent, which uses reinforcement learning and dialog rollouts, performs on par with human negotiators. It achieved better deals about as often as worse deals, which shows that FAIR's chatbots not only speak English but also think intelligently about what to say.

Reinforcement learning for chat bots

Supervised learning can imitate the actions of human users, but it does not explicitly try to achieve the agent's goals. The FAIR team took a different approach: they first pre-trained the model with supervised learning and then fine-tuned it against the evaluation metric with reinforcement learning. In effect, supervised learning teaches the model how to map between language and meaning, and reinforcement learning helps it decide what to say.

In reinforcement learning, the agent tries to optimize its parameters based on its conversations with another agent. That other agent could be a human, but FAIR instead uses a supervised model trained to imitate humans. This second model is kept fixed, because the researchers found that if the parameters of both agents are updated, their dialogue drifts away from human language and evolves into a negotiation language of their own. At the end of each conversation, the agent receives a reward based on the deal it reached. This reward is back-propagated through every word the agent output during the dialogue, using policy gradients, so that the agent becomes more likely to choose actions that lead to higher rewards.
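
The policy-gradient idea in the last two sentences can be illustrated with a deliberately simplified sketch. It is not the FAIR training code: it assumes a policy that picks one of a few whole "strategies" per episode instead of generating words, a fixed reward per strategy in place of a dialogue with the fixed partner model, and a running-mean baseline. All names (`softmax`, `reinforce_update`, `train`) are hypothetical.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: List[float], action: int, reward: float,
                     baseline: float, lr: float) -> List[float]:
    """One REINFORCE step: logits += lr * (reward - baseline) * grad log pi(action)."""
    probs = softmax(logits)
    advantage = reward - baseline
    # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i.
    return [
        x + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def train(rewards: List[float], episodes: int = 2000, lr: float = 0.1,
          seed: int = 0) -> List[float]:
    """Sample an action each episode, observe its reward, and update the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for t in range(1, episodes + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]  # stands in for the score at the end of a dialogue
        logits = reinforce_update(logits, action, reward, baseline, lr)
        baseline += (reward - baseline) / t  # running mean of observed rewards
    return softmax(logits)


# Assumed rewards for three hypothetical negotiation strategies.
final_probs = train([0.0, 0.5, 1.0])
print(final_probs)
```

A single update with a reward above the baseline raises the probability of the sampled action, and one below the baseline lowers it, which is the mechanism by which higher-reward actions become more likely. In the setup the article describes, the same signal is applied to every word the agent produced, while the partner model's parameters stay frozen.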

Looking ahead

This is a groundbreaking study for Facebook. For the research community and for bot developers, it is a significant step toward building chatbots that can reason, converse, and negotiate, all of which are important parts of a personalized digital assistant.

FAIR researchers hope to keep discussing their results with other researchers and analyzing the problems they are trying to solve. They also hope that more talented people will contribute their ideas and effort to advance this field.

via Deal or no deal? Training AI bots to negotiate; compiled by AI Technology Review