Liu Wei, Senior R&D Engineer at Baidu NLP and technical lead for reading comprehension and question answering. This article is based on the author's special report in the automatic question answering session of the 2019 Natural Language Processing Frontier Forum on May 26, 2019.
This report is divided into four sections:
· What is machine reading comprehension?
· Progress in reading comprehension technology
· More challenges in reading comprehension
· Baidu's research on reading comprehension technology
1. What is machine reading comprehension?
Machine reading comprehension should be familiar to everyone. The language exams we have all taken include reading comprehension questions, which ask the test-taker to read a given article and then answer questions about it. Replace the test-taker with a machine, and you have machine reading comprehension.
Reading comprehension questions come in many forms, including multiple-choice and free-response questions. But in mainstream academic research and in deployed technology, we are more concerned with extractive reading comprehension. In extractive reading comprehension, we are given a passage P and a question Q and want to extract an answer A from P, where A is usually a contiguous span of P. The picture below shows a passage about the Shangri-La Hotel; if we ask who the owner of the Shangri-La Hotel is, we hope the machine can extract the answer from the passage P.
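To make the extractive setting concrete, here is a minimal, hypothetical sketch of span selection: a model scores every token as a potential answer start and end, and the highest-scoring span (with start ≤ end) is returned. The logits below are hand-made for illustration; real systems produce them with a neural encoder.

```python
import numpy as np

def extract_answer(start_logits, end_logits, tokens, max_len=10):
    """Return the highest-scoring span (start <= end, bounded length)."""
    best = (float("-inf"), 0, 0)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(tokens))):
            score = s + end_logits[j]
            if score > best[0]:
                best = (score, i, j)
    _, i, j = best
    return " ".join(tokens[i:j + 1])

# Toy passage and made-up logits: the model "points at" the last token.
tokens = "the capital of France is Paris".split()
start = np.array([0.0, 0, 0, 0, 0, 5])
end = np.array([0.0, 0, 0, 0, 0, 5])
print(extract_answer(start, end, tokens))  # Paris
```

In practice the start and end distributions come from the final layer of the reading comprehension model, and the length bound keeps the search over spans tractable.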
What is the significance of studying machine reading comprehension? In terms of application value, machine reading comprehension can solve the last mile of traditional retrieval-based question answering: precisely locating the answer. In traditional retrieval-based QA, after the user inputs a question, a number of candidate documents are retrieved from a large document collection, split into paragraphs and ranked, and whole paragraphs are finally returned to the user as the answer. But such paragraphs usually contain a lot of redundant information: on small-screen devices such as smartphones, much of the screen is wasted, and on screenless devices such as smart speakers, a lot of redundant content is read aloud. We therefore hope to use reading comprehension technology to locate answers precisely.
Thanks to the rapid progress of reading comprehension technology over the past two years, Baidu has deployed it in Baidu Search's question answering. If you ask the question above in the Baidu app, i.e., who the owner of the Shangri-La Hotel is, it can return an answer directly. Beyond such entity or numeric answers, reading comprehension can also help locate longer answers, for questions like "How did the Tang Dynasty fall?" or "How do you fry fish without it sticking?"
2. Advances in reading comprehension technology
The left side of the figure below shows the best models on Stanford's leaderboard during the roughly two years after the SQuAD dataset was released; their F1 has improved by about 80%. The right side shows the DuReader dataset released by Baidu, where the best system's ROUGE-L has improved by 75%. Technical progress has clearly been very fast, for two main reasons: the growth in dataset size, and the application of deep learning.
First, the change in data scale. Before 2016, the larger datasets, such as Microsoft's MCTest, contained only about 2,600 questions. The SQuAD dataset, annotated through crowdsourcing, contains 100,000 questions, an increase of two orders of magnitude. After that, Microsoft's MS MARCO and Baidu's DuReader grew further, to 1,000,000 and 300,000 questions respectively.
The growth in data scale also allowed deep learning methods to improve rapidly on reading comprehension tasks. Before 2016, statistical learning methods dominated, with a great deal of feature engineering that was very time-consuming and labor-intensive. After the release of SQuAD in 2016, attention-based matching models such as BiDAF and Match-LSTM appeared. After that came various models with complex network structures, which tried to capture the matching relationship between question and text through ever more elaborate architectures. Although complicated feature engineering was skipped at this stage, we seemed to get stuck in equally complicated network-structure engineering instead.
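The core of the attention-based matching idea can be sketched in simplified form (random vectors stand in for learned encodings; this is an illustration of the mechanism, not any specific model's code): a similarity matrix between passage and question tokens is normalized and used to mix question information into each passage token.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_to_query_attention(P, Q):
    """P: (m, d) passage token vectors; Q: (n, d) question token vectors.
    Returns question-aware passage vectors of shape (m, d)."""
    S = P @ Q.T            # (m, n) token-pair similarity matrix
    A = softmax(S, axis=1)  # each passage token attends over question tokens
    return A @ Q            # mix question information into every passage token

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(6, 4)), rng.normal(size=(3, 4))
print(context_to_query_attention(P, Q).shape)  # (6, 4)
```

Models like BiDAF build on this with attention in both directions and richer fusion, but the similarity-then-softmax-then-mix pattern is the common core.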
After 2018, with the emergence of various pre-trained language models, the performance of reading comprehension models improved greatly: the representation layer became very powerful, and the task-specific network structure became much simpler.
3. More challenges in reading comprehension
After pre-trained language models appeared, some simple reading comprehension datasets, such as SQuAD 1.1, were largely solved, and more datasets targeting different challenges of language understanding were proposed. These include SQuAD 2.0, which introduces unanswerable questions; multi-document datasets for search scenarios, including Microsoft's MS MARCO and Baidu's DuReader; CoQA and QuAC for dialogue scenarios; the cross-document reasoning dataset HotpotQA; and, very popular in the last two years, datasets that require external knowledge, such as ReCoRD and CommonsenseQA.
4. Baidu's research on reading comprehension technology
I will focus on two pieces of Baidu research: V-NET, our work on multi-document reading comprehension published at ACL last year, and KT-NET, our work on introducing external knowledge into reading comprehension, published at ACL this year.
First, let's introduce the multi-document reading comprehension work for search scenarios. The multi-document task differs from single-document reading comprehension in two ways:
First, the questions are real questions from the search scenario;
Second, each question comes with multiple candidate documents.
These characteristics pose challenges for language understanding, because the multiple candidate documents for each question may contain ambiguous and confusing information. In the example below, the question asks about the difference between mixed culture and pure culture of cells. The keyword "culture" usually refers to culture in the societal sense, but in this question it refers to the cultivation of cells. Among the candidate documents, some use the word in its societal sense and some in the cell-culture sense, which poses a real challenge for the reading comprehension model's predictions.
Despite these challenges, careful observation shows that some candidate documents contain only part of the information related to the correct answer. If we extract a candidate answer from each document and let these answers verify or vote for one another, that may help us locate the correct answer more accurately.
Based on this idea, we proposed V-NET, a model for multi-document reading comprehension. V-NET's main innovation is to add attention-based answer verification on top of BiDAF. As the figure below shows, the first three layers use BiDAF to extract an answer from each document. The fourth layer then computes a representation for each extracted answer. In the final layer, the answers verify one another under this representation, so that the correct answer can be located more accurately. The last three layers each have their own objective, so we can further introduce joint training.
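The cross-document verification idea can be illustrated with a toy sketch (this is not the actual V-NET code; the vectors and the scoring combination are made up for illustration): each candidate answer attends to the candidates extracted from the other documents, and agreement with that consensus raises its final score.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_verify(A, ext_scores):
    """A: (n, d) one candidate-answer vector per document.
    ext_scores: (n,) per-document extraction scores.
    Combines extraction score with cross-answer agreement."""
    sim = A @ A.T
    np.fill_diagonal(sim, -np.inf)        # a candidate cannot verify itself
    w = softmax(sim, axis=1)              # attend over the *other* candidates
    consensus = w @ A                     # what the other documents agree on
    verify = (A * consensus).sum(axis=1)  # agreement with that consensus
    return ext_scores + verify

# Two documents yield similar answers; a third yields an outlier.
A = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
scores = cross_verify(A, np.zeros(3))
print(scores.argmin())  # 2 -> the outlier answer gets the lowest score
```

The point of the sketch is the mechanism: mutually consistent answers reinforce one another, while an answer unsupported by the other documents is pushed down.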
The figure below shows the experimental results. In experiments on MS MARCO, V-Net outperformed R-Net and S-Net, and the ablations show that answer verification, joint training, and the answer representation each give the model positive gains. The model also took first place on the MS MARCO leaderboard three times last year.
The second piece of work, reading comprehension with external knowledge, is a paper of ours accepted at ACL this year. Knowledge-enhanced reading comprehension means that, while reading, the model should not rely only on the document's content but should also draw on external knowledge as support, so that it can answer questions correctly, or at least better.
For example, the question on the left of the figure below asks who starred in the work "On the Road". A model based purely on textual representations answers "Li Wei". We can observe that the model captures the answer type well, but when a passage contains multiple candidate answers of the same type, it makes mistakes very easily: here it does not extract the correct answer, "Xu Wei", but treats "Li Wei" as correct. If we could obtain some external knowledge from a knowledge base, for example that Xu Wei is an actor, the model could very likely judge the answer correctly from the information relating actors to their works.
Based on this idea, we proposed KT-NET, a model that fuses text representations with knowledge representations, where K stands for Knowledge and T for Text. In KT-NET, the first step uses a pre-trained language model to encode each word in the question and the document. The second step uses traditional methods to pre-train representations of the relations and entities in the knowledge base. The third step retrieves knowledge related to the text from the knowledge base, together with its pre-trained representations. Because there are many candidate pieces of knowledge, we further use an attention mechanism to fuse the most relevant knowledge with the text representation, obtaining a knowledge-enhanced text representation. Finally, KT-NET predicts answers on top of this knowledge-enhanced representation.
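The fusion step can be sketched in simplified form (this is an illustration of the general mechanism, not KT-NET's actual architecture; for simplicity it assumes the knowledge-base embeddings are already projected to the token dimension, and all vectors are random stand-ins):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_knowledge(h, K):
    """h: (d,) contextual vector of one token (e.g. from a pre-trained LM).
    K: (k, d) pre-trained embeddings of KB entries retrieved for that token.
    Returns a knowledge-enriched token representation."""
    a = softmax(K @ h)                # attend to the most relevant knowledge
    k = a @ K                         # attention-weighted knowledge summary
    return np.concatenate([h, k])     # fuse text and knowledge views

rng = np.random.default_rng(1)
h, K = rng.normal(size=4), rng.normal(size=(5, 4))
print(fuse_knowledge(h, K).shape)  # (8,)
```

The attention step matters because many retrieved KB entries are irrelevant for a given token; the softmax lets the model pick out the few that actually help before fusing.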
We experimented on two datasets, ReCoRD and SQuAD, and found that KT-NET achieves better results than BERT-large on both. The code for this paper will be open-sourced on GitHub as part of PaddleNLP.
Beyond continued investment in the technology itself, we also hope to use Baidu's data advantages to advance Chinese machine reading comprehension. Last year we therefore released DuReader 2.0, a Chinese reading comprehension dataset for search scenarios. Compared with SQuAD, DuReader has four main characteristics. First, the questions in DuReader all come from real search queries. Second, the documents are real documents from across the web. Third, DuReader is currently the largest Chinese reading comprehension dataset. Fourth, the dataset contains rich answer and question-type annotations: in addition to the entity, numeric, and description questions found in SQuAD-style data, DuReader also includes opinion and yes/no questions. The dataset can currently be downloaded from the Baidu Brain open data platform.
Our original intention was to advance Chinese reading comprehension technology, so over the past two years we have jointly organized reading comprehension evaluation tasks with the Chinese Information Processing Society of China and the China Computer Federation. In last year's evaluation, more than 1,000 teams registered and more than 1,500 results were submitted. This year's competition continues and has attracted more than 2,000 teams. This year we also provide a baseline system based on PaddlePaddle, and all contestants can use free GPU computing resources on Baidu AI Studio to train their own models.
Having covered Baidu's research on machine reading comprehension technology and some of its applications, I finally want to look at reading comprehension technology from the perspective of industrial application requirements.
(1) Requirements for model robustness in industrial applications
The deep learning models in use today have many robustness problems, including over-stability and over-sensitivity. Over-stability means that the semantics of the question change but the model's predicted answer does not. Over-sensitivity means that the surface form of the question changes while the semantics do not, yet the predicted answer changes; often simply appending a question mark changes the answer. Neither behavior is what we want, and in applications both are very damaging to the user experience. In the past two years, the academic community has tried to address model robustness through adversarial example generation and question paraphrasing.
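A minimal harness for probing over-sensitivity might look like the sketch below (the `predict` interface and the deliberately broken toy model are both hypothetical, standing in for a real reader): meaning-preserving paraphrases of a question should not change the answer.

```python
def robustness_report(predict, question, paraphrases, passage):
    """predict(question, passage) -> answer string.
    Returns the base answer and the paraphrases that flipped it
    (meaning-preserving rewrites should keep the answer stable)."""
    base = predict(question, passage)
    flipped = [q for q in paraphrases if predict(q, passage) != base]
    return base, flipped

# Toy model whose answer depends on a trailing question mark -> over-sensitive.
toy = lambda q, p: p.split()[-1] if q.endswith("?") else p.split()[0]

base, flipped = robustness_report(toy, "who owns it?", ["who owns it"], "A B")
print(flipped)  # ['who owns it'] -> a flagged over-sensitivity case
```

The symmetric check for over-stability would use paraphrases that *do* change the meaning and flag the ones whose answer stays the same.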
(2) Requirements for model generalization in industrial applications
In the past two years we have seen a lot of news about some institution surpassing humans on some dataset. But if we think about it, has the task really been solved? Not really: we can only say we have solved a particular dataset well, not the underlying task. In industrial applications, for example, if I train a model on an encyclopedia dataset and then apply it to a community Q&A dataset, we would like to get good results by introducing only a small amount of labeled data, that is, to transfer to a new domain cheaply. We will invest more resources in this area, hoping to genuinely improve model generalization so that these models achieve better results in industrial applications.
Finally, to summarize Baidu's technical work on reading comprehension: V-NET for multi-document reading comprehension in search scenarios, KT-NET for fusing knowledge representations with text representations, and DuReader, a large-scale Chinese reading comprehension dataset.
This concludes "Baidu Reading Comprehension Technology Research and Application" from the automatic question answering session of the 2019 Natural Language Processing Frontier Forum. The next issue will share a new topic, so stay tuned.
Under the mission of "Understanding Language, Having Intelligence, Changing the World", Baidu Natural Language Processing (NLP) develops core natural language processing technologies and creates leading technology platforms and innovative products, serving global users and making the complex world simpler.
This article is reproduced from the WeChat public account Baidu NLP.