Large pre-trained language models are undoubtedly the main trend in the latest natural language processing (NLP) research.

Although many AI experts agree with Anna Rogers's statement that getting state-of-the-art results with more data and computing power is not research news, other NLP opinion leaders also see some positive signs in the current trend. For example, Sebastian Ruder, a research scientist at DeepMind, points out that these large language frameworks help us see the fundamental limitations of the current paradigm.

With Transformers dominating the NLP leaderboards, it is often difficult to follow which modifications allow a new large language model to set yet another state of the art. To help you stay up to date with the latest NLP breakthroughs, we have summarized research papers covering the current leaders of the GLUE benchmark: XLNet from Carnegie Mellon University, ERNIE 2.0 from Baidu, and RoBERTa from Facebook AI.


If you want to skip ahead, here are the papers we recommend:

  1. XLNet: Generalized Autoregressive Pretraining for Language Understanding
  2. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
  3. RoBERTa: A Robustly Optimized BERT Pretraining Approach

Large language frameworks

1. XLNet: Generalized Autoregressive Pretraining for Language Understanding, by ZHILIN YANG, ZIHANG DAI, YIMING YANG, JAIME CARBONELL, RUSLAN SALAKHUTDINOV, QUOC V. LE

Original abstract

With the capability of modeling bidirectional contexts, denoising autoencoding based pre-training like BERT achieves better performance than pre-training approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects the dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pre-training method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pre-training. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin.

Our summary

Researchers from Carnegie Mellon University and Google have developed a new model, XLNet, for natural language processing (NLP) tasks such as reading comprehension, text classification, sentiment analysis, and more. XLNet is a generalized autoregressive pre-training method that leverages the strengths of autoregressive language modeling (e.g., Transformer-XL) and autoencoding (e.g., BERT) while avoiding their limitations. Experiments show that the new model outperforms both BERT and Transformer-XL and achieves state-of-the-art performance on 18 NLP tasks.

What is the core idea of this paper?

  • XLNet combines BERT's bidirectional context modeling with Transformer-XL's autoregressive approach:
    • Like BERT, XLNet uses bidirectional context, which means it looks at the words before and after a given token to predict what it should be. To this end, XLNet maximizes the expected log-likelihood of a sequence over all possible permutations of the factorization order (the objective is written out after this list).
    • As an autoregressive language model, XLNet does not rely on corrupting the input, and thus avoids BERT's limitations due to masking, namely the pretrain-finetune discrepancy and the assumption that the masked tokens are independent of each other.
  • To further improve the architectural design for pre-training, XLNet integrates Transformer-XL's segment recurrence mechanism and relative positional encoding scheme.
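
For reference, the permutation language modeling objective mentioned above can be written as follows (notation follows the XLNet paper: Z_T is the set of all permutations of the index sequence [1, ..., T], z_t is the t-th element of a permutation z, and z_<t its first t-1 elements):

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
```

In expectation over factorization orders, every token eventually conditions on every other token in the sequence, which is how the model learns bidirectional context while remaining autoregressive.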

What are the key achievements?

  • XLNet outperforms BERT on 20 tasks, often by a large margin.
  • The new model achieves state-of-the-art performance on 18 NLP tasks, including question answering, natural language inference, sentiment analysis, and document ranking.

What does the AI community think?

What are the future research areas?

  • Extend XLNet to new areas such as computer vision and reinforcement learning.

What are possible commercial applications?

  • XLNet can help companies solve a variety of NLP problems, including:
    • Chatbots for first-line customer support or answering product inquiries;
    • Sentiment analysis for gauging brand awareness and perception based on customer reviews and social media (see the sketch after this list);
    • Searching for relevant information in document bases or online.
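
As an illustration of the sentiment analysis use case, here is a minimal sketch assuming the Hugging Face transformers library and the publicly released xlnet-base-cased checkpoint; it is not part of the paper, and the classification head below is randomly initialized, so it would still need fine-tuning on labeled review data:

```python
# Minimal sketch: a pre-trained XLNet encoder wired up for two-class
# sentiment classification with Hugging Face `transformers`.
# The classification head is freshly initialized and untrained here.
import torch
from transformers import XLNetTokenizer, XLNetForSequenceClassification

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=2  # e.g., negative / positive
)

inputs = tokenizer("The support chat resolved my issue quickly.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (head not yet fine-tuned)
```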

Where can you get the implementation code?

2. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding, by YU SUN, SHUOHUAN WANG, YUKUN LI, SHIKUN FENG, HAO TIAN, HUA WU, HAIFENG WANG

Original abstract

Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurrence, there is other valuable lexical, syntactic, and semantic information in training corpora, such as named entities, semantic closeness, and discourse relations. To extract lexical, syntactic, and semantic information from training corpora to the fullest extent, we propose a continual pre-training framework named ERNIE 2.0, which builds and learns pre-training tasks through constant multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several common tasks in Chinese. The source code and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

Our summary

Most state-of-the-art natural language processing models analyze the co-occurrence of words in sentences during pre-training. However, sentences carry additional information, such as sentence order and proximity, named entities, and semantic similarity, that these models do not capture. Baidu researchers address this problem with ERNIE 2.0 (Enhanced Representation through kNowledge IntEgration), a continual pre-training framework in which customized tasks are incrementally introduced and trained through multi-task learning. As a result, the model encodes lexical, syntactic, and semantic information across tasks without forgetting previously learned parameters. ERNIE 2.0 outperforms BERT and XLNet on the English GLUE benchmark and sets a new state of the art for Chinese language processing.

What is the core idea of this paper?

  • Existing natural language processing models mainly exploit the co-occurrence of words or sentences to solve word-level and sentence-level reasoning tasks, and fail to grasp other valuable information contained in the training corpus.
  • To fully capture the lexical, syntactic, and semantic information contained in the text, the Baidu research team introduces ERNIE 2.0, a continual pre-training framework that incrementally introduces and learns pre-training tasks through multi-task learning (a toy sketch follows this list):
    • Different customized tasks can be introduced freely at any time.
    • These tasks share the same encoding network and are trained through multi-task learning.
    • When a new task arrives, the framework incrementally trains the distributed representations without forgetting the previously learned parameters.
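
To make the continual multi-task idea concrete, here is a toy sketch (purely illustrative, not Baidu's PaddlePaddle implementation): a single shared encoder, one lightweight head per pre-training task, and a training round that replays every task introduced so far whenever a new one is added. The task names are taken from the paper, but the label counts and the data are placeholders.

```python
# Toy sketch of continual multi-task pre-training (illustrative only, not
# the official ERNIE 2.0 code). The shared encoder is trained on every task
# introduced so far, so adding a new task does not erase earlier learning.
import torch
import torch.nn as nn

HIDDEN, VOCAB = 128, 1000

class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                    # tokens: [batch, seq]
        return self.encoder(self.embed(tokens))   # [batch, seq, HIDDEN]

encoder = SharedEncoder()
task_heads = nn.ModuleDict()                      # one small head per task

def dummy_batch(num_labels, batch=8, seq=16):
    """Stand-in for a real pre-training task's data loader."""
    return torch.randint(0, VOCAB, (batch, seq)), torch.randint(0, num_labels, (batch,))

def train_round(task_labels, steps=5):
    """Train the shared encoder on *all* tasks seen so far."""
    params = list(encoder.parameters()) + list(task_heads.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for name, num_labels in task_labels.items():   # replay earlier tasks too
            tokens, labels = dummy_batch(num_labels)
            logits = task_heads[name](encoder(tokens).mean(dim=1))
            loss = loss_fn(logits, labels)
            opt.zero_grad(); loss.backward(); opt.step()

# Tasks arrive one after another; each round revisits the earlier ones.
seen = {}
for name, num_labels in [("sentence_reordering", 5), ("capitalization_prediction", 2)]:
    task_heads[name] = nn.Linear(HIDDEN, num_labels)
    seen[name] = num_labels
    train_round(seen)
```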

What are the key achievements?

  • According to the experiments reported in the paper, ERNIE 2.0 outperforms BERT and XLNet on the English GLUE benchmark:
    • It achieves an average score of 83.6, compared to 80.5 for BERT.
    • It performs better than XLNet on seven of the eight individual task categories.
  • ERNIE 2.0 also sets a new state of the art on many Chinese NLP tasks.

What does the AI community think?

  • The ERNIE 2.0 repository is popular on GitHub.

What are the future research areas?

  • Introducing more, and more diverse, pre-training tasks into the continual pre-training framework to further improve model performance.

What are possible commercial applications?

  • Like other large pre-trained language frameworks, ERNIE 2.0 can help companies with a variety of NLP tasks, including chatbots, sentiment analysis, and information retrieval.

Where can you get the implementation code?

  • The source code and pre-trained models used in this study are available on GitHub at https://github.com/PaddlePaddle/ERNIE.

3. RoBERTa: A Robustly Optimized BERT Pretraining Approach, by YINHAN LIU, MYLE OTT, NAMAN GOYAL, JINGFEI DU, MANDAR JOSHI, DANQI CHEN, OMER LEVY, MIKE LEWIS, LUKE ZETTLEMOYER, VESELIN STOYANOV

Original abstract

Language model pre-training has led to significant performance gains, but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have a significant impact on the final results. We present a replication study of BERT pre-training (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE, and SQuAD. These results highlight the importance of previously overlooked design choices and raise questions about the source of recently reported improvements. We release our models and code.

Our summary

Natural language processing models have made significant progress thanks to the introduction of pre-training methods, but the computational expense of training makes replication and hyperparameter tuning difficult. In this study, researchers from Facebook AI and the University of Washington analyzed the training of Google's Bidirectional Encoder Representations from Transformers (BERT) model and identified several changes to the training procedure that improve its performance. Specifically, the researchers used a new, larger dataset for training, trained the model over far more iterations, and removed the next sentence prediction training objective. The resulting optimized model, RoBERTa (Robustly optimized BERT approach), matches the scores of the recently introduced XLNet model on the GLUE benchmark.

What is the core idea of this paper?

  • The Facebook AI research team found that BERT was significantly undertrained and proposed an improved recipe for training it, called RoBERTa:
    • More data: 160GB of text instead of the 16GB dataset originally used to train BERT.
    • Longer training: increasing the number of iterations from 100K to 300K and then further to 500K.
    • Larger batches: 8K instead of 256 in the original BERT base model.
    • A larger byte-level BPE vocabulary with 50K subword units instead of a character-level BPE vocabulary of size 30K.
    • Removing the next sentence prediction objective from the training procedure.
    • Dynamically changing the masking pattern applied to the training data (see the sketch after this list).
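
As a concrete illustration of the dynamic masking change, here is a short sketch using the Hugging Face transformers data collator (an assumption chosen for illustration; it is not the authors' original fairseq training code). Because the mask is sampled when each batch is assembled rather than once during preprocessing, the same sentence receives a different masking pattern every time it is drawn:

```python
# Sketch of dynamic masking with the Hugging Face `transformers` collator
# (illustrative only; RoBERTa itself was trained with fairseq). Masked
# positions are re-sampled at batch-collation time, not fixed in advance.
from transformers import RobertaTokenizerFast, DataCollatorForLanguageModeling

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

example = {"input_ids": tokenizer(
    "Language model pre-training brings significant gains.")["input_ids"]}
batch_a = collator([example])   # masked positions sampled here ...
batch_b = collator([example])   # ... and sampled again here, usually differing
print(batch_a["input_ids"])
print(batch_b["input_ids"])
```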

What are the key achievements?

  • RoBERTa outperforms BERT on all individual tasks of the General Language Understanding Evaluation (GLUE) benchmark.
  • The new model matches the performance of the recently introduced XLNet model on the GLUE benchmark and sets a new state of the art on four of its nine individual tasks.

What are the future research areas?

  • Incorporating more sophisticated multi-task fine-tuning procedures.

What are possible commercial applications?

  • Large pre-trained language frameworks like RoBERTa can be leveraged in business settings for a wide range of downstream tasks, including dialogue systems, question answering, document classification, and more.
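
As a minimal sketch of how the released weights can be plugged into such downstream systems (assuming the Hugging Face transformers library and the roberta-base checkpoint; a task-specific classifier on top would still need to be trained):

```python
# Minimal sketch: load the released RoBERTa weights and extract a simple
# sentence feature for a downstream task such as document classification.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("Which plan includes priority support?", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state   # [1, seq_len, 768]
sentence_vector = hidden_states[:, 0]   # <s> token embedding as a simple sentence feature
print(sentence_vector.shape)            # would feed a task-specific classifier
```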

Where can you get the implementation code?

This article originally appeared on TOPBOTS.