This article is reproduced from the WeChat public account "AI Frontline" (original address).

AI Frontline introduction: In recent years, natural language processing (NLP) technology has made great progress, becoming one of the most widely used and most mature AI technologies. In commercial applications, however, NLP has produced few breakthroughs, and there are not many products that are truly successful or that fully meet people's daily needs.

Looking back at 2018, there was no shortage of progress and surprises in the field of natural language processing. Google's BERT model, for example, broke records on 11 NLP benchmarks. Even so, the predictions experts made at the beginning of the year largely held: the slow pace of progress in this field has, for the most part, not improved.

So, what is holding back natural language processing technology in commercial applications? Why has there been no major breakthrough? And where is the key to solving the problem?

This article is part of AI Frontline's year-end review series on the commercialization of natural language processing technology. Through exchanges with five companies applying representative NLP technologies at the cutting edge of their industries, namely iFLYTEK (Keda Xunfei), JD.com, Tencent, Xiaomi, and Yuxin, it explores the current state of NLP at the commercial application level, in the hope of finding ways to break through the current bottleneck.

For more quality content, follow the WeChat public account "AI Frontline" (ID: ai-front).

Commercial application status

Level of development: basically usable vs. medium-to-advanced

What is the current state of natural language processing technology in industrial applications? The experts' answers to this question differ slightly.

First of all, the five experts agreed that NLP is already applied very widely in industry. According to iFLYTEK, speech technology, as a representative of perception, has by comparison reached a relatively high level of application and is in a good state of use; for NLP, looking at specific tasks, many of them also perform well in practice. Overall, industrial NLP applications are basically usable, but still a long way from being truly useful.

Xiaomi believes that natural language processing covers a broad range of technologies. Generally speaking, NLP and speech technology are at a medium-to-advanced level in current industry applications.

In contrast, Yuxin, which has cooperated with iFLYTEK, frankly pointed out that speech technology has made great breakthroughs in recent years thanks to deep learning, allowing speech recognition, speech synthesis, voiceprint recognition, and related technologies to be applied at scale in industry. But ears and a mouth are not enough: the most important part of an intelligent system is the brain. Having heard something, the system must understand it (NLU), and then decide how to reply based on its accumulated knowledge and the business requirements (NLG). The core technology of this "brain" is NLP. Frankly speaking, however, industrial application of NLP is still at an early stage. Unlike the computer vision field, which already has benchmark unicorn companies such as Yushi and Shangtang, many intelligent interactive systems can only handle simple tasks in very narrow domains, or suffer from the much-criticized "artificial stupidity" problem. NLP is difficult, and it has a long way to go.

Main application areas and application scenarios

As the experts say, natural language processing has been widely applied in industry, including e-commerce, translation, finance, smart hardware, healthcare, travel, accommodation, and other sectors. Application scenarios cover speech analytics, text analysis, sentiment analysis, intent analysis, graphics and image recognition, interactive voice response, optical character recognition, and more, deployed on public, private, and hybrid clouds. IT and telecoms, healthcare, retail, and banking are among the main end users of NLP technology.

In these fields and scenarios, the more mature commercial applications of NLP include search engines, personalized recommendation, advertising, and domain-specific question answering and skills. Commonly used NLP methods include word segmentation, text classification, keyword analysis, sensitive-term recognition, word frequency statistics, opinion summarization, and sentiment analysis. However, although many companies advertise their use of these technologies in their products, only a few actually generate practical value and bring visible benefits to the company.

AI Frontline has gained a further understanding of how enterprises currently apply and commercialize NLP through exchanges with five companies at the forefront of the technology: iFLYTEK, JD.com, Tencent, Xiaomi, and Yuxin. Their application scenarios correspond respectively to translation, intelligent dialogue and customer service, intelligent question answering, smart hardware, and finance.

A list of typical applications of natural language processing technology from the five companies:

Commercialization status

So far, generating revenue from NLP or speech technology alone remains difficult. In interviews, all five companies said that their internal financial reports have no method for calculating the income attributable to these two technologies; there are even disputes over which of the two a given revenue stream should be credited to. They lamented that speech by itself commands a very low price, far from real money.

As a result, AI Frontline has no data that can directly show the specific revenue generated by these two technologies.

However, the sales of smart hardware closely tied to these two technologies give some sense of their commercialization status.

For example, according to the 2018 China smart speaker report released by the market research firm Canalys, Alibaba's Tmall Genie ranked first with shipments of 2.2 million units, Xiaomi's Xiao AI speaker ranked second with 1.9 million, and Baidu ranked third with 1 million. As of the end of October, Xiao AI had been woken up a cumulative 8 billion times, averaging 34 million per day, which also indirectly brought benefits to the company.

According to media reports, the intelligent transcription device "Xunfei Hears M1" released by iFLYTEK has officially gone on pre-sale. Among previously launched smart hardware, the Xunfei Translator 2.0 has seen monthly sales of more than 50,000 units. Liu Qingfeng, chairman of iFLYTEK, said that the company's consumer (2C) business has doubled and its cash flow is very good; it now accounts for more than 20% of the company's sales revenue, with gross profit close to 30%. "Although fierce market competition will cause some small fluctuations in the prices of hardware products such as translators, overall the translator maintains a gross margin of around 40%, and price reductions stay within 5%."

In addition, figures from the data analysis website Statista reflect the general state of the global natural language processing market: in 2018 the global NLP market reached about US$5.83 billion, and by 2024 it is projected to reach approximately US$21 billion.

So, in the industry's eyes, has the commercialization and deployment of NLP and speech technology so far been a success or a failure?

The five experts gave their own different answers, but on the whole they affirmed the progress made in NLP and speech in recent years, indicating that natural language processing technology is basically usable, though far from perfect, and that they are optimistic about its future commercial prospects.

Yuxin said that speech was the first field to break through, leading the rise of AI. This year the ELMo model achieved good results in many scenarios, and the record-sweeping BERT model proposed by Google in October has repeatedly shown that model transfer based on a well-designed model structure has great development prospects, and that continued investment in the technology will yield fruitful results.

iFLYTEK said that technical progress and breakthroughs do not come in one stroke, but it still believes the world's many AI researchers can bring surprises. On the commercialization side, a series of issues is involved, such as the market and the match between supply and demand; foreign companies such as Google and Facebook, and domestic companies such as BAT, have done a great deal of exploration in various vertical fields, and much progress has been made. In general, iFLYTEK is very optimistic about the application of AI technology. Although some technical challenges and commercial resource bottlenecks remain, it believes that as the AI market mechanism matures and rigid demand grows, broader applications of AI will arrive soon.

Xiaomi is more confident in natural language processing. On the whole, these two technologies have been relatively successful commercially; it is just that NLP is a supporting technology in many commercial fields, so many commercial successes are not credited to NLP. In fact, machine translation and text generation have made great progress in recent years. Of course, breakthroughs in NLP are not as visible as in some other areas, partly because the field's baseline was already relatively high (compared with images and other fields), and partly because semantic understanding of natural language is genuinely difficult for existing data-driven approaches. Xiaomi believes that to truly break through, machines must actually understand natural language.

Tencent believes it is still necessary to find the right scenarios. Take vision as an example: face recognition seems very mature, with every company claiming 99% accuracy, yet initial commercialization happened in the security field, where products matured first before gradually penetrating other industries. Natural language processing will likewise need such a process of market awareness and acceptance. In the past two years many commercialized products have appeared, including intelligent customer service, knowledge graphs, and information extraction, and market interest is gradually rising, so Tencent says it is very confident. Of course, NLP technology is not yet as mature as vision, which requires industry, academia, and research to explore and improve together; and on the commercialization side it is also necessary to find core scenarios to grip firmly and break through point by point.

JD.com likewise believes that natural language understanding has been greatly improved by deep learning; both the leading AI companies and innovative AI-driven enterprises are exploring new application scenarios for natural language technology. For example, JD.com is currently using its natural language understanding technology, combined with the scenarios and accurate data of its full retail, logistics, and finance value chain, to build an industry-leading task-oriented intelligent dialogue system. Combined with its work on emotional AI and knowledge graphs, it is also cultivating practical applications such as empathetic intelligent customer service, high-precision AIoT dialogue services, and large-scale personalized content generation covering pre-sale, sale, and after-sale, bringing great value to JD.com's core business.

However, natural language processing also faces a major difficulty: specific scenarios require specific models. A universal language understanding model is the key to solving this problem.

How can the deadlock in NLP and speech applications be broken?

In the middle of this year, a heated Reddit debate on breakthroughs in natural language processing left a deep impression on the author. In that discussion, some argued that breakthroughs in NLP and speech have been somewhat disappointing and that research hotspots have shifted to GANs and reinforcement learning; others held that NLP and speech have become among the most widely used and mature AI technologies, and that the progress achieved is plain for all to see.

What everyone in the comments agreed on, though, is that genuine breakthroughs in natural language in recent years have been lacking. Why is natural language processing so hard to break through?

State-of-the-art understanding and reasoning models, and their flaws

Reading comprehension: Reading comprehension examines whether a machine, given a passage, can produce a precise answer to a corresponding question. Datasets such as Stanford's SQuAD are driving research in this area. With the latest attention-based deep learning methods, accuracy on this type of reading comprehension problem can already be pushed very high.
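As a concrete, deliberately toy illustration of the extractive setting: SQuAD-style models score a start and an end position for the answer span with learned attention layers, then decode the highest-scoring span. The sketch below keeps only the decoding rule and fakes the learned scores with hand-made dot products; the vectors and dimensions are invented for illustration.

```python
import numpy as np

def extract_span(passage_vecs, q_start, q_end, max_len=5):
    """Decode the answer span (i, j) that maximizes start_score[i] + end_score[j],
    the standard decoding rule for extractive reading comprehension."""
    start_scores = passage_vecs @ q_start   # stand-ins for learned start logits
    end_scores = passage_vecs @ q_end       # stand-ins for learned end logits
    best, best_score = (0, 0), -np.inf
    for i in range(len(passage_vecs)):
        for j in range(i, min(i + max_len, len(passage_vecs))):
            s = start_scores[i] + end_scores[j]
            if s > best_score:
                best, best_score = (i, j), s
    return best

# Toy passage of 6 one-hot "tokens"; token 2 looks like an answer start, token 4 an end.
passage = np.eye(6)
span = extract_span(passage, q_start=3 * np.eye(6)[2], q_end=3 * np.eye(6)[4])
print(span)  # (2, 4)
```

In a real model the two score vectors come from the network, but the span search itself is exactly this simple.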

Defect: Current reading comprehension still extracts the answer from the given passage, and dependence on the training data is high. Machines still have a long way to go before they can use all kinds of knowledge to achieve true understanding of a passage.

Reasoning: Industry has long studied traditional symbolic logic reasoning, but progress has been relatively slow. Partly this is because the scope of reasoning is extremely broad and the problem of how a machine acquires knowledge has not been well solved; as a result, systems for practical reasoning often depend heavily on hand-crafted expert knowledge and lack good generalization.

In recent years, with the development of deep learning, work on approximate reasoning has also attracted attention. For example, knowledge graph embedding represents a knowledge graph in a low-dimensional continuous vector space, learns the relations between concepts or entities through the semantic structure of that space, and performs shallow relational reasoning accordingly.
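A minimal sketch of this idea, in the style of translation-based embeddings such as TransE: a fact (head, relation, tail) is plausible when head + relation lands near tail in the vector space. The vectors here are hand-set for illustration; a real system learns them with a margin-based loss over known triples.

```python
import numpy as np

# Hand-set toy embeddings: in translation-style models, a fact (head, relation, tail)
# is plausible when head + relation lands near tail, so score = -||h + r - t||.
entities = {"Paris": np.array([1.0, 0.0]),
            "France": np.array([1.0, 1.0]),
            "Berlin": np.array([3.0, 0.0])}
relations = {"capital_of": np.array([0.0, 1.0])}

def score(head, rel, tail):
    return -np.linalg.norm(entities[head] + relations[rel] - entities[tail])

# The true fact scores higher (closer to zero) than the false one.
assert score("Paris", "capital_of", "France") > score("Berlin", "capital_of", "France")
```

Ranking candidate tails by this score is exactly the "shallow relational reasoning" described above: it answers one-hop queries but cannot chain multi-step logic.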

The most advanced openly published model is probably DIIN, described in an ICLR 2018 paper. The NLI (natural language inference) task is to judge the logical relationship between two sentences, whether entailment, contradiction, or neutral, and is generally cast as a sentence-pair classification problem. The DIIN model follows the common framework of representation learning plus interactive matching, taking multi-head attention to the extreme.

Defect: Approximate reasoning is currently popular, but on the whole it remains fairly shallow. When DIIN must handle time, composition, or conditional judgment, its accuracy drops to about 50%. On the NLI task, current models reach 80-90% accuracy in most cases, but long-tail cases are still better handled by rules and regular expressions.

Typical technical problems and solutions

Problem 1: Question-answering models based on CNNs or LSTMs often converge to a predicted answer after "reading" only half of the question. This means the model relies heavily on surface correlations in the training data, lacks compositionality, and produces biased results.

Solution idea 1: The model's reliance on surface correlations in the training data is a common problem of current deep learning in sequence modeling; essentially, it is a problem of model generalization.

To overcome such problems, we need to study a relatively universal semantic model with strong generalization ability, so that it can serve as a core foundation providing semantic support for specialized models in various domains. Designing and implementing this type of model requires solving the problem of unsupervised semantic learning, work that is currently receiving attention in academia, for example ELMo and BERT. (iFLYTEK expert)

Solution idea 2: Large-scale pre-training of general semantic embeddings may be one way to prevent over-reliance on surface patterns. New compositional models are also needed. Another approach is to build more complex, large-scale, real-world tasks, such as open-domain dialogue or goal-oriented dialogue with complex structure (sales conversations, customer-service conversations). These tasks will push more general and more comprehensive models to emerge, because in such complex tasks surface-correlation models do not work well, and models that synthesize information are required. (JD.com expert)

Solution idea 3: First, judge whether the model is overfitting or underfitting from the difference in performance between the training set and the test set. If it is underfitting, the model is not capturing the features well, and should be strengthened at the feature level or made deeper; if it is overfitting, the parameter complexity exceeds the complexity of the data, and common remedies include data augmentation, regularization, model simplification, dropout, early stopping, and more. (Tencent expert)
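The early-stopping remedy mentioned above can be sketched in a few lines: a generic loop that halts once validation loss stops improving for a set number of epochs. The `patience` name and thresholds are illustrative conventions, not from the interview.

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=3):
    """Stop once validation loss has not improved for `patience` epochs,
    one of the anti-overfitting remedies listed above."""
    best, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)                 # one epoch of training
        loss = val_loss_fn(epoch)         # held-out validation loss
        if loss < best - 1e-6:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return epoch + 1, best                # epochs actually run, best val loss
```

With simulated validation losses that bottom out at 0.7 and then creep upward, the loop halts three epochs after the minimum instead of training all the way to `max_epochs`.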

Solution idea 4: Specifically, when CNN and LSTM models encode a question, they easily latch onto words near the start of the question that correlate directly with the answer, such as "type" or "topic" words, regardless of the length of the input sequence. Attention models or CNN filters then revisit these words repeatedly, which is why the model converges to a predicted answer after "reading" only half the question.

Possible solutions include:

  1. Try coverage-style attention, whose main idea is to prevent certain words (such as those at the start of the question) from being attended to repeatedly;
  2. Use a Transformer instead of a CNN or LSTM: the Transformer's self-attention can model the dependency structure within a sentence and capture long-distance dependencies, and it outperforms CNNs and LSTMs on most NLP tasks. (Xiaomi expert)
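The self-attention mechanism referred to in point 2 can be sketched with plain NumPy: a single scaled dot-product attention head over toy vectors. All shapes and weights here are made up for illustration; a real Transformer stacks many such heads with learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position attends to every
    other position, so a long-distance dependency costs one step rather
    than many recurrent steps."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                 # 3 toy tokens, model dim 4
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
assert out.shape == (3, 4) and np.allclose(w.sum(axis=1), 1.0)
```

Because `scores` compares every token with every other in one matrix product, distance between tokens plays no role, which is exactly why long-range dependencies are easier here than in an LSTM.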

Solution idea 5: In text processing, CNNs are better suited to feature extraction from short texts. LSTMs were designed from the start to ease gradient explosion and vanishing when training on long sentences, and they do perform very well on relatively long ones; but with question-answering data, overly long texts cause the LSTM to forget the earlier parts of a sentence during training, and the features of the question cannot be effectively fed into the network, resulting in poor performance.

Current question-answering models therefore lean toward attention-based networks. The most typical is the BERT network, which relies entirely on the attention mechanism. Another line builds on memory-network structures, such as the Deep Memory Network, which computes attention weights over the whole sentence through a memory network. (Yuxin expert)

Problem 2: Modern NLP techniques excel on benchmark datasets, but language understanding and reasoning on unseen natural language text remain far below human level.

Solution idea 1: This is similar to the first problem: again it concerns the model's generalization ability. Besides trying to design a general unsupervised semantic model, we can also focus on how to introduce various kinds of human knowledge into the machine's modeling and learning process. Human knowledge is relatively sparse and abstract, but it links concepts and entities together, so a machine that can learn human knowledge will be more intelligent when handling unseen inputs. (iFLYTEK expert)

Solution idea 2: We need a large-scale, structured attention mechanism, a universal language understanding model; that is the path to human-level NLP. I think recent developments such as HAN, BERT, the Transformer, and ELMo are steps along the way to solving this problem. (JD.com expert)

Solution idea 3: On the one hand, machine language understanding really is not as good as humans'. When humans understand language, they draw on much information beyond the words themselves, such as common sense, the equivalent of a joint operation by sea, land, and air. Current models only know how to use one particular pistol, so we should keep our expectations realistic. On the other hand, for similar tasks and data, transfer learning or pre-trained models can provide a quick start. (Tencent expert)

Solution idea 4: Performing well on benchmark datasets does not mean performing well on unseen data; this is a standing research direction for the whole machine learning field. Existing NLP techniques are mostly based on machine learning methods, so this is one of the things NLP must work on. The problem is genuinely hard. For NLP, incorporating knowledge (linguistic knowledge, human common sense, domain knowledge, and so on) may be one direction for alleviating it. (Xiaomi expert)

Solution idea 5: This problem is not limited to NLP: a model trained in any scenario will suffer a sharp drop in performance when migrated to a new, unknown domain. We need to build a system or framework that can iterate quickly to solve problems in the unknown domain: cold-start from a pre-trained model, collect samples, continuously monitor the model's performance, and then iterate the model rapidly. (Yuxin expert)

Question 3: How can we fully measure a model's performance on new, never-seen input data? In other words, how do we measure a system's generalization ability? And how should we handle data distributions and tasks that have never been seen before?

Solution idea 1: Generalization can be measured from the model's performance on multiple different tasks, that is, by expanding the evaluation set; alternatively, the model can be plugged into different systems and evaluated there.

Strictly speaking, unknown data in natural language understanding is hard to model or presuppose because the space is so huge. So whether from the angle of model evaluation or model training, modeling and learning from unknown data or information remains a big problem. (iFLYTEK expert)

Solution idea 2: For neural networks there is no good theoretical guidance yet. The ICLR 2017 best paper, "Understanding Deep Learning Requires Rethinking Generalization", reflects the academic community's attention to and debate over the generalization ability of deep learning. The paper ran many experiments indicating that deep neural networks to some extent "memorize" the input samples, and argued that classical statistical learning theory and regularization strategies struggle to explain the generalization of deep networks. There is still no generally accepted answer. (Tencent expert)

Solution idea 3: This is very difficult at present. For unseen data distributions and tasks, one can try transfer learning and similar methods to carry the patterns learned on other data over to the new data or new task. (Xiaomi expert)
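The transfer-learning recipe suggested here can be sketched as "freeze a pre-trained encoder, train only a small head on the new task". The encoder below is a stand-in (a bag-of-characters vector, purely so the example is self-contained); in practice it would be ELMo-, BERT-, or similarly pre-trained features.

```python
import numpy as np

def encode(texts, dim=16):
    """Stand-in for a frozen pre-trained encoder: a bag-of-characters vector,
    used here only so the example stays self-contained."""
    X = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for ch in t:
            X[i, ord(ch) % dim] += 1.0
    return X

def fit_head(X, y, lr=0.5, steps=500):
    """Train only a small logistic-regression head on the new task;
    the encoder stays frozen, which is the basic shape of transfer learning."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

texts, y = ["aaaa", "aaab", "bbbb", "bbbc"], np.array([1.0, 1.0, 0.0, 0.0])
w, b = fit_head(encode(texts), y)
preds = 1.0 / (1.0 + np.exp(-(encode(texts) @ w + b))) > 0.5
```

Only the head's few parameters are fit on the new domain, which is why this setup can work from a handful of labeled samples where training a full model could not.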

Solution idea 4: First, there must be a reasonable and intuitive evaluation metric; second, the model needs to be fully tested on multiple batches of data across time windows or across scenarios to avoid overfitting. (Yuxin expert)

Problem 4: To train machine translation models we use loss functions such as cross entropy or expected sentence-level BLEU. These functions have been shown to be biased and insufficiently correlated with human judgment. Reinforcement learning seems a perfect fit for NLP, because it lets the model learn from a human-like supervision signal (a "reward") through trial and error in a simulated environment. Yet reinforcement learning cannot solve all of NLP's problems. What are the main difficulties of applying reinforcement learning in NLP, and how should they be handled?

Solution idea 1: The main problem of reinforcement learning in NLP is defining the reward signal. Natural language is very sparse and discrete, so how to define the reward function has always been a common difficulty in the field. Metrics such as machine translation's BLEU score or document summarization's ROUGE score, while relatively objective, cannot directly represent human semantic judgment. So this question is essentially about how to objectively evaluate or define semantics. Personally, I do not think reinforcement learning has found a good application in NLP yet. (iFLYTEK expert)
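The gap between n-gram metrics and human judgment is easy to demonstrate. The sketch below implements only modified n-gram precision (the core ingredient of BLEU, without the brevity penalty or geometric mean over orders): a candidate that scrambles the meaning can outscore a faithful paraphrase.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def precision(cand, ref, n):
    """Modified n-gram precision, the core ingredient of BLEU."""
    c, r = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
    overlap = sum(min(count, r[g]) for g, count in c.items())
    return overlap / max(sum(c.values()), 1)

ref = "the cat sat on the mat".split()
paraphrase = "a cat was sitting on the mat".split()   # right meaning, new words
scrambled = "the mat sat on the cat".split()          # same words, wrong meaning
# The scrambled sentence wins on both unigram and bigram precision:
assert precision(scrambled, ref, 2) > precision(paraphrase, ref, 2)
```

A reward built from such a metric would therefore teach a model to copy surface n-grams rather than preserve meaning, which is exactly the objection raised above.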

Solution idea 2: Reinforcement learning is a very powerful and promising tool for NLP, but it does not solve every NLP problem. For example, reinforcement learning cannot tell you what the best loss function is, because that must be language- and task-dependent. Classic RL algorithms also need to be extended to handle language: how to deal with a near-infinite action space (the space of utterances), how to deal with open systems where the environment cannot easily be simulated, how to define rewards across different dialogue tasks, how to train RL effectively from few samples, and how to model language so that training converges quickly. (JD.com expert)

Solution idea 3: The sequential decision-making character of reinforcement learning fits some NLP tasks well, such as dialogue generation, summarization, and translation. To use reinforcement learning in NLP, first check whether the problem is suitable for reformulation as reinforcement learning: can the basic elements of agent, environment, action, and reward be defined? In addition, RL training is relatively unstable, so the learning process must be watched constantly to confirm it is in a reasonable state. When unsure, simplify the interaction environment and the reward function, or even observe how a random policy behaves under a specific setting. The rest are specific techniques, such as trying multiple random seeds and standardizing the data. (Tencent expert)
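The basic elements mentioned here can be made concrete with the smallest possible example: a one-step "reply choice" treated as a bandit, updated with a single REINFORCE step. The task, learning rate, and reward scheme are all invented for illustration; real NLP uses (dialogue, translation) apply the same update over whole generated sequences.

```python
import numpy as np

def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update: shift the logits so the sampled action becomes
    more probable in proportion to the reward it received."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = -p
    grad[action] += 1.0              # gradient of log p(action) w.r.t. logits
    return logits + lr * reward * grad

# A 3-way "reply choice" bandit where reply 0 always earns reward 1.
logits = np.zeros(3)
for _ in range(100):
    logits = reinforce_step(logits, action=0, reward=1.0)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
assert probs.argmax() == 0           # the rewarded reply is now most probable
```

Even this toy shows the pain points from the interviews: the update only nudges probabilities, so many sampled episodes are needed (low sample efficiency), and with a noisy or badly defined reward the same loop drifts instead of converging.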

Solution idea 4: One of the main difficulties of reinforcement learning is defining the reward function, and the same holds in NLP. Humans are good at qualitative judgment, but machines need quantities when they learn, and turning the qualitative into the quantitative is very hard. To solve this, we still have to work harder on the evaluation function. (Xiaomi expert)

Solution idea 5: For NLP tasks where traditional loss functions are flawed, such as machine translation, sequence generation, dialogue systems, and chatbots, reinforcement learning offers a paradigm in which it is easier to define a reward function over a dynamic system. However, reinforcement learning's own problems, such as low sample efficiency, unstable fitting to the environment, and unstable training, are also the pain points of applying it in NLP. (Yuxin expert)

Question 5: Why do NLP models struggle with simple, common-sense everyday scenarios? How can this be improved?

Solution idea 1: This is not actually a problem unique to NLP models; even in speech, images, and other modalities, good pattern recognition in some everyday scenarios remains out of reach. Scenarios feel simple to us mainly because humans accumulate life experience, common sense, and professional knowledge from childhood, whereas a machine (a single model or system) has "seen" very little such information during training, which makes everyday scenarios hard to handle. Giving machines common-sense reasoning is a shared goal of artificial intelligence; it requires solving a series of problems, such as how to define or build common-sense knowledge, and how to let machines understand common sense and learn to reason with it flexibly. (iFLYTEK expert)

Solution idea 2: This is because we lack common-sense databases, good common-sense representations, and semantic understanding of common sense in specific domains, which in turn hurts decision-making. In addition, most current NLP benchmark evaluations do not cover common sense. The best way to make progress, therefore, is to use more complex, large-scale, real-world tasks to drive research and technology forward, and to motivate researchers to invent algorithms that can benefit from a good common-sense model. (JD.com expert)

Solution idea 3: Some everyday scenarios are already handled fairly well, such as sentiment classification and open-domain knowledge-based question answering. Of course, a general-purpose NLP model equipped with common sense does not yet exist, and is unlikely to appear soon; in my personal view we still need to get things right one task at a time. (Tencent expert)

Solution idea 4: Everyday scenarios carry all kinds of complex contextual features (weather, location, time of day, human emotions, and so on) that machines struggle to describe formally. The mechanisms of human reasoning are very complex and still hard for machines to simulate. One possible improvement is to greatly increase the training data, and to consider combining knowledge and data for understanding and reasoning. (Xiaomi expert)

Solution idea 5: The colloquial and open-ended character of everyday scenarios makes them very hard for natural language processing. First, colloquial sentences generally have irregular grammatical structure and more modal particles, making them harder to analyze and model. Second, everyday scenarios are open-ended, and there is no knowledge base rich enough to support modeling them. When building related products or systems, we need to work on both points: strengthen text normalization of colloquial utterances, and reduce the openness of the scenario through process guidance and product design. (Yuxin expert)

In addition, other issues that the NLP field needs to address include task-driven dialogue systems, insufficient and biased language resources, predicting worst-case performance, domain adaptation, meaningful text and dialogue generation, transfer learning, long-term goal- and task-oriented human-machine dialogue, data collection methods, coreference resolution, word sense disambiguation, text summarization, democratization of the technology, and more. These are the topics the industry cares about; solving these technical problems will let commercialization proceed more smoothly.

Typical application problems and solutions

Problem 1: In machine translation, most current solutions are still imperfect for the translation needs of everyday and important occasions. The public blunder of Tencent's translation tool Fanyijun at this year's Boao Forum made people realize that current translation products still have a long way to go before they are truly usable.

Solutions: Some experts believe the concept of "human-machine coupling" will be key to future AI deployment. "Human-machine coupling" essentially means an efficient division of labor between people and machines: in layman's terms, people do the intellectual work people are good at, and machines do the computational work machines are good at. This differs from the usual ambition of artificial intelligence, which wants the machine to solve every problem in a task on its own. In many fields the machine cannot solve all problems efficiently; people need to be involved, working with the machine to complete the larger task.
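A common engineering pattern behind this division of labor is confidence-based routing: the machine answers when it is confident and escalates to a human expert otherwise. The sketch below is purely illustrative; the canned answers and the threshold value are assumptions, standing in for a real model and a tuned operating point.

```python
# Minimal sketch of "human-machine coupling" via confidence routing.
CONFIDENCE_THRESHOLD = 0.8  # illustrative operating point

def machine_answer(query: str):
    """Stand-in for a real model: returns (answer, confidence)."""
    canned = {"reset password": ("Use the 'Forgot password' link.", 0.95)}
    return canned.get(query, ("(no confident answer)", 0.30))

def handle(query: str) -> str:
    answer, confidence = machine_answer(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"machine: {answer}"
    # The machine knows what it does not know and hands off.
    return "routed to human expert"

print(handle("reset password"))    # routine case, machine handles it
print(handle("dispute a charge"))  # unfamiliar case, escalated to a person
```

The threshold is the product knob: raising it trades machine coverage for answer quality, which is exactly the kind of decision that must be made jointly by technology and product teams.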

Human-machine coupling will become key to AI deployment because current AI systems are limited in areas such as understanding and reasoning, and need the help of human experts. This requires a process of guidance and adaptation.

The application of natural language processing technology is limited partly by current AI technology bottlenecks, but compared with a few years ago, NLP-based products are emerging one after another. The successful ones share some characteristics: they mine user needs, address user pain points, and stay within technical reach. A deep understanding of how each AI capability is developing, combined with the specifics of each scenario, is the key.

Problem 2: In actual use, the NLP technology inside intelligent dialogue systems still has great difficulty understanding long sentences and recognizing people's intentions, making a good user experience hard to achieve.

Because of the flexible word order of Chinese, long-sentence understanding has always been a hard problem in NLP. Although academia has put in a great deal of effort, practical systems are still some distance away, and solving the problem technically in the short term is unrealistic; dialogue techniques can instead be used to smooth over failures and improve the user experience. Understanding every possible intent is also difficult, but the important domains can be covered by building and using domain knowledge bases. As Yuxin put it, you can first get to an 80 score, then slowly optimize the remaining 20 points of experience.
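A minimal sketch of that "cover the important domains first" approach: intent detection backed by a small domain knowledge base, with a fallback when nothing matches. The intents and keywords below are invented for illustration; a production system would use a trained classifier on top of the knowledge base.

```python
# Illustrative domain knowledge base: intent -> trigger phrases (assumptions).
DOMAIN_KB = {
    "check_balance": {"balance", "account", "how much"},
    "transfer": {"transfer", "send money", "wire"},
}

def detect_intent(utterance: str) -> str:
    """Score each intent by matched phrases; fall back when nothing matches."""
    text = utterance.lower()
    scores = {
        intent: sum(phrase in text for phrase in phrases)
        for intent, phrases in DOMAIN_KB.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "fallback"  # hand off or ask to clarify

print(detect_intent("How much is in my account?"))  # in-domain: check_balance
print(detect_intent("Tell me a joke"))              # out of domain: fallback
```

The explicit fallback branch is what makes the 80/20 strategy workable: out-of-domain requests are routed to clarification dialogue instead of producing a confidently wrong answer.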

Others believe that solving such problems need not start with technology. Corresponding interaction logic can be designed at the product level to avoid unnecessary problems. The natural language understanding ability of current systems cannot truly reach the human level; if the interaction mode is set too open, it often raises technical problems that cannot be solved. Solving this problem therefore requires technology and product design to go hand in hand.

Problem 3: Another hot application is voice interaction. Although there are many speech recognition applications and voice interaction products on the market, such as smart speakers, these so-called intelligent voice products can at present solve only simple problems; their performance in complex scenarios is not encouraging. So far there does not seem to be a true "voice interaction" product.

The concept of voice interaction was first implemented in mobile assistants, which in the early days completed only basic assistant operations. That is not true voice interaction. Ideal voice interaction should match the fluency of voice communication between people, and from the current point of view there is still a long way to go. In terms of development mode, the deployment of voice interaction products will likely follow the same path as speech recognition: continuous technical breakthroughs and continuous data iteration, with quantitative change accumulating into qualitative change.

Xiaomi adds another angle: according to Cui Baoqiu, Xiaomi's chief architect and vice president of artificial intelligence and cloud platform, true AI products (including voice products) must be ubiquitous, capable, and constantly growing, following the user like a shadow. To achieve this, big data, big knowledge, and big computation are all indispensable. In addition, personalization and self-learning are essential attributes of such products.

Yuxin also believes that true voice interaction means the system understands what you intend without your having to spell it out, and that this cannot be achieved with language understanding technology alone. Academia has long studied the fusion of image, text, behavior, and other data, and some demos are very eye-catching. In the financial field, Yuxin therefore uses a knowledge graph built from global data and continuously cooperates with partners across technical fields, including voice and vision, in order to create products with a truly intelligent and seamless interactive experience.

Future trend

NLP and voice technology move from independence to convergence

With the continuous development of natural language processing technology and the changing needs of users, some believe that NLP has moved from standing alone as an independent technology toward integration and cooperation with other technologies. On this point the five experts unanimously agreed.

AI capabilities must not remain isolated from each other; it is inevitable that natural language processing will move toward integration, just as multiple functional areas of the human brain work together. Keda Xunfei has long invested in this direction: at the end of 2015 it launched the industry's first full-duplex voice interaction system, AIUI, which combined speech technology with semantic understanding to improve the interaction experience, and over more than three years it has accumulated successful cases in the home, automobile, home appliance, and customer service fields. Speech translation is another example: one research trend is end-to-end translation from source-language speech to target-language text, dropping the traditional multi-module pipeline (speech recognition first, then machine translation). The direct benefit of end-to-end speech translation is that it mitigates the error cascade of the original pipeline.
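The error cascade that end-to-end models aim to avoid is easy to illustrate with a toy cascaded pipeline. The ASR and MT functions below are mock stand-ins for real models, with an invented misrecognition and a tiny invented lexicon.

```python
# Toy illustration of the ASR -> MT error cascade in pipelined speech translation.

def mock_asr(audio_transcript: str) -> str:
    # Simulated recognition error: "flight" is misheard as "fright".
    return audio_transcript.replace("flight", "fright")

def mock_mt(text: str) -> str:
    lexicon = {"book": "réserver", "a": "un", "flight": "vol", "fright": "effroi"}
    return " ".join(lexicon.get(word, word) for word in text.split())

spoken = "book a flight"
cascaded = mock_mt(mock_asr(spoken))  # the upstream ASR error propagates into MT
correct = mock_mt(spoken)             # what an error-free pass would give

print(cascaded)  # the recognition mistake becomes an unrecoverable translation mistake
print(correct)
```

In the cascaded path, the MT module has no way to recover the original intent once "flight" has become "fright"; an end-to-end model that maps speech directly to target text never commits to that intermediate transcript.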

Other trends

Yuxin: Intelligent question answering and voice search will be the next trend. First, the big Internet giants will use voice search as an entry portal. Second, intelligent question answering (comprising natural language understanding, natural language generation, and dialogue management) combined with intelligent customer service will greatly improve customer service efficiency. Both are genuine needs, and both are areas being tackled jointly by industry, academia, and research.
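The three components named here chain together in a standard way. Below is a deliberately minimal sketch of that pipeline; the single intent, dialogue policy, and response templates are all invented for illustration.

```python
# NLU -> dialogue management -> NLG, the skeleton of an intelligent QA system.

def nlu(utterance: str) -> dict:
    """Natural language understanding: map an utterance to a dialogue state."""
    intent = "ask_hours" if "open" in utterance.lower() else "unknown"
    return {"intent": intent}

def dialogue_manager(state: dict) -> str:
    """Dialogue management: choose a system action from the state."""
    policy = {"ask_hours": "inform_hours"}
    return policy.get(state["intent"], "clarify")

def nlg(action: str) -> str:
    """Natural language generation: realize the action as text."""
    templates = {
        "inform_hours": "We are open 9am to 6pm on weekdays.",
        "clarify": "Sorry, could you rephrase that?",
    }
    return templates[action]

def answer(utterance: str) -> str:
    return nlg(dialogue_manager(nlu(utterance)))

print(answer("When are you open?"))
print(answer("Hmm?"))  # unmatched intent falls through to a clarification
```

Real systems replace each stage with learned models, but the separation of concerns, and the clarification fallback in the dialogue policy, are what make such systems maintainable as coverage grows.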

Keda Xunfei: The trend in NLP and speech is unsupervised learning. Current deep learning technology can already make full use of large amounts of supervised data to solve specific problems one after another. But as time passes, the "oil and water" left to squeeze out of supervised learning is dwindling. How to make machines teach themselves, learn autonomously, and achieve unsupervised learning in the true sense is the key to the next step. Breakthroughs can be expected in semantic modeling of natural language and in autonomous learning during human-machine dialogue, and these areas are worth watching.

Scenarios with great commercial potential in the future

Keda Xunfei: Voice interaction with personalized features and full-scenario coverage has a great chance of becoming the main mode of human-computer interaction in the future, with broad commercial potential. At the same time, AI technologies such as voice, image, and NLP can help simplify workflows in many traditional industries and improve their efficiency.

Xiaomi: NLP technology still has great commercial potential in scenarios such as search, recommendation, question answering, and dialogue.

Yuxin: In the future, the integration of dialogue understanding and knowledge graphs will grow deeper and deeper, and the depth of that integration will be sufficient to support the business vision.