This article is reposted from the public account AI Technology Base Camp.

AI is very popular, but its barrier to entry is also high, and it is still difficult for ordinary developers to benefit from this wave of AI.

Recently, the artificial intelligence company Small i Robot launched a new generation of its intelligent Bot open platform. It integrates three core capabilities, Chatting Bot, FAQ Bot and Discovery Bot, to provide enterprises and developers with a closed loop of intelligent robot service, human online service, and intelligent human-machine collaborative learning. In addition to intelligent customer service, other types of application scenarios such as intelligent marketing, intelligent outbound calling, and intelligent hardware will also be opened up.

The platform aims to lower the cost and barrier for enterprises to use and own AI technology, so that enterprises and developers can quickly develop intelligent service systems that meet their own business needs, or dialogue robots with intelligent interaction capabilities.

Recently, CSDN's Editor-in-Chief Afternoon Tea invited Li Bo, rotating chairman of the Small i Robot Technical Committee and chief architect, to discuss with us the difficulties of deploying NLP technology in the real world and how to lower the barrier for developers. We hope it will be inspiring to readers.

The development direction of NLP

AI Technology Base Camp: Can you briefly introduce NLP technology?

Li Bo: There are two types of NLP technology, one based on rules and one based on statistics. In recent years, statistical NLP has gained the upper hand, and since the emergence of deep learning it has progressed much faster. It is not that one is better and the other worse; each has its own strengths. Statistical models generalize better, for example, but they are black boxes. Some applications need a combination of the two. In a question answering system, for example, where the model handles some cases poorly, a rule-based approach can fill the gap, and the combination of the two reaches product quality.

Aside from computing intelligence, artificial intelligence mainly consists of two levels: perceptual intelligence and cognitive intelligence. Common speech recognition and image recognition belong to perceptual intelligence, where there are already fairly mature deployed applications, while NLP belongs to cognitive intelligence. NLP often needs to take contextual information into account, and even background knowledge, common sense knowledge, and so on. In addition, the input and output of perceptual intelligence (such as image recognition) is generally single-turn, but NLP often requires multi-turn interaction to get results. At present, NLP technology is still at an early stage of development, and many difficulties remain to be overcome.

AI Technology Base Camp: Do you think it will make rapid progress in the future? What other directions need to be explored?

Li Bo: Common sense and background knowledge: NLP may perform very well on highly specialized tasks and yet still fail on some simple things because it lacks common sense. To solve this problem, you need to build a common sense library and then combine it with the model. This is a difficult point and a direction that many people are willing to explore.

Multimodality: Human beings understand things through a combination of senses such as sight and hearing, that is, multimodally. If an NLP system could combine audio and video to understand the user's intent, there might be a greater breakthrough.

Pre-training: This is a recent hot spot that is worth trying. Pre-training has been widely used for audio and images, and Google's BERT recently applied it to the NLP field with good results.

Reinforcement learning: In terms of cognitive intelligence, reinforcement learning is also a good direction.

How does NLP get out of the lab

AI Technology Base Camp: Can the results of academia be applied to industry in a timely manner?

Li Bo: Some technologies can be converted right away, and some require a conversion cycle. During this cycle we have to consider the performance and accuracy required of a commercial model, along with other engineering conditions; only after these criteria are met can research results be turned into a product. Academia trains a model and looks at the final evaluation metric, which is a percentage, but deployment involves more factors. For example, a model may reach 99% accuracy, but the engineering work needed for the remaining 1% is not necessarily less than the work behind the 99%.

AI Technology Base Camp: From the lab to the commercialization of the model, what do you care most about?

Li Bo: The first thing we care about is whether the model is usable enough to meet industrial standards. In addition, the UI design and experience design of the product are also very important. Unlike images and voice, NLP requires more thought about the UI. For example, even if the accuracy of a machine translation system reaches a certain level, a poorly designed UI and a poor user experience may greatly hinder deployment. This is a systematic effort that includes cost, user experience, and how much value it brings to customers.

AI Technology Base Camp: Regarding the difficulty AI startups face in real-world deployment, does Small i Robot have any good experience to share?

Li Bo: Compared with images and voice, NLP is particularly difficult. The first difficulty is the multimodality involved in NLP. The second is the need to incorporate background knowledge and common sense. Neither issue is handled well at present. Small i mainly combines rule-based and statistical methods to introduce knowledge, for example through the domain semantic library I mentioned earlier, whose purpose is to integrate common sense and background knowledge. Finally, there is the problem of personalization: the output of NLP is often tied to the individual, and different individuals need different personalized results based on information such as user portraits, so that the system comes closer to how a human would respond.

In addition, deployment scenarios for NLP are not so straightforward; they have to be worked out together with customers or with the product design. Take recommendation as an example. When training a model we may focus on a few model metrics, such as precision, but what the customer sees is the final recommendation effect, that is, users' actual ratings and purchases. Therefore, whatever the results in the laboratory, in practice we have to keep adjusting system parameters according to customer feedback, adjust the training data, or combine other algorithms in order to improve the final result in production.

After going online, we also need to iterate on the model based on operation logs and customer behavior. This is a closed loop. It is not the case that a model can be trained without regard to the actual scenario, put directly into use, and then left alone. It has to be tuned and iterated continuously according to operational data.
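The gap between an offline metric and what customers actually see can be made concrete with a small sketch. The function below computes precision at k for a recommendation list; treating "items the user actually purchased" as the relevant set is an assumption made here for illustration, and the item names are hypothetical. It is not the metric definition used by the platform.

```python
from typing import List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k (not by len(top_k)) so short lists are not rewarded.
    return hits / k


# Hypothetical data: a ranked recommendation list and logged purchases.
recommended_items = ["a", "b", "c", "d"]
purchased_items = {"a", "c", "x"}
score = precision_at_k(recommended_items, purchased_items, k=4)
```

Recomputing such a metric from production logs, rather than only from a held-out test set, is one simple way to keep the offline number tied to the behavior customers care about.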

AI Technology Base Camp: How to solve the problem of model controllability?

Li Bo: For example, the intelligent customer service robots we build for our customers rely mainly on a hybrid model engine to stay controllable. Engineering methods also help. For example, when sensitive content comes up in a question or an answer, we can intervene promptly through pre-processing and post-processing, without updating the model or restarting the system. When problems are discovered during actual use, we need channels and methods to control the output of the system, and even its logic, so that the system remains controllable.
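As a rough illustration of intervening through pre-processing and post-processing, the sketch below wraps a model call with a check on both the question and the reply. Everything here is hypothetical: the term list, the fallback reply, and `toy_model` are stand-ins, and a simple substring match is only one of many possible checks.

```python
from typing import Callable, Set

# Hypothetical blocklist. Because it lives outside the model, it can be
# edited at runtime without retraining or restarting anything.
SENSITIVE_TERMS: Set[str] = {"password", "id number"}
FALLBACK_REPLY = "Sorry, I can't help with that here. Please contact a human agent."


def contains_sensitive(text: str, terms: Set[str]) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in terms)


def guarded_answer(question: str, model: Callable[[str], str],
                   terms: Set[str] = SENSITIVE_TERMS) -> str:
    # Pre-processing: intercept the question before it reaches the model.
    if contains_sensitive(question, terms):
        return FALLBACK_REPLY
    reply = model(question)
    # Post-processing: check the model output before it reaches the user.
    if contains_sensitive(reply, terms):
        return FALLBACK_REPLY
    return reply


def toy_model(question: str) -> str:
    # Stand-in for the real (black-box) model.
    if "reset" in question.lower():
        return "Your new password is 1234."
    return "Our opening hours are 9:00 to 18:00."
```

Calling `guarded_answer("Please reset my account", toy_model)` returns the fallback, because the reply is caught by the post-processing check even though the question itself passed.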

AI Technology Base Camp: What methods can make it controllable?

Li Bo: Our hybrid model engine consists of two models. One is the black box, the deep learning model; the other is the semantic understanding model, which is based on traditional semantic expressions and can be used for intervention. The semantic understanding model can be changed directly by editing its semantic expressions, while the deep learning model has to be retrained if we want to intervene. Therefore, we let the deep learning model and the semantic understanding model work together, and tune the system by adjusting the output strategy between them (such as a priority strategy).
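A minimal sketch of such a two-model arrangement with a priority strategy might look like the following. It is not Small i's engine: regular expressions stand in for semantic expressions, `toy_statistical_model` stands in for a trained deep learning model, and the confidence threshold and `rules_first` flag are assumptions chosen for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rule table: (pattern, answer). Editing this list changes
# behavior immediately, with no retraining.
RULES: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"\b(refund|money back)\b", re.IGNORECASE),
     "Refunds are processed within 7 days."),
]


def rule_engine(question: str) -> Optional[str]:
    for pattern, answer in RULES:
        if pattern.search(question):
            return answer
    return None


def toy_statistical_model(question: str) -> Tuple[str, float]:
    # Stand-in for a trained model: returns (answer, confidence).
    if "ship" in question.lower():
        return "Orders ship within 2 business days.", 0.9
    return "I'm not sure.", 0.2


def hybrid_answer(question: str,
                  model: Callable[[str], Tuple[str, float]] = toy_statistical_model,
                  threshold: float = 0.5,
                  rules_first: bool = True) -> str:
    rule_hit = rule_engine(question)
    answer, confidence = model(question)
    # Only trust the statistical model above a confidence threshold.
    model_hit = answer if confidence >= threshold else None
    # Priority strategy: decide which engine gets the first say.
    ordered = [rule_hit, model_hit] if rules_first else [model_hit, rule_hit]
    for candidate in ordered:
        if candidate is not None:
            return candidate
    return "Sorry, I did not understand the question."
```

With `rules_first=True`, a question that matches both engines gets the rule answer; flipping the flag gives the statistical model priority, which is the kind of output-strategy knob described in the answer above.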

How does Small i Robot collect data?

AI Technology Base Camp: How does Small i Robot accumulate data?

Li Bo: There are three main sources. First, we use crawlers to collect relevant industry data. Second, the log data generated by our cloud products is collected directly into our data platform. Third, we turn the material provided by customers into data and knowledge.

AI Technology Base Camp: How do you process the collected data?

Li Bo: Unstructured data: First, we clean the collected data, then classify it by machine plus manual work according to our knowledge categories, and then apply coarse-grained labels by some means (such as rules). Afterwards, it is confirmed manually and then stored.

Semi-structured data: The originally formatted documents provided by the customer are classified or clustered by format analysis or by machine learning models, then organized manually, and finally stored.
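The unstructured-data steps described above (clean, label coarsely by rules, confirm manually, then store) can be sketched as a small pipeline. The keyword rules, the `Record` type, and the function names are all hypothetical; the point is only that machine output stays unconfirmed until a person reviews it.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical keyword rules for coarse-grained labeling.
LABEL_RULES: Dict[str, List[str]] = {
    "billing": ["invoice", "payment", "refund"],
    "account": ["login", "password", "register"],
}


@dataclass
class Record:
    text: str
    label: Optional[str] = None  # coarse label proposed by the machine
    confirmed: bool = False      # set to True only by a human reviewer


def clean(raw: str) -> str:
    # Strip HTML-like tags and collapse runs of whitespace.
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def coarse_label(text: str) -> Optional[str]:
    lowered = text.lower()
    for label, keywords in LABEL_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def prepare(raw_texts: List[str]) -> List[Record]:
    # Machine pass: clean, drop empties, and propose a label.
    records = []
    for raw in raw_texts:
        text = clean(raw)
        if text:
            records.append(Record(text=text, label=coarse_label(text)))
    return records


def store_confirmed(records: List[Record]) -> List[Record]:
    # Only manually confirmed records are allowed into storage.
    return [record for record in records if record.confirmed]
```

A reviewer would flip `confirmed` (and correct `label` if needed) on each record before `store_confirmed` lets it through.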

AI Technology Base Camp: So data processing relies on a combination of machine and manual work?

Li Bo: The machine provides preliminary assistance, and the final confirmation is made manually; data does not go straight into storage after machine processing. Small i has a big data platform, an annotation system, and a laboratory system that work together to generate industry training data and industry background knowledge, which are then deployed to the production system in the form of a domain semantic library.

How does Small i Robot empower developers?

AI Technology Base Camp: There are many platforms and tools that help lower the barrier for developers. As far as I know, Small i recently launched a new generation of its intelligent Bot open platform. What can this platform bring to developers?

Li Bo: It helps SMEs and developers quickly build AI systems that fit a variety of real-world scenarios. The first scenario is intelligent customer service, centered on Q&A capability, which reduces the cost of human customer service. The second is intelligent marketing, centered on marketing recommendations and including user portraits, which we will launch later. The third is intelligent outbound calling. We will launch more scenarios later. Developers can not only use these scenarios directly, but also extend their applications based on the API of each scenario.

AI Technology Base Camp: There are all kinds of platforms and tools now. Suppose I am a newbie who wants to do a small project for practice; what should I do?

Li Bo: The purpose of this platform is to lower the barrier for developers. Developers who collect data themselves and then train a model face a long development cycle and run into many pitfalls. Our platform has two goals: first, to let users use it directly; second, to let developers extend its capabilities.

With our platform, the only data the developer needs to provide is the basic intents of the questions and answers. Underneath, our domain semantic library supports it: we automatically expand the dataset at the word level and at the syntax level, and then automatically train the model for you.
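To give a feel for word-level expansion, the sketch below generates variants of a seed utterance by substituting synonyms. The `SYNONYMS` table is a hypothetical stand-in for a domain semantic library, whitespace tokenization is a simplification, and syntax-level expansion is not covered; how the platform actually expands data is not described in the interview.

```python
from itertools import product
from typing import Dict, List

# Hypothetical synonym table standing in for a domain semantic library.
SYNONYMS: Dict[str, List[str]] = {
    "check": ["check", "query"],
    "balance": ["balance", "remaining amount"],
}


def expand(utterance: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    # For each word, list its alternatives (just the word itself if none).
    options = [synonyms.get(word, [word]) for word in utterance.lower().split()]
    # The Cartesian product yields every combination of substitutions.
    return [" ".join(choice) for choice in product(*options)]


variants = expand("Check my balance")
```

Here one seed utterance yields four training variants; the count grows multiplicatively with the number of words that have synonyms, so a real system would likely cap or sample the output.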

AI Technology Base Camp: Regarding the engineering practice of NLP technology, what advice do you have for developers?

Li Bo: I have several suggestions for NLP developers. First, build a broad understanding of the related technologies; it does not have to be deep, but it will help development. Second, be sure to define your inputs and outputs clearly. Third, pay more attention to product experience.

AI Technology Base Camp: Finally, how do you usually learn on your own? What experience can you share?

Li Bo: The Internet is a very good channel. I prefer to learn in the process of solving problems as I encounter them. If you study only from books and ignore practice, what you learn stays abstract. So it is necessary to practice, even if that just means building some demos. I run into problems while trying things and then find answers in a variety of ways, rather than learning in the traditional school manner.