Understanding natural language processing (NLP) in one article

There is a huge amount of text on the internet, and NLP technology is needed to process this unstructured data.

This article introduces the basic concepts of NLP, its 2 core tasks, 4 typical applications, and 6 implementation steps.

To learn more about NLP, visit the NLP topic page, where a 59-page NLP PDF is available to download for free.

 

Why is NLP important?

"Language understanding is the crown jewel in the field of artificial intelligence"

Bill Gates

Before the advent of artificial intelligence, machines mainly processed structured data (such as spreadsheet data in Excel). But most data on the internet is unstructured: articles, images, audio, video, and so on.

Structured data and unstructured data

Among unstructured data, text is the most plentiful. Although it takes up far less storage space than images and video, it carries the most information.

To analyze and make use of this text, we need NLP technology so that machines can understand and work with it.

 

What is natural language processing (NLP)?

Every animal has its own language, and so do machines!

Natural language processing (NLP) is the bridge between machine language and human language that makes human-computer communication possible.

Humans communicate through language, and dogs communicate through barking. Machines also have their own way of communicating: digital information.

Different species have their own way of communication

Speakers of different languages can't communicate directly. For example, humans can't understand a dog's barking. Even humans who speak different languages can't talk to each other directly; they need translation.

The same holds for computers. For computers to communicate with each other, people make all computers follow certain rules. These rules are the language that computers share.

Since there can be translations between different human languages, can humans and machines communicate directly through "translation"?

NLP is the bridge between humans and machines!

Why "natural language" processing?

Natural language is the expression commonly used in daily life, which is what people usually mean by "speaking."

Natural language: I'm a bit hunchbacked (unnatural language: my back exhibits a curved shape)

Natural language: Baobao's agent slept with Baobao's "baby" (wordplay like this is all over Weibo)

 

NLP's 2 core tasks

NLP has 2 core tasks:

  1. Natural language understanding – NLU | NLI
  2. Natural language generation – NLG

 

Natural Language Understanding – NLU|NLI

Natural language understanding aims to give machines the same ability to understand language that an ordinary person has. Because natural language is hard to understand in many respects (detailed below), NLU still falls far short of human performance.

5 difficulties in natural language understanding:

  1. Language diversity
  2. Language ambiguity
  3. Language robustness
  4. Language knowledge dependence
  5. Language context

To learn more about NLU, check out this article: "A text to understand natural language understanding - NLU (basic concept + practical application + 3 implementation)"

 

Natural language generation – NLG

NLG bridges the communication gap between humans and machines by converting data in non-language formats into language that humans can read and understand, such as articles and reports.

NLG's 6 steps:

  1. Content determination
  2. Text structuring
  3. Sentence aggregation
  4. Lexicalisation
  5. Referring expression generation (REG)
  6. Linguistic realisation
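
As a rough illustration of how the 6 steps might fit together, here is a minimal template-based sketch that turns a weather record into two sentences. The record fields, the temperature thresholds, and the function names are all assumptions made up for this example; real NLG systems are far more elaborate.

```python
def describe_temperature(celsius):
    """Pick a word for a temperature (thresholds are arbitrary assumptions)."""
    if celsius >= 25:
        return "warm"
    if celsius >= 10:
        return "mild"
    return "cold"


def generate_report(record):
    """Turn a hypothetical weather record (a dict) into a short text."""
    # 1. Content determination: keep only the facts we want to report.
    facts = {key: record[key] for key in ("city", "high", "low")}
    # 2. Text structuring: overall impression first, then the numbers.
    plan = ["impression", "temperatures"]
    # 3. Sentence aggregation: the high and the low share one sentence
    #    (one template) instead of getting a sentence each.
    templates = {
        "impression": "{subject} will be {feel} today.",
        "temperatures": "{subject} will reach a high of {high} degrees "
                        "and a low of {low} degrees.",
    }
    # 4. Lexicalisation: choose a word that expresses the numeric value.
    feel = describe_temperature(facts["high"])
    # 5. Referring expression generation: name the city once, then use "It".
    subject = {"impression": facts["city"], "temperatures": "It"}
    # 6. Linguistic realisation: fill the templates and join the sentences.
    sentences = [
        templates[name].format(subject=subject[name], feel=feel, **facts)
        for name in plan
    ]
    return " ".join(sentences)


report = generate_report({"city": "Paris", "high": 27, "low": 16, "humidity": 40})
print(report)
```

The unused `humidity` field shows content determination at work: it is present in the input data but left out of the generated text.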

To learn more about NLG, take a look at this article: "A text to understand natural language generation – NLG (6 implementation steps + 3 typical applications)"

 

5 difficulties of NLP

  1. Language follows no simple rules, or its rules are intricate.
  2. Language can be combined freely, producing complex expressions.
  3. Language is an open set: new expressions can be invented at any time.
  4. Language is tied to real-world knowledge and depends on it.
  5. Language use depends on the environment and the context.

 

4 typical applications of NLP

Sentiment analysis

There is a great deal of text on the internet. Its content varies widely, but the sentiment it expresses generally falls into a few categories, most simply positive or negative.

With sentiment analysis, you can quickly gauge how users feel about something.
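
To make the idea concrete, here is a minimal lexicon-based sketch: it counts positive and negative words and compares the totals. The word lists are tiny hypothetical examples, and this is only one simple approach; production systems usually rely on trained models instead.

```python
import re

# Hypothetical mini-lexicons, chosen only for this illustration.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "slow"}


def sentiment(text):
    """Classify text as 'positive', 'negative' or 'neutral' by word counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great"))  # positive
print(sentiment("Terrible battery and slow updates"))       # negative
```

A word-counting approach like this ignores negation ("not good") and sarcasm, which is one reason real sentiment analysis is harder than it looks.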

 

Chatbots

In the past, chatbots such as Siri and Xiaobing were not very capable and served mainly as entertainment. However, the rapid growth of smart speakers in recent years has shown everyone the value of chatbots.

As smart homes and smart cars develop, chatbots will become even more useful.

 

Speech recognition

Speech recognition has become an application that almost everyone uses. WeChat can convert voice messages to text, in-car navigation accepts spoken destinations, and elderly users can dictate to an input method without having to learn pinyin.

 

Machine translation

Machine translation is now quite accurate: Google Translate is good enough to convey the meaning of an article. Traditional human translators may well find themselves out of work in the future.

 

NLP's 2 approaches, 3 core steps

NLP can be done with traditional machine learning methods or with deep learning methods. The 2 approaches involve different processing steps, as detailed below:

Approach 1: the traditional machine learning NLP process

  1. Corpus preprocessing
    1. Chinese corpus preprocessing: 4 steps (detailed below)
    2. English corpus preprocessing: 6 steps (detailed below)
  2. Feature engineering
    1. Feature extraction
    2. Feature selection
  3. Classifier selection
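
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the four training sentences are made up, the feature selection rule (keep words whose counts differ between the two classes) is a deliberately crude stand-in for real methods such as chi-square, and the classifier is a small multinomial Naive Bayes with add-one smoothing written by hand rather than taken from a library.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("I love this phone, great screen", "pos"),
    ("great battery and great camera", "pos"),
    ("terrible battery, I hate it", "neg"),
    ("awful screen and terrible camera", "neg"),
]


def preprocess(text):
    """Step 1, corpus preprocessing: lowercase and split into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def class_word_counts(corpus):
    """Step 2a, feature extraction: bag-of-words counts per class."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in corpus:
        counts[label].update(preprocess(text))
    return counts


def select_features(counts):
    """Step 2b, feature selection: keep words whose counts differ by class."""
    words = set(counts["pos"]) | set(counts["neg"])
    return {w for w in words if counts["pos"][w] != counts["neg"][w]}


COUNTS = class_word_counts(TRAIN)
VOCAB = select_features(COUNTS)
PRIORS = Counter(label for _, label in TRAIN)


def predict(text):
    """Step 3, classifier: multinomial Naive Bayes with add-one smoothing."""
    tokens = [t for t in preprocess(text) if t in VOCAB]
    scores = {}
    for label in COUNTS:
        total = sum(COUNTS[label][w] for w in VOCAB)
        score = math.log(PRIORS[label] / len(TRAIN))
        for token in tokens:
            score += math.log((COUNTS[label][token] + 1) / (total + len(VOCAB)))
        scores[label] = score
    return max(scores, key=scores.get)


print(predict("great phone"))           # pos
print(predict("terrible awful phone"))  # neg
```

Words such as "battery" and "screen" occur equally often in both classes here, so the selection step drops them; only words that help tell the classes apart reach the classifier.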

 

Approach 2: the deep learning NLP process

  1. Corpus preprocessing
    1. Chinese corpus preprocessing: 4 steps (detailed below)
    2. English corpus preprocessing: 6 steps (detailed below)
  2. Model design
  3. Model training
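
The same three steps can be sketched for the neural approach. Note what changes: there is no hand-made feature engineering, because the model learns its own word representations during training. The sketch below is an assumption-heavy stand-in for a real deep network: a very small model (learned word embeddings, averaged and fed to one sigmoid unit) trained with plain stochastic gradient descent in pure Python. The class name, data, and hyperparameters are hypothetical; in practice a framework such as PyTorch or TensorFlow would be used.

```python
import math
import random

# Hypothetical toy data: (text, label), 1 = positive, 0 = negative.
TRAIN = [
    ("good great movie", 1),
    ("great fun", 1),
    ("bad boring movie", 0),
    ("boring awful", 0),
]


def tokenize(text):
    """Step 1, corpus preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


VOCAB = sorted({token for text, _ in TRAIN for token in tokenize(text)})


class AveragedEmbeddingClassifier:
    """Step 2, model design: word embeddings -> average -> sigmoid output."""

    def __init__(self, vocab, dim=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.embeddings = {
            word: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for word in vocab
        }
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.bias = 0.0

    def _forward(self, tokens):
        known = [t for t in tokens if t in self.embeddings]
        n = max(len(known), 1)  # avoid dividing by zero on unknown-only input
        hidden = [
            sum(self.embeddings[t][i] for t in known) / n for i in range(self.dim)
        ]
        z = sum(w * h for w, h in zip(self.weights, hidden)) + self.bias
        return known, hidden, 1.0 / (1.0 + math.exp(-z))

    def predict(self, text):
        """Return the predicted probability of the positive class."""
        return self._forward(tokenize(text))[2]

    def fit(self, data, epochs=300, lr=0.5):
        """Step 3, model training: gradient descent on the log loss."""
        for _ in range(epochs):
            for text, label in data:
                known, hidden, prob = self._forward(tokenize(text))
                error = prob - label  # gradient of the loss w.r.t. z
                for i in range(self.dim):
                    grad_hidden = error * self.weights[i]  # uses the old weight
                    self.weights[i] -= lr * error * hidden[i]
                    for token in known:
                        self.embeddings[token][i] -= lr * grad_hidden / len(known)
                self.bias -= lr * error


model = AveragedEmbeddingClassifier(VOCAB)
model.fit(TRAIN)
print(round(model.predict("great fun"), 2), round(model.predict("boring awful"), 2))
```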

 

6 steps for English NLP corpus preprocessing

  1. Tokenization
  2. Stemming
  3. Lemmatization
  4. Part-of-speech (POS) tagging
  5. Named entity recognition (NER)
  6. Chunking
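
The first three steps can be illustrated with the standard library alone. This is a simplified sketch: the stemmer is crude suffix stripping (not the Porter algorithm), and the lemma table is a tiny hypothetical dictionary. POS tagging, NER, and chunking generally need trained models from a library such as NLTK or spaCy, so they are not shown here.

```python
import re


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Stemming: chop off a common suffix; the result may not be a real word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table; real lemmatizers use large dictionaries and POS tags.
LEMMAS = {"was": "be", "were": "be", "mice": "mouse", "ran": "run", "better": "good"}


def lemmatize(word):
    """Lemmatization: map a word to its dictionary form when it is known."""
    return LEMMAS.get(word, word)


tokens = tokenize("The mice were running")
print(tokens)                          # ['the', 'mice', 'were', 'running']
print([stem(t) for t in tokens])       # ['the', 'mice', 'were', 'runn']
print([lemmatize(t) for t in tokens])  # ['the', 'mouse', 'be', 'running']
```

The output shows the difference between the two normalisation steps: stemming works by rule and can produce non-words such as "runn", while lemmatization returns real dictionary forms but only for words it knows.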

 

4 steps for Chinese NLP corpus preprocessing

  1. Chinese word segmentation
  2. Part-of-speech (POS) tagging
  3. Named entity recognition (NER)
  4. Stop word removal
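
Chinese text has no spaces between words, which is why segmentation comes first. As a minimal sketch of steps 1 and 4, the code below uses forward maximum matching, a classic dictionary-based segmentation technique, followed by stop word filtering. The dictionary and stop word list are tiny hypothetical examples; practical systems use large dictionaries or statistical models, and the POS tagging and NER steps need trained models, so they are omitted.

```python
# Hypothetical mini-dictionary and stop word list, for illustration only.
DICTIONARY = {"自然语言", "自然", "语言", "处理", "喜欢", "应用"}
STOP_WORDS = {"的", "我"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words


def remove_stop_words(words):
    """Drop very common words that carry little meaning on their own."""
    return [word for word in words if word not in STOP_WORDS]


words = segment("我喜欢自然语言处理的应用")
print(words)                     # ['我', '喜欢', '自然语言', '处理', '的', '应用']
print(remove_stop_words(words))  # ['喜欢', '自然语言', '处理', '应用']
```

Because the longest match wins, "自然语言" is kept as one word even though "自然" and "语言" are also in the dictionary.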

 

Final Thoughts

Natural language processing (NLP) is the bridge between machine language and human language that makes human-computer communication possible.

 

NLP's 2 core tasks:

  1. Natural language understanding – NLU
  2. Natural language generation – NLG

 

5 difficulties of NLP:

  1. Language follows no simple rules, or its rules are intricate.
  2. Language can be combined freely, producing complex expressions.
  3. Language is an open set: new expressions can be invented at any time.
  4. Language is tied to real-world knowledge and depends on it.
  5. Language use depends on the environment and the context.

 

4 typical applications of NLP:

  1. Sentiment analysis
  2. Chatbots
  3. Speech recognition
  4. Machine translation

 

NLP's 6 implementation steps:

  1. Tokenization
  2. Stemming
  3. Lemmatization
  4. Part-of-speech (POS) tagging
  5. Named entity recognition (NER)
  6. Chunking

Baidu Encyclopedia and Wikipedia versions

Baidu Encyclopedia version

Natural language processing is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, the language that people use every day, so it is closely related to the study of linguistics, but there are important differences. Natural language processing is not a general study of natural language, but rather the development of computer systems that can effectively implement natural language communication, especially the software systems therein. It is therefore part of computer science.

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics that focuses on the interaction between computers and human (natural) languages.

 

Wikipedia version

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence that involves the interaction between computers and human (natural) languages, especially how to program computers to process and analyze large amounts of natural language data. The challenges in natural language processing typically involve speech recognition, natural language understanding, and natural language generation.

 
