The internet holds a huge amount of textual information, and NLP technology is needed to process this unstructured data.
This article introduces the basic concepts of NLP, its 2 core tasks, 4 typical applications, and 6 implementation steps.
To learn more about NLP, visit the NLP topic, where a 59-page NLP document is available as a free download.
Why is NLP important?
"Language understanding is the crown jewel in the field of artificial intelligence"
Bill Gates
Before the advent of artificial intelligence, machines could only process structured data (such as data in Excel). But most data on the internet is unstructured: articles, pictures, audio, video...
Among unstructured data, text is the most plentiful. Although it takes up less storage space than pictures and video, it carries the most information.
To analyze and use this textual information, we need NLP technology so that machines can understand and work with text.
What is natural language processing (NLP)?
Every animal has its own language, and so do machines!
Natural Language Processing (NLP) is a bridge between machine language and human language to achieve the purpose of human-computer communication.
Humans communicate through language, and dogs communicate through barking. Machines also have their own means of communication: digital information.
Speakers of different languages can't communicate directly. Humans can't understand a dog's barking, and even humans who speak different languages need translation to talk to each other.
The same is true for computers. For computers to communicate with each other, people make them all follow certain rules. These rules are the language computers use among themselves.
Since translation works between different human languages, can humans and machines also communicate through "translation"?
NLP is the bridge between humans and machines!
Why "natural language" processing?
Natural language is the expression commonly used in daily life, which is what people usually mean by "speaking."
Natural language: "My back is a bit hunched" (unnatural language: "my back exhibits a curved shape")
Natural language: "Baby's agent slept with Baby's baby" (Weibo is full of sentences like this)
NLP's 2 core tasks
NLP has 2 core tasks:
Natural Language Understanding – NLU|NLI
Natural language understanding means giving a machine the ability to understand ordinary human language the way a person does. Because natural language is difficult to understand in many ways (detailed below), NLU is still far from human-level performance.
5 difficulties in natural language understanding:
- Language diversity
- Language ambiguity
- Language robustness
- Language knowledge dependence
- Language context
To learn more about NLU, check out this article: "Understand natural language understanding in one article – NLU (basic concepts + practical applications + 3 implementation methods)"
Natural Language Generation – NLG
NLG bridges the communication gap between humans and machines by converting non-linguistic data into a language format that humans can understand, such as articles and reports.
NLG's 6 steps:
- Content determination
- Text structuring
- Sentence aggregation
- Lexicalisation
- Referring expression generation (REG)
- Linguistic realisation
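As a rough illustration of how structured data becomes text, here is a minimal template-based sketch covering three of the six steps (content determination, lexicalisation, linguistic realisation). The weather record, its field names, the 30-degree threshold, and all function names are assumptions made up for this example, not something from the original article.

```python
# Hypothetical structured input: one day's weather readings.
record = {"city": "Beijing", "high_c": 31, "low_c": 22, "humidity": 40, "station_id": "X17"}


def determine_content(data):
    """Content determination: keep only the fields worth reporting."""
    return {key: data[key] for key in ("city", "high_c", "low_c")}


def lexicalise(content):
    """Lexicalisation: choose a word for the numeric temperature (threshold is arbitrary)."""
    feel = "hot" if content["high_c"] >= 30 else "mild"
    return {**content, "feel": feel}


def realise(content):
    """Linguistic realisation: produce one grammatical sentence from the chosen content."""
    return (
        f"{content['city']} will be {content['feel']} today, "
        f"with a high of {content['high_c']}°C and a low of {content['low_c']}°C."
    )


print(realise(lexicalise(determine_content(record))))
```

Reporting the high and the low in a single sentence is a very small case of sentence aggregation. Production NLG systems are far richer than a single template.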
To learn more about NLG, take a look at this article: "Understand natural language generation in one article – NLG (6 implementation steps + 3 typical applications)"
5 difficulties of NLP
- Language has no fixed rules, or its rules are intricate.
- Language can be combined freely, producing complex expressions.
- Language is an open set: new expressions can be invented at any time.
- Language depends on real-world knowledge.
- Language use depends on environment and context.
4 typical applications of NLP
Sentiment analysis
There is a huge amount of text on the internet. Its content varies widely, but the sentiment it expresses generally falls into one of two categories: positive or negative.
Sentiment analysis lets you quickly gauge public opinion among users.
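As a minimal sketch of the idea, the snippet below labels a message by counting hits against two tiny hand-made word lists. The `POSITIVE` and `NEGATIVE` lists and the `sentiment` function are illustrative assumptions; real systems usually learn these signals from labeled data rather than from fixed lists.

```python
import re

# Toy sentiment lexicons (hypothetical, for illustration only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "slow"}


def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in ["I love this phone, great battery", "Terrible service and slow delivery"]:
    print(message, "->", sentiment(message))
```

A word-counting approach like this ignores negation and sarcasm, which is one reason the difficulties listed earlier matter in practice.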
Chatbots
In the past, chatbots such as Siri and Xiaobing were not very capable and served mostly as entertainment. However, the rapid spread of smart speakers in recent years has shown everyone the value of chatbots.
As smart homes and smart cars develop, chatbots will become even more useful.
Speech Recognition
Speech recognition has become a standard feature for everyone. WeChat can convert voice messages to text, car navigation lets you speak the destination directly, and the elderly can dictate with a voice input method without learning pinyin...
Machine translation
Machine translation is already accurate enough that you can use Google Translate to understand the gist of an article. Traditional human translators may well lose work to it in the future.
NLP's 2 approaches and 3 core steps
NLP can be done with traditional machine learning methods or with deep learning methods. The 2 approaches correspond to different processing steps, detailed below:
Way 1: Traditional Machine Learning NLP Process
- Corpus preprocessing
  - 4 steps for Chinese corpus preprocessing (detailed below)
  - 6 steps for English corpus preprocessing (detailed below)
- Feature engineering
  - Feature extraction
  - Feature selection
- Select a classifier
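The traditional machine learning process above can be sketched in plain Python. This is an illustration under stated assumptions, not a recommended implementation: bag-of-words counts stand in for feature extraction, a small multinomial Naive Bayes stands in for the classifier, feature selection is skipped for brevity, and the four training sentences are invented.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extraction: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(extract_features(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in extract_features(text).items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data.
train_texts = [
    "great phone love it",
    "excellent battery great screen",
    "terrible phone hate it",
    "awful battery bad screen",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great screen"))
print(model.predict("bad phone"))
```

In practice a library such as scikit-learn would usually replace the hand-written classifier; it is written out here only to keep the example self-contained.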
Way 2: Deep Learning NLP Process
- Corpus preprocessing
  - 4 steps for Chinese corpus preprocessing (detailed below)
  - 6 steps for English corpus preprocessing (detailed below)
- Design the model
- Train the model
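Between corpus preprocessing and model design, deep learning pipelines commonly turn tokens into fixed-length sequences of integer IDs, because neural networks consume numbers rather than strings. The sketch below shows only that encoding step, not a model. The special tokens `<pad>` and `<unk>` and the function names are assumptions chosen for this example.

```python
def build_vocab(tokenized_texts):
    """Assign an integer ID to every token; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in tokenized_texts:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to IDs, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


corpus = [["nlp", "is", "fun"], ["deep", "learning", "is", "powerful"]]
vocab = build_vocab(corpus)
print(encode(["nlp", "is", "hard"], vocab, max_len=5))
```

The resulting ID sequences would then be fed to whichever model is designed and trained in the next two steps.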
6 steps for English NLP corpus preprocessing
- Tokenization
- Stemming
- Lemmatization
- Part-of-speech tagging (POS)
- Named entity recognition (NER)
- Chunking
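The first three steps above can be sketched without any NLP library. Everything here is a toy assumption: the suffix list, the lemma table, and the function names are invented for illustration, and real projects normally use an established toolkit. Part-of-speech tagging, NER, and chunking need trained models or large rule sets, so they are left out.

```python
import re

# Tiny lemma lookup table (hypothetical; real lemmatizers use a full dictionary).
LEMMAS = {"was": "be", "were": "be", "is": "be", "better": "good", "mice": "mouse"}

# Suffixes the toy stemmer strips, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Stemming: crudely chop a known suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Lemmatization: map a token to its dictionary form when it is known."""
    return LEMMAS.get(token, token)


tokens = tokenize("The mice were running and jumped over boxes.")
print(tokens)
print([stem(t) for t in tokens])
print([lemmatize(t) for t in tokens])
```

The difference between the two reductions shows up in the output: the stemmer turns "running" into "runn", which is not a real word, while the lemmatizer maps "mice" to the dictionary form "mouse".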
4 steps for Chinese NLP corpus preprocessing
- Chinese word segmentation
- Part-of-speech tagging (POS)
- Named entity recognition (NER)
- Stop-word removal
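Chinese text has no spaces between words, so segmentation has to come first. One classic dictionary-based technique is forward maximum matching, sketched below together with stop-word removal. The dictionary, the stop-word list, and the function names are toy assumptions for this example; the article does not prescribe a particular segmentation algorithm.

```python
# Toy dictionary and stop-word list (hypothetical; real systems use large lexicons).
DICTIONARY = {"自然", "语言", "自然语言", "处理", "是", "人工智能", "的", "重要", "方向"}
STOP_WORDS = {"是", "的"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i : i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if candidate in DICTIONARY or size == 1:
                words.append(candidate)
                i += size
                break
    return words


def remove_stop_words(words):
    """Drop high-frequency function words that carry little meaning."""
    return [w for w in words if w not in STOP_WORDS]


# "Natural language processing is an important direction of artificial intelligence"
words = segment("自然语言处理是人工智能的重要方向")
print(words)
print(remove_stop_words(words))
```

Greedy matching picks "自然语言" over the shorter "自然" and "语言", which is usually what is wanted, but it can mis-segment ambiguous strings, so statistical segmenters are common in practice.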
Final Thoughts
Natural Language Processing (NLP) is a bridge between machine language and human language to achieve the purpose of human-computer communication.
NLP's 2 core tasks:
- Natural Language Understanding – NLU
- Natural Language Generation – NLG
5 difficulties of NLP:
- Language has no fixed rules, or its rules are intricate.
- Language can be combined freely, producing complex expressions.
- Language is an open set: new expressions can be invented at any time.
- Language depends on real-world knowledge.
- Language use depends on environment and context.
4 typical applications of NLP:
- Sentiment analysis
- Chatbots
- Speech recognition
- Machine translation
NLP's 6 implementation steps:
- Tokenization
- Stemming
- Lemmatization
- Part-of-speech tagging (POS)
- Named entity recognition (NER)
- Chunking
Baidu Encyclopedia Version + Wikipedia
Natural language processing is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, the language that people use every day, so it is closely related to the study of linguistics, but there are important differences. Natural language processing is not a general study of natural language, but rather the development of computer systems that can effectively implement natural language communication, especially the software systems therein. It is therefore part of computer science.
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics that focuses on the interaction between computers and human (natural) languages.
Natural Language Processing (NLP) is a sub-area of computer science, information engineering, and artificial intelligence that involves the interaction between computers and human (natural) languages, especially how to program computers to process and analyze large amounts of natural language data. The challenges in natural language processing typically involve speech recognition, natural language understanding, and natural language generation.