What is constituency parsing?

Wikipedia defines it as follows: The constituency-based parse trees of constituency grammars (= phrase structure grammars) distinguish between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar, while the leaf nodes are labeled by terminal categories.

The constituents of a sentence are called sentence components, or syntactic components. The words in a sentence combine with one another in particular relationships, and the sentence can be divided into different components according to those relationships. Each component is filled by a word or a phrase.

Syntactic structure analysis means judging whether an input word sequence (usually a sentence) conforms to a given grammar and, if it does, analyzing the syntactic structure of that sentence. The syntactic structure is generally represented as a tree, usually called a syntactic parse tree or simply a parse tree. The program module that performs this analysis is called a syntactic parser, or simply a parser.
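
To make the tree representation concrete, here is a minimal sketch of a parse tree in Python. The Node class, the parse_bracketed helper and the example sentence are assumptions made for illustration and do not come from any particular parser. Interior nodes carry non-terminal categories, and the leaves are the words themselves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # For an interior node this is a category such as S or NP;
    # for a leaf it is the word itself.
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List[str]:
        """Return the words under this node, left to right."""
        if self.is_leaf():
            return [self.label]
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words


def parse_bracketed(text: str) -> Node:
    """Read a bracketed tree such as "(S (NP (NN dogs)) (VP (VBP bark)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            node = Node(tokens[pos])      # category label
            pos += 1
            while tokens[pos] != ")":
                node.children.append(read())
            pos += 1                      # consume ")"
            return node
        leaf = Node(tokens[pos])          # a bare token is a word
        pos += 1
        return leaf

    return read()


tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
print(tree.label)     # S
print(tree.leaves())  # ['the', 'dog', 'barks']
```

The bracketed string is only one common way to write such a tree down; the nested Node objects are what a parser would actually return.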

 

Basic tasks

Syntactic structure analysis has three basic tasks: 1. Determine whether the input string belongs to a given language. 2. Resolve lexical and structural ambiguity in the input sentence. 3. Analyze the internal structure of the input sentence, such as its constituents and the contextual relations between them.

If a sentence has multiple structural representations, the parser should return the most likely one. Syntactic structure analysis is sometimes also called language recognition or sentence recognition.

Building a syntactic parser generally involves two parts: first, the formal representation of the grammar and the description of lexical entries; second, the design of the parsing algorithm. The grammar formalisms most widely used in natural language processing today are context-free grammar (CFG) and constraint-based grammar (also known as unification grammar).
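
As a rough illustration of these two parts, the sketch below pairs a toy context-free grammar with a CYK-style chart algorithm. The grammar, the lexicon and the function name count_parses are all assumptions made for this example, and the grammar is written in Chomsky normal form to keep the algorithm short. The function counts how many parse trees a sentence has: zero means the sentence is not in the language, and more than one means it is structurally ambiguous.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (assumed for illustration).
# Each entry (A, B, C) stands for the rule A -> B C.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Lexical entries: word -> categories that can produce it.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the parse trees rooted in `start` that cover all of `words`."""
    n = len(words)
    # chart[i][j][A] = number of ways category A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)


print(count_parses("I saw the man".split()))                   # 1
print(count_parses("I saw the man with a telescope".split()))  # 2
print(count_parses("saw the I".split()))                       # 0
```

The second sentence has two parses because the prepositional phrase can attach either to the verb phrase or to the noun phrase, which is exactly the kind of structural ambiguity a parser has to resolve.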

 

Common methods

Approaches to syntactic structure analysis fall into three groups: rule-based methods, statistics-based methods, and the more recent deep-learning-based methods.

Rule-based methods: The basic idea is to write grammar rules by hand, build a grammar knowledge base, and resolve syntactic ambiguity through conditional constraints and checks.

Statistics-based methods: The most successful statistical parsers are based on probabilistic context-free grammar (PCFG, also called stochastic context-free grammar, SCFG). The models used in this approach are mainly of two types: lexicalized probabilistic models and unlexicalized probabilistic models.
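
The core idea of a PCFG can be sketched in a few lines: every rule carries a probability, and the probability of a parse tree is the product of the probabilities of the rules used in it. The rule probabilities below are invented for illustration (they are not estimated from any treebank), and the model is an unlexicalized one. Trees are written as nested tuples of the form (label, child, ...), with plain strings as words.

```python
from math import prod

# Hypothetical rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("I",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("Det", ("the",)): 0.6,
    ("Det", ("a",)): 0.4,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}


def tree_prob(tree):
    """Probability of a tree = product of the probabilities of its rules."""
    if isinstance(tree, str):          # a word contributes no rule of its own
        return 1.0
    label, *children = tree
    # Right-hand side of the rule applied at this node.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULE_PROB[(label, rhs)] * prod(tree_prob(c) for c in children)


np_man = ("NP", ("Det", "the"), ("N", "man"))
pp = ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "telescope")))

# Two parses of "I saw the man with a telescope".
vp_attach = ("S", ("NP", "I"), ("VP", ("VP", ("V", "saw"), np_man), pp))
np_attach = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", np_man, pp)))

best = max([vp_attach, np_attach], key=tree_prob)
print(best is vp_attach)  # True with these made-up probabilities
```

With these numbers the parser would prefer attaching the prepositional phrase to the verb phrase. A lexicalized model would additionally condition the rule probabilities on head words such as "saw" or "telescope".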

Deep-learning-based methods: In recent years deep learning has achieved good results on basic NLP tasks, and many papers on neural parsing have appeared.

 

Phrase structure and dependency structure

A phrase structure tree can be converted into exactly one dependency tree, but the reverse does not hold, because one dependency tree may correspond to multiple phrase structure trees. The conversion can be implemented as follows:

Define head-word extraction rules and compile them into a head table;

Using the head table, select a head child for each node in the syntax tree;

Within the same layer, the head word of each non-head child depends on the head word of the head child, and the head word of a lower layer depends on the head word of the layer above it. This yields the corresponding dependency structure.
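
The three steps above can be sketched as follows. The head table, the fallback to the rightmost child and the example tree are assumptions made for illustration; real head rules are considerably more detailed. Trees are nested tuples of the form (label, child, ...), and a part-of-speech node is written as (tag, word).

```python
# Step 1: a hypothetical head table. For each category, the child
# labels to look for, in priority order.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}


def find_head_child(label, children):
    """Step 2: choose the head (central) child of a node."""
    for wanted in HEAD_TABLE.get(label, []):
        for index, child in enumerate(children):
            if child[0] == wanted:
                return index
    return len(children) - 1   # assumed fallback: rightmost child


def to_dependencies(tree, words, deps):
    """Step 3: return the index of the head word of `tree`.

    Words are appended to `words` in sentence order, and every
    (dependent_index, head_index) pair is appended to `deps`.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        words.append(children[0])          # (tag, word) node
        return len(words) - 1
    child_heads = [to_dependencies(c, words, deps) for c in children]
    head = child_heads[find_head_child(label, children)]
    for other in child_heads:
        if other != head:
            deps.append((other, head))     # non-head child depends on head
    return head


tree = ("S",
        ("NP", ("N", "dogs")),
        ("VP", ("V", "chase"),
               ("NP", ("Det", "the"), ("N", "cats"))))

words, deps = [], []
root = to_dependencies(tree, words, deps)
print(words[root])                              # chase
print([(words[d], words[h]) for d, h in deps])
# [('the', 'cats'), ('cats', 'chase'), ('dogs', 'chase')]
```

Because the head word of each subtree is passed up to its parent, a head word in a lower layer ends up depending on the head word of the layer above, as described in the last step.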

 

Recommended tools

StanfordCoreNLP

Developed at Stanford; provides constituency parsing.

GitHub address / Official website

 

Berkeley Parser

An open-source tool from the UC Berkeley NLP group. Provides syntactic parsing for English.

Official address

 

SpaCy

An industrial-grade natural language processing library. At the time of writing it did not support Chinese.

GitHub address / Official website

 
