DeepMind, together with Google Brain, MIT, and other institutions, has published a large 27-author paper proposing the "graph network", which combines end-to-end learning with inductive reasoning and is expected to address deep learning's inability to perform relational reasoning.

As the industry's benchmark, DeepMind's every move has always been a hot topic in the AI community. Recently, this world-class AI lab seems to be focusing on **exploring "relations"**: since June it has published several papers on the theme, such as:

- Relational inductive bias for physical construction in humans and machines
- Relational Deep Reinforcement Learning
- Relational Recurrent Neural Networks (relational RNNs)

There are many papers, but if one is most worth reading, it has to be this one: "Relational inductive biases, deep learning, and graph networks".

The paper brings together **27 authors** from DeepMind, Google Brain, MIT, and the University of Edinburgh (22 of them from DeepMind) and, over 37 pages, examines the relationship between relational inductive biases and graph networks.

Oriol Vinyals, a research scientist at DeepMind and himself one of the authors, promoted the work on Twitter, something he rarely does, and called the review "pretty comprehensive."

Many well-known AI scholars have also commented on the paper.

Denny Britz, who interned at Google Brain and works on reinforcement learning, said he was happy to see someone combine graphs, first-order logic, and probabilistic reasoning, and that the field may be revived.

Chris Gray, founder of chip company Graphcore, commented that if this direction continues and results are really achieved, it will create a more promising foundation for AI than today's deep learning.

Stein Stafford, who holds a Ph.D. in mathematics from Cornell University, believes that **graph neural networks may solve the core problem, pointed out by Turing Award winner Judea Pearl, that deep learning cannot perform causal reasoning**.

**Opening up a more promising direction than deep learning alone**

So, what is this paper about? DeepMind's position is stated very clearly in this passage:

This is part position paper, part review, and part unification. We argue that if AI is to achieve human-like capabilities, combinatorial generalization must be a top priority, and structured representations and computations are key to realizing this goal.

Just as nature and nurture work together in biology, we believe "hand-engineering" and "end-to-end" learning are not an either-or choice; we advocate combining the two to benefit from their complementary strengths.

In the paper, the authors examine how relational inductive biases operate within deep learning architectures (such as fully connected, convolutional, and recurrent layers) to facilitate learning about entities, relations, and the rules for composing them.

They propose a new AI building block, the **graph network**, a generalization and extension of various neural network methods that operate on graphs. Graph networks carry a strong relational inductive bias and provide a direct interface for manipulating structured knowledge and producing structured behavior.

The authors also discuss how graph networks support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible reasoning models.

**Turing Award winner Judea Pearl: deep learning's causal-reasoning problem**

At the beginning of 2018, following the NIPS 2017 "deep learning is alchemy" debate, deep learning met an important critic.

Turing Award winner Judea Pearl, the father of Bayesian networks, published a paper on arXiv, "Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution", discussing the limitations of current machine learning theory and drawing seven inspirations from causal reasoning. Pearl pointed out that current machine learning systems operate almost entirely in a statistical, model-blind mode and cannot serve as the basis for strong AI. He believes the breakthrough lies in a "causal revolution": drawing on structural causal models, which can make a unique contribution to automated reasoning.

In a recent interview, Pearl was even more blunt: current deep learning is just "**curve fitting**". "This sounds like blasphemy... but from a mathematical point of view, no matter how cleverly you manipulate the data and how much information you can read from it, what you are doing is still just fitting a curve."

**DeepMind's proposal: integrate traditional Bayesian causal networks and knowledge graphs with deep reinforcement learning**

How to solve this problem? DeepMind believes the place to start is the "graph network".

Dr. Deng Yu, founder of DaDian Medical and a CMU Ph.D., explained the research background of DeepMind's paper.

Dr. Deng Yu explained that the machine learning community has **three main schools**: symbolism, connectionism, and behaviorism.

**Symbolism** has its origins in logic and focuses on knowledge representation and logical reasoning. After decades of research, this school's main results are Bayesian causal networks and knowledge graphs.

The standard-bearer of Bayesian causal networks is Professor Judea Pearl, winner of the 2011 Turing Award. However, it is said that when Pearl spoke at the NIPS conference, the reception was embarrassingly cold. In 2018, Pearl published a new book, "The Book of Why", defending causal networks and criticizing deep learning for its lack of rigorous logical reasoning. Knowledge graphs, meanwhile, are mainly driven by search engine companies, including Google, Microsoft, and Baidu, with the goal of moving search engines from keyword matching to semantic matching.

**Connectionism** has its origins in bionics and uses mathematical models to imitate neurons. Professor Marvin Minsky won the 1969 Turing Award for his contributions to artificial intelligence. Assembling large numbers of neurons together yields deep learning models, whose standard-bearer is Professor Geoffrey Hinton. The most criticized flaw of deep learning models is their lack of interpretability.

**Behaviorism** introduced cybernetics into machine learning; its most famous achievement is reinforcement learning, whose standard-bearer is Professor Richard Sutton. In recent years, Google DeepMind researchers have combined traditional reinforcement learning with deep learning to build AlphaGo, which has defeated the world's top human Go players.

The paper DeepMind published the day before yesterday proposes integrating traditional Bayesian causal networks and knowledge graphs with deep reinforcement learning, and surveys the research progress on this topic.

**What is the "graph network" proposed by DeepMind?**

Here it is worth introducing the "graph network" in a bit more detail. Of course, you can also skip this section and go straight to the interpretation.

In "Relational inductive biases, deep learning, and graph networks", the authors explain their "graph network" in detail. The graph network (GN) framework defines **a class of functions for relational reasoning over graph-structured representations**. The GN framework generalizes and extends the various graph neural network, MPNN, and NLNN approaches, and supports building complex architectures from simple building blocks.

The main unit of computation in the GN framework is the **GN block**, a "**graph-to-graph**" module that takes a graph as input, performs computations over the structure, and returns a graph as output. As described in the paper's Box 3, entities are represented by the graph's nodes, relations by its edges, and system-level properties by the global attribute.

The authors use "graph" to mean a directed, attributed multigraph with a global attribute. A node is denoted v_i, an edge e_k, and the global attribute **u**; s_k and r_k denote the indices of an edge's sender and receiver nodes. In detail:

- **Directed**: edges are one-way, from a "sender" node to a "receiver" node.
- **Attribute**: a property that can be encoded as a vector, a set, or even another graph.
- **Attributed**: edges and vertices have attributes associated with them.
- **Global attribute**: a graph-level attribute.
- **Multi-graph**: there can be more than one edge between a pair of vertices.

The GN framework's block organization emphasizes customizability and the synthesis of new architectures that express the desired relational inductive biases.

Let's use an example to explain GN more concretely. Consider predicting the motion of a set of rubber balls in an arbitrary gravitational field. The balls do not collide with each other, but each has one or more springs connecting it to some (or all) of the other balls. We will refer to this running example in the definitions below to illustrate the graph representation and the computations performed on it.

**Definition of "graph"**

In our GN framework, a graph is defined as a 3-tuple G = (u, V, E).

**u** is a global attribute; for example, u might represent a gravitational field.

V = {v_i} is the set of nodes (of cardinality N^v), where each v_i represents a node's attributes. For example, V might represent each ball, with attributes for position, velocity, and mass.

E = {(e_k, r_k, s_k)} is the set of edges (of cardinality N^e), where each e_k represents an edge's attributes, r_k is the index of the receiver node, and s_k the index of the sender node. For example, E might represent the springs between pairs of balls, along with their corresponding spring constants.
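To make the definition concrete, the 3-tuple for the balls-and-springs example can be written out directly in code. This is a minimal sketch; the dictionary field names (`pos`, `vel`, `mass`, `spring_constant`) are our own illustration, not notation from the paper.

```python
# A minimal encoding of the paper's G = (u, V, E) definition for the
# balls-and-springs running example. Field names are illustrative.
u = {"gravity": -9.8}  # global attribute: the gravitational field

V = [  # one entry per ball: position, velocity, mass
    {"pos": 0.0, "vel": 0.0, "mass": 1.0},
    {"pos": 1.0, "vel": 0.0, "mass": 2.0},
    {"pos": 2.0, "vel": 0.5, "mass": 1.5},
]

E = [  # each edge: attributes e_k, receiver index r_k, sender index s_k
    ({"spring_constant": 3.0}, 1, 0),  # spring connecting ball 0 to ball 1
    ({"spring_constant": 5.0}, 2, 1),  # spring connecting ball 1 to ball 2
]

G = (u, V, E)
```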

**Algorithm 1: the full computation steps of a GN block**

**Internal structure of the GN block**

A GN block contains three "update" functions (φ) and three "aggregation" functions (ρ), applied in order:

    e'_k = φ^e(e_k, v_{r_k}, v_{s_k}, u)     # update each edge
    ē'_i = ρ^{e→v}(E'_i)                     # aggregate edges incident on node i
    v'_i = φ^v(ē'_i, v_i, u)                 # update each node
    ē'   = ρ^{e→u}(E'),  v̄' = ρ^{v→u}(V')    # aggregate all edges / all nodes
    u'   = φ^u(ē', v̄', u)                    # update the global attribute

where E'_i is the set of updated edges whose receiver is node i, and E' and V' are the sets of all updated edges and nodes.
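As a minimal sketch of how a GN block's three update and three aggregation functions fit together, here is a toy version in which every attribute is a single number, the update functions are simple sums, and the aggregations are summations. The paper's GN blocks use learned neural networks for these functions; the choices below are purely illustrative.

```python
# Toy GN block: attributes are floats, phi = sums, rho = summation.
def gn_block(u, V, E):
    # 1. Update each edge from its attribute, both endpoints, and u (phi^e).
    E_new = [(e + V[s] + V[r] + u, r, s) for (e, r, s) in E]

    # 2. Aggregate incoming updated edges per node (rho e->v), update nodes (phi^v).
    V_new = []
    for i, v in enumerate(V):
        incoming = sum(e for (e, r, _) in E_new if r == i)
        V_new.append(v + incoming + u)

    # 3. Aggregate all edges and nodes (rho e->u, rho v->u), update u (phi^u).
    u_new = u + sum(e for (e, _, _) in E_new) + sum(V_new)

    return u_new, V_new, E_new

# Two nodes joined by a single edge from node 0 to node 1:
u2, V2, E2 = gn_block(0.0, [1.0, 2.0], [(0.5, 1, 0)])
```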

Combining knowledge graphs with deep learning, Dr. Deng Yu believes, involves several difficulties.

**1. Point vectors:**

A knowledge graph consists of nodes and edges. Nodes represent entities, and an entity has attributes and attribute values. Entities in traditional knowledge graphs are usually conceptual symbols, such as natural-language words.

Edges in a traditional knowledge graph connect two single nodes, i.e., two entities, and express a relation between them. The strength of the relation is expressed by a weight, which in a traditional knowledge graph is usually a constant.

To integrate a traditional knowledge graph with deep learning, the first step is to make the nodes differentiable. Replacing natural-language words with numerical word vectors is an effective way to do this. The common practice is to use a language model to analyze a large amount of text and find, for each word, the vector that best fits its context semantics. On graphs, however, traditional word-vector algorithms are not very effective and need modification.
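One common way to adapt word-vector ideas to graphs (in the spirit of methods such as DeepWalk) is to treat random walks over the graph as "sentences". The sketch below is illustrative only: it derives crude node vectors from walk co-occurrence counts, whereas a real system would train skip-gram embeddings on the walks.

```python
import random

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Generate random walks over an adjacency-list graph; each walk is a 'sentence'."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            for _ in range(walk_len - 1):
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def cooccurrence_vectors(adj, walks, window=2):
    """Crude node vectors: counts of walk co-occurrences within a window."""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    vecs = {n: [0.0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, n in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    vecs[n][index[walk[j]]] += 1.0
    return vecs

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}   # a tiny path graph
vecs = cooccurrence_vectors(adj, random_walks(adj))
```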

**2. Hyper-nodes:**

As mentioned earlier, edges in a traditional knowledge graph connect two single nodes and express the relation between them. This assumption limits the graph's expressive power, because in many scenarios several nodes combined together are related to another node or group of nodes. A combination of single nodes is called a hyper-node.

The question is which nodes should be combined into a hyper-node. Specifying them manually, a priori, is of course one method; automatically learning the composition of hyper-nodes from a large amount of training data, for example via dropout or regularization, is another.

**3. Hyper-edges:**

Edges in a traditional knowledge graph express the relation between nodes, with the strength of the relation given by a weight that is usually constant. But in many scenarios the weight is not constant: as the values of the nodes change, the weight of the edge changes too, and likely nonlinearly.

An edge of the graph represented by a nonlinear function is called a hyper-edge.

Deep learning models can simulate nonlinear functions, so each edge of the knowledge graph becomes a deep learning model. The model's input is a hyper-node composed of several single nodes, and its output is another hyper-node. If you view each deep learning model as a tree, with the input at the root and the output at the leaves, then from a bird's-eye view **a knowledge graph is actually a forest of deep learning models**.
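A hyper-edge of this kind can be sketched as a small multilayer perceptron mapping one hyper-node vector to another. Everything below (the mean-pooling of node vectors, the layer sizes, the random weights) is an illustrative assumption, not a construction from the DeepMind paper.

```python
import math
import random

random.seed(0)
DIM = 4  # illustrative hyper-node vector dimension

def make_edge_mlp(dim):
    """A hyper-edge as a tiny one-hidden-layer MLP with random weights."""
    w1 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    w2 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return forward

def hyper_node(node_vectors):
    """Pool several single-node vectors into one hyper-node (mean pooling)."""
    n = len(node_vectors)
    return [sum(v[i] for v in node_vectors) / n for i in range(DIM)]

edge = make_edge_mlp(DIM)
src = hyper_node([[1, 0, 0, 0], [0, 1, 0, 0]])  # combine two entity vectors
dst = edge(src)                                  # output hyper-node
```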

**4. Paths:**

When training the knowledge graph, including its point vectors, hyper-nodes, and hyper-edges, each training example is often a path walked through the graph. By fitting a massive number of paths, the most appropriate point vectors, hyper-nodes, and hyper-edges are obtained.

Training the graph by fitting paths has a problem: the training process is disconnected from the evaluation at the end. For example, suppose you are given the outlines of several articles together with the corresponding essays and asked to learn how to write. The fitting process emphasizes word-by-word imitation, but the quality of an article is judged not by the word-by-word progression but by the fluency of the piece as a whole.

How can this disconnect between training and final evaluation be resolved? A promising approach is reinforcement learning. The essence of reinforcement learning is to take the final evaluation and, through backtracking and discounting, estimate the potential value of each intermediate state along the path.
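The "backtracking and discounting" idea can be shown in its simplest form: propagate the final evaluation backward through the intermediate states of a path, discounting by a factor gamma at each step. The reward sequence and gamma below are illustrative.

```python
def discounted_returns(rewards, gamma=0.9):
    """Backtrack from the end of the path, discounting earlier states by gamma."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Only the final step is rewarded, as with a final-quality evaluation:
rs = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
```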

But **the difficulty with reinforcement learning is that the number of intermediate states cannot be too large**. When there are too many states, the training process fails to converge. The solution to this convergence problem is to use a deep learning model to estimate the value of states: instead of estimating the value of every state individually, only the finite parameters of one model need to be trained.
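A minimal sketch of this idea: instead of a table with one entry per state, a small parametric model (here simply linear in hand-picked state features) estimates state values, so only a fixed number of parameters is trained. The two-state transition loop and learning rate are illustrative assumptions.

```python
def value(theta, features):
    """Linear value-function approximation: V(s) = theta . features(s)."""
    return sum(t * f for t, f in zip(theta, features))

def td_update(theta, s, r, s_next, gamma=0.9, lr=0.1):
    """One temporal-difference step toward the target r + gamma * V(s')."""
    target = r + gamma * value(theta, s_next)
    error = target - value(theta, s)
    return [t + lr * error * f for t, f in zip(theta, s)]

theta = [0.0, 0.0]  # only two parameters, regardless of how many states exist
for _ in range(200):
    # state s1 (features [1,0]) yields reward 1 and leads to s2 (features [0,1]);
    # s2 yields reward 0 and leads back to s1
    theta = td_update(theta, [1.0, 0.0], 1.0, [0.0, 1.0])
    theta = td_update(theta, [0.0, 1.0], 0.0, [1.0, 0.0])
```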

The paper DeepMind published the day before yesterday proposes integrating deep reinforcement learning with knowledge graphs and surveys a large number of related studies. However, it does not clearly state which specific solution DeepMind prefers.

Perhaps different application scenarios call for different solutions, and there is no universally best one.

**Is graph deep learning the next hotspot in AI algorithms?**

Many important real-world datasets come in the form of graphs or networks: social networks, knowledge graphs, the World Wide Web, and so on. More and more researchers are now paying attention to how neural network models can process this kind of structured data.

Taken together with the series of papers on graph deep learning published by DeepMind, Google Brain, and others, is **graph deep learning the next hotspot in AI algorithms**?

Either way, let's start with this paper.

Paper address: https://arxiv.org/pdf/1806.01261.pdf

**References**

- Interview with Judea Pearl
- Graph convolution network
- Relational RNN
- Deep learning
- Relational inductive bias

This article is reproduced from the WeChat public account Xinzhiyuan (新智元); original address
