Recently, graph neural networks (GNNs) have become increasingly popular in various fields, including social networks, knowledge graphs, recommender systems, and even the life sciences.

GNNs have a strong ability to model the dependencies between nodes in a graph, which has led to breakthroughs in research related to graph analysis. This article introduces the basics of graph neural networks and two more advanced algorithms: DeepWalk and GraphSage.

Graph

Before discussing GNNs, let's first understand what a graph is. In computer science, a graph is a data structure consisting of two components: vertices and edges. A graph G can be described by the set of vertices V and edges E it contains.

Edges can be either directed or undirected, depending on whether there is a directional dependency between the vertices.

A directed graph

Vertices are also commonly referred to as nodes. In this article, the two terms are interchangeable.

Graph neural network (GNN)

A graph neural network is a type of neural network that operates directly on the graph structure. A typical application of GNNs is node classification. Essentially, every node in the graph is associated with a label, and we want to predict the labels of nodes that have no ground truth.

This section describes the algorithm from "The graph neural network model" (Scarselli, F., et al., 2009) [1], the first paper to propose GNNs, which is therefore usually considered the original GNN.

In the node classification problem setting, each node v is characterized by its feature x_v and associated with a ground-truth label t_v. Given a partially labeled graph G, the goal is to leverage these labeled nodes to predict the labels of the unlabeled ones. The model learns to represent each node with a d-dimensional vector (state) h_v that contains the information of its neighborhood, namely:

h_v = f(x_v, x_co[v], h_ne[v], x_ne[v])

Here, x_co[v] denotes the features of the edges connected to v, h_ne[v] denotes the embeddings of v's neighboring nodes, and x_ne[v] denotes the features of v's neighboring nodes. The function f is the transition function that projects these inputs onto a d-dimensional space. Since we are looking for a unique solution for h_v, we can apply the Banach fixed point theorem and rewrite the above equation as an iterative update process:

H^(t+1) = F(H^t, X)

H and X represent the concatenation of all h and x respectively.

The output of the GNN is computed by passing the state h_v and the feature x_v to an output function g:

o_v = g(h_v, x_v)

Both f and g here can be interpreted as feedforward fully connected neural networks. The L1 loss can be directly formulated as:

loss = Σ_{i=1}^{p} (t_i - o_i)

where p is the number of supervised (labeled) nodes.

It can be optimized by gradient descent.
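To make the fixed-point iteration concrete, here is a minimal NumPy sketch. The tanh layers standing in for f and g, and the plain summation over neighbors, are illustrative choices rather than the exact architectures from the paper (which only requires f to be a contraction map so that a unique fixed point exists); edge features x_co[v] are omitted for brevity.

```python
import numpy as np

def gnn_fixed_point(X, adj, W_f, W_g, n_iters=50, tol=1e-5):
    """Iterate the transition function until the node states reach a fixed point,
    then compute the outputs o_v = g(h_v, x_v).

    X:    (n, feat) node feature matrix
    adj:  adjacency list, adj[v] = list of v's neighbor indices
    W_f:  (d, 2 * feat + d) parameters of the stand-in transition function f
    W_g:  (out, d + feat)   parameters of the stand-in output function g
    """
    n, feat = X.shape
    d = W_f.shape[0]
    H = np.zeros((n, d))                           # initial node states
    for _ in range(n_iters):
        H_new = np.zeros_like(H)
        for v in range(n):
            # sum neighbor states and neighbor features (edge features x_co[v] omitted)
            h_ne = H[adj[v]].sum(axis=0) if len(adj[v]) else np.zeros(d)
            x_ne = X[adj[v]].sum(axis=0) if len(adj[v]) else np.zeros(feat)
            H_new[v] = np.tanh(W_f @ np.concatenate([X[v], x_ne, h_ne]))  # f
        if np.max(np.abs(H_new - H)) < tol:        # fixed point reached
            H = H_new
            break
        H = H_new
    O = np.tanh(np.concatenate([H, X], axis=1) @ W_g.T)  # g(h_v, x_v)
    return H, O
```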

However, the original GNN has three main limitations:

  • If the "fixed point" assumption is relaxed, a multilayer perceptron can learn a more stable representation and the iterative update process can be removed. This is because, in the original paper, every iteration uses the same parameters of the transition function f, while different parameters in different layers of an MLP allow hierarchical feature extraction.
  • It cannot handle edge information (for example, different edges in a knowledge graph may represent different relationships between nodes).
  • The fixed point can discourage the diversification of node distributions, and is therefore not well suited for learning node representations.

To address the above problems, researchers have proposed several variants of GNN. However, they are not the focus of this article.

DeepWalk: The first unsupervised node embedding algorithm

DeepWalk [2] was the first algorithm proposed to learn node embeddings in an unsupervised manner.

Its training process is very similar to that of word embedding. The motivation is that the distribution of both nodes in a graph and words in a corpus follows a power law, as shown in the following figure:

The algorithm consists of two steps:

  1. Perform random walks on the nodes in the graph to generate a sequence of nodes
  2. Run skip-gram and learn the embedding of each node based on the sequence of nodes generated in step 1

At each time step of a random walk, the next node is uniformly sampled from the neighbors of the previous node. Each walk is then truncated into sub-sequences of length 2|w| + 1, where w denotes the window size in skip-gram.
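As a minimal sketch of these two steps (assuming NetworkX for the graph and gensim 4.x for skip-gram; parameter names such as walk_length and num_walks are illustrative):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, walk_length):
    """Uniformly sample a walk of fixed length starting from `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))   # uniform sampling over neighbors
    return [str(n) for n in walk]               # Word2Vec expects string tokens

def deepwalk(G, num_walks=10, walk_length=40, window=5, dim=128):
    walks = []
    for _ in range(num_walks):
        nodes = list(G.nodes())
        random.shuffle(nodes)                   # start walks from every node
        walks += [random_walk(G, v, walk_length) for v in nodes]
    # Step 2: skip-gram (sg=1) with hierarchical softmax (hs=1), as in the paper
    model = Word2Vec(walks, vector_size=dim, window=window, sg=1, hs=1, min_count=0)
    return model.wv                             # node -> embedding lookup

# Usage:
# emb = deepwalk(nx.karate_club_graph())
# print(emb[str(0)])
```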

The paper uses hierarchical softmax to address the high cost of computing the softmax over the large number of nodes. To compute the softmax value of any individual output element k, we have to compute e^(x_i) for every element i:

softmax(x_k) = e^(x_k) / Σ_i e^(x_i)    (definition of softmax)


Therefore, the calculation time of the original softmax is O(|V|), where V represents the set of vertices in the graph.

Hierarchical softmax uses a binary tree to deal with this problem. In this binary tree, all the leaves (v1, v2, ... v8 in the figure below) are the vertices of the graph. In each internal node, there is a binary classifier that decides which path to take. To compute the probability of a given vertex v_k, one simply computes the probability of each sub-path along the path from the root node to the leaf v_k. Since the probabilities of each node's children sum to 1, the property that the probabilities of all vertices sum to 1 still holds in hierarchical softmax. Since the longest path in a binary tree is O(log(n)), where n is the number of leaves, the computation time for one element is now reduced to O(log|V|).

Hierarchical Softmax
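A minimal sketch of that path-probability computation (the per-node parameter vectors and the ±1 path encoding stand in for the binary classifiers at the internal nodes; they are illustrative, not the exact parameterization of the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaf_probability(h, path_params, path_directions):
    """Probability of one leaf (vertex) as the product of the binary decisions
    taken along its root-to-leaf path.

    h:               embedding of the context node, fed to every classifier
    path_params:     one weight vector per internal node on the path
    path_directions: +1 / -1, which child is taken at each internal node
    """
    prob = 1.0
    for w, direction in zip(path_params, path_directions):
        prob *= sigmoid(direction * np.dot(w, h))   # binary classifier at this node
    return prob  # only O(log|V|) factors instead of O(|V|) terms
```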

After training DeepWalk, the model has learned a good representation of each node, as shown in the figure below. Different colors indicate different labels in the input graph. We can see that in the output graph (the 2-dimensional embedding), nodes with the same label are clustered together, while most nodes with different labels are properly separated.

However, the main issue with DeepWalk is its lack of generalization ability. Whenever a new node appears, the model has to be retrained in order to represent this node. Therefore, DeepWalk is not suitable for dynamic graphs, where the nodes in the graph keep changing.

GraphSage: Learning node embeddings inductively

GraphSage [3] provides a solution to this problem, learning the embedding of each node in an inductive way.

Specifically, GraphSage represents each node by the aggregation of its neighborhood. Therefore, even if a new node unseen during training appears in the graph, it can still be properly represented by its neighboring nodes.

Here is the GraphSage algorithm:

The outer loop indicates the number of update iterations, and h^k_v denotes the latent vector of node v at update iteration k. At each update iteration, h^k_v is updated based on an aggregation function, the latent vectors of v and its neighborhood from the previous iteration, and a weight matrix W^k.
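A minimal NumPy sketch of one such update iteration, with the aggregator left pluggable (the ReLU nonlinearity, the omission of neighbor sampling, and the fall-back for isolated nodes are simplifications of the paper's algorithm):

```python
import numpy as np

def graphsage_layer(H, adj, W, aggregate):
    """One GraphSage update iteration: aggregate each node's neighborhood,
    concatenate with the node's own vector, apply W^k and a nonlinearity,
    then normalize to unit length.

    H:         (n, d) latent vectors h^{k-1}_v from the previous iteration
    adj:       adjacency list, adj[v] = list of v's neighbor indices
    W:         (d_out, 2 * d) weight matrix W^k
    aggregate: maps a (num_neighbors, d) array to a single (d,) vector
    """
    n, d = H.shape
    H_new = np.zeros((n, W.shape[0]))
    for v in range(n):
        neigh = H[adj[v]] if len(adj[v]) else H[v:v + 1]   # fall back to self if isolated
        h_cat = np.concatenate([H[v], aggregate(neigh)])   # CONCAT: the skip-connection
        h = np.maximum(W @ h_cat, 0.0)                     # ReLU(W^k · CONCAT(...))
        H_new[v] = h / (np.linalg.norm(h) + 1e-12)         # normalize h^k_v
    return H_new
```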

Three aggregation functions are proposed in the paper:

1. Mean aggregator:

The mean aggregator takes the average of the latent vectors of a node and all of its neighbors.

Compared with the original equation, it removes the concatenation operation in line 5 of the pseudo-code above. This operation can be viewed as a kind of "skip-connection", which the paper later shows can greatly improve model performance.
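In the same sketch notation as above, the mean-aggregator variant folds the node's own vector into the average instead of concatenating it separately:

```python
import numpy as np

def mean_aggregator_update(H, adj, W, v):
    """Average the node's own vector together with its neighbors' vectors,
    then apply W^k and the nonlinearity; note there is no concatenation
    (skip-connection) here, unlike line 5 of the pseudo-code."""
    neigh_and_self = np.vstack([H[v:v + 1], H[adj[v]]])
    h = np.maximum(W @ neigh_and_self.mean(axis=0), 0.0)
    return h / (np.linalg.norm(h) + 1e-12)
```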

2. LSTM Aggregator:

Since the nodes in a graph have no natural order, the LSTM aggregator is applied to a random permutation of the neighbors.

3. Pooling aggregator:

This operator performs an elementwise pooling function over the set of neighbors. The following is an example of max-pooling:

The paper uses max-pooling as the default aggregation function.
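A sketch of the max-pooling aggregator, assuming each neighbor vector is first passed through a small fully connected layer (the parameters W_pool and b_pool are illustrative) before the elementwise max:

```python
import numpy as np

def max_pooling_aggregator(H_neighbors, W_pool, b_pool):
    """Transform each neighbor's vector with a one-layer ReLU MLP, then take
    the elementwise maximum across neighbors."""
    transformed = np.maximum(H_neighbors @ W_pool.T + b_pool, 0.0)
    return transformed.max(axis=0)
```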

The loss function is defined as follows:

Here, u and v co-occur within a fixed-length random walk, while v_n are negative samples that do not co-occur with u. This loss function encourages nodes that appear close to each other to have similar embeddings, while nodes that are far apart are separated in the projection space. In this way, nodes gain more and more information about their neighborhoods.
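A sketch of this loss for a single co-occurring pair (u, v) and a handful of negative samples (the number of negatives and the negative-sampling distribution are left implicit; z_u, z_v, and z_negatives are assumed to be the GraphSage output embeddings):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(z_u, z_v, z_negatives):
    """Pull the embeddings of the co-occurring pair (u, v) together and push
    z_u away from each negative sample."""
    positive = -np.log(sigmoid(z_u @ z_v))
    negative = -sum(np.log(sigmoid(-z_u @ z_n)) for z_n in z_negatives)
    return positive + negative
```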

By aggregating neighboring nodes, GraphSage can generate embeddings even for nodes that were unseen during training. It allows node embeddings to be applied to domains involving dynamic graphs, where the structure of the graph is constantly changing. For example, Pinterest uses PinSage, an extended version of GraphSage, as the core of its content discovery system.

Final Thoughts

In this article, we covered the basics of graph neural networks, DeepWalk, and GraphSage. The power of GNNs in modeling complex graph structures is truly amazing. Given their effectiveness, I believe that in the near future, GNNs will play an important role in the development of AI.

[1] Scarselli, Franco, et al. "The graph neural network model."

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1015.7227&rep=rep1&type=pdf

[2] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations."

http://www.perozzi.net/publications/14_kdd_deepwalk.pdf

[3] Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.”

https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf

This article was reposted from the WeChat official account Xinzhiyuan.