Graph Neural Networks (ESE680)
Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. They have been developed, and are presented in this course, as generalizations of the convolutional neural networks (CNNs) that are used to process signals in time and space. Depending on how much you have heard of neural networks (NNs) and deep learning, this may sound like a strange sentence. Aren't CNNs just particular cases of NNs? And isn't the same true of GNNs? In a strict sense they are, but our focus in this course is on large-scale problems involving high-dimensional signals. In these settings NNs fail to scale. CNNs provide scalable learning for signals in time and space. GNNs provide scalable learning for signals supported on graphs.
In this course we will cover graph convolutional filters and graph filter banks before moving on to the study of single-feature and multiple-feature GNNs. We will also cover related architectures such as recurrent GNNs. Particular emphasis will be placed on studying the equivariance of GNNs to permutations and their stability to graph deformations. These properties provide a measure of explanation for the good performance of GNNs that is observed empirically. We will also study GNNs in the limit of large numbers of nodes to explain the transferability of GNNs across networks with different numbers of nodes.
Machine Learning on Graphs
Graphs can represent product or customer similarities in recommendation systems, agent interactions in multiagent robotics, or transceivers in a wireless communication network. Although otherwise disparate, these application domains share the presence of signals associated with nodes (ratings, perception, or signal strength) out of which we want to extract some information (ratings of other products, control actions, or transmission opportunities). If data is available, we can formulate empirical risk minimization (ERM) problems to learn these data-to-information maps. However, it is a form of ERM in which a graph plays a central role in describing relationships between signal components, and therefore one in which the graph should be leveraged. Graph Neural Networks (GNNs) are parametrizations of learning problems in general, and ERM problems in particular, that achieve this goal.
In any ERM problem we are given input-output pairs in a training set and we want to find a function that best approximates the input-output map according to a given risk. This function is later used to estimate the outputs associated with inputs that were not part of the training set. We say that the function has been trained and that we have learned to estimate outputs. This simple statement hides the well-known fact that ERM problems are nonsensical unless we make assumptions on how the function generalizes from the training set to unobserved samples. We can, for instance, assume that the map is linear, or, to be in tune with the times, that the map is a neural network.
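As a concrete, entirely illustrative sketch of ERM with a linear parametrization, the snippet below generates synthetic training pairs, fits a linear map by minimizing the average squared error, and then applies the trained map to an unseen input. The data, dimensions, and squared-error risk are assumptions made for the example.

```python
import numpy as np

# Minimal empirical risk minimization (ERM) sketch with a linear parametrization.
# The data, dimensions, and squared-error risk are illustrative assumptions.
rng = np.random.default_rng(0)
n_samples, n_in, n_out = 100, 8, 1

X = rng.normal(size=(n_samples, n_in))       # training inputs
Y = X @ rng.normal(size=(n_in, n_out))       # training outputs (synthetic map)
Y += 0.1 * rng.normal(size=Y.shape)          # observation noise

# ERM with risk = average squared error over the training set.
# For a linear map y = H^T x this reduces to ordinary least squares.
H, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The trained map is then used on inputs not in the training set.
x_new = rng.normal(size=(1, n_in))
y_hat = x_new @ H
```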
If properly trained, the linear map and the neural network will make similar predictions for the samples they have observed. But they will make different predictions for unobserved samples. That is, they will generalize differently. An important empirical observation is that neither the linear transform nor the neural network will generalize particularly well unless we are considering problems with a small number of variables. This is something that we could call the fundamental problem of machine learning: How do we devise a method that can handle high-dimensional signals? GNNs and CNNs answer that question in the same way: with the use of convolutional filters.
Graph Filters and Graph Neural Networks
We have seen that a characteristic shared by arbitrary linear and fully connected neural network parametrizations is that they do not scale well with the dimensionality of the input signals. This is best known in the case of signals in Euclidean space (time and images) where scalable linear processing is based on convolutional filters and scalable nonlinear processing is based on convolutional neural networks (CNNs). The reason for this is the ability of convolutions to exploit the structure of Euclidean space.
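To make this concrete, the sketch below writes a short convolutional filter acting on a cyclic time signal as a polynomial of the cyclic shift matrix, which is exactly the form that graph filters generalize. The signal length, filter taps, and test signal are illustrative choices.

```python
import numpy as np

# A convolution on a cyclic signal is a polynomial of the cyclic shift matrix S.
n = 6
S = np.roll(np.eye(n), 1, axis=0)        # cyclic shift: (S x)[i] = x[(i-1) mod n]
h = np.array([1.0, 0.5, 0.25])           # filter coefficients (taps)

x = np.arange(n, dtype=float)            # an arbitrary time signal

# Filter output as a polynomial in S applied to x: y = sum_k h_k S^k x.
y_poly = sum(h[k] * np.linalg.matrix_power(S, k) @ x for k in range(len(h)))

# Same output computed directly as a circular convolution.
y_conv = np.array([sum(h[k] * x[(i - k) % n] for k in range(len(h)))
                   for i in range(n)])

assert np.allclose(y_poly, y_conv)
```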
In this course we will describe graph filters and graph neural networks as analogues of convolutional filters and CNNs, but adapted to the processing of signals supported on graphs. Both of these concepts are simple. A graph filter is a polynomial on a matrix representation of the graph. Out of this definition we build a graph perceptron with the addition of a pointwise nonlinear function to process the output of a graph filter. Graph perceptrons are composed (or layered) to build a multilayer GNN. And individual layers are augmented from single filters to filter banks to build multiple-feature GNNs.
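A minimal NumPy sketch of these definitions follows. The adjacency matrix used as the graph representation, the random filter coefficients, and the choice of ReLU as the pointwise nonlinearity are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative graph: a symmetric adjacency matrix S serves as the
# matrix representation (shift operator) of the graph.
n = 5
A = rng.integers(0, 2, size=(n, n))
S = np.triu(A, 1)
S = S + S.T                                # symmetric, zero diagonal

def graph_filter(S, x, h):
    """Graph filter: a polynomial in S applied to the signal x, y = sum_k h_k S^k x."""
    y = np.zeros_like(x, dtype=float)
    Sk_x = x.astype(float)
    for hk in h:
        y += hk * Sk_x
        Sk_x = S @ Sk_x                    # next power of S applied to x
    return y

def graph_perceptron(S, x, h):
    """Graph perceptron: pointwise nonlinearity (ReLU here) applied to a graph filter output."""
    return np.maximum(graph_filter(S, x, h), 0.0)

# A single-feature multilayer GNN: graph perceptrons composed in layers.
x = rng.normal(size=n)
layers = [rng.normal(size=3) for _ in range(2)]   # filter coefficients per layer
z = x
for h in layers:
    z = graph_perceptron(S, z, h)
```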
Equivariance, Stability and Transferability
The relevant question at this juncture is whether graph filters and GNNs do for signals supported on graphs what convolutional filters and CNNs do for Euclidean data. To wit, do they enable scalable processing of signals supported on graphs? A growing body of empirical work shows that this is true to some extent, although results are not as impressive as in the case of voice and image processing. As an example that we can use to illustrate the advantages of graph filters and GNNs, consider a recommendation system in which we want to use past ratings that customers have given to products to predict future ratings. Collaborative filtering solutions build a graph of product similarities and interpret the ratings of separate customers as signals supported on the product similarity graph. We then use past ratings to construct a training set and learn to fill in the ratings that a given customer would give to products not yet rated. Empirical results do show that graph filters and GNNs work in recommendation systems with large numbers of products in which linear maps and fully connected neural networks fail. In fact, it is easy enough to arrive at three empirical observations that motivate this course (a toy sketch of the collaborative filtering setup follows the observations below):
- (O1) Graph filters produce better rating estimates than arbitrary linear parametrizations and GNNs produce better estimates than arbitrary (fully connected) neural networks.
- (O2) GNNs predict ratings better than graph filters.
- (O3) A GNN that is trained on a graph with a certain number of nodes can be executed in a graph with a larger number of nodes and still produce good rating estimates.
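Below is the toy sketch of the collaborative filtering setup referenced above. The synthetic rating matrix, the correlation-based similarity graph, and the use of a single graph filter with fixed coefficients as the predictor are illustrative assumptions, not the actual experimental pipeline.

```python
import numpy as np

# Toy version of the collaborative filtering setup described above.
rng = np.random.default_rng(2)
n_customers, n_products = 50, 10

# Sparse ratings on a 1-5 scale: zeros denote unrated products.
R = rng.integers(1, 6, size=(n_customers, n_products)) \
    * (rng.random((n_customers, n_products)) < 0.4)

# Product similarity graph built from correlations between rating columns.
S = np.nan_to_num(np.corrcoef(R, rowvar=False))
np.fill_diagonal(S, 0.0)                    # no self loops
S = np.maximum(S, 0.0)                      # keep positive similarities only

# A customer's ratings form a signal on the product graph (zeros = unrated).
x = R[0].astype(float)

# A graph filter y = sum_k h_k S^k x produces rating estimates for all products,
# including the unrated ones; coefficients here are fixed rather than trained.
h = [0.0, 1.0, 0.5]
y, Skx = np.zeros(n_products), x.copy()
for hk in h:
    y += hk * Skx
    Skx = S @ Skx
```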
Observations (O1)-(O3) support advocacy for the use of GNNs, at least in recommendation systems. But they also spark three interesting questions: (Q1) Why do graph filters and GNNs outperform linear transformations and fully connected neural networks? (Q2) Why do GNNs outperform graph filters? (Q3) Why do GNNs transfer to networks with different numbers of nodes? In this course we present three theoretical analyses that help to answer these questions:
- Equivariance. Graph filters and GNNs are equivariant to permutations of the graph.
- Stability. GNNs provide a better tradeoff between discriminability and stability to graph perturbations.
- Transferability. As graphs converge to a limit object, a graphon, GNN outputs converge to outputs of a corresponding limit object, a graphon neural network.
These properties show that GNNs have strong generalization potential. Equivariance to permutations implies that nodes with analogous neighbor sets making analogous observations perform the same operations. Thus, we can learn to, say, fill in the ratings of a product from the ratings of another product in another part of the network if the local structures of the graph are the same. This helps explain why graph filters outperform linear transforms and GNNs outperform fully connected neural networks [cf. observation (O1)]. Stability to graph deformations affords a much stronger version of this statement: we can learn to generalize across different products if the local neighborhood structures are similar, not necessarily identical. Since GNNs possess better stability than graph filters for the same level of discriminability, this helps explain why GNNs outperform graph filters [cf. observation (O2)]. The convergence of GNNs towards graphon neural networks delineated under the transferability heading explains why GNNs can be trained and executed on graphs of different sizes [cf. observation (O3)]. It is germane to note that analogues of these properties hold for CNNs: they are equivariant to translations, stable to deformations of Euclidean space, and have well-defined continuous-time limits.
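As a small numerical illustration of the equivariance property (a check on synthetic data, not a proof), the sketch below verifies that filtering a permuted graph with a permuted signal yields the permutation of the original filter output.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# Illustrative symmetric graph, signal, and filter coefficients.
A = rng.random((n, n))
S = np.triu(A, 1)
S = S + S.T
x = rng.normal(size=n)
h = [1.0, 0.5, 0.25]

def graph_filter(S, x, h):
    """Graph filter y = sum_k h_k S^k x."""
    y, Skx = np.zeros(len(x)), x.copy()
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx
    return y

# A permutation matrix P relabels the nodes of the graph.
P = np.eye(n)[rng.permutation(n)]

# Equivariance: filtering the permuted graph/signal equals permuting the filter output.
lhs = graph_filter(P @ S @ P.T, P @ x, h)
rhs = P @ graph_filter(S, x, h)
assert np.allclose(lhs, rhs)
```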