On Whole-Graph Embeddings from Node Feature Distributions

Triangle Count reveals Communities and improves Graph Neural Networks

More Info
expand_more

Abstract

We consider three topics motivated by the Network Exploration Toolkit (NEExT) for building unsupervised graph embeddings. NEExT vectorizes the graphs in a graph collection using the Wasserstein (optimal transport) distance between the distributions of node features of each graph. We inspect the effect of sampling only a proportion of the nodes of each graph, and show that even if the sampling fraction tends to zero (sufficiently slowly), so does the Wasserstein distance. NEExT relies on the assumption that local node features are informative to global characteristics of graphs. With this in mind, we consider triangle count: triangles are a local feature that correlate with the strength of a graph's community structure. We give asymptotic results for the number of triangles in a 2-community stochastic block model setting and the ABCD random graph model in both a scale-free and finite variance degree regime. Lastly, we show by experiment that augmenting the node features of graphs with triangle count improves a Graph Neural Network (GNN) in its ability to pick up on community structure and local clustering. For this, we use a random graph dataset with varying community structure, a random graph dataset with varying clustering coefficient, and a real-life dataset. We position random graph models as an effective tool for benchmarking the expressiveness of GNNs.