On Whole-Graph Embeddings from Node Feature Distributions

Touwen, L.E.

On Whole-Graph Embeddings from Node Feature Distributions

Triangle Count reveals Communities and improves Graph Neural Networks

Master thesis (2024)

Authors

L.E. Touwen Electrical Engineering, Mathematics and Computer Science

Contributors

J. Komjáthy Applied Probability (mentor)

Pawel Pralat Toronto Metropolitan University (mentor)

A. Heinlein Numerical Analysis (coach)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

Graph Neural Networks Random Graphs Graph Embedding Wasserstein distance

To reference this document use:

http://resolver.tudelft.nl/uuid:ce12639d-688e-4c29-b8e4-d312ab520c30

More Info

expand_more

Published Date

02-07-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

We consider three topics motivated by the Network Exploration Toolkit (NEExT) for building unsupervised graph embeddings. NEExT vectorizes the graphs in a graph collection using the Wasserstein (optimal transport) distance between the distributions of node features of each graph. We inspect the effect of sampling only a proportion of the nodes of each graph, and show that even if the sampling fraction tends to zero (sufficiently slowly), so does the Wasserstein distance. NEExT relies on the assumption that local node features are informative to global characteristics of graphs. With this in mind, we consider triangle count: triangles are a local feature that correlate with the strength of a graph's community structure. We give asymptotic results for the number of triangles in a 2-community stochastic block model setting and the ABCD random graph model in both a scale-free and finite variance degree regime. Lastly, we show by experiment that augmenting the node features of graphs with triangle count improves a Graph Neural Network (GNN) in its ability to pick up on community structure and local clustering. For this, we use a random graph dataset with varying community structure, a random graph dataset with varying clustering coefficient, and a real-life dataset. We position random graph models as an effective tool for benchmarking the expressiveness of GNNs.

Files

MSc_Thesis_Lourens_Touwen.pdf

(pdf | 1.23 Mb)

Unknown license