[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Last update: Jan 03, 2023

Related tags

Deep Learning Non-Homophily-Large-Scale

Overview

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Derek Lim*, Felix Hohne*, Xiuyu Li*, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

Published at NeurIPS 2021

Here are codes to load our proposed datasets, compute our measure of homophily, and train various graph machine learning models in our experimental setup. We include an implementation of the new graph neural network LINKX that we develop.

Organization

main.py contains the main full batch experimental scripts.

main_scalable.py contains the minibatching experimental scripts.

parse.py contains flags for running models with specific settings and hyperparameters.

dataset.py loads our datasets.

models.py contains implementations for graph machine learning models, though C&S (correct_smooth.py, cs_tune_hparams.py) are in separate files. Running several of the GNN models on larger datasets may require at least 24GB of VRAM. Our LINKX model is implemented in this file.

homophily.py contains functions for computing homophily measures, including the one that we introduce in our_measure.

experiments/ contains the bash files to reproduce full batch experiments.

scalable_experiments/ contains the bash files to reproduce minibatching experiments.

wiki_scraping/ contains the Python scripts to reproduce the "wiki" dataset by querying the Wikipedia API and cleaning up the data.

Datasets

As discussed in the paper, our proposed datasets are "genius", "twitch-gamer", "fb100", "pokec", "wiki", "arxiv-year", and "snap-patents", which can be loaded by load_nc_dataset in dataset.py by passing in their respective string name. Many of these datasets are included in the data/ directory, but wiki, twitch-gamer, snap-patents, and pokec are automatically downloaded from a Google drive link when loaded from dataset.py. The arxiv-year dataset is downloaded using OGB downloaders. load_nc_dataset returns an NCDataset, the documentation for which is also provided in dataset.py. It is functionally equivalent to OGB's Library-Agnostic Loader for Node Property Prediction, except for the fact that it returns torch tensors. See the OGB website for more specific documentation. Just like the OGB function, dataset.get_idx_split() returns fixed dataset split for training, validation, and testing.

When there are multiple graphs (as in the case of fb100), different ones can be loaded by passing in the sub_dataname argument to load_nc_dataset in dataset.py. In particular, fb100 consists of 100 graphs. We only include ["Amherst41", "Cornell5", "Johns Hopkins55", "Penn94", "Reed98"] in this repo, although others may be downloaded from the internet archive. In the paper we test on Penn94.

References

The datasets come from a variety of sources, as listed here:

Penn94. Traud et al 2012. Social Structure of Facebook Networks
pokec. Leskovec et al. Stanford Network Analysis Project
arXiv-year. Hu et al 2020. Open Graph Benchmark
snap-patents. Leskovec et al. Stanford Network Analysis Project
genius. Lim and Benson 2020. Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform
twitch-gamers. Rozemberczki and Sarkar 2021. Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings
wiki. Collected by the authors of this work in 2021.

Installation instructions

Create and activate a new conda environment using python=3.8 (i.e. conda create --name non-hom python=3.8)
Activate your conda environment
Check CUDA version using nvidia-smi
run bash install.sh cu110, replacing cu110 with your CUDA version (CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). We tested on Ubuntu 18.04, CUDA 11.0.

Running experiments

Make sure a results folder exists in the root directory.
Our experiments are in the experiments/ and scalable_experiments/ directory. There are bash scripts for running methods on single and multiple datasets. Please note that the experiments must be run from the root directory, e.g. (bash experiments/mixhop_exp.sh snap-patents). For instance, to run the MixHop experiments on arxiv-year, use:

bash experiments/mixhop_exp.sh arxiv-year

To run LINKX on pokec, use:

bash experiments/linkx_exp.sh pokec

To run LINK on Penn94, use:

bash experiments/link_exp.sh fb100 Penn94

To run GCN-cluster on twitch-gamers, use:

bash scalable_experiments/gcn_cluster.sh twitch-gamer

To run LINKX minibatched on wiki, use

bash scalable_experiments/linkx_exp.sh wiki

To run LINKX on Geom-GCN with full hyperparameter grid on chameleon, use

bash experiments/linkx_tuning.sh chameleon

[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Related tags

Overview

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Organization

Datasets

References

Installation instructions

Running experiments

Owner

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

Code for ICMI2020 and ICMI2021 papers: "Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle" and "ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle"

Code for STFT Transformer used in BirdCLEF 2021 competition.

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Editing a Conditional Radiance Field

MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

GenshinMapAutoMarkTools - Tools To add/delete/refresh resources mark in Genshin Impact Map

A library for uncertainty representation and training in neural networks.

My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Official implementation of DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations in TensorFlow 2

Deep Residual Networks with 1K Layers

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

PyTorch implementation of EfficientNetV2

This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

Medical image analysis framework merging ANTsPy and deep learning

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes