Transformers are Graph Neural Networks!

Overview

🚀 Gated Graph Transformers

Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression.

Associated article: Transformers are Graph Neural Networks, by Chaitanya K. Joshi, published with The Gradient.

This repository is a continuously updated personal project to build intuitions about and track progress in Graph Representation Learning research. I aim to develop the most universal and powerful model which unifies state-of-the-art architectures from Graph Neural Networks and Transformers, without incorporating domain-specific tricks.

Gated Graph Transformer

Key Architectural Ideas

🤖 Deep, Residual Transformer Backbone

  • As the backbone architecture, I borrow the two-sub-layered, pre-normalization variant of Transformer encoders that has emerged as the standard in the NLP community, e.g. GPT-3. Each Transformer block consists of a message-passing sub-layer followed by a node-wise feedforward sub-layer. The graph convolution is described later.
  • The feedforward sub-layer projects node embeddings to an absurdly large dimension, passes them through a non-linear activation function, does dropout, and reduces back to the original embedding dimension.
  • The Transformer backbone enables training very deep and extremely overparameterized models. Overparameterization is important for performance in NLP and other combinatorially large domains, but was previously not possible for GNNs trained on small graph classifcation datasets. Coupled with unique node positional encodings (described later) and the feedforward sub-layer, overparameterization ensures that our GNN is Turing Universal (based on A. Loukas's recent insightful work, including this paper).

✉️ Anisotropic Graph Convolutions


Source: 'Deep Parametric Continuous Convolutional Neural Networks', Wang et al., 2018

  • As the graph convolution layer, I use the Gated Graph Convolution with dense attention mechanism, which we found to be the best performing graph convolution in Benchmarking GNNs. Intuitively, Gated GraphConv generalizes directional CNN filters for 2D images to arbitrary graphs by learning a weighted aggregations over the local neighbors of each node. It upgrades the node-to-node attention mechanism from GATs and MoNet (i.e. one attention weight per node pair) to consider dense feature-to-feature attention (i.e. d attention weights for pairs of d-dimensional node embeddings).
  • Another intuitive motivation for the Gated GraphConv is as a learnable directional diffusion process over the graph, or as a coupled PDE over node and edge features in the graph. Gated GraphConv makes the diffusion process/neighborhood aggregation anisotropic or directional, countering oversmoothing/oversquashing of features and enabling deeper models.
  • This graph convolution was originally proposed as a sentence encoder for NLP and further developed at NTU for molecule generation and combinatorial optimization. Evidently, I am partial to this idea. At the same time, it is worth noting that anisotropic local aggregations and generalizations of directed CNN filters have demonstrated strong performance across a myriad of applications, including 3D point clouds, drug discovery, material science, and programming languages.

🔄 Graph Positional Encodings


Source: 'Geometric Deep Learning: Going beyond Euclidean Data', Bronstein et al., 2017

  • I use the top-k non-trivial Laplacian Eigenvectors as unique node identifiers to inject structural/positional priors into the Transformer backbone. Laplacian Eigenvectors are a generalization of sinusoidal positional encodings from the original Transformers, and were concurrently proposed in the Benchmarking GNNs, EigenGNNs, and GCC papers.
  • Randomly flipping the sign of Laplacian Eigenvectors during training (due to symmetry) can be seen as an additional data augmentation or regularization technique, helping delay overfitting to training patterns. Going further, the Directional Graph Networks paper presents a more principled approach for using Laplacian Eigenvectors.

Some ideas still in the pipeline include:

  • Graph-specific Normalization - Originally motivated in Benchmarking GNNs as 'graph size normalization', there have been several subsequent graph-specific normalization techniques such as GraphNorm and MessageNorm, aiming to replace or augment standard Batch Normalization. Intuitively, there is room for improvement as BatchNorm flattens mini-batches of graphs instead of accounting for the underlying graph structure.

  • Theoretically Expressive Aggregation - There are several exciting ideas aiming to bridge the gap between theoretical expressive power, computational feasability, and generalization capacity for GNNs: PNA-style multi-head aggregation and scaling, generalized aggreagators from DeeperGCNs, pre-computing structural motifs as in GSN, etc.

  • Virtual Node and Low Rank Global Attention - After the message-passing step, the virtual node trick adds messages to-and-fro a virtual/super node connected to all graph nodes. LRGA comes with additional theretical motivations but does something similar. Intuitively, these techniques enable modelling long range or latent interactions in graphs and counter the oversquashing problem with deeper networks.

  • General Purpose Pre-training - It isn't truly a Transformer unless its pre-trained on hundreds of GPUs for thousands of hours...but general purpose pre-training for graph representation learning remains an open question!

Installation and Usage

# Create new Anaconda environment
conda create -n new-env python=3.7
conda activate new-env
# Install PyTorch 1.6 for CUDA 10.x
conda install pytorch=1.6 cudatoolkit=10.x -c pytorch
# Install DGL for CUDA 10.x
conda install -c dglteam dgl-cuda10.x
# Install other dependencies
conda install tqdm scikit-learn pandas urllib3 tensorboard
pip install -U ogb

# Train GNNs on ogbg-mol* datasets
python main_mol.py --dataset [ogbg-molhiv/ogbg-molpcba] --gnn [gated-gcn/gcn/mlp]

# Prepare submission for OGB leaderboards
bash scripts/ogbg-mol*.sh

# Collate results for submission
python submit.py --dataset [ogbg-molhiv/ogbg-molpcba] --expt [path-to-logs]

Note: The code was tested on Ubuntu 16.04, using Python 3.6, PyTorch 1.6 and CUDA 10.1.

Citation

@article{joshi2020transformers,
  author = {Joshi, Chaitanya K},
  title = {Transformers are Graph Neural Networks},
  journal = {The Gradient},
  year = {2020},
  howpublished = {\url{https://thegradient.pub/transformers-are-gaph-neural-networks/ } },
}
Owner
Chaitanya Joshi
Research Engineer at A*STAR, working on Graph Neural Networks
Chaitanya Joshi
This is the implementation of the paper LiST: Lite Self-training Makes Efficient Few-shot Learners.

LiST (Lite Self-Training) This is the implementation of the paper LiST: Lite Self-training Makes Efficient Few-shot Learners. LiST is short for Lite S

Microsoft 28 Dec 07, 2022
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion.

OstrichRL This is the repository accompanying the paper OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion. It contain

Vittorio La Barbera 51 Nov 17, 2022
In this work, we will implement some basic but important algorithm of machine learning step by step.

WoRkS continued English 中文 Français Probability Density Estimation-Non-Parametric Methods(概率密度估计-非参数方法) 1. Kernel / k-Nearest Neighborhood Density Est

liziyu0104 1 Dec 30, 2021
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021

ATISS: Autoregressive Transformers for Indoor Scene Synthesis This repository contains the code that accompanies our paper ATISS: Autoregressive Trans

138 Dec 22, 2022
An index of algorithms for learning causality with data

awesome-causality-algorithms An index of algorithms for learning causality with data. Please cite our survey paper if this index is helpful. @article{

Ruocheng Guo 2.3k Jan 08, 2023
Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

csuhan 64 Jan 07, 2023
TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022)

TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022) Ziang Cao and Ziyuan Huang and Liang Pan and Shiwei Zhang and Ziwei Liu and Changhong Fu In

Intelligent Vision for Robotics in Complex Environment 100 Dec 19, 2022
Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Chen Guo 58 Dec 24, 2022
Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evalua

Autonomous Agents Research Group (University of Edinburgh) 2 Oct 09, 2022
AFL binary instrumentation

E9AFL --- Binary AFL E9AFL inserts American Fuzzy Lop (AFL) instrumentation into x86_64 Linux binaries. This allows binaries to be fuzzed without the

242 Dec 12, 2022
A modular PyTorch library for optical flow estimation using neural networks

A modular PyTorch library for optical flow estimation using neural networks

neu-vig 113 Dec 20, 2022
Region-aware Contrastive Learning for Semantic Segmentation, ICCV 2021

Region-aware Contrastive Learning for Semantic Segmentation, ICCV 2021 Abstract Recent works have made great success in semantic segmentation by explo

Hanzhe Hu 30 Dec 29, 2022
The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

PIC4SeRCentre 20 Jan 03, 2023
Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information by Masato Tamura, Hiroki Ohashi, and Tomoaki Yosh

105 Dec 23, 2022
Code for Mesh Convolution Using a Learned Kernel Basis

Mesh Convolution This repository contains the implementation (in PyTorch) of the paper FULLY CONVOLUTIONAL MESH AUTOENCODER USING EFFICIENT SPATIALLY

Yi_Zhou 35 Jan 03, 2023
realsense d400 -> jpg + csv

Realsense-capture realsense d400 - jpg + csv Requirements RealSense sdk : Installation Python3 pyrealsense2 (RealSense SDK) Numpy OpenCV Tkinter Run

Ar-Ray 2 Mar 22, 2022
RetinaFace: Deep Face Detection Library in TensorFlow for Python

RetinaFace is a deep learning based cutting-edge facial detector for Python coming with facial landmarks.

Sefik Ilkin Serengil 512 Dec 29, 2022
Deep High-Resolution Representation Learning for Human Pose Estimation

Deep High-Resolution Representation Learning for Human Pose Estimation (accepted to CVPR2019) News If you are interested in internship or research pos

HRNet 167 Dec 27, 2022