A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

Last update: Dec 27, 2022

Overview

SEAL

A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019)

Abstract

Node classification and graph classification are two graph learning problems that predict the class label of a node and the class label of a graph respectively. A node of a graph usually represents a real-world entity, e.g., a user in a social network, or a protein in a protein-protein interaction network. In this work, we consider a more challenging but practically useful setting, in which a node itself is a graph instance. This leads to a hierarchical graph perspective which arises in many domains such as social network, biological network and document collection. For example, in a social network, a group of people with shared interests forms a user group, whereas a number of user groups are interconnected via interactions or common members. We study the node classification problem in the hierarchical graph where a "node" is a graph instance, e.g., a user group in the above example. As labels are usually limited in real-world data, we design two novel semi-supervised solutions named Semi-supervised graph classification via Cautious/Active Iteration (or SEAL-C/AI in short). SEAL-C/AI adopt an iterative framework that takes turns to build or update two classifiers, one working at the graph instance level and the other at the hierarchical graph level. To simplify the representation of the hierarchical graph, we propose a novel supervised, self-attentive graph embedding method called SAGE, which embeds graph instances of arbitrary size into fixed-length vectors. Through experiments on synthetic data and Tencent QQ group data, we demonstrate that SEAL-C/AI not only outperform competing methods by a significant margin in terms of accuracy/Macro-F1, but also generate meaningful interpretations of the learned representations.

This repository provides a PyTorch implementation of SEAL-CI as described in the paper:

Semi-Supervised Graph Classification: A Hierarchical Graph Perspective. Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, Junzhou Huang. WWW, 2019. [Paper]

A TensorFlow implementation of the model is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
torch-scatter     1.4.0
torch-sparse      0.4.3
torch-cluster     1.4.5
torch-geometric   1.3.2
torchvision       0.3.0

Datasets

Graphs

The code takes graphs for training from an input folder where each graph is stored as a JSON file. Graphs used for testing are also stored as JSON files. Every node id and node label has to be indexed from 0. Keys of dictionaries are stored as strings in order to make JSON serialization possible.

The graphs file has to be unzipped in the input folder.

Every JSON file has the following key-value structure:

{"edges": [[0, 1],[1, 2],[2, 3],[3, 4]],
 "features": {"0": ["A","B"], "1": ["B","K"], "2": ["C","F","A"], "3": ["A","B"], "4": ["B"]},
 "label": "A"}

The edges key has an edge list value which describes the connectivity structure. The features key holds the features of each node as a nested dictionary in which node identifiers are the keys and feature lists are the values. The label key has a value which is the class membership.
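The layout above can be checked before training. The following is a minimal sketch and not part of the repository: the helper name validate_graph is hypothetical, and it assumes only the three keys described above and that every node has an entry under features.

```python
import json


def validate_graph(graph):
    """Check one graph dictionary against the layout described above.

    Hypothetical helper, not part of the repository.
    Returns the number of nodes if all checks pass.
    """
    for key in ("edges", "features", "label"):
        if key not in graph:
            raise ValueError("missing key: " + key)

    # JSON object keys are strings, so node ids arrive as "0", "1", ...
    node_ids = sorted(int(node) for node in graph["features"])
    if node_ids != list(range(len(node_ids))):
        raise ValueError("node ids must run from 0 to n-1 without gaps")

    # Assumption: every edge endpoint refers to a node that has features.
    for source, target in graph["edges"]:
        if not (0 <= source < len(node_ids) and 0 <= target < len(node_ids)):
            raise ValueError("edge refers to an unknown node: %d-%d" % (source, target))

    return len(node_ids)


# The example graph from this section, parsed from a JSON string.
example = json.loads("""
{"edges": [[0, 1], [1, 2], [2, 3], [3, 4]],
 "features": {"0": ["A", "B"], "1": ["B", "K"], "2": ["C", "F", "A"],
              "3": ["A", "B"], "4": ["B"]},
 "label": "A"}
""")
node_count = validate_graph(example)
print(node_count)
```

Running a check like this over every file in the graphs folder may catch indexing mistakes before they surface as out-of-bounds errors during training.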

Hierarchical graph

The hierarchical graph is stored as an edge list in which the integer graph identifiers serve as the node identifiers. Node pairs are separated by commas in a comma separated values file, and this edge list file has a header row.
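Reading such an edge list could look like the sketch below. It is an illustration only: the helper name read_hierarchical_edges and the header names in the sample are made up, and the code assumes nothing about the real column names because it simply skips the header row.

```python
import csv
import io


def read_hierarchical_edges(handle):
    """Read a comma separated edge list that starts with one header row.

    Hypothetical helper, not part of the repository.
    Returns a list of (graph_id, graph_id) integer pairs.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, whatever its column names are
    return [(int(row[0]), int(row[1])) for row in reader if row]


# In-memory stand-in for a file such as input/synthetic_edges.csv.
# The header names "id_1,id_2" are invented for this illustration.
sample = io.StringIO("id_1,id_2\n0,1\n0,2\n1,2\n")
edges = read_hierarchical_edges(sample)
print(edges)
```

For a real file, the same function can be called with a handle obtained from open(path, newline="").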

Options

Training a SEAL-CI model is handled by the src/main.py script which provides the following command line arguments.

Input and output options

  --graphs                STR    Training graphs folder.      Default is `input/graphs/`.
  --hierarchical-graph    STR    Macro level graph.           Default is `input/synthetic_edges.csv`.

Model options

  --epochs                      INT     Number of epochs.                  Default is 10.
  --budget                      INT     Nodes to be added.                 Default is 20.
  --labeled-count               INT     Number of labeled instances.       Default is 100.
  --first-gcn-dimensions        INT     Graph level GCN 1st filters.       Default is 16.
  --second-gcn-dimensions       INT     Graph level GCN 2nd filters.       Default is 8.
  --first-dense-neurons         INT     SAGE aggregator neurons.           Default is 16.
  --second-dense-neurons        INT     SAGE attention neurons.            Default is 4.
  --macro-gcn-dimensions        INT     Macro level GCN neurons.           Default is 16.
  --weight-decay                FLOAT   Weight decay of Adam.              Default is 5*10^-5.
  --gamma                       FLOAT   Regularization parameter.          Default is 10^-5.
  --learning-rate               FLOAT   Adam learning rate.                Default is 0.01.
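
For readers who want to reuse these options in their own scripts, the sketch below shows how the flags listed above might be declared with argparse. The flag names and defaults are copied from the tables above; the function name build_parser is hypothetical, and the actual parser in the repository may be organized differently.

```python
import argparse


def build_parser():
    """Declare the documented flags (a sketch, not the repository's parser)."""
    parser = argparse.ArgumentParser(description="Sketch of the SEAL-CI options.")
    # Input and output options.
    parser.add_argument("--graphs", type=str, default="input/graphs/")
    parser.add_argument("--hierarchical-graph", type=str,
                        default="input/synthetic_edges.csv")
    # Model options.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--budget", type=int, default=20)
    parser.add_argument("--labeled-count", type=int, default=100)
    parser.add_argument("--first-gcn-dimensions", type=int, default=16)
    parser.add_argument("--second-gcn-dimensions", type=int, default=8)
    parser.add_argument("--first-dense-neurons", type=int, default=16)
    parser.add_argument("--second-dense-neurons", type=int, default=4)
    parser.add_argument("--macro-gcn-dimensions", type=int, default=16)
    parser.add_argument("--weight-decay", type=float, default=5e-5)
    parser.add_argument("--gamma", type=float, default=1e-5)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    return parser


# argparse turns dashes into underscores, e.g. --labeled-count -> labeled_count.
args = build_parser().parse_args(["--epochs", "100", "--budget", "200"])
print(args.epochs, args.budget, args.labeled_count)
```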

Examples

The following commands learn a model and score it on the unlabeled instances. Training a model on the default dataset:

python src/main.py

Training each SEAL-CI model for 100 epochs.

python src/main.py --epochs 100

Changing the budget size.

python src/main.py --budget 200

Comments

question about python-cluster and python-scatter

Hello, I failed to build python-cluster 1.2.4 and python-scatter 1.1.2 with pytorch 0.4.1

It seems that python-scatter 1.0.4 works with pytorch 0.4.1. However, I can't find a proper version of python-cluster.

Thank you!

opened by gyc913 1
About the error RuntimeError: index 145 is out of bounds for dimension 0 with size 1

Hello, when I run your code I get the error RuntimeError: index 145 is out of bounds for dimension 0 with size 1. The message suggests the error may occur at the line node_features_1 = torch.nn.functional.relu(self.graph_convolution_1(features, edges)) and involves scatter.py. I have searched for a long time without solving it. Do you know what causes this problem?

opened by heyjiege 0
The details about json file

Hi, I have a question about the JSON files. In the graphs folder, every JSON file is a dictionary that includes the label, the features and the edges. The features are listed by node index, with entries such as "cc_XX" and "deg_4". What does "cc_XX" stand for? When I build my own dataset, how can I obtain the "cc_XX" entries?

opened by ChenTao2017110 0

Releases(v_001)

v_001(Jul 27, 2021)

A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019).

Owner

Benedek Rozemberczki
