Code for "On Memorization in Probabilistic Deep Generative Models"

Overview

On Memorization in Probabilistic Deep Generative Models

This repository contains the code necessary to reproduce the experiments in On Memorization in Probabilistic Deep Generative Models. You can also use this code to measure memorization in other types of probabilistic deep generative models. If you use our code in your own work please cite the paper using, for instance, the following BibTeX entry:

@article{van2021memorization,
  title={On Memorization in Probabilistic Deep Generative Models},
  author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
  journal={arXiv preprint arXiv:2106.03216},
  year={2021}
}

If you have any questions or encounter an issue when using this code, please send an email to gertjanvandenburg at gmail dot com.

Introduction

The files in the scripts directory are needed to reproduce the experiments and generate the figures in the paper. The experiments are organized using the Makefile provided. To reproduce the experiments or recreate the figures from the analysis, you'll have to install a number of dependencies. We use PyTorch to implement the deep learning algorithms. If you don't wish to re-run all the models, you can download the result files used in the paper (see below).

The scripts are all written in Python, and the necessary external dependencies can be found in the requirements.txt file. These can be installed using:

$ pip install -r requirements.txt

To recreate the figures the following system dependencies are also needed: pdflatex, latexmk, lualatex, and make. These programs are available for all major platforms.

Reproducing the results

To train the models on the different data sets, you can run:

$ make memorization

Note that depending on your machine this may take some time, so it might be easier to simply download the result files instead. It is also worth mentioning that while we have made an effort to ensure reproducibility by setting the random seed in PyTorch, platform or package version differences may result in slightly different output files (see also PyTorch Reproducibility).

All figures in the paper are generated from the raw result files using Python scripts. First, the summarize.py script takes the raw result files and creates summary files for each data set. Next, the analysis scripts are used to generate the figures, most of which are LaTeX files that require compilation using PDFLaTeX or LuaLaTeX. Simply run:

$ make analysis

to create the summaries and the output files. When using the result files linked below this will give the exact same figures as shown in the paper.

Result files

Due to their size, the raw result files are not contained in this repository, but can be downloaded separately from this link (about 2.6GB). After downloading the results.zip file, unpack it and move the results directory to where you've cloned this repository (so adjacent to the scripts directory). Below is a concise overview of the necessary commands:

$ git clone https://github.com/alan-turing-institute/memorization
$ cd memorization
$ wget https://gertjanvandenburg.com/projects/memorization/results.zip # or download the file in some other way
$ unzip results.zip
$ touch results/*/*/*          # update modification time of the result files
$ make analysis                # optionally, run ``make -n analysis`` first to see what will happen

After unpacking the zip file, you can optionally verify the integrity of the results using the SHA-256 checksums provided:

$ sha256sum --check results.sha256

License

The code in this repository is licensed under the MIT license. See the LICENSE file for further details. Reuse of the code in this repository is allowed, but should cite our paper.

Notes

If you find any problems or have a suggestion for improvement of this repository, please let me know as it will help make this resource better for everyone.

Owner
The Alan Turing Institute
The UK's national institute for data science and artificial intelligence.
The Alan Turing Institute
A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022)

DFC2022 Baseline A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022) This repository uses TorchGeo, PyTorch Lightning, and Segmenta

isaac 24 Nov 28, 2022
A code repository associated with the paper A Benchmark for Rough Sketch Cleanup by Chuan Yan, David Vanderhaeghe, and Yotam Gingold from SIGGRAPH Asia 2020.

A Benchmark for Rough Sketch Cleanup This is the code repository associated with the paper A Benchmark for Rough Sketch Cleanup by Chuan Yan, David Va

33 Dec 18, 2022
Rethinking Nearest Neighbors for Visual Classification

Rethinking Nearest Neighbors for Visual Classification arXiv Environment settings Check out scripts/env_setup.sh Setup data Download the following fin

Menglin Jia 29 Oct 11, 2022
Jittor implementation of PCT:Point Cloud Transformer

PCT: Point Cloud Transformer This is a Jittor implementation of PCT: Point Cloud Transformer.

MenghaoGuo 547 Jan 03, 2023
AQP is a modular pipeline built to enable the comparison and testing of different quality metric configurations.

Audio Quality Platform - AQP An Open Modular Python Platform for Objective Speech and Audio Quality Metrics AQP is a highly modular pipeline designed

Jack Geraghty 24 Oct 01, 2022
Pytorch implementation of "A simple neural network module for relational reasoning" (Relational Networks)

Pytorch implementation of Relational Networks - A simple neural network module for relational reasoning Implemented & tested on Sort-of-CLEVR task. So

Kim Heecheol 800 Dec 05, 2022
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

MusCaps: Generating Captions for Music Audio Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1 1 Queen Mary University of London, 2

Ilaria Manco 57 Dec 07, 2022
WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

30 Oct 28, 2022
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Documentation | External Resources | Research Paper Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble. The

Benedek Rozemberczki 188 Dec 29, 2022
A foreign language learning aid using a neural network to predict probability of translating foreign words

Langy Langy is a reading-focused foreign language learning aid orientated towards young children. Reading is an activity that every child knows. It is

Shona Lowden 6 Nov 17, 2021
CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator

CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator This is the official code repository for NeurIPS 2021 paper: CARMS: Categorica

Alek Dimitriev 1 Jul 09, 2022
A Python library for Deep Probabilistic Modeling

Abstract DeeProb-kit is a Python library that implements deep probabilistic models such as various kinds of Sum-Product Networks, Normalizing Flows an

DeeProb-org 46 Dec 26, 2022
PFFDTD is an open-source FDTD simulator for 3D room acoustics

PFFDTD is an open-source FDTD simulator for 3D room acoustics

Brian Hamilton 34 Nov 24, 2022
Solving Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge

Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge Associated code for the paper Zero-Shot Learning in Named Entity Recognitio

Søren Hougaard Mulvad 13 Dec 25, 2022
A developer interface for creating Chat AIs for the Chai app.

ChaiPy A developer interface for creating Chat AIs for the Chai app. Usage Local development A quick start guide is available here, with a minimal exa

Chai 28 Dec 28, 2022
This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

BUPT GAMMA Lab 519 Jan 02, 2023
Get a Grip! - A robotic system for remote clinical environments.

Get a Grip! Within clinical environments, sterilization is an essential procedure for disinfecting surgical and medical instruments. For our engineeri

Jay Sharma 1 Jan 05, 2022
This repo contains the implementation of the algorithm proposed in Off-Belief Learning, ICML 2021.

Off-Belief Learning Introduction This repo contains the implementation of the algorithm proposed in Off-Belief Learning, ICML 2021. Environment Setup

Facebook Research 32 Jan 05, 2023
Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and

594 Jan 06, 2023
PyTorch implementation for "HyperSPNs: Compact and Expressive Probabilistic Circuits", NeurIPS 2021

HyperSPN This repository contains code for the paper: HyperSPNs: Compact and Expressive Probabilistic Circuits "HyperSPNs: Compact and Expressive Prob

8 Nov 08, 2022