Training data extraction on GPT-2

Last update: Dec 07, 2022

Related tags

Overview

Training data extraction from GPT-2

This repository contains code for extracting training data from GPT-2, following the approach outlined in the following paper:

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel
USENIX Security Symposium, 2021
https://arxiv.org/abs/2012.07805

WARNING: The experiments in our paper relied on different non-public codebases, and also involved a large amount of manual labor. The code in this repository is thus not meant to exactly reproduce the paper's results, but instead to illustrate the paper's approach and to help others perform similar experiments.
The code in this repository has not been tested at the scale considered in the paper (600,000 generated samples) and might find memorized content at a lower (or higher) rate!

Installation

You will need transformers, pytorch and tqdm. The code was tested with transformers v3.0.2 and torch v1.5.1.

Extracting Data

Simply run

python3 extraction.py --N 1000 --batch-size 10

to generate 1000 samples with GPT-2 (XL). The samples are generated with top-k sampling (k=40) and an empty prompt.

The generated samples are ranked according to four membership inference metrics introduced in our paper:

The log perplexity of the GPT-2 (XL) model.
The ratio of the log perplexities of the GPT-2 (XL) model and the GPT-2 (S) model.
The ratio of the log perplexities for the generated sample and the same sample in lower-case letters.
The ratio of the log perplexity of GPT-2 (XL) and the sample's entropy estimated by Zlib.

The top 10 samples according to each metric are printed out. These samples are likely to contain verbatim text from the GPT-2 training data.

Conditioning on Internet text

In our paper, we found that prompting GPT-2 with small snippets of text taken from the Web increased the chance of the model generating memorized content.

To reproduce this attack, first download a slice of the Common Crawl dataset:

./download_cc.sh

This will download a sample of the Crawl from May 2021 (~350 MB) to a file called commoncrawl.warc.wet.

Then, we can run the extraction attack with Internet prompts:

python3 extraction.py --N 1000 --internet-sampling --wet-file commoncrawl.warc.wet

Sample outputs

Some interesting data that we extracted from GPT-2 can be found here.

Note that these were found among 600,000 generated samples. If you generate a much smaller number of samples (10,000 for example), you will be less likely to find memorized content.

Citation

If this code is useful in your research, you are encouraged to cite our academic paper:

@inproceedings{carlini21extracting,
  author = {Carlini, Nicholas and Tramer, Florian and Wallace, Eric and Jagielski, Matthew and Herbert-Voss, Ariel and Lee, Katherine and Roberts, Adam and Brown, Tom and Song, Dawn and Erlingsson, Ulfar and Oprea, Alina and Raffel, Colin},
  title = {Extracting Training Data from Large Language Models},
  booktitle = {USENIX Security Symposium},
  year = {2021},
  howpublished = {arXiv preprint arXiv:2012.07805},
  url = {https://arxiv.org/abs/2012.07805}
}

Training data extraction on GPT-2

Related tags

Overview

Training data extraction from GPT-2

Installation

Extracting Data

Conditioning on Internet text

Sample outputs

Citation

Owner

Florian Tramer

Imaginaire - NVIDIA's Deep Imagination Team's PyTorch Library

Source code and data from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" by W. Bendada, G. Salha and T. Bontempelli

SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.

一个运行在 𝐞𝐥𝐞𝐜𝐕𝟐𝐏 或 𝐪𝐢𝐧𝐠𝐥𝐨𝐧𝐠 等定时面板的签到项目

Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation

Alternatives to Deep Neural Networks for Function Approximations in Finance

This is an example of object detection on Micro bacterium tuberculosis using Mask-RCNN

Unofficial implementation of MUSIQ (Multi-Scale Image Quality Transformer)

TigerLily: Finding drug interactions in silico with the Graph.

A new data augmentation method for extreme lighting conditions.

This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams.

Exploring the link between uncertainty estimates obtained via "exact" Bayesian inference and out-of-distribution (OOD) detection.

Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Controlling a game using mediapipe hand tracking

An experiment to bait a generalized frontrunning MEV bot

Source code for the paper "SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text" PACLIC 2021

Explore extreme compression for pre-trained language models

Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Code for Mesh Convolution Using a Learned Kernel Basis

C3D is a modified version of BVLC caffe to support 3D ConvNets.