Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation of In-context Learning as Implicit Bayesian Inference"

Last update: Dec 19, 2022

Overview

GINC small-scale in-context learning dataset

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs, and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompts are out-of-distribution with respect to the pretraining data, since each prompt consists of independent examples that are concatenated and separated by delimiters. We provide code to generate GINC-style datasets with varying vocabulary sizes, numbers of HMMs, and other parameters.
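The generative process described above can be illustrated with a simplified toy sketch. This is not the repo's generator. The HMM sizes, the randomly drawn parameters, the uniform choice of mixture component, and the use of token id n_symbols as the delimiter are all assumptions made for illustration. The helper names (random_hmm, sample_hmm, pretraining_doc, make_prompt) are hypothetical.

```python
import random

def random_hmm(n_states, n_symbols, rng):
    """Toy HMM with randomly drawn start, transition, and emission distributions (illustrative only)."""
    def random_dist(k):
        # Small epsilon keeps every weight positive so the normalizer is never zero.
        w = [rng.random() + 1e-6 for _ in range(k)]
        s = sum(w)
        return [x / s for x in w]
    return {
        "start": random_dist(n_states),
        "trans": [random_dist(n_states) for _ in range(n_states)],
        "emit": [random_dist(n_symbols) for _ in range(n_states)],
    }

def sample_hmm(hmm, length, rng):
    """Sample `length` tokens from one HMM, starting from its start distribution."""
    states = range(len(hmm["start"]))
    symbols = range(len(hmm["emit"][0]))
    h = rng.choices(states, weights=hmm["start"])[0]
    tokens = []
    for _ in range(length):
        tokens.append(rng.choices(symbols, weights=hmm["emit"][h])[0])
        h = rng.choices(states, weights=hmm["trans"][h])[0]
    return tokens

def pretraining_doc(hmms, length, rng):
    """Pretraining document: pick one HMM from the mixture (uniformly, an assumption), then sample one long sequence."""
    return sample_hmm(rng.choice(hmms), length, rng)

def make_prompt(hmm, n_examples, example_len, delimiter, rng):
    """Prompt: independent short samples from one HMM, each followed by a delimiter token."""
    tokens = []
    for _ in range(n_examples):
        tokens.extend(sample_hmm(hmm, example_len, rng))
        tokens.append(delimiter)
    return tokens
```

For example, with `rng = random.Random(0)` and `hmms = [random_hmm(4, 10, rng) for _ in range(5)]`, calling `make_prompt(hmms[0], 3, 5, 10, rng)` yields 18 tokens. Each prompt example restarts from the HMM's start distribution, which makes the examples independent of one another. A pretraining document, by contrast, is one long sequence from a single HMM, so the concatenated prompt format is unlike anything seen in pretraining.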

Quickstart

Please create a conda environment or virtualenv using the information in conda-env.yml, then install the included transformers package by running pip install -e . from inside the transformers/ directory. Modify consts.sh to change the default output locations and to insert code that activates your environment of choice. Run scripts/runner.sh to launch all the experiments as Slurm jobs via sbatch.

Explore the data

The default dataset has vocab size 50 and the pretraining data is generated as a mixture of 5 HMMs. The pretraining dataset is in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/train.json while in-context prompts are in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/id_prompts_randomsample_*.json.
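To inspect the default dataset, it can help to pull the generation settings out of the directory name. The helper below is a hypothetical sketch. It assumes only that the name is GINC followed by underscore-separated key/value pieces such as nsymbols50. It does not interpret the keys, beyond the observation that nsymbols50 matches the vocab size of 50 stated above. It also lists the id_prompts_randomsample_*.json files without parsing them, because their internal JSON layout is not described here.

```python
import re
from pathlib import Path

def parse_ginc_dirname(name):
    """Split a GINC-style directory name (hypothetical helper) into {key: number}."""
    parts = name.split("_")
    if parts[0] != "GINC":
        raise ValueError(f"not a GINC directory name: {name}")
    params = {}
    for part in parts[1:]:
        # Each piece is lowercase letters followed by an int or decimal, e.g. "vic0.9".
        m = re.fullmatch(r"([a-z]+)(\d+(?:\.\d+)?)", part)
        if m is None:
            raise ValueError(f"unexpected component: {part}")
        key, value = m.groups()
        params[key] = float(value) if "." in value else int(value)
    return params

def find_prompt_files(dataset_dir):
    """Return the in-context prompt files in a dataset directory, sorted by name."""
    return sorted(Path(dataset_dir).glob("id_prompts_randomsample_*.json"))
```

Typical usage would be `parse_ginc_dirname(Path(dataset_dir).name)` followed by `find_prompt_files(dataset_dir)`, where dataset_dir points at the data/GINC_... directory named above.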

This repo contains the experiments for the paper "An Explanation of In-context Learning as Implicit Bayesian Inference". If you found this repo useful, please cite:

@article{xie2021incontext,
  author = {Sang Michael Xie and Aditi Raghunathan and Percy Liang and Tengyu Ma},
  journal = {arXiv preprint arXiv:2111.02080},
  title = {An Explanation of In-context Learning as Implicit Bayesian Inference},
  year = {2021},
}

Owner

P-Lambda
