CodeContests is a competitive programming dataset for machine-learning

Overview

CodeContests

CodeContests is a competitive programming dataset for machine-learning. This dataset was used when training AlphaCode.

It consists of programming problems, from a variety of sources:

Site URL Source
Aizu https://judge.u-aizu.ac.jp CodeNet
AtCoder https://atcoder.jp CodeNet
CodeChef https://www.codechef.com description2code
Codeforces https://codeforces.com description2code and Codeforces
HackerEarth https://www.hackerearth.com description2code

Problems include test cases in the form of paired inputs and outputs, as well as both correct and incorrect human solutions in a variety of languages.

Usage

Install the Cloud SDK, which provides the gsutil utility. You can then download the full data (~3GiB) with, e.g:

gsutil -m cp -r gs://dm-code_contests /tmp

The data consists of ContestProblem protocol buffers in Riegeli format. See contest_problem.proto for the protocol buffer definition and documentation of its fields.

The dataset contains three splits:

Split Filename
Training code_contests_train.riegeli-*-of-00128
Validation code_contests_valid.riegeli
Test code_contests_test.riegeli

There is example code for iterating over the dataset in C++ (in print_names.cc) and Python (in print_names_and_sources.py). For example, you can print the source and name of each problem in the validation data by installing bazel and then running:

bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_valid.riegeli

Or do the same for the training data with the following command (which will print around 13000 lines of output):

bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_train.riegeli*

Planned updates

We plan to update this repository with code for executing and evaluating potential solutions.

Citing this work

If you use this dataset or code, please cite this paper:

@misc{alphacode,
    title={Competition-Level Code Generation with AlphaCode},
    author={Li, Yujia and Choi, David and Chung, Junyoung and Kushman, Nate and
    Schrittwieser, Julian and Leblond, Rémi and Eccles, Tom and
    Keeling, James and Gimeno, Felix and Dal Lago, Agustin and
    Hubert, Thomas and Choy, Peter and de Masson d'Autume, Cyprien and
    Babuschkin, Igor and Chen, Xinyun and Huang, Po-Sen and Welbl, Johannes and
    Gowal, Sven and Cherepanov, Alexey and Molloy, James and
    Mankowitz, Daniel and Sutherland Robson, Esme and Kohli, Pushmeet and
    de Freitas, Nando and Kavukcuoglu, Koray and Vinyals, Oriol},
    year={2022},
    month={Feb}}

License

The code is licensed under the Apache 2.0 License.

All non-code materials provided are made available under the terms of the CC BY 4.0 license (Creative Commons Attribution 4.0 International license).

We gratefully acknowledge the contributions of the following:

Use of the third-party software, libraries code or data may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code may be subject to any such terms. We make no representations here with respect to rights or abilities to use any such materials.

Disclaimer

This is not an official Google product.

Owner
DeepMind
DeepMind
Deep learning for Engineers - Physics Informed Deep Learning

SciANN: Neural Networks for Scientific Computations SciANN is a Keras wrapper for scientific computations and physics-informed deep learning. New to S

SciANN 195 Jan 03, 2023
Pytorch implementation of Learning Rate Dropout.

Learning-Rate-Dropout Pytorch implementation of Learning Rate Dropout. Paper Link: https://arxiv.org/pdf/1912.00144.pdf Train ResNet-34 for Cifar10: r

42 Nov 25, 2022
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

LancoPKU 105 Jan 03, 2023
A3C LSTM Atari with Pytorch plus A3G design

NEWLY ADDED A3G A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!! RL A3C Pytorch NEWLY ADDED A3G!! New implementation of A3C

David Griffis 532 Jan 02, 2023
PyTorch implementation of "Image-to-Image Translation Using Conditional Adversarial Networks".

pix2pix-pytorch PyTorch implementation of Image-to-Image Translation Using Conditional Adversarial Networks. Based on pix2pix by Phillip Isola et al.

mrzhu 383 Dec 17, 2022
StyleGAN2-ADA-training-jupyter - Training custom datasets in styleGAN2-ADA by NVIDIA using Jupyter

styleGAN2-ADA-training-jupyter Training custom datasets in styleGAN2-ADA on Jupyter Official StyleGAN2-ADA by NIVIDIA Paper Training Generative Advers

Mang Su Hyun 2 Feb 24, 2022
Distributed Arcface Training in Pytorch

Distributed Arcface Training in Pytorch

3 Nov 23, 2021
Torch code for our CVPR 2018 paper "Residual Dense Network for Image Super-Resolution" (Spotlight)

Residual Dense Network for Image Super-Resolution This repository is for RDN introduced in the following paper Yulun Zhang, Yapeng Tian, Yu Kong, Bine

Yulun Zhang 494 Dec 30, 2022
202 Jan 06, 2023
:fire: 2D and 3D Face alignment library build using pytorch

Face Recognition Detect facial landmarks from Python using the world's most accurate face alignment network, capable of detecting points in both 2D an

Adrian Bulat 6k Dec 31, 2022
This repo contains the pytorch implementation for Dynamic Concept Learner (accepted by ICLR 2021).

DCL-PyTorch Pytorch implementation for the Dynamic Concept Learner (DCL). More details can be found at the project page. Framework Grounding Physical

Zhenfang Chen 31 Jan 06, 2023
Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Class-Balanced Loss Based on Effective Number of Samples Tensorflow code for the paper: Class-Balanced Loss Based on Effective Number of Samples Yin C

Yin Cui 546 Jan 08, 2023
Differentiable rasterization applied to 3D model simplification tasks

nvdiffmodeling Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Automatic 3D Model

NVIDIA Research Projects 336 Dec 30, 2022
Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV File YOLOv3 weight can be downloaded

Ngoc Quyen Ngo 2 Mar 27, 2022
Code related to the manuscript "Averting A Crisis In Simulation-Based Inference"

Abstract We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms are inadequate for the falsificat

Montefiore Artificial Intelligence Research 3 Nov 14, 2022
Machine learning algorithms for many-body quantum systems

NetKet NetKet is an open-source project delivering cutting-edge methods for the study of many-body quantum systems with artificial neural networks and

NetKet 413 Dec 31, 2022
A large-image collection explorer and fast classification tool

IMAX: Interactive Multi-image Analysis eXplorer This is an interactive tool for visualize and classify multiple images at a time. It written in Python

Matias Carrasco Kind 23 Dec 16, 2022
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (ICCV 2021 Oral)

Multi-Anchor Active Domain Adaptation for Semantic Segmentation Munan Ning*, Donghuan Lu*, Dong Wei†, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Y

Munan Ning 36 Dec 07, 2022
TrackTech: Real-time tracking of subjects and objects on multiple cameras

TrackTech: Real-time tracking of subjects and objects on multiple cameras This project is part of the 2021 spring bachelor final project of the Bachel

5 Jun 17, 2022
FlingBot: The Unreasonable Effectiveness of Dynamic Manipulations for Cloth Unfolding

This repository contains code for training and evaluating FlingBot in both simulation and real-world settings on a dual-UR5 robot arm setup for Ubuntu 18.04

Columbia Artificial Intelligence and Robotics Lab 70 Dec 06, 2022