Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Last update: Aug 09, 2022

Overview

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Paper

Original paper can be found here

Datasets

It is not entirely clear to me how the paper defines division, so I am using integer division:

div: $$x \circ y = (x // y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
div_odd: $$x \circ y = (x // y) \bmod p$$ if $$y$$ is odd, else $$(x - y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
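The two definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the repository's actual code: the function names `div` and `div_odd` are borrowed from the dataset names and are otherwise hypothetical. Note that plain integer division is undefined for $$y = 0$$, so that case has to be excluded or handled separately; how the repository does this is not stated here.

```python
def div(x: int, y: int, p: int) -> int:
    """Integer (floor) division followed by reduction mod p. Requires y != 0."""
    return (x // y) % p


def div_odd(x: int, y: int, p: int) -> int:
    """Integer division if y is odd, otherwise subtraction; both reduced mod p."""
    return (x // y) % p if y % 2 == 1 else (x - y) % p


p = 97  # 97 elements is the suggested setting for arithmetic data
print(div(10, 3, p))      # 10 // 3 = 3
print(div_odd(10, 3, p))  # y is odd, so same as div: 3
print(div_odd(3, 10, p))  # y is even: (3 - 10) % 97 = 90
```

Because Python's `%` always returns a non-negative result for a positive modulus, the subtraction branch needs no extra handling for negative differences.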

Hyperparameters

The default hyperparameters are taken from the paper, but can be overridden on the command line when running train.py.

Running experiments

To run with default settings, simply run python train.py. The first time you train on any dataset you have to specify --force_data.

Arguments:

optimizer args

"--lr", type=float, default=1e-3
"--weight_decay", type=float, default=1
"--beta1", type=float, default=0.9
"--beta2", type=float, default=0.98
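As a rough sketch of how these options behave, the snippet below declares the four optimizer arguments with `argparse` exactly as listed above and parses a sample command line. It is an assumption that train.py wires them up this way; the override values (`3e-4`, `0.1`) are arbitrary examples, not recommended settings.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-3)
parser.add_argument("--weight_decay", type=float, default=1)
parser.add_argument("--beta1", type=float, default=0.9)
parser.add_argument("--beta2", type=float, default=0.98)

# No flags given: every option falls back to its default.
defaults = parser.parse_args([])

# Equivalent to: python train.py --lr 3e-4 --weight_decay 0.1
custom = parser.parse_args(["--lr", "3e-4", "--weight_decay", "0.1"])

print(defaults.lr, defaults.weight_decay)
print(custom.lr, custom.weight_decay, custom.beta2)
```

Options that are not passed keep their defaults, so only the values you want to change need to appear on the command line.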

model args

"--num_heads", type=int, default=4
"--layers", type=int, default=2
"--width", type=int, default=128

data args

"--data_name", type=str, default="perm", choices=[
- "perm_xy", # permutation composition x * y
- "perm_xyx1", # permutation composition x * y * x^-1
- "perm_xyx", # permutation composition x * y * x
- "plus", # x + y
- "minus", # x - y
- "div", # x / y
- "div_odd", # x / y if y is odd else x - y
- "x2y2", # x^2 + y^2
- "x2xyy2", # x^2 + y^2 + xy
- "x2xyy2x", # x^2 + y^2 + xy + x
- "x3xy", # x^3 + xy
- "x3xy2y" # x^3 + xy^2 + y
]
"--num_elements", type=int, default=5 (choose 5 for permutation data, 97 for arithmetic data)
"--data_dir", type=str, default="./data"
"--force_data", action="store_true", help="Whether to force dataset creation."
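To make the three permutation datasets concrete, here is a minimal sketch that enumerates every pair of permutations of `n` elements and records the result of each operation. Everything in it is an assumption for illustration: the helper names (`compose`, `inverse`, `build_table`), the composition convention (apply the right-hand permutation first), and the way permutations are mapped to integer ids may all differ from what the repository does.

```python
from itertools import permutations, product


def compose(a, b):
    """Return the permutation a * b, i.e. apply b first, then a."""
    return tuple(a[i] for i in b)


def inverse(a):
    """Return the inverse permutation of a."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)


def build_table(name, n=5):
    """Return (x_id, y_id, result_id) triples for all pairs of permutations."""
    elems = list(permutations(range(n)))
    index = {e: i for i, e in enumerate(elems)}  # hypothetical id assignment
    ops = {
        "perm_xy": lambda x, y: compose(x, y),
        "perm_xyx1": lambda x, y: compose(compose(x, y), inverse(x)),
        "perm_xyx": lambda x, y: compose(compose(x, y), x),
    }
    op = ops[name]
    return [(index[x], index[y], index[op(x, y)])
            for x, y in product(elems, repeat=2)]


table = build_table("perm_xy", n=5)
print(len(table))  # 120 * 120 = 14400 equations for 5 elements
```

With 5 elements there are 120 permutations, so each permutation dataset contains 14400 equations in total before the train/validation split.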

training args

"--batch_size", type=int, default=512
"--steps", type=int, default=10**5
"--train_ratio", type=float, default=0.5
"--seed", type=int, default=42
"--verbose", action="store_true"
"--log_freq", type=int, default=10
"--num_workers", type=int, default=4
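The interaction of `--train_ratio` and `--seed` can be illustrated with a small seeded split. This is a sketch under assumptions: the function `split_equations` is hypothetical, and the repository may shuffle and split differently (for example with a different random number generator or rounding rule).

```python
import random


def split_equations(equations, train_ratio=0.5, seed=42):
    """Shuffle with a fixed seed, then split into train and validation lists."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(equations)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


p = 97
# All equations x + y = z (mod p), matching the "plus" dataset.
equations = [(x, y, (x + y) % p) for x in range(p) for y in range(p)]
train, val = split_equations(equations, train_ratio=0.5, seed=42)
print(len(train), len(val))  # 4704 4705
```

Using the same seed reproduces the same split, which matters when comparing runs that differ only in other hyperparameters.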

Owner

Tom Lieberum
