Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Last update: Feb 23, 2022

Overview

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks

Recently, researchers proposed pruning deep neural network weights (DNNs) using an $N:M$ fine-grained block sparsity mask. In this mask, for each block of M weights, we have at least N zeros. In contrast to unstructured sparsity, N:M fine-grained block sparsity allows acceleration in actual modern hardware. Previously suggested solutions enabled DNN acceleration at the inference phase. To also allow such acceleration in the training phase, we suggest a novel transposable-fine-grained sparsity mask where the same mask can be used for both forward and backward passes. Our transposable mask ensures that both the weight matrix and its transpose follow the same sparsity pattern; thus the matrix multiplication required for passing the error backward can also be accelerated. We discuss the transposable constraint and devise a new measure for mask constraints, called mask-diversity (MD), which correlates with their expected accuracy. Lastly, we formulate the problem of finding the optimal transposable mask as a minimum-cost-flow problem and suggest a fast linear approximation that can be used when the masks dynamically change while training. Our experiments suggest 2x speed-up with no accuracy degradation over vision and language models. A reference implementation is available in the supplementary material.

Reproducing the results

This repository is partially based on convNet.pytorch repo. please ensure that you are using pytorch 1.7+. Reproducing AdaPrune results

cd AdaPrune
sh scripts/adaprune_dense_bnt.sh
sh scripts/adaprune_sparse.sh

Reproducing static NM-transposable starting from dense pre-trained model:

cd static_TNM
sh scripts/prune_pretrained_R50.sh

Reproducing dynamic NM-transposable from scratch:

cd dynamic_TNM
sh scripts/clone_and_copy.sh
sh scripts/run_R18.sh
sh scripts/run_R50.sh

Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Related tags

Overview

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks

Reproducing the results

Owner

itay hubara

IEEEXtreme15.0 Questions And Answers

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

ETM - R package for Topic Modelling in Embedding Spaces

Grover is a model for Neural Fake News -- both generation and detectio

Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .

Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Semantic search for quotes.

To classify the News into Real/Fake using Features from the Text Content of the article

Subtitle Workshop (subshop): tools to download and synchronize subtitles

A fast hierarchical dimensionality reduction algorithm.

Binaural Speech Synthesis

IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

LeBenchmark: a reproducible framework for assessing SSL from speech

Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

multi-label，classifier，text classification，多标签文本分类，文本分类，BERT，ALBERT，multi-label-classification，seq2seq，attention，beam search

SDL: Synthetic Document Layout dataset

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers