Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Last update: Jan 04, 2023

Related tags

Overview

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

This repository is built upon BEiT, thanks very much!

Now, we only implement the pretrain process according to the paper, and can't guarantee the performance reported in the paper can be reproduced!

Difference

At the same time, shuffle and unshuffle operations don't seem to be directly accessible in pytorch, so we use another method to realize this process:

For shuffle, we used the method of randomly generating mask-map (14x14) in BEiT, where mask=0 illustrates keep the token, mask=1 denotes drop the token (not participating caculation in Encoder). Then all visible tokens (mask=0) are put into encoder network.
For unshuffle, we get the postion embeddings (with adding the shared mask token) of all mask tokens according to the mask-map and then concate them with the visible tokens (from encoder), and put them into the decoder network to recontrust.

TODO

implement the finetune process
reuse the model in modeling_pretrain.py
caculate the normalized pixels target
add the cls token in the encoder
...

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to imagenet-1k train set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}

Note: the pretrain result is on the way ~

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Related tags

Overview

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Difference

TODO

Setup

Run

Owner

Zhiliang Peng

Python project to take sound as input and output as RGB + Brightness values suitable for DMX

A medical imaging framework for Pytorch

Sequence modeling benchmarks and temporal convolutional networks

DeepFaceLive - Live Deep Fake in python, Real-time face swap for PC streaming or video calls

ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプル

Efficient face emotion recognition in photos and videos

Auto grind btdb2 exp for tower

Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory

A large-scale face dataset for face parsing, recognition, generation and editing.

Python package for dynamic system estimation of time series

Colour detection is necessary to recognize objects, it is also used as a tool in various image editing and drawing apps.

Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"

Some pre-commit hooks for OpenMMLab projects

Gray Zone Assessment

Final report with code for KAIST Course KSE 801.

Minimal PyTorch implementation of YOLOv3

PyTorch implementation of MulMON

GPT, but made only out of gMLPs

Pytorch tutorials for Neural Style transfert