Implementing DropPath/StochasticDepth in PyTorch

Related tags

Deep LearningDropPath
Overview
%load_ext memory_profiler

Implementing Stochastic Depth/Drop Path In PyTorch

DropPath is available on glasses my computer vision library!

Introduction

Today we are going to implement Stochastic Depth also known as Drop Path in PyTorch! Stochastic Depth introduced by Gao Huang et al is technique to "deactivate" some layers during training.

Let's take a look at a normal ResNet Block that uses residual connections (like almost all models now).If you are not familiar with ResNet, I have an article showing how to implement it.

Basically, the block's output is added to its input: output = block(input) + input. This is called a residual connection

alt

Here we see four ResnNet like blocks, one after the other.

alt

Stochastic Depth/Drop Path will deactivate some of the block's weight

alt

The idea is to reduce the number of layers/block used during training, saving time and make the network generalize better.

Practically, this means setting to zero the output of the block before adding.

Implementation

Let's start by importing our best friend, torch.

import torch
from torch import nn
from torch import Tensor

We can define a 4D tensor (batch x channels x height x width), in our case let's just send 4 images with one pixel each :)

x = torch.ones((4, 1, 1, 1))

We need a tensor of shape batch x 1 x 1 x 1 that will be used to set some of the elements in the batch to zero, using a given prob. Bernoulli to the rescue!

keep_prob: float = .5
mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    
mask
tensor([[[[0.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Btw, this is equivelant to

mask: Tensor = (torch.rand(x.shape[0], 1, 1, 1) > keep_prob).float()
mask
tensor([[[[1.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Before we multiply x by the mask we need to divide x by keep_prob to rescale down the inputs activation during training, see cs231n. So

x_scaled : Tensor = x / keep_prob
x_scaled
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

Finally

output: Tensor = x_scaled * mask
output
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

We can put together in a function

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x_scaled: Tensor = x / keep_prob
    return x_scaled * mask

drop_path(x, keep_prob=0.5)
tensor([[[[0.]]],


        [[[0.]]],


        [[[2.]]],


        [[[0.]]]])

We can also do the operation in place

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x.div_(keep_prob)
    x.mul_(mask)
    return x


drop_path(x, keep_prob=0.5)
tensor([[[[2.]]],


        [[[2.]]],


        [[[0.]]],


        [[[0.]]]])

However, we may want to use x somewhere else, and dividing x or mask by keep_prob is the same thing. Let's arrive at the final implementation

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1, 1, 1))
drop_path(x, keep_prob=0.8)
tensor([[[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]]])

drop_path only works for 2d data, we need to automatically calculate the number of dimensions from the input size to make it work for any data time

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask_shape: Tuple[int] = (x.shape[0],) + (1,) * (x.ndim - 1) 
    # remember tuples have the * operator -> (1,) * 3 = (1,1,1)
    mask: Tensor = x.new_empty(mask_shape).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1))
drop_path(x, keep_prob=0.8)
tensor([[0.],
        [0.],
        [0.],
        [0.]])

Let's create a nice DropPath nn.Module

class DropPath(nn.Module):
    def __init__(self, p: float = 0.5, inplace: bool = False):
        super().__init__()
        self.p = p
        self.inplace = inplace

    def forward(self, x: Tensor) -> Tensor:
        if self.training and self.p > 0:
            x = drop_path(x, self.p, self.inplace)
        return x

    def __repr__(self):
        return f"{self.__class__.__name__}(p={self.p})"

    
DropPath()(torch.ones((4, 1)))
tensor([[2.],
        [0.],
        [0.],
        [0.]])

Usage with Residual Connections

We have our DropPath, cool but how do we use it? We need a classic ResNet block, let's implement our good old friend BottleNeckBlock

from torch import nn


class ConvBnAct(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, kernel_size=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(out_features),
            nn.ReLU()
        )
         

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct( out_features // reduction, out_features // reduction, kernel_size=3),
            # wide -> narrow
            ConvBnAct( out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        return x + res
    
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28))).shape
torch.Size([1, 64, 28, 28])

To deactivate the block the operation x + res must be equal to res, so our DropPath has to be applied after the block.

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct( out_features // reduction, out_features // reduction, kernel_size=3),
            # wide -> narrow
            ConvBnAct( out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        self.drop_path = DropPath()
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        x = self.drop_path(x)
        return x + res
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28)))
tensor([[[[1.0009, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          ...,
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0005, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0011, 1.0011,  ..., 1.0011, 1.0011, 1.0247]],

         [[1.0203, 1.0123, 1.0123,  ..., 1.0123, 1.0123, 1.0299],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          ...,
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         ...,

         [[1.0011, 1.0180, 1.0180,  ..., 1.0180, 1.0180, 1.0465],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0130, 1.0170, 1.0170,  ..., 1.0170, 1.0170, 1.0213],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          ...,
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0012, 1.0139, 1.0139,  ..., 1.0139, 1.0139, 1.0065]],

         [[1.0103, 1.0181, 1.0181,  ..., 1.0181, 1.0181, 1.0539],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          ...,
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       grad_fn=<AddBackward0>)

Tada 🎉 ! Now, randomly, our .block will be completely skipped!


Owner
Francesco Saverio Zuppichini
Computer Vision Engineer @ 🤗 BSc informatics. MSc AI. Artificial Intelligence /Deep Learning Enthusiast & Full Stack developer
Francesco Saverio Zuppichini
Multimodal Descriptions of Social Concepts: Automatic Modeling and Detection of (Highly Abstract) Social Concepts evoked by Art Images

MUSCO - Multimodal Descriptions of Social Concepts Automatic Modeling of (Highly Abstract) Social Concepts evoked by Art Images This project aims to i

0 Aug 22, 2021
Build a medical knowledge graph based on Unified Language Medical System (UMLS)

UMLS-Graph Build a medical knowledge graph based on Unified Language Medical System (UMLS) Requisite Install MySQL Server 5.6 and import UMLS data int

Donghua Chen 6 Dec 25, 2022
Discover hidden deepweb pages

DeepWeb Scapper Att: Demo version An simple script to scrappe deepweb to find pages. Will return if any of those exists and will save on a file. You s

Héber Júlio 77 Oct 02, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 02, 2023
4st place solution for the PBVS 2022 Multi-modal Aerial View Object Classification Challenge - Track 1 (SAR) at PBVS2022

A Two-Stage Shake-Shake Network for Long-tailed Recognition of SAR Aerial View Objects 4st place solution for the PBVS 2022 Multi-modal Aerial View Ob

LinpengPan 5 Nov 09, 2022
This is a deep learning-based method to segment deep brain structures and a brain mask from T1 weighted MRI.

DBSegment This tool generates 30 deep brain structures segmentation, as well as a brain mask from T1-Weighted MRI. The whole procedure should take ~1

Luxembourg Neuroimaging (Platform OpNeuroImg) 2 Oct 25, 2022
Ladder Variational Autoencoders (LVAE) in PyTorch

Ladder Variational Autoencoders (LVAE) PyTorch implementation of Ladder Variational Autoencoders (LVAE) [1]: where the variational distributions q at

Andrea Dittadi 63 Dec 22, 2022
Semi-Supervised Learning for Fine-Grained Classification

Semi-Supervised Learning for Fine-Grained Classification This repo contains the code of: A Realistic Evaluation of Semi-Supervised Learning for Fine-G

25 Nov 08, 2022
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

Spchcat Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi. Description spchcat is a command-line tool that read

Pete Warden 279 Jan 03, 2023
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 28 Nov 25, 2022
PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection

PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection

1.4k Jan 06, 2023
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
This is a Python Module For Encryption, Hashing And Other stuff

EnroCrypt This is a Python Module For Encryption, Hashing And Other Basic Stuff You Need, With Secure Encryption And Strong Salted Hashing You Can Do

5 Sep 15, 2022
LAnguage Model Analysis

LAMA: LAnguage Model Analysis LAMA is a probe for analyzing the factual and commonsense knowledge contained in pretrained language models. The dataset

Meta Research 960 Jan 08, 2023
The 2nd Version Of Slothybot

SlothyBot Go to this website: "https://bitly.com/SlothyBot" The 2nd Version Of Slothybot. The Bot Has Many Features, Such As: Moderation Commands; Kic

Slothy 0 Jun 01, 2022
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation Requirements This repository needs mmsegmentation Training To train

Adelaide Intelligent Machines (AIM) Group 7 Sep 12, 2022
Fully Convlutional Neural Networks for state-of-the-art time series classification

Deep Learning for Time Series Classification As the simplest type of time series data, univariate time series provides a reasonably good starting poin

Stephen 572 Dec 23, 2022
Implementation of TabTransformer, attention network for tabular data, in Pytorch

Tab Transformer Implementation of Tab Transformer, attention network for tabular data, in Pytorch. This simple architecture came within a hair's bread

Phil Wang 420 Jan 05, 2023
Json2Xml tool will help you convert from json COCO format to VOC xml format in Object Detection Problem.

JSON 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Json2Xml t

Nguyễn Trường Lâu 6 Aug 22, 2022