SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Last update: Dec 16, 2022

Related tags

Overview

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in this paper.

Dataset

The Datasets can be downloaded in Amazon-670K. Note that the data is sorted by labels so please shuffle at least the validation/testing data.

TensorFlow Baselines

We suggest directly get TensorFlow docker image to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.

Also there is a TensorFlow docker image specifically built for CPUs with AVX-512 instructions, to get it use:

docker pull clearlinux/stacks-dlrs_2-mkl

config.py controls the parameters of TensorFlow training like learning rate. example_full_softmax.py, example_sampled_softmax.py are example files for Amazon-670K dataset with full softmax and sampled softmax respectively.

Build/Run on Intel platform

Prerequisites:

CMake >= 3.0 Intel Compiler (ICC) >= 19

Build with ICC compiler

source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux
cd /path/to/slide-root
mkdir -p bin && cd bin 
# BDW (AVX2)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc
# SKX/CLX (AVX512)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1
# CPX (AVX512 + BF16)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1 -DOPT_AVX512_BF16=1
make -j

Run on Intel SKX/CLX/CPX

cd bin
OMP_NUM_THREADS= KMP_HW_SUBSET=s,c,t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
For example, on CLX8280 2Sx28c:
OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv

For best performance please set Batchsize=multiple-of-logic-core-number from SLIDE/Config_amz.csv.

Results can be checked from the log file under dataset:

tail -f dataset/log.txt

SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Related tags

Overview

SLIDE

Dataset

TensorFlow Baselines

Build/Run on Intel platform

Prerequisites:

Build with ICC compiler

Run on Intel SKX/CLX/CPX

Owner

Intel Labs

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

This repository contains the source code of an efficient 1D probabilistic model for music time analysis proposed in ICASSP2022 venue.

Hybrid Neural Fusion for Full-frame Video Stabilization

PyTorch implementation of "VRT: A Video Restoration Transformer"

Implementation of Bidirectional Recurrent Independent Mechanisms (Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules)

ML course - EPFL Machine Learning Course, Fall 2021

[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Predicting Student Attentiveness using OpenCV

Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

[ICCV'2021] "SSH: A Self-Supervised Framework for Image Harmonization", Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang

Unofficial PyTorch implementation of Google AI's VoiceFilter system

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

SpineAI Bilsky Grading With Python

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Official implementation of YOGO for Point-Cloud Processing

Self-supervised learning algorithms provide a way to train Deep Neural Networks in an unsupervised way using contrastive losses

[ICCV '21] In this repository you find the code to our paper Keypoint Communities

Easy to use Audio Tagging in PyTorch

This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming"