A CNN implementation using only numpy. Supports multidimensional images, stride, etc.

Last update: Nov 30, 2021

CNN from scratch

The most interesting part is the file neural_networks/layers.py: code for a convolutional neural network based only on numpy (no PyTorch or TensorFlow). It is therefore very foundational and illustrates how CNNs work mathematically.

The CNN is compatible with colour images (3-channel RGB), includes pooling layers (class Pool2D), and works with any given (valid) stride.
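To illustrate what a "valid" stride might mean and how a pooling layer can be built from the same slicing idea, here is a minimal sketch. It is not the repository's Pool2D class: the function names `output_size` and `max_pool_forward`, the channels-last layout (batch, rows, cols, channels), and the rule that the stride must tile the input exactly are all assumptions made for this example.

```python
import numpy as np

def output_size(in_size, kernel, stride, pad=0):
    # Assumed validity rule: the window must tile the padded input exactly.
    span = in_size + 2 * pad - kernel
    if span < 0 or span % stride != 0:
        raise ValueError("stride does not tile the padded input exactly")
    return span // stride + 1

def max_pool_forward(X, k, s):
    # X has the assumed shape (batch, rows, cols, channels).
    n, H, W, c = X.shape
    out_rows = output_size(H, k, s)
    out_cols = output_size(W, k, s)
    out = np.empty((n, out_rows, out_cols, c), dtype=X.dtype)
    for x in range(out_rows):
        for y in range(out_cols):
            # One k-by-k window for all images and channels at once.
            window = X[:, x * s:x * s + k, y * s:y * s + k, :]
            out[:, x, y, :] = window.max(axis=(1, 2))
    return out
```

For example, a 7-pixel side with a 3-pixel kernel and stride 2 gives 3 output positions, while an 8-pixel side with the same kernel and stride is rejected under the assumed rule.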

neural_networks/activations.py contains basic activation functions, such as ReLU and SoftMax, each with forward and backward implementations (including the Jacobian computation) needed for backpropagation.
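The forward / backward pattern can be sketched as follows. This is an illustrative version, not the code in activations.py: the function names, the (batch, classes) input layout, and the per-sample loop over explicit Jacobians are assumptions chosen for readability.

```python
import numpy as np

def relu_forward(Z):
    return np.maximum(Z, 0)

def relu_backward(Z, dY):
    # The ReLU Jacobian is diagonal: 1 where Z > 0, else 0.
    return dY * (Z > 0)

def softmax_forward(Z):
    # Subtract the row maximum for numerical stability.
    shifted = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def softmax_backward(Z, dY):
    # For one sample with output s, the Jacobian is diag(s) - s s^T.
    S = softmax_forward(Z)
    dZ = np.empty_like(S)
    for i, s in enumerate(S):
        J = np.diag(s) - np.outer(s, s)
        dZ[i] = J @ dY[i]  # J is symmetric, so J @ dY equals dY @ J
    return dZ
```

Building the full Jacobian per sample is the most literal form. The same result can be obtained without the loop as `S * (dY - (dY * S).sum(axis=1, keepdims=True))`.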

Many functions make heavy use of slicing, to speed up the training process significantly. See e.g. Conv2D.forward:

for x in range(out_rows):
    for y in range(out_cols):
        out[:,x,y,:] = np.apply_over_axes(np.sum, W[None]*X_pad[:,x*s:x*s+kernel_height,y*s:y*s+kernel_width,:][...,None], [1,2,3])[:,0,0,0,:]

which is the sliced version of a depth-6 nested for loop -- and thus allows for significant speedup (on my computer, more than 20x speedup for the given training data).
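To make the equivalence concrete, the sketch below puts a depth-6 nested loop next to the sliced expression and checks that they agree. The wrapper functions, the omission of bias and activation, and the assumed shapes (X_pad as (batch, rows, cols, in_channels), W as (kernel_height, kernel_width, in_channels, out_channels)) are not taken from layers.py; they are inferred from the snippet above.

```python
import numpy as np

def conv_forward_naive(X_pad, W, s):
    n, H, Wd, _ = X_pad.shape
    kh, kw, _, c_out = W.shape
    out_rows = (H - kh) // s + 1
    out_cols = (Wd - kw) // s + 1
    out = np.zeros((n, out_rows, out_cols, c_out))
    for i in range(n):                      # image in the batch
        for x in range(out_rows):           # output row
            for y in range(out_cols):       # output column
                for f in range(c_out):      # output channel (filter)
                    for a in range(kh):     # kernel row
                        for b in range(kw): # kernel column
                            # Sum over input channels with a dot product.
                            out[i, x, y, f] += np.dot(
                                W[a, b, :, f], X_pad[i, x * s + a, y * s + b, :])
    return out

def conv_forward_sliced(X_pad, W, s):
    n, H, Wd, _ = X_pad.shape
    kh, kw, _, c_out = W.shape
    out_rows = (H - kh) // s + 1
    out_cols = (Wd - kw) // s + 1
    out = np.zeros((n, out_rows, out_cols, c_out))
    for x in range(out_rows):
        for y in range(out_cols):
            # Window for all images: shape (n, kh, kw, c_in, 1) after [..., None].
            window = X_pad[:, x * s:x * s + kh, y * s:y * s + kw, :][..., None]
            # Broadcast against W[None] and sum over kernel rows, cols, channels.
            out[:, x, y, :] = np.apply_over_axes(
                np.sum, W[None] * window, [1, 2, 3])[:, 0, 0, 0, :]
    return out
```

Only the two spatial loops remain in the sliced version; the batch, filter, and kernel dimensions are handled by broadcasting.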

In losses.py, CrossEntropy is the most important function. To speed it up, we simplified the expression mathematically as much as possible, yielding

loss = -1.0/m *np.trace(np.matmul(Y,np.log(Y_hat.T)))

for the forward pass and

-1/m*(np.divide(Y,Y_hat))

for the backward pass.
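As a sanity check on these two expressions, the sketch below wraps them in functions and compares the trace form with a plain elementwise sum. The function names and the assumed shapes (Y one-hot with shape (m, classes), Y_hat strictly positive with the same shape) are assumptions for this example, not the interface of losses.py.

```python
import numpy as np

def cross_entropy_forward(Y, Y_hat):
    m = Y.shape[0]
    # trace(Y @ log(Y_hat).T) sums Y[i, k] * log(Y_hat[i, k]) over all i, k.
    return -1.0 / m * np.trace(np.matmul(Y, np.log(Y_hat.T)))

def cross_entropy_forward_elementwise(Y, Y_hat):
    m = Y.shape[0]
    # Same value, written as an elementwise product and sum.
    return -1.0 / m * np.sum(Y * np.log(Y_hat))

def cross_entropy_backward(Y, Y_hat):
    m = Y.shape[0]
    # Derivative of the loss with respect to each entry of Y_hat.
    return -1.0 / m * np.divide(Y, Y_hat)
```

The two forward versions return the same number. The elementwise form avoids building the intermediate m-by-m product, which may matter for large batches.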

This is based on a project for CS289 at UC Berkeley.
