Machine learning, in numpy

Last update: Dec 30, 2022

Overview

numpy-ml

Ever wish you had an inefficient but somewhat legible collection of machine learning algorithms implemented exclusively in NumPy? No?

Installation

For rapid experimentation

To use this code as a starting point for ML prototyping / experimentation, just clone the repository, create a new virtualenv, and start hacking:

$ git clone https://github.com/ddbourgin/numpy-ml.git
$ cd numpy-ml && virtualenv npml && source npml/bin/activate
$ pip3 install -r requirements-dev.txt

As a package

If you don't plan to modify the source, you can also install numpy-ml as a Python package: pip3 install -U numpy_ml.

The reinforcement learning agents train on environments defined in OpenAI Gym. To install these alongside numpy-ml, you can use pip3 install -U 'numpy_ml[rl]'.
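After either install route, a quick way to confirm the result is to ask the interpreter whether it can locate the packages. The sketch below is only an illustration: it assumes the import name is numpy_ml (matching the pip package name above) and that the RL extra provides a module importable as gym. Neither name is confirmed by this README, and is_installed is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names below are assumptions; adjust them to your environment.
for name in ("numpy", "numpy_ml", "gym"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If numpy_ml is reported as missing, check that the virtualenv you installed into is the one currently activated.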

Documentation

For more details on the available models, see the project documentation.

Available models

Gaussian mixture model
- EM training
Hidden Markov model
- Viterbi decoding
- Likelihood computation
- MLE parameter estimation via Baum-Welch/forward-backward algorithm
Latent Dirichlet allocation (topic model)
- Standard model with MLE parameter estimation via variational EM
- Smoothed model with MAP parameter estimation via MCMC
Neural networks
- Layers / Layer-wise ops
  - Add
  - Flatten
  - Multiply
  - Softmax
  - Fully-connected/Dense
  - Sparse evolutionary connections
  - LSTM
  - Elman-style RNN
  - Max + average pooling
  - Dot-product attention
  - Embedding layer
  - Restricted Boltzmann machine (w. CD-n training)
  - 2D deconvolution (w. padding and stride)
  - 2D convolution (w. padding, dilation, and stride)
  - 1D convolution (w. padding, dilation, stride, and causality)
- Modules
  - Bidirectional LSTM
  - ResNet-style residual blocks (identity and convolution)
  - WaveNet-style residual blocks with dilated causal convolutions
  - Transformer-style multi-headed scaled dot product attention
- Regularizers
  - Dropout
- Normalization
  - Batch normalization (spatial and temporal)
  - Layer normalization (spatial and temporal)
- Optimizers
  - SGD w/ momentum
  - AdaGrad
  - RMSProp
  - Adam
- Learning Rate Schedulers
  - Constant
  - Exponential
  - Noam/Transformer
  - Dlib scheduler
- Weight Initializers
  - Glorot/Xavier uniform and normal
  - He/Kaiming uniform and normal
  - Standard and truncated normal
- Losses
  - Cross entropy
  - Squared error
  - Bernoulli VAE loss
  - Wasserstein loss with gradient penalty
  - Noise contrastive estimation loss
- Activations
  - ReLU
  - Tanh
  - Affine
  - Sigmoid
  - Leaky ReLU
  - ELU
  - SELU
  - Exponential
  - Hard Sigmoid
  - Softplus
- Models
  - Bernoulli variational autoencoder
  - Wasserstein GAN with gradient penalty
  - word2vec encoder with skip-gram and CBOW architectures
- Utilities
  - col2im (MATLAB port)
  - im2col (MATLAB port)
  - conv1D
  - conv2D
  - deconv2D
  - minibatch
Tree-based models
- Decision trees (CART)
- [Bagging] Random forests
- [Boosting] Gradient-boosted decision trees
Linear models
- Ridge regression
- Logistic regression
- Ordinary least squares
- Bayesian linear regression w/ conjugate priors
  - Unknown mean, known variance (Gaussian prior)
  - Unknown mean, unknown variance (Normal-Gamma / Normal-Inverse-Wishart prior)
n-Gram sequence models
- Maximum likelihood scores
- Additive/Lidstone smoothing
- Simple Good-Turing smoothing
Multi-armed bandit models
- UCB1
- LinUCB
- Epsilon-greedy
- Thompson sampling w/ conjugate priors
  - Beta-Bernoulli sampler
Reinforcement learning models
- Cross-entropy method agent
- First visit on-policy Monte Carlo agent
- Weighted incremental importance sampling Monte Carlo agent
- Expected SARSA agent
- TD-0 Q-learning agent
- Dyna-Q / Dyna-Q+ with prioritized sweeping
Nonparametric models
- Nadaraya-Watson kernel regression
- k-Nearest neighbors classification and regression
- Gaussian process regression
Matrix factorization
- Regularized alternating least-squares
- Non-negative matrix factorization
Preprocessing
- Discrete Fourier transform (1D signals)
- Discrete cosine transform (type-II) (1D signals)
- Bilinear interpolation (2D signals)
- Nearest neighbor interpolation (1D and 2D signals)
- Autocorrelation (1D signals)
- Signal windowing
- Text tokenization
- Feature hashing
- Feature standardization
- One-hot encoding / decoding
- Huffman coding / decoding
- Term frequency-inverse document frequency (TF-IDF) encoding
- MFCC encoding
Utilities
- Similarity kernels
- Distance metrics
- Priority queue
- Ball tree
- Discrete sampler
- Graph processing and generators
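
To give a flavor of what "implemented exclusively in NumPy" can look like for one entry in the list above, here is a minimal sketch of ridge regression using the closed-form solution. It is an illustration only and not the code or API shipped in numpy-ml: the names ridge_fit and ridge_predict, the alpha parameter, and the synthetic data are all hypothetical, and the sketch fits no intercept.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution: beta = (X^T X + alpha * I)^{-1} X^T y.

    Hypothetical helper for illustration; no intercept term is fit.
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(A, X.T @ y)


def ridge_predict(X, beta):
    """Linear prediction using the fitted coefficients."""
    return X @ beta


# Synthetic data: 200 samples, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=200)

beta = ridge_fit(X, y, alpha=1e-3)
print("estimated coefficients:", np.round(beta, 2))
print("first prediction:", ridge_predict(X[:1], beta))
```

With a small alpha and little noise, the estimated coefficients should land close to true_beta. Larger values of alpha shrink them toward zero.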

Contributing

Am I missing your favorite model? Is there something that could be cleaner / less confusing? Did I mess something up? Submit a PR! The only requirement is that your models are written with just the Python standard library and NumPy. The SciPy library is also permitted under special circumstances ;)

See full contributing guidelines here.

Owner

David Bourgin
