🔥 Cogitare - A Modern, Fast, and Modular Deep Learning and Machine Learning framework for Python

Overview

Cogitare is a modern, fast, and modular deep learning and machine learning framework for Python. It offers a friendly interface for beginners and a powerful toolset for experts.

Cogitare is built on top of PyTorch.

1. About

It uses the best of PyTorch, Dask, NumPy, and other tools through a simple interface to train, evaluate, and test models, and more.

With Cogitare, you can use classical machine learning algorithms with high performance and develop state-of-the-art models quickly.

Check the tutorials at http://tutorials.cogitare-ai.org/

The primary objectives of Cogitare are:

  • provide an easy-to-use interface to train and evaluate models;
  • provide tools to debug and analyze the model;
  • provide implementations of state-of-the-art models (models for common tasks, ready to train and ready to use);
  • provide ready-to-use implementations of straightforward and classical models (such as LogisticRegression);
  • be compatible with models for a broad range of problems;
  • be compatible with other tools (scikit-learn, etc.);
  • keep growing with the community: accept as many new features as possible;
  • provide a friendly interface to beginners, and powerful features for experts;
  • make the most of the hardware through multi-processing and multi-threading;
  • and others.

Cogitare is currently a work in progress that aims to provide a complete toolchain for machine learning and deep learning development, making the most of CUDA and multi-core processing.

2. Install

  • Install PyTorch from http://pytorch.org/

  • Install Cogitare with pip:

    pip install cogitare
    
  • Cogitare is in active development, so it's recommended to get the latest version from GitHub. To install directly from GitHub, use:

    pip install -e git+https://github.com/cogitare-ai/cogitare#egg=cogitare
    

3. Quickstart

This is a simple tutorial to get started with Cogitare's main features.

In this tutorial, we will write a Convolutional Neural Network (CNN) to classify handwritten digits (MNIST).

3.1 Model

We start by defining our CNN model.

When developing with Cogitare, your model must extend the cogitare.Model class. This class provides the Model interface, which lets you train and evaluate the model efficiently.

To implement a model, extend cogitare.Model and implement the forward() and loss() methods. The forward() method receives the batch, runs the forward pass through the network, and returns the network output. The loss() method receives the output of forward() together with the batch received from the iterator, applies a loss function, and returns the computed loss.

The Model interface iterates over the dataset and runs forward, loss, and backward on each batch.

# adapted from https://github.com/pytorch/examples/blob/master/mnist/main.py
from cogitare import Model
from cogitare import utils
from cogitare.data import DataSet, AsyncDataLoader
from cogitare.plugins import EarlyStopping
from cogitare.metrics.classification import accuracy
import cogitare

import torch.nn as nn
import torch
import torch.nn.functional as F
from torch.nn.utils import clip_grad_norm
import torch.optim as optim

from sklearn.datasets import fetch_mldata

import numpy as np

CUDA = True


cogitare.utils.set_cuda(CUDA)


class CNN(Model):
    
    def __init__(self):
        super(CNN, self).__init__()
        
        # define the model
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
    
    def forward(self, batch):
        # in this sample, each batch is a tuple containing (input_batch, expected_batch)
        # forward only needs the input, so we ignore the second item of the tuple
        input, _ = batch
        
        # batch X flat tensor -> batch X 1 channel (gray) X height X width
        # -1 lets view() infer the batch size instead of hard-coding 32
        input = input.view(-1, 1, 28, 28)
        
        # pass the data in the net
        x = F.relu(F.max_pool2d(self.conv1(input), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)

        # return the model output
        return F.log_softmax(x, dim=1)
    
    def loss(self, output, batch):
        # in this sample, each batch is a tuple containing (input_batch, expected_batch)
        # loss only needs the expected values, so we ignore the first item of the tuple
        _, expected = batch
        
        return F.nll_loss(output, expected)

The model class is simple; it only requires the forward and loss methods. By default, Cogitare backpropagates the loss returned by the loss() method and optimizes the model parameters. If you want to disable the Cogitare backward and optimization steps, return None from the loss function. If you return None, you are responsible for running the backward pass and optimizing the parameters yourself.
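The contract described above can be sketched as a plain PyTorch loop. This is a minimal illustration, not Cogitare's actual implementation: TinyModel and run_epoch are hypothetical names, and the stand-in extends torch.nn.Module rather than cogitare.Model so that the snippet depends on PyTorch only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyModel(nn.Module):
    """Hypothetical stand-in exposing the forward(batch) / loss(output, batch) pair."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3)

    def forward(self, batch):
        inputs, _ = batch  # only the inputs are needed here
        return F.log_softmax(self.linear(inputs), dim=1)

    def loss(self, output, batch):
        _, expected = batch  # only the targets are needed here
        return F.nll_loss(output, expected)


def run_epoch(model, batches, optimizer):
    """Run forward and loss on every batch; backward and step only if a loss is returned."""
    losses = []
    for batch in batches:
        optimizer.zero_grad()
        output = model.forward(batch)
        loss = model.loss(output, batch)
        if loss is None:
            # the model took care of backward and optimization itself
            continue
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
batches = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
model = TinyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
losses = run_epoch(model, batches, optimizer)
```

The `if loss is None` branch mirrors the opt-out described above: a model that returns None is skipped by the automatic backward and optimization steps.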

3.2 Data Loading

In this step, we load the data using the scikit-learn package.

mnist = fetch_mldata('MNIST original')
mnist.data = (mnist.data / 255).astype(np.float32)

Cogitare provides a toolbox to load and pre-process data for your models. In this introduction, we will use the DataSet and the AsyncDataLoader as examples.

The DataSet is responsible for iterating over multiple data iterators (in our case, we'll have two data iterators: input samples and expected samples).

# as input, the DataSet expects a list of iterators. In our case, the first iterator is the input
# data and the second iterator is the target data

# we also set the batch size to 32 and enable shuffling

# drop the last batch if its size is different from 32
data = DataSet([mnist.data, mnist.target.astype(int)], batch_size=32, shuffle=True, drop_last=True)

# then, we split our dataset into a training set and a validation set, with a ratio of 0.8
data_train, data_validation = data.split(0.8)

Notice that Cogitare accepts any iterator as input. Instead of using our DataSet, you can use the mnist.data itself, PyTorch's data loaders, or any other input that acts as an iterator.
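As an illustration of "any input that acts as an iterator", here is a minimal sketch of a hand-written batch source built with the standard library only. BatchIterable is a hypothetical name and is not part of Cogitare. It is written as a re-iterable class with a length, on the assumption that training for several epochs needs to walk the data more than once; whether a given Cogitare version requires `__len__` is not stated in this document.

```python
import random


class BatchIterable:
    """Hypothetical re-iterable source of (input_batch, target_batch) tuples."""

    def __init__(self, inputs, targets, batch_size, shuffle=True, drop_last=True, seed=None):
        self.inputs = inputs
        self.targets = targets
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        self._rng = random.Random(seed)

    def __len__(self):
        # number of batches produced per pass
        full, rest = divmod(len(self.inputs), self.batch_size)
        return full if (self.drop_last or rest == 0) else full + 1

    def __iter__(self):
        indices = list(range(len(self.inputs)))
        if self.shuffle:
            self._rng.shuffle(indices)  # new order on every pass
        for start in range(0, len(indices), self.batch_size):
            chunk = indices[start:start + self.batch_size]
            if self.drop_last and len(chunk) < self.batch_size:
                break  # skip the incomplete final batch
            yield [self.inputs[i] for i in chunk], [self.targets[i] for i in chunk]


inputs = [[float(i)] * 4 for i in range(10)]
targets = [i % 2 for i in range(10)]
data = BatchIterable(inputs, targets, batch_size=4, seed=0)

for input_batch, target_batch in data:
    print(len(input_batch), target_batch)
```

With 10 samples and a batch size of 4, the incomplete third batch is dropped, so each pass yields two batches.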

In some cases, we can speed up training by loading the data using multiple threads/processes or by pre-loading the data before the model requests it.

With the AsyncDataLoader, we can load N batches ahead of the model execution in parallel. We present this technique in this sample because it can increase performance in a wide range of models (when the data loading or pre-processing is slower than the model execution).

def pre_process(batch):
    input, expected = batch
    
    # the data is a numpy.ndarray (loaded from sklearn), so we need to convert it to a Variable
    input = utils.to_variable(input, dtype=torch.FloatTensor)  # converts to a torch Variable of FloatTensor
    expected = utils.to_variable(expected, dtype=torch.LongTensor)  # converts to a torch Variable of LongTensor
    return input, expected


# we wrap our data_train and data_validation iterators with the async data loader.
# each loader loads 16 batches ahead of the model execution using 8 workers (8 threads, in this case).
# each batch is pre-processed in parallel by the pre_process function, which loads the data
# on the GPU
data_train = AsyncDataLoader(data_train, buffer_size=16, mode='threaded', workers=8, on_batch_loaded=pre_process)
data_validation = AsyncDataLoader(data_validation, buffer_size=16, mode='threaded', workers=8, on_batch_loaded=pre_process)

To fill the async buffer before training, we can call:

data_train.cache()
data_validation.cache()
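The idea of loading batches ahead of the model can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the AsyncDataLoader implementation: prefetch is a hypothetical helper, and it uses a single background thread (which keeps the batches in order) instead of several workers.

```python
import queue
import threading


def prefetch(batches, buffer_size=16, on_batch_loaded=None):
    """Yield items from `batches`, loading up to `buffer_size` of them ahead in a thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the data

    def producer():
        try:
            for batch in batches:
                if on_batch_loaded is not None:
                    batch = on_batch_loaded(batch)  # pre-process off the main thread
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always unblock the consumer, even after an error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until the next batch is ready
        if batch is done:
            return
        yield batch


# integers stand in for real batches; the callback plays the role of pre_process
loaded = list(prefetch(range(10), buffer_size=4, on_batch_loaded=lambda b: b * 2))
print(loaded)
```

The bounded queue is what limits how far ahead the loader runs: the producer pauses once `buffer_size` batches are waiting, and resumes as the consumer takes them.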

3.3 Training

Now, we can train our model.

First, let's create the model instance and add the default plugins to watch the training status. The default plugins include:

  • Progress bar per batch and epoch
  • Plot training and validation losses (if validation_dataset is present)
  • Log training loss

model = CNN()
model.register_default_plugins()

Besides that, we may want to add some extra plugins, such as EarlyStopping. With it, if the loss has not decreased after N epochs, the training stops and the best model is used.

To add the early stopping algorithm, you can use:

early = EarlyStopping(max_tries=10, path='/tmp/model.pt')
# after 10 epochs without a decrease in the loss, stop the training; the best model is saved at /tmp/model.pt

# the plugin runs at the end of each epoch
model.register_plugin(early, 'on_end_epoch')
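The stopping rule described above can be sketched in a few lines of plain Python. PatienceTracker is a hypothetical helper written for illustration and is not the EarlyStopping plugin itself. It only counts the epochs without improvement, while the plugin also saves and restores the best model.

```python
class PatienceTracker:
    """Hypothetical helper: signals a stop after `max_tries` epochs without a lower loss."""

    def __init__(self, max_tries):
        self.max_tries = max_tries
        self.best_loss = float('inf')
        self.best_epoch = None
        self.tries = 0

    def update(self, epoch, loss):
        """Record the loss of one epoch; return True when training should stop."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_epoch = epoch
            self.tries = 0  # a real plugin would save the model at this point
        else:
            self.tries += 1
        return self.tries >= self.max_tries


tracker = PatienceTracker(max_tries=3)
epoch_losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.3]  # made-up values for the example
stopped_at = None
for epoch, loss in enumerate(epoch_losses):
    if tracker.update(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, tracker.best_epoch, tracker.best_loss)
```

Note that the final 0.3 is never reached: after three epochs without improving on 0.4, the loop stops, which is the trade-off a larger max_tries is meant to soften.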

Another common technique is to clip the gradient during training. To do so, you can use:

model.register_plugin(lambda *args, **kw: clip_grad_norm(model.parameters(), 1.0), 'before_step')
# will execute the clip_grad_norm before each optimization step
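For readers unfamiliar with norm clipping, the following sketch shows the arithmetic on plain Python lists. clip_by_global_norm is a hypothetical helper that mirrors the general technique (rescale all gradients when their combined L2 norm exceeds max_norm); it is not PyTorch's code. Recent PyTorch releases expose the in-place function as clip_grad_norm_.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale `grads` (a list of lists of floats) so their global L2 norm is at most `max_norm`."""
    # treat all gradients as a single flat vector when computing the norm
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        # only shrink; gradients that are already small enough stay untouched
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm


clipped, norm_before = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its length changes.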

Now, we define the optimizer and start the model training:

optimizer = optim.Adam(model.parameters(), lr=0.001)

if CUDA:
    model = model.cuda()

model.learn(data_train, optimizer, data_validation, max_epochs=100)

2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Model: 

CNN(
  (conv1): Conv2d (1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d (10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5)
  (fc1): Linear(in_features=320, out_features=50)
  (fc2): Linear(in_features=50, out_features=10)
)

2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Training data: 

DataSet with:
    containers: [
        TensorHolder with 1750x32 samples
        TensorHolder with 1750x32 samples
    ],
    batch size: 32


2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Number of trainable parameters: 21,840
2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Number of non-trainable parameters: 0
2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Total number of parameters: 21,840
2018-02-02 20:59:23 sprawl cogitare.core.model[2443] INFO Starting the training ...
2018-02-02 21:02:04 sprawl cogitare.core.model[2443] INFO Training finished

Stopping training after 10 tries. Best score 0.0909
Model restored from: /tmp/model.pt

To check the model loss and accuracy on the validation dataset:

def model_accuracy(output, data):
    _, indices = torch.max(output, 1)
    
    return accuracy(indices, data[1])

# evaluate the model loss and accuracy over the validation dataset
metrics = model.evaluate_with_metrics(data_validation, {'loss': model.metric_loss, 'accuracy': model_accuracy})

# metrics is a dict mapping each metric name (loss or accuracy, in this sample) to a list of per-batch values.
# to get a value for the full dataset, we take the mean over the batches:

metrics_mean = {'loss': 0, 'accuracy': 0}
for loss, acc in zip(metrics['loss'], metrics['accuracy']):
    metrics_mean['loss'] += loss
    metrics_mean['accuracy'] += acc.item()

qtd = len(metrics['loss'])

print('Loss: {}'.format(metrics_mean['loss'] / qtd))
print('Accuracy: {}'.format(metrics_mean['accuracy'] / qtd))

Loss: 0.10143917564566948
Accuracy: 0.9846252860411899
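Taking the plain mean of per-batch values matches the dataset-wide value here because drop_last=True makes every batch the same size. As a small sketch of what changes otherwise (dataset_mean is a hypothetical helper, and the numbers are made up), each batch value can be weighted by its batch size:

```python
def dataset_mean(batch_values, batch_sizes):
    """Average per-batch values, weighting each one by the number of samples in its batch."""
    total = sum(value * size for value, size in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


batch_accuracy = [0.5, 1.0]  # accuracy of a full batch and of a smaller last batch
batch_sizes = [32, 8]

plain = sum(batch_accuracy) / len(batch_accuracy)
weighted = dataset_mean(batch_accuracy, batch_sizes)
print(plain, weighted)
```

In this example the plain mean is 0.75, while the size-weighted mean is 0.6, because the small last batch counts for less.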

One of the advantages of Cogitare is its plug-and-play APIs, which let you add or remove functionality easily. In this sample, we trained a model with a training progress bar, error plotting, early stopping, gradient clipping, and model evaluation.

4. Contribution

Cogitare is a work in progress, and any contribution is welcome.

You can contribute by testing and reporting bugs, proposing feature ideas, fixing bugs, pushing code, etc.

  1. You want to propose a new feature and implement it
    • Post about your intended feature, and we shall discuss the design and implementation. Once we agree that the plan looks good, go ahead and implement it.
  2. You want to implement a feature or bug fix for an outstanding issue
    • Look at the outstanding issues here: https://github.com/cogitare-ai/cogitare/issues
    • Pick an issue and comment on it to say that you want to work on it.
    • If you need more context on a particular issue, please ask and we shall provide it.

Once you finish implementing a feature or bugfix, please send a Pull Request to https://github.com/cogitare-ai/cogitare

If you are not familiar with creating a Pull Request, here are some guides:

Comments
  • [Feature request] Plugin to watch the training on web (tensorboard integrated with cogitare plugins)

    • plot training error\std
    • plot validation error\std
    • time remaining
    • button to stop the training process
    • button to save the model at the current step
    • button to pause the training
    • button to resume the training
    • plot model parameters statistics
    • save/load model execution log, to compare and analyze different executions [1]
    • plot execution graph
    • maybe something like named-scope from tensorflow [2]
    • x-axis: by value or by relative time [3]
    • plot smoothing
    • display real-time execution machine/gpu stats
    • add Hyper-parameter option to modify its value from the web interface

    [1] screenshot from 2017-10-31 17-04-09

    [2] screenshot from 2017-10-31 17-13-13

    [3] screenshot from 2017-10-31 17-52-29

    enhancement hard 
    opened by aron-bordin 1
  • [Feature Request] Implement History plugin

    A plugin that records all (or a fraction, if given a filter) of variables during the training process.

    It watches all hooks and captures the variables, which can then be exported.

    • be compatible with the Cogitare Monitor, implementing a history viewer.
    enhancement medium 
    opened by aron-bordin 0
  • [Feature Request] Add map parameter to dataholders

    A callable parameter, that can act over the sample before generating the batch.

    It should allow easy-to-use preprocessing algorithms through a distributed interface (threads, processes, machines)

    Add on dataholder:

    • on_sample_loaded
    • on_batch_loaded

    Add on asyncloader:

    • on_batch_loaded (useful for loading batches to gpu before using)
    enhancement 
    opened by aron-bordin 0
  • before first release, profile everything to make mem/speed improvements

    Logs.

    18/09 - replaced python indices by numpy indices and python shuffle by numpy shuffle in dataholder. In a dataset with millions of samples, improved by ~15x.

    enhancement 
    opened by aron-bordin 0
  • [Feature Request] add utils.auto_optim

    add a simple function on utils, which receives the optimizer name, the model parameters, and its arguments. This function will create the optimizer and return it.

    (When testing multiple optimizers, there is no need to change the code to switch optimizers. You can, for example, use an argument named "optim" and just pass it to the function.)

    enhancement help wanted easy 
    opened by aron-bordin 0
  • [Feature Request] Implement Interactive SIGINT Interrupt

    A plugin that listens for the SIGINT signal during training.

    When receiving the signal, gives some options to the interactive user:

    • save/load the model state
    • quit training
    • maybe something else
    enhancement help wanted easy 
    opened by aron-bordin 0
Releases(v0.1.0)
  • v0.1.0(Feb 3, 2018)

    The first release of Cogitare.

    Support:

    • Model

    • Sequential Model

    • DataHolder

    • Sequential DataHolder

    • DataSet

    • Sequential DataSet

    • AsyncDataLoader

    • Metrics (classification, spatial)

    • Classic Models (LR, MLP)

    • Web Monitor (system usage, system details)

    • Early stopping plugin

    • Evaluator plugin (different test metrics on the model)

    • Logger

    • Plotting (matplotlib)

    • Progress Bars

    • Some utilities

    • Documentation with examples

    • Tests: 92% of coverage (8% remaining is of the Monitor undefined interface)
