deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Last update: Oct 17, 2022

Related tags

Deep Learning deep-table

Overview

deep-table

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Design

Architecture

As shown below, each pretraining/fine-tuning model is decomposed into two modules: Encoder and Head.

Encoder

Encoder has Embedding and Backbone.

Embedding makes continuous/categorical features tokenized or simply normalized.
Backbone processes the tokenized features.

Pretraining/Fine-tuning Head

Pretraining/Fine-tuning Head uses Encoder module for training.

Implemented Methods

Available Modules

Encoder - Embedding

FeatureEmbedding
TabTransformerEmbedding

Encoder - Backbone

MLPBackbone
FTTransformerBackbone
SAINTBackbone

Model - Head

MLPHeadModel

Model - Pretraining

DenoisingPretrainModel
SAINTPretrainModel
TabTransformerPretrainModel
VIMEPretrainModel

How To Use

Step 0. Install

python setup.py install

# Installation with pip
pip install -e .

Step 1. Define config.json

You have to define three configs at least.

encoder
model
trainer

Minimum configurations are as follows:

from omegaconf import OmegaConf

encoder_config = OmegaConf.create({
    "embedding": {
        "name": "FeatureEmbedding",
    },
    "backbone": {
        "name": "FTTransformerBackbone",
    }
})

model_config = OmegaConf.create({
    "name": "MLPHeadModel"
})

trainer_config = OmegaConf.create({
    "max_epochs": 1,
})

Other parameters can be changed also by config.json if you want.

Step 2. Define Datamodule

from deep_table.data.data_module import TabularDatamodule


datamodule = TabularDatamodule(
    train=train_df,
    validation=val_df,
    test=test_df,
    task="binary",
    dim_out=1,
    categorical_cols=["education", "occupation", ...],
    continuous_cols=["age", "hours-per-week", ...],
    target=["income"],
    num_categories=110,
)

Step 3. Run Training

>> {'accuracy': array([0.8553...]), 'AUC': array([0.9111...]), 'F1 score': array([0.9077...]), 'cross_entropy': array([0.3093...])} ">

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


estimator = Estimator(
    encoder_config,      # Encoder architecture
    model_config,        # model settings (learning rate, scheduler...)
    trainer_config,      # training settings (epoch, gpu...)
)

estimator.fit(datamodule)
predict = estimator.predict(datamodule.dataloader(split="test"))
get_scores(predict, target, task="binary")
>>> {'accuracy': array([0.8553...]),
     'AUC': array([0.9111...]),
     'F1 score': array([0.9077...]),
     'cross_entropy': array([0.3093...])}

If you want to train a model with pretraining, write as follows:

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


pretrain_model_config = OmegaConf.create({
    "name": "SAINTPretrainModel"
})

pretrain_model = Estimator(encoder_config, pretrain_model_config, trainer_config)
pretrain_model.fit(datamodule)

estimator = Estimator(encoder_config, model_config, trainer_config)
estimator.fit(datamodule, from_pretrained=pretrain_model)

See notebooks/train_adult.ipynb for more details.

Custom Datasets

You can use your own datasets.

Prepare datasets and create DataFrame
Preprocess DataFrame
Create your own datamodules using TabularDatamodule

Example code is shown below.

import pandas as pd

import os,sys; sys.path.append(os.path.abspath(".."))
from deep_table.data.data_module import TabularDatamodule
from deep_table.preprocess import CategoryPreprocessor


# 0. Prepare datasets and create DataFrame
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# 1. Preprocessing pd.DataFrame
category_preprocesser = CategoryPreprocessor(categorical_columns=["species"], use_unk=False)
iris = category_preprocesser.fit_transform(iris)

# 2. TabularDatamodule
datamodule = TabularDatamodule(
    train=iris.iloc[:20],
    val=iris.iloc[20:40],
    test=iris.iloc[40:],
    task="multiclass",
    dim_out=3,
    categorical_columns=[],
    continuous_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    target=["species"],
    num_categories=0,
)

See notebooks/custom_dataset.ipynb for the full training example.

Custom Models

You can also use your Embedding/Backbone/Model. Set arguments as shown below.

estimator = Estimator(
    encoder_config, model_config, trainer_config,
    custom_embedding=YourEmbedding, custom_backbone=YourBackbone, custom_model=YourModel
)

If custom models are set, the attributes name in corresponding configs will be overwritten.

See notebooks/custom_model.ipynb for more details.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Related tags

Overview

deep-table

Design

Architecture

Encoder

Pretraining/Fine-tuning Head

Implemented Methods

Available Modules

Encoder - Embedding

Encoder - Backbone

Model - Head

Model - Pretraining

How To Use

Step 0. Install

Step 1. Define config.json

Step 2. Define Datamodule

Step 3. Run Training

Custom Datasets

Custom Models

Owner

Monitor your ML jobs on mobile devices📱, especially for Google Colab / Kaggle

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Train Scene Graph Generation for Visual Genome and GQA in PyTorch >= 1.2 with improved zero and few-shot generalization.

This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices.

ICML 21 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification

The repo of Feedback Networks, CVPR17

Code for the paper "Graph Attention Tracking". (CVPR2021)

The official repository for BaMBNet

Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"

The code for MM2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning"

Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

[ECCV 2020] Gradient-Induced Co-Saliency Detection

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

Built a deep neural network (DNN) that functions as an end-to-end machine translation pipeline

Codes for building and training the neural network model described in Domain-informed neural networks for interaction localization within astroparticle experiments.

NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

WormMovementSimulation - 3D Simulation of Worm Body Movement with Neurons attached to its body

Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style