A learning-based data collection tool for human segmentation

Last update: Jun 24, 2022

FullBodyFilter

Overview

Human segmentation is the difficult machine learning task of identifying and extracting the people in a picture. Most of the time this is done with a convolutional neural network. Training an accurate and robust model requires large amounts of data covering a wide range of human poses. Collecting and labeling training data by hand takes a lot of time and resources. This project explores an alternative: using automation to collect and label pre-existing data from internet videos.

The project focuses on the DTEN ME model, which is used for virtual backgrounds in Zoom meetings.

OpenPose is used to filter each video for suitable frames, in particular frames that show a single person's full body. Mask R-CNN serves as the teacher model that generates the training labels. To find the images on which the ME model performs poorly, the ME masks are compared against the Mask R-CNN masks. The result is a set of images and masks that can be used as training data.
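The two decisions in this pipeline (keep a frame only if it shows one full-body person, then save it only if the ME mask disagrees with the teacher mask) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the project's scripts: the function names, the thresholds, the choice of required joints, and the use of intersection-over-union (IoU) as the comparison metric are all hypothetical. The keypoint array is assumed to have shape (people, 25, 3) with (x, y, confidence) per joint in OpenPose's BODY_25 ordering.

```python
import numpy as np

# Assumed BODY_25 indices: nose, neck, mid-hip, right/left knee, right/left ankle.
# Which joints count as "full body" is a hypothetical choice for this sketch.
REQUIRED_JOINTS = (0, 1, 8, 10, 11, 13, 14)


def is_single_full_body(keypoints, min_conf=0.3):
    """Return True if exactly one person is detected and every required joint
    has a confidence of at least min_conf (hypothetical threshold)."""
    if keypoints is None:
        return False
    keypoints = np.asarray(keypoints)
    # Expect (people, joints, 3); anything else is treated as "no usable pose".
    if keypoints.ndim != 3 or keypoints.shape[0] != 1:
        return False
    conf = keypoints[0, list(REQUIRED_JOINTS), 2]
    return bool(np.all(conf >= min_conf))


def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks are empty, so they agree trivially
    return float(np.logical_and(a, b).sum() / union)


def is_hard_example(student_mask, teacher_mask, iou_threshold=0.8):
    """Flag a frame as training data when the student (ME) mask overlaps the
    teacher (Mask R-CNN) mask by less than iou_threshold (hypothetical value)."""
    return mask_iou(student_mask, teacher_mask) < iou_threshold
```

In a loop over video frames, a frame would be saved together with its teacher mask only when `is_single_full_body` and `is_hard_example` both return True. Other disagreement measures, such as pixel error rate, would fit the same structure.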

Overview of Program

A full report of the system design and implementation details can be found in doc.

Sample Results

Examples of saved training data. In each image, the bottom left is the Mask R-CNN mask and the bottom right is the ME mask.

Usage

This project relies on OpenPose and Mask R-CNN and all of their dependencies. Instructions on how to set up each are found in their respective directories here.

Documentation on how to use the scripts is located in doc.

Owner

Robert Jiang
