Real-Time Semantic Segmentation in Mobile device

Last update: Jan 01, 2023

Overview

This project is an example of semantic segmentation for real-time mobile apps.

The architecture is inspired by MobileNetV2 and U-Net.

LFW (Labeled Faces in the Wild) is used as the dataset.

The goal of this project is to detect hair segments with reasonable accuracy and speed on mobile devices. Currently, it achieves 0.89 IoU.

More details on the speed vs. accuracy trade-off are available in my post.

Example application

iOS
Android (TODO)

Requirements

Python 3.8
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
CoreML for iOS app.

About Model

At this time, this repository contains only one model, MobileNetV2_unet. Like a typical U-Net, it has an encoder and a decoder, both built from the depthwise convolution blocks proposed by MobileNets.

The input image is encoded to 1/32 of its size and then decoded to 1/2. Finally, the model scores the result and resizes it to the original size.
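The resolution changes described above can be checked with a little arithmetic. The following is a minimal sketch, not the model code: the function name and the dictionary keys are hypothetical, and only the 1/32 and 1/2 ratios come from the description.

```python
def stage_sizes(input_size, encoder_stride=32, decoder_stride=2):
    """Return the spatial size (one side, in pixels) at each stage.

    The strides mirror the ratios stated in this README (1/32 and 1/2).
    The function itself is illustrative and not part of the repository.
    """
    if input_size % encoder_stride != 0:
        raise ValueError("input_size should be divisible by the encoder stride")
    return {
        "input": input_size,
        "encoded": input_size // encoder_stride,  # encoder output
        "decoded": input_size // decoder_stride,  # decoder output
        "output": input_size,                     # after the final resize
    }


if __name__ == "__main__":
    print(stage_sizes(224))
```

For a 224 x 224 input, this gives a 7 x 7 encoder output and a 112 x 112 decoder output before the final resize back to 224 x 224.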

Training steps

Data Preparation

The data is available at LFW. To get the mask images, refer to issue #11. Once you have the images and masks, place them as shown below.

data/
  lfw/
    raw/
      images/
        0001.jpg
        0002.jpg
      masks/
        0001.ppm
        0002.ppm
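Before training, it may help to confirm that every image has a mask. The sketch below assumes that images and masks are paired by file name (0001.jpg with 0001.ppm), as the layout above suggests. The helper name find_unpaired is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_unpaired(raw_dir):
    """List file stems that have an image but no mask, and vice versa.

    Assumes images are *.jpg under images/ and masks are *.ppm under masks/,
    paired by file name without the extension.
    """
    raw_dir = Path(raw_dir)
    image_stems = {p.stem for p in (raw_dir / "images").glob("*.jpg")}
    mask_stems = {p.stem for p in (raw_dir / "masks").glob("*.ppm")}
    images_without_mask = sorted(image_stems - mask_stems)
    masks_without_image = sorted(mask_stems - image_stems)
    return images_without_mask, masks_without_image


if __name__ == "__main__":
    missing_masks, missing_images = find_unpaired("data/lfw/raw")
    print("images without a mask:", missing_masks)
    print("masks without an image:", missing_images)
```

Both lists should be empty when the data is laid out correctly.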

Training

If you use 224 x 224 as the input size, pre-trained MobileNetV2 weights are available. They are downloaded automatically when you train the model with the following command.

cd src
python run_train.py params/002.yaml

The Dice coefficient is used as the loss function.
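For reference, a common soft Dice loss can be sketched as follows. This is a dependency-free illustration on flattened masks, not the training code in this repository. The smooth term and the "1 - Dice" form are assumptions; the actual implementation may differ and would operate on tensors.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for two flattened masks given as lists.

    pred:   predicted probabilities in [0, 1]
    target: ground-truth labels (0 or 1)
    smooth: assumed stabiliser that avoids division by zero on empty masks
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    # Dice is 1 for a perfect match, so the loss is its complement.
    return 1.0 - dice


if __name__ == "__main__":
    print(dice_loss([0.9, 0.1, 0.8], [1, 0, 1]))
```

The loss is 0 when the prediction matches the mask exactly and grows as the overlap shrinks.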

Pretrained model

Input size	IoU	Download
224	0.89	Google Drive
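The IoU column is the intersection over union between the predicted mask and the ground-truth mask. A minimal sketch of that metric on flattened masks is shown below. The 0.5 threshold and the handling of two empty masks are assumptions, and the evaluation code in this repository may compute it differently.

```python
def iou(pred, target, threshold=0.5):
    """Intersection over union of two flattened masks given as lists.

    pred:   predicted scores in [0, 1], binarised at `threshold` (assumed 0.5)
    target: ground-truth labels (0 or 1)
    """
    pred_bin = [p >= threshold for p in pred]
    target_bin = [t >= 0.5 for t in target]
    intersection = sum(p and t for p, t in zip(pred_bin, target_bin))
    union = sum(p or t for p, t in zip(pred_bin, target_bin))
    # Two empty masks agree perfectly, so treat that case as IoU = 1.0.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # One overlapping pixel out of three pixels covered by either mask.
    print(iou([0.9, 0.8, 0.1, 0.2], [1, 0, 0, 1]))
```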

Converting

As the purpose of this project is to run the model on mobile devices, this repository contains scripts to convert models for iOS and Android.

run_convert_coreml.py
- It converts a trained PyTorch model into a CoreML model for the iOS app.

TBD

Report speed vs. accuracy on mobile devices.
Convert PyTorch models for Android using TensorFlow Lite.
