Human Pose Detection on EdgeTPU

Overview

Coral PoseNet

Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for example, where someone’s elbow, shoulder or foot show up in an image. PoseNet does not recognize who is in an image, it is simply estimating where key body joints are.

This repo contains a set of PoseNet models that are quantized and optimized for use on Coral's Edge TPU, together with some example code to shows how to run it on a camera stream.

Why PoseNet ?

Pose estimation has many uses, from interactive installations that react to the body to augmented reality, animation, fitness uses, and more. We hope the accessibility of this model inspires more developers and makers to experiment and apply pose detection to their own unique projects, to demonstrate how machine learning can be deployed in ways that are anonymous and private.

How does it work ?

At a high level pose estimation happens in two phases:

  1. An input RGB image is fed through a convolutional neural network. In our case this is a MobileNet V1 architecture. Instead of a classification head however, there is a specialized head which produces a set of heatmaps (one for each kind of key point) and some offset maps. This step runs on the EdgeTPU. The results are then fed into step 2)

  2. A special multi-pose decoding algorithm is used to decode poses, pose confidence scores, keypoint positions, and keypoint confidence scores. Note that unlike in the TensorflowJS version we have created a custom OP in Tensorflow Lite and appended it to the network graph itself. This CustomOP does the decoding (on the CPU) as a post processing step. The advantage is that we don't have to deal with the heatmaps directly and when we then call this network through the Coral Python API we simply get a series of keypoints from the network.

If you're interested in the gory details of the decoding algorithm and how PoseNet works under the hood, I recommend you take a look at the original research paper or this medium post whihch describes the raw heatmaps produced by the convolutional model.

Important concepts

Pose: at the highest level, PoseNet will return a pose object that contains a list of keypoints and an instance-level confidence score for each detected person.

Keypoint: a part of a person’s pose that is estimated, such as the nose, right ear, left knee, right foot, etc. It contains both a position and a keypoint confidence score. PoseNet currently detects 17 keypoints illustrated in the following diagram:

pose keypoints

Keypoint Confidence Score: this determines the confidence that an estimated keypoint position is accurate. It ranges between 0.0 and 1.0. It can be used to hide keypoints that are not deemed strong enough.

Keypoint Position: 2D x and y coordinates in the original input image where a keypoint has been detected.

Examples in this repo

NOTE: PoseNet relies on the latest Pycoral API, tflite_runtime API, and libedgetpu1-std or libedgetpu1-max:

Please also update your system before running these examples. For more information on updating see:

To install all other requirements for third party libraries, simply run

sh install_requirements.sh

simple_pose.py

A minimal example that simply downloads an image, and prints the pose keypoints.

python3 simple_pose.py

pose_camera.py

A camera example that streams the camera image through posenet and draws the pose on top as an overlay. This is a great first example to run to familiarize yourself with the network and its outputs.

Run a simple demo like this:

python3 pose_camera.py

If the camera and monitor are both facing you, consider adding the --mirror flag:

python3 pose_camera.py --mirror

In this repo we have included 3 posenet model files for differnet input resolutions. The larger resolutions are slower of course, but allow a wider field of view, or further-away poses to be processed correctly.

posenet_mobilenet_v1_075_721_1281_quant_decoder_edgetpu.tflite
posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite
posenet_mobilenet_v1_075_353_481_quant_decoder_edgetpu.tflite

You can change the camera resolution by using the --res parameter:

python3 pose_camera.py --res 480x360  # fast but low res
python3 pose_camera.py --res 640x480  # default
python3 pose_camera.py --res 1280x720 # slower but high res

anonymizer.py

A fun little app that demonstrates how Coral and PoseNet can be used to analyze human behavior in an anonymous and privacy-preserving way.

Posenet converts an image of a human into a mere skeleton which captures its position and movement over time, but discards any precisely identifying features and the original camera image. Because Coral devices run all the image analysis locally, the actual image is never streamed anywhere and is immediately discarded. The poses can be safely stored or analysed.

For example a store owner may want to study the bahavior of customers as they move through the store, in order to optimize flow and improve product placement. A museum may want to track which areas are most busy, at which times such as to give guidance which exhibits may currently have the shortest waiting times.

With Coral this is possible without recording anybody's image directly or streaming data to a cloud service - instead the images are immediately discarded.

The anaonymizer is a small app that demonstrates this is a fun way. To use the anonymizer set up your camera in a sturdy position. Lauch the app and walk out of the image. This demo waits until no one is in the frame, then stores the 'background' image. Now, step back in. You'll see your current pose overlayed over a static image of the background.

python3 anonymizer.py

(If the camera and monitor are both facing you, consider adding the --mirror flag.)

video of three people interacting with the anonymizer demo

synthesizer.py

This demo allows people to control musical synthesizers with their arms. Up to 3 people are each assigned a different instrument and octave, and control the pitch with their right wrists and the volume with their left wrists.

You'll need to install FluidSynth and a General Midi SoundFont:

apt install fluidsynth fluid-soundfont-gm
pip3 install pyfluidsynth

Now you can run the demo like this:

python3 synthesizer.py

The PoseEngine class

The PoseEngine class (defined in pose_engine.py) allows easy access to the PoseNet network from Python, using the EdgeTPU API.

You simply initialize the class with the location of the model .tflite file and then call DetectPosesInImage, passing a numpy object that contains the image. The numpy object should be in int8, [Y,X,RGB] format.

A minimal example might be:

from tflite_runtime.interpreter import Interpreter
import os
import numpy as np
from PIL import Image
from PIL import ImageDraw
from pose_engine import PoseEngine


os.system('wget https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/'
          'Hindu_marriage_ceremony_offering.jpg/'
          '640px-Hindu_marriage_ceremony_offering.jpg -O /tmp/couple.jpg')
pil_image = Image.open('/tmp/couple.jpg').convert('RGB')
engine = PoseEngine(
    'models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
poses, _ = engine.DetectPosesInImage(pil_image)

for pose in poses:
    if pose.score < 0.4: continue
    print('\nPose Score: ', pose.score)
    for label, keypoint in pose.keypoints.items():
        print('  %-20s x=%-4d y=%-4d score=%.1f' %
              (label, keypoint.point[0], keypoint.point[1], keypoint.score))

To try this, run

python3 simple_pose.py

And you should see an output like this:

Inference time: 14 ms

Pose Score:  0.60698134
  NOSE                 x=211  y=152  score=1.0
  LEFT_EYE             x=224  y=138  score=1.0
  RIGHT_EYE            x=199  y=136  score=1.0
  LEFT_EAR             x=245  y=135  score=1.0
  RIGHT_EAR            x=183  y=129  score=0.8
  LEFT_SHOULDER        x=269  y=169  score=0.7
  RIGHT_SHOULDER       x=160  y=173  score=1.0
  LEFT_ELBOW           x=281  y=255  score=0.6
  RIGHT_ELBOW          x=153  y=253  score=1.0
  LEFT_WRIST           x=237  y=333  score=0.6
  RIGHT_WRIST          x=163  y=305  score=0.5
  LEFT_HIP             x=256  y=318  score=0.2
  RIGHT_HIP            x=171  y=311  score=0.2
  LEFT_KNEE            x=221  y=342  score=0.3
  RIGHT_KNEE           x=209  y=340  score=0.3
  LEFT_ANKLE           x=188  y=408  score=0.2
  RIGHT_ANKLE          x=189  y=410  score=0.2

Owner
google-coral
Open source projects for coral.ai
google-coral
[ICML 2021] "Graph Contrastive Learning Automated" by Yuning You, Tianlong Chen, Yang Shen, Zhangyang Wang

Graph Contrastive Learning Automated PyTorch implementation for Graph Contrastive Learning Automated [talk] [poster] [appendix] Yuning You, Tianlong C

Shen Lab at Texas A&M University 80 Nov 23, 2022
Official Repsoitory for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]

Mish: Self Regularized Non-Monotonic Activation Function BMVC 2020 (Official Paper) Notes: (Click to expand) A considerably faster version based on CU

Xa9aX ツ 1.2k Dec 29, 2022
Official Python implementation of the 'Sparse deconvolution'-v0.3.0

Sparse deconvolution Python v0.3.0 Official Python implementation of the 'Sparse deconvolution', and the CPU (NumPy) and GPU (CuPy) calculation backen

Weisong Zhao 23 Dec 28, 2022
CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction. ICCV 2021

crfill Usage | Web App | | Paper | Supplementary Material | More results | code for paper ``CR-Fill: Generative Image Inpainting with Auxiliary Contex

182 Dec 20, 2022
Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation.

Understanding Minimum Bayes Risk Decoding This repo provides code and documentation for the following paper: Müller and Sennrich (2021): Understanding

ZurichNLP 13 May 01, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Hieu Duong 7 Jan 12, 2022
UNet model with VGG11 encoder pre-trained on Kaggle Carvana dataset

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation By Vladimir Iglovikov and Alexey Shvets Introduction TernausNet is

Vladimir Iglovikov 1k Dec 28, 2022
Fake News Detection Using Machine Learning Methods

Fake-News-Detection-Using-Machine-Learning-Methods Fake news is always a real and dangerous issue. However, with the presence and abundance of various

Achraf Safsafi 1 Jan 11, 2022
The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

CrossFormer This repository is the code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. Introduction Existin

cheerss 238 Jan 06, 2023
Neural style transfer as a class in PyTorch

pt-styletransfer Neural style transfer as a class in PyTorch Based on: https://github.com/alexis-jacq/Pytorch-Tutorials Adds: StyleTransferNet as a cl

Tyler Kvochick 31 Jun 27, 2022
Official Pytorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Scene Representation Networks This is the official implementation of the NeurIPS submission "Scene Representation Networks: Continuous 3D-Structure-Aw

Vincent Sitzmann 365 Jan 06, 2023
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022
Our VMAgent is a platform for exploiting Reinforcement Learning (RL) on Virtual Machine (VM) scheduling tasks.

VMAgent is a platform for exploiting Reinforcement Learning (RL) on Virtual Machine (VM) scheduling tasks. VMAgent is constructed based on one month r

56 Dec 12, 2022
A Small and Easy approach to the BraTS2020 dataset (2D Segmentation)

BraTS2020 A Light & Scalable Solution to BraTS2020 | Medical Brain Tumor Segmentation (2D Segmentation) Developed the segmentation models for segregat

Gunjan Haldar 0 Jan 19, 2022
Wileless-PDGNet Implementation

Wileless-PDGNet Implementation This repo is related to the following paper: Boning Li, Ananthram Swami, and Santiago Segarra, "Power allocation for wi

6 Oct 04, 2022
Haze Removal can remove slight to extreme cases of haze affecting an image

Haze Removal can remove slight to extreme cases of haze affecting an image. Its most typical use is for landscape photography where the haze causes low contrast and low saturation, but it can also be

Grace Ugochi Nneji 3 Feb 15, 2022
atmaCup #11 の Public 4th / Pricvate 5th Solution のリポジトリです。

#11 atmaCup 2021-07-09 ~ 2020-07-21 に行われた #11 [初心者歓迎! / 画像編] atmaCup のリポジトリです。結果は Public 4th / Private 5th でした。 フレームワークは PyTorch で、実装は pytorch-image-m

Tawara 12 Apr 07, 2022
Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introdu

OATML 360 Dec 28, 2022
BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches

BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applic

SAFARI Research Group at ETH Zurich and Carnegie Mellon University 19 Dec 26, 2022
D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos This repository contains the implementation for "D²Conv3D: Dynamic Dilated Co

17 Oct 20, 2022