Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

Last update: Jun 05, 2022

👁️ Hindsight AI: Crime Classification With Clip

About

For educational purposes only. This is a recursive neural net trained either to classify specific crime classes from the UCF-Crime dataset or to perform general anomaly detection. The model operates on images that have been encoded into the CLIP image embedding space.

Introducing CLIP

The model used in this application, CLIP (developed by OpenAI), is a general-purpose image-text model: it encodes an image and raw text strings into a shared embedding space, so that text can be matched to the contents of the image. Its design and training give it strong zero-shot performance, i.e. on image classification problems whose classes were not explicitly part of the training set. The following image provides a summary of the model (taken from A. Radford et al.):

While typical image classification models train an image feature extractor and a linear classifier to predict a label, CLIP trains an image encoder and text encoder to predict the correct pairings of a batch of (image, text) training examples. At test time the learned text encoder synthesizes a zero-shot linear classifier by embedding the names or descriptions of the target dataset’s classes.
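The zero-shot step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 3-dimensional vectors stand in for real CLIP embeddings, the label prompts and the helper names (`cosine_scores`, `zero_shot_classify`) are hypothetical and not taken from this repo, and the fixed temperature approximates the scaling CLIP learns during training.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = normalize(image_emb)
    return [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]

def softmax(scores, temperature=0.01):
    """Turn similarities into probabilities; the temperature value is an assumption."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose text embedding is closest to the image embedding."""
    probs = softmax(cosine_scores(image_emb, text_embs))
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy vectors standing in for encoder outputs (not real CLIP embeddings).
labels = ["a photo of a robbery", "a photo of a normal street"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label, probs)
```

In a real pipeline the toy vectors would be replaced by the outputs of the CLIP image and text encoders; the classification logic stays the same because the text embeddings act as the weights of a linear classifier.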

Installation

Clone the repo; the required packages are listed in the required.txt file. Running classifier.py starts an interactive application that attempts anomaly detection or multi-class classification on the videos found in the 'Videos' directory.

The scripts used to create the image sequence database from the UCF-Crime video files, along with the training scripts and models, can be found in the src directory.

Owner

Miles Tweed
