A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

Overview

Telemanom (v2.0)

v2.0 updates:

  • Vectorized operations via numpy
  • Object-oriented restructure, improved organization
  • Merge branches into single branch for both processing modes (with/without labels)
  • Update requirements.txt and Dockerfile
  • Updated result output for both modes
  • PEP8 cleanup

Anomaly Detection in Time Series Data Using LSTMs and Automatic Thresholding


Telemanom employs vanilla LSTMs using Keras/TensorFlow to identify anomalies in multivariate sensor data. LSTMs are trained to learn normal system behaviors using encoded command information and prior telemetry values. Predictions are generated at each time step, and the prediction errors represent deviations from expected behavior. Telemanom then uses a novel nonparametric, unsupervised approach for thresholding these errors and identifying anomalous sequences of errors.
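The error-thresholding idea can be illustrated with a deliberately simplified sketch. It is not the method from the paper: it smooths absolute prediction errors with an exponentially weighted moving average and applies one fixed threshold of mean plus z standard deviations, whereas the paper selects thresholds dynamically. The function names, `alpha`, `z`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def ewma(values, alpha=0.05):
    """Exponentially weighted moving average (alpha is an assumed smoothing factor)."""
    smoothed = np.empty(len(values), dtype=float)
    smoothed[0] = values[0]
    for i in range(1, len(values)):
        smoothed[i] = alpha * values[i] + (1 - alpha) * smoothed[i - 1]
    return smoothed


def flag_anomalies(y_true, y_pred, z=3.0, alpha=0.05):
    """Flag timesteps whose smoothed error exceeds mean + z * std.

    A fixed z is a simplification; it stands in for the dynamic
    threshold selection described in the paper.
    """
    errors = np.abs(y_true - y_pred)
    smoothed = ewma(errors, alpha)
    threshold = smoothed.mean() + z * smoothed.std()
    return smoothed > threshold, threshold


# Synthetic stream: predictions track the signal except for one injected fault.
rng = np.random.default_rng(0)
y_true = np.sin(np.linspace(0, 20, 500))
y_pred = y_true + rng.normal(0, 0.01, 500)
y_pred[300:320] += 1.0  # simulated anomalous region

flags, threshold = flag_anomalies(y_true, y_pred)
print("flagged timesteps:", np.flatnonzero(flags))
```

Because smoothing spreads each error over later timesteps, the flagged region in this sketch lags the injected fault slightly.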

This repo along with the linked data can be used to re-create the experiments in our 2018 KDD paper, "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding", which describes the background, methodologies, and experiments in more detail. While the system was originally deployed to monitor spacecraft telemetry, it can be easily adapted to similar problems.

Getting Started

Clone the repo (only available from source currently):

git clone https://github.com/khundman/telemanom.git && cd telemanom

Configure system and modeling parameters in the config.yaml file (to recreate the experiment from the paper, leave it as is). For example:

  • train: True. If True, a new model is trained for each input stream. If False (default), an existing trained model is loaded and used to generate predictions.
  • predict: True. If True, new predictions are generated using the models. If False (default), existing saved predictions are used in evaluation (useful for tuning error thresholding while skipping prior processing steps).
  • l_s: 250. The number of previous timesteps input to the model at each timestep t (used to generate predictions).
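To make the role of l_s concrete, here is a minimal sketch of how a (n_timesteps, n_inputs) array could be sliced into windows of l_s timesteps, each paired with the telemetry value that follows it. The helper `make_windows` and its `n_predictions` argument are hypothetical names used for illustration and are not taken from the repo.

```python
import numpy as np


def make_windows(data, l_s, n_predictions=1):
    """Slice a (n_timesteps, n_inputs) array into inputs and targets.

    Each input is a window of l_s consecutive timesteps; each target is the
    next n_predictions values of the first feature (the telemetry value).
    """
    n_windows = data.shape[0] - l_s - n_predictions + 1
    X = np.stack([data[i:i + l_s] for i in range(n_windows)])
    y = np.stack([data[i + l_s:i + l_s + n_predictions, 0]
                  for i in range(n_windows)])
    return X, y


data = np.random.rand(1000, 25)  # illustrative stream with 25 input features
X, y = make_windows(data, l_s=250)
print(X.shape, y.shape)  # (750, 250, 25) (750, 1)
```

A larger l_s gives the model more history per prediction but leaves fewer windows from a stream of the same length.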

To run via Docker:

docker build -t telemanom .

# rerun experiment detailed in paper or run with your own set of labeled anomalies in 'labeled_anomalies.csv'
docker run telemanom -l labeled_anomalies.csv

# run without labeled anomalies
docker run telemanom

To run with a local or virtual environment:

From root of repo, curl and unzip data:

curl -O https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

Install dependencies using Python 3.6+ (a virtualenv is recommended):

pip install -r requirements.txt

Begin processing (from root of repo):

# rerun experiment detailed in paper or run with your own set of labeled anomalies
python example.py -l labeled_anomalies.csv

# run without labeled anomalies
python example.py

A Jupyter notebook for evaluating the results of a run is at telemanom/result-viewer.ipynb. To launch the notebook:

jupyter notebook telemanom/result-viewer.ipynb

Plotly is used to generate interactive inline plots.

Data

Using your own data

Pre-split training and test sets must be placed in directories named data/train/ and data/test/. One .npy file should be generated for each channel or stream (for both train and test) with shape (n_timesteps, n_inputs). The filename should be a unique channel name or ID. The telemetry values being predicted in the test data must be the first feature in the input.

For example, a channel T-1 should have train and test sets that are both named T-1.npy, with shapes akin to (4900, 61) and (3925, 61); the number of input dimensions (61) must match. The actual telemetry values should occupy the first feature column, giving shapes (4900, 1) and (3925, 1) when sliced out.
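The layout described above can be produced with plain NumPy. The following is a minimal sketch: the helper `save_channel` is a hypothetical name, the arrays are zero-filled placeholders, and a temporary directory stands in for the root of the repo.

```python
import os
import tempfile

import numpy as np


def save_channel(root, chan_id, train, test):
    """Write train/test arrays to <root>/data/train/ and <root>/data/test/."""
    if train.shape[1] != test.shape[1]:
        raise ValueError("train and test must have the same number of inputs")
    for split, arr in (("train", train), ("test", test)):
        split_dir = os.path.join(root, "data", split)
        os.makedirs(split_dir, exist_ok=True)
        np.save(os.path.join(split_dir, chan_id + ".npy"), arr)


root = tempfile.mkdtemp()         # stand-in for the repo root
train = np.zeros((4900, 61))      # column 0 holds the telemetry value
test = np.zeros((3925, 61))
save_channel(root, "T-1", train, test)

loaded = np.load(os.path.join(root, "data", "test", "T-1.npy"))
telemetry = loaded[:, 0]          # the values being predicted
print(loaded.shape, telemetry.shape)  # (3925, 61) (3925,)
```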

Raw experiment data

The raw data available for download represents real spacecraft telemetry data and anomalies from the Soil Moisture Active Passive satellite (SMAP) and the Curiosity Rover on Mars (MSL). All data has been anonymized with regard to time, and all telemetry values are pre-scaled to the range (-1, 1) according to the min/max in the test set. Channel IDs are also anonymized, but the first letter indicates the type of channel (P = power, R = radiation, etc.). Model input data also includes one-hot encoded information about commands that were sent or received by specific spacecraft modules in a given time window. No identifying information related to the timing or nature of commands is included in the data.

This data also includes pre-split test and training data, pre-trained models, predictions, and smoothed errors generated using the default settings in config.yaml. When getting familiar with the repo, running the result-viewer.ipynb notebook to visualize results is useful for developing intuition. The included data is also useful for isolating portions of the system. For example, if you wish to see the effects of changes to the thresholding parameters without having to train new models, you can set train and predict to False in config.yaml to use previously generated predictions from prior models.

Anomaly labels and metadata

The anomaly labels and metadata are available in labeled_anomalies.csv, which includes:

  • channel_id: anonymized channel ID; the first letter represents the nature of the channel (P = power, R = radiation, etc.)
  • spacecraft: spacecraft that generated the telemetry stream
  • anomaly_sequences: start and end indices of true anomalies in the stream
  • class: the class of anomaly (see paper for discussion)
  • num_values: number of telemetry values in each stream

To provide your own labels, use the labeled_anomalies.csv file as a template. The only required fields/columns are channel_id and anomaly_sequences. anomaly_sequences is a list of lists that contain start and end indices of anomalous regions in the test dataset for a channel.
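As an illustration of the label format, the sketch below parses a small in-memory CSV that has only the two required columns and expands each sequence into a per-timestep mask. The sample rows, the helper names, and the choice to treat end indices as inclusive are assumptions made for this example.

```python
import ast
import csv
import io

# Illustrative rows only; real labels live in labeled_anomalies.csv.
SAMPLE_CSV = '''channel_id,anomaly_sequences
T-1,"[[2399, 3898], [6550, 6585]]"
P-1,"[[2149, 2349]]"
'''


def load_labels(handle):
    """Map each channel_id to its list of [start, end] index pairs."""
    labels = {}
    for row in csv.DictReader(handle):
        # literal_eval safely parses the list-of-lists string.
        labels[row["channel_id"]] = ast.literal_eval(row["anomaly_sequences"])
    return labels


def to_point_labels(sequences, n_values):
    """Expand [start, end] pairs into a boolean mask (end treated as inclusive)."""
    mask = [False] * n_values
    for start, end in sequences:
        for i in range(start, min(end, n_values - 1) + 1):
            mask[i] = True
    return mask


labels = load_labels(io.StringIO(SAMPLE_CSV))
print(labels["T-1"])                          # [[2399, 3898], [6550, 6585]]
print(sum(to_point_labels([[2, 4]], 10)))     # 3
```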

Dataset and performance statistics:

Data

| | SMAP | MSL | Total |
|---|---|---|---|
| Total anomaly sequences | 69 | 36 | 105 |
| Point anomalies (% tot.) | 43 (62%) | 19 (53%) | 62 (59%) |
| Contextual anomalies (% tot.) | 26 (38%) | 17 (47%) | 43 (41%) |
| Unique telemetry channels | 55 | 27 | 82 |
| Unique ISAs | 28 | 19 | 47 |
| Telemetry values evaluated | 429,735 | 66,709 | 496,444 |

Performance (with default params specified in paper)

| Spacecraft | Precision | Recall | F_0.5 Score |
|---|---|---|---|
| SMAP | 85.5% | 85.5% | 0.71 |
| Curiosity (MSL) | 92.6% | 69.4% | 0.69 |
| Total | 87.5% | 80.0% | 0.71 |

Processing

Each time the system is started, a unique datetime ID (e.g., 2018-05-17_16.28.00) is used to create the following:

  • a results file (in results/) that extends labeled_anomalies.csv to include identified anomalous sequences and related info
  • a data subdirectory containing data files for created models, predictions, and smoothed errors for each channel. A file called params.log is also created that contains parameter settings and logging output during processing.
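A run ID in the format shown above can be generated with the standard library. This is a minimal sketch: the function name `make_run_id` is hypothetical, and the repo may build its ID differently.

```python
from datetime import datetime


def make_run_id(now=None):
    """Return a datetime ID such as 2018-05-17_16.28.00."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H.%M.%S")


print(make_run_id(datetime(2018, 5, 17, 16, 28, 0)))  # 2018-05-17_16.28.00
print(make_run_id())  # ID for the current time
```

Using dots rather than colons in the time portion keeps the ID safe to use in file and directory names on all common platforms.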

As mentioned, the jupyter notebook telemanom/result-viewer.ipynb can be used to visualize results for each stream.

Citation

If you use this work, please cite:

@article{hundman2018detecting,
  title={Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding},
  author={Hundman, Kyle and Constantinou, Valentino and Laporte, Christopher and Colwell, Ian and Soderstrom, Tom},
  journal={arXiv preprint arXiv:1802.04431},
  year={2018}
}

License

Telemanom is distributed under Apache 2.0 license.

Contact: Kyle Hundman ([email protected])
