Anomaly Detection

Anomaly detection for time-series data

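1. Anomaly detection using Kernel Density Estimation

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits a kernel density estimation model on the train data to estimate the distribution of normal data.
Based on the estimated distribution, computes an anomaly score for each time point in the test data and saves the scores as a CSV file and a plot under save_root_path.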

python kde.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/kde'
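
The scoring step described above can be sketched as follows. This is a minimal illustration, not the contents of kde.py: it assumes scikit-learn's KernelDensity, a Gaussian kernel, and a hypothetical bandwidth of 0.5, and it uses synthetic arrays in place of the NASA bearing CSV files.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_anomaly_scores(train, test, bandwidth=0.5):
    """Fit a Gaussian KDE on train; return one score per test row (higher = more anomalous)."""
    # bandwidth=0.5 is an illustrative assumption, not a value taken from kde.py
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(train)
    # score_samples returns the log-density, so negate it: low density -> high score
    return -kde.score_samples(test)


# Synthetic stand-in for the train/test CSV contents (two sensor columns)
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
test = np.array([[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4], [8.0, 8.0]])

scores = kde_anomaly_scores(train, test)
print(scores)
```

The last test row lies far from the training cloud, so it receives the largest score. In practice the bandwidth would need tuning to the scale of the features.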

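2. Anomaly detection using Local Outlier Factor

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits a Local Outlier Factor model on the train data to estimate the density of normal data from each point's n_neighbors nearest neighbors.
Based on the estimated density, computes an anomaly score for each time point in the test data and saves the scores as a CSV file and a plot under save_root_path.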

python lof.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/lof' \
              --n_neighbors=5
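
A minimal sketch of this procedure is given below. It assumes scikit-learn's LocalOutlierFactor in novelty mode, which is what allows scoring data that was not part of the fit. Whether lof.py is implemented this way is not stated in this README, and the arrays are synthetic stand-ins for the CSV inputs.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def lof_anomaly_scores(train, test, n_neighbors=5):
    """Fit LOF on train; return one score per test row (higher = more anomalous)."""
    # novelty=True is required to call score_samples on unseen data
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True)
    lof.fit(train)
    # score_samples returns the negated LOF value, so flip the sign back
    return -lof.score_samples(test)


# Synthetic stand-in for the train/test CSV contents
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
test = np.array([[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4], [8.0, 8.0]])

scores = lof_anomaly_scores(train, test, n_neighbors=5)
print(scores)
```

Scores near 1 indicate a local density similar to that of the neighbors, while values well above 1 indicate a point in a much sparser region than its neighbors.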

3. Anomaly detection using Isolation Forest

Takes as input the train data and test data, CSV files that include timestamp information, located at train_data_path and test_data_path.
Fits an isolation forest model on the train data.
Using the train data as the reference set, computes an anomaly score for each time point in the test data and saves the scores as a CSV file and a plot under save_root_path.

python iforest.py --train_data_path='./data/nasa_bearing_train.csv' \
                  --test_data_path='./data/nasa_bearing_test.csv' \
                  --save_root_path='./result/iforest'
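
The fit-then-score flow can be sketched as below, assuming scikit-learn's IsolationForest. The number of trees and the random seed are illustrative choices rather than values taken from iforest.py, and the arrays are synthetic stand-ins for the CSV inputs.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def iforest_anomaly_scores(train, test, n_estimators=100, seed=0):
    """Fit an isolation forest on train; return one score per test row (higher = more anomalous)."""
    # n_estimators and seed are illustrative assumptions
    forest = IsolationForest(n_estimators=n_estimators, random_state=seed)
    forest.fit(train)
    # score_samples is lower for abnormal points, so negate it
    return -forest.score_samples(test)


# Synthetic stand-in for the train/test CSV contents
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
test = np.array([[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4], [8.0, 8.0]])

scores = iforest_anomaly_scores(train, test)
print(scores)
```

Points that random axis-aligned splits isolate in few steps, such as the last test row here, receive higher scores than points deep inside the training distribution.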

4. Anomaly detection using Spectral Residual

Detects anomalies within each window interval, using the configured window size and score window size.
The score window size must be set larger than the window size.

python spectral.py --window=24 \
                   --score_window=100
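
For orientation, the core of the spectral residual technique can be sketched with NumPy alone. This is a simplified illustration and not the code in spectral.py: the parameter names spectrum_window and score_window are hypothetical, and how the --window flag maps onto them is not stated in this README.

```python
import numpy as np


def moving_average(x, n):
    """Centered mean filter with edge padding; output has the same length as x."""
    pad = n // 2
    padded = np.pad(x, (pad, n - 1 - pad), mode="edge")
    return np.convolve(padded, np.ones(n) / n, mode="valid")


def trailing_average(x, n):
    """Mean of the current value and up to n - 1 preceding values at each position."""
    csum = np.cumsum(np.insert(x, 0, 0.0))
    idx = np.arange(1, len(x) + 1)
    start = np.maximum(idx - n, 0)
    return (csum[idx] - csum[start]) / (idx - start)


def spectral_residual_scores(series, spectrum_window=3, score_window=100, eps=1e-8):
    """Return one score per time point (higher = more anomalous)."""
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.fft(series)
    amplitude = np.abs(spectrum)
    log_amplitude = np.log(amplitude + eps)
    # Spectral residual: log-amplitude minus its local average
    residual = log_amplitude - moving_average(log_amplitude, spectrum_window)
    # Rebuild the signal with the residual amplitude and the original phase
    saliency = np.abs(np.fft.ifft(np.exp(residual) * spectrum / (amplitude + eps)))
    # Compare each saliency value with the trailing average of the saliency map
    baseline = trailing_average(saliency, score_window)
    return (saliency - baseline) / (baseline + eps)


# Synthetic series: a sine wave with one injected spike at index 150
t = np.arange(300)
series = np.sin(2 * np.pi * t / 25)
series[150] += 5.0

scores = spectral_residual_scores(series, spectrum_window=3, score_window=100)
print(int(np.argmax(scores)))
```

A threshold on the returned scores would then mark individual time points as anomalous. The choice of threshold depends on the data and is left out of this sketch.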

Anomaly Detection: a preprocessing module for anomaly detection

Owner

CLUST-consortium
