Anomaly Detection 이상치 탐지 전처리 모듈

Overview

Anomaly Detection

시계열 데이터에 대한 이상치 탐지


1. Kernel Density Estimation을 활용한 이상치 탐지

  • train_data_pathtest_data_path에 존재하는 시점 정보를 포함하고 있는 csv 형태의 train data와 test data를 input으로 사용함
  • Train data로 kernel density estimation 모델을 적합하여 정상 데이터의 분포를 추정함
  • 추정된 분포를 기반으로 test data의 각 시점에 대한 anomaly score를 도출하고 이를 csv 파일 및 그래프로 save_root_path에 저장함
python kde.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/kde'



2. Local Outlier Factor를 활용한 이상치 탐지

  • train_data_pathtest_data_path에 존재하는 시점 정보를 포함하고 있는 csv 형태의 train data와 test data를 input으로 사용함
  • Train data로 Local Outlier Factor 모델을 적합하여 n_neighbors 개수의 이웃을 기반으로 정상 데이터의 밀도를 추정함
  • 추정된 밀도를 기반으로 test data의 각 시점에 대한 anomaly score를 도출하고 이를 csv 파일 및 그래프로 save_root_path에 저장함
python lof.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/lof' \
              --n_neighbors=5



3. Isolation Forest를 활용한 이상치 탐지

  • train_data_pathtest_data_path에 존재하는 시점 정보를 포함하고 있는 csv 형태의 train data와 test data를 input으로 사용함
  • Train data로 isolation forest 모델을 적합함
  • Train data를 reference set으로 사용하여 test data의 각 시점에 대한 anomaly score를 도출하고 이를 csv 파일 및 그래프로 save_root_path에 저장함
python iforest.py --train_data_path='./data/nasa_bearing_train.csv' \
                  --test_data_path='./data/nasa_bearing_test.csv' \
                  --save_root_path='./result/iforest'



4. Spectral Residual을 활용한 이상치 탐지

  • 설정된 window size 와 score window size 를 통해 window 구간 내 이상치를 탐지함
  • score window size 는 window size 보다 크게 설정해야함
python spectral.py --window= 24 \
                  --score_window=100 
Owner
CLUST-consortium
CLUST Project
CLUST-consortium
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 09, 2022
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

18 Nov 28, 2022
NLP command-line assistant powered by OpenAI

NLP command-line assistant powered by OpenAI

Axel 16 Dec 09, 2022
Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob

Rui Wang 6k Jan 02, 2023
TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech

TFPNER TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech Named entity recognition (NER), which aims at identifyin

1 Feb 07, 2022
Yuqing Xie 2 Feb 17, 2022
Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

derwen.ai 1.9k Jan 06, 2023
A method for cleaning and classifying text using transformers.

NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data

Ray Chamidullin 0 Nov 15, 2022
Collection of scripts to pinpoint obfuscated code

Obfuscation Detection (v1.0) Author: Tim Blazytko Automatically detect control-flow flattening and other state machines Description: Scripts and binar

Tim Blazytko 230 Nov 26, 2022
Torchrecipes provides a set of reproduci-able, re-usable, ready-to-run RECIPES for training different types of models, across multiple domains, on PyTorch Lightning.

Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research techniques without significant engineering overhead.Specifica

Meta Research 193 Dec 28, 2022
用Resnet101+GPT搭建一个玩王者荣耀的AI

基于pytorch框架用resnet101加GPT搭建AI玩王者荣耀 本源码模型主要用了SamLynnEvans Transformer 的源码的解码部分。以及pytorch自带的预训练模型"resnet101-5d3b4d8f.pth"

冯泉荔 2.2k Jan 03, 2023
Translates basic English sentences into the Huna language (hoo-NAH)

huna-translator The Huna Language Translates basic English sentences into the Huna language (hoo-NAH). The Huna constructed language was developed in

Miles Smith 0 Jan 20, 2022
Code Implementation of "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".

Span-ASTE: Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction ***** New March 31th, 2022: Scikit-Style API for Easy Usage *****

Chia Yew Ken 111 Dec 23, 2022
Sapiens is a human antibody language model based on BERT.

Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '

Merck Sharp & Dohme Corp. a subsidiary of Merck & Co., Inc. 13 Nov 20, 2022
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

efficient-task-transfer This repository contains code for the experiments in our paper "What to Pre-Train on? Efficient Intermediate Task Selection".

AdapterHub 26 Dec 24, 2022
BERT, LDA, and TFIDF based keyword extraction in Python

BERT, LDA, and TFIDF based keyword extraction in Python kwx is a toolkit for multilingual keyword extraction based on Google's BERT and Latent Dirichl

Andrew Tavis McAllister 41 Dec 27, 2022
a chinese segment base on crf

Genius Genius是一个开源的python中文分词组件,采用 CRF(Conditional Random Field)条件随机场算法。 Feature 支持python2.x、python3.x以及pypy2.x。 支持简单的pinyin分词 支持用户自定义break 支持用户自定义合并词

duanhongyi 237 Nov 04, 2022
hashily is a Python module that provides a variety of text decoding and encoding operations.

hashily is a python module that performs a variety of text decoding and encoding functions. It also various functions for encrypting and decrypting text using various ciphers.

DevMysT 5 Jul 17, 2022