Questions tagged [anomaly-detection]

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behaviour. This is also known as outlier detection.

364 questions
74
votes
7 answers

Open source Anomaly Detection in Python

Problem Background: I am working on a project that involves log files similar to those found in the IT monitoring space (to the best of my understanding of the IT space). These log files are time-series data, organized into hundreds/thousands of rows of…
ximiki
  • 943
  • 1
  • 7
  • 15
41
votes
5 answers

Is it necessary to standardize your data before clustering?

Is it necessary to standardize your data before clustering? In the scikit-learn example about DBSCAN, they do this in the line X = StandardScaler().fit_transform(X), but I do not understand why it is necessary. After all, clustering does…
makansij
  • 869
  • 2
  • 12
  • 17
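As a hedged illustration of why the scaling line quoted in the excerpt above can matter for distance-based clustering: the sketch below uses only the standard library and made-up data (an "income" and an "age" column, both hypothetical). It reproduces the z-score that StandardScaler applies (subtract the column mean, divide by the population standard deviation) and measures how much of the squared Euclidean distance between two points comes from the large-scale column before and after scaling. It is not the scikit-learn example itself.

```python
import math

def standardize(rows):
    """Z-score each column: (value - mean) / population std.

    Assumes every column has non-zero variance (no guard for constant columns).
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]

def first_feature_share(p, q):
    """Fraction of the squared Euclidean distance due to the first feature."""
    squared = [(a - b) ** 2 for a, b in zip(p, q)]
    return squared[0] / sum(squared)

# Hypothetical data: (income, age). Income is on a much larger scale.
points = [(30000.0, 25.0), (30200.0, 70.0), (31000.0, 26.0)]
scaled = standardize(points)

raw_share = first_feature_share(points[0], points[1])
scaled_share = first_feature_share(scaled[0], scaled[1])

print(f"income share of squared distance, raw:    {raw_share:.3f}")
print(f"income share of squared distance, scaled: {scaled_share:.3f}")
```

On the raw values the income column accounts for nearly all of the distance between the first two points, even though their ages differ by 45 years; after scaling, the age difference dominates. Whether scaling is appropriate still depends on whether the features should count equally.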
24
votes
4 answers

Looking for a good package for anomaly detection in time series

Is there a comprehensive open source package (preferably in Python or R) that can be used for anomaly detection in time series? There is a one-class SVM package in scikit-learn, but it is not for time series data. I’m looking for more…
14
votes
4 answers

Detecting anomalies with neural network

I have a large multidimensional dataset that is generated each day. What would be a good approach to detect any kind of 'anomaly' as compared with previous days? Is this a suitable problem that could be addressed with neural networks? Any…
Nickpick
  • 661
  • 2
  • 7
  • 18
13
votes
4 answers

What is the difference between outlier detection and anomaly detection?

I would like to know the difference in terms of applications (e.g. which one is credit card fraud detection?) and in terms of used techniques. Example papers which define the task would be welcome.
Martin Thoma
  • 19,540
  • 36
  • 98
  • 170
11
votes
1 answer

Learning with Positive labels only

I have ~7 million rows of customer data (~500 sparse attributes). A million of them have opted in to a new service. How do I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the…
11
votes
2 answers

Tools for automatic anomaly detection on a SQL table?

I have a large SQL table that is essentially a log. The data is pretty complex and I'm trying to find some way to identify anomalies without me understanding all the data. I've found lots of tools for Anomaly Detection but most of them require a…
THE JOATMON
  • 211
  • 2
  • 4
10
votes
3 answers

Isolation forest sklearn contamination param

I am working on an unsupervised anomaly detection task on time series data using an isolation forest algorithm. I am developing it in Python, specifically with scikit-learn. I have found a lot of examples on this, but what is not very clear is how to…
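For the contamination parameter named in the title: scikit-learn documents it as the expected proportion of outliers, used when fitting to define the threshold on the sample scores. The sketch below is a simplified, standard-library illustration of that idea only. The scores are made up, the helper name `flag_by_contamination` is hypothetical, and the cut-off is a plain rank-based threshold rather than the library's exact percentile computation. It follows the convention that a lower score means more anomalous.

```python
def flag_by_contamination(scores, contamination):
    """Flag the lowest-scoring fraction of samples as anomalies.

    Simplified illustration: the threshold is the k-th smallest score,
    where k = round(contamination * n). Ties at the threshold are all flagged.
    """
    n_flag = max(1, round(contamination * len(scores)))
    threshold = sorted(scores)[n_flag - 1]
    flags = [s <= threshold for s in scores]
    return flags, threshold

# Hypothetical anomaly scores for 10 samples (lower = more anomalous).
scores = [-0.42, -0.45, -0.71, -0.44, -0.40, -0.43, -0.68, -0.41, -0.46, -0.39]

flags, threshold = flag_by_contamination(scores, contamination=0.2)
flagged_indices = [i for i, is_anomaly in enumerate(flags) if is_anomaly]

print("threshold:", threshold)
print("flagged sample indices:", flagged_indices)
```

The point of the sketch is that the parameter does not change how samples are scored, only where the cut-off falls, so a larger value flags more points regardless of whether they are true anomalies.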
10
votes
1 answer

Difference: Replicator Neural Network vs. Autoencoder

I'm currently studying papers about outlier detection using RNNs (Replicator Neural Networks) and wonder what the particular difference from autoencoders is. RNNs seem to be treated by many as the holy grail of outlier/anomaly detection, however…
Nex
  • 285
  • 2
  • 6
9
votes
1 answer

Validation loss is lower than the training loss

I am using an autoencoder for anomaly detection in warranty data. Architecture 1: The plot shows the training vs. validation loss based on Architecture 1. As we see in the plot, the validation loss is lower than the training loss, which is totally weird.…
8
votes
1 answer

Using an autoencoder for anomaly detection on categorical data

Say a dataset has 0.5% of its features continuous and 99.5% categorical (binary) with ~2400 features in total. In this dataset, each observation is 1 of 2 classes - Fraud (1) or Not Fraud (0). Furthermore, there is a large class imbalance with only…
PyRsquared
  • 1,666
  • 1
  • 12
  • 18
8
votes
1 answer

How to compare different sets of time series data

I am trying to do some anomaly detection between time series using Python and sklearn (but other package suggestions are definitely welcome!). I have a set of 10 time series; each time series consists of data collected from torque value of a tire…
7
votes
4 answers

Anomaly detection on time series

I've just started working on an anomaly detection project in Python. My data sets are a collection of time series. More specifically, the data come from sensors/meters that record and collect data on boilers or other equipment. As I said…
7
votes
5 answers

What would be a good way to use clustering for outlier detection?

For simplicity let's assume the feature space is the XY plane.
7
votes
3 answers

Which outlier detection can detect these outliers?

I have a vector and want to detect outliers in it. The following figure shows the distribution of the vector. Red points are outliers. Blue points are normal points. Yellow points are also normal. I need an outlier detection method (a…