Anomaly detection in data mining

Notes on anomaly detection using data mining techniques, with a survey of common approaches and their use in network settings.
Slides: Data Mining — Anomaly Detection. © Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004.

Anomaly/Outlier Detection

- What are anomalies/outliers? The set of data points that are considerably different from the remainder of the data.
- Variants of the anomaly/outlier detection problem:
  - Given a database D, find all the data points x ∈ D with anomaly scores greater than some threshold t.
  - Given a database D, find all the data points x ∈ D having the top-n largest anomaly scores f(x).
  - Given a database D containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D.
- Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection.

Importance of Anomaly Detection: Ozone Depletion History

- In 1985, three researchers (Farman, Gardiner, and Shanklin) were puzzled by data gathered by the British Antarctic Survey showing that ozone levels for Antarctica had dropped 10% below normal levels.
- Why did the Nimbus 7 satellite, which had instruments aboard for recording ozone levels, not record similarly low ozone concentrations?
- The ozone concentrations recorded by the satellite were so low that they were being treated as outliers by a computer program and discarded.
- Sources: http://exploringdata.cqu.edu.au/ozone.html and http://www.epa.gov/ozone/science/hole/size.html

Anomaly Detection: Challenges

- How many outliers are there in the data?
- The method is unsupervised, so validation can be quite challenging (just as for clustering).
- It is a matter of finding a needle in a haystack.
- Working assumption: there are considerably more "normal" observations than "abnormal" observations (outliers/anomalies) in the data.

Anomaly Detection Schemes

- General steps:
  - Build a profile of the "normal" behavior; the profile can be patterns or summary statistics for the overall population.
  - Use the "normal" profile to detect anomalies: observations whose characteristics differ significantly from the normal profile.
- Types of anomaly detection schemes:
  - Graphical and statistical-based
  - Distance-based
  - Model-based

Graphical Approaches

- Boxplot (1-D), scatter plot (2-D), spin plot (3-D).
- Limitations: time-consuming and subjective.

Convex Hull Method

- Extreme points are assumed to be outliers.
- Use the convex hull method to detect extreme values.
- But what if the outlier occurs in the middle of the data? (A sketch of this method follows below.)
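The convex hull idea is easy to try. Below is a minimal Python sketch, assuming scipy is available: the hull vertices of a synthetic 2-D sample are flagged as extreme points. The data and parameters are illustrative only, and the method's limitation is visible in the construction itself: an outlier sitting in the middle of the data can never be a hull vertex.

```python
# Minimal sketch of the convex hull method: points on the hull of a
# 2-D sample are flagged as extreme values (candidate outliers).
# Note the slide's caveat: an outlier in the *middle* of the data is
# never a hull vertex, so this method will miss it.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))   # synthetic 2-D data (assumption)

hull = ConvexHull(points)
extreme_idx = hull.vertices          # indices of the hull vertices
print("extreme points:", points[extreme_idx])
```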
Statistical Approaches

- Assume a parametric model describing the distribution of the data (e.g., a normal distribution).
- Apply a statistical test that depends on:
  - the data distribution,
  - the parameters of the distribution (e.g., mean, variance),
  - the number of expected outliers (confidence limit).

Grubbs' Test

- Detects outliers in univariate data.
- Assumes the data come from a normal distribution.
- Detects one outlier at a time: remove the outlier and repeat.
  - H0: there is no outlier in the data.
  - HA: there is at least one outlier.
- Grubbs' test statistic:

  $$ G = \frac{\max_i |X_i - \bar{X}|}{s} $$

- Reject H0 if:

  $$ G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{(\alpha/N,\,N-2)}}{N-2+t^2_{(\alpha/N,\,N-2)}}} $$

  (A Python sketch of this test appears at the end of this part.)

Statistical-based: Likelihood Approach

- Assume the data set D contains samples from a mixture of two probability distributions: M (the majority distribution) and A (the anomalous distribution), so the data distribution is D = (1 − λ)·M + λ·A.
- M is a probability distribution estimated from the data; it can be based on any modeling method (naive Bayes, maximum entropy, etc.). A is initially assumed to be a uniform distribution.
- General approach (sketched in code at the end of this part):
  - Initially, assume all the data points belong to M.
  - Let LL_t(D) be the log-likelihood of D at time t.
  - For each point x_t that belongs to M_t, move it to A_t, and let LL_{t+1}(D) be the new log-likelihood.
  - Compute the difference Δ = LL_t(D) − LL_{t+1}(D).
  - If Δ > c (some threshold), then x_t is declared an anomaly and moved permanently from M to A.
- Likelihood at time t:

  $$ L_t(D) = \prod_{i=1}^{N} P_D(x_i) = \left[ (1-\lambda)^{|M_t|} \prod_{x_i \in M_t} P_{M_t}(x_i) \right] \left[ \lambda^{|A_t|} \prod_{x_i \in A_t} P_{A_t}(x_i) \right] $$

  $$ LL_t(D) = |M_t| \log(1-\lambda) + \sum_{x_i \in M_t} \log P_{M_t}(x_i) + |A_t| \log \lambda + \sum_{x_i \in A_t} \log P_{A_t}(x_i) $$

Limitations of Statistical Approaches

- Most of the tests are for a single attribute.
- In many cases, the data distribution may not be known.
- For high-dimensional data, it may be difficult to estimate the true distribution.

Distance-based Approaches

- Data are represented as vectors of features.
- Three major approaches: nearest-neighbor based, density based, and clustering based.

Nearest-Neighbor Based Approach

- Compute the distance between every pair of data points.
- There are various ways to define outliers (the second variant is sketched in code at the end of this part):
  - data points for which there are fewer than p neighboring points within a distance D;
  - the top n data points whose distance to the k-th nearest neighbor is greatest;
  - the top n data points whose average distance to the k nearest neighbors is greatest.

Outliers in Lower Dimensional Projections

- Divide each attribute into φ equal-depth intervals, so that each interval contains a fraction f = 1/φ of the records.
- Consider a k-dimensional cube created by picking grid ranges from k different dimensions. If the attributes are independent, we expect the region to contain a fraction f^k of the records.
- If there are N points, the sparsity of a cube D can be measured by the sparsity coefficient, where n(D) is the number of points that fall in D:

  $$ S(D) = \frac{n(D) - N \cdot f^k}{\sqrt{N \cdot f^k\,(1 - f^k)}} $$

- Negative sparsity indicates that the cube contains a smaller number of points than expected.

Example

- N = 100, φ = 5, f = 1/5 = 0.2; for a two-dimensional cube (k = 2), the expected count is N·f² = 4. (A worked calculation follows at the end of this part.)
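First, a minimal Python sketch of one step of Grubbs' test, assuming scipy is available for the t quantile. The sample data are made up, and the slide's t_(α/N, N−2) convention is used (some references use α/(2N)).

```python
# One step of Grubbs' test: compare the most extreme point against the
# critical value derived from the t-distribution, as on the slide.
import numpy as np
from scipy import stats

def grubbs_outlier(x, alpha=0.05):
    """Return (index, is_outlier) for the most extreme point of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean, s = x.mean(), x.std(ddof=1)
    idx = np.argmax(np.abs(x - mean))       # most extreme point
    G = np.abs(x[idx] - mean) / s           # Grubbs' statistic
    t = stats.t.ppf(1 - alpha / n, n - 2)   # t quantile at (alpha/N, N-2)
    G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx, G > G_crit

data = [5.1, 4.9, 5.0, 5.2, 4.8, 9.7]       # made-up sample
print(grubbs_outlier(data))                 # flags the 9.7
```

In practice the test is rerun after removing each detected outlier, per the slide's "detect one at a time, remove, and repeat."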
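Next, a rough sketch of the likelihood approach, not the authors' implementation: M is modeled as a Gaussian re-estimated at each step, A as a uniform distribution over the data range, and λ and c are made-up values. The difference is computed here so that a positive value means moving the point to A improved the overall fit.

```python
# Sketch of the mixture/likelihood approach: tentatively move each point
# from the majority set M to the anomaly set A and keep the move when it
# improves the overall log-likelihood by more than a threshold c.
# Gaussian M, uniform A, lam, and c are illustrative assumptions.
import numpy as np
from scipy import stats

def log_likelihood(x, in_M, lam, p_A):
    """LL_t(D) for the current split of points into M and A."""
    m = x[in_M]
    mu, sd = m.mean(), m.std(ddof=1)        # re-estimate M each time
    ll_M = in_M.sum() * np.log(1 - lam) + stats.norm.logpdf(m, mu, sd).sum()
    ll_A = (~in_M).sum() * (np.log(lam) + np.log(p_A))
    return ll_M + ll_A

def likelihood_anomalies(x, lam=0.05, c=2.0):
    x = np.asarray(x, dtype=float)
    in_M = np.ones(len(x), dtype=bool)      # start with every point in M
    p_A = 1.0 / (x.max() - x.min())         # A: uniform over the data range
    for i in range(len(x)):
        trial = in_M.copy()
        trial[i] = False                    # move x_i from M to A
        delta = (log_likelihood(x, trial, lam, p_A)
                 - log_likelihood(x, in_M, lam, p_A))
        if delta > c:                       # the move improved the fit
            in_M = trial                    # make it permanent: anomaly
    return np.where(~in_M)[0]

data = np.r_[np.random.default_rng(1).normal(0, 1, 50), [8.0]]
print(likelihood_anomalies(data))           # expected to report index 50
```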
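The k-th-nearest-neighbor variant from the nearest-neighbor slide, sketched with plain numpy; k, n, and the planted outlier are illustrative choices.

```python
# Nearest-neighbor scheme: score each point by its distance to its k-th
# nearest neighbor and report the top-n scores as outliers.
import numpy as np

def knn_outlier_scores(X, k=5):
    # pairwise Euclidean distances between all points
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]       # column 0 is each point's distance to itself

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[6.0, 6.0]]])  # one planted outlier
scores = knn_outlier_scores(X, k=5)
top_n = np.argsort(scores)[::-1][:3]    # indices of the top-3 anomaly scores
print(top_n, scores[top_n])             # index 100 should rank first
```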
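And the example's numbers plugged into the sparsity coefficient; the observed count n(D) = 1 is made up for illustration.

```python
# Sparsity coefficient from the lower-dimensional-projection slide,
# evaluated for the example values (N=100, phi=5, k=2, so N*f^k = 4).
import math

def sparsity(n_D, N, f, k):
    expected = N * f**k
    return (n_D - expected) / math.sqrt(expected * (1 - f**k))

N, f, k = 100, 1/5, 2
print(sparsity(1, N, f, k))   # about -1.53: sparser than expected
```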
Density-based: LOF Approach

- For each point, compute the density of its local neighborhood.
- Compute the local outlier factor (LOF) of a sample p as the average of the ratios of the density of p's nearest neighbors to the density of p itself.
- Outliers are points with the largest LOF values.
- (Figure: two points p1 and p2 relative to clusters of different density.) In the NN approach, p2 is not considered an outlier, while the LOF approach finds both p1 and p2 as outliers. (A scikit-learn sketch follows below.)

Clustering-Based

- Basic idea (see the sketch below):
  - Cluster the data into groups of different density.
  - Choose points in small clusters as candidate outliers.
  - Compute the distance between candidate points and non-candidate clusters; if the candidate points are far from all other non-candidate points, they are outliers.

Base Rate Fallacy

- Bayes' theorem:

  $$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$

- More generally:

  $$ P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_j P(B \mid A_j)\,P(A_j)} $$

Base Rate Fallacy (Axelsson, 1999)

- Axelsson (1999) applied the base rate fallacy to intrusion detection: because real intrusions are rare, even a detector with a very low false-alarm rate produces mostly false alarms. (A worked calculation follows below.)
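A LOF sketch using scikit-learn's LocalOutlierFactor (the library choice is an assumption; the slides predate it). fit_predict labels outliers with −1, and negative_outlier_factor_ holds the negated LOF scores, so the most negative values are the strongest outliers.

```python
# LOF on clusters of different density plus one isolated point: the
# isolated point has a much lower density than its neighbors, so it
# gets a large LOF (a very negative negative_outlier_factor_).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(0, 0.3, (100, 2))    # dense cluster
sparse = rng.normal(4, 1.5, (30, 2))    # sparse cluster
p2 = np.array([[4.0, -4.0]])            # isolated point
X = np.vstack([dense, sparse, p2])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)             # -1 marks outliers
print(np.where(labels == -1)[0])
print(lof.negative_outlier_factor_[-1]) # score of the isolated point
```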
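A clustering-based sketch along the lines of the slide, using k-means; the cluster count and the distance threshold are illustrative choices rather than anything prescribed by the slides.

```python
# Clustering-based outliers: cluster the data, treat points in the
# smallest cluster as candidates, then keep those that are far from
# every other cluster's centroid.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (80, 2)),
               rng.normal(5, 0.5, (80, 2)),
               [[10.0, 10.0], [10.5, 9.5]]])   # a tiny, distant cluster

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
small = np.argmin(sizes)                       # smallest cluster = candidates
candidates = np.where(km.labels_ == small)[0]

other_centroids = np.delete(km.cluster_centers_, small, axis=0)
for i in candidates:
    d = np.linalg.norm(other_centroids - X[i], axis=1).min()
    if d > 3.0:                                # far from all other clusters
        print("outlier:", i, X[i])
```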
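To make the base rate fallacy concrete, here is a small worked calculation with made-up numbers in the spirit of Axelsson's intrusion-detection argument.

```python
# Base-rate fallacy with made-up numbers: even a detector with a 1%
# false-alarm rate is swamped when real intrusions are rare.
P_I = 1e-5            # P(intrusion): assumed base rate
P_alarm_I = 0.99      # P(alarm | intrusion): detection rate
P_alarm_noI = 0.01    # P(alarm | no intrusion): false-alarm rate

# Bayes' theorem, with the law of total probability in the denominator
P_I_alarm = (P_alarm_I * P_I) / (P_alarm_I * P_I + P_alarm_noI * (1 - P_I))
print(P_I_alarm)      # about 0.001: roughly 99.9% of alarms are false
```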