In this work, we introduce a novel definition of outlier, namely the Gradient Outlier Factor (or GOF), with the aim to provide a definition that unifies with the statistical one on some standard distributions but has a different behavior in the presence of mixture distributions. Intuitively, the GOF score measures the probability to stay in the neighborhood of a certain object. It is directly proportional to the density and inversely proportional to the variation of the density. We derive formal properties under which the GOF definition unifies the statistical outlier definition and show that the unification holds for some standard distributions, while the GOF is able to capture tails in the presence of different distributions even if their densities sensibly differ. Moreover, we provide a probabilistic interpretation of the GOF score, by means of the notion of density of the data density. Experimental results confirm that there are scenarios in which the novel definition can be profitably employed. To the best of our knowledge, except for distance-based outlier, no other data mining outlier definition has a so clearly established relationship with statistical outliers.
Scheda prodotto non validato
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo
|Titolo:||Towards generalizing the unification with statistical outliers: the Gradient Outlier Factor measure|
|Data di pubblicazione:||2016|
|Citazione:||Towards generalizing the unification with statistical outliers: the Gradient Outlier Factor measure / Angiulli, Fabrizio; Fassetti, F.. - In: ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA. - ISSN 1556-4681. - 10:3(2016).|
|Appare nelle tipologie:||1.1 Articolo in rivista|