
Detecting Outliers via Logical Theories and its Data Complexity

ANGIULLI, Fabrizio; GRECO, Gianluigi; PALOPOLI, Luigi
2004-01-01

Abstract

Detecting anomalous individuals in a given data population is one of the major tasks pursued in knowledge discovery systems. Such exceptional individuals are usually referred to as outliers in the literature. Outlier detection has important applications in bioinformatics, fraud detection, network robustness analysis, and intrusion detection, and several techniques have been developed for it, ranging from clustering-based and proximity-based methods to domain density analysis. Roughly speaking, such techniques model the "normal" behavior of individuals by computing some form of statistics over the given data set. In this paper we propose a rather different approach to outlier detection that should be regarded not as an alternative to, but rather as complementary to, those statistics-based methods. Our approach consists in modelling what should be "normal" in the form of a logical theory. The given data set is then analyzed on the basis of that theory to single out anomalous data elements. In the paper we first formalize our theory-based approach to outlier detection and then study the computational cost of performing outlier detection in this setting. As is usual with databases, we concentrate on data complexity, that is, the complexity measured by taking the given data set as the input to the problem while the underlying theory is considered fixed.
Year: 2004
ISBN: 3-540-23357-1
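The abstract only outlines the idea. As a rough, hedged illustration (not the paper's formal framework, which is defined over general logical theories and analyzed for its data complexity), the sketch below treats the "theory" as a few propositional Horn-style rules and flags an observation as anomalous when the theory, applied to the remaining observations, derives its complement. All names here (forward_chain, detect_outliers, the "-" complement convention) are illustrative assumptions, not taken from the paper.

```python
# Toy sketch of theory-based outlier detection: a "theory" is a set of
# propositional Horn-style rules (body -> head), observations are ground
# atoms, and an observation is flagged as anomalous when the theory,
# applied to the REMAINING observations, derives its complement.

def forward_chain(rules, facts):
    """Compute the closure of `facts` under the rules (body -> head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def complement(atom):
    # Convention assumed by this sketch: "-p" is the complement of "p".
    return atom[1:] if atom.startswith("-") else "-" + atom

def detect_outliers(rules, observations):
    """Return the observations contradicted by what the theory derives
    from the rest of the data set."""
    outliers = set()
    for obs in observations:
        rest = observations - {obs}
        if complement(obs) in forward_chain(rules, rest):
            outliers.add(obs)
    return outliers

if __name__ == "__main__":
    # "Normal" behaviour: on a weekday with no strike, the train is not delayed.
    theory = [(frozenset({"weekday", "-strike"}), "-delayed")]
    data = {"weekday", "-strike", "delayed"}
    print(detect_outliers(theory, data))  # -> {'delayed'}
```

In this toy run the observed delay contradicts what the rules predict from the other observations, so it is singled out as the anomalous element; the paper develops the analogous notion for full logical theories and studies how hard the detection problem is when only the data set varies.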

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/171448

Citations
  • Scopus: 1
  • Web of Science (ISI): 1