Prototype-based Domain Description for One-Class Classification
ANGIULLI, Fabrizio
2012-01-01
Abstract
This work introduces the Prototype-based Domain Description (PDD) rule, a one-class classifier. PDD is a nearest neighbor-based classifier: it accepts objects on the basis of their nearest neighbor distances in a reference set of objects, also called prototypes. For a suitable choice of the prototype set, the PDD classifier is equivalent to another nearest neighbor-based one-class classifier, namely the NNDD classifier; moreover, it generalizes statistical tests for outlier detection. The concept of a PDD consistent subset is introduced, which exploits only a selected subset of the training set. It is shown that computing a minimum-size PDD consistent subset is, in general, not approximable within any constant factor. A logarithmic-approximation-factor algorithm, called the CPDD algorithm, for computing a minimum-size PDD consistent subset is then introduced. In order to efficiently manage very large data sets, a variant of the basic rule, called Fast CPDD, is also presented. Experimental results show that the CPDD rule markedly improves over the CNNDD classifier, namely the condensed variant of NNDD, in terms of subset size while guaranteeing comparable classification quality, that it is competitive with other one-class classification methods, and that it is suitable for classifying large data sets.
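The abstract describes the acceptance mechanism only at a high level. The following minimal Python sketch illustrates the generic nearest-neighbor domain description idea it refers to: an object is accepted when its nearest-neighbor distance to a prototype set falls below a threshold. The function name nn_accept, the parameter theta, and the synthetic data are illustrative assumptions, not the exact PDD rule defined in the paper.

```python
import numpy as np

def nn_accept(x, prototypes, theta):
    """Accept x as a target-class object if its nearest-neighbor distance
    to the prototype set does not exceed theta.

    This is a generic nearest-neighbor domain description sketch used for
    illustration only; the paper's PDD rule is defined differently in detail.
    """
    dists = np.linalg.norm(prototypes - x, axis=1)  # distances to all prototypes
    return dists.min() <= theta                     # accept if the closest one is near enough

# Illustrative usage with synthetic data (theta is chosen arbitrarily here;
# the paper derives its acceptance criterion from the training set).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(50, 2))   # reference set of prototypes
test_point = np.array([0.1, -0.2])
print(nn_accept(test_point, prototypes, theta=0.5))
```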