
Pruning strategies for nearest neighbor competence preservation learners

Angiulli, Fabrizio; Narvaez Vilema, Miryan Estela
2018-01-01

Abstract

To alleviate both the spatial and temporal cost of the nearest neighbor classification rule, competence preservation techniques aim at substituting the training set with a selected subset, known as a consistent subset. To improve generalization and to prevent the induction of overly complex models, this study investigates the application of the Pessimistic Error Estimate (PEE) principle in the context of the nearest neighbor rule. Generalization is estimated as a trade-off between training set accuracy and model complexity. As major results, it is shown that PEE-like selection strategies preserve the accuracy of the consistent subset with a far larger reduction factor and, moreover, that appreciable generalization improvements can be obtained by using a reduced subset. Furthermore, comparison with state-of-the-art hybrid prototype selection methods highlights that the FCNN-PAC strategy introduced here obtains a model of size comparable to that produced by the best prototype selection methods, with far smaller time requirements, amounting to four orders of magnitude on medium-sized datasets.
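The accuracy/complexity trade-off described above can be illustrated concretely. The following is a minimal, hypothetical sketch, not the paper's actual FCNN-PAC algorithm: it scores a candidate prototype subset by its 1-NN resubstitution errors plus a complexity penalty proportional to the subset size (the penalty weight `alpha` and the greedy growth loop are assumptions made here for illustration), and grows the subset only while the pessimistic estimate improves.

```python
# Illustrative PEE-style stopping criterion for incremental prototype
# selection with the 1-NN rule. The penalty weight `alpha` and the
# greedy growth loop are assumptions, not the paper's FCNN-PAC method.
import numpy as np

def nn_errors(X, y, S_idx):
    """Count training points misclassified by 1-NN over the subset S."""
    S, yS = X[S_idx], y[S_idx]
    errors = 0
    for xi, yi in zip(X, y):
        d = np.linalg.norm(S - xi, axis=1)
        if yS[np.argmin(d)] != yi:
            errors += 1
    return errors

def pee_score(X, y, S_idx, alpha=0.5):
    """Pessimistic estimate: training errors plus a complexity penalty
    proportional to the subset size (one penalty unit per prototype)."""
    return (nn_errors(X, y, S_idx) + alpha * len(S_idx)) / len(X)

def greedy_select(X, y, alpha=0.5):
    """Grow the subset greedily; stop when the pessimistic estimate no
    longer improves, instead of growing until full consistency."""
    S_idx = [0]  # seed with an arbitrary training point
    best = pee_score(X, y, S_idx, alpha)
    improved = True
    while improved:
        improved = False
        for i in range(len(X)):
            if i in S_idx:
                continue
            cand = pee_score(X, y, S_idx + [i], alpha)
            if cand < best:
                best, S_idx = cand, S_idx + [i]
                improved = True
                break
    return S_idx
```

With `alpha = 0` the loop keeps adding prototypes as long as training errors drop, tending toward a consistent subset; a larger `alpha` stops growth earlier, trading a little training accuracy for a much smaller model, which is the trade-off the abstract refers to.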
Keywords: Classification; Nearest neighbor rule; Overfitting; Pessimistic error estimate; Training-set consistent subset; Computer Science Applications; Computer Vision and Pattern Recognition; Cognitive Neuroscience; Artificial Intelligence

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/286462

Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science: 1