Minimizing the variance of cluster mixture models for clustering uncertain objects

IRIS

In recent years, there has been growing interest in clustering uncertain objects. In contrast to traditional, ‘sharp’ data representation models, uncertain objects are modeled as probability distributions defined over uncertainty regions. In this context, a major issue is the poor efficiency of existing algorithms, which is mainly due to the expensive computation of distances between uncertain objects. In this work, we extend our earlier work, in which a novel formulation of the problem of clustering uncertain objects is defined based on minimizing the variance of the mixture models that represent the clusters being discovered. Analytical properties of the variance computation for cluster mixture models are derived and exploited by a partitional clustering algorithm called MMVar. This algorithm achieves high efficiency because it does not need any distance measure between uncertain objects. Experiments show that MMVar is scalable and outperforms state-of-the-art algorithms in terms of efficiency, while achieving better average accuracy.

F. Gullo; G. Ponti; A. Tagarelli

2013-01-01

Year: 2013

Keywords: uncertain data mining; uncertain cluster prototype; partitional clustering

Appears in categories: 1.1 Journal article

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/135476
