A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach

TAGARELLI, Andrea; GRECO, Sergio
2008-01-01

Abstract

In recent years there has been a growing interest in clustering uncertain data. In contrast to traditional, "sharp" data representation models, uncertain data objects can be represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In this context, the focus has been mainly on partitional and density-based approaches, whereas hierarchical clustering schemes have drawn less attention. We propose a centroid-linkage-based agglomerative hierarchical algorithm for clustering uncertain objects, named U-AHC. The cluster merging criterion is based on an information-theoretic measure to compute the distance between cluster prototypes. These prototypes are represented as mixture densities that summarize the pdfs of all the uncertain objects in the clusters. Experiments have shown that our method outperforms state-of-the-art clustering algorithms from an accuracy viewpoint while achieving reasonably good efficiency.
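
The merging scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' U-AHC implementation: the abstract does not name the information-theoretic measure, so the Jensen-Shannon divergence is used here as an assumed stand-in, and each uncertain object is assumed to be a 1-D pdf discretized on a shared grid. The names `gaussian_on_grid`, `js_divergence`, and `agglomerate` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def gaussian_on_grid(mean, std, grid):
    """Discretize a Gaussian pdf on a grid and normalize it to sum to 1."""
    weights = np.exp(-0.5 * ((grid - mean) / std) ** 2)
    return weights / weights.sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete pdfs.

    Assumption: this stands in for the paper's information-theoretic measure.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def agglomerate(pdfs, n_clusters):
    """Centroid-linkage agglomeration over discretized pdfs.

    Each cluster prototype is the equal-weight mixture of its members' pdfs.
    Returns a list of clusters, each a sorted list of object indices.
    """
    clusters = [([i], np.asarray(p, dtype=float)) for i, p in enumerate(pdfs)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose prototypes are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = js_divergence(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        members_a, proto_a = clusters[a]
        members_b, proto_b = clusters[b]
        # Size-weighted average keeps the prototype an equal-weight mixture
        # of all member pdfs.
        total = len(members_a) + len(members_b)
        proto = (len(members_a) * proto_a + len(members_b) * proto_b) / total
        merged = (members_a + members_b, proto)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]


if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 201)
    objects = [gaussian_on_grid(m, 0.5, grid) for m in (1.0, 1.2, 1.1, 8.0, 8.3)]
    print(agglomerate(objects, n_clusters=2))
```

The pairwise search is quadratic in the number of clusters at every step, which is acceptable for a small illustration but says nothing about the efficiency reported in the paper.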
Year: 2008
ISBN: 978-076953502-9
Keywords: uncertain data mining; clustering; hierarchical clustering

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/161303