An Efficient Algorithm for Clustering Sets

This paper proposes HWK-Sets, an algorithm based on K-Means for clustering data that consist of variable-sized sets of elementary items. Clustering sets is difficult because the data objects have no numerical attributes, so the classical Euclidean distance on which K-Means normally relies cannot be used. Instead, an adaptation of the Jaccard distance between sets is used, which exploits application-sensitive information. In particular, the Hartigan and Wong variation of K-Means is adopted; it uses medoids as cluster representatives, works with several seeding methods, and favors fast attainment of an accurate solution. HWK-Sets is implemented in Java using parallel streams, and its efficiency and accuracy are demonstrated by simulation experiments.
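To make the two ingredients named in the abstract concrete, the following is a minimal Java sketch of a set distance and a medoid computation. It rests on several assumptions: it uses the plain Jaccard distance rather than the paper's application-sensitive adaptation, it picks the medoid as the member with the smallest total distance to the other members, and it does not reproduce the Hartigan and Wong update rule or any seeding method. The class and method names (`SetClusteringSketch`, `jaccardDistance`, `medoidIndex`) are hypothetical and not taken from the paper.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.IntStream;

/** Illustrative sketch only; all names here are hypothetical, not from the paper. */
final class SetClusteringSketch {

    /** Plain Jaccard distance: 1 - |A ∩ B| / |A ∪ B|. Two empty sets are at distance 0. */
    static <T> double jaccardDistance(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return 1.0 - (double) intersection.size() / unionSize;
    }

    /**
     * Index of the medoid of a cluster: the member whose summed distance
     * to all members is smallest. Per-member costs are computed with a
     * parallel stream; ties are resolved in favor of the lowest index.
     */
    static <T> int medoidIndex(List<Set<T>> cluster) {
        if (cluster.isEmpty()) {
            throw new IllegalArgumentException("cluster must not be empty");
        }
        double[] cost = IntStream.range(0, cluster.size())
                .parallel()
                .mapToDouble(i -> cluster.stream()
                        .mapToDouble(other -> jaccardDistance(cluster.get(i), other))
                        .sum())
                .toArray();
        int best = 0;
        for (int i = 1; i < cost.length; i++) {
            if (cost[i] < cost[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Because the medoid is always an actual member of the cluster, no averaging of sets is needed, which is why a medoid-based representative fits data without numerical attributes.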

Nigro, Libero (Member of the Collaboration Group); Cicirelli, Franco (Member of the Collaboration Group)

2023-01-01

Year: 2023

Keywords: Clustering sets, Hartigan & Wong K-Means, Jaccard distance, Medoids, Seeding methods, benchmark datasets

Appears in types: 2.1 Contribution in volume (Chapter or Essay)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/355797
