Performance of Parallel K-Means based on Theatre

Cicirelli Franco; Nigro Libero; Pupo Francesco
2022-01-01

Abstract

K-Means is a well-known clustering algorithm whose goal is to partition a set of data points into groups (clusters) so as to minimize the dissimilarities, measured by some metric, among the data within the same group. Due to its simplicity, K-Means is often used in unsupervised machine learning clustering applications. However, the execution performance of K-Means can easily become a bottleneck when dealing with very large datasets paired with a large number of clusters, such as those encountered in many big data ecosystems. Therefore, many efforts devoted to the parallelization of K-Means are reported in the literature, on both shared-nothing and shared-memory architectures. This paper proposes a novel approach to parallel K-Means on multi/many-core machines, based on the Theatre actor system developed in Java. The realization relies on message passing for synchronization among actors (workers), but also offers the possibility of sharing data, in a controlled and safe way, among the actors of the same computing node (theatre). The approach proves effective in delivering high-performance execution. The paper first provides background information on the basic K-Means algorithm and the Theatre architecture; an actor-based parallel version of K-Means is then described and evaluated experimentally.
Keywords: K-Means Clustering, Actors, Theatre, Java, High-Performance Computing
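
For orientation, the basic K-Means procedure mentioned in the abstract can be sketched as a plain sequential Java program. This is only an illustrative baseline under stated assumptions, not the Theatre-based implementation of the paper: the class name `KMeansSketch`, the method names, the use of squared Euclidean distance as the metric, and the sample data are all hypothetical choices made for this example. The assignment step handles each point independently of the others, which is what makes the algorithm a natural candidate for distribution among parallel workers.

```java
import java.util.Arrays;

// Hypothetical sequential baseline; it does not use the Theatre actor system.
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension (assumed metric). */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Assignment step: for each point, the index of its nearest centroid. */
    public static int[] assign(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                double dist = squaredDistance(points[i], centroids[k]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            labels[i] = best;
        }
        return labels;
    }

    /** Update step: each centroid becomes the mean of its points; an empty cluster keeps its old centroid. */
    public static double[][] update(double[][] points, int[] labels, double[][] old) {
        int k = old.length;
        int dim = old[0].length;
        double[][] next = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[labels[i]]++;
            for (int d = 0; d < dim; d++) {
                next[labels[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                next[c] = old[c].clone();
            } else {
                for (int d = 0; d < dim; d++) {
                    next[c][d] /= counts[c];
                }
            }
        }
        return next;
    }

    /** Alternates assignment and update until the labels stop changing or maxIterations is reached. */
    public static double[][] fit(double[][] points, double[][] initial, int maxIterations) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();
        }
        int[] labels = null;
        for (int iter = 0; iter < maxIterations; iter++) {
            int[] newLabels = assign(points, centroids);
            if (Arrays.equals(newLabels, labels)) {
                break; // converged: no point changed cluster
            }
            labels = newLabels;
            centroids = update(points, labels, centroids);
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Made-up sample data: two well-separated groups in the plane.
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] initial = {{1, 1}, {8, 8}};
        System.out.println(Arrays.deepToString(fit(points, initial, 100)));
    }
}
```

In a parallel variant, the loop over points in `assign` and the partial sums in `update` could be split among workers, with the per-cluster sums and counts combined before the centroids are recomputed. How the Theatre-based version organizes this is described in the paper itself.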

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/328588