Co-Clustering Multiple Heterogeneous Domains: Linear Combinations and Agreements

GRECO, Gianluigi; GUZZO, Antonella
2010

Abstract

The high-order coclustering problem, i.e., the problem of simultaneously clustering several heterogeneous types of domains, has become an active research area in the last few years because of its notable impact on several application scenarios. The problem is generally tackled by optimizing a weighted combination of functions, each measuring the quality of the coclustering over one pair of domains, where the weights are chosen according to the supposed reliability or relevance of the correlation between the domains in that pair. In practice, however, little knowledge is likely to be available for setting these weights in a definite and precise manner. More importantly, it may even be conceptually unclear whether to prefer one weighting scheme over another in cases where the functions encode contrasting goals, so that improving the quality for one pair of domains leads to a deterioration for other pairs. The aim of this paper is precisely to shed light on the impact of weighting schemes on techniques based on linear combinations of pairwise objective functions, and to define an approach that overcomes the above problems by looking for an agreement (intuitively, a kind of compromise) among the various domains, thereby removing the need to define an appropriate weighting scheme. Two algorithms performing coclustering on "star-structured" domains, based on linear combinations and on agreements, respectively, have been designed within an information-theoretic framework. Results from thorough experimentation on both synthetic and real data are discussed, in order to assess the effectiveness of the approaches and to gain more insight into their actual behavior.
Data Mining; Coclustering
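
The dependence on weights described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes that "star-structured" means one central domain related to each auxiliary domain, and it uses the loss in mutual information after aggregating rows and columns into clusters as the pairwise quality function, a common choice in information-theoretic coclustering. All names (`mutual_information`, `aggregate`, `information_loss`, `weighted_loss`) and the toy data are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint count table given as nested lists."""
    total = sum(sum(row) for row in joint)
    row_marg = [sum(row) / total for row in joint]
    col_marg = [sum(row[j] for row in joint) / total for j in range(len(joint[0]))]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                mi += p * math.log(p / (row_marg[i] * col_marg[j]))
    return mi

def aggregate(joint, row_clusters, col_clusters):
    """Sum the counts of the table into one cell per (row cluster, column cluster)."""
    out = [[0.0] * (max(col_clusters) + 1) for _ in range(max(row_clusters) + 1)]
    for i, row in enumerate(joint):
        for j, count in enumerate(row):
            out[row_clusters[i]][col_clusters[j]] += count
    return out

def information_loss(joint, row_clusters, col_clusters):
    """Assumed pairwise quality function: mutual information lost by coclustering."""
    clustered = aggregate(joint, row_clusters, col_clusters)
    return mutual_information(joint) - mutual_information(clustered)

def weighted_loss(relations, center_clusters, aux_clusters, weights):
    """Linear combination of the pairwise losses over all (center, auxiliary) pairs."""
    return sum(
        w * information_loss(joint, center_clusters, cols)
        for joint, cols, w in zip(relations, aux_clusters, weights)
    )

# Toy star: a central domain X (4 items) related to auxiliary domains Y1 and Y2.
# The two relations encode contrasting goals for how X should be split.
x_y1 = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
x_y2 = [[5, 5, 0, 0], [0, 0, 5, 5], [5, 5, 0, 0], [0, 0, 5, 5]]
relations = [x_y1, x_y2]
aux_clusters = [[0, 0, 1, 1], [0, 0, 1, 1]]  # fixed clusterings of Y1 and Y2

split_a = [0, 0, 1, 1]  # clustering of X that is ideal for the X-Y1 relation
split_b = [0, 1, 0, 1]  # clustering of X that is ideal for the X-Y2 relation

# The preferred clustering of X flips when the weights are swapped.
favors_y1 = (0.8, 0.2)
favors_y2 = (0.2, 0.8)
loss_a_w1 = weighted_loss(relations, split_a, aux_clusters, favors_y1)
loss_b_w1 = weighted_loss(relations, split_b, aux_clusters, favors_y1)
loss_a_w2 = weighted_loss(relations, split_a, aux_clusters, favors_y2)
loss_b_w2 = weighted_loss(relations, split_b, aux_clusters, favors_y2)
```

In this toy setting each split of X is lossless for one relation and loses all the information in the other, so whichever relation receives the larger weight determines the result. This is the kind of situation in which, as the abstract notes, there may be no principled reason to prefer one weighting scheme over another.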

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/124271