A tensor-based clustering approach for multiple document classifications

TAGARELLI, Andrea; GRECO, Sergio
2013-01-01

Abstract

We propose a novel approach to document clustering for the case in which multiple organizations (i.e., existing classifications) of the input documents are available. Besides the text-based content of the documents, our approach exploits frequent associations among documents that fall into the same groups across the existing classifications, in order to capture how documents tend to be grouped together across the different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document associations, and is then decomposed to establish a single encompassing clustering of the documents. Preliminary experiments on a document clustering benchmark show the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.
Year: 2013
ISBN: 978-989856541-9
Keywords: document clustering; tensor modeling and decomposition; itemset mining
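
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how associations are mined or how tensor entries are weighted, so the sketch assumes that every group of every classification is one transaction of document ids, that only document pairs are mined, and that a tensor entry holds the raw term count of a document when it belongs to an association. The function names `frequent_doc_sets` and `build_tensor`, the parameters `min_support` and `max_size`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_doc_sets(classifications, min_support=2, max_size=2):
    """Return document sets that co-occur in at least `min_support` groups.

    Assumption: each group of each classification counts as one transaction
    of document ids; this is a simplification, not the paper's definition.
    """
    counts = Counter()
    for groups in classifications:
        for group in groups:
            docs = sorted(set(group))
            for size in range(2, max_size + 1):
                counts.update(combinations(docs, size))
    return sorted(s for s, c in counts.items() if c >= min_support)


def build_tensor(doc_terms, vocabulary, doc_sets):
    """Build a documents x terms x associations tensor as nested lists.

    Assumption: entry [d][t][a] is the count of term t in document d when
    d belongs to association a, and 0 otherwise.
    """
    doc_ids = sorted(doc_terms)
    tensor = []
    for d in doc_ids:
        tf = Counter(doc_terms[d])
        tensor.append(
            [[tf[t] if d in a else 0 for a in doc_sets] for t in vocabulary]
        )
    return doc_ids, tensor


# Hypothetical toy collection: two existing classifications (views).
classifications = [
    [["d1", "d2", "d3"], ["d4"]],
    [["d1", "d2"], ["d3", "d4"]],
]
doc_terms = {
    "d1": ["tensor", "cluster"],
    "d2": ["tensor", "tensor"],
    "d3": ["itemset"],
    "d4": ["itemset", "cluster"],
}
vocabulary = ["cluster", "itemset", "tensor"]

doc_sets = frequent_doc_sets(classifications)
doc_ids, tensor = build_tensor(doc_terms, vocabulary, doc_sets)
```

The decomposition step is left out of the sketch. In practice the tensor would be passed to a tensor decomposition routine, and the resulting document factors would be clustered to obtain the final grouping.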
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/169062