Clustering View-Segmented Documents via Tensor Modeling
TAGARELLI, Andrea
2014-01-01
Abstract
We propose a clustering framework for view-segmented documents, i.e., relatively long documents composed of smaller fragments that can be organized according to a target set of views or aspects. The framework maps a view-based document segmentation into a third-order tensor model, whose decomposition enables any standard document clustering algorithm to better reflect the multi-faceted nature of the documents. Experimental results on document collections featuring paragraph-based, metadata-based, or user-driven views show that the proposed approach improves performance in the document clustering task.
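As a rough illustration of the pipeline the abstract describes, the sketch below builds a documents × terms × views tensor from segmented documents, reduces it to a low-dimensional document representation, and passes that representation to a standard clustering algorithm. The abstract does not say which tensor decomposition is used, so this sketch substitutes a truncated SVD of the document-mode unfolding as a simple stand-in. The function names (`build_tensor`, `mode1_embedding`, `kmeans`), the raw term-count weighting, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def build_tensor(docs, vocab, n_views):
    """Build a documents x terms x views count tensor.

    docs: list of documents; each document is a list of n_views segments,
          and each segment is a list of tokens (assumed input layout).
    """
    index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab), n_views))
    for i, segments in enumerate(docs):
        for k, tokens in enumerate(segments):
            for tok in tokens:
                if tok in index:
                    X[i, index[tok], k] += 1.0
    return X


def mode1_embedding(X, rank):
    """Embed documents via truncated SVD of the document-mode unfolding.

    This is a stand-in for a tensor decomposition, not the paper's method.
    """
    unfolded = X.reshape(X.shape[0], -1)  # shape: (docs, terms * views)
    U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :rank] * s[:rank]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, standing in for any standard clustering step."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center unchanged
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Toy usage: three documents, each segmented into two views.
toy_vocab = ["tensor", "cluster", "movie", "actor"]
toy_docs = [
    [["tensor", "cluster"], ["cluster"]],
    [["tensor"], ["tensor", "cluster"]],
    [["movie", "actor"], ["actor"]],
]
toy_X = build_tensor(toy_docs, toy_vocab, n_views=2)
toy_labels = kmeans(mode1_embedding(toy_X, rank=2), k=2)
```

Keeping the views as a separate tensor mode, rather than concatenating all segments into one bag of words, is what lets the reduced representation retain per-view term usage. A CP or Tucker decomposition could replace the SVD step without changing the rest of the sketch.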