Clustering Transactional XML Data with Semantically-Enriched Content and Structural Features

TAGARELLI, Andrea; GRECO, Sergio
2004-01-01

Abstract

We address the problem of clustering XML data according to semantically-enriched features extracted by analyzing content and structural specifics in the data. Content features are selected from the textual contents of XML elements, while structure features are extracted from XML tag paths on the basis of ontological knowledge. Moreover, we conceive a transactional model for representing sets of semantically cohesive XML structures, and exploit such a model to effectively and efficiently cluster XML data. The resulting clustering framework was successfully tested on some collections extracted from the DBLP XML archive.
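As a rough illustration of the transactional view described in the abstract, the sketch below turns each record of a small XML document into a set of items (tag paths plus content terms) and compares records with Jaccard similarity. This is not the authors' method: the sample data, the function names `leaf_items`, `transactions`, and `jaccard`, and the choice of Jaccard similarity are all assumptions made for illustration, and the ontology-based semantic enrichment mentioned in the abstract is omitted entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical DBLP-like sample; not taken from the paper's test collections.
SAMPLE = """<dblp>
  <article><author>A. Smith</author><title>XML clustering</title><year>2004</year></article>
  <article><author>B. Jones</author><title>XML mining</title><year>2004</year></article>
  <inproceedings><author>C. Wu</author><booktitle>Graph Workshop</booktitle></inproceedings>
</dblp>"""


def leaf_items(elem, prefix=""):
    """Yield (tag_path, terms) for every leaf element under elem."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        text = (elem.text or "").strip().lower()
        yield path, frozenset(text.split())
    for child in children:
        yield from leaf_items(child, path)


def transactions(xml_text):
    """Build one transaction (a frozenset of items) per child of the root."""
    root = ET.fromstring(xml_text)
    result = []
    for record in root:
        items = set()
        for path, terms in leaf_items(record, root.tag):
            items.add(("path", path))                    # structure feature
            items.update(("term", t) for t in terms)     # content features
        result.append(frozenset(items))
    return result


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


tx = transactions(SAMPLE)
print(jaccard(tx[0], tx[1]))  # the two <article> records share paths and terms
print(jaccard(tx[0], tx[2]))  # an <article> and an <inproceedings> record
```

In this toy setting the two `article` records come out as more similar to each other than to the `inproceedings` record, which is the kind of signal a transactional clustering algorithm could group on.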
Year: 2004
ISBN: 3-540-23894-8
Keywords: semistructured data and XML; XML mining; document clustering

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/169876