The increasing availability of heterogeneous XML informative sources has raised a number of issues concerning how to represent and manage semistructured data. Although XML sources can exhibit proper structures and contents, differently annotated XML documents may in principle encode related semantics due to subjective definitions of markup tags. Discovering knowledge to infer semantic organization of XML documents has become a major challenge in XML data management. In this context, we address the problem of clustering XML data according to structure as well as content features enriched with lexical ontology knowledge. We propose a framework for clustering semantically cohesive XML structures based on a transactional representation model. Experiments on large real datasets give evidence that the proposed approach is highly effective in detecting groups of XML data that exhibit structure and/or content affinities.

Toward Semantic XML Clustering

TAGARELLI, Andrea;GRECO, Sergio
2006-01-01

Abstract

The increasing availability of heterogeneous XML informative sources has raised a number of issues concerning how to represent and manage semistructured data. Although XML sources can exhibit proper structures and contents, differently annotated XML documents may in principle encode related semantics due to subjective definitions of markup tags. Discovering knowledge to infer semantic organization of XML documents has become a major challenge in XML data management. In this context, we address the problem of clustering XML data according to structure as well as content features enriched with lexical ontology knowledge. We propose a framework for clustering semantically cohesive XML structures based on a transactional representation model. Experiments on large real datasets give evidence that the proposed approach is highly effective in detecting groups of XML data that exhibit structure and/or content affinities.
2006
0-89871-611-X
semistructured data and XML; XML mining; document clustering
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/170660
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 35
  • ???jsp.display-item.citation.isi??? 17
social impact