Exploring Dictionary-based Semantic Relatedness in Labeled Tree Data

TAGARELLI, Andrea
2013-01-01

Abstract

The growing volume and heterogeneity of application scenarios based on semistructured data call for next-generation methods that can effectively couple syntactic with semantic information in data management and mining tasks. This paper focuses on developing methods for determining semantic relatedness in tree-shaped semistructured data and on assessing the impact of these methods on structural sense ranking in such data. By exploiting key features of a lexical knowledge base such as WordNet, namely ontological relations and concept definitions, we propose a twofold approach that takes into account the particular form of labeled tree data as a conceptual hierarchical representation of real-world objects. We infer indirect relationships between tag concepts and exploit an interleaved search through different concept hierarchies in order to extend semantic relatedness measures, originally conceived for plain-text data, to labeled tree data instances. We also develop a structural sense ranking framework that employs a context graph built on the tag concepts and the structural relations among tags in the tree data. Experiments on a large real-world collection of Wikipedia articles show that the proposed methods can effectively detect and maximize semantic relatedness in tree-structured data, and can be profitably used to perform structural sense ranking.
2013
semistructured data and XML; structural sense ranking; semantic relatedness
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/134277