A Neural-Machine-Translation System Resilient to Out of Vocabulary Words for Translating Natural Language to SPARQL
Ricca F.;Cuteri B.
2021-01-01
Abstract
The development and diffusion of ontologies has enabled the creation of large banks of information covering multiple domains, known as knowledge bases. Ontologies represent information with semantic meaning, which makes the data machine-interpretable. However, exploiting such rich knowledge is difficult for the majority of potential users, who know neither the knowledge-base definition nor how to write SPARQL queries. Systems able to translate natural language questions into SPARQL queries have the potential to overcome this problem. In this paper, we propose an approach that combines the Named Entity Recognition and Neural Machine Translation tasks to automatically translate natural language questions into executable SPARQL queries. The resulting approach is robust to the presence of terms that do not occur in the training set. We evaluate the potential of our approach on Monument and QALD-9, two well-known datasets for Question Answering over the DBpedia ontology.
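The abstract does not spell out how Named Entity Recognition and Neural Machine Translation are combined. One plausible reading, sketched below purely as an illustration and not as the authors' implementation, is placeholder masking: entities are replaced by generic tokens before translation and restored afterwards, so the translator never has to handle an entity name it has not seen. Everything in the sketch is an assumption: the `TOY_ENTITIES` gazetteer stands in for a real NER model, `toy_translate` stands in for a trained translation model, and the `dbr:`/`dbo:` prefixes are assumed to be declared elsewhere.

```python
# Hypothetical gazetteer standing in for a real NER model:
# surface form -> DBpedia resource (illustrative entries only).
TOY_ENTITIES = {
    "Eiffel Tower": "dbr:Eiffel_Tower",
    "Colosseum": "dbr:Colosseum",
}


def mask_entities(question):
    """Replace each recognized entity with a numbered placeholder."""
    bindings = {}
    masked = question
    for surface, uri in TOY_ENTITIES.items():
        if surface in masked:
            placeholder = f"<ENT{len(bindings)}>"
            masked = masked.replace(surface, placeholder)
            bindings[placeholder] = uri
    return masked, bindings


def toy_translate(masked_question):
    """Stand-in for a trained NMT model: masked question -> query template."""
    templates = {
        "Where is the <ENT0> located?":
            "SELECT ?x WHERE { <ENT0> dbo:location ?x }",
    }
    return templates[masked_question]


def unmask(query_template, bindings):
    """Substitute the recognized resources back into the query template."""
    for placeholder, uri in bindings.items():
        query_template = query_template.replace(placeholder, uri)
    return query_template


def question_to_sparql(question):
    """Mask entities, translate the masked question, then restore entities."""
    masked, bindings = mask_entities(question)
    return unmask(toy_translate(masked), bindings)


if __name__ == "__main__":
    print(question_to_sparql("Where is the Colosseum located?"))
    print(question_to_sparql("Where is the Eiffel Tower located?"))
```

Because both questions reduce to the same masked form, the translation step only has to learn the question pattern; in this sketch, coverage of unseen entity names depends on the recognizer rather than on the translator's vocabulary.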