A Semi-automatic Data Generator for Query Answering

Fabrizio Angiulli; Fabio Fassetti; Simona Nisticò
2022-01-01

Abstract

Question Answering (QA) is a critical NLP task, mainly based on deep learning models, that allows users to ask questions in natural language and get a response. Because available general-purpose datasets are often not effective enough to train a QA model suitably, one of the main problems in this context is the availability of datasets that fit the domain of interest. Moreover, such datasets are generally in English, which makes designing QA systems for other languages difficult. To alleviate these issues, we propose a framework that automatically generates a dataset for a given language and a given topic. Training the system in any language requires an alternative way to evaluate the quality of the answers, so we also propose a novel unsupervised evaluation method. To test the proposed technique, we generate a dataset for the topic "computer science" in Italian and compare the performance of a QA system trained on available datasets with that of one trained on the generated dataset.
Year: 2022
ISBN: 978-3-031-16563-4
ISBN: 978-3-031-16564-1

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/345538