A Semi-automatic Data Generator for Query Answering
Fabrizio Angiulli; Fabio Fassetti; Simona Nisticò
2022-01-01
Abstract
Question Answering (QA) is a critical NLP task, mainly addressed with deep learning models, that allows users to ask questions in natural language and receive an answer. General-purpose datasets are often not effective enough to train a QA model for a specific domain, so a central problem in this context is the availability of datasets that fit the domain under consideration. Moreover, such datasets are generally in English, which makes designing QA systems in other languages difficult. To alleviate these issues, in this work we propose a framework that automatically generates a dataset for a given language and a given topic. Training the system in any language requires an alternative way to evaluate the quality of the answers, so we also propose a novel unsupervised method for this purpose. To test the proposed technique, we generate a dataset for the topic "computer science" in Italian and compare the performance of a QA system trained on existing datasets with that of the same system trained on the generated dataset.