Network-based dimensionality reduction for textual datasets
Michelangelo Misuraca
2023-01-01
Abstract
There is increasing interest in developing statistical tools for extracting information from textual datasets. In a text mining framework, a knowledge discovery process typically involves reducing the dimensionality of the vocabulary, via either a feature selection or a feature extraction approach. Here we propose a strategy designed to reduce the dimensionality of textual datasets through a network-based procedure. Network tools make it possible to perform the reduction while taking into account the association relations among the terms used in the texts. The effectiveness of this strategy is shown by analysing a set of tweets about the recent COVID-19 global pandemic.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.