Network-based dimensionality reduction for textual datasets

Michelangelo Misuraca
2023-01-01

Abstract

There is increasing interest in developing statistical tools for extracting information from textual datasets. In a text mining framework, a knowledge discovery process typically involves reducing the dimensionality of the vocabulary, via either a feature selection or a feature extraction approach. Here we propose a strategy designed to reduce the dimensionality of textual datasets through a network-based procedure. Network tools make it possible to perform the reduction while taking into account the association relations among the terms used in the texts. The effectiveness of this strategy is shown by analysing a set of tweets about the recent COVID-19 global pandemic.
Year: 2023
ISBN: 9783031158841
Keywords: vector space model, network analysis, community detection
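
As a rough illustration of the general idea named in the abstract (reducing a vocabulary by grouping terms that are associated in a network), the sketch below builds a term co-occurrence graph and merges linked terms into a single feature. It is not the procedure of the paper, which this record does not describe: the function names `term_groups` and `reduce_docs`, the `min_cooc` threshold, and the toy documents are all assumptions made for illustration, and connected components are used as a simple stand-in for a proper community detection algorithm.

```python
from collections import Counter
from itertools import combinations


def term_groups(docs, min_cooc=2):
    """Group terms via connected components of a thresholded co-occurrence graph.

    Hypothetical helper: connected components stand in for community detection.
    """
    # One sorted set of distinct terms per document (whitespace tokenisation).
    tokenized = [sorted(set(d.lower().split())) for d in docs]

    # Count in how many documents each pair of terms appears together.
    cooc = Counter()
    for terms in tokenized:
        cooc.update(combinations(terms, 2))

    # Keep only edges between terms that co-occur in at least min_cooc documents.
    adj = {t: set() for terms in tokenized for t in terms}
    for (a, b), n in cooc.items():
        if n >= min_cooc:
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search to collect connected components (the term groups).
    groups, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            t = stack.pop()
            if t in comp:
                continue
            comp.add(t)
            stack.extend(adj[t] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups


def reduce_docs(docs, groups):
    """Represent each document by term counts summed within each group."""
    index = {t: i for i, g in enumerate(groups) for t in g}
    reduced = []
    for d in docs:
        row = [0] * len(groups)
        for tok in d.lower().split():
            row[index[tok]] += 1
        reduced.append(row)
    return reduced


# Made-up toy documents, not the tweets analysed in the paper.
docs = [
    "vaccine dose clinic",
    "vaccine dose booster",
    "lockdown school closure",
    "lockdown school online",
]
groups = term_groups(docs)
reduced = reduce_docs(docs, groups)
print(len(groups), "features instead of 8 terms")
```

In this toy case the pairs "dose"/"vaccine" and "lockdown"/"school" each co-occur in two documents, so each pair collapses into one feature while the remaining terms stay on their own. A real analysis would replace the connected-components step with a community detection method and a more careful tokenisation.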

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/336624