Neural Topic Modeling in Social Media by Clustering Latent Hashtag Representations

Cantini, Riccardo; Cosentino, Cristian; Marozzo, Fabrizio; Talia, Domenico; Trunfio, Paolo
2025-01-01

Abstract

The worldwide use of social media has generated vast volumes of user-generated content, offering valuable insights into public discourse, behavioral dynamics, and emerging trends. However, extracting meaningful topics from such data remains a significant challenge due to the informal, dynamic, and context-dependent nature of online language, where the semantics of terms and hashtags are often shaped by the specific sociocultural and temporal contexts in which they arise. To address these challenges, we propose NTM-HEC (Neural Topic Modeling via Hashtag Embedding Clustering), a novel hashtag-centric methodology for topic discovery that leverages the semantic richness encoded in hashtags, commonly used by social media users to annotate and categorize content. NTM-HEC relies on clustering low-dimensional embeddings of latent hashtag representations to uncover coherent and diverse topic structures. This enables it to fully leverage the inherently topical nature of hashtags, enhancing interpretability and improving robustness to linguistic variability and context-specificity. We evaluate the effectiveness of NTM-HEC through two case studies focused on online discourse surrounding the Russia-Ukraine conflict and the COVID-19 pandemic. In both cases, NTM-HEC outperforms competing models in topic coherence and diversity, demonstrating its ability to capture nuanced, trend-specific semantic patterns within real-world social media discussions.
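
The abstract describes the pipeline only at a high level: represent hashtags, embed them in a low-dimensional space, then cluster them into topics. The toy sketch below illustrates that general idea and should not be read as the authors' implementation. Every concrete choice in it is an assumption made for illustration: the hashtag representation (plain co-occurrence counts), the dimensionality reduction (a seeded Gaussian random projection), the clustering step (k-means with farthest-point initialisation), and the example hashtags and function names, none of which come from the paper.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cooccurrence_vectors(posts):
    """Represent each hashtag by its co-occurrence counts (assumed stand-in)."""
    tags = sorted({t for post in posts for t in post})
    index = {t: i for i, t in enumerate(tags)}
    vectors = {t: [0.0] * len(tags) for t in tags}
    for post in posts:
        unique = sorted(set(post))
        for t in unique:
            vectors[t][index[t]] += 1.0  # diagonal: number of posts using the tag
        for a, b in combinations(unique, 2):
            vectors[a][index[b]] += 1.0
            vectors[b][index[a]] += 1.0
    return tags, vectors


def random_projection(vectors, dim, seed=0):
    """Map vectors to `dim` dimensions with a seeded Gaussian matrix."""
    rng = random.Random(seed)
    n = len(next(iter(vectors.values())))
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]
              for _ in range(n)]
    return {t: [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(dim)]
            for t, v in vectors.items()}


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    names = sorted(points)
    centers = [points[names[0]]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(names, key=lambda n: min(squared_distance(points[n], c)
                                           for c in centers))
        centers.append(points[far])
    assignment = {}
    for _ in range(iterations):
        assignment = {n: min(range(k),
                             key=lambda c: squared_distance(points[n], centers[c]))
                      for n in names}
        for c in range(k):
            members = [points[n] for n in names if assignment[n] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def hashtag_topics(posts, k, dim=2, seed=0):
    """Group hashtags into at most k topics (lists of hashtags)."""
    tags, vectors = cooccurrence_vectors(posts)
    embedded = random_projection(vectors, dim, seed)
    assignment = kmeans(embedded, k)
    topics = defaultdict(list)
    for tag in tags:
        topics[assignment[tag]].append(tag)
    return [sorted(group) for _, group in sorted(topics.items())]


# Hypothetical posts, each reduced to the hashtags it contains.
posts = [
    ["#ukraine", "#kyiv", "#sanctions"],
    ["#ukraine", "#sanctions"],
    ["#kyiv", "#ukraine"],
    ["#covid19", "#vaccine", "#lockdown"],
    ["#covid19", "#lockdown"],
    ["#vaccine", "#covid19"],
]
topics = hashtag_topics(posts, k=2)
for number, group in enumerate(topics):
    print(number, group)
```

Each returned topic is simply a list of hashtags. A realistic pipeline would presumably replace the co-occurrence counts with learned (neural) hashtag representations and use a stronger reduction and clustering method; the abstract does not specify these components.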
2025
ISBN: 9781643686318

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/399597