Balanced and Token-Efficient Summarization of User Reviews via Stratified Sampling and Large Language Models

Marozzo, F.; Belcastro, L.; Cosentino, C.; Lio, P.

doi:10.1007/978-3-032-06078-5_17

User-generated reviews offer valuable insights into consumer experiences, preferences, and concerns. They provide direct feedback on product perception and improvements while helping users evaluate strengths, weaknesses, and alternatives. Advanced machine learning techniques, including Large Language Models (LLMs) such as BERT and GPT, enhance the extraction of meaningful information from these vast datasets. This paper introduces a framework that leverages LLMs to generate high-quality summaries using minimal input tokens. By combining multidimensional classification (sentiment, topic, emotion) with stratified sampling, our framework selects a compact yet comprehensive subset of reviews that accurately represents the original dataset. Tailored prompts guide the LLMs to create balanced summaries that fairly represent both strengths and weaknesses. Experiments on Amazon and Tripadvisor datasets demonstrate that our method significantly reduces token usage and computational costs, while consistently outperforming traditional AI-based summarization approaches in terms of content coverage, balance, and semantic accuracy.
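The abstract names the selection step but does not spell out how it works. The following is a minimal sketch of stratified sampling over classified reviews, assuming each review already carries `sentiment`, `topic`, and `emotion` labels. The field names, the `stratified_sample` function, the proportional allocation, and the at-least-one-per-stratum rule are all illustrative assumptions, not the authors' published procedure.

```python
import random
from collections import defaultdict


def stratified_sample(reviews, budget, seed=0):
    """Select roughly `budget` reviews, allocated proportionally across strata.

    Each review is a dict with hypothetical keys "sentiment", "topic",
    and "emotion"; a stratum is one combination of those three labels.
    """
    rng = random.Random(seed)

    # Group reviews by their (sentiment, topic, emotion) combination.
    strata = defaultdict(list)
    for review in reviews:
        key = (review["sentiment"], review["topic"], review["emotion"])
        strata[key].append(review)

    total = len(reviews)
    sample = []
    for key in sorted(strata):  # sorted for reproducible ordering
        members = strata[key]
        # Proportional allocation; keep at least one review per stratum
        # so that rare opinions are not dropped (an assumption made here).
        k = max(1, round(budget * len(members) / total))
        k = min(k, len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

Because every stratum keeps at least one review, the result can slightly exceed `budget` when many small strata exist. The selected subset, rather than the full dataset, would then be placed in the summarization prompt, which is where the reduction in input tokens comes from.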


2026-01-01



Year: 2026
ISBN: 9783032060778; 9783032060785
Keywords: AI-Generated Summaries; Generative AI; Large Language Models; Opinion Mining; Review Aggregation
Publication type: 4.1 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/401639
