Semantic-aware data imputation in dynamic relational databases via pre-trained language models

Digital systems for information and representation management rely on database architectures, whose effectiveness is undermined by missing values. Data Imputation (DI) is the well-known process of replacing missing values, usually represented by nulls, with reliable constants. However, existing methods typically assume a static view of the database, overlooking the fact that real-world databases are often updated over time through the addition of new (possibly incomplete) information. We address Dynamic Data Imputation (DDI), that is, the problem of imputing nulls in incrementally updated databases. We show that existing learning-based approaches are ill-suited for DDI, as they require costly retraining whenever the data grows. Instead, we propose a novel incremental algorithm called SENtence Transformer based Imputation (SENTI), which performs fast and accurate similarity search by exploiting the inference capabilities of Pretrained Language Models, without any need for training. Experiments show that our technique outperforms state-of-the-art static DI approaches (adapted to solve DDI) in both effectiveness and efficiency.
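The general idea of training-free, similarity-based imputation over a growing table can be illustrated with a minimal sketch. This is not the SENTI algorithm, whose details are not given in this record. All names here (`embed`, `serialize`, `IncrementalImputer`) are hypothetical, and `embed` is a deliberately simple character-trigram stand-in for the inference call of a pretrained sentence encoder. The sketch also assumes that imputed rows are stored and may later act as donors, and it uses a linear scan where a real system would likely use an index.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in encoder (assumption): bag of character trigrams.

    A real system would call a pretrained sentence encoder here;
    this placeholder only keeps the sketch self-contained.
    """
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def serialize(row):
    """Turn the non-null attributes of a row into one sentence."""
    return ", ".join(f"{k}: {v}" for k, v in row.items() if v is not None)


class IncrementalImputer:
    """Hypothetical helper: imputes nulls on insert, with no training step."""

    def __init__(self):
        self.rows = []     # completed rows seen so far
        self.vectors = []  # one embedding per stored row

    def add(self, row):
        """Impute the nulls of `row` from its most similar donor, then store it."""
        query = embed(serialize(row))
        completed = dict(row)
        for attr, value in row.items():
            if value is not None:
                continue
            # Only rows with a known value for this attribute can donate it.
            donors = [i for i, r in enumerate(self.rows) if r.get(attr) is not None]
            if donors:
                best = max(donors, key=lambda i: cosine(query, self.vectors[i]))
                completed[attr] = self.rows[best][attr]
        # Growing the database appends one row and one vector; nothing is refit.
        self.rows.append(completed)
        self.vectors.append(query)
        return completed


db = IncrementalImputer()
db.add({"city": "Rome", "country": "Italy"})
db.add({"city": "Paris", "country": "France"})
print(db.add({"city": "Rome", "country": None}))
# prints {'city': 'Rome', 'country': 'Italy'}
```

Because each insertion only embeds the new row and compares it with stored vectors, the cost of an update does not include any retraining, which is the property the abstract contrasts with learning-based approaches.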

Alfano, Gianvincenzo; Greco, Sergio; La Cava, Lucio; Mahmood, Tariq; Trubitsyna, Irina

2026-01-01


Year: 2026

Keywords: Dynamic data imputation; Dynamic databases; Sentence transformers

Appears in types: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/404878
