
Enhancing active learning through latent space exploration: A k-nearest neighbors approach

Flesca S.; Mandaglio D.; Scala F.
2025-01-01

Abstract

Supervised machine learning often requires a large volume of labeled training data, incurring substantial annotation costs. In scenarios with limited labeling budgets, selecting the most informative instances for labeling by an annotation oracle becomes crucial. Active learning addresses this challenge by strategically choosing informative instances for labeling, thereby maximizing model performance with limited labeled data. Existing active learning methods, however, typically do not fully exploit the abundant unlabeled data from which meaningful features can be extracted. While some methods integrate variational autoencoders (VAEs) into active learning, this work introduces a novel framework that goes beyond using VAEs merely to assist in selecting data for the oracle. Instead, our approach leverages the latent space learned by the VAE to heuristically annotate unlabeled data through a k-nearest neighbors classifier within this space. The proposed approach makes it possible to enhance existing active learning methods without relying solely on an annotation oracle, thus reducing the overall annotation cost. Experiments on benchmark datasets show that our proposal can improve the performance of existing active learning methods by up to 33% in classification accuracy and by up to 0.38 in F1-score when the initial labeled data is extremely limited. We make the source code and evaluation data available at https://github.com/Franco7Scala/Laken.
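The core idea described in the abstract — heuristically annotating unlabeled instances with a k-nearest neighbors classifier fit in the VAE latent space, then augmenting the labeled pool — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the latent codes are simulated with random vectors standing in for VAE encoder outputs, and the 0.8 confidence threshold is a hypothetical choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-ins for VAE-encoded latent vectors (in the paper these would be
# the encoder outputs of a trained VAE; here we simulate them).
z_labeled = rng.normal(size=(20, 8))      # latent codes of the labeled pool
y_labeled = rng.integers(0, 2, size=20)   # labels provided by the oracle
z_unlabeled = rng.normal(size=(100, 8))   # latent codes of the unlabeled pool

# Fit a k-NN classifier in the latent space on the oracle-labeled data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(z_labeled, y_labeled)

# Heuristic annotation: keep only pseudo-labels whose neighborhood
# agreement exceeds a confidence threshold (hypothetical value 0.8,
# i.e. at least 4 of 5 neighbors agree).
proba = knn.predict_proba(z_unlabeled)
confidence = proba.max(axis=1)
pseudo_labels = knn.classes_[proba.argmax(axis=1)]
mask = confidence >= 0.8

# Augment the labeled set with the confidently pseudo-labeled instances;
# the downstream active learner then trains on this larger pool.
augmented_z = np.vstack([z_labeled, z_unlabeled[mask]])
augmented_y = np.concatenate([y_labeled, pseudo_labels[mask]])
```

Thresholding on neighborhood agreement keeps the pseudo-labels conservative: instances whose latent neighbors disagree are left for the oracle, which is what lets the method reduce annotation cost without relying on it exclusively.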
Active learning
Latent space
Pseudo-labeling
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/399962
Warning! The data shown have not been validated by the university.

Citations
  • Scopus: 0