Empowering biological knowledgebases: advances in human-in-the-loop AI-driven literature curation

Wood, Valerie; Jeffryes, Matt; Green, Andrew F; Blum, Matthias; Orchard, Sandra; Panni, Simona; Quaglia, Federica; Rodriguez-Esteban, Raul; Seager, James; Tosatto, Silvio C E; Wittig, Ulrike; Harrison, Melissa

2026-01-01

Abstract

Biological knowledgebases facilitate discovery across the life sciences by structuring experimental findings into human-readable and computable formats. These essential resources are maintained by a small number of professional biocurators worldwide and face the combined pressures of chronic underfunding and exponential growth of the literature. In this perspective, we review how artificial intelligence, particularly large language models and agentic systems, can augment literature-curation workflows. Applications include literature recommendation, entity recognition, data extraction, summarization, ontology development, and quality control, with emphasis on published use cases at Global Core BioData Resources and ELIXIR Core Data Resources. We identify key challenges, including the scarcity of training data, the difficulty of extracting complex relationships, and concerns about error propagation. To address these challenges, we propose a human-in-the-loop framework in which generative artificial intelligence approaches accelerate routine tasks while curators provide critical evaluation and domain expertise. We also propose practical recommendations for the community, including the creation of shared benchmark datasets, harmonized evaluation frameworks, and best-practice guidelines for transparent human-in-the-loop AI deployment in biocuration. These synergistic partnerships will be critical to ensuring biological rigour, accelerating knowledge integration while maintaining the quality essential for trusted biological resources.

Year: 2026

Item type: 1.1 Journal article

Documents in IRIS are protected by copyright, with all rights reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/399580

Note: the data displayed have not been validated by the university.