GOFlowLLM—curating miRNA literature with large language models and flowcharts

Panni, Simona;
2026-01-01

Abstract

MOTIVATION: The exponential growth of non-coding RNA research, with over 230 000 papers published since 2000, has created an urgent knowledge management crisis in molecular biology. Despite their crucial regulatory roles, microRNAs (miRNAs) face a significant curation bottleneck: only 1400 articles have been manually curated to the Gene Ontology (GO) knowledgebase over a decade. This highlights the critical need for automated systems that can accelerate biocuration while maintaining high-quality standards.

RESULTS: We present GOFlowLLM, an automated curation pipeline powered by reasoning-enabled Large Language Models (LLMs) that follows established GO curation flowcharts to extract and structure miRNA-mediated gene silencing data at scale. When evaluated on existing curation, GOFlowLLM selects the correct GO term in 90% of cases, with curators agreeing with 95% of the system's reasoning steps and 90% of the evidence selected. Applied to 6996 previously uncurated articles using the Qwen QwQ-32B model, our system identified 2538 new candidate GO annotations across 1785 articles in just 58 hours, potentially doubling the available miRNA GO curation. Manual review shows that curators agreed with the selected term in 87% of cases, the model's reasoning in 92% of cases, and the extracted evidence in 93%. The integration of reasoning traces provides transparent justification for annotations that can be reviewed by human curators, addressing a key challenge in adopting AI for scientific curation.

AVAILABILITY AND IMPLEMENTATION: GOFlowLLM is implemented as an automated pipeline that follows expert-designed reasoning frameworks to maintain curation quality. The system is available on GitHub: https://github.com/RNAcentral/GO_Flow_LLM.
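
To make the idea of flowchart-guided curation concrete, the following is a minimal sketch of how a decision flowchart could be walked while recording a reviewable trace. It is not the GOFlowLLM implementation: the node names, questions, and term labels in `FLOWCHART` are hypothetical placeholders rather than the actual GO curation flowchart, and `scripted` is an assumed stand-in for the call that would ask a language model each question about an article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One flowchart step: a yes/no question, or a terminal outcome."""
    question: Optional[str] = None  # None marks a terminal node
    yes: Optional[str] = None       # key of the next node if the answer is yes
    no: Optional[str] = None        # key of the next node if the answer is no
    term: Optional[str] = None      # annotation label on terminal nodes


# Hypothetical flowchart; questions and labels are illustrative only.
FLOWCHART: Dict[str, Node] = {
    "start": Node(
        question="Is a miRNA target experimentally validated?",
        yes="mechanism",
        no="stop",
    ),
    "mechanism": Node(
        question="Is a reduction in target mRNA level shown?",
        yes="mrna",
        no="translation",
    ),
    "mrna": Node(term="placeholder: silencing by mRNA destabilization"),
    "translation": Node(term="placeholder: silencing by inhibition of translation"),
    "stop": Node(term=None),  # no annotation proposed
}

# An answer function returns (decision, supporting evidence) for a question.
Answer = Callable[[str], Tuple[bool, str]]


def walk(
    flowchart: Dict[str, Node], answer: Answer, start: str = "start"
) -> Tuple[Optional[str], List[dict]]:
    """Follow the flowchart from `start`, logging every decision taken."""
    trace: List[dict] = []
    key = start
    while flowchart[key].question is not None:
        node = flowchart[key]
        decision, evidence = answer(node.question)
        trace.append(
            {"question": node.question, "answer": decision, "evidence": evidence}
        )
        key = node.yes if decision else node.no
    return flowchart[key].term, trace


def scripted(answers: List[Tuple[bool, str]]) -> Answer:
    """Stand-in for a model call: replays canned (decision, evidence) pairs."""
    replies = iter(answers)
    return lambda question: next(replies)
```

Calling `walk(FLOWCHART, scripted([(True, "reporter assay"), (False, "no mRNA change")]))` returns the terminal label together with a two-entry trace. Keeping the question, decision, and evidence for each step is what would let a human curator review the term, the reasoning, and the evidence separately, as the evaluation in the abstract does.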

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/397837