Foundation and Multimodal Models for Drug Discovery in Molecular Informatics: Principles, Evaluation, and Practical Guidance

Pastore, Emmanuel Pio; De Rango, Francesco
2026-01-01

Abstract

Foundation and multimodal models are rapidly becoming a core methodology in molecular informatics, particularly for drug discovery, by leveraging large-scale pretraining across sequences, graphs, 3D structures, and text. This mini-review provides practical guidance on when these models help, how to choose representations and data, and how to design pretraining and adaptation pipelines for real-world use. We clarify what qualifies as a foundation model in chemistry; compare chemical language models, graph-based architectures, and 3D equivariant networks; review multimodal strategies that connect molecules with proteins, pockets, and natural language; and summarize diffusion-based generative modeling. We also emphasize rigorous evaluation, discussing realistic splitting protocols, distribution shift, activity cliffs, uncertainty calibration, and conformal prediction in the context of widely used benchmarks.
Keywords: benchmarking; chemical language models; diffusion models; drug discovery; foundation models; multimodal learning
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/402068