A Self-Supervised Framework for Predicting the Efficacy of Anti-Cancer Natural Compounds via Transcriptional Response Embeddings

Pezzi V.; Sirianni R.
2025-01-01

Abstract

Natural products hold particular significance in oncological drug discovery, yet traditional research approaches face substantial logistical and economic obstacles. This study presents a foundational computational framework designed to predict the transcriptional responses (changes in gene expression) of cancer cell lines to natural compound treatments from baseline gene expression profiles. We developed a systematic three-step pipeline using 599 experiments from the LINCS dataset, encompassing 11 natural compounds across 61 cell lines at varying dosages and time points. In Step 1, we implemented comprehensive data reprocessing. Step 2 employs self-supervised learning through a PCA-based encoder-decoder architecture to create biologically meaningful low-dimensional embeddings of treatment responses. Feature selection reduced the dataset from 12,328 to 3,082 genes while preserving approximately 90% of total variance, and the resulting 50-dimensional latent space captures essential biological variables, including compound specificity, dosage effects, and temporal dynamics. A linear Ridge decoder reconstructs gene-level responses from this latent space. Reconstruction performance is promising, with an R² of 0.66 and a mean gene correlation of 0.81, indicating that biological information can be recovered from compact embedding representations. These findings establish the infrastructure for the ultimate goal: enabling researchers to computationally simulate natural compound effects before conducting expensive, time-consuming laboratory experiments. With these foundational steps in place, the framework is positioned for future development of predictive modeling capabilities that could enhance the discovery of new applications of natural product therapeutics in cancer.
bio-activity prediction
cancer drug discovery
deep learning
gene expression embeddings
natural products
self-supervised learning
transcriptomics
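
The encoder-decoder idea described in the abstract (feature selection, a PCA encoder, a linear Ridge decoder, then reconstruction metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the sizes are scaled down (400 genes reduced to 100, a 10-dimensional latent space), ranking genes by variance is an assumed selection criterion, the pooled R² definition is an assumption, and every function name is hypothetical.

```python
import numpy as np

def select_top_variance_genes(X, n_keep):
    """Return sorted column indices of the n_keep highest-variance genes (assumed criterion)."""
    variances = X.var(axis=0)
    keep = np.argsort(variances)[::-1][:n_keep]
    return np.sort(keep)

def fit_pca_encoder(X, n_components):
    """Fit PCA via SVD; return the gene means and the top principal axes (k x genes)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(X, mean, components):
    """Project centered profiles onto the principal axes to get embeddings."""
    return (X - mean) @ components.T

def fit_ridge_decoder(Z, Y, alpha=1.0):
    """Closed-form ridge regression from embeddings Z to gene-level responses Y."""
    z_mean, y_mean = Z.mean(axis=0), Y.mean(axis=0)
    Zc, Yc = Z - z_mean, Y - y_mean
    W = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ Yc)
    b = y_mean - z_mean @ W  # intercept is not penalized
    return W, b

def pooled_r2(Y, Y_hat):
    """R^2 pooled over all genes: 1 - residual sum of squares / total sum of squares."""
    ss_res = ((Y - Y_hat) ** 2).sum()
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

def mean_gene_correlation(Y, Y_hat):
    """Average of the per-gene Pearson correlations between true and reconstructed values."""
    Yc = Y - Y.mean(axis=0)
    Hc = Y_hat - Y_hat.mean(axis=0)
    denom = np.sqrt((Yc ** 2).sum(axis=0) * (Hc ** 2).sum(axis=0))
    return float(np.mean((Yc * Hc).sum(axis=0) / denom))

# Synthetic stand-in for a response matrix (experiments x genes): low-rank signal plus noise.
rng = np.random.default_rng(0)
n_exp, n_genes, n_keep, n_latent = 599, 400, 100, 10
factors = rng.normal(size=(n_exp, 5))
loadings = rng.normal(size=(5, n_genes)) * rng.uniform(0.1, 2.0, size=n_genes)
X = factors @ loadings + 0.3 * rng.normal(size=(n_exp, n_genes))

# Feature selection, and the share of total variance the kept genes retain.
keep = select_top_variance_genes(X, n_keep)
X_sel = X[:, keep]
retained = float(X_sel.var(axis=0).sum() / X.var(axis=0).sum())

# Fit encoder and decoder on a training split, evaluate on held-out experiments.
train, test = np.arange(0, 480), np.arange(480, n_exp)
mean, comps = fit_pca_encoder(X_sel[train], n_latent)
Z_train = encode(X_sel[train], mean, comps)
Z_test = encode(X_sel[test], mean, comps)
W, b = fit_ridge_decoder(Z_train, X_sel[train], alpha=1.0)
X_hat = Z_test @ W + b

r2 = pooled_r2(X_sel[test], X_hat)
gene_corr = mean_gene_correlation(X_sel[test], X_hat)
print(f"variance retained: {retained:.2f}, R^2: {r2:.2f}, mean gene corr: {gene_corr:.2f}")
```

The numbers this prints describe the synthetic data only and are not comparable to the results reported in the abstract. Evaluating on held-out experiments, as here, is one possible design choice; the abstract does not state how the reported metrics were computed.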
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/398618