High-Performance Framework to Analyze Microarray Data

Marozzo F.; Belcastro L.
2022

Abstract

Pharmacogenomics is an important research field that studies how patients' genetic variation affects drug response, looking for correlations between single nucleotide polymorphisms (SNPs) in the patient genome and drug toxicity or efficacy. Because of the large number of available samples and the high resolution of the instruments, microarray platforms produce huge amounts of SNP data. Analyzing such data and finding correlations in a reasonable time requires high-performance computing solutions. Cloud4SNP is a bioinformatics tool, based on the Data Mining Cloud Framework (DMCF), for the parallel preprocessing and statistical analysis of SNP pharmacogenomics microarray data. This work describes how Cloud4SNP has been extended to execute applications on Apache Spark, which provides faster execution times for iterative and batch processing. The experimental evaluation shows that Cloud4SNP is able to exploit the high-performance features of Apache Spark, obtaining faster execution times and a high level of scalability, with an overall speedup very close to linear.
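The abstract does not detail Cloud4SNP's statistics, so the following is only a minimal sketch of the kind of per-SNP workload it describes, under stated assumptions: preprocessing is modeled as a call-rate filter, and the statistical analysis as a two-sided Fisher's exact test (chosen here for illustration) on a 2x2 table of "carries the variant allele" versus "showed toxicity". The genotype encoding, the 0.9 threshold, the function names and the SNP IDs are all hypothetical. Because each SNP is tested independently, the per-SNP function is the unit a framework such as Spark could distribute, for example with `SparkContext.parallelize` followed by `RDD.map`; the sketch itself uses a plain loop so it runs without Spark.

```python
from math import comb
from typing import Dict, List, Optional

# Hypothetical encoding: one call per patient; True = carries the variant allele,
# False = wild type, None = missing call.
Calls = List[Optional[bool]]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals (carriers, non-carriers)
    c1 = a + c                     # column total (patients with toxicity)
    denom = comb(r1 + r2, c1)

    def prob(x: int) -> float:
        # Hypergeometric probability of a table with top-left cell x, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def analyze_snp(calls: Calls, toxic: List[bool], min_call_rate: float = 0.9) -> Optional[float]:
    """Preprocess and test one SNP; returns None if the SNP is filtered out."""
    called = [(g, t) for g, t in zip(calls, toxic) if g is not None]
    # Preprocessing (assumed): drop SNPs with too many missing calls.
    if not calls or len(called) / len(calls) < min_call_rate:
        return None
    a = sum(1 for g, t in called if g and t)
    b = sum(1 for g, t in called if g and not t)
    c = sum(1 for g, t in called if not g and t)
    d = sum(1 for g, t in called if not g and not t)
    return fisher_exact_two_sided(a, b, c, d)


def analyze_all(snps: Dict[str, Calls], toxic: List[bool]) -> Dict[str, float]:
    """Apply the per-SNP analysis to every SNP, keeping only those that pass the filter."""
    results: Dict[str, float] = {}
    for snp_id, calls in snps.items():  # independent iterations: easy to parallelize
        p = analyze_snp(calls, toxic)
        if p is not None:
            results[snp_id] = p
    return results


if __name__ == "__main__":
    toxic = [True, True, True, False, True, False, False, False]
    snps = {
        "rs_A": [True, True, True, True, False, False, False, False],
        "rs_B": [True, None, None, False, True, False, True, False],  # call rate 6/8
    }
    print(analyze_all(snps, toxic))  # rs_B is filtered out by the call-rate check
```

Keeping the per-SNP logic in a pure function with no shared state is what makes the data-parallel mapping straightforward; a real pipeline would also correct the resulting p-values for multiple testing across SNPs.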
ISBN: 978-1-0716-1838-7
ISBN: 978-1-0716-1839-4
Keywords: Cloud computing; Pharmacogenomics; Single nucleotide polymorphisms; Statistical analysis
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/328248