A novel similarity-measure for the analysis of genetic data in complex phenotypes

MONTESANTO, Alberto;CONFORTI, Domenico;ROSE, Giuseppina;PASSARINO, Giuseppe
2009-01-01

Abstract

Background: Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not kept pace. In particular, the Machine Learning literature contains relatively few papers focused on the development and application of data mining methods for the analysis of genetic variability. Moreover, these papers apply to genetic data procedures that were developed for a different kind of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls), taking into account that genetic profiles are usually distributed in a population group according to the Hardy-Weinberg equilibrium.

Results: We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped for numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel". The effectiveness of the Hardy-Weinberg kernel was compared to the performance of the well-established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments where we used either simulated data or real data.

Conclusion: The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the best performance of the "Hardy-Weinberg kernel" is observed when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits might be a very important and useful feature, as most of the current statistical tools lose most of their statistical power when rare genotypes are involved in the susceptibility to the trait under study.
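The abstract does not give the formula of the Hardy-Weinberg kernel, so the following is only a minimal sketch of the general idea of weighting genotypes by their expected frequency at Hardy-Weinberg equilibrium. The Hardy-Weinberg proportions themselves (p², 2pq, q²) are standard; everything else is an assumption made for illustration and is not the authors' definition: the 0/1/2 genotype coding, the function names, and in particular the weighting rule (a shared genotype contributes one minus its expected frequency, so shared rare genotypes count more).

```python
def allele_frequency(genotypes, locus):
    """Estimate the frequency of the counted allele at one locus.

    Assumed coding: each genotype is the number of copies (0, 1 or 2)
    of the counted allele carried by a subject at that locus.
    """
    copies = [subject[locus] for subject in genotypes]
    return sum(copies) / (2 * len(copies))


def hwe_frequencies(q):
    """Expected frequencies of genotypes with 0, 1 and 2 copies of an
    allele of frequency q, under Hardy-Weinberg equilibrium."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def rarity_weighted_similarity(x, y, hwe_per_locus):
    """Hypothetical similarity between two multi-locus profiles.

    A locus contributes only when both profiles carry the same genotype,
    and the contribution is 1 minus the expected frequency of that
    genotype. This weighting is an illustrative assumption, not the
    kernel defined in the paper.
    """
    total = 0.0
    for locus, (gx, gy) in enumerate(zip(x, y)):
        if gx == gy:
            total += 1.0 - hwe_per_locus[locus][gx]
    return total


# Toy sample: four subjects genotyped at two loci (made-up values).
sample = [[0, 0], [0, 1], [1, 2], [1, 1]]
hwe_per_locus = [
    hwe_frequencies(allele_frequency(sample, locus)) for locus in range(2)
]
similarity = rarity_weighted_similarity([0, 2], [1, 2], hwe_per_locus)
```

Under this assumed rule, two subjects who share a rare genotype score higher than two who share a common one, which is the qualitative behaviour the abstract attributes to the kernel.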
Keywords: Similarity; Genetic analysis; Kernel

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/130484