Bioinformatics Applied to Proteomics

Cristoni, S; Mazzuca, Silvia

Proteomics is a fundamental science toward which many research groups worldwide are directing their efforts. Proteins play a key role in biological function, and studying them makes it possible to understand the mechanisms underlying many biological events (human or animal diseases, factors that influence plant and bacterial growth). Because the investigative approach is complex and involves various technologies, large amounts of data are produced. In fact, proteomics has evolved rapidly and is now in a phase of unparalleled growth, reflected in the amount of data generated by each experiment. This approach has provided, for the first time, unprecedented opportunities to address the biology of humans, animals, plants and micro-organisms at the system level. Bioinformatics applied to proteomics provides the management, processing and integration of these huge amounts of data, and it is with this philosophy that this chapter was born. The role of bioinformatics is therefore fundamental to reduce analysis time and to provide statistically significant results. To process data efficiently, new software packages and algorithms are continuously being developed to improve protein identification, characterization and quantification in terms of throughput and statistical accuracy. However, many limitations remain in the bioinformatic processing of spectral data. In particular, the analysis of plant proteins requires extensive data processing because structural information is lacking in public proteomic and genomic databases. The main focus of this chapter is to describe in detail the status of bioinformatics applied to proteomic studies. Moreover, the processing strategies and algorithms adopted to overcome the well-known limitations of protein analysis without database structural information are described.
This chapter sheds light on recent developments in bioinformatic and data-mining approaches, and on their limitations when applied to proteomic data sets, in order to reinforce the interdependence between proteomic technologies and bioinformatics tools. Proteomic studies involve the identification and the qualitative and quantitative comparison of proteins expressed under different conditions, together with a description of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of the data generated by these studies requires improved bioinformatics tools and data-mining approaches for efficient and accurate analysis of various biological systems (for reviews see Li et al, 2009; Matthiesen & Jensen, 2008; Wright et al, 2009). After a brief overview of the broad field of genomic and proteomic sciences, in which bioinformatics finds its widest applications to the study of biological systems, the chapter focuses on mass spectrometry, which has become the prominent analytical method for the study of proteins and proteomes in the post-genome era. The high volumes of complex spectra and data generated by such experiments represent new challenges for the field of bioinformatics. The past decade has seen an explosion of informatics tools for the processing, analysis, storage and integration of mass spectrometry based proteomic data. In this chapter, some of the more recent developments in proteome informatics are discussed. These include new tools for predicting the properties of proteins and peptides, which can be exploited in experimental proteomic design, and tools for identifying peptides and proteins from their mass spectra. The informatics approaches required for the move towards quantitative proteomics are also briefly discussed.
Finally, the growing number of proteomic data repositories and the emerging data standards developed for the field are highlighted. These tools and technologies point the way towards the next phase of experimental and informatics challenges that the proteomics community will face. Statistics is now an essential component for understanding these vast datasets, and its importance is emphasized throughout the text. The majority of the chapter is devoted to the description of bioinformatics technologies (hardware, data management and applications), with particular emphasis on the bioinformatics improvements that have made it possible to obtain significant results in proteomics. Particular attention is given to the emerging statistical, semantic and network learning technologies, as well as to data sharing, which is the essential core of systems biology data processing. Many examples of bioinformatics applied to biological systems are distributed across the sections of the chapter, so as to help the reader fully understand the benefits of bioinformatics applied to systems biology.