Discovery of regular domains in large DNA data sets

Bertacchini, Francesca; Bilotta, Eleonora; Pantano, Pietro
2017-01-01

Abstract

To analyze large DNA data sets, we hypothesized that the organization of repeated bases within DNA follows rules similar to those of Cellular Automata (CA). These sequences can be defined as regular domains. We treated DNA strings as a finite one-dimensional cellular automaton, consisting of a finite (countable) set of cells aligned on a straight line. We also adopted a color code that transforms the DNA bases (A, C, T, G) into numbers. On this basis, we analyzed DNA strings within the framework of computational mechanics. In this framework, a regular domain is a space-time region consisting of sequences in the same regular language (the particular rule of system evolution, which gives rise to a formal language) that creates patterns that are computationally homogeneous and simple to describe. We found that regular domains exist. The results revealed the exact number of strings of given lengths, establishing their length limit, their precise locations in all human chromosomes, and their complex numerical organization. Furthermore, the distribution of these domains is neither random, chaotic, nor probabilistic. Instead, there are numeric attractors around which the number of these domains is distributed. This leads us to think that all these domains within the DNA are connected to each other and cannot be randomly distributed, but instead follow some combinatorial rules.
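The abstract does not give the actual color code or the detection procedure. The sketch below is only a minimal illustration of the general idea under stated assumptions. It makes three simplifications. First, it uses a hypothetical mapping A=0, C=1, G=2, T=3. Second, it treats the encoded string as the cells of a 1-D lattice. Third, as a simplified stand-in for a "regular domain", it finds maximal segments that repeat a single word of fixed length `period`; such periodic words are among the simplest regular languages. It then counts how many segments occur at each length. The names `BASE_CODE`, `encode`, `periodic_runs`, and `length_histogram` are hypothetical. This is not the authors' method, which in computational mechanics defines domains by regular languages more generally.

```python
from collections import Counter

# Hypothetical base-to-state code (assumption): the abstract does not
# specify the color code actually used by the authors.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> list[int]:
    """Map a DNA string to integer cell states on a 1-D lattice."""
    return [BASE_CODE[base] for base in seq.upper()]


def periodic_runs(cells: list[int], period: int, min_len: int) -> list[tuple[int, int]]:
    """Return (start, length) for each maximal segment with the given period.

    A segment [s, e) has period p when cells[i] == cells[i - p] for every
    i in [s + p, e): one word of length p repeated, possibly with a
    truncated final copy. Only segments of at least min_len cells are kept.
    """
    runs = []
    n = len(cells)
    i = period
    while i < n:
        if cells[i] != cells[i - period]:
            i += 1
            continue
        # Extend a block of consecutive matches beginning at index i.
        first = i
        while i < n and cells[i] == cells[i - period]:
            i += 1
        start = first - period
        length = i - start
        if length >= min_len:
            runs.append((start, length))
    return runs


def length_histogram(seq: str, period: int, min_len: int) -> Counter:
    """Count how many periodic segments occur at each length."""
    cells = encode(seq)
    return Counter(length for _, length in periodic_runs(cells, period, min_len))


if __name__ == "__main__":
    toy = "GATTTTTACACACAGGGGCAT"
    print(periodic_runs(encode(toy), period=1, min_len=4))  # [(2, 5), (14, 4)]
    print(periodic_runs(encode(toy), period=2, min_len=6))  # [(7, 7)]
```

In the toy string, period 1 picks out the homopolymer runs `TTTTT` and `GGGG`. Period 2 picks out the dinucleotide repeat `ACACACA`. Applying `length_histogram` over whole chromosomes would give per-length counts of the kind the abstract reports, though only for this simplified definition of a domain.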
2017
9781450347228
Cellular Automata; DNA automatic segmentation; DNA regular domains; Software; Biomedical Engineering; Health Informatics; Computer Science Applications; Computer Vision and Pattern Recognition

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/267675
Warning! The data displayed have not been validated by the university.
