Discovery of regular domains in large DNA data sets
Bertacchini, Francesca; Bilotta, Eleonora; Pantano, Pietro
2017-01-01
Abstract
To analyze large DNA data sets, we hypothesized that the organization of repeated bases within DNA follows rules similar to those of Cellular Automata (CA). These sequences could be defined as regular domains. By treating DNA strings as a finite one-dimensional cellular automaton, consisting of a finite (countable) set of cells aligned on a straight line, and adopting a color code that transforms the DNA bases (A, C, T, G) into numbers, we analyzed DNA strings within the framework of computational mechanics. In this framework, a regular domain is a space-time region consisting of sequences in the same regular language (the particular rule of system evolution, which gives rise to a formal language) that creates patterns that are computationally homogeneous and simple to describe. We discovered that regular domains exist. The results revealed the exact number of strings of given lengths, establishing the upper limit on their length, their precise locations in all the human chromosomes, and their complex numerical organization. Furthermore, the distribution of these domains is neither random, nor chaotic, nor probabilistic; instead, there are numeric attractors around which the number of these domains is distributed. This leads us to think that all these domains within the DNA are connected to each other and cannot be randomly distributed, but follow some combinatorial rules.
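The encoding and counting steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the source: the numeric color code (A=0, C=1, G=2, T=3) is hypothetical, the function names are illustrative, and a "regular domain" is simplified here to a maximal run of one repeated base, which may be narrower than the regular-language definition used by the authors. Ambiguous symbols such as N are not handled.

```python
from collections import Counter
from itertools import groupby

# Hypothetical color code; the source does not state the actual mapping.
COLOR_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(dna):
    """Map a DNA string to a list of integers, one cell per base."""
    return [COLOR_CODE[base] for base in dna.upper()]


def find_runs(cells, min_length=2):
    """Return (start, length, value) for each maximal run of one repeated value.

    Assumption: a run of identical cells stands in for a regular domain.
    """
    runs = []
    position = 0
    for value, group in groupby(cells):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, length, value))
        position += length
    return runs


def count_by_length(runs):
    """Count how many runs exist for each run length."""
    return Counter(length for _, length, _ in runs)
```

For the toy string `AAACCGTTTT`, `find_runs(encode("AAACCGTTTT"))` yields three runs: three A's starting at position 0, two C's at position 3, and four T's at position 6. Tallying such runs by length over a whole chromosome gives the kind of per-length counts and locations the abstract refers to.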