J Theor Biol 2011 Apr 2;275(1):52-8. Epub 2011 Feb 2.
Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal.
DNA may be represented by sequences of four symbols, but it is often useful to convert those symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but most of them seem to be unrelated to any intrinsic characteristic of DNA. The objective of this work was to study a mapping scheme that is directly related to DNA characteristics, and that could be useful in discriminating between different species. Recently, we have proposed a methodology based on the inter-nucleotide distance, which proved to contribute to the discrimination among species. In this paper, we introduce a new distance, the distance to the nearest dissimilar nucleotide, which is the distance of a nucleotide to first occurrence of a different nucleotide. This distance is related to the repetition structure of single nucleotides. Using the information resulting from the concatenation of the distance to the nearest dissimilar and the inter-nucleotide distance, we found that this new distance brings additional discriminative capabilities. This suggests that the distance to the nearest dissimilar nucleotide might contribute with useful information about the evolution of the species.