Genome analysis with inter-nucleotide distances.

Bioinformatics 2009 Dec 16;25(23):3064-70. Epub 2009 Sep 16.

Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal.

Motivation: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme that is directly related to DNA characteristics and useful in discriminating between different species. Mathematical models that explore DNA correlation structures may contribute to a better understanding of DNA and to a concise description of it.

Results: We developed a methodology to process DNA sequences based on inter-nucleotide distances. Our main contribution is a method to obtain genomic signatures for complete genomes, based on the inter-nucleotide distances, that are able to discriminate between different species. Using these signatures and hierarchical clustering, it is possible to build phylogenetic trees. Phylogenetic trees lead to genome differentiation and allow the inference of phylogenetic relations. The phylogenetic trees generated in this work display related species close to each other, suggesting that the inter-nucleotide distances are able to capture essential information about the genomes. To create the genomic signature, we construct a vector which describes the inter-nucleotide distance distribution of a complete genome and compare it with the reference distance distribution, which is the distribution of a sequence where the nucleotides are placed randomly and independently. It is the residual or relative error between the data and the reference distribution that is used to compare the DNA sequences of different organisms.
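The signature construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the inter-nucleotide distance for a symbol is the gap between consecutive occurrences of that symbol, that the reference distribution for randomly and independently placed nucleotides is geometric with parameter p equal to the symbol's observed frequency, and that the signature is the relative error truncated at a hypothetical cutoff `max_k`. All function names are illustrative.

```python
from collections import Counter


def inter_nucleotide_distances(seq, symbol):
    """Gaps between consecutive occurrences of `symbol` in `seq`."""
    positions = [i for i, c in enumerate(seq) if c == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(seq, symbol, max_k):
    """Relative frequency of distances 1..max_k for one nucleotide."""
    dists = inter_nucleotide_distances(seq, symbol)
    counts = Counter(dists)
    total = len(dists)
    if total == 0:
        return [0.0] * max_k
    return [counts.get(k, 0) / total for k in range(1, max_k + 1)]


def reference_distribution(p, max_k):
    """Assumed reference: geometric law P(k) = p * (1 - p)**(k - 1)."""
    return [p * (1 - p) ** (k - 1) for k in range(1, max_k + 1)]


def signature(seq, max_k=10):
    """Concatenate the relative errors for A, C, G and T into one vector."""
    sig = []
    for symbol in "ACGT":
        p = seq.count(symbol) / len(seq)
        if not 0 < p < 1:
            # Reference is degenerate for this symbol; pad with zeros.
            sig.extend([0.0] * max_k)
            continue
        observed = distance_distribution(seq, symbol, max_k)
        reference = reference_distribution(p, max_k)
        sig.extend((o - r) / r for o, r in zip(observed, reference))
    return sig
```

Calling `signature(genome_string)` on each genome yields vectors of equal length (4 × `max_k`), which could then be passed to any standard hierarchical clustering routine to build a tree; the choice of distance metric and linkage is not specified in the abstract.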

Source: http://bioinformatics.oxfordjournals.org/content/25/23/3064.
DOI: http://dx.doi.org/10.1093/bioinformatics/btp546
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778338

