DNA word analysis based on the distribution of the distances between symmetric words.

Sci Rep 2017 Apr 7;7(1):728. Epub 2017 Apr 7.

Department of Mathematics & CIDMA, University of Aveiro, Aveiro, Portugal.

We address the problem of discovering pairs of symmetric genomic words (i.e., words and their corresponding reversed complements) that occur at overrepresented distances. For this purpose, we developed new procedures to identify symmetric word pairs with an uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome and a version with known repetitive sequences masked out. We report several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
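To make the notion of a distance distribution between symmetric words concrete, the following is a minimal sketch in Python, not the procedure used in the paper. It assumes that the distance is measured from the start of each occurrence of a word to the start of the nearest following occurrence of its reversed complement; that definition, and the function names `reverse_complement`, `find_positions` and `symmetric_distance_counts`, are illustrative assumptions.

```python
import bisect
from collections import Counter

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reversed complement of a DNA word (e.g. AAC -> GTT)."""
    return word.translate(COMPLEMENT)[::-1]


def find_positions(sequence, word):
    """Return start positions of all occurrences of word, overlaps included."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence, word):
    """Count distances from each occurrence of word to the nearest
    following occurrence of its reversed complement.

    Assumption: distance = difference between start positions. The
    paper's exact definition may differ.
    """
    rc_positions = find_positions(sequence, reverse_complement(word))
    counts = Counter()
    for p in find_positions(sequence, word):
        # Index of the first reversed-complement start strictly after p.
        i = bisect.bisect_right(rc_positions, p)
        if i < len(rc_positions):
            counts[rc_positions[i] - p] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    print(symmetric_distance_counts("AACGGTTAACGTT", "AAC"))
```

Comparing such empirical counts with the counts expected under a reference model is what would flag a distance as overrepresented; that statistical step is not shown here.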


Source
DOI: http://dx.doi.org/10.1038/s41598-017-00646-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428789
April 2017


