Resolving the complexity of the human genome using single-molecule sequencing.

Nature 2015 Jan 29;517(7536):608-11. Epub 2014 Nov 10.

[1] Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.
[2] Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation are still poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


Full text:
https://www.nature.com/articles/nature13907 (publisher site)
https://www.nature.com/articles/nature13907.pdf (PDF)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4317254/ (PubMed Central, PMC4317254)
DOI: http://dx.doi.org/10.1038/nature13907
January 2015

