The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

GigaScience 2016 Oct 11;5(1):42. Epub 2016 Oct 11.

Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA.

Background: Since the completion of the Human Genome Project in 2003, an estimated 200,000 or more individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and therefore miss an important aspect of genome biology. In addition, much of the genomic data is not publicly available and lacks phenotypic information.

Findings: As part of the Personal Genome Project (PGP), blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read (LFR) technology. Here, we present the experimental whole-genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest-quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality rare variants not previously identified in the Single Nucleotide Polymorphism Database (dbSNP) or in the 1000 Genomes Project Phase 3 data.
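Average read coverage depth is the total number of sequenced bases divided by the length of the genome. The short sketch below works through the "100X is roughly three-fold deeper" comparison. The genome length (about 3.1 Gb), the 100-bp read length, and the 30X baseline standing in for "most whole human genome assemblies" are illustrative assumptions chosen here, not values reported in this paper.

```python
# Minimal sketch of average coverage depth: C = (N * L) / G,
# where N = number of reads, L = read length (bp), G = genome length (bp).
# All numeric inputs below are hypothetical, for illustration only.

HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome length


def mean_coverage(num_reads: int, read_length_bp: int,
                  genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Return average depth: total sequenced bases divided by genome length."""
    return num_reads * read_length_bp / genome_bp


def fold_difference(depth: float, baseline_depth: float) -> float:
    """Return how many times deeper one dataset is than a baseline."""
    return depth / baseline_depth


if __name__ == "__main__":
    # 3.1 billion hypothetical 100-bp reads over a 3.1-Gb genome give 100X.
    depth = mean_coverage(3_100_000_000, 100)
    print(f"{depth:.1f}X")  # 100.0X
    # Compared with an assumed 30X baseline, 100X is about 3.3-fold deeper.
    print(f"{fold_difference(depth, 30.0):.1f}-fold")  # 3.3-fold
```

With these assumed inputs the ratio comes out a little above three, consistent with the "approximately three-fold" wording; a different baseline depth would shift the figure accordingly.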

Conclusions: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

Source
DOI: http://dx.doi.org/10.1186/s13742-016-0148-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5057367
October 2016
