Modeling read counts for CNV detection in exome sequencing data.

Stat Appl Genet Mol Biol 2011 Nov 8;10(1). Epub 2011 Nov 8.

Max Planck Institute for Molecular Genetics.

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, which makes CNVs harder to identify. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project, identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
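
The general idea of decoding copy number from read counts with a hidden Markov model can be illustrated with a minimal sketch. This is not the exomeCopy implementation: the Poisson emission model, the fixed transition probability, the depth ratios per state, and all names (`STATES`, `RATIO`, `viterbi_copy_number`, `stay_prob`) are assumptions chosen for illustration and are not taken from the abstract. The sketch uses only a per-target background depth as a covariate and leaves out GC-content.

```python
import math

# Hypothetical copy-number states (illustrative, not from the paper).
STATES = [0, 1, 2, 3]
# Assumed expected depth relative to a diploid background; state 0 keeps a
# small non-zero ratio so that stray reads do not get zero probability.
RATIO = {0: 0.01, 1: 0.5, 2: 1.0, 3: 1.5}


def poisson_logpmf(k, mu):
    """Log probability of observing k reads under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, stay_prob=0.99):
    """Return the most likely copy-number state for each target.

    counts:     observed read counts per target in the sample
    background: expected diploid read depth per target (e.g. from controls)
    stay_prob:  assumed probability of keeping the same state between targets
    """
    n = len(STATES)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def emit(t, state):
        mu = max(background[t] * RATIO[state], 1e-9)
        return poisson_logpmf(counts[t], mu)

    # Uniform prior over states at the first target.
    score = [math.log(1.0 / n) + emit(0, s) for s in STATES]
    backpointers = []

    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            # Best previous state for arriving in state j at target t.
            candidates = [
                score[i] + (log_stay if i == j else log_switch) for i in range(n)
            ]
            best_i = max(range(n), key=lambda i: candidates[i])
            new_score.append(candidates[best_i] + emit(t, s))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the final target.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [STATES[j] for j in path]


# Toy data: 25 targets with background depth 100 and a five-target
# region at half depth, as a heterozygous deletion might produce.
background = [100] * 25
counts = [100] * 10 + [50] * 5 + [100] * 10
calls = viterbi_copy_number(counts, background)
```

In this toy setting the half-depth region is long enough that the gain in emission likelihood outweighs the cost of two state changes, so the decoded path drops to one copy there. With noisier counts or fewer targets, the choice of `stay_prob` and of the emission distribution would matter much more than it does here.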

Source
DOI: http://dx.doi.org/10.2202/1544-6115.1732
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3517018