R-Gada: a fast and flexible pipeline for copy number analysis in association studies.

BMC Bioinformatics 2010 Jul 16;11:380. Epub 2010 Jul 16.

Signal and Image Processing Institute, Viterbi School of Engineering, University of Southern California, EEB 400, 3740 McClintock Ave, Los Angeles, CA 90089-2564, USA.

Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association.

Results: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the copy number calls, identifying recurrent CNVs, performing multivariate analysis of population structure, and carrying out association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset), we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (arrays with 3 million probes) on a single-core computer, and supports flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis.

Conclusions: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.
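To illustrate the final step of such a pipeline, statistical association between a CNV and case-control status, here is a minimal Python sketch. It is not the package's API or the authors' method: the function name `fisher_exact_two_sided` and the carrier counts are hypothetical, and Fisher's exact test is used only as one simple, generic choice for testing whether carriers of a given CNV are over-represented among cases.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption for this sketch):
      a = cases carrying the CNV,    b = cases without it,
      c = controls carrying the CNV, d = controls without it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers among the cases,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Made-up counts for illustration only: 12 of 50 cases and 3 of 50 controls
# carry a gain at some region.
p_value = fisher_exact_two_sided(12, 38, 3, 47)
print(f"p-value: {p_value:.4g}")
```

In a genome-wide setting, a test of this kind would be repeated for every CNV region and followed by a multiple-testing correction.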

Source
DOI: http://dx.doi.org/10.1186/1471-2105-11-380
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2915992
