Publications by authors named "Stephanie M Gogarten"

44 Publications

Association of clonal hematopoiesis with chronic obstructive pulmonary disease.

Blood 2022 Jan;139(3):357-368

Center for Public Health Genomics and.

Chronic obstructive pulmonary disease (COPD) is associated with age and smoking, but other determinants of the disease are incompletely understood. Clonal hematopoiesis of indeterminate potential (CHIP) is a common, age-related state in which somatic mutations in clonal blood populations induce aberrant inflammatory responses. Patients with CHIP have an elevated risk for cardiovascular disease, but the association of CHIP with COPD remains unclear. We analyzed whole-genome sequencing and whole-exome sequencing data to detect CHIP in 48 835 patients, of whom 8444 had moderate to very severe COPD, from four separate cohorts with COPD phenotyping and smoking history. We measured emphysema in murine models in which Tet2 was deleted in hematopoietic cells. In the COPDGene cohort, individuals with CHIP had risks of moderate-to-severe, severe, or very severe COPD that were 1.6 (adjusted 95% confidence interval [CI], 1.1-2.2) and 2.2 (adjusted 95% CI, 1.5-3.2) times greater than those for noncarriers. These findings were consistently observed in three additional cohorts and meta-analyses of all patients. CHIP was also associated with decreased FEV1% predicted in the COPDGene cohort (mean between-group differences, -5.7%; adjusted 95% CI, -8.8% to -2.6%), a finding replicated in additional cohorts. Smoke exposure was associated with a small but significant increased risk of having CHIP (odds ratio, 1.03 per 10 pack-years; 95% CI, 1.01-1.05 per 10 pack-years) in the meta-analysis of all patients. Inactivation of Tet2 in mouse hematopoietic cells exacerbated the development of emphysema and inflammation in models of cigarette smoke exposure. Somatic mutations in blood cells are associated with the development and severity of COPD, independent of age and cumulative smoke exposure.

Source
http://dx.doi.org/10.1182/blood.2021013531 (DOI)
January 2022

BinomiRare: A robust test for association of a rare genetic variant with a binary outcome for mixed models and any case-control proportion.

HGG Adv 2021 Jul 12;2(3). Epub 2021 Jun 12.

Framingham Heart Study, Framingham, MA, USA.

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing tests of the association of a rare variant with a binary outcome in the presence of correlated data control the type 1 error where there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, and develop the Conway-Maxwell-Poisson (CMP) test and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.
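The carriers-only idea described above (estimate each person's outcome probability under the null, then test within carriers of the rare allele) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the number of cases among carriers is compared against a Poisson-binomial distribution built from the null probabilities, it uses a plain exact two-sided p-value, and the function names and input probabilities are hypothetical. In the paper the null probabilities come from a fitted mixed model, which is not reproduced here.

```python
def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)] where K is a sum of independent Bernoulli(p_i)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this carrier is a control
            nxt[k + 1] += mass * p       # this carrier is a case
        pmf = nxt
    return pmf


def carriers_only_pvalue(null_probs, observed_cases):
    """Exact two-sided p-value: total mass of case counts no more likely than the observed one.

    null_probs: hypothetical per-carrier outcome probabilities under the null.
    observed_cases: number of carriers who are cases.
    """
    pmf = poisson_binomial_pmf(null_probs)
    p_obs = pmf[observed_cases]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(m for m in pmf if m <= p_obs * (1 + 1e-9)))


if __name__ == "__main__":
    # Five carriers, each with a 10% null probability of being a case; three are cases.
    print(carriers_only_pvalue([0.1] * 5, 3))
```

With five carriers at a 10% null probability and three observed cases, the p-value is about 0.0086. Because only carriers enter the calculation, the cost scales with the number of carriers rather than with the full sample.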

Source
http://dx.doi.org/10.1016/j.xhgg.2021.100040 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321319 (PMC)
July 2021

Identification of novel and rare variants associated with handgrip strength using whole genome sequence data from the NHLBI Trans-Omics in Precision Medicine (TOPMed) Program.

PLoS One 2021 2;16(7):e0253611. Epub 2021 Jul 2.

Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States of America.

Handgrip strength is a widely used measure of muscle strength and a predictor of a range of morbidities including cardiovascular diseases and all-cause mortality. Previous genome-wide association studies of handgrip strength have focused on common variants primarily in persons of European descent. We aimed to identify rare and ancestry-specific genetic variants associated with handgrip strength by conducting whole-genome sequence association analyses using 13,552 participants from six studies representing diverse population groups from the Trans-Omics in Precision Medicine (TOPMed) Program. By leveraging multiple handgrip strength measures performed in study participants over time, we increased our effective sample size by 7-12%. Single-variant analyses identified ten handgrip strength loci among African-Americans: four rare variants, five low-frequency variants, and one common variant. One significant and four suggestive genes were identified associated with handgrip strength when aggregating rare and functional variants; all associations were ancestry-specific. We additionally leveraged the different ancestries available in the UK Biobank to further explore the ancestry-specific association signals from the single-variant association analyses. In conclusion, our study identified 11 new loci associated with handgrip strength with rare and/or ancestry-specific genetic variations, highlighting the added value of whole-genome sequencing in diverse samples. Several of the associations identified using single-variant or aggregate analyses lie in genes with a function relevant to the brain or muscle or were reported to be associated with muscle or age-related traits. Further studies in samples with sequence data and diverse ancestries are needed to confirm these findings.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253611 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8253404 (PMC)
November 2021

Genome-wide association study of body fat distribution traits in Hispanics/Latinos from the HCHS/SOL.

Hum Mol Genet 2021 11;30(22):2190-2204

Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA 91101, USA.

Central obesity is a leading health concern with a great burden carried by ethnic minority populations, especially Hispanics/Latinos. Genetic factors contribute to the obesity burden overall and to inter-population differences. We aimed to identify the loci associated with central adiposity measured as waist-to-hip ratio (WHR), waist circumference (WC) and hip circumference (HIP) adjusted for body mass index (adjBMI) by using the Hispanic Community Health Study/Study of Latinos (HCHS/SOL); determine whether associations differ by background group within HCHS/SOL; and determine whether previously reported associations generalize to HCHS/SOL. Our analyses included 7472 women and 5200 men of mainland (Mexican, Central and South American) and Caribbean (Puerto Rican, Cuban and Dominican) background residing in the USA. We performed genome-wide association analyses stratified and combined across sexes using linear mixed-model regression. We identified 16 variants for waist-to-hip ratio adjusted for body mass index (WHRadjBMI), 22 for waist circumference adjusted for body mass index (WCadjBMI) and 28 for hip circumference adjusted for body mass index (HIPadjBMI), which reached suggestive significance (P < 1 × 10⁻⁶). Many loci exhibited differences in strength of associations by ethnic background and sex. We brought a total of 66 variants forward for validation in cohorts (N = 34 161) with participants of Hispanic/Latino, African and European descent. We confirmed four novel loci (P < 0.05 and consistent direction of effect, and P < 5 × 10⁻⁸ after meta-analysis), including two for WHRadjBMI (rs13301996, rs79478137); one for WCadjBMI (rs3168072) and one for HIPadjBMI (rs28692724). Also, we generalized previously reported associations to HCHS/SOL (8 for WHRadjBMI, 10 for WCadjBMI and 12 for HIPadjBMI). Our study highlights the importance of large-scale genomic studies in ancestrally diverse Hispanic/Latino populations for identifying and characterizing central obesity susceptibility that may be ancestry-specific.

Source
http://dx.doi.org/10.1093/hmg/ddab166 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8561424 (PMC)
November 2021

Variant-specific inflation factors for assessing population stratification at the phenotypic variance level.

Nat Commun 2021 06 9;12(1):3506. Epub 2021 Jun 9.

Department of Biostatistics, University of Washington, Seattle, WA, USA.

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. Unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely-available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure, in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.

Source
http://dx.doi.org/10.1038/s41467-021-23655-2 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8190158 (PMC)
June 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Source
http://dx.doi.org/10.1038/s41586-021-03205-y (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770 (PMC)
February 2021

Inherited causes of clonal haematopoiesis in 97,691 whole genomes.

Nature 2020 10 14;586(7831):763-768. Epub 2020 Oct 14.

Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA.

Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer and coronary heart disease; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP). Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.

Source
http://dx.doi.org/10.1038/s41586-020-2819-2 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944936 (PMC)
October 2020

Multi-ancestry GWAS of the electrocardiographic PR interval identifies 202 loci underlying cardiac conduction.

Nat Commun 2020 05 21;11(1):2542. Epub 2020 May 21.

Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.

The electrocardiographic PR interval reflects atrioventricular conduction, and is associated with conduction abnormalities, pacemaker implantation, atrial fibrillation (AF), and cardiovascular mortality. Here we report a multi-ancestry (N = 293,051) genome-wide association meta-analysis for the PR interval, discovering 202 loci of which 141 have not previously been reported. Variants at identified loci increase the percentage of heritability explained, from 33.5% to 62.6%. We observe enrichment for cardiac muscle developmental/contractile and cytoskeletal genes, highlighting key regulation processes for atrioventricular conduction. Additionally, 8 loci not previously reported harbor genes underlying inherited arrhythmic syndromes and/or cardiomyopathies suggesting a role for these genes in cardiovascular pathology in the general population. We show that polygenic predisposition to PR interval duration is an endophenotype for cardiovascular disease, including distal conduction disease, AF, and atrioventricular pre-excitation. These findings advance our understanding of the polygenic basis of cardiac conduction, and the genetic relationship between PR interval duration and cardiovascular disease.

Source
http://dx.doi.org/10.1038/s41467-020-15706-x (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7242331 (PMC)
May 2020

Allelic Heterogeneity at the CRP Locus Identified by Whole-Genome Sequencing in Multi-ancestry Cohorts.

Am J Hum Genet 2020 01 26;106(1):112-120. Epub 2019 Dec 26.

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

Whole-genome sequencing (WGS) can improve assessment of low-frequency and rare variants, particularly in non-European populations that have been underrepresented in existing genomic studies. The genetic determinants of C-reactive protein (CRP), a biomarker of chronic inflammation, have been extensively studied, with existing genome-wide association studies (GWASs) conducted in >200,000 individuals of European ancestry. In order to discover novel loci associated with CRP levels, we examined a multi-ancestry population (n = 23,279) with WGS (∼38× coverage) from the Trans-Omics for Precision Medicine (TOPMed) program. We found evidence for eight distinct associations at the CRP locus, including two variants that have not been identified previously (rs11265259 and rs181704186), both of which are non-coding and more common in individuals of African ancestry (∼10% and ∼1% minor allele frequency, respectively, and rare or monomorphic in 1000 Genomes populations of East Asian, South Asian, and European ancestry). We show that the minor (G) allele of rs181704186 is associated with lower CRP levels and decreased transcriptional activity and protein binding in vitro, providing a plausible molecular mechanism for this African ancestry-specific signal. The individuals homozygous for rs181704186-G have a mean CRP level of 0.23 mg/L, in contrast to individuals heterozygous for rs181704186 with mean CRP of 2.97 mg/L and major allele homozygotes with mean CRP of 4.11 mg/L. This study demonstrates the utility of WGS in multi-ethnic populations to drive discovery of complex trait associations of large effect and to identify functional alleles in noncoding regulatory regions.

Source
http://dx.doi.org/10.1016/j.ajhg.2019.12.002 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7042494 (PMC)
January 2020

Genetic association testing using the GENESIS R/Bioconductor package.

Bioinformatics 2019 12;35(24):5346-5348

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment.

Availability And Implementation: https://bioconductor.org/packages/GENESIS; vignettes included.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btz567 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904076 (PMC)
December 2019

A fully adjusted two-stage procedure for rank-normalization in genetic association studies.

Genet Epidemiol 2019 04 17;43(3):263-275. Epub 2019 Jan 17.

Department of Biostatistics, University of Washington, Seattle, Washington.

When testing genotype-phenotype associations using linear regression, departure of the trait distribution from normality can impact both Type I error rate control and statistical power, with worse consequences for rarer variants. Because genotypes are expected to have small effects (if any), investigators now routinely use a two-stage method, in which they first regress the trait on covariates, obtain residuals, rank-normalize them, and then use the rank-normalized residuals in association analysis with the genotypes. Potential confounding signals are assumed to be removed at the first stage, so in practice, no further adjustment is done in the second stage. Here, we show that this widely used approach can lead to tests with undesirable statistical properties, due to a combination of a mis-specified mean-variance relationship and remaining covariate associations between the rank-normalized residuals and genotypes. We demonstrate these properties theoretically, and also in applications to genome-wide and whole-genome sequencing association studies. We further propose and evaluate an alternative fully adjusted two-stage approach that adjusts for covariates both when residuals are obtained and in the subsequent association test. This method can reduce excess Type I errors and improve statistical power.
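The fully adjusted two-stage procedure described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: it uses ordinary least squares rather than a mixed model, the rank offset of 0.5 in the inverse-normal transform is an assumed convention, ties are broken by input order, and all function names are hypothetical.

```python
from statistics import NormalDist


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def residuals(X, y):
    """Residuals of y after regressing on the columns of X."""
    b = ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


def rank_normalize(values):
    """Inverse-normal transform: rank r of n maps to the normal quantile at (r - 0.5) / n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out


def fully_adjusted_two_stage(trait, covariates, genotype):
    """Genotype effect estimate on the rank-normalized residual scale."""
    X = [[1.0] + list(c) for c in covariates]         # intercept plus covariates
    z = rank_normalize(residuals(X, trait))           # stage 1: residualize, then rank-normalize
    Xg = [row + [g] for row, g in zip(X, genotype)]   # stage 2: adjust for covariates again
    return ols(Xg, z)[-1]
```

The only difference from the widely used partially adjusted version is the second-stage design matrix: regressing `z` on the genotype alone would drop the covariate columns in `Xg`.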

Source
http://dx.doi.org/10.1002/gepi.22188 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6416071 (PMC)
April 2019

Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies.

Am J Hum Genet 2019 02 10;104(2):260-274. Epub 2019 Jan 10.

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.

Source
http://dx.doi.org/10.1016/j.ajhg.2018.12.012 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372261 (PMC)
February 2019

Association Between Titin Loss-of-Function Variants and Early-Onset Atrial Fibrillation.

JAMA 2018 12;320(22):2354-2364

Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania.

Importance: Atrial fibrillation (AF) is the most common arrhythmia, affecting 1% of the population. Young individuals with AF have a strong genetic association with the disease, but the mechanisms remain incompletely understood.

Objective: To perform large-scale whole-genome sequencing to identify genetic variants related to AF.

Design, Setting, And Participants: The National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine Program includes longitudinal and cohort studies that underwent high-depth whole-genome sequencing between 2014 and 2017 in 18 526 individuals from the United States, Mexico, Puerto Rico, Costa Rica, Barbados, and Samoa. This case-control study included 2781 patients with early-onset AF from 9 studies and identified 4959 controls of European ancestry from the remaining participants. Results were replicated in the UK Biobank (346 546 participants) and the MyCode Study (42 782 participants).

Exposures: Loss-of-function (LOF) variants in genes at AF loci and common genetic variation across the whole genome.

Main Outcomes And Measures: Early-onset AF (defined as AF onset in persons <66 years of age). Due to multiple testing, the significance threshold for the rare variant analysis was P = 4.55 × 10⁻³.

Results: Among 2781 participants with early-onset AF (the case group), 72.1% were men, and the mean (SD) age of AF onset was 48.7 (10.2) years. Participants underwent whole-genome sequencing at a mean depth of 37.8 fold and mean genome coverage of 99.1%. At least 1 LOF variant in TTN, the gene encoding the sarcomeric protein titin, was present in 2.1% of case participants compared with 1.1% in control participants (odds ratio [OR], 1.76 [95% CI, 1.04-2.97]). The proportion of individuals with early-onset AF who carried a LOF variant in TTN increased with an earlier age of AF onset (P value for trend, 4.92 × 10⁻⁴), and 6.5% of individuals with AF onset prior to age 30 carried a TTN LOF variant (OR, 5.94 [95% CI, 2.64-13.35]; P = 1.65 × 10⁻⁵). The association between TTN LOF variants and AF was replicated in an independent study of 1582 patients with early-onset AF (cases) and 41 200 control participants (OR, 2.16 [95% CI, 1.19-3.92]; P = .01).

Conclusions And Relevance: In a case-control study, there was a statistically significant association between an LOF variant in the TTN gene and early-onset AF, with the variant present in a small percentage of participants with early-onset AF (the case group). Further research is necessary to understand whether this is a causal relationship.

Source
http://dx.doi.org/10.1001/jama.2018.18179 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6436530 (PMC)
December 2018

Novel Common Genetic Susceptibility Loci for Colorectal Cancer.

J Natl Cancer Inst 2019 02;111(2):146-157

Division of Research, Kaiser Permanente Medical Care Program of Northern California, Oakland, CA.

Background: Previous genome-wide association studies (GWAS) have identified 42 loci (P < 5 × 10⁻⁸) associated with risk of colorectal cancer (CRC). Expanded consortium efforts facilitating the discovery of additional susceptibility loci may capture unexplained familial risk.

Methods: We conducted a GWAS in European descent CRC cases and control subjects using a discovery-replication design, followed by examination of novel findings in a multiethnic sample (cumulative n = 163 315). In the discovery stage (36 948 case subjects/30 864 control subjects), we identified genetic variants with a minor allele frequency of 1% or greater associated with risk of CRC using logistic regression followed by a fixed-effects inverse variance weighted meta-analysis. All novel independent variants reaching genome-wide statistical significance (two-sided P < 5 × 10⁻⁸) were tested for replication in separate European ancestry samples (12 952 case subjects/48 383 control subjects). Next, we examined the generalizability of discovered variants in East Asians, African Americans, and Hispanics (12 085 case subjects/22 083 control subjects). Finally, we examined the contributions of novel risk variants to familial relative risk and examined the prediction capabilities of a polygenic risk score. All statistical tests were two-sided.

Results: The discovery GWAS identified 11 variants associated with CRC at P < 5 × 10⁻⁸, of which nine (at 4q22.2/5p15.33/5p13.1/6p21.31/6p12.1/10q11.23/12q24.21/16q24.1/20q13.13) independently replicated at a P value of less than .05. Multiethnic follow-up supported the generalizability of discovery findings. These results demonstrated a 14.7% increase in familial relative risk explained by common risk alleles from 10.3% (95% confidence interval [CI] = 7.9% to 13.7%; known variants) to 11.9% (95% CI = 9.2% to 15.5%; known and novel variants). A polygenic risk score identified 4.3% of the population at an odds ratio for developing CRC of at least 2.0.

Conclusions: This study provides insight into the architecture of common genetic variation contributing to CRC etiology and improves risk prediction for individualized screening.
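The fixed-effects inverse variance weighted meta-analysis named in the Methods can be illustrated with a short sketch. It assumes each study reports a per-variant effect estimate (for example a log odds ratio) and its standard error; the function name and inputs are hypothetical, and the two-sided p-value uses a normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effects_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: per-study effect estimates (e.g. log odds ratios).
    ses:   matching standard errors.
    Returns (pooled estimate, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    # Normal approximation; loses precision for extremely small p-values.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, p


if __name__ == "__main__":
    # Two hypothetical studies with equal precision.
    print(fixed_effects_meta([0.2, 0.4], [0.1, 0.1]))
```

More precise studies get larger weights, so the pooled estimate sits closer to them, and the pooled standard error is never larger than the smallest study-level standard error.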

Source
http://dx.doi.org/10.1093/jnci/djy099 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6555904 (PMC)
February 2019

Multi-Omics Analysis Reveals a HIF Network and Hub Gene EPAS1 Associated with Lung Adenocarcinoma.

EBioMedicine 2018 Jun 31;32:93-101. Epub 2018 May 31.

Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Recent technological advancements have permitted high-throughput measurement of the human genome, epigenome, metabolome, transcriptome, and proteome at the population level. We hypothesized that subsets of genes identified from omic studies might have closely related biological functions and thus might interact directly at the network level. Therefore, we conducted an integrative analysis of multi-omic datasets of non-small cell lung cancer (NSCLC) to search for association patterns beyond the genome and transcriptome. A large, complex, and robust gene network containing well-known lung cancer-related genes, including EGFR and TERT, was identified from combined gene lists for lung adenocarcinoma. Members of the hypoxia-inducible factor (HIF) gene family were at the center of this network. Subsequent sequencing of network hub genes within a subset of samples from the Transdisciplinary Research in Cancer of the Lung-International Lung Cancer Consortium (TRICL-ILCCO) consortium revealed a SNP (rs12614710) in EPAS1 associated with NSCLC that reached genome-wide significance (OR = 1.50; 95% CI: 1.31-1.72; p = 7.75 × 10). Using imputed data, we found that this SNP remained significant in the entire TRICL-ILCCO consortium (p = .03). Additional functional studies are warranted to better understand interrelationships among genetic polymorphisms, DNA methylation status, and EPAS1 expression.

Source
http://dx.doi.org/10.1016/j.ebiom.2018.05.024 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6021270 (PMC)
June 2018

Genome-wide association study and meta-analysis identify loci associated with ventricular and supraventricular ectopy.

Sci Rep 2018 04 4;8(1):5675. Epub 2018 Apr 4.

Division of Cardiology, Department of Medicine, University of Washington, Seattle, WA, USA.

The genetic basis of supraventricular and ventricular ectopy (SVE, VE) remains largely uncharacterized, despite established genetic mechanisms of arrhythmogenesis. To identify novel genetic variants associated with SVE/VE in ancestrally diverse human populations, we conducted a genome-wide association study of electrocardiographically identified SVE and VE in five cohorts including approximately 43,000 participants of African, European and Hispanic/Latino ancestry. In thirteen ancestry-stratified subgroups, we tested multivariable-adjusted associations of SVE and VE with single nucleotide polymorphism (SNP) dosage. We combined subgroup-specific association estimates in inverse variance-weighted, fixed-effects and Bayesian meta-analyses. We also combined fixed-effects meta-analytic t-test statistics for SVE and VE in multi-trait SNP association analyses. No loci reached genome-wide significance in trans-ethnic meta-analyses. However, we found genome-wide significant SNPs intronic to an apoptosis-enhancing gene previously associated with QRS interval duration (FAF1; lead SNP rs7545860; effect allele frequency = 0.02; P = 2.0 × 10) in multi-trait analysis among European ancestry participants and near a locus encoding calcium-dependent glycoproteins (DSC3; lead SNP rs8086068; effect allele frequency = 0.17) in meta-analysis of SVE (P = 4.0 × 10) and multi-trait analysis (P = 2.9 × 10) among African ancestry participants. The novel findings suggest several mechanisms by which genetic variation may predispose to ectopy in humans and highlight the potential value of leveraging pleiotropy in future studies of ectopy-related phenotypes.

DOI: http://dx.doi.org/10.1038/s41598-018-23843-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5884864
April 2018

Genome-wide association study of depressive symptoms in the Hispanic Community Health Study/Study of Latinos.

J Psychiatr Res 2018 04 16;99:167-176. Epub 2017 Dec 16.

Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, United States; Department of Psychiatry, Harvard Medical School, Boston, MA, United States; Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, United States.

Although genome-wide association studies (GWAS) have identified several variants linked to depression, few GWAS of non-European populations have been performed. We conducted a genome-wide analysis of depression in a large, population-based sample of Hispanics/Latinos. Data came from 12,310 adults in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Past-week depressive symptoms were assessed using the 10-item Center for Epidemiological Studies of Depression Scale. Three phenotypes were examined: a total depression score, a total score modified to account for psychiatric medication use, and a score excluding anti-depressant medication users. We estimated heritability due to common variants (h), and performed a GWAS of the three phenotypes. Replication was attempted in three independent Hispanic/Latino cohorts. We also performed sex-stratified analyses, analyzed a binary trait indicating probable depression, and conducted three trans-ethnic analyses. The three phenotypes exhibited significant heritability (h = 6.3-6.9%; p = .002) in the total sample. No SNPs were genome-wide significant in analyses of the three phenotypes or the binary indicator of probable depression. In sex-stratified analyses, seven genome-wide significant SNPs (one in females; six in males) were identified, though none were supported through replication. Four out of 24 loci identified in prior GWAS were nominally associated in HCHS/SOL. There was no evidence of overlap in genetic risk factors across ancestry groups, though this may have been due to low power. We conducted the largest GWAS of depression-related phenotypes in Hispanic/Latino adults. Results underscore the genetic complexity of depressive symptoms as a phenotype in this population and suggest the need for much larger samples.

DOI: http://dx.doi.org/10.1016/j.jpsychires.2017.12.010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6192675
April 2018

GWAS of the electrocardiographic QT interval in Hispanics/Latinos generalizes previously identified loci and identifies population-specific signals.

Sci Rep 2017 12 6;7(1):17075. Epub 2017 Dec 6.

Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA.

QT interval prolongation is a heritable risk factor for ventricular arrhythmias and can predispose to sudden death. Most genome-wide association studies (GWAS) of QT were performed in European ancestral populations, leaving other groups uncharacterized. Herein we present the first QT GWAS of Hispanic/Latinos using data on 15,997 participants from four studies. Study-specific summary results of the association between 1000 Genomes Project (1000G) imputed SNPs and electrocardiographically measured QT were combined using fixed-effects meta-analysis. We identified 41 genome-wide significant SNPs that mapped to 13 previously identified QT loci. Conditional analyses distinguished six secondary signals at NOS1AP (n = 2), ATP1B1 (n = 2), SCN5A (n = 1), and KCNQ1 (n = 1). Comparison of linkage disequilibrium patterns between the 13 lead SNPs and six secondary signals with previously reported index SNPs in 1000G super populations suggested that the SCN5A and KCNE1 lead SNPs were potentially novel and population-specific. Finally, of the 42 suggestively associated loci, AJAP1 was suggestively associated with QT in a prior East Asian GWAS; in contrast BVES and CAP2 murine knockouts caused cardiac conduction defects. Our results indicate that whereas the same loci influence QT across populations, population-specific variation exists, motivating future trans-ethnic and ancestrally diverse QT GWAS.

DOI: http://dx.doi.org/10.1038/s41598-017-17136-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719082
December 2017

Genome-wide association study of PR interval in Hispanics/Latinos identifies novel locus at .

Heart 2018 06 10;104(11):904-911. Epub 2017 Nov 10.

Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Objective: PR interval (PR) is a heritable electrocardiographic measure of atrial and atrioventricular nodal conduction. Changes in PR duration may be associated with atrial fibrillation, heart failure and all-cause mortality. Hispanic/Latino populations have high burdens of cardiovascular morbidity and mortality, are highly admixed and represent exceptional opportunities for novel locus identification. However, they remain chronically understudied. We present the first genome-wide association study (GWAS) of PR in 14 756 participants of Hispanic/Latino ancestry from three studies.

Methods: Study-specific summary results of the association between 1000 Genomes Phase 1 imputed single-nucleotide polymorphisms (SNPs) and PR assumed an additive genetic model and were adjusted for global ancestry, study centre/region and clinical covariates. Results were combined using fixed-effects, inverse variance weighted meta-analysis. Sequential conditional analyses were used to identify independent signals. Replication of novel loci was performed in populations of Asian, African and European descent. ENCODE and RoadMap data were used to annotate results.

Results: We identified a novel genome-wide association (P<5×10) with PR at (rs6730558), which replicated in Asian and European populations (P<0.017). Additionally, we generalised 10 previously identified PR loci to Hispanics/Latinos. Bioinformatics annotation provided evidence for regulatory function in cardiac tissue. Further, for six loci that generalised, the Hispanic/Latino index SNP was genome-wide significant and identical to (or in high linkage disequilibrium with) the previously identified GWAS lead SNP.

Conclusions: Our results suggest that genetic determinants of PR are consistent across race/ethnicity, but extending studies to admixed populations can identify novel associations, underscoring the importance of conducting genetic studies in diverse populations.

DOI: http://dx.doi.org/10.1136/heartjnl-2017-312045
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6946379
June 2018

Genetic loci associated with heart rate variability and their effects on cardiac disease risk.

Nat Commun 2017 06 14;8:15805. Epub 2017 Jun 14.

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, USA.

Reduced cardiac vagal control reflected in low heart rate variability (HRV) is associated with greater risks for cardiac morbidity and mortality. In two-stage meta-analyses of genome-wide association studies for three HRV traits in up to 53,174 individuals of European ancestry, we detect 17 genome-wide significant SNPs in eight loci. HRV SNPs tag non-synonymous SNPs (in NDUFA11 and KIAA1755), expression quantitative trait loci (eQTLs) (influencing GNG11, RGS6 and NEO1), or are located in genes preferentially expressed in the sinoatrial node (GNG11, RGS6 and HCN4). Genetic risk scores account for 0.9 to 2.6% of the HRV variance. Significant genetic correlation is found for HRV with heart rate (-0.74

DOI: http://dx.doi.org/10.1038/ncomms15805
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5474732
June 2017

Genome-wide association study of heart rate and its variability in Hispanic/Latino cohorts.

Heart Rhythm 2017 11 10;14(11):1675-1684. Epub 2017 Jun 10.

Department of Epidemiology, University of Washington, Seattle, Washington; Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington.

Background: Although time-domain measures of heart rate variability (HRV) are used to estimate cardiac autonomic tone and disease risk in multiethnic populations, the genetic epidemiology of HRV in Hispanics/Latinos has not been characterized.

Objective: The purpose of this study was to conduct a genome-wide association study of heart rate (HR) and its variability in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Women's Health Initiative Hispanic SNP-Health Association Resource project (n = 13,767).

Methods: We estimated HR (bpm), standard deviation of normal-to-normal interbeat intervals (SDNN, ms), and root mean squared difference in successive, normal-to-normal interbeat intervals (RMSSD, ms) from resting, standard 12-lead ECGs. We estimated associations between each phenotype and 17 million genotyped or imputed single nucleotide polymorphisms (SNPs), accounting for relatedness and adjusting for age, sex, study site, and ancestry. Cohort-specific estimates were combined using fixed-effects, inverse-variance meta-analysis. We investigated replication for select SNPs exceeding genome-wide (P <5 × 10) or suggestive (P <10) significance thresholds.

Results: Two genome-wide significant SNPs replicated in a European ancestry cohort, one for RMSSD (rs4963772; chromosome 12) and another for SDNN (rs12982903; chromosome 19). A suggestive SNP for HR (rs236352; chromosome 6) replicated in an African-American cohort. Functional annotation of replicated SNPs in cardiac and neuronal tissues identified potentially causal variants and mechanisms.

Conclusion: This first genome-wide association study of HRV and HR in Hispanics/Latinos underscores the potential for even modestly sized samples of non-European ancestry to inform the genetic epidemiology of complex traits.
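
The three phenotypes named in the Methods (HR, SDNN, RMSSD) follow directly from a series of normal-to-normal interbeat intervals. The sketch below shows one way to compute them, assuming intervals are given in milliseconds and that SDNN uses the sample standard deviation; the abstract does not state which convention the study used, and the function names and interval values are hypothetical.

```python
import math
from statistics import mean, stdev


def sdnn(nn_ms):
    """Standard deviation of normal-to-normal intervals (ms).

    Assumption: sample (n - 1) standard deviation.
    """
    return stdev(nn_ms)


def rmssd(nn_ms):
    """Root mean squared difference of successive NN intervals (ms)."""
    diffs = [later - earlier for earlier, later in zip(nn_ms, nn_ms[1:])]
    if not diffs:
        raise ValueError("need at least two intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate_bpm(nn_ms):
    """Mean heart rate in beats per minute from intervals in ms."""
    return 60000.0 / mean(nn_ms)


# Hypothetical interbeat intervals in milliseconds (illustrative only).
intervals = [800, 810, 790, 800]
print(sdnn(intervals), rmssd(intervals), mean_heart_rate_bpm(intervals))
```

SDNN reflects overall variability across the recording, while RMSSD is driven by beat-to-beat changes, so the two can differ substantially on the same recording.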

DOI: http://dx.doi.org/10.1016/j.hrthm.2017.06.018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5671896
November 2017

Genome-Wide Association Study of Heavy Smoking and Daily/Nondaily Smoking in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

Nicotine Tob Res 2018 03;20(4):448-457

Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY.

Introduction: Genetic variants associated with nicotine dependence have previously been identified, primarily in European-ancestry populations. No genome-wide association studies (GWAS) have been reported for smoking behaviors in Hispanics/Latinos in the United States and Latin America, who are of mixed ancestry with European, African, and American Indigenous components.

Methods: We examined genetic associations with smoking behaviors in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) (N = 12 741 with smoking data, 5119 ever-smokers), using ~2.3 million genotyped variants imputed to the 1000 Genomes Project phase 3. Mixed logistic regression models accounted for population structure, sampling, relatedness, sex, and age.

Results: The known region of CHRNA5, which encodes the α5 cholinergic nicotinic receptor subunit, was associated with heavy smoking at genome-wide significance (p ≤ 5 × 10⁻⁸) in a comparison of 1929 ever-smokers reporting cigarettes per day (CPD) > 10 versus 3156 reporting CPD ≤ 10. The functional variant rs16969968 in CHRNA5 had a p value of 2.20 × 10⁻⁷ and odds ratio (OR) of 1.32 for the minor allele (A); its minor allele frequency was 0.22 overall and similar across Hispanic/Latino background groups (Central American = 0.17; South American = 0.19; Mexican = 0.18; Puerto Rican = 0.22; Cuban = 0.29; Dominican = 0.19). CHRNA4 on chromosome 20 attained p < 10⁻⁴, supporting prior findings in non-Hispanics. For nondaily smoking, which is prevalent in Hispanic/Latino smokers, compared to daily smoking, loci on chromosomes 2 and 4 achieved genome-wide significance; replication attempts were limited by small Hispanic/Latino sample sizes.

Conclusions: Associations of nicotinic receptor gene variants with smoking, first reported in non-Hispanic European-ancestry populations, generalized to Hispanics/Latinos despite different patterns of smoking behavior.

Implications: We conducted the first large-scale genome-wide association study (GWAS) of smoking behavior in a US Hispanic/Latino cohort, and the first GWAS of daily/nondaily smoking in any population. Results show that the region of the nicotinic receptor subunit gene CHRNA5, which in non-Hispanic European-ancestry smokers has been associated with heavy smoking as well as cessation and treatment efficacy, is also significantly associated with heavy smoking in this Hispanic/Latino cohort. The results are an important addition to understanding the impact of genetic variants in understudied Hispanic/Latino smokers.

DOI: http://dx.doi.org/10.1093/ntr/ntx107
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896462
March 2018

SeqArray-a storage-efficient high-performance data format for WGS variant calls.

Bioinformatics 2017 Aug;33(15):2251-2257

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Motivation: Whole-genome sequencing (WGS) data are being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data retrieval is relatively slow. Here we introduce a new WGS variant data format implemented in the R/Bioconductor package 'SeqArray' for storing variant calls in an array-oriented manner which provides the same capabilities as VCF, but with multiple high compression options and data access using high-performance parallel computing.

Results: Benchmarks using 1000 Genomes Phase 3 data show file sizes are 14.0 Gb (VCF), 12.3 Gb (BCF, binary VCF), 3.5 Gb (BGT) and 2.6 Gb (SeqArray), respectively. Reading genotypes in the SeqArray package is two to three times faster compared with the htslib C library using BCF files. For the allele frequency calculation, the implementation in the SeqArray package is over 5 times faster than PLINK v1.9 with VCF and BCF files, and over 16 times faster than vcftools. When used in conjunction with R/Bioconductor packages, the SeqArray package provides users a flexible, feature-rich, high-performance programming environment for analysis of WGS variant data.

Availability And Implementation: http://www.bioconductor.org/packages/SeqArray.

Supplementary Information: Supplementary data are available at Bioinformatics online.
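
The allele frequency calculation used as a benchmark above reduces to a simple count per variant. The following plain-Python sketch shows that arithmetic only; it is not the SeqArray API (SeqArray is an R/Bioconductor package), and it assumes diploid genotypes coded as alternate-allele counts 0, 1 or 2, with `None` marking a missing call. The function name and sample data are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Alternate-allele frequency for one variant.

    genotypes: per-sample alternate-allele counts (0, 1 or 2 for diploid
               calls); None marks a missing genotype and is skipped.
    Returns None when no genotype was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called diploid sample contributes two alleles to the denominator.
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for five samples, one of them missing.
print(alt_allele_frequency([0, 1, 2, None, 1]))
```

Array-oriented formats make this kind of per-variant reduction efficient because all genotypes for a variant can be read as one contiguous block rather than parsed from text.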

DOI: http://dx.doi.org/10.1093/bioinformatics/btx145
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860110
August 2017

A genome-wide interaction analysis of tricyclic/tetracyclic antidepressants and RR and QT intervals: a pharmacogenomics study from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium.

J Med Genet 2017 05 30;54(5):313-323. Epub 2016 Dec 30.

Laboratory of Epidemiology, Demography, and Biometry, National Institute on Aging, Bethesda, Maryland, USA.

Background: Increased heart rate and a prolonged QT interval are important risk factors for cardiovascular morbidity and mortality, and can be influenced by the use of various medications, including tricyclic/tetracyclic antidepressants (TCAs). We aim to identify genetic loci that modify the association between TCA use and RR and QT intervals.

Methods And Results: We conducted race/ethnic-specific genome-wide interaction analyses (with HapMap phase II imputed reference panel imputation) of TCAs and resting RR and QT intervals in cohorts of European (n=45 706; n=1417 TCA users), African (n=10 235; n=296 TCA users) and Hispanic/Latino (n=13 808; n=147 TCA users) ancestry, adjusted for clinical covariates. Among the populations of European ancestry, two genome-wide significant loci were identified for RR interval: rs6737205 in (β=56.3, p=3.9e) and rs9830388 in (β=25.2, p=1.7e). In Hispanic/Latino cohorts, rs2291477 in significantly modified the association between TCAs and QT intervals (β=9.3, p=2.55e). In the meta-analyses of the other ethnicities, these loci either were excluded from the meta-analyses (as part of quality control), or their effects did not reach the level of nominal statistical significance (p>0.05). No new variants were identified in these ethnicities. No additional loci were identified after inverse-variance-weighted meta-analysis of the three ancestries.

Conclusions: Among Europeans, TCA interactions with variants in and were identified in relation to RR intervals. Among Hispanic/Latinos, variants in modified the relation between TCAs and QT intervals. Future studies are required to confirm our results.

DOI: http://dx.doi.org/10.1136/jmedgenet-2016-104112
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5406254
May 2017

Genetic variation near IRS1 is associated with adiposity and a favorable metabolic profile in U.S. Hispanics/Latinos.

Obesity (Silver Spring) 2016 11 24;24(11):2407-2413. Epub 2016 Sep 24.

Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA.

Objective: Associations of IRS1 genetic variation with adiposity and metabolic profile in U.S. Hispanic/Latino individuals of diverse backgrounds were examined.

Methods: Previously genome-wide association study-identified IRS1 variants (rs2943650, rs2972146, rs2943641, and rs2943634) as related to body fat percentage (BF%) and multiple metabolic traits were tested among up to 12,730 adults (5,232 men; 7,515 women) from the Hispanic Community Health Study/Study of Latinos.

Results: The C-allele (frequency = 26%) of rs2943650 was significantly associated with higher BF% overall (β = 0.34 ± 0.11% per allele; P = 0.002) and in women (β = 0.41 ± 0.14% per C-allele; P = 0.003), but not in men (β = 0.28 ± 0.18% per C-allele; P = 0.11), though there was no significant sex difference. Using the inverse normal-transformed data to compare effect sizes, it was found that the association with BF% was stronger in Hispanic/Latino women than that previously reported in European women (β = 0.054 ± 0.018SD vs. β = 0.008 ± 0.011SD per C-allele; P = 0.03). The BF%-increasing allele of rs2943650 was significantly associated with lower levels of fasting insulin, homeostatic model assessment of insulin resistance, hemoglobin A1c, and triglycerides and higher high-density lipoprotein cholesterol (P < 0.05).

Conclusions: This study confirmed and extended previous findings of IRS1 variation associated with increased adiposity but a favorable metabolic profile in U.S. Hispanics/Latinos, with a relatively stronger genetic effect on BF% in Hispanic/Latino women compared with European women.

DOI: http://dx.doi.org/10.1002/oby.21624
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5093062
November 2016

Meta-Analysis of Genome-Wide Association Studies with Correlated Individuals: Application to the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

Genet Epidemiol 2016 09 3;40(6):492-501. Epub 2016 Jun 3.

Department of Biostatistics, University of Washington, Seattle, Washington, United States of America.

Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, "MetaCor," which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a "pooled" GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome-wide significant SNP (rs7791001, P-value = 3.66×10⁻⁸, compared to 4.67×10⁻⁷ in pooled), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from λGC=1.050 (pooled) to λGC=1.028 (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naïve stratified analysis resulted in λGC=1.058 compared to λGC=1.027 for MetaCor.
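
The λGC values quoted above are genomic inflation factors. A common definition, assumed here because the abstract does not spell it out, is the median observed 1-df chi-square statistic divided by the median of the chi-square distribution with 1 degree of freedom. The sketch below computes it from two-sided p-values; the function name and p-values are hypothetical, and this is not the MetaCor implementation.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square(1 df) variable: squared 0.75 quantile of N(0, 1).
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values.

    Each p-value must lie in (0, 1]; it is converted to the equivalent
    1-df chi-square statistic via the squared normal quantile.
    """
    chi2_stats = [_STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / _CHI2_1DF_MEDIAN


# Hypothetical, evenly spread p-values (illustrative only).
lam = genomic_inflation([0.1, 0.3, 0.5, 0.7, 0.9])
```

Values near 1 indicate test statistics that match the null distribution; values above 1, such as the 1.050 and 1.058 reported for the pooled and naïve stratified analyses, indicate inflation.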

DOI: http://dx.doi.org/10.1002/gepi.21981
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981554
September 2016

Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

G3 (Bethesda) 2016 06 1;6(6):1525-34. Epub 2016 Jun 1.

Department of Biostatistics, University of Washington, Seattle, Washington 98195.

We estimated local ancestry on the autosomes and X chromosome in a large US-based study of 12,793 Hispanic/Latino individuals using the RFMix method, and we compared different reference panels and approaches to local ancestry estimation on the X chromosome by means of Mendelian inconsistency rates as a proxy for accuracy. We developed a novel and straightforward approach to performing ancestry-specific PCA after finding artifactual behavior in the results from an existing approach. Using the ancestry-specific PCA, we found significant population structure within African, European, and Amerindian ancestries in the Hispanic/Latino individuals in our study. In the African ancestral component of the admixed individuals, individuals whose grandparents were from Central America clustered separately from individuals whose grandparents were from the Caribbean, and also from reference Yoruba and Mandenka West African individuals. In the European component, individuals whose grandparents were from Puerto Rico diverged partially from other background groups. In the Amerindian ancestral component, individuals clustered into multiple different groups depending on the grandparental country of origin. Therefore, local ancestry estimation provides further insight into the complex genetic structure of US Hispanic/Latino populations, which must be properly accounted for in genotype-phenotype association studies. It also provides a basis for admixture mapping and ancestry-specific allele frequency estimation, which are useful in the identification of risk factors for disease.

DOI: http://dx.doi.org/10.1534/g3.116.028779
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889649
June 2016
-->