Publications by authors named "Matthew P Conomos"

36 Publications

BinomiRare: A robust test for association of a rare genetic variant with a binary outcome for mixed models and any case-control proportion.

HGG Adv 2021 Jul 12;2(3). Epub 2021 Jun 12.

Framingham Heart Study, Framingham, MA, USA.

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing tests of the association of a rare variant with a binary outcome in the presence of correlated data control the type 1 error where there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, and develop the Conway-Maxwell-Poisson (CMP) test and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.

Source
DOI: http://dx.doi.org/10.1016/j.xhgg.2021.100040
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321319
July 2021
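
The carriers-only idea in the abstract above (estimate outcome probabilities under the null, then test within individuals carrying the rare allele) can be illustrated with a small sketch. It assumes independent observations and treats the number of cases among carriers as a Poisson-binomial variable; the function names and the two-sided p-value rule are illustrative assumptions, not the authors' implementation, and the mixed-model extension the paper develops is not shown.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """PMF of the number of successes among independent Bernoulli trials.

    probs[i] is the success probability of trial i. Built by dynamic
    programming, adding one trial at a time.
    """
    pmf = [1.0]  # with zero trials, P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # trial i is a control
            nxt[k + 1] += mass * p       # trial i is a case
        pmf = nxt
    return pmf


def carriers_only_pvalue(null_probs: Sequence[float], n_cases: int) -> float:
    """Two-sided p-value for the observed case count among carriers.

    null_probs: hypothetical null outcome probabilities for each carrier
    (for example, fitted values from a covariate-only model).
    Assumption: the p-value sums the probabilities of all counts that are
    no more likely than the observed count.
    """
    if not 0 <= n_cases <= len(null_probs):
        raise ValueError("n_cases must be between 0 and the number of carriers")
    pmf = poisson_binomial_pmf(null_probs)
    observed = pmf[n_cases]
    tail = sum(m for m in pmf if m <= observed * (1.0 + 1e-9))
    return min(1.0, tail)


if __name__ == "__main__":
    # Four carriers, each with null case probability 0.5; all four are cases.
    print(carriers_only_pvalue([0.5, 0.5, 0.5, 0.5], 4))
```

Because only carriers enter the calculation, the cost of the test grows with the number of carriers rather than with the full sample size, which suits rare variants.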

Genome-wide association study in the Taiwan biobank identifies four novel genes for human height: NABP2, RASA2, RNF41 and SLC39A5.

Hum Mol Genet 2021 Jul 16. Epub 2021 Jul 16.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Numerous genome-wide association studies (GWASs) have been conducted for the identification of genetic variants involved with human height. The vast majority of these studies, however, have been conducted in populations of European ancestry. Here, we report the first GWAS of adult height in the Taiwan Biobank using a discovery sample of 14 571 individuals and an independent replication sample of 20 506 individuals. From our analysis we generalize to the Taiwanese population genome-wide significant associations with height and 18 previously identified genes in European and non-Taiwanese East Asian populations. We also identify and replicate, at the genome-wide significance level, associated variants for height in four novel genes at two loci that have not previously been reported: RASA2 on chromosome 3 and NABP2, RNF41, and SLC39A5 at 12q13.3 on chromosome 12. RASA2 and RNF41 are strong candidates for having a role in height with copy number and loss of function variants in RASA2 previously found to be associated with short stature disorders, and decreased expression of the RNF41 gene resulting in insulin resistance in skeletal muscle. The results from our analysis of the Taiwan Biobank underscore the potential for the identification of novel genetic discoveries in underrepresented worldwide populations, even for traits, such as height, that have been extensively investigated in large-scale studies of European ancestry populations.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddab202
July 2021

Variant-specific inflation factors for assessing population stratification at the phenotypic variance level.

Nat Commun 2021 06 9;12(1):3506. Epub 2021 Jun 9.

Department of Biostatistics, University of Washington, Seattle, WA, USA.

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. Unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure, in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-23655-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8190158
June 2021

Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: The NHLBI TOPMed program.

Am J Hum Genet 2021 05 21;108(5):874-893. Epub 2021 Apr 21.

Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA.

Whole-genome sequencing (WGS), a powerful tool for detecting novel coding and non-coding disease-causing variants, has largely been applied to clinical diagnosis of inherited disorders. Here we leveraged WGS data in up to 62,653 ethnically diverse participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and assessed statistical association of variants with seven red blood cell (RBC) quantitative traits. We discovered 14 single variant-RBC trait associations at 12 genomic loci, which have not been reported previously. Several of the RBC trait-variant associations (RPN1, ELL2, MIDN, HBB, HBA1, PIEZO1, and G6PD) were replicated in independent GWAS datasets imputed to the TOPMed reference panel. Most of these discovered variants are rare/low frequency, and several are observed disproportionately among non-European Ancestry (African, Hispanic/Latino, or East Asian) populations. We identified a 3 bp indel p.Lys2169del (g.88717175_88717177TCT[4]) (common only in the Ashkenazi Jewish population) of PIEZO1, a gene responsible for the Mendelian red cell disorder hereditary xerocytosis (MIM: 194380), associated with higher mean corpuscular hemoglobin concentration (MCHC). In stepwise conditional analysis and in gene-based rare variant aggregated association analysis, we identified several of the variants in HBB, HBA1, TMPRSS6, and G6PD that represent the carrier state for known coding, promoter, or splice site loss-of-function variants that cause inherited RBC disorders. Finally, we applied base and nuclease editing to demonstrate that the sentinel variant rs112097551 (nearest gene RPN1) acts through a cis-regulatory element that exerts long-range control of the gene RUVBL1 which is essential for hematopoiesis. 
Together, these results demonstrate the utility of WGS in ethnically diverse population-based samples and gene editing for expanding knowledge of the genetic architecture of quantitative hematologic traits and suggest a continuum between complex trait and Mendelian red cell disorders.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2021.04.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8206199
May 2021

Robust, flexible, and scalable tests for Hardy-Weinberg equilibrium across diverse ancestries.

Genetics 2021 May;218(1)

Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA.

Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.

Source
DOI: http://dx.doi.org/10.1093/genetics/iyab044
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128395
May 2021
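
For readers unfamiliar with the traditional chi-square HWE test that the abstract above takes as its baseline, a minimal sketch follows. It is the textbook one-degree-of-freedom test on hard genotype counts for a biallelic marker, not RUTH, and it makes no adjustment for population structure or genotype uncertainty; the function name is an illustrative assumption.

```python
from math import erfc, sqrt
from typing import Tuple


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Classic chi-square test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns
    (test statistic, p-value) using one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0.0:
        # Monomorphic marker: no deviation from HWE can be assessed.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))


if __name__ == "__main__":
    print(hwe_chisq(25, 50, 25))   # counts exactly at HWE proportions
    print(hwe_chisq(50, 0, 50))    # no heterozygotes at all
```

Pooling two populations with different allele frequencies produces a heterozygote deficit in the combined counts, which is why this unadjusted test can flag well-genotyped markers in ancestrally diverse samples.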

DUOX2 variants associate with preclinical disturbances in microbiota-immune homeostasis and increased inflammatory bowel disease risk.

J Clin Invest 2021 May;131(9)

Division of Gastroenterology and Hepatology, Department of Internal Medicine, Michigan Medicine, University of Michigan, Ann Arbor, Michigan, USA.

A primordial gut-epithelial innate defense response is the release of hydrogen peroxide by dual NADPH oxidase (DUOX). In inflammatory bowel disease (IBD), a condition characterized by an imbalanced gut microbiota-immune homeostasis, DUOX2 isoenzyme is the highest induced gene. Performing multiomic analyses using 2872 human participants of a wellness program, we detected a substantial burden of rare protein-altering DUOX2 gene variants of unknown physiologic significance. We identified a significant association between these rare loss-of-function variants and increased plasma levels of interleukin-17C, which is induced also in mucosal biopsies of patients with IBD. DUOX2-deficient mice replicated increased IL-17C induction in the intestine, with outlier high Il17c expression linked to the mucosal expansion of specific Proteobacteria pathobionts. Integrated microbiota/host gene expression analyses in patients with IBD corroborated IL-17C as a marker for epithelial activation by gram-negative bacteria. Finally, the impact of DUOX2 variants on IL-17C induction provided a rationale for variant stratification in case control studies that substantiated DUOX2 as an IBD risk gene. Thus, our study identifies an association of deleterious DUOX2 variants with a preclinical hallmark of disturbed microbiota-immune homeostasis that appears to precede the manifestation of IBD.

Source
DOI: http://dx.doi.org/10.1172/JCI141676
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8087203
May 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03205-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Untargeted longitudinal analysis of a wellness cohort identifies markers of metastatic cancer years prior to diagnosis.

Sci Rep 2020 10 1;10(1):16275. Epub 2020 Oct 1.

Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA.

We analyzed 1196 proteins in longitudinal plasma samples from participants in a commercial wellness program, including samples collected pre-diagnosis from ten cancer patients and 69 controls. For three individuals ultimately diagnosed with metastatic breast, lung, or pancreatic cancer, CEACAM5 was a persistent longitudinal outlier as early as 26.5 months pre-diagnosis. CALCA, a biomarker for medullary thyroid cancer, was hypersecreted in metastatic pancreatic cancer at least 16.5 months pre-diagnosis. ERBB2 levels spiked in metastatic breast cancer between 10.0 and 4.0 months pre-diagnosis. Our results support the value of deep phenotyping seemingly healthy individuals in prospectively inferring disease transitions.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-73451-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529776
October 2020

Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.

Nat Genet 2020 09 24;52(9):969-983. Epub 2020 Aug 24.

Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.

Source
DOI: http://dx.doi.org/10.1038/s41588-020-0676-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7483769
September 2020

Genetic association testing using the GENESIS R/Bioconductor package.

Bioinformatics 2019 12;35(24):5346-5348

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment.

Availability And Implementation: https://bioconductor.org/packages/GENESIS; vignettes included.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz567
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904076
December 2019

Genetic analyses of diverse populations improves discovery for complex traits.

Nature 2019 06 19;570(7762):514-518. Epub 2019 Jun 19.

Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific. Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions, the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.

Source
DOI: http://dx.doi.org/10.1038/s41586-019-1310-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785182
June 2019

Genetic Predisposition Impacts Clinical Changes in a Lifestyle Coaching Program.

Sci Rep 2019 05 2;9(1):6805. Epub 2019 May 2.

Arivale, Inc, Seattle, WA, 98104, USA.

Both genetic and lifestyle factors contribute to an individual's disease risk, suggesting a multi-omic approach is essential for personalized prevention. Studies have examined the effectiveness of lifestyle coaching on clinical outcomes; however, little is known about the impact of genetic predisposition on the response to lifestyle coaching. Here we report on the results of a real-world observational study in 2531 participants enrolled in a commercial "Scientific Wellness" program, which combines multi-omic data with personalized, telephonic lifestyle coaching. Specifically, we examined: 1) the impact of this program on 55 clinical markers and 2) the effect of genetic predisposition on these clinical changes. We identified sustained improvements in clinical markers related to cardiometabolic risk, inflammation, nutrition, and anthropometrics. Notably, improvements in HbA1c were akin to those observed in landmark trials. Furthermore, genetic markers were associated with longitudinal changes in clinical markers. For example, individuals with genetic predisposition for higher LDL-C had a lesser decrease in LDL-C on average than those with genetic predisposition for average LDL-C. Overall, these results suggest that a program combining multi-omic data with lifestyle coaching produces clinically meaningful improvements, and that genetic predisposition impacts clinical responses to lifestyle change.

Source
DOI: http://dx.doi.org/10.1038/s41598-019-43058-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497671
May 2019

Associations of variants in the hexokinase 1 and interleukin 18 receptor regions with oxyhemoglobin saturation during sleep.

PLoS Genet 2019 04 16;15(4):e1007739. Epub 2019 Apr 16.

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, United States of America.

Sleep disordered breathing (SDB)-related overnight hypoxemia is associated with cardiometabolic disease and other comorbidities. Understanding the genetic bases for variations in nocturnal hypoxemia may help understand mechanisms influencing oxygenation and SDB-related mortality. We conducted genome-wide association tests across 10 cohorts and 4 populations to identify genetic variants associated with three correlated measures of overnight oxyhemoglobin saturation: average and minimum oxyhemoglobin saturation during sleep and the percent of sleep with oxyhemoglobin saturation under 90%. The discovery sample consisted of 8,326 individuals. Variants with p < 1 × 10^-6 were analyzed in a replication group of 14,410 individuals. We identified 3 significantly associated regions, including 2 regions in multi-ethnic analyses (2q12, 10q22). SNPs in the 2q12 region associated with minimum SpO2 (rs78136548 p = 2.70 × 10^-10). SNPs at 10q22 were associated with all three traits including average SpO2 (rs72805692 p = 4.58 × 10^-8). SNPs in both regions were associated in over 20,000 individuals and are supported by prior associations or functional evidence. Four additional significant regions were detected in secondary sex-stratified and combined discovery and replication analyses, including a region overlapping Reelin, a known marker of respiratory complex neurons. These are the first genome-wide significant findings reported for oxyhemoglobin saturation during sleep, a phenotype of high clinical interest. Our replicated associations with HK1 and IL18R1 suggest that variants in inflammatory pathways, such as the biologically-plausible NLRP3 inflammasome, may contribute to nocturnal hypoxemia.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007739
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6467367
April 2019

Admixture mapping identifies novel loci for obstructive sleep apnea in Hispanic/Latino Americans.

Hum Mol Genet 2019 02;28(4):675-687

Physiology and Biophysics, University of Mississippi, Jackson, MS, USA.

Obstructive sleep apnea (OSA) is a common disorder associated with increased risk of cardiovascular disease and mortality. Its prevalence and severity vary across ancestral background. Although OSA traits are heritable, few genetic associations have been identified. To identify genetic regions associated with OSA and improve statistical power, we applied admixture mapping on three primary OSA traits [the apnea hypopnea index (AHI), overnight average oxyhemoglobin saturation (SaO2) and percentage time SaO2 < 90%] and a secondary trait (respiratory event duration) in a Hispanic/Latino American population study of 11 575 individuals with significant variation in ancestral background. Linear mixed models were performed using previously inferred African, European and Amerindian local genetic ancestry markers. Global African ancestry was associated with a lower AHI, higher SaO2 and shorter event duration. Admixture mapping analysis of the primary OSA traits identified local African ancestry at the chromosomal region 2q37 as genome-wide significantly associated with AHI (P < 5.7 × 10^-5), and European and Amerindian ancestries at 18q21 suggestively associated with both AHI and percentage time SaO2 < 90% (P < 10^-3). Follow-up joint ancestry-SNP association analyses identified novel variants in ferrochelatase (FECH), significantly associated with AHI and percentage time SaO2 < 90% after adjusting for multiple tests (P < 8 × 10^-6). These signals contributed to the admixture mapping associations and were replicated in independent cohorts. In this first admixture mapping study of OSA, novel associations with variants in the iron/heme metabolism pathway suggest a role for iron in influencing respiratory traits underlying OSA.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddy387
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360325
February 2019

A Multi-omic Association Study of Trimethylamine N-Oxide.

Cell Rep 2018 07;24(4):935-946

Arivale, Inc., Seattle, WA 98104, USA.

Trimethylamine N-oxide (TMAO) is a circulating metabolite that has been implicated in the development of atherosclerosis and cardiovascular disease (CVD). In this paper, we identify blood markers, metabolites, proteins, gut microbiota patterns, and diets that are significantly associated with levels of plasma TMAO. We find that kidney markers are strongly associated with TMAO and identify CVD-related proteins that are positively correlated with TMAO. We show that metabolites derived by the gut microbiota are strongly correlated with TMAO and that the magnitude of this correlation varies with kidney function. Moreover, we identify diet-associated patterns in the microbiome that are correlated with TMAO. These findings suggest that both the process of TMAO accumulation and the mechanism by which TMAO promotes atherosclerosis are a complex interplay between diet and the microbiome on one hand and other system-level factors such as circulating proteins, metabolites, and kidney function.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2018.06.096
July 2018

GWAS of the electrocardiographic QT interval in Hispanics/Latinos generalizes previously identified loci and identifies population-specific signals.

Sci Rep 2017 12 6;7(1):17075. Epub 2017 Dec 6.

Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA.

QT interval prolongation is a heritable risk factor for ventricular arrhythmias and can predispose to sudden death. Most genome-wide association studies (GWAS) of QT were performed in European ancestral populations, leaving other groups uncharacterized. Herein we present the first QT GWAS of Hispanic/Latinos using data on 15,997 participants from four studies. Study-specific summary results of the association between 1000 Genomes Project (1000G) imputed SNPs and electrocardiographically measured QT were combined using fixed-effects meta-analysis. We identified 41 genome-wide significant SNPs that mapped to 13 previously identified QT loci. Conditional analyses distinguished six secondary signals at NOS1AP (n = 2), ATP1B1 (n = 2), SCN5A (n = 1), and KCNQ1 (n = 1). Comparison of linkage disequilibrium patterns between the 13 lead SNPs and six secondary signals with previously reported index SNPs in 1000G super populations suggested that the SCN5A and KCNE1 lead SNPs were potentially novel and population-specific. Finally, of the 42 suggestively associated loci, AJAP1 was suggestively associated with QT in a prior East Asian GWAS; in contrast BVES and CAP2 murine knockouts caused cardiac conduction defects. Our results indicate that whereas the same loci influence QT across populations, population-specific variation exists, motivating future trans-ethnic and ancestrally diverse QT GWAS.

Source
DOI: http://dx.doi.org/10.1038/s41598-017-17136-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719082
December 2017

Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men.

Am J Respir Cell Mol Biol 2018 03;58(3):391-401

School of Public Health, University of Adelaide, Adelaide, South Australia, Australia.

Obstructive sleep apnea (OSA) is a common heritable disorder displaying marked sexual dimorphism in disease prevalence and progression. Previous genetic association studies have identified a few genetic loci associated with OSA and related quantitative traits, but they have only focused on single ethnic groups, and a large proportion of the heritability remains unexplained. The apnea-hypopnea index (AHI) is a commonly used quantitative measure characterizing OSA severity. Because OSA differs by sex, and the pathophysiology of obstructive events differs in rapid eye movement (REM) and non-REM (NREM) sleep, we hypothesized that additional genetic association signals would be identified by analyzing the NREM/REM-specific AHI and by conducting sex-specific analyses in multiethnic samples. We performed genome-wide association tests for up to 19,733 participants of African, Asian, European, and Hispanic/Latino American ancestry in 7 studies. We identified rs12936587 on chromosome 17 as a possible quantitative trait locus for NREM AHI in men (N = 6,737; P = 1.7 × 10) but not in women (P = 0.77). The association with NREM AHI was replicated in a physiological research study (N = 67; P = 0.047). This locus, overlapping the RAI1 gene and encompassing genes PEMT1, SREBF1, and RASD1, was previously reported to be associated with coronary artery disease and lipid metabolism, and implicated in Potocki-Lupski syndrome and Smith-Magenis syndrome, which are characterized by abnormal sleep phenotypes. We also identified gene-by-sex interactions in suggestive association regions, suggesting that genetic variants for AHI appear to vary by sex, consistent with the clinical observations of strong sexual dimorphism.

Source
DOI: http://dx.doi.org/10.1165/rcmb.2017-0237OC
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5854957
March 2018

Genome-wide association study of heart rate and its variability in Hispanic/Latino cohorts.

Heart Rhythm 2017 11 10;14(11):1675-1684. Epub 2017 Jun 10.

Department of Epidemiology, University of Washington, Seattle, Washington; Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington.

Background: Although time-domain measures of heart rate variability (HRV) are used to estimate cardiac autonomic tone and disease risk in multiethnic populations, the genetic epidemiology of HRV in Hispanics/Latinos has not been characterized.

Objective: The purpose of this study was to conduct a genome-wide association study of heart rate (HR) and its variability in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Women's Health Initiative Hispanic SNP-Health Association Resource project (n = 13,767).

Methods: We estimated HR (bpm), standard deviation of normal-to-normal interbeat intervals (SDNN, ms), and root mean squared difference in successive, normal-to-normal interbeat intervals (RMSSD, ms) from resting, standard 12-lead ECGs. We estimated associations between each phenotype and 17 million genotyped or imputed single nucleotide polymorphisms (SNPs), accounting for relatedness and adjusting for age, sex, study site, and ancestry. Cohort-specific estimates were combined using fixed-effects, inverse-variance meta-analysis. We investigated replication for select SNPs exceeding genome-wide (P <5 × 10) or suggestive (P <10) significance thresholds.

Results: Two genome-wide significant SNPs replicated in a European ancestry cohort, one for RMSSD (rs4963772; chromosome 12) and another for SDNN (rs12982903; chromosome 19). A suggestive SNP for HR (rs236352; chromosome 6) replicated in an African-American cohort. Functional annotation of replicated SNPs in cardiac and neuronal tissues identified potentially causal variants and mechanisms.

Conclusion: This first genome-wide association study of HRV and HR in Hispanics/Latinos underscores the potential for even modestly sized samples of non-European ancestry to inform the genetic epidemiology of complex traits.
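
The fixed-effects, inverse-variance meta-analysis named in the Methods can be illustrated with a minimal Python sketch. It is not the study's code: the function name `inverse_variance_meta` and the per-cohort numbers are hypothetical, and the p-value assumes a normal approximation for the pooled estimate.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(betas, ses):
    """Pool per-cohort effect estimates with fixed-effects inverse-variance weights.

    betas: per-cohort effect estimates for one SNP
    ses:   matching standard errors (all > 0)
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]            # w_k = 1 / SE_k^2
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))                # two-sided normal p-value
    return pooled_beta, pooled_se, p


# Hypothetical estimates for a single SNP in three cohorts (not study data).
beta, se, p = inverse_variance_meta([0.10, 0.14, 0.08], [0.04, 0.05, 0.06])
print(f"pooled beta={beta:.4f}, se={se:.4f}, p={p:.3g}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated cohort effects.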

Source
http://dx.doi.org/10.1016/j.hrthm.2017.06.018
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5671896
November 2017

A meta-analysis of genome-wide association studies of asthma in Puerto Ricans.

Eur Respir J 2017 May 1;49(5). Epub 2017 May 1.

Division of Pediatric Pulmonary Medicine, Allergy, and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA

Puerto Ricans are disproportionately affected with asthma in the USA. In this study, we aim to identify genetic variants that confer susceptibility to asthma in Puerto Ricans. We conducted a meta-analysis of genome-wide association studies (GWAS) of asthma in Puerto Ricans, including participants from: the Genetics of Asthma in Latino Americans (GALA) I-II, the Hartford-Puerto Rico Study and the Hispanic Community Health Study. Moreover, we examined whether susceptibility loci identified in previous meta-analyses of GWAS are associated with asthma in Puerto Ricans. The only locus to achieve genome-wide significance was chromosome 17q21, as evidenced by our top single nucleotide polymorphism (SNP), rs907092 (OR 0.71, p=1.2×10) at Similar to results in non-Puerto Ricans, SNPs in genes in the same linkage disequilibrium block as (, and ) were significantly associated with asthma in Puerto Ricans. With regard to results from a meta-analysis in Europeans, we replicated findings for rs2305480 at , but not for SNPs in any other genes. On the other hand, we replicated results from a meta-analysis of North American populations for SNPs at , and but not for Our findings suggest that common variants on chromosome 17q21 have the greatest effects on asthma in Puerto Ricans.

Source
http://dx.doi.org/10.1183/13993003.01505-2016
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5527708
May 2017

Genome-wide association study of iron traits and relation to diabetes in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL): potential genomic intersection of iron and glucose regulation?

Hum Mol Genet 2017 May;26(10):1966-1978

Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, and the David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.

Genetic variants contribute to normal variation of iron-related traits and may also cause clinical syndromes of iron deficiency or excess. Iron overload and deficiency can adversely affect human health. For example, elevated iron storage is associated with increased diabetes risk, although mechanisms are still being investigated. We conducted the first genome-wide association study of serum iron, total iron binding capacity (TIBC), transferrin saturation, and ferritin in a Hispanic/Latino cohort, the Hispanic Community Health Study/Study of Latinos (>12 000 participants) and also assessed the generalization of previously known loci to this population. We then evaluated whether iron-associated variants were associated with diabetes and glycemic traits. We found evidence for a novel association between TIBC and a variant near the gene for protein phosphatase 1, regulatory subunit 3B (PPP1R3B; rs4841132, β = -0.116, P = 7.44 × 10(-8)). The effect strengthened when iron deficient individuals were excluded (β = -0.121, P = 4.78 × 10(-9)). Ten of sixteen variants previously associated with iron traits generalized to HCHS/SOL, including variants at the transferrin (TF), hemochromatosis (HFE), fatty acid desaturase 2 (FADS2)/myelin regulatory factor (MYRF), transmembrane protease, serine 6 (TMPRSS6), transferrin receptor 2 (TFR2), N-acetyltransferase 2 (arylamine N-acetyltransferase) (NAT2), ABO blood group (ABO), and GRB2 associated binding protein 3 (GAB3) loci. In examining iron variant associations with glucose homeostasis, an iron-raising variant of TMPRSS6 was associated with lower HbA1c levels (P = 8.66 × 10(-10)). This association was attenuated upon adjustment for iron measures. In contrast, the iron-raising allele of PPP1R3B was associated with higher levels of fasting glucose (P = 7.70 × 10(-7)) and fasting insulin (P = 4.79 × 10(-6)), but these associations were not attenuated upon adjustment for TIBC, so iron is not likely a mediator. These results provide new genetic information on iron traits and their connection with glucose homeostasis.

Source
http://dx.doi.org/10.1093/hmg/ddx082
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6075359
May 2017

SeqArray-a storage-efficient high-performance data format for WGS variant calls.

Bioinformatics 2017 Aug;33(15):2251-2257

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Motivation: Whole-genome sequencing (WGS) data are being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data retrieval is relatively slow. Here we introduce a new WGS variant data format implemented in the R/Bioconductor package 'SeqArray' for storing variant calls in an array-oriented manner which provides the same capabilities as VCF, but with multiple high compression options and data access using high-performance parallel computing.

Results: Benchmarks using 1000 Genomes Phase 3 data show file sizes of 14.0 Gb (VCF), 12.3 Gb (BCF, binary VCF), 3.5 Gb (BGT), and 2.6 Gb (SeqArray). Reading genotypes with the SeqArray package is two to three times faster than with the htslib C library using BCF files. For the allele frequency calculation, the implementation in the SeqArray package is over 5 times faster than PLINK v1.9 with VCF and BCF files, and over 16 times faster than vcftools. When used in conjunction with R/Bioconductor packages, the SeqArray package provides users a flexible, feature-rich, high-performance programming environment for analysis of WGS variant data.

Availability And Implementation: http://www.bioconductor.org/packages/SeqArray.

Supplementary Information: Supplementary data are available at Bioinformatics online.
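
To make the benchmarked allele frequency calculation concrete, here is a minimal Python sketch of what such a calculation computes over array-oriented genotype data. It does not use or reproduce the SeqArray API. The nested-list layout (variants × samples × ploidy), the use of `None` for missing calls, and the function name `ref_allele_frequencies` are assumptions made for illustration.

```python
from typing import List, Optional

# Assumed layout: genotypes[variant][sample][allele_copy]
# 0 = reference allele, 1 = alternate allele, None = missing call.
Genotypes = List[List[List[Optional[int]]]]


def ref_allele_frequencies(genotypes: Genotypes) -> List[Optional[float]]:
    """Return the reference allele frequency per variant, ignoring missing calls."""
    freqs: List[Optional[float]] = []
    for variant in genotypes:
        called = [a for sample in variant for a in sample if a is not None]
        if not called:
            freqs.append(None)                 # no usable calls for this variant
        else:
            n_ref = sum(1 for a in called if a == 0)
            freqs.append(n_ref / len(called))
    return freqs


# Toy data: 2 variants, 3 diploid samples (hypothetical, not benchmark data).
toy = [
    [[0, 0], [0, 1], [1, 1]],
    [[0, None], [0, 0], [None, None]],
]
print(ref_allele_frequencies(toy))
```

Storing each variant's calls contiguously, as in this toy layout, is what lets a per-variant summary such as allele frequency be computed in a single pass over the array.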

Source
http://dx.doi.org/10.1093/bioinformatics/btx145
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860110
August 2017

Estimating relationships between phenotypes and subjects drawn from admixed families.

BMC Proc 2016 Oct 18;10(Suppl 7):357-362. Epub 2016 Oct 18.

Department of Biostatistics, University of Washington, Seattle, WA 98195 USA.

Background: Estimating relationships among subjects in a sample, within family structures or caused by population substructure, is complicated in admixed populations. Inaccurate allele frequencies can bias both kinship estimates and tests for association between subjects and a phenotype. We analyzed the simulated and real family data from Genetic Analysis Workshop 19, and were aware of the simulation model.

Results: We found that kinship estimation is more accurate when marker data include common variants whose frequencies are less variable across populations. Estimates of heritability and association vary with age for longitudinally measured traits. Accounting for local ancestry identified different true associations than those identified by a traditional approach. Principal components aid kinship estimation and tests for association, but their utility is influenced by the frequency of the markers used to generate them.

Conclusions: Admixed families can provide a powerful resource for detecting disease loci, as well as analytical challenges. Allele frequencies, although difficult to adequately estimate in admixed populations, have a strong impact on the estimation of kinship, ancestry, and association with phenotypes. Approaches that acknowledge population structure in admixed families outperform those which ignore it.

Source
http://dx.doi.org/10.1186/s12919-016-0056-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133521
October 2016

Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models.

Am J Hum Genet 2016 Apr 24;98(4):653-66. Epub 2016 Mar 24.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
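
The "fit the null model once, then run a score test per variant" pattern described above can be sketched in Python under a strong simplification: the null model here is an intercept-only logistic model with no covariates and no random effects, so this is not GMMAT itself and does not account for relatedness or population structure. The function names `fit_null` and `score_test` and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def fit_null(y):
    """Intercept-only logistic null: the fitted probability is the case fraction."""
    mu = sum(y) / len(y)
    if not 0.0 < mu < 1.0:
        raise ValueError("need both cases and controls")
    residuals = [yi - mu for yi in y]          # y_i - mu, reused for every variant
    return mu, residuals


def score_test(genotypes, mu, residuals):
    """1-df score test for one variant given the already fitted null model."""
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    score = sum(g * r for g, r in zip(genotypes, residuals))
    # Variance of the score after adjusting for the intercept.
    variance = mu * (1.0 - mu) * sum((g - g_mean) ** 2 for g in genotypes)
    if variance == 0.0:
        return None                            # monomorphic variant: test undefined
    stat = score ** 2 / variance
    p = 2.0 * NormalDist().cdf(-math.sqrt(stat))   # chi-square (1 df) tail
    return stat, p


# Toy binary trait and two variants (hypothetical data).
y = [1, 1, 0, 0]
mu, residuals = fit_null(y)                    # fitted once
for g in ([2, 1, 1, 0], [0, 1, 0, 1]):
    print(score_test(g, mu, residuals))        # reused for each variant
```

Because the null fit does not depend on the variant being tested, its cost is paid once per analysis rather than once per variant, which is the source of the computational efficiency the abstract refers to.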

Source
http://dx.doi.org/10.1016/j.ajhg.2016.02.012
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4833218
April 2016

Genetic Associations with Obstructive Sleep Apnea Traits in Hispanic/Latino Americans.

Am J Respir Crit Care Med 2016 Oct;194(7):886-897

Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas.

Rationale: Obstructive sleep apnea is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. Although there is strong clinical and epidemiologic evidence supporting the importance of genetic factors in influencing obstructive sleep apnea, its genetic basis is still largely unknown. Prior genetic studies focused on traits defined using the apnea-hypopnea index, which contains limited information on potentially important genetically determined physiologic factors, such as propensity for hypoxemia and respiratory arousability.

Objectives: To define novel genetic risk loci for obstructive sleep apnea, we conducted genome-wide association studies of quantitative traits in Hispanic/Latino Americans from three cohorts.

Methods: Genome-wide data from as many as 12,558 participants in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Starr County Health Studies population-based cohorts were meta-analyzed for association with the apnea-hypopnea index, average oxygen saturation during sleep, and average respiratory event duration.

Measurements And Main Results: Two novel loci were identified at genome-level significance (rs11691765, GPR83, P = 1.90 × 10 for the apnea-hypopnea index, and rs35424364; C6ORF183/CCDC162P, P = 4.88 × 10 for respiratory event duration) and seven additional loci were identified with suggestive significance (P < 5 × 10). Secondary sex-stratified analyses also identified one significant and several suggestive associations. Multiple loci overlapped genes with biologic plausibility.

Conclusions: These are the first genome-level significant findings reported for obstructive sleep apnea-related physiologic traits in any population. These findings identify novel associations in inflammatory, hypoxia signaling, and sleep pathways.

Source
http://dx.doi.org/10.1164/rccm.201512-2431OC
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5074655
October 2016

Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans.

Am J Hum Genet 2016 Feb 21;98(2):229-42. Epub 2016 Jan 21.

Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10(-28)) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.

Source
http://dx.doi.org/10.1016/j.ajhg.2015.12.003
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4746331
February 2016

Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos.

Am J Hum Genet 2016 Jan;98(1):165-84

Division of Cardiovascular Sciences, NHLBI, NIH, Bethesda, MD 20892, USA.

US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a "genetic-analysis group" variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.
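
The genomic inflation mentioned above is commonly summarized by the genomic inflation factor, the median observed association statistic divided by its expected median under the null. The Python sketch below shows that calculation under the assumption that every p-value comes from a 1-df chi-square test; the function name `genomic_inflation` and the example p-values are hypothetical, and the abstract does not state exactly how inflation was computed in the study.

```python
from statistics import NormalDist, median

_ND = NormalDist()
# Median of a chi-square distribution with 1 df, about 0.4549.
_CHI2_1DF_MEDIAN = _ND.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from 1-df test p-values, each in (0, 1]."""
    # Convert each p-value back to a chi-square statistic: (z at p/2) squared.
    chi2 = [_ND.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / _CHI2_1DF_MEDIAN


# Evenly spread p-values behave like a well-calibrated null (lambda near 1).
n = 1001
null_like = [(k - 0.5) / n for k in range(1, n + 1)]
print(round(genomic_inflation(null_like), 3))

# Hypothetical p-values shifted toward zero give lambda above 1.
print(genomic_inflation([p ** 1.2 for p in null_like]) > 1.0)
```

Values near 1 indicate well-calibrated test statistics, while values clearly above 1 suggest inflation of the kind the abstract reports being reduced by modeling heteroscedasticity across groups.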

Source
http://dx.doi.org/10.1016/j.ajhg.2015.12.001
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4716704
January 2016

Model-free Estimation of Recent Genetic Relatedness.

Am J Hum Genet 2016 Jan;98(1):127-48

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.

Source
http://dx.doi.org/10.1016/j.ajhg.2015.11.022
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4716688
January 2016

Genome-wide association study of dental caries in the Hispanic Communities Health Study/Study of Latinos (HCHS/SOL).

Hum Mol Genet 2016 Feb 11;25(4):807-16. Epub 2015 Dec 11.

Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA.

Dental caries is the most common chronic disease worldwide, and exhibits profound disparities in the USA with racial and ethnic minorities experiencing disproportionate disease burden. Though heritable, the specific genes influencing risk of dental caries remain largely unknown. Therefore, we performed genome-wide association scans (GWASs) for dental caries in a population-based cohort of 12 000 Hispanic/Latino participants aged 18-74 years from the HCHS/SOL. Intra-oral examinations were used to generate two common indices of dental caries experience, which were tested for association with 27.7 M genotyped or imputed single-nucleotide polymorphisms separately in the six ancestry groups. A mixed-models approach was used, which adjusted for age, sex, recruitment site, five principal components of ancestry and additional features of the sampling design. Meta-analyses were used to combine GWAS results across ancestry groups. Heritability estimates ranged from 20% to 53% in the six ancestry groups. The most significant association observed via meta-analysis for both phenotypes was in the region of the NAMPT gene (rs190395159; P-value = 6 × 10(-10)), which is involved in many biological processes including periodontal healing. Another significant association was observed for rs72626594 (P-value = 3 × 10(-8)) downstream of BMP7, a tooth development gene. Other associations were observed in genes lacking known or plausible roles in dental caries. In conclusion, this was the largest GWAS of dental caries to date and the first to target Hispanic/Latino populations. Understanding the factors influencing dental caries susceptibility may lead to improvements in prediction, prevention and disease management, which may ultimately reduce the disparities in oral health across racial, ethnic and socioeconomic strata.

Source
http://dx.doi.org/10.1093/hmg/ddv506
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4743689
February 2016
-->