Publications by authors named "Matthew Zawistowski"

32 Publications

A survey of functional dyspepsia in 361,360 individuals: Phenotypic and genetic cross-disease analyses.

Neurogastroenterol Motil 2021 Aug 11:e14236. Epub 2021 Aug 11.

Department of Gastrointestinal and Liver Diseases, Biodonostia Health Research Institute, San Sebastian, Spain.

Background: Functional dyspepsia (FD) is a common gastrointestinal condition of poorly understood pathophysiology. While symptoms' overlap with other conditions may indicate common pathogenetic mechanisms, genetic predisposition is suspected but has not been adequately investigated.

Methods: Using healthcare, questionnaire, and genetic data from three large population-based biobanks (UK Biobank, EGCUT, and MGI), we surveyed FD comorbidities, heritability, and genetic correlations across a wide spectrum of conditions and traits in 10,078 cases and 351,282 non-FD controls of European ancestry.

Key Results: In UK Biobank, 281 diagnoses were detected at increased prevalence in FD, based on healthcare records. Among these, gastrointestinal conditions (OR = 4.0, p < 1.0 × 10 ), anxiety disorders (OR = 2.3, p < 1.4 × 10 ), ischemic heart disease (OR = 2.2, p < 2.3 × 10 ), and infectious and parasitic diseases (OR = 2.1, p = 1.5 × 10 ) showed the strongest association with FD. Similar results were obtained in an analysis of self-reported conditions and use of medications from questionnaire data. Based on a genome-wide association meta-analysis of genotypes across all cohorts, FD heritability was estimated close to 5% (h² = 0.047, p = 0.014). Genetic correlations indicate FD predisposition is shared with several other diseases and traits (rg > 0.344), mostly overlapping with those also enriched in FD patients. Suggestive (p < 5.0 × 10 ) association with FD risk was detected for 13 loci, with 2 showing nominal replication (p < 0.05) in an independent cohort of 192 FD patients.

Conclusions & Inferences: FD has a weak heritable component that shows commonalities with multiple conditions across a wide spectrum of pathophysiological domains. This new knowledge contributes to a better understanding of FD etiology and may have implications for improving its treatment.

Source
http://dx.doi.org/10.1111/nmo.14236
August 2021

Genome-Wide Association Study of Pelvic Organ Prolapse Using the Michigan Genomics Initiative.

Female Pelvic Med Reconstr Surg 2021 08;27(8):502-506

Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI.

Objectives: The aim of this study was to (1) replicate previously identified genetic variants significantly associated with pelvic organ prolapse and (2) identify new genetic variants associated with pelvic organ prolapse using a genome-wide association study.

Methods: Using our institution's database linking genetic and clinical data, we identified 1,329 women of European ancestry with an International Classification of Diseases, Ninth Revision (ICD-9)/ICD-10 code for prolapse, 767 of whom also had Current Procedural Terminology (CPT)/ICD-9/ICD-10 procedure codes for prolapse surgery, and 16,383 women of European ancestry older than 40 years without a prolapse diagnosis code as controls. Patients were genotyped using the Illumina HumanCoreExome chip and imputed to the Haplotype Reference Consortium. We tested 20 million single nucleotide polymorphisms (SNPs) for association with pelvic organ prolapse adjusting for relatedness, age, chip version, and 4 principal components. We compared our results with 18 previously identified genome-wide significant SNPs from the UK Biobank, Commun Biol (2020;3:129), and Obstet Gynecol (2011;118:1345-1353).

Results: No variants achieved genome-wide significance (P = 5 × 10⁻⁸). However, we replicated 4 SNPs with biologic plausibility at nominal significance (P ≤ 0.05): rs12325192 (P = 0.002), rs9306894 (P = 0.05), rs1920568 (P = 0.034), and rs1247943 (P = 0.041), which were all intergenic and nearest the genes SALL1, GDF7, TBX5, and TBX5, respectively.

Conclusions: Our replication of 4 biologically plausible previously reported SNPs provides further evidence for a genetic contribution to prolapse, specifically that rs12325192, rs9306894, rs1920568, and rs1247943 may contribute to susceptibility for prolapse. These and previously reported associations that have not yet been replicated should be further explored in larger, more diverse cohorts, perhaps through meta-analysis.

Source
http://dx.doi.org/10.1097/SPV.0000000000001075
August 2021

Genome-wide analysis of 944 133 individuals provides insights into the etiology of haemorrhoidal disease.

Gut 2021 Apr 22. Epub 2021 Apr 22.

Department of Medicine I, Institute of Cancer Research, Medical University Vienna, Vienna, Austria.

Objective: Haemorrhoidal disease (HEM) affects a large and silently suffering fraction of the population but its aetiology, including suspected genetic predisposition, is poorly understood. We report the first genome-wide association study (GWAS) meta-analysis to identify genetic risk factors for HEM to date.

Design: We conducted a GWAS meta-analysis of 218 920 patients with HEM and 725 213 controls of European ancestry. Using GWAS summary statistics, we performed multiple genetic correlation analyses between HEM and other traits as well as calculated HEM polygenic risk scores (PRS) and evaluated their translational potential in independent datasets. Using functional annotation of GWAS results, we identified HEM candidate genes, whose differential expression and coexpression in HEM tissues were evaluated employing RNA-seq analyses. The localisation of expressed proteins at selected loci was investigated by immunohistochemistry.

Results: We demonstrate modest heritability and genetic correlation of HEM with several other diseases from the GI, neuroaffective and cardiovascular domains. HEM PRS validated in 180 435 individuals from independent datasets allowed the identification of those at risk and correlated with younger age of onset and recurrent surgery. We identified 102 independent HEM risk loci harbouring genes whose expression is enriched in blood vessels and GI tissues, and in pathways associated with smooth muscles, epithelial and endothelial development and morphogenesis. Network transcriptomic analyses highlighted HEM gene coexpression modules that are relevant to the development and integrity of the musculoskeletal and epidermal systems, and the organisation of the extracellular matrix.

Conclusion: HEM has a genetic component that predisposes to smooth muscle, epithelial and connective tissue dysfunction.

Source
http://dx.doi.org/10.1136/gutjnl-2020-323868
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8292596
April 2021

Investigating rare pathogenic/likely pathogenic exonic variation in bipolar disorder.

Mol Psychiatry 2021 Jan 22. Epub 2021 Jan 22.

HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA.

Bipolar disorder (BD) is a serious mental illness with substantial common variant heritability. However, the role of rare coding variation in BD is not well established. We examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with BD and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC). We assessed the burden of rare, protein-altering, single nucleotide variants classified as pathogenic or likely pathogenic (P-LP) both exome-wide and within several groups of genes with phenotypic or biologic plausibility in BD. While we observed an increased burden of rare coding P-LP variants within 165 genes identified as BD GWAS regions in 3,987 BD cases (meta-analysis OR = 1.9, 95% CI = 1.3-2.8, one-sided p = 6.0 × 10), this enrichment did not replicate in an additional 9,929 BD cases and 14,018 controls (OR = 0.9, one-sided p = 0.70). Although BD shares common variant heritability with schizophrenia, in the BSC sample we did not observe a significant enrichment of P-LP variants in SCZ GWAS genes, in two classes of neuronal synaptic genes (RBFOX2 and FMRP) associated with SCZ or in loss-of-function intolerant genes. In this study, the largest analysis of exonic variation in BD, individuals with BD do not carry a replicable enrichment of rare P-LP variants across the exome or in any of several groups of genes with biologic plausibility. Moreover, despite a strong shared susceptibility between BD and SCZ through common genetic variation, we do not observe an association between BD risk and rare P-LP coding variants in genes known to modulate risk for SCZ.

Source
http://dx.doi.org/10.1038/s41380-020-01006-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8295400
January 2021

Loss-of-function genomic variants highlight potential therapeutic targets for cardiovascular disease.

Nat Commun 2020 12 18;11(1):6417. Epub 2020 Dec 18.

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics and Los Angeles Biomedical Research Institute, Harbor-UCLA, Torrance, CA, USA.

Pharmaceutical drugs targeting dyslipidemia and cardiovascular disease (CVD) may increase the risk of fatty liver disease and other metabolic disorders. To identify potential novel CVD drug targets without these adverse effects, we perform genome-wide analyses of participants in the HUNT Study in Norway (n = 69,479) to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function. We identify 76 (11 previously unreported) presumed causal protein-altering variants associated with one or more CVD- or liver-related blood traits. Nine of the variants are predicted to result in loss-of-function of the protein. This includes ZNF529:p.K405X, which is associated with decreased low-density-lipoprotein (LDL) cholesterol (P = 1.3 × 10) without being associated with liver enzymes or non-fasting blood glucose. Silencing of ZNF529 in human hepatoma cells results in upregulation of LDL receptor and increased LDL uptake in the cells. This suggests that inhibition of ZNF529 or its gene product should be prioritized as a novel candidate drug target for treating dyslipidemia and associated CVD.

Source
http://dx.doi.org/10.1038/s41467-020-20086-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7749177
December 2020

LabWAS: Novel findings and study design recommendations from a meta-analysis of clinical labs in two independent biobanks.

PLoS Genet 2020 11 11;16(11):e1009077. Epub 2020 Nov 11.

Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America.

Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data is complex and lacks a ubiquitous coding scheme, making it more challenging than diagnosis data. Here we describe the first large-scale cross-health system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known association for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross health system GWAS and is a framework for future studies of quantitative EHR lab traits.
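The per-patient summarization mentioned in this abstract can be illustrated with a minimal sketch. It assumes lab results arrive as (patient ID, value) pairs; the function name `summarize_labs`, the sample records, and the choice of the median as an "alternate summarization" are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_labs(records, summary=mean):
    """Collapse repeated lab measurements into one value per patient.

    records: iterable of (patient_id, value) pairs (hypothetical layout).
    summary: function applied to each patient's list of values.
    """
    by_patient = defaultdict(list)
    for patient_id, value in records:
        by_patient[patient_id].append(value)
    return {pid: summary(values) for pid, values in by_patient.items()}


# Hypothetical measurements: patient p1 has three results, p2 has one.
records = [("p1", 4.0), ("p1", 5.0), ("p1", 9.0), ("p2", 7.0)]

mean_values = summarize_labs(records)                    # mean of all measurements
median_values = summarize_labs(records, summary=median)  # one possible alternate
```

The resulting per-patient value would then serve as the quantitative phenotype in an association test; which summary works best for a given trait is the question the abstract's mirrored analyses address.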

Source
http://dx.doi.org/10.1371/journal.pgen.1009077
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7682892
November 2020

A Novel Recurrent Genetic Variant Is Associated With a Dysplasia-Associated Arterial Disease Exhibiting Dissections and Fibromuscular Dysplasia.

Arterioscler Thromb Vasc Biol 2020 11 17;40(11):2686-2699. Epub 2020 Sep 17.

Division of Cardiovascular Medicine, Department of Internal Medicine (H.L.H., Y.W., M.-L.Y., K.L.H., J.L., A.E.K., S.K.G.), University of Michigan Medical School, Ann Arbor.

Objective: While rare variants in the gene have been associated with classical Ehlers-Danlos syndrome and rarely with arterial dissections, recurrent variants in underlying a systemic arteriopathy have not been described. Monogenic forms of multifocal fibromuscular dysplasia (mFMD) have not been previously defined.

Approach and Results: We studied 4 independent probands with the pathogenic variant c.1540G>A, p.(Gly514Ser) who presented with arterial aneurysms, dissections, tortuosity, and mFMD affecting multiple arteries. Arterial medial fibroplasia and smooth muscle cell disorganization were confirmed histologically. The c.1540G>A variant is predicted to be pathogenic in silico and absent in gnomAD. The c.1540G>A variant is on a shared 160.1 kb haplotype with 0.4% frequency in Europeans. Furthermore, exome sequencing data from a cohort of 264 individuals with mFMD were examined for variants. In this mFMD cohort, c.1540G>A and 6 additional relatively rare variants predicted to be deleterious in silico were identified and were associated with arterial dissections (P = 0.005).

Conclusions: c.1540G>A is the first recurring variant recognized to be associated with arterial dissections and mFMD. This variant presents with a phenotype reminiscent of vascular Ehlers-Danlos syndrome. A shared haplotype among probands supports the existence of a common founder. Relatively rare genetic variants predicted to be deleterious by in silico analysis were identified in ≈2.7% of mFMD cases, and as they were enriched in patients with arterial dissections, may act as disease modifiers. Molecular testing for should be considered in patients with a phenotype overlapping with vascular Ehlers-Danlos syndrome and mFMD.

Source
http://dx.doi.org/10.1161/ATVBAHA.119.313885
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953329
November 2020

Chromosome 1q21.2 and additional loci influence risk of spontaneous coronary artery dissection and myocardial infarction.

Nat Commun 2020 09 4;11(1):4432. Epub 2020 Sep 4.

Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA.

Spontaneous coronary artery dissection (SCAD) is a non-atherosclerotic cause of myocardial infarction (MI), typically in young women. We undertook a genome-wide association study of SCAD (N = 270/N = 5,263) and identified and replicated an association of rs12740679 at chromosome 1q21.2 (P = 2.19 × 10, OR = 1.8) influencing ADAMTSL4 expression. Meta-analysis of discovery and replication samples identified associations with P < 5 × 10 at chromosome 6p24.1 in PHACTR1, chromosome 12q13.3 in LRP1, and in females-only, at chromosome 21q22.11 near LINC00310. A polygenic risk score for SCAD was associated with (1) higher risk of SCAD in individuals with fibromuscular dysplasia (P = 0.021, OR = 1.82 [95% CI: 1.09-3.02]) and (2) lower risk of atherosclerotic coronary artery disease and MI in the UK Biobank (P = 1.28 × 10, HR = 0.91 [95% CI: 0.89-0.93], for MI) and Million Veteran Program (P = 9.33 × 10, OR = 0.95 [95% CI: 0.94-0.96], for CAD; P = 3.35 × 10, OR = 0.96 [95% CI: 0.95-0.98] for MI). Here we report that SCAD-related MI and atherosclerotic MI exist at opposite ends of a genetic risk spectrum, inciting MI with disparate underlying vascular biology.

Source
http://dx.doi.org/10.1038/s41467-020-17558-x
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7474092
September 2020

GWAS of thyroid stimulating hormone highlights pleiotropic effects and inverse association with thyroid cancer.

Nat Commun 2020 08 7;11(1):3981. Epub 2020 Aug 7.

Center for Statistical Genetics and Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, USA.

Thyroid stimulating hormone (TSH) is critical for normal development and metabolism. To better understand the genetic contribution to TSH levels, we conduct a GWAS meta-analysis at 22.4 million genetic markers in up to 119,715 individuals and identify 74 genome-wide significant loci for TSH, of which 28 are previously unreported. Functional experiments show that the thyroglobulin protein-altering variants P118L and G67S impact thyroglobulin secretion. Phenome-wide association analysis in the UK Biobank demonstrates the pleiotropic effects of TSH-associated variants and a polygenic score for higher TSH levels is associated with a reduced risk of thyroid cancer in the UK Biobank and three other independent studies. Two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter. Our findings highlight the pleiotropic effects of TSH-associated variants on thyroid function and growth of malignant and benign thyroid tumors.

Source
http://dx.doi.org/10.1038/s41467-020-17718-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7414135
August 2020

Genome-wide association meta-analyses combining multiple risk phenotypes provide insights into the genetic architecture of cutaneous melanoma susceptibility.

Nat Genet 2020 05 27;52(5):494-504. Epub 2020 Apr 27.

Department of Dermatology, Instituto Valenciano de Oncología, Valencia, Spain.

Most genetic susceptibility to cutaneous melanoma remains to be discovered. Meta-analysis genome-wide association study (GWAS) of 36,760 cases of melanoma (67% newly genotyped) and 375,188 controls identified 54 significant (P < 5 × 10) loci with 68 independent single nucleotide polymorphisms. Analysis of risk estimates across geographical regions and host factors suggests the acral melanoma subtype is uniquely unrelated to pigmentation. Combining this meta-analysis with GWAS of nevus count and hair color, and transcriptome association approaches, uncovered 31 potential secondary loci for a total of 85 cutaneous melanoma susceptibility loci. These findings provide insights into cutaneous melanoma genetic architecture, reinforcing the importance of nevogenesis, pigmentation and telomere maintenance, together with identifying potential new pathways for cutaneous melanoma pathogenesis.

Source
http://dx.doi.org/10.1038/s41588-020-0611-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7255059
May 2020

Meta-MultiSKAT: Multiple phenotype meta-analysis for region-based association test.

Genet Epidemiol 2019 10 21;43(7):800-814. Epub 2019 Aug 21.

Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan.

The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare-variants and the combined effects of both common and rare-variants. To achieve robust power, within Meta-MultiSKAT, we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10 . Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.

Source
http://dx.doi.org/10.1002/gepi.22248
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7006736
October 2019

Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb.

PLoS Genet 2019 06 13;15(6):e1008202. Epub 2019 Jun 13.

Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
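For readers unfamiliar with how a PRS condenses many variants into one number, the generic form is a weighted sum of risk-allele dosages. The sketch below shows only that generic form; the function name, the weights, and the dosages are hypothetical, and it does not reproduce any of the specific construction strategies the paper compares.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages: sum over variants of weight * dosage.

    dosages: risk-allele counts (0, 1, or 2) for one individual.
    weights: per-variant effect sizes, e.g. log odds ratios from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for three variants and one patient's dosages.
weights = [0.12, -0.05, 0.30]
patient = [2, 1, 0]

score = polygenic_risk_score(patient, weights)  # 2*0.12 + 1*(-0.05) + 0*0.30
```

Construction strategies typically differ in which variants enter the sum and how the weights are chosen or shrunk, which is why scores built from the same summary statistics can behave differently in a PheWAS.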

Source
http://dx.doi.org/10.1371/journal.pgen.1008202
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6592565
June 2019

Sex-specific and pleiotropic effects underlying kidney function identified from GWAS meta-analysis.

Nat Commun 2019 04 23;10(1):1847. Epub 2019 Apr 23.

Department of Internal Medicine: Cardiology, University of Michigan, Ann Arbor, 48109, MI, USA.

Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.

Source
http://dx.doi.org/10.1038/s41467-019-09861-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6478837
April 2019

Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.

Pac Symp Biocomput 2019;24:391-402

Department of Math, Computer Science, and Statistics, Dordt College, Sioux Center, IA 51250, USA

As genetic sequencing becomes less expensive and data sets linking genetic data and medical records (e.g., Biobanks) become larger and more common, issues of data privacy and computational challenges become more necessary to address in order to realize the benefits of these datasets. One possibility for alleviating these issues is through the use of already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning the access of these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype in order to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. Derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids.
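One simple case of combining phenotypes can be checked numerically. Because ordinary least squares is linear in the response, the slope and intercept for the sum of two phenotypes regressed on a genotype equal the sums of the individual slopes and intercepts. The sketch below demonstrates only this standard property on simulated data; the simulated effect sizes and variable names are assumptions, and the paper's own formulas (including the standard error, which also depends on how the residuals covary) are not reproduced here.

```python
import random


def ols(x, y):
    """Return (slope, intercept) of the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Simulated data: genotypes coded 0/1/2 and two phenotypes (hypothetical effects).
rng = random.Random(0)
genotype = [rng.choice([0, 1, 2]) for _ in range(500)]
pheno_a = [0.3 * g + rng.gauss(0, 1) for g in genotype]
pheno_b = [-0.1 * g + rng.gauss(0, 1) for g in genotype]
combined = [a + b for a, b in zip(pheno_a, pheno_b)]

slope_a, intercept_a = ols(genotype, pheno_a)
slope_b, intercept_b = ols(genotype, pheno_b)
slope_c, intercept_c = ols(genotype, combined)

# The combined-phenotype estimates match the sums of the per-phenotype ones.
slope_gap = abs(slope_c - (slope_a + slope_b))
intercept_gap = abs(intercept_c - (intercept_a + intercept_b))
```

In this setting a group holding only `slope_a` and `slope_b` could recover `slope_c` without access to individual-level data, which is the kind of inference from shared summary statistics the abstract describes.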

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417828
January 2020

Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans.

Nat Commun 2018 09 14;9(1):3753. Epub 2018 Sep 14.

Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA.

A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.

Source
http://dx.doi.org/10.1038/s41467-018-05936-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138700
September 2018

Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative.

Am J Hum Genet 2018 06 17;102(6):1048-1061. Epub 2018 May 17.

Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109, USA; Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA.

Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS exhibited strong association for several cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced. Further analysis of temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.

Source
http://dx.doi.org/10.1016/j.ajhg.2018.04.001
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5992124
June 2018

Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees.

Proc Natl Acad Sci U S A 2018 01 26;115(2):379-384. Epub 2017 Dec 26.

Department of Statistics, Seoul National University, Seoul 08826, Republic of Korea.

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant -expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.
DOI: http://dx.doi.org/10.1073/pnas.1705859115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5777025
January 2018

Exome-wide association study reveals novel psoriasis susceptibility locus at TNFSF15 and rare protective alleles in genes contributing to type I IFN signalling.

Hum Mol Genet 2017 11;26(21):4301-4313

Institute of Genetic Epidemiology, Helmholtz Zentrum Munich, Neuherberg, Germany.

Psoriasis is a common inflammatory skin disorder for which multiple genetic susceptibility loci have been identified, but few resolved to specific functional variants. In this study, we sought to identify common and rare psoriasis-associated gene-centric variation. Using exome arrays we genotyped four independent cohorts, totalling 11 861 psoriasis cases and 28 610 controls, aggregating the dataset through statistical meta-analysis. Single variant analysis detected a previously unreported risk locus at TNFSF15 (rs6478108; P = 1.50 × 10⁻⁸, OR = 1.10), and association of common protein-altering variants at 11 loci previously implicated in psoriasis susceptibility. We validate previous reports of protective low-frequency protein-altering variants within IFIH1 (encoding an innate antiviral receptor) and TYK2 (encoding a Janus kinase), in each case establishing a further series of protective rare variants (minor allele frequency < 0.01) via gene-wide aggregation testing (IFIH1: p_burden = 2.53 × 10⁻⁷, OR = 0.707; TYK2: p_burden = 6.17 × 10⁻⁴, OR = 0.744). Both genes play significant roles in type I interferon (IFN) production and signalling. Several of the protective rare and low-frequency variants in IFIH1 and TYK2 disrupt conserved protein domains, highlighting potential mechanisms through which their effect may be exerted.
DOI: http://dx.doi.org/10.1093/hmg/ddx328
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5886170
November 2017

The Veterans Affairs Cardiac Risk Score: Recalibrating the Atherosclerotic Cardiovascular Disease Score for Applied Use.

Med Care 2017 09;55(9):864-870

*Veterans Affairs Center for Clinical Management Research †Department of Internal Medicine and Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI.

Background: Accurately estimating cardiovascular risk is fundamental to good decision-making in cardiovascular disease (CVD) prevention, but risk scores developed in one population often perform poorly in dissimilar populations. We sought to examine whether a large integrated health system can use its electronic health data to better predict individual patients' risk of developing CVD.

Methods: We created a cohort using all patients ages 45-80 who used Department of Veterans Affairs (VA) ambulatory care services in 2006 with no history of CVD, heart failure, or loop diuretics. Our outcome variable was new-onset CVD in 2007-2011. We then developed a series of recalibrated scores, including a fully refit "VA Risk Score-CVD (VARS-CVD)." We tested the different scores using standard measures of prediction quality.

Results: For the 1,512,092 patients in the study, the Atherosclerotic cardiovascular disease risk score had discrimination similar to that of the VARS-CVD (c-statistic of 0.66 in men and 0.73 in women), but the Atherosclerotic cardiovascular disease model had poor calibration, predicting 63% more events than observed. Calibration was excellent in the fully recalibrated VARS-CVD tool, but the simpler techniques tested proved less reliable.

Conclusions: We found that local electronic health record data can be used to estimate CVD risk better than an established risk score based on research populations. Recalibration improved estimates dramatically, and the type of recalibration was important. Such tools can also easily be integrated into a health system's electronic health record and can be more readily updated.
DOI: http://dx.doi.org/10.1097/MLR.0000000000000781
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5561663
September 2017

Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants.

Nat Commun 2017 05 24;8:15382. Epub 2017 May 24.

Department of Dermatology, University Medical Center Schleswig-Holstein, Campus Kiel, Kiel 24105, Germany.

Psoriasis is a complex disease of skin with a prevalence of about 2%. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for psoriasis to date, including data from eight different Caucasian cohorts, with a combined effective sample size >39,000 individuals. We identified 16 additional psoriasis susceptibility loci achieving genome-wide significance, increasing the number of identified loci to 63 for European-origin individuals. Functional analysis highlighted the roles of interferon signalling and the NFκB cascade, and we showed that the psoriasis signals are enriched in regulatory elements from different T cells (CD8+ T-cells and CD4+ T-cells including Th0, Th1 and Th17). The identified loci explain ∼28% of the genetic heritability and generate a discriminatory genetic risk score (AUC=0.76 in our sample) that is significantly correlated with age at onset (p=2 × 10). This study provides a comprehensive layout for the genetic architecture of common variants for psoriasis.
DOI: http://dx.doi.org/10.1038/ncomms15382
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5458077
May 2017

A rare coding allele in IFIH1 is protective for psoriatic arthritis.

Ann Rheum Dis 2017 Jul 13;76(7):1321-1324. Epub 2017 May 13.

Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK.

Objectives: Psoriatic arthritis (PsA) is an inflammatory arthritis associated with psoriasis. While many common risk alleles have been reported for association with PsA as well as psoriasis, few rare coding alleles have yet been identified.

Methods: To identify rare coding variation associated with PsA risk or protection, we genotyped 41 267 variants with the exome chip and investigated association within an initial cohort of 1980 PsA cases and 5913 controls. Genotype data for an independent cohort of 2234 PsA cases and 5708 controls was also made available, allowing for a meta-analysis to be performed with the discovery dataset.

Results: We identified an association with the rare variant rs35667974 (p=2.39x10, OR=0.47), encoding an Ile923Val amino acid change in the IFIH1 gene protein product. The association was reproduced in our independent cohort, and reached a high level of significance on meta-analysis of the discovery and replication datasets (p=4.67x10). We identified a strong association with IFIH1 when performing multiple-variant analysis (p=6.77x10), and found evidence of independent effects between the rare allele and the common PsA variant at the same locus.

Conclusion: For the first time, we report a rare coding allele in IFIH1 to be protective for PsA. This rare allele has also been identified to have the same direction of effect on type I diabetes and psoriasis. While this association further supports existing evidence for IFIH1 as a causal gene for PsA, mechanistic studies will need to be pursued to confirm that IFIH1 is indeed causal.
DOI: http://dx.doi.org/10.1136/annrheumdis-2016-210592
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530346
July 2017

Corrected ROC analysis for misclassified binary outcomes.

Stat Med 2017 06 28;36(13):2148-2160. Epub 2017 Feb 28.

Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.

Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification-adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias-corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administration's EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
DOI: http://dx.doi.org/10.1002/sim.7260
June 2017
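
To make the dependence of AUC bias on misclassification rates and prevalence concrete, here is a minimal sketch of a standard mixture argument. It assumes that label errors are independent of the risk score given true status; under that assumption the observed cases and controls are mixtures of true cases and controls, and the observed AUC shrinks toward 0.5 by the factor PPV + NPV - 1. This is an illustration under that stated assumption, not the procedure published in the paper above, and all function names are hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """PPV and NPV of the observed outcome label for the true outcome."""
    p_obs_case = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    ppv = prevalence * sensitivity / p_obs_case
    npv = (1 - prevalence) * specificity / (1 - p_obs_case)
    return ppv, npv


def observed_auc(true_auc, ppv, npv):
    """AUC expected when evaluating against misclassified labels."""
    return 0.5 + (ppv + npv - 1) * (true_auc - 0.5)


def corrected_auc(obs_auc, ppv, npv):
    """Invert the attenuation to recover the AUC against true labels."""
    scale = ppv + npv - 1
    if scale <= 0:
        raise ValueError("labels carry no information about true status")
    return 0.5 + (obs_auc - 0.5) / scale


# Hypothetical rates: 20% prevalence, 10% false negatives, 5% false positives.
ppv, npv = predictive_values(prevalence=0.20, sensitivity=0.90, specificity=0.95)
attenuated = observed_auc(0.80, ppv, npv)
recovered = corrected_auc(attenuated, ppv, npv)
```

The correction only rescales a single summary number, which is why an adjustment of this kind is cheap to apply across many candidate models.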

Characterization of ADME gene variation in 21 populations by exome sequencing.

Pharmacogenet Genomics 2017 03;27(3):89-100

(a) GlaxoSmithKline Research and Development, Durham, North Carolina; (b) GlaxoSmithKline Research and Development, King of Prussia, Pennsylvania; (c) Department of Biostatistics and/or Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; (d) GlaxoSmithKline Research and Development, Sydney, New South Wales, Australia; (e) Seoul National University; (f) DNA Link Inc., Seoul, Korea; (g) Kyushu University, Fukuoka, Japan.

Objective: Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing.

Materials And Methods: Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project.

Results: Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one 'knockout' allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10) and showed significantly greater levels of population differentiation (P=7.6×10). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies.

Conclusion: Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes.
DOI: http://dx.doi.org/10.1097/FPC.0000000000000260
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5287433
March 2017

Methods for association analysis and meta-analysis of rare variants in families.

Genet Epidemiol 2015 May 4;39(4):227-38. Epub 2015 Mar 4.

Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.

Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
DOI: http://dx.doi.org/10.1002/gepi.21892
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4459524
May 2015
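
The gene-level grouping idea in the abstract above can be illustrated with the simplest burden construction: count each person's rare alleles within a gene and regress the trait on that count. The sketch below treats individuals as unrelated and therefore does not implement the family-based tests the paper develops; the MAF threshold, genotypes, and trait values are hypothetical.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Per-person count of minor alleles at variants below the MAF threshold."""
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def regression_slope(x, y):
    """Least-squares slope of y on x (trait change per extra rare allele)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical gene with three variants; the third is common and is excluded.
mafs = [0.004, 0.008, 0.30]
genotypes = [
    [0, 0, 1],
    [1, 0, 2],
    [0, 1, 0],
    [0, 0, 0],
]
trait = [1.0, 2.0, 2.2, 0.8]

burdens = burden_scores(genotypes, mafs)
slope = regression_slope(burdens, trait)
```

A family-based version would additionally need to account for the correlation between relatives, for example through a kinship matrix, which this sketch leaves out.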

Analysis of rare variant population structure in Europeans explains differential stratification of gene-based tests.

Eur J Hum Genet 2014 Sep 8;22(9):1137-44. Epub 2014 Jan 8.

[1] Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [2] Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA.

There is substantial interest in the role of rare genetic variants in the etiology of complex human diseases. Several gene-based tests have been developed to simultaneously analyze multiple rare variants for association with phenotypic traits. The tests can largely be partitioned into two classes - 'burden' tests and 'joint' tests - based on how they accumulate evidence of association across sites. We used the empirical joint site frequency spectra of rare, nonsynonymous variation from a large multi-population sequencing study to explore the effect of realistic rare variant population structure on gene-based tests. We observed an important difference between the two test classes: their susceptibility to population stratification. Focusing on European samples, we found that joint tests, which allow variants to have opposite directions of effect, consistently showed higher levels of P-value inflation than burden tests. We determined that the differential stratification was caused by two specific patterns in the interpopulation distribution of rare variants, each correlating with inflation in one of the test classes. The pattern that inflates joint tests is more prevalent in real data, explaining the higher levels of inflation in these tests. Furthermore, we show that the different sources of inflation between tests lead to heterogeneous responses to genomic control correction and the number of variants analyzed. Our results indicate that care must be taken when interpreting joint and burden analyses of the same set of rare variants, in particular, to avoid mistaking inflated P-values in joint tests for stronger signals of true associations.
DOI: http://dx.doi.org/10.1038/ejhg.2013.297
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4135410
September 2014
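
P-value inflation of the kind discussed in the abstract above is often summarized by the genomic control inflation factor: the median test statistic divided by its expected median under the null. The sketch below assumes each test has a 1-degree-of-freedom chi-square null and uses only the standard library; the function names and the simulated p-values are hypothetical, and p-values must lie strictly above 0.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_pvalue(p):
    """1-df chi-square statistic whose upper-tail probability is p."""
    z = _STD_NORMAL.inv_cdf(1 - p / 2)
    return z * z


def genomic_inflation(pvalues):
    """Lambda: observed median statistic over the null median."""
    return median(chi2_from_pvalue(p) for p in pvalues) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test (lambda near 1).
calibrated = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring shifts them toward zero, mimicking an inflated test.
inflated = [p * p for p in calibrated]

lambda_calibrated = genomic_inflation(calibrated)
lambda_inflated = genomic_inflation(inflated)
```

Dividing each statistic by lambda is the usual genomic control correction; as the abstract notes, burden and joint tests can respond differently to it.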

Meta-analysis of gene-level tests for rare variant association.

Nat Genet 2014 Feb 15;46(2):200-4. Epub 2013 Dec 15.

[1] Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, USA. [2].

The majority of reported complex disease associations for common genetic variants have been identified through meta-analysis, a powerful approach that enables the use of large sample sizes while protecting against common artifacts due to population structure and repeated small-sample analyses sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the focus of analysis. Here we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable-threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features from single-variant meta-analysis approaches and demonstrate its use in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
DOI: http://dx.doi.org/10.1038/ng.2852
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3939031
February 2014
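
One common way to meta-analyze gene-level tests without sharing individual-level data is to have each study report per-variant score statistics and their covariance matrix, sum these across studies, and form the burden statistic from the totals. The sketch below illustrates that general idea only; it is not claimed to reproduce the estimators in the paper above, and the two studies, their numbers, and the equal weights are hypothetical.

```python
import math


def combine_studies(score_vectors, cov_matrices):
    """Sum per-variant score statistics and covariance matrices across studies."""
    m = len(score_vectors[0])
    total_u = [sum(u[j] for u in score_vectors) for j in range(m)]
    total_v = [
        [sum(c[i][j] for c in cov_matrices) for j in range(m)]
        for i in range(m)
    ]
    return total_u, total_v


def burden_z(u, v, weights):
    """Burden Z statistic: w'U divided by sqrt(w'Vw)."""
    m = len(u)
    numerator = sum(w * s for w, s in zip(weights, u))
    variance = sum(
        weights[i] * v[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    return numerator / math.sqrt(variance)


# Hypothetical summary statistics for one gene with two variants.
study1_u, study1_v = [1.0, 2.0], [[4.0, 1.0], [1.0, 3.0]]
study2_u, study2_v = [0.5, 1.5], [[2.0, 0.0], [0.0, 2.0]]

u, v = combine_studies([study1_u, study2_u], [study1_v, study2_v])
z = burden_z(u, v, weights=[1.0, 1.0])
```

Changing the weight vector changes the test (for example, up-weighting rarer variants) without requiring any new data from the contributing studies.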

The influence of genomic context on mutation patterns in the human genome inferred from rare variants.

Genome Res 2013 Dec 29;23(12):1974-84. Epub 2013 Aug 29.

Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.
DOI: http://dx.doi.org/10.1101/gr.154971.113
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847768
December 2013

A geometric framework for evaluating rare variant tests of association.

Genet Epidemiol 2013 May 21;37(4):345-57. Epub 2013 Mar 21.

Department of Statistics, Harvard University, Cambridge, MA, USA.

The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
DOI: http://dx.doi.org/10.1002/gepi.21722
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3718063
May 2013
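
The geometric picture in the abstract above treats the per-site rare allele counts in cases and in controls as two vectors, then compares their lengths and the angle between them. The sketch below only computes those two quantities for hypothetical counts, assuming the ordinary Euclidean norm; turning them into calibrated test statistics is beyond this illustration.

```python
import math


def vector_length(v):
    """Euclidean length of an allele-count vector."""
    return math.sqrt(sum(x * x for x in v))


def vector_angle(u, v):
    """Angle in radians between two non-zero allele-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cosine = dot / (vector_length(u) * vector_length(v))
    # Guard against rounding pushing the cosine just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


# Hypothetical rare allele counts at four sites in one gene.
case_counts = [3, 0, 2, 1]
control_counts = [1, 0, 1, 0]

length_difference = vector_length(case_counts) - vector_length(control_counts)
angle = vector_angle(case_counts, control_counts)
```

A length difference with a small angle suggests the same sites carry more alleles in one group, while a large angle suggests that different sites are involved in cases and controls.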

An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people.

Science 2012 Jul 17;337(6090):100-4. Epub 2012 May 17.

Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
DOI: http://dx.doi.org/10.1126/science.1217876
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4319976
July 2012
-->