Publications by authors named "Dajiang J Liu"

49 Publications

Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction.

Nat Neurosci 2021 Aug 26. Epub 2021 Aug 26.

Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA.

Behaviors and disorders related to self-regulation, such as substance use, antisocial behavior and attention-deficit/hyperactivity disorder, are collectively referred to as externalizing and have shared genetic liability. We applied a multivariate approach that leverages genetic correlations among externalizing traits for genome-wide association analyses. By pooling data from ~1.5 million people, our approach is statistically more powerful than single-trait analyses and identifies more than 500 genetic loci. The loci were enriched for genes expressed in the brain and related to nervous system development. A polygenic score constructed from our results predicts a range of behavioral and medical outcomes that were not part of genome-wide analyses, including traits that until now lacked well-performing polygenic scores, such as opioid use disorder, suicide, HIV infections, criminal convictions and unemployment. Our findings are consistent with the idea that persistent difficulties in self-regulation can be conceptualized as a neurodevelopmental trait with complex and far-reaching social and health correlates.
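
The polygenic score mentioned in this abstract can be illustrated, in its most generic form, as a weighted sum of effect-allele dosages. The sketch below assumes that textbook definition; the abstract does not say how the authors derived their weights, and the variant IDs and effect sizes here are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one person.

    dosages: variant ID -> effect-allele count (0, 1, 2, or an imputed value)
    weights: variant ID -> per-allele effect size from GWAS summary statistics
    Variants without a weight are skipped.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical variants and weights, for illustration only.
weights = {"rsA": 0.10, "rsB": -0.20, "rsC": 0.05}
person = {"rsA": 2, "rsB": 0, "rsC": 1, "rsD": 1}  # rsD has no weight and is ignored
score = polygenic_score(person, weights)  # 0.10*2 + (-0.20)*0 + 0.05*1
```

In practice the weights are usually adjusted for linkage disequilibrium before scoring; that step is omitted here.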

Source
http://dx.doi.org/10.1038/s41593-021-00908-3 (DOI)
August 2021

Inferring genes that escape X-Chromosome inactivation reveals important contribution of variable escape genes to sex-biased diseases.

Genome Res 2021 Sep 23;31(9):1629-1637. Epub 2021 Aug 23.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania 17033, USA.

The X Chromosome plays an important role in human development and disease. However, functional genomic and disease association studies of X genes greatly lag behind autosomal gene studies, in part owing to the unique biology of X-Chromosome inactivation (XCI). Because of XCI, most genes are only expressed from one allele. Yet, ∼30% of X genes "escape" XCI and are transcribed from both alleles, many only in a proportion of the population. Such interindividual differences are likely to be disease relevant, particularly for sex-biased disorders. To understand the functional biology for X-linked genes, we developed X-Chromosome inactivation for RNA-seq (XCIR), a novel approach to identify escape genes using bulk RNA-seq data. Our method, available as an R package, is more powerful than alternative approaches and is computationally efficient to handle large population-scale data sets. Using annotated XCI states, we examined the contribution of X-linked genes to the disease heritability in the United Kingdom Biobank data set. We show that escape and variable escape genes explain the largest proportion of X heritability, which is in large part attributable to X genes with Y homology. Finally, we investigated the role of each XCI state in sex-biased diseases and found that although XY homologous gene pairs have a larger overall effect size, enrichment for variable escape genes is significantly increased in female-biased diseases. Our results, for the first time, quantitate the importance of variable escape genes for the etiology of sex-biased disease, and our pipeline allows analysis of larger data sets for a broad range of phenotypes.

Source
http://dx.doi.org/10.1101/gr.275677.121 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415373 (PMC)
September 2021

Model-based assessment of replicability for genome-wide association meta-analysis.

Nat Commun 2021 03 30;12(1):1964. Epub 2021 Mar 30.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.

Genome-wide association meta-analysis (GWAMA) is an effective approach to enlarge sample sizes and empower the discovery of novel associations between genotype and phenotype. Independent replication has been used as a gold-standard for validating genetic associations. However, as current GWAMA often seeks to aggregate all available datasets, it becomes impossible to find a large enough independent dataset to replicate new discoveries. Here we introduce a method, MAMBA (Meta-Analysis Model-based Assessment of replicability), for assessing the "posterior-probability-of-replicability" for identified associations by leveraging the strength and consistency of association signals between contributing studies. We demonstrate using simulations that MAMBA is more powerful and robust than existing methods, and produces more accurate genetic effects estimates. We apply MAMBA to a large-scale meta-analysis of addiction phenotypes with 1.2 million individuals. In addition to accurately identifying replicable common variant associations, MAMBA also pinpoints novel replicable rare variant associations from imputation-based GWAMA and hence greatly expands the set of analyzable variants.

Source
http://dx.doi.org/10.1038/s41467-021-21226-z (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8009871 (PMC)
March 2021

MetaPrism: A versatile toolkit for joint taxa/gene analysis of metagenomic sequencing data.

G3 (Bethesda) 2021 04;11(4)

Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.

In microbiome research, metagenomic sequencing generates enormous amounts of data. These data are typically classified into taxa for taxonomy analysis, or into genes for functional analysis. However, a joint analysis where the reads are classified into taxa-specific genes is often overlooked. To enable the analysis of this biologically meaningful feature, we developed a novel bioinformatic toolkit, MetaPrism, which can analyze sequence reads for a set of joint taxa/gene analyses to: 1) classify sequence reads and estimate the abundances for taxa-specific genes; 2) tabularize and visualize taxa-specific gene abundances; 3) compare the abundances between groups; and 4) build prediction models for clinical outcome. We illustrated these functions using a published microbiome metagenomics dataset from patients treated with immune checkpoint inhibitor therapy and showed the joint features can serve as potential biomarkers to predict therapeutic responses. MetaPrism is a toolkit for joint taxa and gene analysis. It offers biological insights on the taxa-specific genes on top of the taxa-alone or gene-alone analysis. MetaPrism is open-source software and freely available at https://github.com/jiwoongbio/MetaPrism. The example script to reproduce the manuscript is also provided in the above code repository.

Source
http://dx.doi.org/10.1093/g3journal/jkab046 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049424 (PMC)
April 2021

MB-GAN: Microbiome Simulation via Generative Adversarial Network.

Gigascience 2021 Feb;10(2)

University of Texas Southwestern Medical Center, Quantitative Biomedical Research Center, Department of Population and Data Sciences, 5323 Harry Hines Blvd, Dallas, TX 75390, USA.

Background: Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating realistic microbiome data is challenging because it is difficult to model their correlation structure using explicit statistical models.

Results: To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework termed MB-GAN, by using a generative adversarial network (GAN) and utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from given microbial abundances and compute simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions, and it only requires real datasets as inputs. Second, unlike the traditional GANs, MB-GAN is easily applicable and can converge efficiently.

Conclusions: By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages are suitable for further microbiome methodology development where high-fidelity microbiome data are needed.

Source
http://dx.doi.org/10.1093/gigascience/giab005 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931821 (PMC)
February 2021

Causal Relationship and Shared Genetic Loci between Psoriasis and Type 2 Diabetes through Trans-Disease Meta-Analysis.

J Invest Dermatol 2021 Jun 30;141(6):1493-1502. Epub 2020 Dec 30.

Department of Dermatology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.

Psoriasis and type 2 diabetes (T2D) are complex conditions with significant impacts on health. Patients with psoriasis have a higher risk of T2D (∼1.5 OR) and vice versa, controlling for body mass index; yet, few studies have compared their genetic architecture. We hypothesized that there are shared genetic components between psoriasis and T2D. Trans-disease meta-analysis was applied to 8,016,731 well-imputed genetic markers from large-scale meta-analyses of psoriasis (11,024 cases and 16,336 controls) and T2D (74,124 cases and 824,006 controls), adjusted for body mass index. We confirmed our findings in a hospital-based study (42,112 patients) and tested for causal relationships with multivariable Mendelian randomization. Mendelian randomization identified a causal relationship between psoriasis and T2D (P = 1.6 × 10, OR = 1.01) and highlighted the impact of body mass index. Trans-disease meta-analysis further revealed four genome-wide significant loci (P < 5 × 10⁻⁸) with evidence of colocalization and shared directions of effect between psoriasis and T2D not present in body mass index. The proteins coded by genes in these loci (ACTR2, ERLIN1, TRMT112, and BECN1) are connected through NF-κB signaling. Our results provide insight into the immunological components that connect immune-mediated skin conditions and metabolic diseases, independent of confounding factors.

Source
http://dx.doi.org/10.1016/j.jid.2020.11.025 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8154633 (PMC)
June 2021

Discovery of rare variants associated with blood pressure regulation through meta-analysis of 1.3 million individuals.

Nat Genet 2020 12 23;52(12):1314-1332. Epub 2020 Nov 23.

Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, Denmark.

Genetic studies of blood pressure (BP) to date have mainly analyzed common variants (minor allele frequency > 0.05). In a meta-analysis of up to ~1.3 million participants, we discovered 106 new BP-associated genomic regions and 87 rare (minor allele frequency ≤ 0.01) variant BP associations (P < 5 × 10), of which 32 were in new BP-associated loci and 55 were independent BP-associated single-nucleotide variants within known BP-associated regions. Average effects of rare variants (44% coding) were ~8 times larger than common variant effects and indicate potential candidate causal genes at new and known loci (for example, GATA5 and PLCB3). BP-associated variants (including rare and common) were enriched in regions of active chromatin in fetal tissues, potentially linking fetal development with BP regulation in later life. Multivariable Mendelian randomization suggested possible inverse effects of elevated systolic and diastolic BP on large artery stroke. Our study demonstrates the utility of rare-variant analyses for identifying candidate genes and the results highlight potential therapeutic targets.

Source
http://dx.doi.org/10.1038/s41588-020-00713-x (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7610439 (PMC)
December 2020

Genetic correlation, pleiotropy, and causal associations between substance use and psychiatric disorder.

Psychol Med 2020 Aug 7:1-11. Epub 2020 Aug 7.

Department of Psychology, University of Minnesota, Minneapolis, MN, USA.

Background: Substance use occurs at a high rate in persons with a psychiatric disorder. Genetically informative studies have the potential to elucidate the etiology of these phenomena. Recent developments in genome-wide association studies (GWAS) allow new avenues of investigation.

Method: Using results of GWAS meta-analyses, we performed a factor analysis of the genetic correlation structure, a genome-wide search of shared loci, and causally informative tests for six substance use phenotypes (four smoking, one alcohol, and one cannabis use) and five psychiatric disorders (ADHD, anorexia, depression, bipolar disorder, and schizophrenia).

Results: Two correlated factors, externalizing and internalizing/psychosis, were found, although model fit was beneath conventional standards. Of 458 loci reported in previous univariate GWAS of substance use and psychiatric disorders, about 50% (230 loci) were pleiotropic, with an additional 111 pleiotropic loci not reported in past GWAS. Of the 341 pleiotropic loci, 152 were associated with both substance use and psychiatric disorders, implicating neurodevelopment, cell morphogenesis, biological adhesion pathways, and enrichment in 13 different brain tissues. Seventy-five and 114 pleiotropic loci were specific to either psychiatric disorders or substance use phenotypes, implicating neuronal signaling pathway and clathrin-binding functions/structures, respectively. No consistent evidence for phenotypic causation was found across different Mendelian randomization methods.

Conclusions: Genetic etiology of substance use and psychiatric disorders is highly pleiotropic and involves shared neurodevelopmental path, neurotransmission, and intracellular trafficking. In aggregate, the patterns are not consistent with vertical pleiotropy, more likely reflecting horizontal pleiotropy or more complex forms of phenotypic causation.

Source
http://dx.doi.org/10.1017/S003329172000272X (DOI)
August 2020

Seqminer2: an efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset.

Bioinformatics 2020 12;36(19):4951-4954

Department of Clinical Science, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.

Summary: Here, we present a highly efficient R package, seqminer2, for querying and retrieving sequence variants from biobank-scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several orders of magnitude compared to state-of-the-art tools based upon tabix. It also reimplements support for the BGEN and PLINK formats, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank-scale sequence datasets in R.

Availability And Implementation: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available in https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btaa628 (DOI)
December 2020

Investigation of discordant phenotype in mild Hemophilia A using whole exome sequencing.

Thromb Res 2020 09 28;193:36-39. Epub 2020 May 28.

Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, United States of America.

Source
http://dx.doi.org/10.1016/j.thromres.2020.05.044 (DOI)
September 2020

Association Analysis and Meta-Analysis of Multi-Allelic Variants for Large-Scale Sequence Data.

Genes (Basel) 2020 05 25;11(5). Epub 2020 May 25.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA.

There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
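
One way to picture the encoding step discussed in this abstract is to turn each multi-allelic genotype into one count per alternate allele, so that every allele can receive its own effect in a joint model. The sketch below assumes VCF-style genotype strings and is only one plausible encoding; the abstract does not specify the authors' scheme, and the function name is hypothetical.

```python
def encode_multiallelic(genotypes, n_alt):
    """Convert VCF-style genotype strings into per-alternate-allele counts.

    genotypes: strings such as "0/1" or "1|2" (0 = reference allele)
    n_alt:     number of alternate alleles at the site
    Returns one list of length n_alt per sample, or None for a missing call.
    """
    rows = []
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            rows.append(None)  # missing genotype, left for the caller to impute or drop
            continue
        counts = [0] * n_alt
        for allele in alleles:
            index = int(allele)
            if index > 0:  # allele 0 is the reference and is not counted
                counts[index - 1] += 1
        rows.append(counts)
    return rows


# A tri-allelic site (two alternate alleles) in five hypothetical samples.
encoded = encode_multiallelic(["0/0", "0/1", "1/2", "2|2", "./."], n_alt=2)
```

Each row can then serve as a set of predictors in a regression, in contrast to collapsing all alternate alleles into a single count.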

Source
http://dx.doi.org/10.3390/genes11050586 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7288273 (PMC)
May 2020

Prioritizing genetic variants in GWAS with lasso using permutation-assisted tuning.

Bioinformatics 2020 06;36(12):3811-3817

Department of Public Health Sciences, Pennsylvania State University, Hershey, PA 17033.

Motivation: Large-scale genome-wide association studies (GWAS) have identified a wide range of genetic variants related to a host of complex traits and disorders. Despite their success, the individual single-nucleotide polymorphism (SNP) analysis approach adopted in most current GWAS is limited: it is usually too biologically simplistic to elucidate a comprehensive genetic architecture of phenotypes, and it is statistically underpowered owing to the heavy multiple-testing correction burden. On the other hand, multiple-SNP analyses (e.g. gene-based or region-based SNP-set analysis) are usually more powerful for examining the joint effects of a set of SNPs on the phenotype of interest. However, current multiple-SNP approaches can only draw an overall conclusion at the SNP-set level and do not directly indicate which SNPs in the SNP-set are driving the overall genotype-phenotype association.

Results: In this article, we propose a new permutation-assisted tuning procedure in lasso (plasso) to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS. The tuning parameter of lasso determines the amount of shrinkage and is essential to the performance of variable selection. In the proposed plasso procedure, we first generate permutations as pseudo-SNPs that are not associated with the phenotype. Then, the lasso tuning parameter is carefully chosen to separate true signal SNPs from non-informative pseudo-SNPs. We illustrate plasso using simulations to demonstrate its superior performance over existing methods, and application of plasso to a real GWAS dataset yields additional insights into the genetic control of complex traits.

Availability And Implementation: R code implementing the proposed methodology is available at https://github.com/xyz5074/plasso.

Supplementary Information: Supplementary data are available at Bioinformatics online.
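
The permutation-assisted tuning idea in the Results above can be sketched roughly as follows. This is a simplified reading of the abstract, not the authors' plasso implementation: it assumes one permuted pseudo-SNP per real SNP, a fixed geometric penalty grid, and a stopping rule that keeps the last penalty at which no pseudo-SNP has entered the model. All function names and simulated data are hypothetical.

```python
import random

def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0

def standardize(col):
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def lasso_cd(cols, y, lam, beta, sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Columns must be standardized; beta is a warm start."""
    n = len(y)
    beta = list(beta)
    resid = [y[i] - sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    for _ in range(sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            rho = sum(col[i] * resid[i] for i in range(n)) / n + beta[j]
            delta = soft_threshold(rho, lam) - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * col[i]
                beta[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < 1e-8:
            break
    return beta

def plasso_select(snps, y, n_lambdas=15, seed=0):
    """Return indices of real SNPs that are active at the smallest grid
    penalty reached before any permuted pseudo-SNP becomes active."""
    rng = random.Random(seed)
    p, n = len(snps), len(y)
    real = [standardize(c) for c in snps]
    pseudo = []
    for col in real:
        shuffled = col[:]
        rng.shuffle(shuffled)  # permuting rows breaks any link to the phenotype
        pseudo.append(shuffled)
    cols = real + pseudo
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    lam_max = max(abs(sum(c[i] * yc[i] for i in range(n))) / n for c in cols)
    beta, selected = [0.0] * len(cols), []
    for k in range(1, n_lambdas + 1):
        beta = lasso_cd(cols, yc, lam_max * 0.85 ** k, beta)
        if any(b != 0.0 for b in beta[p:]):
            break  # a pseudo-SNP entered: keep the previous selection
        selected = [j for j in range(p) if beta[j] != 0.0]
    return selected

# Simulated toy data: 5 SNPs, 150 people, only SNP 0 affects the phenotype.
rng = random.Random(1)
snps = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(150)] for _ in range(5)]
y = [1.5 * snps[0][i] + rng.gauss(0, 1) for i in range(150)]
selected = plasso_select(snps, y)
```

The design choice worth noting is that the pseudo-SNPs act as a built-in negative control, so the penalty is tuned by where known-null variables start to enter rather than by cross-validated prediction error.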

Source
http://dx.doi.org/10.1093/bioinformatics/btaa229 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320616 (PMC)
June 2020

Impact of HFE variants and sex in lung cancer.

PLoS One 2019 19;14(12):e0226821. Epub 2019 Dec 19.

Department of Neurosurgery, The Pennsylvania State University College of Medicine, Penn State Health Milton S. Hershey Medical Center, Hershey, Pennsylvania, United States of America.

The homeostatic iron regulator protein HFE is involved in regulation of iron acquisition for cells. The prevalence of two common HFE gene variants (H63D, C282Y) has been studied in many cancer types; however, the impact of HFE variants, sex and HFE gene expression in lung cancer has not been studied. We determined the prevalence of HFE variants and their impact on cancer phenotypes in lung cancer cell lines, in lung cancer patient specimens, and using The Cancer Genome Atlas (TCGA) database. We found that seven out of ten human lung cancer cell lines carry the H63D or C282Y HFE variant. Analysis of lung cancer specimens from our institute (Penn State Hershey Medical Center) revealed a sex and genotype interaction risk for metastasis in lung adenocarcinoma (LUAD) patients; H63D HFE is associated with less metastasis in males compared to wild type (WT) HFE; however, females with the H63D HFE variant tend to develop more metastatic tumors than WT female patients. In the TCGA LUAD dataset, the H63D HFE variant was associated with poorer survival in females compared to females with WT HFE. The frequency of C282Y HFE is higher in female lung squamous cell carcinoma (LUSC) patients of TCGA than males, however the C282Y HFE variant did not impact the survival of LUSC patients. In the TCGA LUSC dataset, C282Y HFE patients (especially females) had poorer survival than WT HFE patients. HFE expression level was not affected by HFE genotype status and did not impact patient's survival, regardless of sex. In summary, these data suggest that there is a sexually dimorphic effect of HFE polymorphisms in the survival and metastatic disease in lung cancer.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0226821 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922424 (PMC)
April 2020

Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits.

Curr Protoc Hum Genet 2019 04 8;101(1):e83. Epub 2019 Mar 8.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania.

With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.

Source
http://dx.doi.org/10.1002/cphg.83 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6455968 (PMC)
April 2019

Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution.

Nat Genet 2019 03 18;51(3):452-469. Epub 2019 Feb 18.

Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.

Body-fat distribution is a risk factor for adverse cardiovascular health consequences. We analyzed the association of body-fat distribution, assessed by waist-to-hip ratio adjusted for body mass index, with 228,985 predicted coding and splice site variants available on exome arrays in up to 344,369 individuals from five major ancestries (discovery) and 132,177 European-ancestry individuals (validation). We identified 15 common (minor allele frequency, MAF ≥5%) and nine low-frequency or rare (MAF <5%) coding novel variants. Pathway/gene set enrichment analyses identified lipid particle, adiponectin, abnormal white adipose tissue physiology and bone development and morphology as important contributors to fat distribution, while cross-trait associations highlight cardiometabolic traits. In functional follow-up analyses, specifically in Drosophila RNAi-knockdowns, we observed a significant increase in the total body triglyceride levels for two genes (DNAH10 and PLXND1). We implicate novel genes in fat distribution, stressing the importance of interrogating low-frequency and protein-coding variants.

Source
http://dx.doi.org/10.1038/s41588-018-0334-2 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6560635 (PMC)
March 2019

Exome Chip Meta-analysis Fine Maps Causal Variants and Elucidates the Genetic Architecture of Rare Coding Variants in Smoking and Alcohol Use.

Biol Psychiatry 2019 06 6;85(11):946-955. Epub 2018 Dec 6.

Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina.

Background: Smoking and alcohol use have been associated with common genetic variants in multiple loci. Rare variants within these loci hold promise in the identification of biological mechanisms in substance use. Exome arrays and genotype imputation can now efficiently genotype rare nonsynonymous and loss of function variants. Such variants are expected to have deleterious functional consequences and to contribute to disease risk.

Methods: We analyzed ∼250,000 rare variants from 16 independent studies genotyped with exome arrays and augmented this dataset with imputed data from the UK Biobank. Associations were tested for five phenotypes: cigarettes per day, pack-years, smoking initiation, age of smoking initiation, and alcoholic drinks per week. We conducted stratified heritability analyses, single-variant tests, and gene-based burden tests of nonsynonymous/loss-of-function coding variants. We performed a novel fine-mapping analysis to winnow the number of putative causal variants within associated loci.

Results: Meta-analytic sample sizes ranged from 152,348 to 433,216, depending on the phenotype. Rare coding variation explained 1.1% to 2.2% of phenotypic variance, reflecting 11% to 18% of the total single nucleotide polymorphism heritability of these phenotypes. We identified 171 genome-wide associated loci across all phenotypes. Fine mapping identified putative causal variants with double base-pair resolution at 24 of these loci, and between three and 10 variants for 65 loci. Twenty loci contained rare coding variants in the 95% credible intervals.

Conclusions: Rare coding variation significantly contributes to the heritability of smoking and alcohol use. Fine-mapping genome-wide association study loci identifies specific variants contributing to the biological etiology of substance use behavior.

Source
http://dx.doi.org/10.1016/j.biopsych.2018.11.024 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6534468 (PMC)
June 2019

Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use.

Nat Genet 2019 02 14;51(2):237-244. Epub 2019 Jan 14.

Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Monserrato, Italy.

Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders. They are heritable and etiologically related behaviors that have been resistant to gene discovery efforts. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.

Source
http://dx.doi.org/10.1038/s41588-018-0307-5 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6358542 (PMC)
February 2019

Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci.

Mol Psychiatry 2020 10 7;25(10):2392-2409. Epub 2019 Jan 7.

Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, Amsterdam, Netherlands.

Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been associated with smoking behaviour-related traits. We tested up to 235,116 single nucleotide variants (SNVs) on the exome-array for association with smoking initiation, cigarettes per day, pack-years, and smoking cessation in a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants). In a subset of 112,811 participants, a further one million SNVs were also genotyped and tested for association with the four smoking behaviour traits. SNV-trait associations with P < 5 × 10 in either analysis were taken forward for replication in up to 275,596 independent participants from UK Biobank. Lastly, a meta-analysis of the discovery and replication studies was performed. Sixteen SNVs were associated with at least one of the smoking behaviour traits (P < 5 × 10) in the discovery samples. Ten novel SNVs, including rs12616219 near TMEM182, were followed-up and five of them (rs462779 in REV3L, rs12780116 in CNNM2, rs1190736 in GPR101, rs11539157 in PJA1, and rs12616219 near TMEM182) replicated at a Bonferroni significance threshold (P < 4.5 × 10) with consistent direction of effect. A further 35 SNVs were associated with smoking behaviour traits in the discovery plus replication meta-analysis (up to 622,409 participants) including a rare SNV, rs150493199, in CCDC141 and two low-frequency SNVs in CEP350 and HDGFRP2. Functional follow-up implied that decreased expression of REV3L may lower the probability of smoking initiation. The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.
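
The fixed effects meta-analysis named in this abstract is commonly carried out with inverse-variance weighting, in which each study's estimate is weighted by the reciprocal of its squared standard error. The sketch below assumes that standard form; the abstract does not state which weighting or software the authors used, and the study estimates are made up.

```python
import math

def fixed_effects_meta(estimates):
    """Inverse-variance weighted fixed-effects meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per study.
    Returns the pooled beta, its standard error, the z score and a
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Two hypothetical studies of the same variant.
studies = [(0.10, 0.05), (0.20, 0.10)]
beta, se, z, p = fixed_effects_meta(studies)
```

The more precise study dominates the pooled estimate: here the first study carries four times the weight of the second, so the pooled effect sits closer to 0.10 than to 0.20.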

Source
DOI: http://dx.doi.org/10.1038/s41380-018-0313-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515840
October 2020

Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program.

Nat Genet 2018 11 1;50(11):1514-1523. Epub 2018 Oct 1.

Initiative for Noncommunicable Diseases, Health Systems and Population Studies Division, International Centre for Diarrheal Disease Research, Dhaka, Bangladesh.

The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n > 600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease).

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0222-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521726
November 2018

ADAPTIVE-WEIGHT BURDEN TEST FOR ASSOCIATIONS BETWEEN QUANTITATIVE TRAITS AND GENOTYPE DATA WITH COMPLEX CORRELATIONS.

Ann Appl Stat 2018 Sep 11;12(3):1558-1582. Epub 2018 Sep 11.

Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA.

High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations rendered from both familial relation and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose the adaptive-weight burden test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits on genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants, and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, controlling for the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.

Source
DOI: http://dx.doi.org/10.1214/17-AOAS1121
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6133321
September 2018

Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes.

PLoS Genet 2018 07 17;14(7):e1007452. Epub 2018 Jul 17.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America.

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007452
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063450
July 2018

Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.

Nat Genet 2018 04 9;50(4):559-571. Epub 2018 Apr 9.

Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.

We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0084-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5898373
April 2018

Publisher Correction: Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity.

Nat Genet 2018 05;50(5):766-767

Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany.

In the version of this article originally published, one of the two authors with the name Wei Zhao was omitted from the author list and the affiliations for both authors were assigned to the single Wei Zhao in the author list. In addition, the ORCID for Wei Zhao (Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA) was incorrectly assigned to author Wei Zhou. The errors have been corrected in the HTML and PDF versions of the article.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0082-3
May 2018

Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity.

Nat Genet 2018 01 22;50(1):26-41. Epub 2017 Dec 22.

Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany.

Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, noncoding variants from which pinpointing causal genes remains challenging. Here we combined data from 718,734 individuals to discover rare and low-frequency (minor allele frequency (MAF) < 5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which 8 variants were in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2 and ZNF169) newly implicated in human obesity, 2 variants were in genes (MC4R and KSR2) previously observed to be mutated in extreme obesity and 2 variants were in GIPR. The effect sizes of rare variants are ~10 times larger than those of common variants, with the largest effect observed in carriers of an MC4R mutation introducing a stop codon (p.Tyr35Ter, MAF = 0.01%), who weighed ~7 kg more than non-carriers. Pathway analyses based on the variants associated with BMI confirm enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically supported therapeutic targets in obesity.

Source
DOI: http://dx.doi.org/10.1038/s41588-017-0011-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5945951
January 2018

Exome-wide association study of plasma lipids in >300,000 individuals.

Nat Genet 2017 Dec 30;49(12):1758-1766. Epub 2017 Oct 30.

Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.

We screened variants on an exome-focused genotyping array in >300,000 participants (replication in >280,000 participants) and identified 444 independent variants in 250 loci significantly associated with total cholesterol (TC), high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), and/or triglycerides (TG). At two loci (JAK2 and A1CF), experimental analysis in mice showed lipid changes consistent with the human data. We also found that: (i) beta-thalassemia trait carriers displayed lower TC and were protected from coronary artery disease (CAD); (ii) excluding the CETP locus, there was not a predictable relationship between plasma HDL-C and risk for age-related macular degeneration; (iii) only some mechanisms of lowering LDL-C appeared to increase risk for type 2 diabetes (T2D); and (iv) TG-lowering alleles involved in hepatic production of TG-rich lipoproteins (TM6SF2 and PNPLA3) tracked with higher liver fat, higher risk for T2D, and lower risk for CAD, whereas TG-lowering alleles involved in peripheral lipolysis (LPL and ANGPTL4) had no effect on liver fat but decreased risks for both T2D and CAD.

Source
DOI: http://dx.doi.org/10.1038/ng.3977
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5709146
December 2017

Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease.

Nat Genet 2017 Dec 30;49(12):1722-1730. Epub 2017 Oct 30.

Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA.

Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.

Source
DOI: http://dx.doi.org/10.1038/ng.3978
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5899829
December 2017

Clonal evolution in paired endometrial intraepithelial neoplasia/atypical hyperplasia and endometrioid adenocarcinoma.

Hum Pathol 2017 09 14;67:69-77. Epub 2017 Jul 14.

Department of Pathology, Penn State College of Medicine and Milton S. Hershey Medical Center, Hershey, PA 17033.

Endometrial intraepithelial neoplasia (EIN) and atypical endometrial hyperplasia (AH) are histomorphologically defined precursors to endometrioid adenocarcinoma, which are unified as EIN/AH by the World Health Organization. EIN/AH harbors a constellation of molecular alterations similar to those found in endometrioid adenocarcinoma. However, the process of clonal evolution from EIN/AH to carcinoma is poorly characterized. To investigate, we performed next-generation sequencing, copy number alteration (CNA) analysis, and immunohistochemistry for mismatch repair protein expression on EIN/AH and endometrioid adenocarcinoma samples from 6 hysterectomy cases with spatially distinct EIN/AH and carcinoma. In evaluating all samples, EIN/AH and carcinoma did not differ in mutational burden, CNA burden, or specific genes mutated (all P>.1). All paired EIN/AH and carcinoma samples shared at least one identical somatic mutation, frequently in PI(3)K pathway members. Large CNAs (>10 genes in length) were identified in 83% of cases; paired EIN/AH and carcinoma samples shared at least one identical CNA in these cases. Mismatch repair protein expression matched in all paired EIN/AH and carcinoma samples. All paired EIN/AH and carcinoma samples had identical The Cancer Genome Atlas subtype, with 3 classified as "copy number low endometrioid" and 3 classified as "microsatellite instability hypermutated." Although paired EIN/AH and carcinoma samples were clonal, private mutations (ie, present in only one sample) were identified in EIN/AH and carcinoma in all cases, frequently in established cancer-driving genes. These findings indicate that EIN/AH gives rise to endometrioid adenocarcinoma by a complex process of subclone evolution, not a linear accumulation of molecular events.

Source
DOI: http://dx.doi.org/10.1016/j.humpath.2017.07.003
September 2017