Publications by authors named "Hae Kyung Im"

68 Publications

Pleiotropy-guided transcriptome imputation from normal and tumor tissues identifies candidate susceptibility genes for breast and ovarian cancer.

HGG Adv 2021 Jul 16;2(3). Epub 2021 Jun 16.

Department of Virus, Lifestyle, and Genes, Danish Cancer Society Research Center, Copenhagen, Denmark.

Familial, sequencing, and genome-wide association studies (GWASs) and genetic correlation analyses have progressively unraveled the shared or pleiotropic germline genetics of breast and ovarian cancer. In this study, we aimed to leverage this shared germline genetics to improve the power of transcriptome-wide association studies (TWASs) to identify candidate breast cancer and ovarian cancer susceptibility genes. We built gene expression prediction models using the PrediXcan method in 681 breast and 295 ovarian tumors from The Cancer Genome Atlas and 211 breast and 99 ovarian normal tissue samples from the Genotype-Tissue Expression project and integrated these with GWAS meta-analysis data from the Breast Cancer Association Consortium (122,977 cases/105,974 controls) and the Ovarian Cancer Association Consortium (22,406 cases/40,941 controls). The integration was achieved through application of a pleiotropy-guided conditional/conjunction false discovery rate (FDR) approach in the setting of a TWAS. This identified 14 candidate breast cancer susceptibility genes spanning 11 genomic regions and 8 candidate ovarian cancer susceptibility genes spanning 5 genomic regions at conjunction FDR < 0.05 that were >1 Mb away from known breast and/or ovarian cancer susceptibility loci. We also identified 38 candidate breast cancer susceptibility genes and 17 candidate ovarian cancer susceptibility genes at conjunction FDR < 0.05 at known breast and/or ovarian susceptibility loci. The 22 genes identified by our cross-cancer analysis represent promising candidates that further elucidate the role of the transcriptome in mediating germline breast and ovarian cancer risk.

Source
DOI: http://dx.doi.org/10.1016/j.xhgg.2021.100042
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8312632
July 2021

Transcriptome prediction performance across machine learning models and diverse ancestries.

HGG Adv 2021 Apr 5;2(2). Epub 2021 Jan 5.

Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA.

Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.

Source
DOI: http://dx.doi.org/10.1016/j.xhgg.2020.100019
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8087249
April 2021

Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.

Cell 2021 May 16;184(10):2633-2648.e19. Epub 2021 Apr 16.

Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University, Stanford, CA 94305, USA.

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2021.03.050
May 2021

A scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction.

Nat Commun 2021 Mar 3;12(1):1424. Epub 2021 Mar 3.

Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA.

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe mixQTL, a unified framework that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait-associated variants and candidate cis-regulatory elements compared with the standard approach.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-21592-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7930098
March 2021

Exploiting the GTEx resources to decipher the mechanisms at GWAS loci.

Genome Biol 2021 Jan 26;22(1):49. Epub 2021 Jan 26.

Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA.

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02252-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7836161
January 2021

Probabilistic colocalization of genetic variants from complex and molecular traits: promise and limitations.

Am J Hum Genet 2021 Jan 11;108(1):25-35. Epub 2020 Dec 11.

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Colocalization analysis has emerged as a powerful tool to uncover the overlapping of causal variants responsible for both molecular and complex disease phenotypes. The findings from colocalization analysis yield insights into the molecular pathways of complex diseases. In this paper, we conduct an in-depth investigation of the promise and limitations of the available colocalization analysis approaches. Focusing on variant-level colocalization approaches, we first establish the connections between various existing methods. We proceed to discuss the impacts of various controllable analytical factors and uncontrollable practical factors on outcomes of colocalization analysis through realistic simulations and real data examples. We identify a single analytical factor, the specification of prior enrichment levels, which can lead to severe inflation of false-positive colocalization findings. Meanwhile, the combination of many other analytical and practical factors all lead to diminished power. Consequently, we recommend the following strategies for the best practice of colocalization analysis: (1) estimating prior enrichment level from the observed data and (2) separating fine-mapping and colocalization analysis. Our analysis of 4,091 complex traits and the multi-tissue expression quantitative trait loci (eQTL) data from the GTEx (v.8) suggests that colocalizations of molecular QTLs and causal complex trait associations are widespread. However, only a small proportion can be confidently identified from currently available data due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.11.012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820626
January 2021

Cell type-specific genetic regulation of gene expression across human tissues.

Science 2020 Sep;369(6509)

Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain.

The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.

Source
DOI: http://dx.doi.org/10.1126/science.aaz8528
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8051643
September 2020

Transcriptomic signatures across human tissues identify functional rare genetic variation.

Science 2020 Sep 10;369(6509). Epub 2020 Sep 10.

University of Mississippi Medical Center, Jackson, MS, USA.

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.

Source
DOI: http://dx.doi.org/10.1126/science.aaz5900
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7646251
September 2020

The impact of sex on gene expression across human tissues.

Science 2020 Sep;369(6509)

Department of Statistics, University of Chicago, Chicago, IL, USA.

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

Source
DOI: http://dx.doi.org/10.1126/science.aba3066
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8136152
September 2020

Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx.

Genome Biol 2020 Sep 11;21(1):233. Epub 2020 Sep 11.

Department of Genetics, Stanford University, Stanford, CA, USA.

Background: Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization.

Results: Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up.

Conclusions: We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02113-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488497
September 2020

sn-spMF: matrix factorization informs tissue-specific genetic regulation of gene expression.

Genome Biol 2020 Sep 11;21(1):235. Epub 2020 Sep 11.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA.

Genetic regulation of gene expression, revealed by expression quantitative trait loci (eQTLs), exhibits complex patterns of tissue-specific effects. Characterization of these patterns may allow us to better understand mechanisms of gene regulation and disease etiology. We develop a constrained matrix factorization model, sn-spMF, to learn patterns of tissue-sharing and apply it to 49 human tissues from the Genotype-Tissue Expression (GTEx) project. The learned factors reflect tissues with known biological similarity and identify transcription factors that may mediate tissue-specific effects. sn-spMF, available at https://github.com/heyuan7676/ts_eQTLs, can be applied to learn biologically interpretable patterns of eQTL tissue-specificity and generate testable mechanistic hypotheses.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02129-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488540
September 2020

PTWAS: investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis.

Genome Biol 2020 Sep 11;21(1):232. Epub 2020 Sep 11.

Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.

We propose a new computational framework, probabilistic transcriptome-wide association study (PTWAS), to investigate causal relationships between gene expressions and complex traits. PTWAS applies the established principles from instrumental variables analysis and takes advantage of probabilistic eQTL annotations to delineate and tackle the unique challenges arising in TWAS. PTWAS not only confers higher power than the existing methods but also provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific gene-to-trait effects. We illustrate the power of PTWAS by analyzing the eQTL data across 49 tissues from GTEx (v8) and GWAS summary statistics from 114 complex traits.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02026-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7488550
September 2020

CORE GREML for estimating covariance between random effects in linear mixed models for complex trait analyses.

Nat Commun 2020 Aug 21;11(1):4208. Epub 2020 Aug 21.

Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.

As a key variance partitioning tool, linear mixed models (LMMs) using genome-based restricted maximum likelihood (GREML) allow both fixed and random effects. Classic LMMs assume independence between random effects, which can be violated, causing bias. Here we introduce a generalized GREML, named CORE GREML, that explicitly estimates the covariance between random effects. Using extensive simulations, we show that CORE GREML outperforms the conventional GREML, providing variance and covariance estimates free from bias due to correlated random effects. Applying CORE GREML to UK Biobank data, we find, for example, that the transcriptome, imputed using genotype data, explains a significant proportion of phenotypic variance for height (0.15, p-value = 1.5e-283), and that these transcriptomic effects correlate with the genomic effects (genome-transcriptome correlation = 0.35, p-value = 1.2e-14). We conclude that the covariance between random effects is a key parameter for estimation, especially when partitioning phenotypic variance by multi-omics layers.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-18085-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442840
August 2020

Analysis of Genetically Regulated Gene Expression Identifies a Prefrontal PTSD Gene, SNRNP35, Specific to Military Cohorts.

Cell Rep 2020 Jun;31(9):107716

SAMRC Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry, University of Cape Town, Cape Town 7700, South Africa.

To reveal post-traumatic stress disorder (PTSD) genetic risk influences on tissue-specific gene expression, we use brain and non-brain transcriptomic imputation. We impute genetically regulated gene expression (GReX) in 29,539 PTSD cases and 166,145 controls from 70 ancestry-specific cohorts and identify 18 significant GReX-PTSD associations corresponding to specific tissue-gene pairs. The results suggest substantial genetic heterogeneity based on ancestry, cohort type (military versus civilian), and sex. Two study-wide significant PTSD associations are identified in European and military European cohorts; ZNF140 is predicted to be upregulated in whole blood, and SNRNP35 is predicted to be downregulated in dorsolateral prefrontal cortex, respectively. In peripheral leukocytes from 175 marines, the observed PTSD differential gene expression correlates with the predicted differences for these individuals, and deployment stress produces glucocorticoid-regulated expression changes that include downregulation of both ZNF140 and SNRNP35. SNRNP35 knockdown in cells validates its functional role in U12-intron splicing. Finally, exogenous glucocorticoids in mice downregulate prefrontal Snrnp35 expression.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.107716
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7359754
June 2020

Genetic regulatory variation in populations informs transcriptome analysis in rare disease.

Science 2019 Oct 10;366(6463):351-356. Epub 2019 Oct 10.

New York Genome Center, New York, NY, USA.

Transcriptome data can facilitate the interpretation of the effects of rare genetic variants. Here, we introduce ANEVA (analysis of expression variation) to quantify genetic variation in gene dosage from allelic expression (AE) data in a population. Application of ANEVA to the Genotype-Tissue Expression (GTEx) data showed that this variance estimate is robust and correlated with selective constraint in a gene. Using these variance estimates in a dosage outlier test (ANEVA-DOT) applied to AE data from 70 Mendelian muscular disease patients showed accuracy in detecting genes with pathogenic variants in previously resolved cases and led to one confirmed and several potential new diagnoses. Using our reference estimates from GTEx data, ANEVA-DOT can be incorporated in rare disease diagnostic pipelines to use RNA-sequencing data more effectively.

Source
DOI: http://dx.doi.org/10.1126/science.aay0256
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6814274
October 2019

A Trans-Ethnic Genome-Wide Association Study of Uterine Fibroids.

Front Genet 2019;10:511. Epub 2019 Jun 12.

Vanderbilt Epidemiology Center, Institute for Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN, United States.

Uterine fibroids affect up to 77% of women by menopause and account for up to $34 billion in healthcare costs each year. Although fibroid risk is heritable, genetic risk for fibroids is not well understood. We conducted a two-stage case-control meta-analysis of genetic variants in European and African ancestry women with and without fibroids classified by a previously published algorithm requiring pelvic imaging or confirmed diagnosis. Women from seven electronic Medical Records and Genomics (eMERGE) network sites (3,704 imaging-confirmed cases and 5,591 imaging-confirmed controls) and women of African and European ancestry from UK Biobank (UKB, 5,772 cases and 61,457 controls) were included in the discovery genome-wide association study (GWAS) meta-analysis. Variants showing evidence of association in Stage I GWAS (P < 1 × 10) were targeted in an independent replication sample of African and European ancestry individuals from the UKB (Stage II) (12,358 cases and 138,477 controls). Logistic regression models were fit with genetic markers imputed to a 1000 Genomes reference and adjusted for principal components for each race- and site-specific dataset, followed by fixed-effects meta-analysis. Final analysis with 21,804 cases and 205,525 controls identified 326 genome-wide significant variants in 11 loci, with three novel loci at chromosome 1q24 (sentinel-SNP rs14361789; P = 4.7 × 10), chromosome 16q12.1 (sentinel-SNP rs4785384; P = 1.5 × 10) and chromosome 20q13.1 (sentinel-SNP rs6094982; P = 2.6 × 10). Our statistically significant findings further support previously reported loci including SNPs near , and /. We report evidence of ancestry-specific findings for sentinel-SNP rs10917151 in the / locus (P = 1.76 × 10). Ancestry-specific effect-estimates for rs10917151 were in opposite directions (P-Het-between-groups = 0.04) for predominantly African (OR = 0.84) and predominantly European women (OR = 1.16).
Genetically-predicted gene expression of several genes including in vagina (P = 4.6 × 10), in esophageal mucosa (P = 8.7 × 10), in multiple tissues including subcutaneous adipose tissue (P = 3.3 × 10), and in skeletal muscle tissue (P = 5.8 × 10) were associated with fibroids. The finding for was supported by SNP-based summary Mendelian randomization analysis. Our study suggests that fibroid risk variants act through regulatory mechanisms affecting gene expression and are comprised of alleles that are both ancestry-specific and shared across continental ancestries.

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.00511
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6582231
June 2019

Publisher Correction: Gene expression imputation across multiple brain regions provides insights into schizophrenia risk.

Nat Genet 2019 Jun;51(6):1068

Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA.

In the HTML version of the article originally published, the author group 'The Schizophrenia Working Group of the Psychiatric Genomics Consortium' was displayed incorrectly. The error has been corrected in the HTML version of the article.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0435-6
June 2019

Shared and distinct genetic risk factors for childhood-onset and adult-onset asthma: genome-wide and transcriptome-wide studies.

Lancet Respir Med 2019 Jun 27;7(6):509-522. Epub 2019 Apr 27.

Department of Medicine, The University of Chicago, Chicago, IL, USA.

Background: Childhood-onset and adult-onset asthma differ with respect to severity and comorbidities. Whether they also differ with respect to genetic risk factors has not been previously investigated in large samples. The goals of this study were to identify shared and distinct genetic risk loci for childhood-onset and adult-onset asthma, and to identify the genes that might mediate the effects of associated variation.

Methods: We did genome-wide and transcriptome-wide studies, using data from the UK Biobank, in individuals with asthma, including adults with childhood-onset asthma (onset before 12 years of age), adults with adult-onset asthma (onset between 26 and 65 years of age), and adults without asthma (controls; aged older than 38 years). We did genome-wide association studies (GWAS) for childhood-onset asthma and adult-onset asthma each compared with shared controls, and for age of asthma onset in all asthma cases, with a genome-wide significance threshold of p<5 × 10⁻⁸. Enrichment studies determined the tissues in which genes at GWAS loci were most highly expressed, and PrediXcan, a transcriptome-wide gene-based test, was used to identify candidate risk genes.

Findings: Of 376 358 British white individuals from the UK Biobank, we included 37 846 with self-reports of doctor-diagnosed asthma: 9433 adults with childhood-onset asthma; 21 564 adults with adult-onset asthma; and an additional 6849 young adults with asthma with onset between 12 and 25 years of age. For the first and second GWAS analyses, 318 237 individuals older than 38 years without asthma were used as controls. We detected 61 independent asthma loci: 23 were childhood-onset specific, one was adult-onset specific, and 37 were shared. 19 loci were associated with age of asthma onset. The most significant asthma-associated locus was at 17q12 (odds ratio 1·406, 95% CI 1·365-1·448; p=1·45 × 10) in the childhood-onset GWAS. Genes at the childhood onset-specific loci were most highly expressed in skin, blood, and small intestine; genes at the adult onset-specific loci were most highly expressed in lung, blood, small intestine, and spleen. PrediXcan identified 113 unique candidate genes at 22 of the 61 GWAS loci. Single-nucleotide polymorphism-based heritability estimates were more than three times larger for childhood-onset asthma (0·327) than for adult-onset disease (0·098). The onset of disease in childhood was associated with additional genes with relatively large effect sizes, with the largest odds ratio observed at the FLG locus at 1q21.3 (1·970, 95% CI 1·823-2·129).

Interpretation: Genetic risk factors for adult-onset asthma are largely a subset of the genetic risk for childhood-onset asthma but with overall smaller effects, suggesting a greater role for non-genetic risk factors in adult-onset asthma. Combined with gene expression and tissue enrichment patterns, we suggest that the establishment of disease in children is driven more by dysregulated allergy and epithelial barrier function genes, whereas the cause of adult-onset asthma is more lung-centred and environmentally determined, but with immune-mediated mechanisms driving disease progression in both children and adults.

Funding: US National Institutes of Health.

Source
DOI: http://dx.doi.org/10.1016/S2213-2600(19)30055-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6534440
June 2019

Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways and complex traits.

Genet Epidemiol 2019 Sep 4;43(6):596-608. Epub 2019 Apr 4.

Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois.

Regulation of gene expression is an important mechanism through which genetic variation can affect complex traits. A substantial portion of gene expression variation can be explained by both local (cis) and distal (trans) genetic variation. Much progress has been made in uncovering cis-acting expression quantitative trait loci (cis-eQTL), but trans-eQTL have been more difficult to identify and replicate. Here we take advantage of our ability to predict the cis component of gene expression coupled with gene mapping methods such as PrediXcan to identify high confidence candidate trans-acting genes and their targets. That is, we correlate the cis component of gene expression with observed expression of genes in different chromosomes. Leveraging the shared cis-acting regulation across tissues, we combine the evidence of association across all available Genotype-Tissue Expression Project tissues and find 2,356 trans-acting/target gene pairs with high mappability scores. Reassuringly, trans-acting genes are enriched in transcription and nucleic acid binding pathways and target genes are enriched in known transcription factor binding sites. Interestingly, trans-acting genes are more significantly associated with selected complex traits and diseases than target or background genes, consistent with percolating trans effects. Our scripts and summary statistics are publicly available for future studies of trans-acting gene regulation.

Source
DOI: http://dx.doi.org/10.1002/gepi.22205
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687523
September 2019

Opportunities and challenges for transcriptome-wide association studies.

Nat Genet 2019 04 29;51(4):592-599. Epub 2019 Mar 29.

Department of Computer Science, Stanford University, Stanford, CA, USA.

Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0385-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6777347
April 2019

Gene expression imputation across multiple brain regions provides insights into schizophrenia risk.

Nat Genet 2019 04 25;51(4):659-674. Epub 2019 Mar 25.

Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA.

Transcriptomic imputation approaches combine eQTL reference panels with large-scale genotype data in order to test associations between disease and gene expression. These genic associations could elucidate signals in complex genome-wide association study (GWAS) loci and may disentangle the role of different tissues in disease development. We used the largest eQTL reference panel for the dorso-lateral prefrontal cortex (DLPFC) to create a set of gene expression predictors and demonstrate their utility. We applied DLPFC and 12 GTEx-brain predictors to 40,299 schizophrenia cases and 65,264 matched controls for a large transcriptomic imputation study of schizophrenia. We identified 413 genic associations across 13 brain regions. Stepwise conditioning identified 67 non-MHC genes, of which 14 did not fall within previous GWAS loci. We identified 36 significantly enriched pathways, including hexosaminidase-A deficiency, and multiple porphyric disorder pathways. We investigated developmental expression patterns among the 67 non-MHC genes and identified specific groups of pre- and postnatal expression.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0364-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034316
April 2019

Integrating predicted transcriptome from multiple tissues improves association detection.

PLoS Genet 2019 01 22;15(1):e1007889. Epub 2019 Jan 22.

Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America.

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and absence of relevant developmental and disease context restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.
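
The idea of regressing a trait jointly on predicted expression from several tissues can be sketched as follows. This is a minimal illustration under stated assumptions, not the MultiXcan software: the function names `fit_joint` and `f_statistic` and the toy values are hypothetical, the fit is ordinary least squares, and the sketch does nothing special about strongly collinear tissues, which a production implementation would need to handle.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def center(values):
    """Subtract the mean so the intercept drops out of the regression."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def fit_joint(tissues, trait):
    """Least-squares fit of trait on all tissues at once.

    tissues: list of k lists, predicted expression of one gene per tissue
    trait:   list of n trait values
    Returns (coefficients, R squared).
    """
    xs = [center(t) for t in tissues]
    y = center(trait)
    xtx = [[sum(a * b for a, b in zip(xi, xj)) for xj in xs] for xi in xs]
    xty = [sum(a * b for a, b in zip(xi, y)) for xi in xs]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj[i] for bj, xj in zip(beta, xs)) for i in range(len(y))]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum(yi ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def f_statistic(r2, n, k):
    """Overall F statistic for k predictors and n samples."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Toy predicted expression in two tissues and a trait (made-up values).
t1 = [0.1, 0.4, 0.2, 0.9, 0.7, 0.3, 0.6, 0.5]
t2 = [0.2, 0.3, 0.5, 0.8, 0.4, 0.1, 0.7, 0.6]
trait = [1.1, 1.9, 1.2, 3.0, 2.6, 1.8, 2.1, 2.0]

beta, r2 = fit_joint([t1, t2], trait)
print("joint R2:", r2, "F:", f_statistic(r2, n=len(trait), k=2))
print("tissue 1 alone R2:", fit_joint([t1], trait)[1])
```

Because the single-tissue models are nested in the joint model, the joint R² can never be lower than that of any one tissue; the F statistic then charges for the extra predictors.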

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007889
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6358100
January 2019

Functionally oriented analysis of cardiometabolic traits in a trans-ethnic sample.

Hum Mol Genet 2019 04;28(7):1212-1224

Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.

Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses of the functional genetic etiology of complex cardiometabolic traits, we estimated the genetically determined component of gene expression from common variants using PrediXcan (1) and determined genes with differential predicted expression by trait. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effect on gene expression in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across different ancestral populations, we evaluated imputed expression levels for predictive accuracy genome-wide in RNA sequence data in samples drawn from European-ancestry and African-ancestry populations and identified substantial predictive power using European-derived models in a non-European target population. We then tested the association of GReX on 15 cardiometabolic traits including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level and white blood cell and platelet counts in 15 755 individuals across three ancestry groups, resulting in 20 novel gene-phenotype associations reaching experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in our ancestry-specific analyses. Top associations were assessed for additional support via query of S-PrediXcan (2) results derived from publicly available genome-wide association studies summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovery of cardiometabolic effect genes in a diverse dataset.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddy435
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423424
April 2019

ukbREST: efficient and streamlined data access for reproducible research in large biobanks.

Bioinformatics 2019 06;35(11):1971-1973

Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA.

Summary: Large biobanks, such as UK Biobank with half a million participants, are changing the scale and availability of genotypic and phenotypic data for researchers to ask fundamental questions about the biology of health and disease. The breadth of the UK Biobank data is enabling discoveries at an unprecedented pace. However, this size and complexity pose new challenges to investigators who need to keep the accruing data up to date, comply with potential consent changes, and efficiently and reproducibly extract subsets of the data to answer specific scientific questions. Here we propose a tool called ukbREST designed for the UK Biobank study (easily extensible to other biobanks), which allows authorized users to efficiently retrieve phenotypic and genetic data. It exposes a REST API that makes data highly accessible inside a private and secure network, allowing the data specification in a human readable text format easily shareable with other researchers. These characteristics make ukbREST an important tool to make biobank's valuable data more readily accessible to the research community and facilitate reproducibility of the analysis, a key aspect of science.

Availability and implementation: ukbREST is implemented in Python using the Flask-RESTful framework for the API and is released under the MIT license. It works with PostgreSQL, and a Docker image is available for easy deployment. The source code and documentation are available on GitHub: https://github.com/hakyimlab/ukbrest.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty925
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546122
June 2019

Genetic architecture of gene expression traits across diverse populations.

PLoS Genet 2018 08 10;14(8):e1007586. Epub 2018 Aug 10.

Department of Biology, Loyola University Chicago, Chicago, Illinois, United States of America.

For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations and subsequent effects on genetic prediction are not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression were available in individuals with African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross validation to optimize genotypic predictors of gene expression in each population, we show the genetic architecture of gene expression for most predictable genes is sparse. We found the best predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations with R² > 0.8 in each population. However, we identified a subset of genes that are well-predicted in one population, but poorly predicted in another. We show these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007586
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105030
August 2018

Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function.

Nat Commun 2018 07 30;9(1):2976. Epub 2018 Jul 30.

University of California Los Angeles, Los Angeles, CA, 90095, USA.

Nearly 100 loci have been identified for pulmonary function, almost exclusively in studies of European ancestry populations. We extend previous research by meta-analyzing genome-wide association studies of 1000 Genomes imputed variants in relation to pulmonary function in a multiethnic population of 90,715 individuals of European (N = 60,552), African (N = 8429), Asian (N = 9959), and Hispanic/Latino (N = 11,775) ethnicities. We identify over 50 additional loci at genome-wide significance in ancestry-specific or multiethnic meta-analyses. Using recent fine-mapping methods incorporating functional annotation, gene expression, and differences in linkage disequilibrium between ethnicities, we further shed light on potential causal variants and genes at known and newly identified loci. Several of the novel genes encode proteins with predicted or established drug targets, including KCNK2 and CDK12. Our study highlights the utility of multiethnic and integrative genomics approaches to extend existing knowledge of the genetics of lung function and clinical relevance of implicated loci.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-05369-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6065313
July 2018

A Transcriptome-Wide Association Study Among 97,898 Women to Identify Candidate Susceptibility Genes for Epithelial Ovarian Cancer Risk.

Cancer Res 2018 09 27;78(18):5419-5430. Epub 2018 Jul 27.

Department of Clinical Genetics, Fox Chase Cancer Center, Philadelphia, Pennsylvania.

Large-scale genome-wide association studies (GWAS) have identified approximately 35 loci associated with epithelial ovarian cancer (EOC) risk. The majority of GWAS-identified disease susceptibility variants are located in noncoding regions, and causal genes underlying these associations remain largely unknown. Here, we performed a transcriptome-wide association study to search for novel genetic loci and plausible causal genes at known GWAS loci. We used RNA sequencing data (68 normal ovarian tissue samples from 68 individuals and 6,124 cross-tissue samples from 369 individuals) and high-density genotyping data from European descendants of the Genotype-Tissue Expression (GTEx V6) project to build ovarian and cross-tissue models of genetically regulated expression using elastic net methods. We evaluated 17,121 genes for their predicted gene expression in relation to EOC risk using summary statistics data from GWAS of 97,898 women, including 29,396 EOC cases. With a Bonferroni-corrected significance level of P < 2.2 × 10, we identified 35 genes, including FZD4 at 11q14.2 (Z = 5.08, P = 3.83 × 10, the cross-tissue model; 1 Mb away from any GWAS-identified EOC risk variant), a potential novel locus for EOC risk. All other 34 significantly associated genes were located within 1 Mb of known GWAS-identified loci, including 23 genes at 6 loci not previously linked to EOC risk. Upon conditioning on nearby known EOC GWAS-identified variants, the associations for 31 genes disappeared and three genes remained (P < 1.47 × 10). These data identify one novel locus (FZD4) and 34 genes at 13 known EOC risk loci associated with EOC risk, providing new insights into EOC carcinogenesis. Transcriptomic analysis of a large cohort confirms earlier GWAS loci and reveals FZD4 as a novel locus associated with EOC risk.

Source
DOI: http://dx.doi.org/10.1158/0008-5472.CAN-18-0951
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139053
September 2018

Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics.

Nat Commun 2018 05 8;9(1):1825. Epub 2018 May 8.

Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA.

Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
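
To make the summary-data idea concrete, the sketch below combines per-variant GWAS z-scores into a gene-level z-score using prediction weights and a reference covariance matrix. It assumes one commonly described form of such an approximation, in which each variant's z-score is weighted by its prediction weight and by the ratio of the variant's standard deviation to that of the predicted expression; whether this matches the paper's exact expression is an assumption, and the function name and toy numbers are hypothetical.

```python
from math import sqrt


def gene_z_from_summary(weights, snp_z, snp_cov):
    """Approximate gene-level z-score from variant-level summary statistics.

    weights: prediction weight of each variant in the expression model
    snp_z:   GWAS z-score of each variant (effect size / standard error)
    snp_cov: variant covariance matrix estimated in a reference panel
    """
    k = len(weights)
    # Variance of predicted expression implied by the weights: w' * Cov * w.
    var_gene = sum(
        weights[i] * snp_cov[i][j] * weights[j]
        for i in range(k)
        for j in range(k)
    )
    sigma_gene = sqrt(var_gene)
    # Each variant contributes weight * (sd of variant / sd of gene) * z.
    return sum(
        weights[i] * sqrt(snp_cov[i][i]) * snp_z[i] for i in range(k)
    ) / sigma_gene


# Toy model with two correlated variants (all values are made up).
weights = [0.5, -0.2]
snp_z = [3.0, -1.0]
snp_cov = [[1.0, 0.3], [0.3, 1.0]]

z_gene = gene_z_from_summary(weights, snp_z, snp_cov)
print("gene-level z:", z_gene)
```

The reference covariance is the only place individual-level data enters, which is why a mismatch between the reference panel and the GWAS population matters for accuracy.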

Source
DOI: http://dx.doi.org/10.1038/s41467-018-03621-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940825
May 2018

Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

Sci Data 2017 12 19;4:170179. Epub 2017 Dec 19.

Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK.

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.179
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735917
December 2017
-->