Publications by authors named "Po-Ru Loh"

75 Publications

Protein-coding repeat polymorphisms strongly shape diverse human phenotypes.

Science 2021 09 23;373(6562):1499-1505. Epub 2021 Sep 23.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Source
DOI: http://dx.doi.org/10.1126/science.abg8289
September 2021

GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health.

Nat Commun 2021 07 7;12(1):4178. Epub 2021 Jul 7.

MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK.

Mosaic loss of chromosome Y (LOY) in leukocytes is the most common form of clonal mosaicism, caused by dysregulation in cell-cycle and DNA damage response pathways. Previous genetic studies have focussed on identifying common variants associated with LOY, which we now extend to rarer, protein-coding variation using exome sequences from 82,277 male UK Biobank participants. We find that loss of function of two genes, CHEK2 and GIGYF1, reaches exome-wide significance. Rare alleles in GIGYF1 have not previously been implicated in any complex trait, but here loss-of-function carriers exhibit six-fold higher susceptibility to LOY (OR = 5.99 [3.04-11.81], p = 1.3 × 10). These same alleles are also associated with adverse metabolic health, including higher susceptibility to Type 2 Diabetes (OR = 6.10 [3.51-10.61], p = 1.8 × 10), 4 kg higher fat mass (p = 1.3 × 10), 2.32 nmol/L lower serum IGF1 levels (p = 1.5 × 10) and 4.5 kg lower handgrip strength (p = 4.7 × 10), consistent with proposed GIGYF1 enhancement of insulin and IGF-1 receptor signalling. These associations are mirrored by a common variant nearby associated with the expression of GIGYF1. Our observations highlight a potential direct connection between clonal mosaicism and metabolic health.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-24504-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8263756
July 2021

Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses.

Nat Genet 2021 08 5;53(8):1260-1269. Epub 2021 Jul 5.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.

Source
DOI: http://dx.doi.org/10.1038/s41588-021-00892-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349845
August 2021

Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection.

Nat Med 2021 06 7;27(6):1012-1024. Epub 2021 Jun 7.

Institute for Molecular Medicine Finland, Helsinki, Finland.

Age is the dominant risk factor for infectious diseases, but the mechanisms linking age to infectious disease risk are incompletely understood. Age-related mosaic chromosomal alterations (mCAs), detected from genotyping of blood-derived DNA, are structural somatic variants indicative of clonal hematopoiesis, and are associated with aberrant leukocyte cell counts, hematological malignancy, and mortality. Here, we show that mCAs predispose to diverse types of infections. We analyzed mCAs from 768,762 individuals without hematological cancer at the time of DNA acquisition across five biobanks. Expanded autosomal mCAs were associated with diverse incident infections (hazard ratio (HR) 1.25; 95% confidence interval (CI) = 1.15-1.36; P = 1.8 × 10), including sepsis (HR 2.68; 95% CI = 2.25-3.19; P = 3.1 × 10), pneumonia (HR 1.76; 95% CI = 1.53-2.03; P = 2.3 × 10), digestive system infections (HR 1.51; 95% CI = 1.32-1.73; P = 2.2 × 10) and genitourinary infections (HR 1.25; 95% CI = 1.11-1.41; P = 3.7 × 10). A genome-wide association study of expanded mCAs identified 63 loci, which were enriched at transcriptional regulatory sites for immune cells. These results suggest that mCAs are a marker of impaired immunity and confer increased predisposition to infections.

Source
DOI: http://dx.doi.org/10.1038/s41591-021-01371-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8245201
June 2021

A model and test for coordinated polygenic epistasis in complex traits.

Proc Natl Acad Sci U S A 2021 Apr;118(15)

Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095;

Interaction between genetic variants (epistasis) is pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.

Source
DOI: http://dx.doi.org/10.1073/pnas.1922305118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8053945
April 2021

Estimating the effective sample size in association studies of quantitative traits.

G3 (Bethesda) 2021 Mar 18. Epub 2021 Mar 18.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.

Source
DOI: http://dx.doi.org/10.1093/g3journal/jkab057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8495748
March 2021

Protein-coding repeat polymorphisms strongly shape diverse human phenotypes.

bioRxiv 2021 Jan 19. Epub 2021 Jan 19.

Hundreds of the proteins encoded in human genomes contain domains that vary in size or copy number due to variable numbers of tandem repeats (VNTRs) in protein-coding exons. VNTRs have eluded analysis by the molecular methods (SNP arrays and high-throughput sequencing) used in large-scale human genetic studies to date; thus, the relationships of VNTRs to most human phenotypes are unknown. We developed ways to estimate VNTR lengths from whole-exome sequencing data, identify the SNP haplotypes on which VNTR alleles reside, and use imputation to project these haplotypes into abundant SNP data. We analyzed 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 791 phenotypes. Analysis revealed some of the strongest associations of common variants with human phenotypes including height, hair morphology, and biomarkers of human health; for example, a VNTR encoding 13-44 copies of a 19-amino-acid repeat in the chondroitin sulfate domain of aggrecan (ACAN) associated with height variation of 3.4 centimeters (s.e. 0.3 cm). Incorporating large-effect VNTRs into analysis also made it possible to map many additional effects at the same loci: for the blood biomarker lipoprotein(a), for example, analysis of the kringle IV-2 VNTR within the LPA gene revealed that 18 coding SNPs and the VNTR in LPA explained 90% of lipoprotein(a) heritability in Europeans, enabling insights about population differences and epidemiological significance of this clinical biomarker. These results point to strong, cryptic effects of highly polymorphic common structural variants that have largely eluded molecular analyses to date.

Source
DOI: http://dx.doi.org/10.1101/2021.01.19.427332
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7836119
January 2021

Large mosaic copy number variations confer autism risk.

Nat Neurosci 2021 02 11;24(2):197-203. Epub 2021 Jan 11.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Although germline de novo copy number variants (CNVs) are known causes of autism spectrum disorder (ASD), the contribution of mosaic (early-developmental) copy number variants (mCNVs) has not been explored. In this study, we assessed the contribution of mCNVs to ASD by ascertaining mCNVs in genotype array intensity data from 12,077 probands with ASD and 5,500 unaffected siblings. We detected 46 mCNVs in probands and 19 mCNVs in siblings, affecting 2.8-73.8% of cells. Probands carried a significant burden of large (>4-Mb) mCNVs, which were detected in 25 probands but only one sibling (odds ratio = 11.4, 95% confidence interval = 1.5-84.2, P = 7.4 × 10). Event size positively correlated with severity of ASD symptoms (P = 0.016). Surprisingly, we did not observe mosaic analogues of the short de novo CNVs recurrently observed in ASD (e.g., 16p11.2). We further experimentally validated two mCNVs in postmortem brain tissue from 59 additional probands. These results indicate that mCNVs contribute a previously unexplained component of ASD risk.

Source
DOI: http://dx.doi.org/10.1038/s41593-020-00766-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7854495
February 2021
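
The odds ratio quoted in the mosaic CNV abstract above can be checked from the counts it reports (25 of 12,077 probands versus 1 of 5,500 siblings carrying a large mCNV). The sketch below is only an illustration: it assumes those counts are per-individual carrier counts and uses a Wald (Woolf) interval on the log odds ratio, which the abstract does not state as the authors' method. The function name `odds_ratio_wald_ci` is hypothetical.

```python
import math


def odds_ratio_wald_ci(case_hits, case_total, control_hits, control_total, z=1.96):
    """Odds ratio with a Wald (Woolf) confidence interval on the log scale.

    Assumption: a simple 2x2 table; the source abstract does not say which
    interval method was actually used.
    """
    a, b = case_hits, case_total - case_hits            # carriers / non-carriers among cases
    c, d = control_hits, control_total - control_hits   # carriers / non-carriers among controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract: 25 of 12,077 probands, 1 of 5,500 siblings.
or_, lo, hi = odds_ratio_wald_ci(25, 12077, 1, 5500)
print(f"OR = {or_:.1f}, 95% CI = {lo:.1f}-{hi:.1f}")
```

Under these assumptions the result rounds to the figures given in the abstract (odds ratio 11.4, interval 1.5-84.2); the very wide interval reflects the single carrier among siblings.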

Hematopoietic mosaic chromosomal alterations and risk for infection among 767,891 individuals without blood cancer.

medRxiv 2020 Nov 16. Epub 2020 Nov 16.

Age is the dominant risk factor for infectious diseases, but the mechanisms linking the two are incompletely understood. Age-related mosaic chromosomal alterations (mCAs), detected from blood-derived DNA genotyping, are structural somatic variants associated with aberrant leukocyte cell counts, hematological malignancy, and mortality. Whether mCAs represent independent risk factors for infection is unknown. Here we use genome-wide genotyping of blood DNA to show that mCAs predispose to diverse infectious diseases. We analyzed mCAs from 767,891 individuals without hematological cancer at DNA acquisition across four countries. Expanded mCA (cell fraction >10%) prevalence approached 4% by 60 years of age and was associated with diverse incident infections, including sepsis, pneumonia, and coronavirus disease 2019 (COVID-19) hospitalization. A genome-wide association study of expanded mCAs identified 63 significant loci. Germline genetic alleles associated with expanded mCAs were enriched at transcriptional regulatory sites for immune cells. Our results link mCAs with impaired immunity and predisposition to infections. Furthermore, these findings may also have important implications for the ongoing COVID-19 pandemic, particularly in prioritizing individual preventive strategies and evaluating immunization responses.

Source
DOI: http://dx.doi.org/10.1101/2020.11.12.20230821
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685330
November 2020

Hematopoietic mosaic chromosomal alterations and risk for infection among 767,891 individuals without blood cancer.

Res Sq 2020 Nov 16. Epub 2020 Nov 16.

Age is the dominant risk factor for infectious diseases, but the mechanisms linking the two are incompletely understood [1,2]. Age-related mosaic chromosomal alterations (mCAs), detected from blood-derived DNA genotyping, are structural somatic variants associated with aberrant leukocyte cell counts, hematological malignancy, and mortality [3-11]. Whether mCAs represent independent risk factors for infection is unknown. Here we use genome-wide genotyping of blood DNA to show that mCAs predispose to diverse infectious diseases. We analyzed mCAs from 767,891 individuals without hematological cancer at DNA acquisition across four countries. Expanded mCA (cell fraction >10%) prevalence approached 4% by 60 years of age and was associated with diverse incident infections, including sepsis, pneumonia, and coronavirus disease 2019 (COVID-19) hospitalization. A genome-wide association study of expanded mCAs identified 63 significant loci. Germline genetic alleles associated with expanded mCAs were enriched at transcriptional regulatory sites for immune cells. Our results link mCAs with impaired immunity and predisposition to infections. Furthermore, these findings may also have important implications for the ongoing COVID-19 pandemic, particularly in prioritizing individual preventive strategies and evaluating immunization responses.

Source
DOI: http://dx.doi.org/10.21203/rs.3.rs-100817/v1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7685327
November 2020

Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes.

PLoS Genet 2020 10 22;16(10):e1009078. Epub 2020 Oct 22.

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, United States of America.

Telomeres are DNA-protein structures at the ends of chromosomes essential in maintaining chromosomal stability. Observational studies have identified associations between telomeres and elevated cancer risk, including hematologic malignancies; but biologic mechanisms relating telomere length to cancer etiology remain unclear. Our study sought to better understand the relationship between telomere length and cancer risk by evaluating genetically-predicted telomere length (gTL) in relation to the presence of clonal somatic copy number alterations (SCNAs) in peripheral blood leukocytes. Genotyping array data were acquired from 431,507 participants in the UK Biobank and used to detect SCNAs from intensity information and infer telomere length using a polygenic risk score (PRS) of variants previously associated with leukocyte telomere length. In total, 15,236 (3.5%) of individuals had a detectable clonal SCNA on an autosomal chromosome. Overall, higher gTL value was positively associated with the presence of an autosomal SCNA (OR = 1.07, 95% CI = 1.05-1.09, P = 1.61 × 10⁻¹⁵). There was high consistency in effect estimates across strata of chromosomal event location (e.g., telomeric ends, interstitial or whole chromosome event; P_het = 0.37) and strata of copy number state (e.g., gain, loss, or neutral events; P_het = 0.05). Higher gTL value was associated with a greater cellular fraction of clones carrying autosomal SCNAs (β = 0.004, 95% CI = 0.002-0.007, P = 6.61 × 10⁻⁴). Our population-based examination of gTL and SCNAs suggests inherited components of telomere length do not preferentially impact autosomal SCNA event location or copy number status, but rather likely influence cellular replicative potential.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1009078
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608979
October 2020
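
The telomere-length abstract above infers a trait value from a polygenic risk score (PRS). As a rough illustration of what such a score is in general, the sketch below computes a weighted sum of effect-allele dosages. The weights and genotypes are made-up placeholder values, not variants or effect sizes from the study, and `polygenic_score` is a hypothetical helper name.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele effect sizes for three variants (illustrative only).
weights = [0.08, -0.05, 0.12]

# Hypothetical genotypes: number of effect alleles carried at each variant.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(f"person A: {score_a:.2f}, person B: {score_b:.2f}")
```

In practice the weights would come from a prior association study and the score would usually be standardized across the cohort before being used as a predictor; neither step is shown here.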

Chromosomal alterations among age-related haematopoietic clones in Japan.

Nature 2020 08 24;584(7819):130-135. Epub 2020 Jun 24.

Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.

The extent to which the biology of oncogenesis and ageing are shaped by factors that distinguish human populations is unknown. Haematopoietic clones with acquired mutations become common with advancing age and can lead to blood cancers. Here we describe shared and population-specific patterns of genomic mutations and clonal selection in haematopoietic cells on the basis of 33,250 autosomal mosaic chromosomal alterations that we detected in 179,417 Japanese participants in the BioBank Japan cohort and compared with analogous data from the UK Biobank. In this long-lived Japanese population, mosaic chromosomal alterations were detected in more than 35.0% (s.e.m., 1.4%) of individuals older than 90 years, which suggests that such clones trend towards inevitability with advancing age. Japanese and European individuals exhibited key differences in the genomic locations of mutations in their respective haematopoietic clones; these differences predicted the relative rates of chronic lymphocytic leukaemia (which is more common among European individuals) and T cell leukaemia (which is more common among Japanese individuals) in these populations. Three different mutational precursors of chronic lymphocytic leukaemia (including trisomy 12, loss of chromosomes 13q and 13q, and copy-neutral loss of heterozygosity) were between two and six times less common among Japanese individuals, which suggests that the Japanese and European populations differ in selective pressures on clones long before the development of clinically apparent chronic lymphocytic leukaemia. Japanese and British populations also exhibited very different rates of clones that arose from B and T cell lineages, which predicted the relative rates of B and T cell cancers in these populations. 
We identified six previously undescribed loci at which inherited variants predispose to mosaic chromosomal alterations that duplicate or remove the inherited risk alleles, including large-effect rare variants at NBN, MRE11 and CTU2 (odds ratio, 28-91). We suggest that selective pressures on clones are modulated by factors that are specific to human populations. Further genomic characterization of clonal selection and cancer in populations from around the world is therefore warranted.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2426-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7489641
August 2020

Monogenic and polygenic inheritance become instruments for clonal selection.

Nature 2020 08 24;584(7819):136-141. Epub 2020 Jun 24.

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Clonally expanded blood cells that contain somatic mutations (clonal haematopoiesis) are commonly acquired with age and increase the risk of blood cancer. The blood clones identified so far contain diverse large-scale mosaic chromosomal alterations (deletions, duplications and copy-neutral loss of heterozygosity (CN-LOH)) on all chromosomes, but the sources of selective advantage that drive the expansion of most clones remain unknown. Here, to identify genes, mutations and biological processes that give selective advantage to mutant clones, we analysed genotyping data from the blood-derived DNA of 482,789 participants from the UK Biobank. We identified 19,632 autosomal mosaic chromosomal alterations and analysed these for relationships to inherited genetic variation. We found 52 inherited, rare, large-effect coding or splice variants in 7 genes that were associated with greatly increased vulnerability to clonal haematopoiesis with specific acquired CN-LOH mutations. Acquired mutations systematically replaced the inherited risk alleles (at MPL) or duplicated them to the homologous chromosome (at FH, NBN, MRE11, ATM, SH2B3 and TM2D3). Three of the genes (MRE11, NBN and ATM) encode components of the MRN-ATM pathway, which limits cell division after DNA damage and telomere attrition; another two (MPL and SH2B3) encode proteins that regulate the self-renewal of stem cells. In addition, we found that CN-LOH mutations across the genome tended to cause chromosomal segments with alleles that promote the expansion of haematopoietic cells to replace their homologous (allelic) counterparts, increasing polygenic drive for blood-cell proliferation traits. Readily acquired mutations that replace chromosomal segments with their homologous counterparts seem to interact with pervasive inherited variation to create a challenge for lifelong cytopoiesis.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2430-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7415571
August 2020

Liability threshold modeling of case-control status and family history of disease increases association power.

Nat Genet 2020 05 20;52(5):541-547. Epub 2020 Apr 20.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Family history of disease can provide valuable information in case-control association studies, but it is currently unclear how to best combine case-control status and family history of disease. We developed an association method based on posterior mean genetic liabilities under a liability threshold model, conditional on case-control status and family history (LT-FH). Analyzing 12 diseases from the UK Biobank (average N = 350,000) we compared LT-FH to genome-wide association without using family history (GWAS) and a previous proxy-based method incorporating family history (GWAX). LT-FH was 63% (standard error (s.e.) 6%) more powerful than GWAS and 36% (s.e. 4%) more powerful than the trait-specific maximum of GWAS and GWAX, based on the number of independent genome-wide-significant loci across all diseases (for example, 690 loci for LT-FH versus 423 for GWAS); relative improvements were similar when applying BOLT-LMM to GWAS, GWAX and LT-FH phenotypes. Thus, LT-FH greatly increases association power when family history of disease is available.

Source
DOI: http://dx.doi.org/10.1038/s41588-020-0613-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7210076
May 2020
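
The LT-FH abstract above builds its phenotype from posterior mean genetic liabilities under a liability threshold model. The sketch below shows only the simplest special case, conditioning on case-control status alone and ignoring family history, so it is not the LT-FH method itself. It assumes total liability is standard normal, that genetic liability has variance h2 and is independent of the environmental part, and it uses placeholder values for prevalence and heritability. It also re-derives the quoted 63% gain from the locus counts in the abstract (690 versus 423).

```python
from statistics import NormalDist


def posterior_mean_genetic_liability(is_case, prevalence, h2):
    """E[genetic liability | case-control status] under a liability threshold model.

    Simplifying assumption: family history is ignored, unlike LT-FH.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability cutoff for being a case
    density = std_normal.pdf(threshold)
    if is_case:
        mean_liability = density / prevalence          # mean of the upper truncated normal
    else:
        mean_liability = -density / (1 - prevalence)   # mean of the lower truncated normal
    return h2 * mean_liability                         # regress genetic part on total liability


# Placeholder parameters (not taken from the paper).
prevalence, h2 = 0.05, 0.5
case_mean = posterior_mean_genetic_liability(True, prevalence, h2)
control_mean = posterior_mean_genetic_liability(False, prevalence, h2)
print(f"case: {case_mean:.3f}, control: {control_mean:.3f}")

# Locus counts quoted in the abstract: 690 for LT-FH versus 423 for GWAS.
relative_gain = 690 / 423 - 1
print(f"relative gain in loci: {relative_gain:.0%}")
```

Cases receive a positive expected genetic liability and controls a slightly negative one, and the prevalence-weighted average of the two is zero. Conditioning additionally on family history refines these values per individual, which is the part this sketch leaves out.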

Genetic predisposition to mosaic Y chromosome loss in blood.

Nature 2019 11 20;575(7784):652-657. Epub 2019 Nov 20.

Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK.

Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism, yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.

Source
DOI: http://dx.doi.org/10.1038/s41586-019-1765-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6887549
November 2019

Fast, sensitive and accurate integration of single-cell data with Harmony.

Nat Methods 2019 12 18;16(12):1289-1296. Epub 2019 Nov 18.

Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA.

The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.

Source
DOI: http://dx.doi.org/10.1038/s41592-019-0619-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884693
December 2019

GWAS of mosaic loss of chromosome Y highlights genetic effects on blood cell differentiation.

Nat Commun 2019 10 17;10(1):4719. Epub 2019 Oct 17.

Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, 230-0045, Japan.

Mosaic loss of chromosome Y (mLOY) is frequently observed in the leukocytes of ageing men. However, the genetic architecture and biological mechanisms underlying mLOY are not fully understood. In a cohort of 95,380 Japanese men, we identify 50 independent genetic markers in 46 loci associated with mLOY at a genome-wide significant level, 35 of which are unreported. Lead markers overlap enhancer marks in hematopoietic stem cells (HSCs, P ≤ 1.0 × 10). mLOY genome-wide association study signals exhibit polygenic architecture and demonstrate strong heritability enrichment in regions surrounding genes specifically expressed in multipotent progenitor (MPP) cells and HSCs (P ≤ 3.5 × 10). ChIP-seq data demonstrate that binding sites of FLI1, a fate-determining factor promoting HSC differentiation into platelets rather than red blood cells (RBCs), show a strong heritability enrichment (P = 1.5 × 10). Consistent with these findings, platelet and RBC counts are positively and negatively associated with mLOY, respectively. Collectively, our observations improve our understanding of the mechanisms underlying mLOY.

Source
DOI: http://dx.doi.org/10.1038/s41467-019-12705-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797717
October 2019

Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability.

Hum Mol Genet 2020 05;29(7):1057-1067

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA.

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddz226
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206853
May 2020

Author Correction: Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection.

Nat Genet 2019 Aug;51(8):1295

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

In the version of the paper initially published, information on competing interests for author Benjamin M. Neale was missing. The 'Competing interests' statement should have included the sentence 'B.M.N. is on the Scientific Advisory Board of Deep Genomics'.

Source
DOI: http://dx.doi.org/10.1038/s41588-019-0468-x
August 2019

Genes with High Network Connectivity Are Enriched for Disease Heritability.

Am J Hum Genet 2019 05;104(5):896-913

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.

Source
http://dx.doi.org/10.1016/j.ajhg.2019.03.020 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506868 (PMC)
May 2019

Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors.

Nat Genet 2019 05 1;51(5):804-814. Epub 2019 May 1.

Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK.

Birth weight variation is influenced by fetal and maternal genetic and non-genetic factors, and has been reproducibly associated with future cardio-metabolic health outcomes. In expanded genome-wide association analyses of own birth weight (n = 321,223) and offspring birth weight (n = 230,069 mothers), we identified 190 independent association signals (129 of which are novel). We used structural equation modeling to decompose the contributions of direct fetal and indirect maternal genetic effects, then applied Mendelian randomization to illuminate causal pathways. For example, both indirect maternal and direct fetal genetic effects drive the observational relationship between lower birth weight and higher later blood pressure: maternal blood pressure-raising alleles reduce offspring birth weight, but only direct fetal effects of these alleles, once inherited, increase later offspring blood pressure. Using maternal birth weight-lowering genotypes to proxy for an adverse intrauterine environment provided no evidence that it causally raises offspring blood pressure, indicating that the inverse birth weight-blood pressure association is attributable to genetic effects, and not to intrauterine programming.

Source
http://dx.doi.org/10.1038/s41588-019-0403-1 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6522365 (PMC)
May 2019

Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection.

Nat Commun 2019 02 15;10(1):790. Epub 2019 Feb 15.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 - p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
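
To make the α model in this abstract concrete, the following is a minimal sketch that evaluates the relative per-allele effect-size variance, [p(1 - p)]^α, at two allele frequencies. It is an illustration only, not the authors' estimation method. The function name and the example frequencies (1% and 30%) are hypothetical; the value α = -0.38 is the best-fit mean reported above.

```python
import numpy as np


def relative_effect_variance(maf, alpha):
    """Per-allele effect-size variance up to a constant: [p(1 - p)]^alpha."""
    p = np.asarray(maf, dtype=float)
    if np.any((p <= 0) | (p > 0.5)):
        raise ValueError("MAF must lie in (0, 0.5]")
    return (p * (1.0 - p)) ** alpha


ALPHA = -0.38  # best-fit mean across traits, as reported in the abstract

# Hypothetical example frequencies: a rare (1%) and a common (30%) variant.
rare_maf, common_maf = 0.01, 0.30

ratio = (relative_effect_variance(rare_maf, ALPHA)
         / relative_effect_variance(common_maf, ALPHA))

print(f"rare/common per-allele variance ratio at alpha={ALPHA}: {ratio:.2f}")
```

A negative α gives a ratio above 1, meaning larger expected per-allele effects at the rarer variant. Under the standard additive model, the heritability contributed by a single SNP also scales with 2p(1 - p), so its expected contribution is proportional to [p(1 - p)]^(1 + α). For α > -1 this still decreases as variants get rarer, which is consistent with rare variants explaining a small share of SNP-heritability despite larger effect sizes.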

Source
http://dx.doi.org/10.1038/s41467-019-08424-6 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6377669 (PMC)
February 2019

Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes.

Nat Commun 2019 02 4;10(1):569. Epub 2019 Feb 4.

Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.

We introduce cross-trait penalized regression (CTPR), a powerful and practical approach for multi-trait polygenic risk prediction in large cohorts. Specifically, we propose a novel cross-trait penalty function with the Lasso and the minimax concave penalty (MCP) to incorporate the shared genetic effects across multiple traits for large-sample GWAS data. Our approach extracts information from the secondary traits that is beneficial for predicting the primary trait based on individual-level genotypes and/or summary statistics. Our novel implementation of a parallel computing algorithm makes it feasible to apply our method to biobank-scale GWAS data. We illustrate our method using large-scale GWAS data (~1M SNPs) from the UK Biobank (N = 456,837). We show that our multi-trait method outperforms the recently proposed multi-trait analysis of GWAS (MTAG) for predictive performance. The prediction accuracy for height with the aid of BMI improves from R² = 35.8% (MTAG) to 42.5% (MCP + CTPR) or 42.8% (Lasso + CTPR) with UK Biobank data.

Source
http://dx.doi.org/10.1038/s41467-019-08535-0 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6361917 (PMC)
February 2019

Leveraging Polygenic Functional Enrichment to Improve GWAS Power.

Am J Hum Genet 2019 01 27;104(1):65-75. Epub 2018 Dec 27.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.

Source
http://dx.doi.org/10.1016/j.ajhg.2018.11.008 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323418 (PMC)
January 2019

Estimating cross-population genetic correlations of causal effect sizes.

Genet Epidemiol 2019 03 25;43(2):180-188. Epub 2018 Nov 25.

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts.

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated ρ_g, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of ρ_g depends both on the cross-population correlation of true causal effect sizes (ρ_b) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio ρ_g/ρ_b as a function of LD in each population. By applying existing methods to obtain estimates of ρ_g, we can use this ratio to estimate ρ_b. Our estimates of ρ_b were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.

Source
http://dx.doi.org/10.1002/gepi.22173 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6375794 (PMC)
March 2019

Author Correction: A genome-wide cross-trait analysis from UK Biobank highlights the shared genetic architecture of asthma and allergic diseases.

Nat Genet 2018 12;50(12):1753

Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

In the version of this article originally published, there were two errors in the text of the second paragraph of the Methods section. In the sentence "To identify genetic variants that contribute to doctor-diagnosed asthma and allergic diseases (detailed phenotype information described in the Supplementary Note) and link them with other conditions, we performed GWASs using phenotype measures in UK Biobank participants (N = 487,409)" the number of participants should have been 150,509. In the sentence "Thus, a total of 110,361 European descendants with high-quality genotyping and complete phenotype/covariate data were used for these analyses, including 25,685 allergic diseases subjects (hay fever/allergic rhinitis or eczema, without doctor-diagnosed asthma), 14,085 asthma subjects and 76,768 controls for the analysis" the phrase "without doctor-diagnosed asthma" should have read "some with doctor-diagnosed asthma." The errors have been corrected in the HTML and PDF versions of the article.

Source
http://dx.doi.org/10.1038/s41588-018-0284-8 (DOI Listing)
December 2018

Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations.

Nat Genet 2018 11 8;50(11):1600-1607. Epub 2018 Oct 8.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5% ≤ minor allele frequency < 5%) and common (minor allele frequency ≥ 5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability (h²_lf) versus 2.1 ± 0.2% of common variant heritability (h²_c). Cell-type-specific non-coding annotations that were significantly enriched for h²_c of corresponding traits were similarly enriched for h²_lf for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of h²_lf versus 12 ± 2% of h²_c for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency < 0.5%).

Source
http://dx.doi.org/10.1038/s41588-018-0231-8 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236676 (PMC)
November 2018

Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk.

Nat Genet 2018 10 3;50(10):1483-1493. Epub 2018 Sep 3.

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.

Source
http://dx.doi.org/10.1038/s41588-018-0196-7 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6202062 (PMC)
October 2018

Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations.

Nature 2018 07 11;559(7714):350-355. Epub 2018 Jul 11.

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.

Source
http://dx.doi.org/10.1038/s41586-018-0321-x (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6054542 (PMC)
July 2018
-->