Publications by authors named "Kasper Lage"

66 Publications

Endothelial ARHGEF26 is an angiogenic factor promoting VEGF signaling.

Cardiovasc Res 2021 Nov 26. Epub 2021 Nov 26.

Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Aims: Genetic studies have implicated the ARHGEF26 locus in the risk of coronary artery disease (CAD). However, the causal pathways by which DNA variants at the ARHGEF26 locus confer risk for CAD are incompletely understood. We sought to elucidate the mechanism responsible for the enhanced risk of CAD associated with the ARHGEF26 locus.

Methods And Results: In a conditional analysis of the ARHGEF26 locus, we show that the sentinel CAD-risk signal is significantly associated with various non-lipid vascular phenotypes. In human endothelial cells (ECs), ARHGEF26 promotes angiogenic capacity and interacts with known angiogenic factors and pathways. Quantitative mass spectrometry showed that one CAD-risk coding variant, rs12493885 (p.Val29Leu), resulted in a gain-of-function ARHGEF26 that enhances proangiogenic signaling and displays enhanced interactions with several proteins partially related to the angiogenic pathway. ARHGEF26 is required for endothelial angiogenesis by promoting macropinocytosis of VEGFR2 on the cell membrane and is crucial to VEGF-dependent murine vessel sprouting ex vivo. In vivo, global or tissue-specific deletion of ARHGEF26 in ECs, but not in vascular smooth muscle cells, significantly reduced atherosclerosis in mice, with enhanced plaque stability.

Conclusions: Our results demonstrate that ARHGEF26 is an angiogenic factor, and that DNA variants within ARHGEF26 that are associated with CAD risk could affect angiogenic pathways by promoting VEGF signaling.

Translational Perspective: Understanding the genetic architecture of coronary artery disease (CAD) is critical to developing new therapeutics. Here, we present work that revealed the causal mechanism by which DNA variants at the ARHGEF26 locus confer risk for CAD. Angiogenesis-related vascular phenotypes are associated with the ARHGEF26 locus, and ARHGEF26 promotes the angiogenic capacity of human endothelial cells. Together, our work demonstrates that ARHGEF26 is a novel angiogenic factor, and endothelial-specific inhibition of ARHGEF26 may be beneficial to treating CAD. The causal pathway and the actionable therapeutic hypotheses from our work will facilitate the development of new therapeutics for CAD.

Source
DOI: http://dx.doi.org/10.1093/cvr/cvab344
November 2021

[gene] is mutated in clonal hematopoiesis and myelodysplastic syndromes and impacts RNA splicing.

Blood Cancer Discov 2021 Sep 14;2(5):500-517. Epub 2021 Jul 14.

The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York.

Clonal hematopoiesis results from somatic mutations in cancer driver genes in hematopoietic stem cells. We sought to identify novel drivers of clonal expansion using an unbiased analysis of sequencing data from 84,683 persons and identified common mutations in the 5-methylcytosine reader, [gene], as well as in [gene], [gene], and [gene]. We also identified these mutations at low frequency in myelodysplastic syndrome patients. [gene]-edited mouse hematopoietic stem and progenitor cells exhibited a competitive advantage and increased genome-wide intron retention. [gene] mutations potentially link DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and MDS.

Source
DOI: http://dx.doi.org/10.1158/2643-3230.BCD-20-0224
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8462124
September 2021

Translating polygenic risk scores for clinical use by estimating the confidence bounds of risk prediction.

Nat Commun 2021 Sep 6;12(1):5276. Epub 2021 Sep 6.

Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark.

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, the application of PRS as a tool for predicting an individual's disease susceptibility in a clinical setting is challenging because PRS typically provide a relative measure of risk evaluated at the level of a group of people but not at the individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. MCCP can report a disease-status conditional probability value for each individual and give a prediction at a desired error level. Moreover, with a user-defined prediction error rate, MCCP can estimate the proportion of the sample (coverage) with a correct prediction.
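
The idea of class-conditional (Mondrian) conformal prediction mentioned in this abstract can be illustrated with a small sketch. This is not the authors' MCCP implementation: it uses a single calibration split instead of cross-conformal folds, treats the raw PRS as the nonconformity score, and runs on synthetic data. The function names `mondrian_p_values` and `prediction_set` are hypothetical.

```python
import numpy as np

def mondrian_p_values(cal_scores, cal_labels, test_score):
    """Class-conditional (Mondrian) conformal p-values for one test score.

    Assumed nonconformity measure: for label 1 (case) a low PRS is unusual,
    so alpha = -score; for label 0 (control) a high PRS is unusual, so
    alpha = +score.
    """
    p = {}
    for label, sign in ((0, 1.0), (1, -1.0)):
        # Calibrate only against examples of the same class (the Mondrian part)
        cal_alpha = sign * cal_scores[cal_labels == label]
        test_alpha = sign * test_score
        # Share of same-class calibration examples at least as unusual as the test point
        p[label] = (np.sum(cal_alpha >= test_alpha) + 1) / (len(cal_alpha) + 1)
    return p

def prediction_set(p, error_level):
    """Labels whose p-value exceeds the user-chosen error level."""
    return {label for label, value in p.items() if value > error_level}

# Synthetic calibration data: cases (label 1) have PRS shifted upwards
rng = np.random.default_rng(0)
cal_labels = rng.integers(0, 2, size=400)
cal_scores = rng.normal(loc=cal_labels * 1.0, scale=1.0)

p = mondrian_p_values(cal_scores, cal_labels, test_score=2.5)
print(p, prediction_set(p, error_level=0.1))
```

A prediction set with one label is a confident call at the chosen error level; a set with both labels or none signals that the PRS alone does not separate the classes for that individual.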

Source
DOI: http://dx.doi.org/10.1038/s41467-021-25014-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421428
September 2021

Coexpression network architecture reveals the brain-wide and multiregional basis of disease susceptibility.

Nat Neurosci 2021 Sep 22;24(9):1313-1323. Epub 2021 Jul 22.

Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.

Gene networks have yielded numerous neurobiological insights, yet an integrated view across brain regions is lacking. We leverage RNA sequencing in 864 samples representing 12 brain regions to robustly identify 12 brain-wide, 50 cross-regional and 114 region-specific coexpression modules. Nearly 40% of genes fall into brain-wide modules, while 25% comprise region-specific modules reflecting regional biology, such as oxytocin signaling in the hypothalamus, or addiction pathways in the nucleus accumbens. Schizophrenia and autism genetic risk are enriched in brain-wide and multiregional modules, indicative of broad impact; these modules implicate neuronal proliferation and activity-dependent processes, including endocytosis and splicing, in disease pathophysiology. We find that cell-type-specific long noncoding RNA and gene isoforms contribute substantially to regional synaptic diversity and that constrained, mutation-intolerant genes are primarily enriched in neurons. We leverage these data using an omnigenic-inspired network framework to characterize how coexpression and gene regulatory networks reflect neuropsychiatric disease risk, supporting polygenic models.

Source
DOI: http://dx.doi.org/10.1038/s41593-021-00887-5
September 2021

Genoppi is an open-source software for robust and standardized integration of proteomic and genetic data.

Nat Commun 2021 May 10;12(1):2580. Epub 2021 May 10.

Stanley Center at Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Combining genetic and cell-type-specific proteomic datasets can generate biological insights and therapeutic hypotheses, but a technical and statistical framework for such analyses is lacking. Here, we present an open-source computational tool called Genoppi (lagelab.org/genoppi) that enables robust, standardized, and intuitive integration of quantitative proteomic results with genetic data. We use Genoppi to analyze 16 cell-type-specific protein interaction datasets of four proteins (BCL2, TDP-43, MDM2, PTEN) involved in cancer and neurological disease. Through systematic quality control of the data and integration with published protein interactions, we show a general pattern of both cell-type-independent and cell-type-specific interactions across three cancer cell types and one human iPSC-derived neuronal cell type. Furthermore, through the integration of proteomic and genetic datasets in Genoppi, our results suggest that the neuron-specific interactions of these proteins are mediating their genetic involvement in neurodegenerative diseases. Importantly, our analyses suggest that human iPSC-derived neurons are a relevant model system for studying the involvement of BCL2 and TDP-43 in amyotrophic lateral sclerosis.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-22648-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8110583
May 2021

Targeting acute myeloid leukemia dependency on VCP-mediated DNA repair through a selective second-generation small-molecule inhibitor.

Sci Transl Med 2021 Mar;13(587)

INSERM U1186, Gustave-Roussy Cancer Center, Université Paris-Saclay, 94805 Villejuif, France.

The development and survival of cancer cells require adaptive mechanisms to stress. Such adaptations can confer intrinsic vulnerabilities, enabling the selective targeting of cancer cells. Through a pooled in vivo short hairpin RNA (shRNA) screen, we identified the adenosine triphosphatase associated with diverse cellular activities (AAA-ATPase) valosin-containing protein (VCP) as a top stress-related vulnerability in acute myeloid leukemia (AML). We established that AML was the most responsive disease to chemical inhibition of VCP across a panel of 16 cancer types. The sensitivity to VCP inhibition of human AML cell lines, primary patient samples, and syngeneic and xenograft mouse models of AML was validated using [gene]-directed shRNAs, overexpression of a dominant-negative VCP mutant, and chemical inhibition. By combining mass spectrometry-based analysis of the VCP interactome and phospho-signaling studies, we determined that VCP is important for ataxia telangiectasia mutated (ATM) kinase activation and subsequent DNA repair through homologous recombination in AML. A second-generation VCP inhibitor, CB-5339, was then developed and characterized. Efficacy and safety of CB-5339 were validated in multiple AML models, including syngeneic and patient-derived xenograft murine models. We further demonstrated that combining DNA-damaging agents, such as anthracyclines, with CB-5339 treatment synergizes to impair leukemic growth in an MLL-AF9-driven AML murine model. These studies support the clinical testing of CB-5339 as a single agent or in combination with standard-of-care DNA-damaging chemotherapy for the treatment of AML.

Source
DOI: http://dx.doi.org/10.1126/scitranslmed.abg1168
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8672851
March 2021

Systematic auditing is essential to debiasing machine learning in biology.

Commun Biol 2021 Feb 10;4(1):183. Epub 2021 Feb 10.

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.

Source
DOI: http://dx.doi.org/10.1038/s42003-021-01674-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7876113
February 2021

Leadership.

Cell Syst 2021 Jan;12(1):1-4

We asked group leaders how they foster mutually reinforcing research productivity and psychological safety in their teams.

Source
DOI: http://dx.doi.org/10.1016/j.cels.2020.12.004
January 2021

Cohesin mutations alter DNA damage repair and chromatin structure and create therapeutic vulnerabilities in MDS/AML.

JCI Insight 2021 Feb 8;6(3). Epub 2021 Feb 8.

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA.

The cohesin complex plays an essential role in chromosome maintenance and transcriptional regulation. Recurrent somatic mutations in the cohesin complex are frequent genetic drivers in cancer, including myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML). Here, using genetic dependency screens of stromal antigen 2-mutant (STAG2-mutant) AML, we identified DNA damage repair and replication as genetic dependencies in cohesin-mutant cells. We demonstrated increased levels of DNA damage and sensitivity of cohesin-mutant cells to poly(ADP-ribose) polymerase (PARP) inhibition. We developed a mouse model of MDS in which Stag2 mutations arose as clonal secondary lesions in the background of clonal hematopoiesis driven by tet methylcytosine dioxygenase 2 (Tet2) mutations and demonstrated selective depletion of cohesin-mutant cells with PARP inhibition in vivo. Finally, we demonstrated a shift from STAG2- to STAG1-containing cohesin complexes in cohesin-mutant cells, which was associated with longer DNA loop extrusion, more intermixing of chromatin compartments, and increased interaction with PARP and replication protein A complex. Our findings inform the biology and therapeutic opportunities for cohesin-mutant malignancies.

Source
DOI: http://dx.doi.org/10.1172/jci.insight.142149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7934867
February 2021

Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants.

Proc Natl Acad Sci U S A 2020 Nov 26;117(45):28201-28211. Epub 2020 Oct 26.

Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142.

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.

Source
DOI: http://dx.doi.org/10.1073/pnas.2002660117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7668189
November 2020

Prediction of cancer driver genes through network-based moment propagation of mutation scores.

Bioinformatics 2020 Jul;36(Suppl_1):i508-i515

Department of Biosystems Science and Engineering, Machine Learning and Computational Biology Lab, ETH Zürich, Basel 4058, Switzerland.

Motivation: Gaining a comprehensive understanding of the genetics underlying cancer development and progression is a central goal of biomedical research. Its accomplishment promises key mechanistic, diagnostic and therapeutic insights. One major step in this direction is the identification of genes that drive the emergence of tumors upon mutation. Recent advances in the field of computational biology have shown the potential of combining genetic summary statistics that represent the mutational burden in genes with biological networks, such as protein-protein interaction networks, to identify cancer driver genes. Those approaches superimpose the summary statistics on the nodes in the network, followed by an unsupervised propagation of the node scores through the network. However, this unsupervised setting does not leverage any knowledge on well-established cancer genes, a potentially valuable resource to improve the identification of novel cancer drivers.

Results: We develop a novel node embedding that enables classification of cancer driver genes in a supervised setting. The embedding combines a representation of the mutation score distribution in a node's local neighborhood with network propagation. We leverage the knowledge of well-established cancer driver genes to define a positive class, resulting in a partially labeled dataset, and develop a cross-validation scheme to enable supervised prediction. The proposed node embedding followed by a supervised classification improves the predictive performance compared with baseline methods and yields a set of promising genes that constitute candidates for further biological validation.

Availability And Implementation: Code available at https://github.com/BorgwardtLab/MoProEmbeddings.

Supplementary Information: Supplementary data are available at Bioinformatics online.
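
As a rough illustration of the embedding idea in this abstract, the sketch below summarises the mutation scores in each node's direct neighborhood by their first two moments and then spreads those features over the network by repeated neighbor averaging. It is an assumption-laden toy, not the code from the MoProEmbeddings repository: the choice of moments, the averaging rule, the toy graph and the function names are all hypothetical.

```python
import numpy as np

def neighborhood_moments(adj, scores):
    """Mean and variance of mutation scores over each node's direct neighbours."""
    deg = adj.sum(axis=1)
    deg_safe = np.where(deg > 0, deg, 1.0)      # avoid division by zero for isolated nodes
    mean = adj @ scores / deg_safe
    var = adj @ scores**2 / deg_safe - mean**2
    return np.column_stack([mean, var])

def propagate(adj, features, steps=2):
    """Average features over neighbours repeatedly and stack every step as the embedding."""
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.where(deg > 0, deg, 1.0)       # row-normalised transition matrix
    blocks = [features]
    for _ in range(steps):
        blocks.append(P @ blocks[-1])
    return np.hstack(blocks)

# Toy path graph 0 - 1 - 2 with hypothetical per-gene mutation scores
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
scores = np.array([1.0, 0.0, 3.0])

embedding = propagate(adj, neighborhood_moments(adj, scores), steps=2)
print(embedding.shape)
```

In a supervised setting such as the one described, rows of `embedding` for known driver genes would serve as positive examples for any standard classifier.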

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa452
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355253
July 2020

TCF12 haploinsufficiency causes autosomal dominant Kallmann syndrome and reveals network-level interactions between causal loci.

Hum Mol Genet 2020 Aug;29(14):2435-2450

Center for Human Disease Modeling, Duke University, Durham, NC 27701, USA.

Dysfunction of the gonadotropin-releasing hormone (GnRH) axis causes a range of reproductive phenotypes resulting from defects in the specification, migration and/or function of GnRH neurons. To identify additional molecular components of this system, we initiated a systematic genetic interrogation of families with isolated GnRH deficiency (IGD). Here, we report 13 families (12 autosomal dominant and one autosomal recessive) with an anosmic form of IGD (Kallmann syndrome) with loss-of-function mutations in TCF12, a locus also known to cause syndromic and non-syndromic craniosynostosis. We show that loss of tcf12 in zebrafish larvae perturbs GnRH neuronal patterning with concomitant attenuation of the orthologous expression of tcf3a/b, encoding a binding partner of TCF12, and stub1, a gene that is both mutated in other syndromic forms of IGD and maps to a TCF12 affinity network. Finally, we report that restored STUB1 mRNA rescues loss of tcf12 in vivo. Our data extend the mutational landscape of IGD, highlight the genetic links between craniofacial patterning and GnRH dysfunction and begin to assemble the functional network that regulates the development of the GnRH axis.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddaa120
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608740
August 2020

BraInMap Elucidates the Macromolecular Connectivity Landscape of Mammalian Brain.

Cell Syst 2020 Apr;10(4):333-350.e14

Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA.

Connectivity webs mediate the unique biology of the mammalian brain. Yet, while cell circuit maps are increasingly available, knowledge of their underlying molecular networks remains limited. Here, we applied multi-dimensional biochemical fractionation with mass spectrometry and machine learning to survey endogenous macromolecules across the adult mouse brain. We defined a global "interactome" comprising over one thousand multi-protein complexes. These include hundreds of brain-selective assemblies that have distinct physical and functional attributes, show regional and cell-type specificity, and have links to core neurological processes and disorders. Using reciprocal pull-downs and a transgenic model, we validated a putative 28-member RNA-binding protein complex associated with amyotrophic lateral sclerosis, suggesting a coordinated function in alternative splicing in disease progression. This brain interaction map (BraInMap) resource facilitates mechanistic exploration of the unique molecular machinery driving core cellular processes of the central nervous system. It is publicly available and can be explored here https://www.bu.edu/dbin/cnsb/mousebrain/.

Source
DOI: http://dx.doi.org/10.1016/j.cels.2020.03.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7938770
April 2020

Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology.

Nat Commun 2019 Sep 6;10(1):4064. Epub 2019 Sep 6.

Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA.

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.
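
The decomposition step described in this abstract can be sketched with a truncated singular value decomposition of a phenotype-by-variant matrix of summary statistics. The example below is a minimal illustration on random data, not the DeGAs pipeline itself. The matrix dimensions, the squared-loading definition of the contribution scores, and the function names are assumptions made for the sketch.

```python
import numpy as np

def truncated_svd(W, k):
    """Keep the top-k singular triplets of W (phenotypes x variants)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def contribution_scores(loadings):
    """Squared loadings, normalised so each component's scores sum to 1."""
    sq = loadings ** 2
    return sq / sq.sum(axis=0, keepdims=True)

# Hypothetical summary-statistic matrix: 6 phenotypes x 50 variants
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 50))

U, s, Vt = truncated_svd(W, k=3)
phenotype_contrib = contribution_scores(U)      # shape (6, 3): phenotype share per component
variant_contrib = contribution_scores(Vt.T)     # shape (50, 3): variant share per component
W_approx = (U * s) @ Vt                         # rank-3 approximation of W
```

Ranking the rows of `variant_contrib` within a component is one way to shortlist variants for follow-up, in the spirit of the downstream experiments the abstract describes.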

Source
DOI: http://dx.doi.org/10.1038/s41467-019-11953-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6731283
September 2019

Assessment of network module identification across complex diseases.

Nat Methods 2019 Sep 30;16(9):843-852. Epub 2019 Aug 30.

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.

Source
DOI: http://dx.doi.org/10.1038/s41592-019-0509-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6719725
September 2019

Single-cell sequencing of neonatal uterus reveals an Misr2+ endometrial progenitor indispensable for fertility.

Elife 2019 Jun 24;8. Epub 2019 Jun 24.

Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, United States.

The Mullerian ducts are the anlagen of the female reproductive tract, which regress in the male fetus in response to MIS. This process is driven by subluminal mesenchymal cells expressing Misr2, which trigger the regression of the adjacent Mullerian ductal epithelium. In females, these Misr2+ cells are retained, yet their contribution to the development of the uterus remains unknown. Here, we report that subluminal Misr2+ cells persist postnatally in the uterus of rodents, but recede by week 37 of gestation in humans. Using single-cell RNA sequencing, we demonstrate that ectopic postnatal MIS administration inhibits these cells and prevents the formation of endometrial stroma in rodents, suggesting a progenitor function. Exposure to MIS during the first six days of life, by inhibiting specification of the stroma, dysregulates paracrine signals necessary for uterine development, eventually resulting in apoptosis of the Misr2+ cells, uterine hypoplasia, and complete infertility in the adult female.

Source
DOI: http://dx.doi.org/10.7554/eLife.46349
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6650247
June 2019

Developing a network view of type 2 diabetes risk pathways through integration of genetic, genomic and functional data.

Genome Med 2019 Mar 26;11(1):19. Epub 2019 Mar 26.

Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.

Background: Genome-wide association studies (GWAS) have identified several hundred susceptibility loci for type 2 diabetes (T2D). One critical, but unresolved, issue concerns the extent to which the mechanisms through which these diverse signals influencing T2D predisposition converge on a limited set of biological processes. However, the causal variants identified by GWAS mostly fall into a non-coding sequence, complicating the task of defining the effector transcripts through which they operate.

Methods: Here, we describe implementation of an analytical pipeline to address this question. First, we integrate multiple sources of genetic, genomic and biological data to assign positional candidacy scores to the genes that map to T2D GWAS signals. Second, we introduce genes with high scores as seeds within a network optimization algorithm (the asymmetric prize-collecting Steiner tree approach) which uses external, experimentally confirmed protein-protein interaction (PPI) data to generate high-confidence sub-networks. Third, we use GWAS data to test the T2D association enrichment of the "non-seed" proteins introduced into the network, as a measure of the overall functional connectivity of the network.

Results: We find (a) non-seed proteins in the T2D protein-interaction network so generated (comprising 705 nodes) are enriched for association to T2D (p = 0.0014) but not control traits, (b) stronger T2D-enrichment for islets than other tissues when we use RNA expression data to generate tissue-specific PPI networks and (c) enhanced enrichment (p = 3.9 × 10^[exponent missing]) when we combine the analysis of the islet-specific PPI network with a focus on the subset of T2D GWAS loci which act through defective insulin secretion.

Conclusions: These analyses reveal a pattern of non-random functional connectivity between candidate causal genes at T2D GWAS loci and highlight the products of genes including YWHAG, SMAD4 or CDK2 as potential contributors to T2D-relevant islet dysfunction. The approach we describe can be applied to other complex genetic and genomic datasets, facilitating integration of diverse data types into disease-associated networks.

Source
DOI: http://dx.doi.org/10.1186/s13073-019-0628-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6436236
March 2019

Genome-scale analysis identifies paralog lethality as a vulnerability of chromosome 1p loss in cancer.

Nat Genet 2018 Jul 28;50(7):937-943. Epub 2018 Jun 28.

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.

Functional redundancy shared by paralog genes may afford protection against genetic perturbations, but it can also result in genetic vulnerabilities due to mutual interdependency. Here, we surveyed genome-scale short hairpin RNA and CRISPR screening data on hundreds of cancer cell lines and identified MAGOH and MAGOHB, core members of the splicing-dependent exon junction complex, as top-ranked paralog dependencies. MAGOHB is the top gene dependency in cells with hemizygous MAGOH deletion, a pervasive genetic event that frequently occurs due to chromosome 1p loss. Inhibition of MAGOHB in a MAGOH-deleted context compromises viability by globally perturbing alternative splicing and RNA surveillance. Dependency on IPO13, an importin-β receptor that mediates nuclear import of the MAGOH/B-Y14 heterodimer, is highly correlated with dependency on both MAGOH and MAGOHB. Both MAGOHB and IPO13 represent dependencies in murine xenografts with hemizygous MAGOH deletion. Our results identify MAGOH and MAGOHB as reciprocal paralog dependencies across cancer types and suggest a rationale for targeting the MAGOHB-IPO13 axis in cancers with chromosome 1p deletion.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0155-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6143899
July 2018

GeNets: a unified web platform for network-based genomic analyses.

Nat Methods 2018 Jul 18;15(7):543-546. Epub 2018 Jun 18.

Department of Surgery, Massachusetts General Hospital, Boston, MA, USA.

Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.

Source
DOI: http://dx.doi.org/10.1038/s41592-018-0039-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450090
July 2018

Integrated Bayesian analysis of rare exonic variants to identify risk genes for schizophrenia and neurodevelopmental disorders.

Genome Med 2017 Dec 20;9(1):114. Epub 2017 Dec 20.

Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA.

Background: Integrating rare variation from trio family and case-control studies has successfully implicated specific genes contributing to risk of neurodevelopmental disorders (NDDs) including autism spectrum disorders (ASD), intellectual disability (ID), developmental disorders (DDs), and epilepsy (EPI). For schizophrenia (SCZ), however, while sets of genes have been implicated through the study of rare variation, only two risk genes have been identified.

Methods: We used hierarchical Bayesian modeling of rare-variant genetic architecture to estimate mean effect sizes and risk-gene proportions, analyzing the largest available collection of whole exome sequence data for SCZ (1,077 trios, 6,699 cases, and 13,028 controls), and data for four NDDs (ASD, ID, DD, and EPI; total 10,792 trios, and 4,058 cases and controls).

Results: For SCZ, we estimate there are 1,551 risk genes. There are more risk genes and they have weaker effects than for NDDs. We provide power analyses to predict the number of risk-gene discoveries as more data become available. We confirm and augment prior risk gene and gene set enrichment results for SCZ and NDDs. In particular, we detected 98 new DD risk genes at FDR < 0.05. Correlations of risk-gene posterior probabilities are high across four NDDs (ρ>0.55), but low between SCZ and the NDDs (ρ<0.3). An in-depth analysis of 288 NDD genes shows there is highly significant protein-protein interaction (PPI) network connectivity, and functionally distinct PPI subnetworks based on pathway enrichment, single-cell RNA-seq cell types, and multi-region developmental brain RNA-seq.

Conclusions: We have extended a pipeline used in ASD studies and applied it to infer rare genetic parameters for SCZ and four NDDs ( https://github.com/hoangtn/extTADA ). We find many new DD risk genes, supported by gene set enrichment and PPI network connectivity analyses. We find greater similarity among NDDs than between NDDs and SCZ. NDD gene subnetworks are implicated in postnatally expressed presynaptic and postsynaptic genes, and for transcriptional and post-transcriptional gene regulation in prenatal neural progenitor and stem cells.

Source
http://dx.doi.org/10.1186/s13073-017-0497-y (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5738153 (PMC)
December 2017

NetSig: network-based discovery from cancer genomes.

Nat Methods 2018 01 4;15(1):61-66. Epub 2017 Dec 4.

Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes; but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.

Source
http://dx.doi.org/10.1038/nmeth.4514 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985961 (PMC)
January 2018

A scored human protein-protein interaction network to catalyze genomic interpretation.

Nat Methods 2017 01 28;14(1):61-64. Epub 2016 Nov 28.

Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA.

Genome-scale human protein-protein interaction networks are critical to understanding cell biology and interpreting genomic data, but challenging to produce experimentally. Through data integration and quality control, we provide a scored human protein-protein interaction network (InWeb_InBioMap, or InWeb_IM) with severalfold more interactions (>500,000) and better functional biological relevance than comparable resources. We illustrate that InWeb_InBioMap enables functional interpretation of >4,700 cancer genomes and genes involved in autism.

Source
http://dx.doi.org/10.1038/nmeth.4083 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5839635 (PMC)
January 2017

Genetic and Proteomic Interrogation of Lower Confidence Candidate Genes Reveals Signaling Networks in β-Catenin-Active Cancers.

Cell Syst 2016 09;3(3):302-316.e4

Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, 450 Brookline Avenue, Boston, MA 02215, USA.

Genome-scale expression studies and comprehensive loss-of-function genetic screens have focused almost exclusively on the highest confidence candidate genes. Here, we describe a strategy for characterizing the lower confidence candidates identified by such approaches. We interrogated 177 genes that we classified as essential for the proliferation of cancer cells exhibiting constitutive β-catenin activity and integrated data for each of the candidates, derived from orthogonal short hairpin RNA (shRNA) knockdown and clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9-mediated gene editing knockout screens, to yield 69 validated genes. We then characterized the relationships between sets of these genes using complementary assays: medium-throughput stable isotope labeling by amino acids in cell culture (SILAC)-based mass spectrometry, yielding 3,639 protein-protein interactions, and a CRISPR-mediated pairwise double knockout screen, yielding 375 combinations exhibiting greater- or lesser-than-additive phenotypic effects indicating genetic interactions. These studies identify previously unreported regulators of β-catenin, define functional networks required for the survival of β-catenin-active cancers, and provide an experimental strategy that may be applied to define other signaling networks.

Source
http://dx.doi.org/10.1016/j.cels.2016.09.001 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5455996 (PMC)
September 2016

52 Genetic Loci Influencing Myocardial Mass.

J Am Coll Cardiol 2016 09;68(13):1435-1448

Department of Medical Genetics, University Medical Center Utrecht, Utrecht, the Netherlands; Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht, the Netherlands.

Background: Myocardial mass is a key determinant of cardiac muscle function and hypertrophy. Myocardial depolarization leading to cardiac muscle contraction is reflected by the amplitude and duration of the QRS complex on the electrocardiogram (ECG). Abnormal QRS amplitude or duration reflect changes in myocardial mass and conduction, and are associated with increased risk of heart failure and death.

Objectives: This meta-analysis sought to gain insights into the genetic determinants of myocardial mass.

Methods: We carried out a genome-wide association meta-analysis of 4 QRS traits in up to 73,518 individuals of European ancestry, followed by extensive biological and functional assessment.

Results: We identified 52 genomic loci, of which 32 are novel, that are reliably associated with 1 or more QRS phenotypes at p < 1 × 10⁻⁸. These loci are enriched in regions of open chromatin, histone modifications, and transcription factor binding, suggesting that they represent regions of the genome that are actively transcribed in the human heart. Pathway analyses provided evidence that these loci play a role in cardiac hypertrophy. We further highlighted 67 candidate genes at the identified loci that are preferentially expressed in cardiac tissue and associated with cardiac abnormalities in Drosophila melanogaster and Mus musculus. We validated the regulatory function of a novel variant in the SCN5A/SCN10A locus in vitro and in vivo.

Conclusions: Taken together, our findings provide new insights into genes and biological pathways controlling myocardial mass and may help identify novel therapeutic targets.

Source
http://dx.doi.org/10.1016/j.jacc.2016.07.729 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5478167 (PMC)
September 2016

CD44 Splice Variant v8-10 as a Marker of Serous Ovarian Cancer Prognosis.

PLoS One 2016 2;11(6):e0156595. Epub 2016 Jun 2.

Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Department of Surgery, Boston, Massachusetts, United States of America.

CD44 is a transmembrane hyaluronic acid receptor gene that encodes over 100 different tissue-specific protein isoforms. The most ubiquitous, CD44 standard, has been used as a cancer stem cell marker in ovarian and other cancers. Expression of the epithelial CD44 variant containing exons v8-10 (CD44v8-10) has been associated with more chemoresistant and metastatic tumors in gastrointestinal and breast cancers, but its role in ovarian cancer is unknown; we therefore investigated its use as a prognostic marker in this disease. The gene expression profiles of 254 tumor samples from The Cancer Genome Atlas RNAseqV2 were analyzed for the presence of CD44 isoforms. A trend for longer survival was observed in patients with high expression of CD44 isoforms that include exons v8-10. Immunohistochemical (IHC) analysis of tumors for presence of CD44v8-10 was performed on an independent cohort of 210 patients with high-grade serous ovarian cancer using a tumor tissue microarray. Patient stratification based on software analysis of staining revealed a statistically significant increase in survival in patients with the highest levels of transmembrane protein expression (top 10 or 20%) compared to those with the lowest expression (bottom 10 and 20%) (p = 0.0181, p = 0.0262 respectively). Expression of CD44v8-10 in primary ovarian cancer cell lines was correlated with a predominantly epithelial phenotype characterized by high expression of epithelial markers and low expression of mesenchymal markers by qPCR, Western blot, and IHC. Conversely, detection of proteolytically cleaved and soluble extracellular domain of CD44v8-10 in patient ascites samples was correlated with significantly worse prognosis (p<0.05). 
Therefore, presence of transmembrane CD44v8-10 on the surface of primary tumor cells may be a marker of a highly epithelial tumor with better prognosis while enzymatic cleavage of CD44v8-10, as detected by presence of the soluble extracellular domain in ascites fluid, may be indicative of a more metastatic disease and worse prognosis.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0156595 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4890777 (PMC)
July 2017

Systematic Functional Interrogation of Rare Cancer Variants Identifies Oncogenic Alleles.

Cancer Discov 2016 07 4;6(7):714-26. Epub 2016 May 4.

Broad Institute of MIT and Harvard, Cambridge, Massachusetts. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.

Cancer genome characterization efforts now provide an initial view of the somatic alterations in primary tumors. However, most point mutations occur at low frequency, and the function of these alleles remains undefined. We have developed a scalable systematic approach to interrogate the function of cancer-associated gene variants. We subjected 474 mutant alleles curated from 5,338 tumors to pooled in vivo tumor formation assays and gene expression profiling. We identified 12 transforming alleles, including two in genes (PIK3CB, POT1) that have not been shown to be tumorigenic. One rare KRAS allele, D33E, displayed tumorigenicity and constitutive activation of known RAS effector pathways. By comparing gene expression changes induced upon expression of wild-type and mutant alleles, we inferred the activity of specific alleles. Because alleles found to be mutated only once in 5,338 tumors rendered cells tumorigenic, these observations underscore the value of integrating genomic information with functional studies.

Significance: Experimentally inferring the functional status of cancer-associated mutations facilitates the interpretation of genomic information in cancer. Pooled in vivo screen and gene expression profiling identified functional variants and demonstrated that expression of rare variants induced tumorigenesis. Variant phenotyping through functional studies will facilitate defining key somatic events in cancer. Cancer Discov; 6(7); 714-26. ©2016 AACR. See related commentary by Cho and Collisson, p. 694. This article is highlighted in the In This Issue feature, p. 681.

Source
http://dx.doi.org/10.1158/2159-8290.CD-16-0160 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4930723 (PMC)
July 2016

Disproportionate Contributions of Select Genomic Compartments and Cell Types to Genetic Risk for Coronary Artery Disease.

PLoS Genet 2015 Oct 28;11(10):e1005622. Epub 2015 Oct 28.

Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America; Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America; Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America; Center for Statistical Genetics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America; Zena and Michael A. Weiner Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.

Large genome-wide association studies (GWAS) have identified many genetic loci associated with risk for myocardial infarction (MI) and coronary artery disease (CAD). Concurrently, efforts such as the National Institutes of Health (NIH) Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) Consortium have provided unprecedented data on functional elements of the human genome. In the present study, we systematically investigate the biological link between genetic variants associated with this complex disease and their impacts on gene function. First, we examined the heritability of MI/CAD according to genomic compartments. We observed that single nucleotide polymorphisms (SNPs) residing within nearby regulatory regions show significant polygenicity and contribute between 59-71% of the heritability for MI/CAD. Second, we showed that the polygenicity and heritability explained by these SNPs are enriched in histone modification marks in specific cell types. Third, we found that a statistically higher number of 45 MI/CAD-associated SNPs that have been identified from large-scale GWAS studies reside within certain functional elements of the genome, particularly in active enhancer and promoter regions. Finally, we observed significant heterogeneity of this signal across cell types, with strong signals observed within adipose nuclei, as well as brain and spleen cell types. These results suggest that the genetic etiology of MI/CAD is largely explained by tissue-specific regulatory perturbation within the human genome.

Source
http://dx.doi.org/10.1371/journal.pgen.1005622 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4625039 (PMC)
October 2015

Comprehensive assessment of cancer missense mutation clustering in protein structures.

Proc Natl Acad Sci U S A 2015 Oct 21;112(40):E5486-95. Epub 2015 Sep 21.

Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, MA 02114; Harvard Medical School, Boston, MA 02115; Broad Institute of MIT and Harvard, Cambridge, MA 02142;

Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Beside these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.

Source
http://dx.doi.org/10.1073/pnas.1516373112 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4603469 (PMC)
October 2015

MATR3 disruption in human and mouse associated with bicuspid aortic valve, aortic coarctation and patent ductus arteriosus.

Hum Mol Genet 2015 Apr 7;24(8):2375-89. Epub 2015 Jan 7.

Division of Genetics, Department of Medicine,

Cardiac left ventricular outflow tract (LVOT) defects represent a common but heterogeneous subset of congenital heart disease for which gene identification has been difficult. We describe a 46,XY,t(1;5)(p36.11;q31.2)dn translocation carrier with pervasive developmental delay who also exhibited LVOT defects, including bicuspid aortic valve (BAV), coarctation of the aorta (CoA) and patent ductus arteriosus (PDA). The 1p breakpoint disrupts the 5' UTR of AHDC1, which encodes AT-hook DNA-binding motif containing-1 protein, and AHDC1-truncating mutations have recently been described in a syndrome that includes developmental delay, but not congenital heart disease [Xia, F., Bainbridge, M.N., Tan, T.Y., Wangler, M.F., Scheuerle, A.E., Zackai, E.H., Harr, M.H., Sutton, V.R., Nalam, R.L., Zhu, W. et al. (2014) De Novo truncating mutations in AHDC1 in individuals with syndromic expressive language delay, hypotonia, and sleep apnea. Am. J. Hum. Genet., 94, 784-789]. On the other hand, the 5q translocation breakpoint disrupts the 3' UTR of MATR3, which encodes the nuclear matrix protein Matrin 3, and mouse Matr3 is strongly expressed in neural crest, developing heart and great vessels, whereas Ahdc1 is not. To further establish MATR3 3' UTR disruption as the cause of the proband's LVOT defects, we prepared a mouse Matr3(Gt-ex13) gene trap allele that disrupted the 3' portion of the gene. Matr3(Gt-ex13) homozygotes are early embryo lethal, but Matr3(Gt-ex13) heterozygotes exhibit incompletely penetrant BAV, CoA and PDA phenotypes similar to those in the human proband, as well as ventricular septal defect (VSD) and double-outlet right ventricle (DORV). Both the human MATR3 translocation breakpoint and the mouse Matr3(Gt-ex13) gene trap insertion disturb the polyadenylation of MATR3 transcripts and alter Matrin 3 protein expression, quantitatively or qualitatively. Thus, subtle perturbations in Matrin 3 expression appear to cause similar LVOT defects in human and mouse.

Source
http://dx.doi.org/10.1093/hmg/ddv004 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380077 (PMC)
April 2015