Publications by authors named "Manolis Kellis"

234 Publications

High-throughput 5' UTR engineering for enhanced protein production in non-viral gene therapies.

Nat Commun 2021 07 6;12(1):4138. Epub 2021 Jul 6.

Synthetic Biology Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA.

Despite significant clinical progress in cell and gene therapies, maximizing protein expression in order to enhance potency remains a major technical challenge. Here, we develop a high-throughput strategy to design, screen, and optimize 5' UTRs that enhance protein expression from a strong human cytomegalovirus (CMV) promoter. We first identify naturally occurring 5' UTRs with high translation efficiencies and use this information with in silico genetic algorithms to generate synthetic 5' UTRs. A total of ~12,000 5' UTRs are then screened using a recombinase-mediated integration strategy that greatly enhances the sensitivity of high-throughput screens by eliminating copy number and position effects that limit lentiviral approaches. Using this approach, we identify three synthetic 5' UTRs that outperform commonly used non-viral gene therapy plasmids in expressing protein payloads. In summary, we demonstrate that high-throughput screening of 5' UTR libraries with recombinase-mediated integration can identify genetic elements that enhance protein expression, which should have numerous applications for engineered cell and gene therapies.
DOI: http://dx.doi.org/10.1038/s41467-021-24436-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8260622
July 2021
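
As a rough illustration of the in silico genetic-algorithm step mentioned in the abstract above, the sketch below evolves fixed-length 5' UTR strings through selection, crossover, and mutation. The scoring function `toy_fitness`, the parameter values, and every function name are assumptions made for illustration only; the abstract does not give the study's actual translation-efficiency predictor or algorithm settings.

```python
import random

ALPHABET = "ACGT"


def toy_fitness(utr: str) -> float:
    """Hypothetical stand-in for a translation-efficiency predictor.

    Rewards GC content near 50% and penalizes upstream ATG triplets.
    This is NOT the scoring used in the paper.
    """
    gc = sum(base in "GC" for base in utr) / len(utr)
    return -abs(gc - 0.5) - 0.5 * utr.count("ATG")


def mutate(utr: str, rng: random.Random, rate: float = 0.05) -> str:
    """Resample each base with probability `rate`; length is preserved."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base
                   for base in utr)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seeds, generations=20, population_size=50, seed=0):
    """Evolve equal-length seed UTRs and return the best-scoring sequence."""
    rng = random.Random(seed)
    population = list(seeds)
    # Fill the initial population with mutated copies of the seeds.
    while len(population) < population_size:
        population.append(mutate(rng.choice(seeds), rng))
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        # Elitism: the top fifth survives unchanged, so the best score never drops.
        elite = population[: population_size // 5]
        children = []
        while len(elite) + len(children) < population_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return max(population, key=toy_fitness)
```

Because the elite is carried over unchanged each generation, the returned sequence scores at least as well as the best seed under the toy score. In a real screen the seeds would be the naturally occurring high-efficiency 5' UTRs, and the score would come from measured or predicted translation efficiency.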

Genetic drivers of m6A methylation in human brain, lung, heart and muscle.

Nat Genet 2021 Jul 1. Epub 2021 Jul 1.

Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.

The most prevalent post-transcriptional mRNA modification, N6-methyladenosine (m6A), plays diverse RNA-regulatory roles, but its genetic control in human tissues remains uncharted. Here we report 129 transcriptome-wide m6A profiles, covering 91 individuals and 4 tissues (brain, lung, muscle and heart) from GTEx/eGTEx. We integrate these with interindividual genetic and expression variation, revealing 8,843 tissue-specific and 469 tissue-shared m6A quantitative trait loci (QTLs), which are modestly enriched in, but mostly orthogonal to, expression QTLs. We integrate m6A QTLs with disease genetics, identifying 184 GWAS-colocalized m6A QTL, including brain m6A QTLs underlying neuroticism, depression, schizophrenia and anxiety; lung m6A QTLs underlying expiratory flow and asthma; and muscle/heart m6A QTLs underlying coronary artery disease. Last, we predict novel m6A regulators that show preferential binding in m6A QTLs, protein interactions with known m6A regulators and expression correlation with the m6A levels of their targets. Our results provide important insights and resources for understanding both cis and trans regulation of epitranscriptomic modifications, their interindividual variation and their roles in human disease.
DOI: http://dx.doi.org/10.1038/s41588-021-00890-3
July 2021

NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data.

Commun Biol 2021 May 26;4(1):629. Epub 2021 May 26.

Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.

The increasing availability of single-cell data revolutionizes the understanding of biological mechanisms at cellular resolution. For differential expression analysis in multi-subject single-cell data, negative binomial mixed models account for both subject-level and cell-level overdispersions, but are computationally demanding. Here, we propose an efficient NEgative Binomial mixed model Using a Large-sample Approximation (NEBULA). The speed gain is achieved by analytically solving high-dimensional integrals instead of using the Laplace approximation. We demonstrate that NEBULA is orders of magnitude faster than existing tools and controls false-positive errors in marker gene identification and co-expression analysis. Using NEBULA in Alzheimer's disease cohort data sets, we found that the cell-level expression of APOE correlated with that of other genetic risk factors (including CLU, CST3, TREM2, C1q, and ITM2B) in a cell-type-specific pattern and an isoform-dependent manner in microglia. NEBULA opens up a new avenue for the broad application of mixed models to large-scale multi-subject single-cell data.
DOI: http://dx.doi.org/10.1038/s42003-021-02146-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8155058
May 2021

SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes.

Nat Commun 2021 05 11;12(1):2642. Epub 2021 May 11.

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA.

Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely-functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution.
DOI: http://dx.doi.org/10.1038/s41467-021-22905-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8113528
May 2021

Evolution of delayed resistance to immunotherapy in a melanoma responder.

Nat Med 2021 Jun 3;27(6):985-992. Epub 2021 May 3.

Department of Pathology, Harvard Medical School, Brigham and Women's Hospital, Boston, MA, USA.

Despite initial responses, most melanoma patients develop resistance to immune checkpoint blockade (ICB). To understand the evolution of resistance, we studied 37 tumor samples over 9 years from a patient with metastatic melanoma with complete clinical response to ICB followed by delayed recurrence and death. Phylogenetic analysis revealed co-evolution of seven lineages with multiple convergent, but independent resistance-associated alterations. All recurrent tumors emerged from a lineage characterized by loss of chromosome 15q, with post-treatment clones acquiring additional genomic driver events. Deconvolution of bulk RNA sequencing and highly multiplexed immunofluorescence (t-CyCIF) revealed differences in immune composition among different lineages. Imaging revealed a vasculogenic mimicry phenotype in NGFR tumor cells with high PD-L1 expression in close proximity to immune cells. Rapid autopsy demonstrated two distinct NGFR spatial patterns with high polarity and proximity to immune cells in subcutaneous tumors versus a diffuse spatial pattern in lung tumors, suggesting different roles of this neural-crest-like program in different tumor microenvironments. Broadly, this study establishes a high-resolution map of the evolutionary dynamics of resistance to ICB, characterizes a de-differentiated neural-crest tumor population in melanoma immunotherapy resistance and describes site-specific differences in tumor-immune interactions via longitudinal analysis of a patient with melanoma with an unusual clinical course.
DOI: http://dx.doi.org/10.1038/s41591-021-01331-8
June 2021

Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution.

Virology 2021 06 17;558:145-151. Epub 2021 Mar 17.

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.

At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for these ORFs and their shorter isoforms, developed in consultation with the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. We recommend calling the 39 codon Spike-overlapping ORF ORF2b; the 41, 57, and 22 codon ORF3a-overlapping ORFs ORF3c, ORF3d, and ORF3b; the 33 codon ORF3d isoform ORF3d-2; and the 97 and 73 codon Nucleocapsid-overlapping ORFs ORF9b and ORF9c. Finally, we document conflicting usage of the name ORF3b in 32 studies, and consequent erroneous inferences, stressing the importance of reserving identical names for homologs. We recommend that authors referring to these ORFs provide lengths and coordinates to minimize ambiguity caused by prior usage of alternative names.
DOI: http://dx.doi.org/10.1016/j.virol.2021.02.013
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7967279
June 2021

APOE4 disrupts intracellular lipid homeostasis in human iPSC-derived glia.

Sci Transl Med 2021 03;13(583)

Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA.

The E4 allele of the apolipoprotein E gene (APOE) has been established as a genetic risk factor for many diseases including cardiovascular diseases and Alzheimer's disease (AD), yet its mechanism of action remains poorly understood. APOE is a lipid transport protein, and the dysregulation of lipids has recently emerged as a key feature of several neurodegenerative diseases including AD. However, it is unclear how APOE4 perturbs the intracellular lipid state. Here, we report that APOE4, but not APOE3, disrupted the cellular lipidomes of human induced pluripotent stem cell (iPSC)-derived astrocytes generated from fibroblasts of APOE4 or APOE3 carriers, and of yeast expressing human APOE isoforms. We combined lipidomics and unbiased genome-wide screens in yeast with functional and genetic characterization to demonstrate that human APOE4 induced altered lipid homeostasis. These changes resulted in increased unsaturation of fatty acids and accumulation of intracellular lipid droplets both in yeast and in APOE4-expressing human iPSC-derived astrocytes. We then identified genetic and chemical modulators of this lipid disruption. We showed that supplementation of the culture medium with choline (a soluble phospholipid precursor) restored the cellular lipidome to its basal state in APOE4-expressing human iPSC-derived astrocytes and in yeast expressing human APOE4. Our study illuminates key molecular disruptions in lipid metabolism that may contribute to the disease risk linked to the APOE4 genotype. Our study suggests that manipulating lipid metabolism could be a therapeutic approach to help alleviate the consequences of carrying the APOE4 allele.
DOI: http://dx.doi.org/10.1126/scitranslmed.aaz4564
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8218593
March 2021

Exome-wide age-of-onset analysis reveals exonic variants in ERN1 and SPPL2C associated with Alzheimer's disease.

Transl Psychiatry 2021 02 26;11(1):146. Epub 2021 Feb 26.

Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.

Despite recent discoveries in genome-wide association studies (GWAS) of genomic variants associated with Alzheimer's disease (AD), its underlying biological mechanisms are still elusive. The discovery of novel AD-associated genetic variants, particularly in coding regions and from APOE ε4 non-carriers, is critical for understanding the pathology of AD. In this study, we carried out an exome-wide association analysis of age-of-onset of AD with ~20,000 subjects and placed more emphasis on APOE ε4 non-carriers. Using Cox mixed-effects models, we find that age-of-onset shows a stronger genetic signal than AD case-control status, capturing many known variants with stronger significance, and also revealing new variants. We identified two novel variants, rs56201815, a rare synonymous variant in ERN1, and rs12373123, a common missense variant in SPPL2C in the MAPT region in APOE ε4 non-carriers. Besides, a rare missense variant rs144292455 in TACR3 showed the consistent direction of effect sizes across all studies with a suggestive significant level. In an attempt to unravel their regulatory and biological functions, we found that the minor allele of rs56201815 was associated with lower average FDG uptake across five brain regions in ADNI. Our eQTL analyses based on 6198 gene expression samples from ROSMAP and GTEx revealed that the minor allele of rs56201815 was potentially associated with elevated expression of ERN1, a key gene triggering unfolded protein response (UPR), in multiple brain regions, including the posterior cingulate cortex and nucleus accumbens. Our cell-type-specific eQTL analysis using ~80,000 single nuclei in the prefrontal cortex revealed that the protective minor allele of rs12373123 significantly increased the expression of GRN in microglia, and was associated with MAPT expression in astrocytes. 
These findings provide novel evidence supporting the hypothesis of the potential involvement of the UPR to ER stress in the pathological pathway of AD, and also give more insights into underlying regulatory mechanisms behind the pleiotropic effects of rs12373123 in multiple degenerative diseases including AD and Parkinson's disease.
DOI: http://dx.doi.org/10.1038/s41398-021-01263-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7910483
February 2021

Regulatory genomic circuitry of human disease loci by integrative epigenomics.

Nature 2021 02 3;590(7845):300-307. Epub 2021 Feb 3.

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.

Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits, predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits.
DOI: http://dx.doi.org/10.1038/s41586-020-03145-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875769
February 2021

Distinct metabolic programs established in the thymus control effector functions of γδ T cell subsets in tumor microenvironments.

Nat Immunol 2021 02 18;22(2):179-192. Epub 2021 Jan 18.

Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Metabolic programming controls immune cell lineages and functions, but little is known about γδ T cell metabolism. Here, we found that γδ T cell subsets making either interferon-γ (IFN-γ) or interleukin (IL)-17 have intrinsically distinct metabolic requirements. Whereas IFN-γ γδ T cells were almost exclusively dependent on glycolysis, IL-17 γδ T cells strongly engaged oxidative metabolism, with increased mitochondrial mass and activity. These distinct metabolic signatures were surprisingly imprinted early during thymic development and were stably maintained in the periphery and within tumors. Moreover, pro-tumoral IL-17 γδ T cells selectively showed high lipid uptake and intracellular lipid storage and were expanded in obesity and in tumors of obese mice. Conversely, glucose supplementation enhanced the antitumor functions of IFN-γ γδ T cells and reduced tumor growth upon adoptive transfer. These findings have important implications for the differentiation of effector γδ T cells and their manipulation in cancer immunotherapy.
DOI: http://dx.doi.org/10.1038/s41590-020-00848-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7610600
February 2021

Genomic RNA Elements Drive Phase Separation of the SARS-CoV-2 Nucleocapsid.

Mol Cell 2020 12 27;80(6):1078-1091.e6. Epub 2020 Nov 27.

Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

We report that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with viral RNA. N-protein condenses with specific RNA genomic elements under physiological buffer conditions and condensation is enhanced at human body temperatures (33°C and 37°C) and reduced at room temperature (22°C). RNA sequence and structure in specific genomic regions regulate N-protein condensation while other genomic regions promote condensate dissolution, potentially preventing aggregation of the large genome. At low concentrations, N-protein preferentially crosslinks to specific regions characterized by single-stranded RNA flanked by structured elements and these features specify the location, number, and strength of N-protein binding sites (valency). Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is RNA sequence and structure specific, sensitive to human body temperature, and manipulatable with small molecules, and therefore presents a screenable process for identifying antiviral compounds effective against SARS-CoV-2.
DOI: http://dx.doi.org/10.1016/j.molcel.2020.11.041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691212
December 2020

Conserved Epigenetic Regulatory Logic Infers Genes Governing Cell Identity.

Cell Syst 2020 12 4;11(6):625-639.e13. Epub 2020 Dec 4.

Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.

Determining genes that orchestrate cell differentiation in development and disease remains a fundamental goal of cell biology. This study establishes a genome-wide metric based on the gene-repressive trimethylation of histone H3 at lysine 27 (H3K27me3) across hundreds of diverse cell types to identify genetic regulators of cell differentiation. We introduce a computational method, TRIAGE, which uses discordance between gene-repressive tendency and expression to identify genetic drivers of cell identity. We apply TRIAGE to millions of genome-wide single-cell transcriptomes, diverse omics platforms, and eukaryotic cells and tissue types. Using a wide range of data, we validate the performance of TRIAGE in identifying cell-type-specific regulatory factors across diverse species including human, mouse, boar, bird, fish, and tunicate. Using CRISPR gene editing, we use TRIAGE to experimentally validate RNF220 as a regulator of Ciona cardiopharyngeal development and SIX3 as required for differentiation of endoderm in human pluripotent stem cells. A record of this paper's transparent peer review process is included in the Supplemental Information.
DOI: http://dx.doi.org/10.1016/j.cels.2020.11.001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781436
December 2020
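
The abstract above describes TRIAGE as scoring discordance between a gene's repressive tendency and its expression. One simple way to express that idea, assuming a precomputed per-gene repressive tendency weight between 0 and 1, is to weight log-scaled expression by that tendency, so that expressed genes which are usually repressed rise to the top. The function names, gene labels, and numbers below are made up for illustration and are not taken from the TRIAGE software or its data.

```python
import math


def discordance_scores(expression, repressive_tendency):
    """Weight log-scaled expression by a per-gene repressive tendency.

    `expression` maps gene -> non-negative expression value.
    `repressive_tendency` maps gene -> weight in [0, 1] (hypothetical input,
    e.g. derived from H3K27me3 profiles across many cell types).
    Genes without a weight are treated as having zero repressive tendency.
    """
    scores = {}
    for gene, value in expression.items():
        weight = repressive_tendency.get(gene, 0.0)
        scores[gene] = math.log2(value + 1.0) * weight
    return scores


def rank_genes(scores):
    """Return gene names ordered from highest to lowest discordance score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy, invented values: two highly expressed housekeeping-like genes and one
# modestly expressed gene that is repressed in most other cell types.
expression = {"housekeeping_a": 1000.0, "regulator_x": 20.0, "housekeeping_b": 800.0}
tendency = {"housekeeping_a": 0.01, "regulator_x": 0.9, "housekeeping_b": 0.02}

ranking = rank_genes(discordance_scores(expression, tendency))
print(ranking)  # ['regulator_x', 'housekeeping_b', 'housekeeping_a']
```

Ranking by raw expression alone would put the housekeeping-like genes first; the weighting moves the rarely expressed regulator to the top, which is the kind of prioritization the abstract describes.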

GENCODE 2021.

Nucleic Acids Res 2021 01;49(D1):D916-D923

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
DOI: http://dx.doi.org/10.1093/nar/gkaa1087
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778937
January 2021

Plasma-derived extracellular vesicle analysis and deconvolution enable prediction and tracking of melanoma checkpoint blockade outcome.

Sci Adv 2020 Nov 13;6(46). Epub 2020 Nov 13.

Broad Institute of Harvard and MIT, Cambridge, MA, USA.

Immune checkpoint inhibitors (ICIs) show promise, but most patients do not respond. We identify and validate biomarkers from extracellular vesicles (EVs), allowing non-invasive monitoring of tumor-intrinsic and host immune status, as well as a prediction of ICI response. We undertook transcriptomic profiling of plasma-derived EVs and tumors from 50 patients with metastatic melanoma receiving ICI, and validated with an independent EV-only cohort of 30 patients. Plasma-derived EV and tumor transcriptomes correlate. EV profiles reveal drivers of ICI resistance and melanoma progression, exhibit differentially expressed genes/pathways, and correlate with clinical response to ICI. We created a Bayesian probabilistic deconvolution model to estimate contributions from tumor and non-tumor sources, enabling interpretation of differentially expressed genes/pathways. EV RNA-seq mutations also segregated ICI response. EVs serve as a non-invasive biomarker to jointly probe tumor-intrinsic and immune changes to ICI, function as predictive markers of ICI responsiveness, and monitor tumor persistence and immune activation.
DOI: http://dx.doi.org/10.1126/sciadv.abb3461
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673759
November 2020

A multiresolution framework to characterize single-cell state landscapes.

Nat Commun 2020 10 26;11(1):5399. Epub 2020 Oct 26.

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, 02139, USA.

Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet's superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating against the transcriptomic cell types and subtypes of the human prefrontal cortex.
DOI: http://dx.doi.org/10.1038/s41467-020-18416-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7588427
October 2020

Evidence for secondary-variant genetic burden and non-random distribution across biological modules in a recessive ciliopathy.

Nat Genet 2020 11 12;52(11):1145-1150. Epub 2020 Oct 12.

Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA.

The influence of genetic background on driver mutations is well established; however, the mechanisms by which the background interacts with Mendelian loci remain unclear. We performed a systematic secondary-variant burden analysis of two independent cohorts of patients with Bardet-Biedl syndrome (BBS) with known recessive biallelic pathogenic mutations in one of 17 BBS genes for each individual. We observed a significant enrichment of trans-acting rare nonsynonymous secondary variants in patients with BBS compared with either population controls or a cohort of individuals with a non-BBS diagnosis and recessive variants in the same gene set. Strikingly, we found a significant over-representation of secondary alleles in chaperonin-encoding genes, a finding corroborated by the observation of epistatic interactions involving this complex in vivo. These data indicate a complex genetic architecture for BBS that informs the biological properties of disease modules and presents a model for secondary-variant burden analysis in recessive disorders.
DOI: http://dx.doi.org/10.1038/s41588-020-0707-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272915
November 2020

SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes.

Res Sq 2020 Oct 1. Epub 2020 Oct 1.

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA.

Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution.
DOI: http://dx.doi.org/10.21203/rs.3.rs-80345/v1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536840
October 2020

Mapping the epigenomic and transcriptomic interplay during memory formation and recall in the hippocampal engram ensemble.

Nat Neurosci 2020 12 5;23(12):1606-1617. Epub 2020 Oct 5.

Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA.

The epigenome and three-dimensional (3D) genomic architecture are emerging as key factors in the dynamic regulation of different transcriptional programs required for neuronal functions. In this study, we used an activity-dependent tagging system in mice to determine the epigenetic state, 3D genome architecture and transcriptional landscape of engram cells over the lifespan of memory formation and recall. Our findings reveal that memory encoding leads to an epigenetic priming event, marked by increased accessibility of enhancers without the corresponding transcriptional changes. Memory consolidation subsequently results in spatial reorganization of large chromatin segments and promoter-enhancer interactions. Finally, with reactivation, engram neurons use a subset of de novo long-range interactions, where primed enhancers are brought in contact with their respective promoters to upregulate genes involved in local protein translation in synaptic compartments. Collectively, our work elucidates the comprehensive transcriptional and epigenomic landscape across the lifespan of memory formation and recall in the hippocampal engram ensemble.
DOI: http://dx.doi.org/10.1038/s41593-020-00717-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686266
December 2020

Genus-Wide Characterization of Bumblebee Genomes Provides Insights into Their Evolution and Variation in Ecological and Behavioral Traits.

Mol Biol Evol 2021 01;38(2):486-501

Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China.

Bumblebees are a diverse group of globally important pollinators in natural ecosystems and for agricultural food production. With both eusocial and solitary life-cycle phases, and some social parasite species, they are especially interesting models to understand social evolution, behavior, and ecology. Reports of many species in decline point to pathogen transmission, habitat loss, pesticide usage, and global climate change, as interconnected causes. These threats to bumblebee diversity make our reliance on a handful of well-studied species for agricultural pollination particularly precarious. To broadly sample bumblebee genomic and phenotypic diversity, we de novo sequenced and assembled the genomes of 17 species, representing all 15 subgenera, producing the first genus-wide quantification of genetic and genomic variation potentially underlying key ecological and behavioral traits. The species phylogeny resolves subgenera relationships, whereas incomplete lineage sorting likely drives high levels of gene tree discordance. Five chromosome-level assemblies show a stable 18-chromosome karyotype, with major rearrangements creating 25 chromosomes in social parasites. Differential transposable element activity drives changes in genome sizes, with putative domestications of repetitive sequences influencing gene coding and regulatory potential. Dynamically evolving gene families and signatures of positive selection point to genus-wide variation in processes linked to foraging, diet and metabolism, immunity and detoxification, as well as adaptations for life at high altitudes. Our study reveals how bumblebee genes and genomes have evolved across the Bombus phylogeny and identifies variations potentially linked to key ecological and behavioral traits of these important pollinators.

Source
DOI: http://dx.doi.org/10.1093/molbev/msaa240
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7826183
January 2021

Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets.

Nat Commun 2020 09 16;11(1):4662. Epub 2020 Sep 16.

Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.

Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X's feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X's ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-18320-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7494856
September 2020

Expanded encyclopaedias of DNA elements in the human and mouse genomes.

Nature 2020 07 29;583(7818):699-710. Epub 2020 Jul 29.

Department of Biological Science, Florida State University, Tallahassee, FL, USA.

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2493-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7410828
July 2020

Translation Initiation Site Profiling Reveals Widespread Synthesis of Non-AUG-Initiated Protein Isoforms in Yeast.

Cell Syst 2020 08 24;11(2):145-160.e5. Epub 2020 Jul 24.

Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA.

Genomic analyses in budding yeast have helped define the foundational principles of eukaryotic gene expression. However, in the absence of empirical methods for defining coding regions, these analyses have historically excluded specific classes of possible coding regions, such as those initiating at non-AUG start codons. Here, we applied an experimental approach to globally annotate translation initiation sites in yeast and identified 149 genes with alternative N-terminally extended protein isoforms initiating from near-cognate codons upstream of annotated AUG start codons. These isoforms are produced in concert with canonical isoforms and translated with high specificity, resulting from initiation at only a small subset of possible start codons. The non-AUG initiation driving their production is enriched during meiosis and induced by low eIF5A, which is seen in this context. These findings reveal widespread production of non-canonical protein isoforms and unexpected complexity to the rules by which even a simple eukaryotic genome is decoded.

Source
DOI: http://dx.doi.org/10.1016/j.cels.2020.06.011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7508262
August 2020

Cell Type-Specific Transcriptomics Reveals that Mutant Huntingtin Leads to Mitochondrial RNA Release and Neuronal Innate Immune Activation.

Neuron 2020 09 17;107(5):891-908.e8. Epub 2020 Jul 17.

Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA; Picower Institute for Learning and Memory, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

The mechanisms by which mutant huntingtin (mHTT) leads to neuronal cell death in Huntington's disease (HD) are not fully understood. To gain new molecular insights, we used single nuclear RNA sequencing (snRNA-seq) and translating ribosome affinity purification (TRAP) to conduct transcriptomic analyses of caudate/putamen (striatal) cell type-specific gene expression changes in human HD and mouse models of HD. In striatal spiny projection neurons, the most vulnerable cell type in HD, we observe a release of mitochondrial RNA (mtRNA) (a potent mitochondrial-derived innate immunogen) and a concomitant upregulation of innate immune signaling in spiny projection neurons. Further, we observe that the released mtRNAs can directly bind to the innate immune sensor protein kinase R (PKR). We highlight the importance of studying cell type-specific gene expression dysregulation in HD pathogenesis and reveal that the activation of innate immune signaling in the most vulnerable HD neurons provides a novel framework to understand the basis of mHTT toxicity and raises new therapeutic opportunities.

Source
DOI: http://dx.doi.org/10.1016/j.neuron.2020.06.021
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7486278
September 2020

Sarbecovirus comparative genomics elucidates gene content of SARS-CoV-2 and functional impact of COVID-19 pandemic mutations.

bioRxiv 2020 Jun 3. Epub 2020 Jun 3.

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA.

Despite its overwhelming clinical importance for understanding and mitigating the COVID-19 pandemic, the protein-coding gene content of the SARS-CoV-2 genome remains unresolved, with the function and even protein-coding status of many hypothetical proteins unknown and often conflicting among different annotations, thus hindering efforts for systematic dissection of its biology and the impact of recent mutations. Comparative genomics is a powerful approach for distinguishing protein-coding versus non-coding functional elements, based on their characteristic patterns of change, which we previously used to annotate protein-coding genes in human, fly, and other species. Here, we use comparative genomics to provide a high-confidence set of SARS-CoV-2 protein-coding genes, to characterize their protein-level and nucleotide-level evolutionary constraint, and to interpret the functional implications for SARS-CoV-2 mutations acquired during the current pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances well-suited for protein-coding and non-coding element identification, create whole-genome alignments spanning all named and putative genes, and quantify their protein-coding evolutionary signatures using PhyloCSF and their overlapping constraint using FRESCo. We find strong protein-coding signatures for all named genes and for hypothetical ORFs 3a, 6, 7a, 7b, and 8, indicating protein-coding roles, and provide strong evidence of protein-coding status for a recently-proposed alternate-frame novel ORF within 3a. By contrast, ORF10 shows no protein-coding signatures but shows unusually-high nucleotide-level constraint, indicating it has important but non-coding functions, and ORF14 and SARS-CoV-1 ORF3b, which overlap other genes, lack evolutionary signatures expected for dual-coding regions, indicating they do not produce functional proteins. ORF9b has ambiguous protein-coding signatures, preventing us from resolving its protein-coding status. 
ORF8 shows extremely fast nucleotide-level evolution, lacks a known function, and was deactivated in SARS-CoV-1, but shows clear signatures indicating protein-coding function worthy of further investigation given its rapid evolution and potential role in replication. SARS-CoV-2 mutations are preferentially excluded from evolutionarily-constrained amino acid residues and synonymously-constrained nucleotides, indicating purifying constraint acting at both coding and non-coding levels. In contrast, we find a conserved region in the nucleocapsid that is enriched for recent mutations, which could indicate a selective signal, and find that several spike-protein mutations previously identified as candidates for increased transmission and several mutations in isolates found to generate higher viral load in-vitro disrupt otherwise-perfectly-conserved amino-acids, consistent with adaptations for human-to-human transmission.

Source
DOI: http://dx.doi.org/10.1101/2020.06.02.130955
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302193
June 2020

Reconstruction of the human blood-brain barrier in vitro reveals a pathogenic mechanism of APOE4 in pericytes.

Nat Med 2020 06 8;26(6):952-963. Epub 2020 Jun 8.

Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA.

In Alzheimer's disease, amyloid deposits along the brain vasculature lead to a condition known as cerebral amyloid angiopathy (CAA), which impairs blood-brain barrier (BBB) function and accelerates cognitive degeneration. Apolipoprotein E4 (APOE4) is the strongest risk factor for CAA, yet the mechanisms underlying this genetic susceptibility are unknown. Here we developed an induced pluripotent stem cell-based three-dimensional model that recapitulates anatomical and physiological properties of the human BBB in vitro. Similarly to CAA, our in vitro BBB displayed significantly more amyloid accumulation in APOE4 compared to APOE3. Combinatorial experiments revealed that dysregulation of calcineurin-nuclear factor of activated T cells (NFAT) signaling and APOE in pericyte-like mural cells induces APOE4-associated CAA pathology. In the human brain, APOE and NFAT are selectively dysregulated in pericytes of APOE4 carriers, and inhibition of calcineurin-NFAT signaling reduces APOE4-associated CAA pathology in vitro and in vivo. Our study reveals the role of pericytes in APOE4-mediated CAA and highlights calcineurin-NFAT signaling as a therapeutic target in CAA and Alzheimer's disease.

Source
DOI: http://dx.doi.org/10.1038/s41591-020-0886-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7704032
June 2020

Analysis of Genetically Regulated Gene Expression Identifies a Prefrontal PTSD Gene, SNRNP35, Specific to Military Cohorts.

Cell Rep 2020 06;31(9):107716

SAMRC Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry, University of Cape Town, Cape Town 7700, South Africa.

To reveal post-traumatic stress disorder (PTSD) genetic risk influences on tissue-specific gene expression, we use brain and non-brain transcriptomic imputation. We impute genetically regulated gene expression (GReX) in 29,539 PTSD cases and 166,145 controls from 70 ancestry-specific cohorts and identify 18 significant GReX-PTSD associations corresponding to specific tissue-gene pairs. The results suggest substantial genetic heterogeneity based on ancestry, cohort type (military versus civilian), and sex. Two study-wide significant PTSD associations are identified in European and military European cohorts; ZNF140 is predicted to be upregulated in whole blood, and SNRNP35 is predicted to be downregulated in dorsolateral prefrontal cortex, respectively. In peripheral leukocytes from 175 marines, the observed PTSD differential gene expression correlates with the predicted differences for these individuals, and deployment stress produces glucocorticoid-regulated expression changes that include downregulation of both ZNF140 and SNRNP35. SNRNP35 knockdown in cells validates its functional role in U12-intron splicing. Finally, exogenous glucocorticoids in mice downregulate prefrontal Snrnp35 expression.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.107716
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7359754
June 2020

Alternatives to amyloid for Alzheimer's disease therapies-a symposium report.

Ann N Y Acad Sci 2020 09 29;1475(1):3-14. Epub 2020 May 29.

Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.

For decades, Alzheimer's disease research has focused on amyloid as the primary pathogenic agent. This focus has driven the development of numerous amyloid-targeting therapies; however, with one possible exception, none of these therapies have been effective in preventing or delaying cognitive decline in patients, and there are no approved disease-modifying agents. It is becoming more apparent that alternative drug targets are needed to address this complex disease. An increased understanding of Alzheimer's disease pathology has highlighted the need to target the appropriate disease pathology at the appropriate time in the disease course. Preclinical and early clinical studies have focused on targets, including inflammation, tau, vascular health, and the microbiome. This report summarizes the presentations from a New York Academy of Sciences' one-day symposium entitled "Alzheimer's Disease Therapeutics: Alternatives to Amyloid," held on November 20, 2019.

Source
DOI: http://dx.doi.org/10.1111/nyas.14371
September 2020