Publications by authors named "Martin Hemberg"

58 Publications

Fast searches of large collections of single-cell data using scfind.

Nat Methods 2021 03 1;18(3):262-271. Epub 2021 Mar 1.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Single-cell technologies have made it possible to profile millions of cells, but for these resources to be useful they must be easy to query and access. To facilitate interactive and intuitive access to single-cell data we have developed scfind, a single-cell analysis tool that facilitates fast search of biologically or clinically relevant marker genes in cell atlases. Using transcriptome data from six mouse cell atlases, we show how scfind can be used to evaluate marker genes, perform in silico gating, and identify both cell-type-specific and housekeeping genes. Moreover, we have developed a subquery optimization routine to ensure that long and complex queries return meaningful results. To make scfind more user friendly, we use indices of PubMed abstracts and techniques from natural language processing to allow for arbitrary queries. Finally, we show how scfind can be used for multi-omics analyses by combining single-cell ATAC-seq data with transcriptome data.

Source
DOI: http://dx.doi.org/10.1038/s41592-021-01076-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7116898
March 2021

Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench.

Nucleic Acids Res 2021 Feb 1. Epub 2021 Feb 1.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here, we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlight their methodological differences, and assess their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

Source
DOI: http://dx.doi.org/10.1093/nar/gkab004
February 2021

MicroExonator enables systematic discovery and quantification of microexons across mouse embryonic development.

Genome Biol 2021 Jan 22;22(1):43. Epub 2021 Jan 22.

Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.

Background: Microexons, exons that are ≤ 30 nucleotides, are a highly conserved and dynamically regulated set of cassette exons. They have key roles in nervous system development and function, as evidenced by recent results demonstrating the impact of microexons on behaviour and cognition. However, microexons are often overlooked due to the difficulty of detecting them using standard RNA-seq aligners.

Results: Here, we present MicroExonator, a novel pipeline for reproducible de novo discovery and quantification of microexons. We process 289 RNA-seq datasets from eighteen mouse tissues corresponding to nine embryonic and postnatal stages, providing the most comprehensive survey of microexons available for mice. We detect 2984 microexons, 332 of which are differentially spliced throughout mouse embryonic brain development, including 29 that are not present in mouse transcript annotation databases. Unsupervised clustering of microexons based on their inclusion patterns segregates brain tissues by developmental time, and further analysis suggests a key function for microexons in axon growth and synapse formation. Finally, we analyse single-cell RNA-seq data from the mouse visual cortex, and for the first time, we report differential inclusion between neuronal subpopulations, suggesting that some microexons could be cell type-specific.

Conclusions: MicroExonator facilitates the investigation of microexons in transcriptome studies, particularly when analysing large volumes of data. As a proof of principle, we use MicroExonator to analyse a large collection of both mouse bulk and single-cell RNA-seq datasets. The analyses enabled the discovery of previously uncharacterized microexons, and our study provides a comprehensive microexon inclusion catalogue during mouse development.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02246-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7821500
January 2021

Computational Stem Cell Biology: Open Questions and Guiding Principles.

Cell Stem Cell 2021 Jan;28(1):20-32

Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia.

Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area.

Source
DOI: http://dx.doi.org/10.1016/j.stem.2020.12.012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7799393
January 2021

Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data.

Nat Protoc 2021 01 7;16(1):1-9. Epub 2020 Dec 7.

Wellcome Sanger Institute, Hinxton, UK.

Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology that allows you to profile the whole transcriptome of a large number of individual cells. However, the analysis of the large volumes of data generated from these experiments requires specialized statistical and computational methods. Here we present an overview of the computational workflow involved in processing scRNA-seq data. We discuss some of the most common tasks and the tools available for addressing central biological questions. In this article and our companion website (https://scrnaseq-course.cog.sanger.ac.uk/website/index.html), we provide guidelines regarding best practices for performing computational analyses. This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.

Source
DOI: http://dx.doi.org/10.1038/s41596-020-00409-w
January 2021

Asymmetron: a toolkit for the identification of strand asymmetry patterns in biological sequences.

Nucleic Acids Res 2021 01;49(1):e4

Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.

DNA strand asymmetries can have a major effect on several biological functions, including replication, transcription and transcription factor binding. As such, DNA strand asymmetries and mutational strand bias can provide information about biological function. However, a versatile tool to explore this does not exist. Here, we present Asymmetron, a user-friendly computational tool that performs statistical analysis and visualizations for the evaluation of strand asymmetries. Asymmetron takes as input DNA features provided with strand annotation and outputs strand asymmetries for consecutive occurrences of a single DNA feature or between pairs of features. We illustrate the use of Asymmetron by identifying transcriptional and replicative strand asymmetries of germline structural variant breakpoints. We also show that the orientation of the binding sites of 45% of the human transcription factors analyzed has a significant DNA strand bias in transcribed regions, which is also corroborated in ChIP-seq analyses and is likely associated with transcription. In summary, we provide a novel tool to assess DNA strand asymmetries and show how it can be used to derive new insights across a variety of biological disciplines.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7797064
January 2021
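
The strand-asymmetry idea in the Asymmetron abstract above can be illustrated with a minimal sketch. This is not Asymmetron's implementation or API: the function names, the choice of an exact two-sided binomial test against a 50/50 expectation, and the example data are all assumptions made for illustration. The sketch counts how often consecutive occurrences of a stranded feature share the same orientation and asks whether that count departs from chance.

```python
from math import comb


def consecutive_orientation_counts(strands):
    """Count same- vs opposite-strand pairs among consecutive features.

    `strands` is a sequence of '+'/'-' symbols ordered by genomic position.
    (Hypothetical helper, not part of any published tool.)
    """
    if len(strands) < 2:
        return 0, 0
    same = sum(1 for a, b in zip(strands, strands[1:]) if a == b)
    opposite = len(strands) - 1 - same
    return same, opposite


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical strand annotations for one feature type along a chromosome.
strands = list("++++-+++--++++++")
same, opposite = consecutive_orientation_counts(strands)
p_value = binomial_two_sided_p(same, same + opposite)
print(f"same={same} opposite={opposite} p={p_value:.3f}")
```

A real analysis would also need to handle chromosome boundaries, distance windows between features, and multiple-testing correction, none of which this sketch attempts.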

Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes.

Nat Methods 2020 06 4;17(6):615-620. Epub 2020 May 4.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.

Source
DOI: http://dx.doi.org/10.1038/s41592-020-0820-1
June 2020

Transcription-coupled repair and mismatch repair contribute towards preserving genome integrity at mononucleotide repeat tracts.

Nat Commun 2020 04 24;11(1):1980. Epub 2020 Apr 24.

Academic Department of Medical Genetics, The Clinical School, University of Cambridge, Cambridge, CB2 0QQ, UK.

The mechanisms that underpin how insertions or deletions (indels) become fixed in DNA have primarily been ascribed to replication-related and/or double-strand break (DSB)-related processes. Here, we introduce a method to evaluate indels, orientating them relative to gene transcription. In so doing, we reveal a number of surprising findings: First, there is a transcriptional strand asymmetry in the distribution of mononucleotide repeat tracts in the reference human genome. Second, there is a strong transcriptional strand asymmetry of indels across 2,575 whole genome sequenced human cancers. We suggest that this is due to the activity of transcription-coupled nucleotide excision repair (TC-NER). Furthermore, TC-NER interacts with mismatch repair (MMR) under physiological conditions to produce strand bias. Finally, we show how insertions and deletions differ in their dependencies on these repair pathways. Our analytical approach reveals insights into the contribution of DNA repair towards indel mutagenesis in human cells.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-15901-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7181645
April 2020

Obstacles to detecting isoforms using full-length scRNA-seq data.

Genome Biol 2020 03 23;21(1):74. Epub 2020 Mar 23.

Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.

Background: Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk RNA-seq samples. However, these studies generally did not consider the impact of dropouts or isoform quantification errors, potentially confounding the results of these analyses.

Results: In this study, we take a simulation based approach in which we explicitly account for dropouts and isoform quantification errors. We use our simulations to ask to what extent it is possible to study alternative splicing using scRNA-seq. Additionally, we ask what limitations must be overcome to make splicing analysis feasible. We find that the high rate of dropouts associated with scRNA-seq is a major obstacle to studying alternative splicing. In mice and other well-established model organisms, the relatively low rate of isoform quantification errors poses a lesser obstacle to splicing analysis. We find that different models of isoform choice meaningfully change our simulation results.

Conclusions: To accurately study alternative splicing with single-cell RNA-seq, a better understanding of isoform choice and the errors associated with scRNA-seq is required. An increase in the capture efficiency of scRNA-seq would also be beneficial. Until some or all of the above are achieved, we do not recommend attempting to resolve isoforms in individual cells using scRNA-seq.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01981-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087381
March 2020

Astrocyte layers in the mammalian cerebral cortex revealed by a single-cell in situ transcriptomic map.

Nat Neurosci 2020 04 16;23(4):500-509. Epub 2020 Mar 16.

Department of Paediatrics, Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.

Although the cerebral cortex is organized into six excitatory neuronal layers, it is unclear whether glial cells show distinct layering. In the present study, we developed a high-content pipeline, the large-area spatial transcriptomic (LaST) map, which can quantify single-cell gene expression in situ. Screening 46 candidate genes for astrocyte diversity across the mouse cortex, we identified superficial, mid and deep astrocyte identities in gradient layer patterns that were distinct from those of neurons. Astrocyte layer features, established in the early postnatal cortex, mostly persisted in adult mouse and human cortex. Single-cell RNA sequencing and spatial reconstruction analysis further confirmed the presence of astrocyte layers in the adult cortex. Satb2 and Reeler mutations that shifted neuronal post-mitotic development were sufficient to alter glial layering, indicating an instructive role for neuronal cues. Finally, astrocyte layer patterns diverged between mouse cortical regions. These findings indicate that excitatory neurons and astrocytes are organized into distinct lineage-associated laminae.

Source
DOI: http://dx.doi.org/10.1038/s41593-020-0602-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7116562
April 2020

Adult Human Glioblastomas Harbor Radial Glia-like Cells.

Stem Cell Reports 2020 02 30;14(2):338-350. Epub 2020 Jan 30.

Department of Neurosurgery, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.

Radial glia (RG) cells are the first neural stem cells to appear during embryonic development. Adult human glioblastomas harbor a subpopulation of RG-like cells with typical RG morphology and markers. The cells exhibit the classic and unique mitotic behavior of normal RG in a cell-autonomous manner. Single-cell RNA sequencing analyses of glioblastoma cells reveal transcriptionally dynamic clusters of RG-like cells that share the profiles of normal human fetal radial glia and that reside in quiescent and cycling states. Functional assays show a role for interleukin in triggering exit from dormancy into active cycling, suggesting a role for inflammation in tumor progression. These data are consistent with the possibility of persistence of RG into adulthood and their involvement in tumor initiation or maintenance. They also provide a putative cellular basis for the persistence of normal developmental programs in adult tumors.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2020.01.007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014025
February 2020

Supervised clustering for single-cell analysis.

Nat Methods 2019 10;16(10):965-966

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Source
DOI: http://dx.doi.org/10.1038/s41592-019-0534-4
October 2019

The Malaria Cell Atlas: Single parasite transcriptomes across the complete life cycle.

Science 2019 08;365(6455)

Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.

Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. We profiled the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire life cycle. We then used our atlas to precisely define developmental stages of single cells from three different human malaria parasite species, including parasites isolated directly from infected individuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in a eukaryotic parasite and an open-access reference dataset for the study of malaria parasites.

Source
DOI: http://dx.doi.org/10.1126/science.aaw2619
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7056351
August 2019

False signals induced by single-cell imputation.

F1000Res 2018 2;7:1740. Epub 2018 Nov 2.

Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of these data is the large number of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and are limited by the information contained therein and the validity of their assumptions. We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type-specific markers were reproducible across datasets derived from the same tissue before and after imputation. The extent of false positives introduced by imputation varied considerably by method. Data-smoothing-based methods (MAGIC, knn-smooth and dca) generated many false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on the diversity of cell types in the sample. All imputation methods decreased the reproducibility of cell-type-specific markers, although this could be mitigated by selecting markers with large effect size and significance. Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results and thus should be favoured over alternatives if imputation is necessary.

Source
DOI: http://dx.doi.org/10.12688/f1000research.16613.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6415334
September 2019

Publisher Correction: Challenges in unsupervised clustering of single-cell RNA-seq data.

Nat Rev Genet 2019 05;20(5):310

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

During typesetting of this article, errors were inadvertently introduced to the hyperlinked URLs of some of the clustering tools in table 1 (Seurat, CIDR, pcaReduce and mpath), as well as to the numbering of the bold-text annotations in the reference list. The article has now been corrected online. The editors apologize for this error.

Source
DOI: http://dx.doi.org/10.1038/s41576-019-0095-5
May 2019

Challenges in unsupervised clustering of single-cell RNA-seq data.

Nat Rev Genet 2019 05;20(5):273-282

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.

Source
DOI: http://dx.doi.org/10.1038/s41576-018-0088-9
May 2019

M3Drop: dropout-based feature selection for scRNASeq.

Bioinformatics 2019 08;35(16):2865-2867

Department of Cellular Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

Motivation: Most genomes contain thousands of genes, but for most functional responses, only a subset of those genes are relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise.

Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets.

Availability and Implementation: M3Drop is freely available on GitHub as an R package and is compatible with other popular scRNASeq tools: https://github.com/tallulandrews/M3Drop.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty1044
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6691329
August 2019
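
To make the idea of dropout-based feature selection in the M3Drop abstract above more concrete, here is a minimal Python sketch. It is not the M3Drop R package or its API. It assumes a Michaelis-Menten-style relationship between a gene's mean expression S and its expected dropout rate, K / (K + S), and it estimates K crudely as a median of per-gene values rather than by a proper fit. The function name and the toy data are hypothetical. Genes whose observed dropout rate exceeds the expected rate are flagged as candidate features.

```python
from statistics import mean, median


def dropout_excess(counts):
    """Return observed-minus-expected dropout rate per gene.

    `counts` maps gene name -> list of expression values (one per cell).
    Expected dropout is modelled as K / (K + S), an assumed
    Michaelis-Menten-style curve; K is a crude median estimate.
    """
    stats = {}
    for gene, values in counts.items():
        s = mean(values)                                   # mean expression
        d = sum(1 for v in values if v == 0) / len(values)  # dropout rate
        stats[gene] = (s, d)

    # Per-gene K implied by d = K / (K + S), using informative genes only.
    ks = [d * s / (1 - d) for s, d in stats.values() if 0 < d < 1]
    if not ks:
        raise ValueError("no gene has a dropout rate strictly between 0 and 1")
    k = median(ks)

    # Positive values mean more zeros than the curve predicts.
    return {g: d - k / (k + s) for g, (s, d) in stats.items() if s > 0}


# Hypothetical toy matrix: four genes measured in four cells.
toy = {
    "geneA": [1, 1, 1, 1],
    "geneB": [0, 2, 0, 2],
    "geneC": [0, 0, 0, 8],
    "geneD": [0, 1, 1, 2],
}
excess = dropout_excess(toy)
ranked = sorted(excess, key=excess.get, reverse=True)
print(ranked)
```

A full method would fit the curve properly and attach a significance test to each gene; the ranking here only illustrates the direction of the signal.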

Simulation-based benchmarking of isoform quantification in single-cell RNA-seq.

Genome Biol 2018 11 7;19(1):191. Epub 2018 Nov 7.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

Single-cell RNA-seq has the potential to facilitate isoform quantification as the confounding factor of a mixed population of cells is eliminated. However, best practice for using existing quantification methods has not been established. We carry out a benchmark for five popular isoform quantification tools. Performance is generally good for simulated data based on SMARTer and SMART-seq2 data. The reduction in performance compared with bulk RNA-seq is small. An important biological insight comes from our analysis of real data which shows that genes that express two isoforms in bulk RNA-seq predominantly express one or neither isoform in individual cells.

Source
DOI: http://dx.doi.org/10.1186/s13059-018-1571-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6223048
November 2018

Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis.

Genome Res 2018 09 13;28(9):1264-1271. Epub 2018 Aug 13.

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom.

Somatic mutations show variation in density across cancer genomes. Previous studies have shown that chromatin organization and replication time domains are correlated with, and thus predictive of, this variation. Here, we analyze 1809 whole-genome sequences from 10 cancer types to show that a subset of repetitive DNA sequences, called non-B motifs, that predict noncanonical secondary structure formation can independently account for variation in mutation density. Combined with epigenetic factors and replication timing, the variance explained can be improved to 43%-76%. Approximately twofold mutation enrichment is observed directly within non-B motifs, is focused on exposed structural components, and is dependent on physical properties that are optimal for secondary structure formation. Therefore, there is mounting evidence that secondary structures arising from non-B motifs are not simply associated with increased mutation density; they are possibly causally implicated. Our results suggest that they are determinants of mutagenesis and increase the likelihood of recurrent mutations in the genome. This analysis calls for caution in the interpretation of recurrent mutations and highlights the importance of taking non-B motifs, which can simply be inferred from the reference sequence, into consideration in background models of mutability.

Source
DOI: http://dx.doi.org/10.1101/gr.231688.117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120622
September 2018

Single-cell genomics.

Authors:
Martin Hemberg

Brief Funct Genomics 2018 07;17(4):207-208

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Source
DOI: http://dx.doi.org/10.1093/bfgp/ely025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063269
July 2018

scmap: projection of single-cell RNA-seq data across data sets.

Nat Methods 2018 05 2;15(5):359-362. Epub 2018 Apr 2.

Wellcome Sanger Institute, Hinxton, UK.

Single-cell RNA-seq (scRNA-seq) allows researchers to define cell types on the basis of unsupervised clustering of the transcriptome. However, differences in experimental methods and computational analyses make it challenging to compare data across experiments. Here we present scmap (http://bioconductor.org/packages/scmap; web version at http://www.sanger.ac.uk/science/tools/scmap), a method for projecting cells from an scRNA-seq data set onto cell types or individual cells from other experiments.

Source
DOI: http://dx.doi.org/10.1038/nmeth.4644
May 2018

Single-cell transcriptomics reveals a new dynamical function of transcription factors during embryonic hematopoiesis.

Elife 2018 03 20;7. Epub 2018 Mar 20.

European Molecular Biology Laboratory, EMBL Rome, Monterotondo, Italy.

Recent advances in single-cell transcriptomics techniques have opened the door to the study of gene regulatory networks (GRNs) at the single-cell level. Here, we studied the GRNs controlling the emergence of hematopoietic stem and progenitor cells from mouse embryonic endothelium using a combination of single-cell transcriptome assays. We found that a heptad of transcription factors (Runx1, Gata2, Tal1, Fli1, Lyl1, Erg and Lmo2) is specifically co-expressed in an intermediate population expressing both endothelial and hematopoietic markers. Within the heptad, we identified two sets of factors of opposing functions: one (Erg/Fli1) promoting the endothelial cell fate, the other (Runx1/Gata2) promoting the hematopoietic fate. Surprisingly, our data suggest that even though Fli1 initially supports the endothelial cell fate, it acquires a pro-hematopoietic role when co-expressed with Runx1. This work demonstrates the power of single-cell RNA-sequencing for characterizing complex transcription factor dynamics.

Source
DOI: http://dx.doi.org/10.7554/eLife.29312
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860872
March 2018

Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

Genome Biol 2018 03 15;19(1):32. Epub 2018 Mar 15.

The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

Background: The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality.

Results: We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers.

Conclusions: This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

Source
DOI: http://dx.doi.org/10.1186/s13059-018-1405-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5853149
March 2018

The Human Cell Atlas.

Elife 2017 12 5;6. Epub 2017 Dec 5.

Ragon Institute of MGH, MIT and Harvard, Cambridge, United States.

The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.

Source
DOI: http://dx.doi.org/10.7554/eLife.27041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5762154
December 2017

Temporal Tracking of Microglia Activation in Neurodegeneration at Single-Cell Resolution.

Cell Rep 2017 Oct;21(2):366-380

Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

Microglia, the tissue-resident macrophages in the brain, are damage sensors that react to nearly any perturbation, including neurodegenerative diseases such as Alzheimer's disease (AD). Here, using single-cell RNA sequencing, we determined the transcriptome of more than 1,600 individual microglia cells isolated from the hippocampus of a mouse model of severe neurodegeneration with AD-like phenotypes and of control mice at multiple time points during progression of neurodegeneration. In this neurodegeneration model, we discovered two molecularly distinct reactive microglia phenotypes that are typified by modules of co-regulated type I and type II interferon response genes, respectively. Furthermore, our work identified previously unobserved heterogeneity in the response of microglia to neurodegeneration, discovered disease stage-specific microglia cell states, revealed the trajectory of cellular reprogramming of microglia in response to neurodegeneration, and uncovered the underlying transcriptional programs.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2017.09.039
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5642107
October 2017

The Helicase Aquarius/EMB-4 Is Required to Overcome Intronic Barriers to Allow Nuclear RNAi Pathways to Heritably Silence Transcription.

Dev Cell 2017 08;42(3):241-255.e6

Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

Small RNAs play a crucial role in genome defense against transposable elements and guide Argonaute proteins to nascent RNA transcripts to induce co-transcriptional gene silencing. However, the molecular basis of this process remains unknown. Here, we identify the conserved RNA helicase Aquarius/EMB-4 as a direct and essential link between small RNA pathways and the transcriptional machinery in Caenorhabditis elegans. Aquarius physically interacts with the germline Argonaute HRDE-1. Aquarius is required to initiate small-RNA-induced heritable gene silencing. HRDE-1 and Aquarius silence overlapping sets of genes and transposable elements. Surprisingly, removal of introns from a target gene abolishes the requirement for Aquarius, but not HRDE-1, for small RNA-dependent gene silencing. We conclude that Aquarius allows small RNA pathways to compete for access to nascent transcripts undergoing co-transcriptional splicing in order to detect and silence transposable elements. Thus, Aquarius and HRDE-1 act as gatekeepers coordinating gene expression and genome defense.

Source
DOI: http://dx.doi.org/10.1016/j.devcel.2017.07.002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5554785
August 2017

Identifying cell populations with scRNASeq.

Mol Aspects Med 2018 02 25;59:114-122. Epub 2017 Jul 25.

Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

Single-cell RNASeq (scRNASeq) has emerged as a powerful method for quantifying the transcriptome of individual cells. However, the data from scRNASeq experiments is often both noisy and high dimensional, making the computational analysis non-trivial. Here we provide an overview of different experimental protocols and the most popular methods for facilitating the computational analysis. We focus on approaches for identifying biologically important genes, projecting data into lower dimensions and clustering data into putative cell-populations. Finally we discuss approaches to validation and biological interpretation of the identified cell-types or cell-states.

Source
DOI: http://dx.doi.org/10.1016/j.mam.2017.07.002
February 2018

Proliferation Drives Aging-Related Functional Decline in a Subpopulation of the Hematopoietic Stem Cell Compartment.

Cell Rep 2017 05;19(8):1503-1511

Cambridge Institute for Medical Research, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK; Department of Haematology, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK; Stem Cell Institute, University of Cambridge, Cambridge, Cambridgeshire CB2 0XY, UK.

Aging of the hematopoietic stem cell (HSC) compartment is characterized by lineage bias and reduced stem cell function, the molecular basis of which is largely unknown. Using single-cell transcriptomics, we identified a distinct subpopulation of old HSCs carrying a p53 signature indicative of stem cell decline alongside pro-proliferative JAK/STAT signaling. To investigate the relationship between JAK/STAT and p53 signaling, we challenged HSCs with a constitutively active form of JAK2 (V617F) and observed an expansion of the p53-positive subpopulation in old mice. Our results reveal cellular heterogeneity in the onset of HSC aging and implicate a role for JAK2V617F-driven proliferation in the p53-mediated functional decline of old HSCs.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2017.04.074
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5457484
May 2017

SC3: consensus clustering of single-cell RNA-seq data.

Nat Methods 2017 May 27;14(5):483-486. Epub 2017 Mar 27.

Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.
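
The consensus idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SC3 implementation or its API: the function names, the toy label vectors, and the final grouping step (connected components over a 0.5 threshold) are all assumptions chosen for brevity. The sketch only shows how several clustering solutions for the same cells can be merged through a pairwise co-assignment (consensus) matrix.

```python
def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of cells shares a label.

    `labelings` is a list of label vectors, one per clustering run,
    all of the same length (one label per cell). Hypothetical helper.
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group cells connected by consensus values above `threshold`.

    Simplifying assumption: connected components stand in for the
    clustering step that would normally be applied to the matrix.
    """
    n = len(matrix)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and matrix[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Three toy clustering runs over six cells. The second run uses
# permuted label names and places cell 2 in the other group.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
matrix = consensus_matrix(labelings)
clusters = consensus_clusters(matrix)
```

Because the matrix records only whether two cells were grouped together, the individual runs may use different label names, and a single disagreeing run (here, the placement of cell 2) is outvoted by the others.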

Source
DOI: http://dx.doi.org/10.1038/nmeth.4236
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5410170
May 2017