Publications by authors named "Jessica Severin"

39 Publications

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network.

Nat Commun 2021 06 2;12(1):3297. Epub 2021 Jun 2.

Institut de Biologie Computationnelle, Montpellier, France.

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-23143-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8172540
June 2021

The Transcriptional Network That Controls Growth Arrest and Macrophage Differentiation in the Human Myeloid Leukemia Cell Line THP-1.

Front Cell Dev Biol 2020 3;8:498. Epub 2020 Jul 3.

The Roslin Institute, The University of Edinburgh, Edinburgh, United Kingdom.

The response of the human acute myeloid leukemia cell line THP-1 to phorbol esters has been widely studied to test candidate leukemia therapies and as a model of cell cycle arrest and monocyte-macrophage differentiation. Here we have employed Cap Analysis of Gene Expression (CAGE) to analyze a dense time course of transcriptional regulation in THP-1 cells treated with phorbol myristate acetate (PMA) over 96 h. PMA treatment greatly reduced the numbers of cells entering S phase and also blocked cells exiting G2/M. The PMA-treated cells became adherent and expression of mature macrophage-specific genes increased progressively over the duration of the time course. Within 1-2 h PMA induced known targets of tumor protein p53 (TP53), notably , followed by gradual down-regulation of cell-cycle associated genes. Also within the first 2 h, PMA induced immediate early genes including transcription factor genes encoding proteins implicated in macrophage differentiation () and down-regulated genes for transcription factors involved in immature myeloid cell proliferation (). The dense time course revealed that the response to PMA was not linear and progressive. Rather, network-based clustering of the time course data highlighted a sequential cascade of transient up- and down-regulated expression of genes encoding feedback regulators, as well as transcription factors associated with macrophage differentiation and their inferred target genes. CAGE also identified known and candidate novel enhancers expressed in THP-1 cells and many novel inducible genes that currently lack functional annotation and/or had no previously known function in macrophages. The time course is available on the ZENBU platform allowing comparison to FANTOM4 and FANTOM5 data.

Source
DOI: http://dx.doi.org/10.3389/fcell.2020.00498
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7347797
July 2020

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Genome Res 2020 07 27;30(7):1060-1072. Epub 2020 Jul 27.

Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for and .

Source
DOI: http://dx.doi.org/10.1101/gr.254219.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397864
July 2020

Comparative transcriptomics of primary cells in vertebrates.

Genome Res 2020 07 27;30(7):951-961. Epub 2020 Jul 27.

RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan.

Gene expression profiles in homologous tissues have been observed to be different between species, which may be due to differences between species in the gene expression program in each cell type, but may also reflect differences in cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionarily old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.

Source
DOI: http://dx.doi.org/10.1101/gr.255679.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397866
July 2020

What Is the Optimal Primary Care Panel Size?: A Systematic Review.

Ann Intern Med 2020 02 21;172(3):195-201. Epub 2020 Jan 21.

West Los Angeles Veterans Affairs Medical Center, Los Angeles, California (E.A.A., S.M., I.M.M., M.M.B., J.M.S., P.G.S.).

Background: Primary care for a panel of patients is a central component of population health, but the optimal panel size is unclear.

Purpose: To review evidence about the association of primary care panel size with health care outcomes and provider burnout.

Data Sources: English-language searches of multiple databases from inception to October 2019 and Google searches performed in September 2019.

Study Selection: English-language studies of any design, including simulation models, that assessed the association between primary care panel size and safety, efficacy, patient-centeredness, timeliness, efficiency, equity, or provider burnout.

Data Extraction: Independent, dual-reviewer extraction; group consensus rating of certainty of evidence.

Data Synthesis: Sixteen hypothesis-testing studies and 12 simulation modeling studies met inclusion criteria. All but 1 hypothesis-testing study were cross-sectional assessments of association. Three studies each provided low-certainty evidence that increasing panel size was associated with no or modestly adverse effects on patient-centered and effective care. Eight studies provided low-certainty evidence that increasing panel size was associated with variable effects on timely care. No studies assessed the effect of panel size on safety, efficiency, or equity. One study provided very-low-certainty evidence of an association between increased panel size and provider burnout. The 12 simulation studies evaluated 5 models; all used access as the only outcome of care. Five and 2 studies, respectively, provided moderate-certainty evidence that adjusting panel size for case mix and adding clinical conditions to the case mix resulted in better access.

Limitation: No studies had concurrent comparison groups, and published and unpublished studies may have been missed.

Conclusion: Evidence is insufficient to make evidence-based recommendations about the optimal primary care panel size for achieving beneficial health outcomes.

Primary Funding Source: Veterans Affairs Quality Enhancement Research Initiative.

Source
DOI: http://dx.doi.org/10.7326/M19-2491
February 2020

C1 CAGE detects transcription start sites and enhancer activity at single-cell resolution.

Nat Commun 2019 01 21;10(1):360. Epub 2019 Jan 21.

RIKEN Center for Integrative Medical Sciences (IMS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.

Single-cell transcriptomic profiling is a powerful tool to explore cellular heterogeneity. However, most of these methods focus on the 3'-end of polyadenylated transcripts and provide only a partial view of the transcriptome. We introduce C1 CAGE, a method for the detection of transcript 5'-ends with an original sample multiplexing strategy in the C1 microfluidic system. We first quantify the performance of C1 CAGE and find it as accurate and sensitive as other methods in the C1 system. We then use it to profile promoter and enhancer activities in the cellular response to TGF-β of lung cancer cells and discover subpopulations of cells differing in their response. We also describe enhancer RNA dynamics revealing transcriptional bursts in subsets of cells with transcripts arising from either strand in a mutually exclusive manner, validated using single molecule fluorescence in situ hybridization.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-08126-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6341120
January 2019

Author Correction: Transcription start site profiling of 15 anatomical regions of the Macaca mulatta central nervous system.

Sci Data 2018 12 11;5(1). Epub 2018 Dec 11.

German Center for Neurodegenerative Diseases, Otfried-Müller Straße 23, Tübingen, 72076, Germany.

The authors regret that Luba M. Pardo was omitted in error from the author list of the original version of this Data Descriptor. This omission has now been corrected in the HTML and PDF versions. The authors also regret that Anemieke Rozemuller was omitted in error from the Acknowledgements of the original version of this Data Descriptor. This omission has now been corrected in the HTML and PDF versions.

Source
DOI: http://dx.doi.org/10.1038/s41597-018-0003-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6300047
December 2018

Update of the FANTOM web resource: expansion to provide additional transcriptome atlases.

Nucleic Acids Res 2019 01;47(D1):D752-D758

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.

The FANTOM web resource (http://fantom.gsc.riken.jp/) was developed to provide easy access to the data produced by the FANTOM project. It contains the most complete and comprehensive sets of actively transcribed enhancers and promoters in the human and mouse genomes. We determined the transcription activities of these regulatory elements by CAGE (Cap Analysis of Gene Expression) for both steady and dynamic cellular states in all major and some rare cell types, consecutive stages of differentiation and responses to stimuli. We have expanded the resource by employing different assays, such as RNA-seq, short RNA-seq and a paired-end protocol for CAGE (CAGEscan), to provide new angles to study the transcriptome. That yielded additional atlases of long noncoding RNAs, miRNAs and their promoters. We have also expanded the CAGE analysis to cover rat, dog, chicken, and macaque species for a limited number of cell types. The CAGE data obtained from human and mouse were reprocessed to make them available on the latest genome assemblies. Here, we report the recent updates of both data and interfaces in the FANTOM web resource.

Source
DOI: http://dx.doi.org/10.1093/nar/gky1099
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323950
January 2019

Monitoring transcription initiation activities in rat and dog.

Sci Data 2017 11 28;4:170173. Epub 2017 Nov 28.

RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama 230-0045, Japan.

The promoter landscape of several non-human model organisms is far from complete. As a part of FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncovering novel TSSs. Our dataset can thus be used with high confidence to support gene annotation in dog and rat species. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them with known genes. This approach could be seen as a standard method for improvement of existing gene models, as well as discovery of novel genes. Given that the FANTOM5 data collection includes dog and rat matched cell types in human and mouse as well, this data would also be useful for cross-species studies.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.173
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5704677
November 2017

Transcription start site profiling of 15 anatomical regions of the Macaca mulatta central nervous system.

Sci Data 2017 10 31;4:170163. Epub 2017 Oct 31.

German Center for Neurodegenerative Diseases, Otfried-Müller Straße 23, Tübingen 72076, Germany.

Rhesus macaque was the second non-human primate whose genome has been fully sequenced and is one of the most used model organisms to study human biology and disease, thanks to the close evolutionary relationship between the two species. But compared to human, where several previously unknown RNAs have been uncovered, the macaque transcriptome is less studied. Publicly available RNA expression resources for macaque are limited, even for brain, which is highly relevant to study human cognitive abilities. In an effort to complement those resources, FANTOM5 profiled 15 distinct anatomical regions of the aged macaque central nervous system using Cap Analysis of Gene Expression, a high-resolution, annotation-independent technology that allows monitoring of transcription initiation events with high accuracy. We identified 25,869 CAGE peaks, representing bona fide promoters. For each peak we provide detailed annotation, expanding the landscape of 'known' macaque genes, and we show concrete examples on how to use the resulting data. We believe this data represents a useful resource to understand the central nervous system in macaque.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.163
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5663209
October 2017

Linking FANTOM5 CAGE peaks to annotations with CAGEscan.

Sci Data 2017 10 3;4:170147. Epub 2017 Oct 3.

RIKEN Center for Life Science Technologies, Division of Genomics Technologies, Yokohama 230-0045, Japan.

The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated to adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled in transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.147
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5625555
October 2017

FANTOM5 CAGE profiles of human and mouse samples.

Sci Data 2017 08 29;4:170112. Epub 2017 Aug 29.

Scottish Centre for Regenerative Medicine, University of Edinburgh, 5 Little France Drive, Edinburgh EH16 4UU, UK.

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, were identified and annotated to provide precise location of known promoters as well as novel ones, and to quantify their activities.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.112
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5574368
August 2017

FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.

Sci Data 2017 08 29;4:170107. Epub 2017 Aug 29.

RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single molecule sequencing. In the original publications, GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse respectively; later, the Genome Reference Consortium released newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information across the genome assemblies consistently.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.107
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5574367
August 2017

An integrated expression atlas of miRNAs and their promoters in human and mouse.

Nat Biotechnol 2017 Sep 21;35(9):872-878. Epub 2017 Aug 21.

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Japan.

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

Source
DOI: http://dx.doi.org/10.1038/nbt.3947
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5767576
September 2017

An atlas of human long non-coding RNAs with accurate 5' ends.

Nature 2017 03 1;543(7644):199-204. Epub 2017 Mar 1.

Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane 4072, Australia.

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

Source
DOI: http://dx.doi.org/10.1038/nature21374
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857182
March 2017

Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals.

Nucleic Acids Res 2017 01 27;45(D1):D737-D743. Epub 2016 Oct 27.

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw995
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210666
January 2017

Mapping Mammalian Cell-type-specific Transcriptional Regulatory Networks Using KD-CAGE and ChIP-seq Data in the TC-YIK Cell Line.

Front Genet 2015 18;6:331. Epub 2015 Nov 18.

RIKEN Center for Life Science Technologies Yokohama, Japan ; Division of Genomic Technologies, RIKEN Center for Life Science Technologies Yokohama, Japan ; QEII Medical Centre and Centre for Medical Research, Harry Perkins Institute of Medical Research, The University of Western Australia Nedlands, WA, Australia.

Mammals are composed of hundreds of different cell types with specialized functions. Each of these cellular phenotypes is controlled by different combinations of transcription factors. Using a human non islet cell insulinoma cell line (TC-YIK) which expresses insulin and the majority of known pancreatic beta cell specific genes as an example, we describe a general approach to identify key cell-type-specific transcription factors (TFs) and their direct and indirect targets. By ranking all human TFs by their level of enriched expression in TC-YIK relative to a broad collection of samples (FANTOM5), we confirmed known key regulators of pancreatic function and development. Systematic siRNA mediated perturbation of these TFs followed by qRT-PCR revealed their interconnections with NEUROD1 at the top of the regulation hierarchy and its depletion drastically reducing insulin levels. For 15 of the TF knock-downs (KD), we then used Cap Analysis of Gene Expression (CAGE) to identify thousands of their targets genome-wide (KD-CAGE). The data confirm NEUROD1 as a key positive regulator in the transcriptional regulatory network (TRN), and ISL1 and PROX1 as antagonists. As a complementary approach we used ChIP-seq on four of these factors to identify NEUROD1, LMX1A, PAX6, and RFX6 binding sites in the human genome. Examining the overlap between genes perturbed in the KD-CAGE experiments and genes with a ChIP-seq peak within 50 kb of their promoter, we identified direct transcriptional targets of these TFs. Integration of KD-CAGE and ChIP-seq data shows that both NEUROD1 and LMX1A work as the main transcriptional activators. In the core TRN (i.e., TF-TF only), NEUROD1 directly transcriptionally activates the pancreatic TFs HSF4, INSM1, MLXIPL, MYT1, NKX6-3, ONECUT2, PAX4, PROX1, RFX6, ST18, DACH1, and SHOX2, while LMX1A directly transcriptionally activates DACH1, SHOX2, PAX6, and PDX1.
Analysis of these complementary datasets suggests the need for caution in interpreting ChIP-seq datasets. (1) A large fraction of binding sites are at distal enhancer sites and cannot be directly associated to their targets, without chromatin conformation data. (2) Many peaks may be non-functional: even when there is a peak at a promoter, the expression of the gene may not be affected in the matching perturbation experiment.

Source
DOI: http://dx.doi.org/10.3389/fgene.2015.00331
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4650373
December 2015

Gateways to the FANTOM5 promoter level mammalian expression atlas.

Genome Biol 2015 Jan 5;16:22. Epub 2015 Jan 5.

The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

Source
DOI: http://dx.doi.org/10.1186/s13059-014-0560-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310165
January 2015

Technical Advance: Transcription factor, promoter, and enhancer utilization in human myeloid cells.

J Leukoc Biol 2015 May 25;97(5):985-995. Epub 2015 Feb 25.

*The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Scotland, United Kingdom; RIKEN Preventive Medicine and Diagnosis Innovation Program, Tsurumi-ku, Yokohama, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, Japan; Department of Biosciences and Nutrition, Karolinska Institute, Huddinge, Sweden; Department of Dermatology and Allergy, Charité Universitätsmedizin Berlin, Germany; Department of Internal Medicine III, University Hospital, University of Regensburg, Germany; Department of Experimental Immunology, Academic Medical Center, Amsterdam, the Netherlands; and **RIKEN Omics Science Center, Tsurumi-ku, Yokohama, Japan

The generation of myeloid cells from their progenitors is regulated at the level of transcription by combinatorial control of key transcription factors influencing cell-fate choice. To unravel the global dynamics of this process at the transcript level, we generated transcription profiles for 91 human cell types of myeloid origin by use of CAGE profiling. The CAGE sequencing of these samples has allowed us to investigate diverse aspects of transcription control during myelopoiesis, such as identification of novel transcription factors, miRNAs, and noncoding RNAs specific to the myeloid lineage. We further reconstructed a transcription regulatory network by clustering coexpressed transcripts and associating them with enriched cis-regulatory motifs. With the use of the bidirectional expression as a proxy for enhancers, we predicted over 2000 novel enhancers, including an enhancer 38 kb downstream of and an intronic enhancer in the gene locus. Finally, we highlighted relevance of these data to dissect transcription dynamics during progressive maturation of granulocyte precursors. A multifaceted analysis of the myeloid transcriptome is made available (www.myeloidome.roslin.ed.ac.uk). This high-quality dataset provides a powerful resource to study transcriptional regulation during myelopoiesis and to infer the likely functions of unannotated genes in human innate immunity.

Source
DOI: http://dx.doi.org/10.1189/jlb.6TA1014-477RR
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4398258
May 2015

Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells.

Science 2015 Feb 12;347(6225):1010-4. Epub 2015 Feb 12.

Although it is generally accepted that cellular differentiation requires changes to transcriptional networks, dynamic regulation of promoters and enhancers at specific sets of genes has not been previously studied en masse. Exploiting the fact that active promoters and enhancers are transcribed, we simultaneously measured their activity in 19 human and 14 mouse time courses covering a wide range of cell types and biological stimuli. Enhancer RNAs, then messenger RNAs encoding transcription factors, dominated the earliest responses. Binding sites for key lineage transcription factors were simultaneously overrepresented in enhancers and promoters active in each cellular system. Our data support a highly generalizable model in which enhancer transcription is the earliest event in successive waves of transcriptional change during cellular differentiation or activation.

Source
DOI: http://dx.doi.org/10.1126/science.1259418
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681433
February 2015

Interactive visualization and analysis of large-scale sequencing datasets using ZENBU.

Nat Biotechnol 2014 Mar;32(3):217-9

[1] RIKEN Center for Life Science Technologies (Division of Genomic Technologies), Suehiro-cho, Tsurumi-ku, Yokohama, Japan. [2] RIKEN Omics Science Center (OSC), Yokohama, Japan.

Source
DOI: http://dx.doi.org/10.1038/nbt.2840
March 2014

Transcriptional profiling of the human fibrillin/LTBP gene family, key regulators of mesenchymal cell functions.

Mol Genet Metab 2014 May 16;112(1):73-83. Epub 2013 Dec 16.

The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush EH25 9RG, UK; The University of Queensland Northside Clinical School, Prince Charles Hospital, Chermside 4032, Australia.

The fibrillins and latent transforming growth factor binding proteins (LTBPs) form a superfamily of extracellular matrix (ECM) proteins characterized by the presence of a unique domain, the 8-cysteine transforming growth factor beta (TGFβ) binding domain. These proteins are involved in the structure of the extracellular matrix and controlling the bioavailability of TGFβ family members. Genes encoding these proteins show differential expression in mesenchymal cell types which synthesize the extracellular matrix. We have investigated the promoter regions of the seven gene family members using the FANTOM5 CAGE database for human. While the protein and nucleotide sequences show considerable sequence similarity, the promoter regions were quite diverse. Most genes had a single predominant transcription start site region but LTBP1 and LTBP4 had two regions initiating different transcripts. Most of the family members were expressed in a range of mesenchymal and other cell types, often associated with use of alternative promoters or transcription start sites within a promoter in different cell types. FBN3 was the lowest expressed gene, and was found only in embryonic and fetal tissues. The different promoters for one gene were more similar to each other in expression than to promoters of the other family members. Notably expression of all 22 LTBP2 promoters was tightly correlated and quite distinct from all other family members. We located candidate enhancer regions likely to be involved in expression of the genes. Each gene was associated with a unique subset of transcription factors across multiple promoters although several motifs including MAZ, SP1, GTF2I and KLF4 showed overrepresentation across the gene family. FBN1 and FBN2, which had similar expression patterns, were regulated by different transcription factors. 
This study highlights the role of alternative transcription start sites in regulating the tissue specificity of closely related genes and suggests that this important class of extracellular matrix proteins is subject to subtle regulatory variations that explain the differential roles of members of this gene family.

Source
DOI: http://dx.doi.org/10.1016/j.ymgme.2013.12.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4019825
May 2014

A promoter-level mammalian expression atlas.

Nature 2014 Mar;507(7493):462-70

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.

Source
DOI: http://dx.doi.org/10.1038/nature13182
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529748
March 2014

Reconstruction of monocyte transcriptional regulatory network accompanies monocytic functions in human fibroblasts.

PLoS One 2012 13;7(3):e33474. Epub 2012 Mar 13.

Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan.

Transcriptional regulatory networks (TRN) control the underlying mechanisms behind cellular functions and they are defined by a set of core transcription factors regulating cascades of peripheral genes. Here we report SPI1, CEBPA, MNDA and IRF8 as core transcription factors of monocyte TRN and demonstrate functional inductions of phagocytosis, inflammatory response and chemotaxis activities in human dermal fibroblasts. The Gene Ontology and KEGG pathway analyses also revealed notable representation of genes involved in immune response and endocytosis in fibroblasts. Moreover, monocyte TRN-inducers triggered multiple monocyte-specific genes based on the transcription factor motif response analysis and suggest that complex cellular TRNs are uniquely amenable to elicit cell-specific functions in unrelated cell types.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0033474
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3302774
August 2012

Promoter architecture of mouse olfactory receptor genes.

Genome Res 2012 Mar 22;22(3):486-97. Epub 2011 Dec 22.

RIKEN Yokohama Institute, Omics Science Center, Yokohama, Kanagawa, Japan.

Odorous chemicals are detected by the mouse main olfactory epithelium (MOE) by about 1100 types of olfactory receptors (OR) expressed by olfactory sensory neurons (OSNs). Each mature OSN is thought to express only one allele of a single OR gene. Major impediments to understanding the transcriptional control of OR gene expression are the lack of a proper characterization of OR transcription start sites (TSSs) and promoters, and of regulatory transcripts at OR loci. We have applied the nanoCAGE technology to profile the transcriptome and the active promoters in the MOE. nanoCAGE analysis revealed the map and architecture of promoters for 87.5% of the mouse OR genes, as well as the expression of many novel noncoding RNAs including antisense transcripts. We identified candidate transcription factors for OR gene expression and among them confirmed by chromatin immunoprecipitation the binding of TBP, EBF1 (OLF1), and MEF2A to OR promoters. Finally, we showed that a short genomic fragment flanking the major TSS of the OR gene Olfr160 (M72) can drive OSN-specific expression in transgenic mice.

Source
DOI: http://dx.doi.org/10.1101/gr.126201.111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3290784
March 2012

The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.

J Biomed Semantics 2011 Aug 2;2. Epub 2011 Aug 2.

Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.

Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.

Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs.

Conclusions: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

Source
DOI: http://dx.doi.org/10.1186/2041-1480-2-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170566
August 2011

A conditional knockout resource for the genome-wide study of mouse gene function.

Nature 2011 Jun 15;474(7351):337-42. Epub 2011 Jun 15.

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.

Source
DOI: http://dx.doi.org/10.1038/nature10163
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3572410
June 2011

Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation.

Nucleic Acids Res 2011 Jan 12;39(Database issue):D856-60. Epub 2010 Nov 12.

RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan.

The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5'-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP-chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.

Source
DOI: http://dx.doi.org/10.1093/nar/gkq1112
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013704
January 2011

Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan.

Nat Methods 2010 Jul 13;7(7):528-34. Epub 2010 Jun 13.

RIKEN Yokohama Institute, Omics Science Center, Yokohama, Japan.

Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and noncoding RNAs with capped 5' ends that vary in size. Methods to identify the 5' ends of transcripts will facilitate the discovery of new promoters and 5' ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we developed nano-cap analysis of gene expression (nanoCAGE), a method that captures the 5' ends of transcripts from as little as 10 ng of total RNA, and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5' ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.

Source
DOI: http://dx.doi.org/10.1038/nmeth.1470
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2906222
July 2010
-->