Publications by authors named "Takeya Kasukawa"

71 Publications

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network.

Nat Commun 2021 06 2;12(1):3297. Epub 2021 Jun 2.

Institut de Biologie Computationnelle, Montpellier, France.

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

Source
DOI: http://dx.doi.org/10.1038/s41467-021-23143-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8172540
June 2021

Automatic identification of small molecules that promote cell conversion and reprogramming.

Stem Cell Reports 2021 May 22;16(5):1381-1390. Epub 2021 Apr 22.

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045 Japan; Graduate School of Integrated Sciences for Life, Hiroshima University, Kagamiyama, Higashi-Hiroshima, 739-8528 Japan.

Controlling cell fate has great potential for regenerative medicine, drug discovery, and basic research. Although transcription factors are able to promote cell reprogramming and transdifferentiation, methods based on their upregulation often show low efficiency. Small molecules that can facilitate conversion between cell types can ameliorate this problem by working through safe, rapid, and reversible mechanisms. Here, we present DECCODE, an unbiased computational method for identification of such molecules based on transcriptional data. DECCODE matches a large collection of drug-induced profiles against a large dataset of primary cell transcriptional profiles to identify drugs that, either alone or in combination, enhance cell reprogramming and cell conversion. Extensive validation in the context of human induced pluripotent stem cells shows that DECCODE is able to prioritize drugs and drug combinations enhancing cell reprogramming. We also provide predictions for cell conversion with single drugs and drug combinations for 145 different cell types.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2021.03.028
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8185468
May 2021

FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs.

Nucleic Acids Res 2021 01;49(D1):D892-D898

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.

The Functional ANnoTation Of the Mammalian genome (FANTOM) Consortium has continued to provide extensive resources in the pursuit of understanding the transcriptome, and transcriptional regulation, of mammalian genomes for the last 20 years. To share these resources with the research community, the FANTOM web-interfaces and databases are being regularly updated, enhanced and expanded with new data types. In recent years, the FANTOM Consortium's efforts have been mainly focused on creating new non-coding RNA datasets and resources. The existing FANTOM5 human and mouse miRNA atlas was supplemented with rat, dog, and chicken datasets. The sixth (latest) edition of the FANTOM project was launched to assess the function of human long non-coding RNAs (lncRNAs). From its creation until 2020, FANTOM6 has contributed to the research community a large dataset generated from the knock-down of 285 lncRNAs in human dermal fibroblasts; this was followed by extensive expression profiling and cellular phenotyping. Other updates to the FANTOM resource include the reprocessing of the miRNA and promoter atlases of human, mouse and chicken with the latest reference genome assemblies. To facilitate the use and accessibility of all the above resources, we further enhanced FANTOM data viewers and web interfaces. The updated FANTOM web resource is publicly available at https://fantom.gsc.riken.jp/.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1054
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779024
January 2021

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Genome Res 2020 07 27;30(7):1060-1072. Epub 2020 Jul 27.

Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for and .

Source
DOI: http://dx.doi.org/10.1101/gr.254219.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397864
July 2020

Comparative transcriptomics of primary cells in vertebrates.

Genome Res 2020 07 27;30(7):951-961. Epub 2020 Jul 27.

RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan.

Gene expression profiles in homologous tissues have been observed to be different between species, which may be due to differences between species in the gene expression program in each cell type, but may also reflect differences in cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionarily old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.

Source
DOI: http://dx.doi.org/10.1101/gr.255679.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397866
July 2020

The Number of Transcription Factors at an Enhancer Determines Switch-like Gene Expression.

Cell Rep 2020 06;31(9):107724

Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan; RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan; Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.

NF-κB is a transcription factor that activates super enhancers (SEs) and typical enhancers (TEs) and triggers threshold and graded gene expression, respectively. However, the mechanisms by which NF-κB selectively participates in these enhancers remain unclear. Here we show using mouse primary B lymphocytes that SE activity simultaneously associates with chromatin opening and enriched NF-κB binding, resulting in a higher fold change and threshold expression upon B cell receptor (BCR) activation. The higher fold change results from longer DNA, whereas the threshold response is explained by synergy in DNA-NF-κB binding and is supported by the coexistence of PU.1 and NF-κB in a SE before cell stimulation. This model indicates that the pre-existing NF-κB functions as a seed and triggers its processive binding upon BCR activation. Our mathematical modeling of the single-cell transcriptome reveals an additional role for SEs in divergent clonal responses in B cells.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.107724
June 2020

RADICL-seq identifies general and cell type-specific principles of genome-wide RNA-chromatin interactions.

Nat Commun 2020 02 24;11(1):1018. Epub 2020 Feb 24.

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.

Mammalian genomes encode tens of thousands of noncoding RNAs. Most noncoding transcripts exhibit nuclear localization and several have been shown to play a role in the regulation of gene expression and chromatin remodeling. To investigate the function of such RNAs, methods to massively map the genomic interacting sites of multiple transcripts have been developed; however, these methods have some limitations. Here, we introduce RNA And DNA Interacting Complexes Ligated and sequenced (RADICL-seq), a technology that maps genome-wide RNA-chromatin interactions in intact nuclei. RADICL-seq is a proximity ligation-based methodology that reduces the bias for nascent transcription, while increasing genomic coverage and unique mapping rate efficiency compared with existing methods. RADICL-seq identifies distinct patterns of genome occupancy for different classes of transcripts as well as cell type-specific RNA-chromatin interactions, and highlights the role of transcription in the establishment of chromatin structure.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-14337-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7039879
February 2020

Recounting the FANTOM CAGE-Associated Transcriptome.

Genome Res 2020 07 20;30(7):1073-1081. Epub 2020 Feb 20.

Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA.

Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value of the expression of several enhancer lncRNAs in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNA by the FANTOM6 Consortium. In conclusion, comprised of over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate functions and biological roles of both known coding genes and novel lncRNAs.

Source
DOI: http://dx.doi.org/10.1101/gr.254656.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397872
July 2020

Mesenchymal-epithelial transition regulates initiation of pluripotency exit before gastrulation.

Development 2020 02 3;147(3). Epub 2020 Feb 3.

International Research Center for Medical Sciences (IRCMS), Kumamoto University, Kumamoto 860-0811, Japan.

The pluripotent epiblast gives rise to all tissues and organs in the adult body. Its differentiation starts at gastrulation, when the epiblast generates mesoderm and endoderm germ layers through epithelial-mesenchymal transition (EMT). Although gastrulation EMT coincides with loss of epiblast pluripotency, pluripotent cells in development and can adopt either mesenchymal or epithelial morphology. The relationship between epiblast cellular morphology and its pluripotency is not well understood. Here, using chicken epiblast and mammalian pluripotent stem cell (PSC) models, we show that PSCs undergo a mesenchymal-epithelial transition (MET) prior to EMT-associated pluripotency loss. Epiblast MET and its subsequent EMT are two distinct processes. The former, a partial MET, is associated with reversible initiation of pluripotency exit, whereas the latter, a full EMT, is associated with complete and irreversible pluripotency loss. We provide evidence that integrin-mediated cell-matrix interaction is a key player in pluripotency exit regulation. We propose that epiblast partial MET is an evolutionarily conserved process among all amniotic vertebrates and that epiblast pluripotency is restricted to an intermediate cellular state residing between the fully mesenchymal and fully epithelial states.

Source
DOI: http://dx.doi.org/10.1242/dev.184960
February 2020

refTSS: A Reference Data Set for Human and Mouse Transcription Start Sites.

J Mol Biol 2019 06 8;431(13):2407-2422. Epub 2019 May 8.

RIKEN Center for Integrative Medical Sciences, 1-7-22, Suehiro-Cho, Tsurumi-Ku, Yokohama, Kanagawa 230-0045, Japan.

Transcription starts at genomic positions called transcription start sites (TSSs), producing RNAs, and is mainly regulated by genomic elements and transcription factors binding around these TSSs. This indicates that TSSs may be a better unit to integrate various data sources related to transcriptional events, including regulation and production of RNAs. However, although several TSS datasets and promoter atlases are available, a comprehensive reference set that integrates all known TSSs is lacking. Thus, we constructed a reference dataset of TSSs (refTSS) for the human and mouse genomes by collecting publicly available TSS annotations and promoter resources, such as FANTOM5, DBTSS, EPDnew, and ENCODE. The data set consists of genomic coordinates of TSS peaks, their gene annotations, quality check results, and conservation between human and mouse. We also developed a web interface to browse the refTSS (http://reftss.clst.riken.jp/). Users can access the resource for collecting and integrating data and information about transcriptional regulation and transcription products.

Source
DOI: http://dx.doi.org/10.1016/j.jmb.2019.04.045
June 2019

C1 CAGE detects transcription start sites and enhancer activity at single-cell resolution.

Nat Commun 2019 01 21;10(1):360. Epub 2019 Jan 21.

RIKEN Center for Integrative Medical Sciences (IMS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.

Single-cell transcriptomic profiling is a powerful tool to explore cellular heterogeneity. However, most of these methods focus on the 3'-end of polyadenylated transcripts and provide only a partial view of the transcriptome. We introduce C1 CAGE, a method for the detection of transcript 5'-ends with an original sample multiplexing strategy in the C1 microfluidic system. We first quantify the performance of C1 CAGE and find it as accurate and sensitive as other methods in the C1 system. We then use it to profile promoter and enhancer activities in the cellular response to TGF-β of lung cancer cells and discover subpopulations of cells differing in their response. We also describe enhancer RNA dynamics revealing transcriptional bursts in subsets of cells with transcripts arising from either strand in a mutually exclusive manner, validated using single molecule fluorescence in situ hybridization.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-08126-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6341120
January 2019

Author Correction: Transcription start site profiling of 15 anatomical regions of the Macaca mulatta central nervous system.

Sci Data 2018 12 11;5(1). Epub 2018 Dec 11.

German Center for Neurodegenerative Diseases, Otfried-Müller Straße 23, Tübingen, 72076, Germany.

The authors regret that Luba M. Pardo was omitted in error from the author list of the original version of this Data Descriptor. This omission has now been corrected in the HTML and PDF versions. The authors also regret that Anemieke Rozemuller was omitted in error from the Acknowledgements of the original version of this Data Descriptor. This omission has now been corrected in the HTML and PDF versions.

Source
DOI: http://dx.doi.org/10.1038/s41597-018-0003-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6300047
December 2018

Update of the FANTOM web resource: expansion to provide additional transcriptome atlases.

Nucleic Acids Res 2019 01;47(D1):D752-D758

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.

The FANTOM web resource (http://fantom.gsc.riken.jp/) was developed to provide easy access to the data produced by the FANTOM project. It contains the most complete and comprehensive sets of actively transcribed enhancers and promoters in the human and mouse genomes. We determined the transcription activities of these regulatory elements by CAGE (Cap Analysis of Gene Expression) for both steady and dynamic cellular states in all major and some rare cell types, consecutive stages of differentiation and responses to stimuli. We have expanded the resource by employing different assays, such as RNA-seq, short RNA-seq and a paired-end protocol for CAGE (CAGEscan), to provide new angles to study the transcriptome. That yielded additional atlases of long noncoding RNAs, miRNAs and their promoters. We have also expanded the CAGE analysis to cover rat, dog, chicken, and macaque species for a limited number of cell types. The CAGE data obtained from human and mouse were reprocessed to make them available on the latest genome assemblies. Here, we report the recent updates of both data and interfaces in the FANTOM web resource.

Source
DOI: http://dx.doi.org/10.1093/nar/gky1099
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323950
January 2019

Muscarinic Acetylcholine Receptors Chrm1 and Chrm3 Are Essential for REM Sleep.

Cell Rep 2018 08;24(9):2231-2247.e7

Laboratory for Synthetic Biology, RIKEN Center for Biosystems Dynamics Research, 1-3 Yamadaoka, Suita, Osaka 565-0871, Japan; Department of Systems Pharmacology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study (UTIAS), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.

Sleep regulation involves interdependent signaling among specialized neurons in distributed brain regions. Although acetylcholine promotes wakefulness and rapid eye movement (REM) sleep, it is unclear whether the cholinergic pathway is essential (i.e., absolutely required) for REM sleep because of redundancy from neural circuits to molecules. First, we demonstrate that synaptic inhibition of TrkA+ cholinergic neurons causes a severe short-sleep phenotype and that sleep reduction is mostly attributable to a shortened sleep duration in the dark phase. Subsequent comprehensive knockout of acetylcholine receptor genes by the triple-target CRISPR method reveals that a similar short-sleep phenotype appears in the knockout of two Gq-type acetylcholine receptors Chrm1 and Chrm3. Strikingly, Chrm1 and Chrm3 double knockout chronically diminishes REM sleep to an almost undetectable level. These results suggest that muscarinic acetylcholine receptors, Chrm1 and Chrm3, are essential for REM sleep.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2018.07.082
August 2018

Monitoring transcription initiation activities in rat and dog.

Sci Data 2017 11 28;4:170173. Epub 2017 Nov 28.

RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama 230-0045, Japan.

The promoter landscape of several non-human model organisms is far from complete. As a part of FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncovering novel TSSs. Our dataset can thus be used with high confidence to support gene annotation in dog and rat species. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them with known genes. This approach could be seen as a standard method for improvement of existing gene models, as well as discovery of novel genes. Given that the FANTOM5 data collection includes dog and rat matched cell types in human and mouse as well, these data would also be useful for cross-species studies.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.173
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5704677
November 2017

Self-patterning of rostral-caudal neuroectoderm requires dual role of Fgf signaling for localized Wnt antagonism.

Nat Commun 2017 11 7;8(1):1339. Epub 2017 Nov 7.

Laboratory for Organogenesis and Neurogenesis, RIKEN Center for Developmental Biology, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.

The neuroectoderm is patterned along a rostral-caudal axis in response to localized factors in the embryo, but exactly how these factors act as positional information for this patterning is not yet fully understood. Here, using the self-organizing properties of mouse embryonic stem cells (ESCs), we report that ESC-derived neuroectoderm self-generates a Six3 rostral and an Irx3 caudal bipolarized patterning. In this instance, localized Fgf signaling performs dual roles, as it regulates Six3 rostral polarization at an earlier stage and promotes Wnt signaling at a later stage. The Wnt signaling components are differentially expressed in the polarized tissues, leading to genome-wide Irx3 caudal-polarization signals. Surprisingly, differentially expressed Wnt agonists and antagonists have essential roles in orchestrating the formation of a balanced rostral-caudal neuroectoderm pattern. Together, our findings provide key processes for dynamic self-patterning and evidence that a temporally and locally regulated interaction between Fgf and Wnt signaling controls self-patterning in ESC-derived neuroectoderm.

Source
DOI: http://dx.doi.org/10.1038/s41467-017-01105-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673904
November 2017

Transcription start site profiling of 15 anatomical regions of the Macaca mulatta central nervous system.

Sci Data 2017 10 31;4:170163. Epub 2017 Oct 31.

German Center for Neurodegenerative Diseases, Otfried-Müller Straße 23, Tübingen 72076, Germany.

Rhesus macaque was the second non-human primate whose genome has been fully sequenced and is one of the most used model organisms to study human biology and disease, thanks to the close evolutionary relationship between the two species. But compared to human, where several previously unknown RNAs have been uncovered, the macaque transcriptome is less studied. Publicly available RNA expression resources for macaque are limited, even for brain, which is highly relevant to study human cognitive abilities. In an effort to complement those resources, FANTOM5 profiled 15 distinct anatomical regions of the aged macaque central nervous system using Cap Analysis of Gene Expression, a high-resolution, annotation-independent technology that allows monitoring of transcription initiation events with high accuracy. We identified 25,869 CAGE peaks, representing bona fide promoters. For each peak we provide detailed annotation, expanding the landscape of 'known' macaque genes, and we show concrete examples of how to use the resulting data. We believe these data represent a useful resource to understand the central nervous system in macaque.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.163
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5663209
October 2017

SCPortalen: human and mouse single-cell centric database.

Nucleic Acids Res 2018 01;46(D1):D781-D787

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan.

Published single-cell datasets are rich resources for investigators who want to address questions not originally asked by the creators of the datasets. The single-cell datasets might be obtained by different protocols and diverse analysis strategies. The main challenge in utilizing such single-cell data is how to make the various large-scale datasets comparable and reusable in a different context. To address this issue, we developed the single-cell centric database 'SCPortalen' (http://single-cell.clst.riken.jp/). The current version of the database covers human and mouse single-cell transcriptomics datasets that are publicly available from the INSDC sites. The original metadata was manually curated and single-cell samples were annotated with standard ontology terms. Following that, common quality assessment procedures were conducted to check the quality of the raw sequence. Furthermore, primary data processing of the raw data followed by advanced analyses and interpretation have been performed from scratch using our pipeline. In addition to the transcriptomics data, SCPortalen provides access to single-cell image files whenever available. The target users of SCPortalen are all researchers interested in specific cell types or population heterogeneity. Through the web interface of SCPortalen, users can easily search, explore and download the single-cell datasets of interest.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx949
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753281
January 2018

Linking FANTOM5 CAGE peaks to annotations with CAGEscan.

Sci Data 2017 10 3;4:170147. Epub 2017 Oct 3.

RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama 230-0045, Japan.

The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated to adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled in transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.147
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5625555
October 2017

Systematic analysis of transcription start sites in avian development.

PLoS Biol 2017 Sep 5;15(9):e2002887. Epub 2017 Sep 5.

International Research Center for Medical Sciences (IRCMS), Kumamoto University, Kumamoto, Japan.

Cap Analysis of Gene Expression (CAGE) in combination with single-molecule sequencing technology allows precision mapping of transcription start sites (TSSs) and genome-wide capture of promoter activities in differentiated and steady state cell populations. Much less is known about whether TSS profiling can characterize diverse and non-steady state cell populations, such as the approximately 400 transitory and heterogeneous cell types that arise during ontogeny of vertebrate animals. To gain such insight, we used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks (>1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. TSSs of genes critical for lineage differentiation could be precisely mapped and their activities tracked throughout development, suggesting that non-steady state and heterogeneous cell populations are amenable to CAGE-based transcriptional analysis. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Taken together, our results represent the first report of genome-wide TSS mapping in birds and the first systematic developmental TSS analysis in any amniote species (birds and mammals). By facilitating promoter-based molecular analysis and genetic manipulation, our work also underscores the value of avian models in unravelling the complex regulatory mechanism of cell lineage specification during amniote development.

Source
DOI: http://dx.doi.org/10.1371/journal.pbio.2002887
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600399
September 2017

The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types.

Sci Data 2017 08 29;4:170113. Epub 2017 Aug 29.

RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama 351-0198, Japan.

The latest project from the FANTOM consortium, an international collaborative effort initiated by RIKEN, generated atlases of transcriptomes, in particular promoters, transcribed enhancers, and long-noncoding RNAs, across a diverse set of mammalian cell types. Here, we introduce the FANTOM5 collection, bringing together data descriptors, articles and analyses of FANTOM5 data published across the Nature Research journals. Associated data are openly available for reuse by all.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.113
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5574373
August 2017

FANTOM5 CAGE profiles of human and mouse samples.

Sci Data 2017 08 29;4:170112. Epub 2017 Aug 29.

Scottish Centre for Regenerative Medicine, University of Edinburgh, 5 Little France Drive, Edinburgh EH16 4UU, UK.

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution, and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, respectively, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.112
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5574368
August 2017

FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.

Sci Data 2017 08 29;4:170107. Epub 2017 Aug 29.

RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single-molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information consistently across the genome assemblies.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.107
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5574367
August 2017

An integrated expression atlas of miRNAs and their promoters in human and mouse.

Nat Biotechnol 2017 Sep 21;35(9):872-878. Epub 2017 Aug 21.

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Japan.

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

Source
DOI: http://dx.doi.org/10.1038/nbt.3947
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5767576
September 2017

The FANTOM5 Computation Ecosystem: Genomic Information Hub for Promoters and Active Enhancers.

Methods Mol Biol 2017 ;1611:199-217

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.

The Functional Annotation of the Mammalian Genome 5 (FANTOM5) project conducted transcriptome analysis of various mammalian cell types and provided a comprehensive resource to understand transcriptome and transcriptional regulation in individual cellular states encoded in the genome. FANTOM5 used cap analysis of gene expression (CAGE) with single-molecule sequencing to map transcription start sites (TSSs) and measured their expression in a diverse range of samples. The main results from FANTOM5 were published as a promoter-level mammalian expression atlas and an atlas of active enhancers across human cell types. The FANTOM5 dataset is composed of raw experimental data and the results of bioinformatics analyses. In this chapter, we give a detailed description of the content of the FANTOM5 dataset and elaborate on different computing applications developed to publish the data and enable reproducibility and discovery of new findings. We present use cases in which the FANTOM5 dataset has been reused, leading to new findings.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-7015-5_15
February 2018

An atlas of human long non-coding RNAs with accurate 5' ends.

Nature 2017 03 1;543(7644):199-204. Epub 2017 Mar 1.

Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane 4072, Australia.

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

Source
DOI: http://dx.doi.org/10.1038/nature21374
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857182
March 2017

Genome Annotation.

Methods Mol Biol 2017 ;1525:107-121

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.

The dynamic structure and functions of genomes are being revealed simultaneously with the progress of technologies and research in genomics. Evidence indicating genome regional characteristics (genome annotations in a broad sense) provides the basis for further analyses. Target listing and screening can be effectively performed in silico using such data. Here, we describe steps to obtain publicly available genome annotations or to construct new annotations based on your own analyses, as well as an overview of the types of available genome annotations and corresponding resources.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-6622-6_5
January 2018

Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals.

Nucleic Acids Res 2017 01 27;45(D1):D737-D743. Epub 2016 Oct 27.

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems have been enhanced. We believe that our data and web systems are invaluable resources, and that the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw995
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210666
January 2017

FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki.

Database (Oxford) 2016 9;2016. Epub 2016 Jul 9.

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Kanagawa 230-0045, Japan; RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan; RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama 351-0198, Japan; Preventive Medicine and Applied Genomics Unit, RIKEN Advanced Center for Computing and Communication, Kanagawa 230-0045, Japan.

The Functional Annotation of the Mammalian Genome project (FANTOM5) mapped transcription start sites (TSSs) and measured their activities in a diverse range of biological samples. The FANTOM5 project generated a large data set, including detailed information about the profiled samples, the uncovered TSSs at high base-pair resolution on the genome, their transcriptional initiation activities, and further information on transcriptional regulation. Data sets to explore the transcriptome in individual cellular states encoded in the mammalian genomes have been enriched by a series of additional analyses, based on the raw experimental data, along with the progress of the research activities. To make the heterogeneous data set accessible and useful for investigators, we developed a web-based database called Semantic catalog of Samples, Transcription initiation And Regulators (SSTAR). SSTAR utilizes the open source wiki software MediaWiki along with the Semantic MediaWiki (SMW) extension, which provides flexibility to model, store, and display a series of data sets produced during the course of the FANTOM5 project. Our use of SMW demonstrates the utility of the framework for dissemination of large-scale analysis results. SSTAR is a case study in handling biological data generated from a large-scale research project in terms of maintenance and growth alongside research activities. Database URL: http://fantom.gsc.riken.jp/5/sstar/.

Source
DOI: http://dx.doi.org/10.1093/database/baw105
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4940433
November 2017