Publications by authors named "Yulia A Medvedeva"

33 Publications

Genome-wide regulation of CpG methylation by ecCEBPα in acute myeloid leukemia.

F1000Res 2021 11;10:204. Epub 2021 Mar 11.

Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Moscow, Russian Federation.

Acute myeloid leukemia (AML) is a hematopoietic malignancy characterized by genetic and epigenetic aberrations that alter the differentiation capacity of myeloid progenitor cells. The transcription factor CEBPα is frequently mutated in AML patients, leading to an increase in DNA methylation in many genomic locations. Previously, it has been shown that ecCEBPα (extra coding CEBPα), a lncRNA transcribed in the same direction as the CEBPα gene, regulates DNA methylation of the CEBPα promoter in cis. Here, we hypothesize that ecCEBPα could participate in the regulation of DNA methylation in trans. First, we retrieved the methylation profile of AML patients with a mutated CEBPα locus from The Cancer Genome Atlas (TCGA). We then predicted the ecCEBPα secondary structure in order to check the potential of ecCEBPα to form triplexes around CpG loci and checked if triplex formation influenced CpG methylation, genome-wide. Using DNA methylation profiles of AML patients with a mutated CEBPα locus, we show that ecCEBPα could interact with DNA by forming DNA:RNA triple helices and protect regions near its binding sites from global DNA methylation. Further analysis revealed that triplex-forming oligonucleotides in ecCEBPα are structurally unpaired, supporting the DNA-binding potential of these regions. ecCEBPα triplexes supported with the RNA-chromatin co-localization data are located in the promoters of leukemia-linked transcriptional factors such as MLF2. Overall, these results suggest a novel regulatory mechanism for ecCEBPα as a genome-wide epigenetic modulator through triple-helix formation, which may provide a foundation for sequence-specific engineering of RNA for regulating methylation of specific genes.

DOI: http://dx.doi.org/10.12688/f1000research.28146.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8444155.2
October 2021

MIREyA: a computational approach to detect miRNA-directed gene activation.

F1000Res 2021 29;10:249. Epub 2021 Mar 29.

Group of Regulatory Transcriptomics and Epigenomics, Research Center of Biotechnology, Institute of Bioengineering, Russian Academy of Sciences, Moscow, 117312, Russian Federation.

Emerging studies demonstrate the ability of microRNAs (miRNAs) to activate genes via different mechanisms. Specifically, miRNAs may trigger an enhancer promoting chromatin remodelling in the enhancer region, thus activating the enhancer and its target genes. Here we present MIREyA, a pipeline developed to predict such miRNA-gene-enhancer trios based on an expression dataset, which obviates the need to write custom scripts. We applied our pipeline to primary murine macrophages infected by Mycobacterium tuberculosis (HN878 strain) and detected Mir22, Mir221, Mir222, Mir155 and Mir1956, which could up-regulate genes related to immune responses. We believe that MIREyA is a useful tool for detecting putative miRNA-directed gene activation cases. MIREyA is available from: https://github.com/veania/MIREyA.

DOI: http://dx.doi.org/10.12688/f1000research.28142.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411277
October 2021

Computational analysis of sense-antisense chimeric transcripts reveals their potential regulatory features and the landscape of expression in human cells.

NAR Genom Bioinform 2021 Sep 25;3(3):lqab074. Epub 2021 Aug 25.

Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel.

Many human genes are transcribed from both strands and produce sense-antisense gene pairs. Sense-antisense (SAS) chimeric transcripts are produced upon the coalescing of exons/introns from both sense and antisense transcripts of the same gene. SAS chimera was first reported in prostate cancer cells. Subsequently, numerous SAS chimeras have been reported in the ChiTaRS-2.1 database. However, the landscape of their expression in human cells and functional aspects are still unknown. We found that longer palindromic sequences are a unique feature of SAS chimeras. Structural analysis indicates that a long hairpin-like structure formed by many consecutive Watson-Crick base pairs appears because of these long palindromic sequences, which possibly play a similar role as double-stranded RNA (dsRNA), interfering with gene expression. RNA-RNA interaction analysis suggested that SAS chimeras could significantly interact with their parental mRNAs, indicating their potential regulatory features. Here, 267 SAS chimeras were mapped in RNA-seq data from 16 healthy human tissues, revealing their expression in normal cells. Evolutionary analysis suggested the positive selection favoring sense-antisense fusions that significantly impacted the evolution of their function and structure. Overall, our study provides detailed insight into the expression landscape of SAS chimeras in human cells and identifies potential regulatory features.

DOI: http://dx.doi.org/10.1093/nargab/lqab074
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8386243
September 2021

A Catalogus Immune Muris of the mouse immune responses to diverse pathogens.

Cell Death Dis 2021 08 17;12(9):798. Epub 2021 Aug 17.

Computational Biology Group, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, L-4362, Esch-sur-Alzette, Luxembourg.

Immunomodulation strategies are crucial for several biomedical applications. However, the immune system is highly heterogeneous and its functional responses to infections remain elusive. Indeed, the characterization of immune response particularities to different pathogens is needed to identify immunomodulatory candidates. To address this issue, we compiled a comprehensive map of functional immune cell states of mouse in response to 12 pathogens. To create this atlas, we developed a single-cell-based computational method that partitions heterogeneous cell types into functionally distinct states and simultaneously identifies modules of functionally relevant genes characterizing them. We identified 295 functional states using 114 datasets of six immune cell types, creating a Catalogus Immune Muris. As a result, we found common as well as pathogen-specific functional states and experimentally characterized the function of an unknown macrophage cell state that modulates the response to Salmonella Typhimurium infection. Thus, we expect our Catalogus Immune Muris to be an important resource for studies aiming at discovering new immunomodulatory candidates.

DOI: http://dx.doi.org/10.1038/s41419-021-04075-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8370971
August 2021

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Genome Res 2020 07 27;30(7):1060-1072. Epub 2020 Jul 27.

Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for and .

DOI: http://dx.doi.org/10.1101/gr.254219.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397864
July 2020

RADICL-seq identifies general and cell type-specific principles of genome-wide RNA-chromatin interactions.

Nat Commun 2020 02 24;11(1):1018. Epub 2020 Feb 24.

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.

Mammalian genomes encode tens of thousands of noncoding RNAs. Most noncoding transcripts exhibit nuclear localization and several have been shown to play a role in the regulation of gene expression and chromatin remodeling. To investigate the function of such RNAs, methods to massively map the genomic interacting sites of multiple transcripts have been developed; however, these methods have some limitations. Here, we introduce RNA And DNA Interacting Complexes Ligated and sequenced (RADICL-seq), a technology that maps genome-wide RNA-chromatin interactions in intact nuclei. RADICL-seq is a proximity ligation-based methodology that reduces the bias for nascent transcription, while increasing genomic coverage and unique mapping rate efficiency compared with existing methods. RADICL-seq identifies distinct patterns of genome occupancy for different classes of transcripts as well as cell type-specific RNA-chromatin interactions, and highlights the role of transcription in the establishment of chromatin structure.

DOI: http://dx.doi.org/10.1038/s41467-020-14337-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7039879
February 2020

Practical Guidance in Genome-Wide RNA:DNA Triple Helix Prediction.

Int J Mol Sci 2020 Jan 28;21(3). Epub 2020 Jan 28.

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, 117312 Moscow, Russia.

Long noncoding RNAs (lncRNAs) play a key role in many cellular processes including chromatin regulation. To modify chromatin, lncRNAs often interact with DNA in a sequence-specific manner forming RNA:DNA triple helices. Computational tools for triple helix search do not always provide genome-wide predictions of sufficient quality. Here, we used four human lncRNAs (MEG3, DACOR1, TERC and HOTAIR) and their experimentally determined binding regions for evaluating triplex parameters that provide the highest prediction accuracy. Additionally, we combined triplex prediction with the lncRNA secondary structure and demonstrated that considering only single-stranded fragments of lncRNA can further improve DNA-RNA triplexes prediction.

DOI: http://dx.doi.org/10.3390/ijms21030830
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7037363
January 2020

Differential Targeting of c-Maf, Bach-1, and Elmo-1 by microRNA-143 and microRNA-365 Promotes the Intracellular Growth of Mycobacterium tuberculosis in Alternatively IL-4/IL-13 Activated Macrophages.

Front Immunol 2019 19;10:421. Epub 2019 Mar 19.

International Centre for Genetic Engineering and Biotechnology, Cape Town Component, Cape Town, South Africa.

Mycobacterium tuberculosis (Mtb) can subvert the host defense by skewing macrophage activation toward a less microbicidal alternative activated state to avoid classical effector killing functions. Investigating the molecular basis of this evasion mechanism could uncover potential candidates for host directed therapy against tuberculosis (TB). A limited number of miRNAs have recently been shown to regulate host-mycobacterial interactions. Here, we performed time course kinetics experiments on bone marrow-derived macrophages (BMDMs) and human monocyte-derived macrophages (MDMs) alternatively activated with IL-4, IL-13, or a combination of IL-4/IL-13, followed by infection with Mtb clinical Beijing strain HN878. MiR-143 and miR-365 were highly induced in Mtb-infected M(IL-4/IL-13) BMDMs and MDMs. Knockdown of miR-143 and miR-365 using antagomiRs decreased the intracellular growth of Mtb HN878, reduced the production of IL-6 and CCL5 and promoted the apoptotic death of Mtb HN878-infected M(IL-4/IL-13) BMDMs. Computational target prediction identified c-Maf, Bach-1 and Elmo-1 as potential targets for both miR-143 and miR-365. Functional validation using luciferase assay, RNA-pulldown assay and Western blotting revealed that c-Maf and Bach-1 are directly targeted by miR-143 while c-Maf, Bach-1, and Elmo-1 are direct targets of miR-365. Knockdown of c-Maf using GapmeRs promoted intracellular Mtb growth when compared to control treated M(IL-4/IL-13) macrophages. Meanwhile, the blocking of Bach-1 had no effect and blocking Elmo-1 resulted in decreased Mtb growth. Combination treatment of M(IL-4/IL-13) macrophages with miR-143 mimics or miR-365 mimics and c-Maf, Bach-1, or Elmo-1 gene-specific GapmeRs restored Mtb growth in miR-143 mimic-treated groups and enhanced Mtb growth in miR-365 mimics-treated groups, thus suggesting the Mtb growth-promoting activities of miR-143 and miR-365 are mediated at least partially through interaction with c-Maf, Bach-1, and Elmo-1. We further show that knockdown of miR-143 and miR-365 in M(IL-4/IL-13) BMDMs decreased the expression of HO-1 and IL-10 which are known targets of Bach-1 and c-Maf, respectively, with Mtb growth-promoting activities in macrophages. Altogether, our work reports a host detrimental role of miR-143 and miR-365 during Mtb infection and highlights for the first time the role and miRNA-mediated regulation of c-Maf, Bach-1, and Elmo-1 in Mtb-infected M(IL-4/IL-13) macrophages.

DOI: http://dx.doi.org/10.3389/fimmu.2019.00421
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6433885
August 2020

CpG traffic lights are markers of regulatory regions in human genome.

BMC Genomics 2019 Feb 1;20(1):102. Epub 2019 Feb 1.

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, 119071, Russia.

Background: DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing based methods profile DNA methylation at a single CpG resolution, methylation levels are usually averaged over genomic regions in the downstream bioinformatic analysis.

Results: We demonstrate that on the genome level a single CpG methylation can serve as a more accurate predictor of gene expression than an average promoter / gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and expression of a gene nearby. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting involvement of DNA methylation in their regulation. Yet, binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or its close proximity. For the majority of TF, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding with a CpG TL being a reliable marker of such inactivation.

Conclusions: CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of a single CpG to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications, e.g. in the clinic, where measuring DNA methylation is easier than directly measuring gene expression due to the more stable nature of DNA.
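
As a rough illustration of the CpG TL definition given in the Results, the sketch below ranks per-sample methylation of individual CpGs against expression of a nearby gene using Spearman correlation. The function names, the toy data layout and the `min_abs_rho` cutoff are assumptions introduced here for illustration; in particular, the cutoff merely stands in for a significance test, which the abstract does not specify.

```python
import math
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def candidate_traffic_lights(
    methylation: Dict[str, List[float]],
    expression: List[float],
    min_abs_rho: float = 0.8,  # hypothetical cutoff, not from the paper
) -> Dict[str, float]:
    """Keep CpGs whose methylation correlates strongly with expression."""
    hits = {}
    for cpg, levels in methylation.items():
        rho = spearman(levels, expression)
        if abs(rho) >= min_abs_rho:
            hits[cpg] = rho
    return hits
```

Calling `candidate_traffic_lights` with one list of methylation levels per CpG (one value per sample) and a matching list of expression values returns only the CpGs that pass the cutoff, together with their correlation coefficients.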

DOI: http://dx.doi.org/10.1186/s12864-018-5387-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6359853
February 2019

DNA sequence features in the establishing of H3K27ac.

F1000Res 2018 8;7:165. Epub 2018 Feb 8.

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, 119071, Russian Federation.

The presence of H3K27me3 has been demonstrated to correlate with the CpG content. In this work, we tested whether H3K27ac has similar sequence preferences. We performed a translocation of DNA sequences with various properties into a beta-globin locus to control for the local chromatin environment. Our results suggest that, in contrast to H3K27me3, H3K27ac gain is unlikely to be affected by the CpG content of the underlying DNA sequence, while extremely high GC-content might contribute to the gain of H3K27ac.

DOI: http://dx.doi.org/10.12688/f1000research.13441.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5989146
August 2019

Prediction of lncRNAs and their interactions with nucleic acids: benchmarking bioinformatics tools.

Brief Bioinform 2019 03;20(2):551-564

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, Moscow, Russian Federation.

The genomes of mammalian species are pervasively transcribed, producing as many noncoding as protein-coding RNAs. There is a growing body of evidence supporting their functional role. Long noncoding RNA (lncRNA) can bind both nucleic acids and proteins through several mechanisms. A reliable computational prediction of the most probable mechanism of lncRNA interaction can facilitate experimental validation of its function. In this study, we benchmarked computational tools capable of discriminating lncRNA from mRNA and predicting lncRNA interactions with other nucleic acids. We assessed the performance of 9 tools for distinguishing protein-coding from noncoding RNAs, as well as 19 tools for prediction of RNA-RNA and RNA-DNA interactions. Our conclusions about the considered tools were based on their performances on the entire genome/transcriptome level, as it is the most common task nowadays. We found that FEELnc and CPAT distinguish between coding and noncoding mammalian transcripts in the most accurate manner. ASSA, RIBlast and LASTAL, as well as Triplexator, turned out to be the best predictors of RNA-RNA and RNA-DNA interactions, respectively. We showed that the normalization of the predicted interaction strength to the transcript length and GC content may improve the accuracy of inferring RNA interactions. Yet, all the current tools have difficulty making accurate predictions of short trans RNA-RNA interactions (stretches of sparse contacts). Overall, there is still room for improvement in each category, especially for predictions of RNA interactions.

DOI: http://dx.doi.org/10.1093/bib/bby032
March 2019

A novel method for improved accuracy of transcription factor binding site prediction.

Nucleic Acids Res 2018 07;46(12):e72

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.

Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to the large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs, while at the same time significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows on average 1.54-, 1.96- and 5.19-fold reduction of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that one can efficiently replace the PWM models for TFBS prediction by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and in stand-alone software freely available at http://cbrc.kaust.edu.sa/DRAF.

DOI: http://dx.doi.org/10.1093/nar/gky237
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037060
July 2018

Peripubertal serum dioxin concentrations and subsequent sperm methylome profiles of young Russian adults.

Reprod Toxicol 2018 06 14;78:40-49. Epub 2018 Mar 14.

Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkina St., 119991, Moscow, Russia; Chapaevsk Medical Association, 3a Meditsinskaya st., 446100, Chapaevsk, Samara region, Russia; A.N. Belozersky Research Institute of Physico-Chemical Biology, Moscow State University, Leninskye Gory, House 1, Building 40, 119992, Moscow, Russia.

Background: The association of exposure to endocrine disrupting chemicals in the peripubertal period with subsequent sperm DNA methylation is unknown.

Objective: We examined the association of peripubertal serum 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) concentrations with whole-genome bisulfite sequencing (WGBS) of sperm collected in young adulthood.

Methods: The Russian Children's Study is a prospective cohort of 516 boys who were enrolled at 8-9 years of age and provided semen samples at 18-19 years of age. WGBS of sperm was conducted to identify differentially methylated regions (DMR) between highest (n = 4) and lowest (n = 4) peripubertal TCDD groups.

Results: We found 52 DMRs that distinguished lowest and highest peripubertal serum TCDD concentrations. One of the top scoring networks, "Cellular Assembly and Organization, Cellular Function and Maintenance, Carbohydrate Metabolism", identified estrogen receptor alpha as its central regulator.

Conclusion: Findings from our limited sample size suggest that peripubertal environmental exposures are associated with sperm DNA methylation in young adults.

DOI: http://dx.doi.org/10.1016/j.reprotox.2018.03.007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6130911
June 2018

Purine-rich low complexity regions are potential RNA binding hubs in the human genome.

F1000Res 2018 17;7:76. Epub 2018 Jan 17.

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russian Federation.

Many long noncoding RNAs are bound to the chromatin and some of these interactions are mediated by triple helices. It is usually assumed that a transcript can form triplexes with a distinct set of genomic loci also known as triplex target sites (TTSs). Here we performed computational analyses of the TTSs that have been experimentally identified for particular RNAs. To assess the ability of these TTSs to bind other transcripts we developed a method to estimate the statistical significance of the predicted number of triplexes for a given RNA-DNA pair. We demonstrated that each DNA set included a subset of sequences that have the potential to form a statistically significant (adjusted p-value < 0.01) number of triplexes with the majority (>90%) of the analyzed transcripts. Due to the predicted ability of these DNA sequences to interact with a wide range of different RNAs, we called them "universal TTSs". While the universal TTSs were quite rare in the human genome (around 0.5%), they were more frequent (>15%) among the MEG3 binding sites (ChOP-seq peaks) and especially among the shared Capture-seq peaks (40%). The universal TTSs were enriched with the purine-rich low complexity regions. Nowadays, the role of the chromatin bound RNAs in the formation of 3D chromatin structure is actively discussed. We speculated that such universal TTSs may contribute to establishing long-distance chromosomal contacts and may facilitate distal enhancer-promoter interactions. All the scripts and the data files related to this study are available at: https://github.com/vanya-antonov/universal_tts.

DOI: http://dx.doi.org/10.12688/f1000research.13522.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518440
September 2019

HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.

Nucleic Acids Res 2018 01;46(D1):D252-D259

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia.

We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.

DOI: http://dx.doi.org/10.1093/nar/gkx1106
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753240
January 2018

Genome-Wide DNA Methylation Profiling Reveals Epigenetic Adaptation of Stickleback to Marine and Freshwater Conditions.

Mol Biol Evol 2017 09;34(9):2203-2213

Department of Vertebrate Genomics and Epigenomics, Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow, Russia.

The three-spined stickleback (Gasterosteus aculeatus) represents a convenient model to study microevolution, specifically adaptation to a freshwater environment. Although genetic adaptations to freshwater environments are well-studied, epigenetic adaptations have attracted little attention. In this work, we investigated the role of DNA methylation in the adaptation of the marine stickleback population to freshwater conditions. DNA methylation profiling was performed in marine and freshwater populations of sticklebacks, as well as in marine sticklebacks placed into a freshwater environment and freshwater sticklebacks placed into seawater. We showed that the DNA methylation profile after placing a marine stickleback into fresh water partially converged to that of a freshwater stickleback. For six genes including ATP4A ion pump and NELL1, believed to be involved in skeletal ossification, we demonstrated similar changes in DNA methylation in both evolutionary and short-term adaptation. This suggested that an immediate epigenetic response to freshwater conditions can be maintained in the freshwater population. Interestingly, we observed enhanced epigenetic plasticity in freshwater sticklebacks that may serve as a compensatory regulatory mechanism for the lack of genetic variation in the freshwater population. For the first time, we demonstrated that genes encoding ion channels KCND3, CACNA1FB, and ATP4A were differentially methylated between the marine and the freshwater populations. Other genes encoding ion channels were previously reported to be under selection in freshwater populations. Nevertheless, the genes that harbor genetic and epigenetic changes were not the same, suggesting that epigenetic adaptation is a complementary mechanism to selection of genetic variants favorable for the freshwater environment.

DOI: http://dx.doi.org/10.1093/molbev/msx156
September 2017

An integrated expression atlas of miRNAs and their promoters in human and mouse.

Nat Biotechnol 2017 Sep 21;35(9):872-878. Epub 2017 Aug 21.

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Japan.

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

DOI: http://dx.doi.org/10.1038/nbt.3947
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5767576
September 2017

An atlas of human long non-coding RNAs with accurate 5' ends.

Nature 2017 03 1;543(7644):199-204. Epub 2017 Mar 1.

Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane 4072, Australia.

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

DOI: http://dx.doi.org/10.1038/nature21374
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857182
March 2017

Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals.

Nucleic Acids Res 2017 01 27;45(D1):D737-D743. Epub 2016 Oct 27.

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources and that the scientific community will benefit from this recent update to deepen its understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw995
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210666
January 2017

Preservation of methylated CpG dinucleotides in human CpG islands.

Biol Direct 2016 Mar 22;11(1):11. Epub 2016 Mar 22.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, GSP-1, 119991, Russia.

Background: CpG dinucleotides are extensively underrepresented in mammalian genomes. It is widely accepted that genome-wide CpG depletion is predominantly caused by an elevated CpG > TpG mutation rate due to frequent cytosine methylation in the CpG context. Meanwhile the CpG content in genomic regions called CpG islands (CGIs) is noticeably higher. This observation is usually explained by lower CpG > TpG substitution rates within CGIs due to reduced cytosine methylation levels.

Results: By combining genome-wide data on substitutions and methylation levels in several human cell types we have shown that cytosine methylation in human sperm cells was strongly and consistently associated with increased CpG > TpG substitution rates. In contrast, this correlation was not observed for embryonic stem cells or fibroblasts. Surprisingly, the decreased sperm CpG methylation level was insufficient to explain the reduced CpG > TpG substitution rates in CGIs.

Conclusions: While cytosine methylation in human sperm cells is strongly associated with increased CpG > TpG substitution rates, substitution rates are significantly reduced within CGIs even after sperm CpG methylation levels and local GC content are controlled for. Our findings are consistent with strong negative selection preserving methylated CpGs within CGIs including intergenic ones.

Source
DOI: http://dx.doi.org/10.1186/s13062-016-0113-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4804638
March 2016
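
The CpG underrepresentation discussed in the abstract above is commonly quantified with an observed/expected CpG ratio. The following is a minimal sketch of that widely used measure, not the substitution-rate analysis performed in the paper; the function name `cpg_obs_exp` and the example sequences are illustrative assumptions.

```python
def cpg_obs_exp(sequence):
    """Observed/expected CpG ratio of a DNA sequence.

    Uses the common definition
        O/E = (number of CpG * sequence length) / (number of C * number of G).
    Returns 0.0 when the sequence has no C or no G, where the ratio is undefined.
    The function name is a hypothetical choice made for this sketch.
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * length / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    for name, seq in [("CpG-rich", "CGCGCG"), ("balanced", "CCGG"), ("no CpG", "CATG")]:
        print(name, cpg_obs_exp(seq))
```

A ratio well below 1 indicates CpG depletion relative to base composition, while values near or above 1 are typical of CpG islands.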

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models.

Nucleic Acids Res 2016 Jan 19;44(D1):D116-25. Epub 2015 Nov 19.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia.

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1249
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702883
January 2016
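
To illustrate how a mononucleotide PWM with a precomputed score threshold, as described in the abstract above, can be used for motif finding, here is a minimal sketch. It is not the HOCOMOCO command-line tooling: the matrix `TOY_PWM`, the threshold value, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy log-odds PWM: one dict of per-base scores per motif position.
# The values are invented for illustration and do not come from HOCOMOCO.
TOY_PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.1, "G": -0.7, "T": -0.4},
    {"A": -0.6, "C": -0.9, "G": 1.3, "T": -1.1},
]


def score_window(pwm, window):
    """Sum the per-position scores of a window as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def find_hits(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold.

    Scans the forward strand only; windows containing characters other than
    A, C, G, or T are skipped.
    """
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # The threshold 3.0 is an arbitrary assumption for this toy matrix.
    for start, window, score in find_hits(TOY_PWM, "TTACGAA", threshold=3.0):
        print(start, window, round(score, 2))
```

A fuller scan would also score the reverse complement, and a dinucleotide PWM would index each column by adjacent base pairs rather than single bases.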

EpiFactors: a comprehensive database of human epigenetic factors and complexes.

Database (Oxford) 2015 7;2015:bav067. Epub 2015 Jul 7.

Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway.

Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians.

Source
DOI: http://dx.doi.org/10.1093/database/bav067
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494013
March 2016

Insights into the Transcriptional Architecture of Behavioral Plasticity in the Honey Bee Apis mellifera.

Sci Rep 2015 Jun 15;5:11136. Epub 2015 Jun 15.

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.

Honey bee colonies exhibit an age-related division of labor, with worker bees performing discrete sets of behaviors throughout their lifespan. These behavioral states are associated with distinct brain transcriptomic states, yet little is known about the regulatory mechanisms governing them. We used CAGEscan (a variant of the Cap Analysis of Gene Expression technique) for the first time to characterize the promoter regions of differentially expressed brain genes during two behavioral states (brood care (aka "nursing") and foraging) and identified transcription factors (TFs) that may govern their expression. More than half of the differentially expressed TFs were associated with motifs enriched in the promoter regions of differentially expressed genes (DEGs), suggesting they are regulators of behavioral state. Strikingly, five TFs (nf-kb, egr, pax6, hairy, and clockwork orange) were predicted to co-regulate nearly half of the genes that were upregulated in foragers. Finally, differences in alternative TSS usage between nurses and foragers were detected upstream of 646 genes, whose functional analysis revealed enrichment for Gene Ontology terms associated with neural function and plasticity. This demonstrates for the first time that alternative TSSs are associated with stable differences in behavior, suggesting they may play a role in organizing behavioral state.

Source
DOI: http://dx.doi.org/10.1038/srep11136
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466890
June 2015

Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes.

PLoS One 2014 2;9(10):e109443. Epub 2014 Oct 2.

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia.

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109443
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4183604
December 2015

A promoter-level mammalian expression atlas.

Nature 2014 Mar;507(7493):462-70

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.

Source
DOI: http://dx.doi.org/10.1038/nature13182
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529748
March 2014

Effects of cytosine methylation on transcription factor binding sites.

BMC Genomics 2014 Mar 26;15:119. Epub 2014 Mar 26.

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.

Background: DNA methylation in promoters is closely linked to downstream gene repression. However, whether DNA methylation is a cause or a consequence of gene repression remains an open question. If it is a cause, then DNA methylation may affect the affinity of transcription factors (TFs) for their binding sites (TFBSs). If it is a consequence, then gene repression caused by chromatin modification may be stabilized by DNA methylation. Until now, these two possibilities have been supported only by non-systematic evidence and they have not been tested on a wide range of TFs. Studies usually use average promoter methylation, whereas recent results suggest that methylation of individual cytosines can also be important.

Results: We found that the methylation profiles of 16.6% of cytosines and the expression profiles of neighboring transcriptional start sites (TSSs) were significantly negatively correlated. We called the CpGs corresponding to such cytosines "traffic lights". We observed a strong selection against CpG "traffic lights" within TFBSs. The negative selection was stronger for transcriptional repressors as compared with transcriptional activators or multifunctional TFs as well as for core TFBS positions as compared with flanking TFBS positions.

Conclusions: Our results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to special cases and cannot be considered as a general regulatory mechanism of transcription.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-15-119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986887
March 2014
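
The "traffic lights" in the abstract above are CpGs whose methylation profile across samples is significantly negatively correlated with the expression of a neighboring TSS. The sketch below shows the general idea with a plain Pearson correlation and a fixed cutoff. Both are assumptions made for illustration: the abstract does not state which correlation measure or significance test was used, and the cutoff of -0.5 merely stands in for a proper significance test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def is_candidate_traffic_light(methylation, expression, cutoff=-0.5):
    """Flag a CpG whose methylation anti-correlates with TSS expression.

    `cutoff` is a hypothetical stand-in for a significance test.
    """
    return pearson(methylation, expression) <= cutoff


if __name__ == "__main__":
    # Invented per-sample values: methylation fraction and TSS expression.
    methylation = [0.1, 0.4, 0.7, 0.9]
    expression = [9.0, 6.0, 3.0, 1.0]
    print(is_candidate_traffic_light(methylation, expression))
```

In practice the correlation would be computed for every CpG and nearby TSS pair across many cell types, with multiple-testing correction applied before any CpG is called.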

Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation.

Mol Immunol 2014 Dec 16;62(2):305-14. Epub 2014 Feb 16.

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia; Faculty of Chemistry, Lomonosov Moscow State University, Moscow, Russia; Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia.

The mechanisms triggering most autoimmune diseases are still obscure. Autoreactive B cells play a crucial role in the development of such pathologies and, in particular, in the production of autoantibodies of different specificities. The combination of deep-sequencing technology with functional studies of antibodies selected from highly representative immunoglobulin combinatorial libraries may provide unique information on specific features in the repertoires of autoreactive B cells. Here, we have analyzed cross-combinations of the variable regions of human immunoglobulins against the myelin basic protein (MBP) previously selected from a multiple sclerosis (MS)-related scFv phage-display library. In addition, we have performed deep sequencing of the sublibraries of scFvs against MBP, Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1), and myelin oligodendrocyte glycoprotein (MOG). Bioinformatics analysis of sequencing data and surface plasmon resonance (SPR) studies have shown that it is the variable fragments of antibody heavy chains that mainly determine both the affinity of antibodies to the parent autoantigen and their cross-reactivity. It is suggested that LMP1-cross-reactive anti-myelin autoantibodies contain heavy chains encoded by certain germline gene segments, which may be a hallmark of the EBV-specific B cell subpopulation involved in MS triggering.

Source
DOI: http://dx.doi.org/10.1016/j.molimm.2014.01.013
December 2014

Regional differences in gene expression and promoter usage in aged human brains.

Neurobiol Aging 2013 Jul 19;34(7):1825-36. Epub 2013 Feb 19.

Section Medical Genomics, Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands.

To characterize the promoterome of caudate and putamen regions (striatum), frontal and temporal cortices, and hippocampi from aged human brains, we used high-throughput cap analysis of gene expression to profile the transcription start sites and to quantify the differences in gene expression across the 5 brain regions. We also analyzed the extent to which methylation influenced the observed expression profiles. We sequenced more than 71 million cap analysis of gene expression tags corresponding to 70,202 promoter regions and 16,888 genes. More than 7000 transcripts were differentially expressed, mainly because of differential alternative promoter usage. Unexpectedly, 7% of differentially expressed genes were neurodevelopmental transcription factors. Functional pathway analysis on the differentially expressed genes revealed an overrepresentation of several signaling pathways (e.g., fibroblast growth factor and wnt signaling) in hippocampus and striatum. We also found that although 73% of methylation signals mapped within genes, the influence of methylation on the expression profile was small. Our study underscores alternative promoter usage as an important mechanism for determining the regional differences in gene expression at old age.

Source
DOI: http://dx.doi.org/10.1016/j.neurobiolaging.2013.01.005
July 2013

HOCOMOCO: a comprehensive collection of human transcription factor binding sites models.

Nucleic Acids Res 2013 Jan 21;41(Database issue):D195-202. Epub 2012 Nov 21.

Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.

Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically improves model quality, probably owing to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1089
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531053
January 2013
-->