Publications by authors named "Ivan Kulakovskiy"

43 Publications

Assessing Ribosome Distribution Along Transcripts with Polarity Scores and Regression Slope Estimates.

Methods Mol Biol 2021 ;2252:269-294

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

During translation, the rate of ribosome movement along mRNA varies. This leads to a non-uniform ribosome distribution along the transcript, depending on local mRNA sequence and structure, tRNA availability, and translation factor abundance, as well as on the relationship between the overall rates of initiation, elongation, and termination. Stress, antibiotics, and genetic perturbations affecting the composition and properties of the translation machinery can alter the positional distribution of ribosomes dramatically. Here, we offer a computational protocol for analyzing positional distribution profiles using ribosome profiling (Ribo-Seq) data. The protocol uses papolarity, a new Python toolkit for the analysis of transcript-level short read coverage profiles. For a single sample, papolarity computes the classic polarity metric for each transcript, which, in the case of Ribo-Seq, reflects ribosome positional preferences. For comparison against a control sample, papolarity estimates an improved metric, the relative linear regression slope of coverage along the transcript length. This involves de-noising by profile segmentation with a Poisson model and aggregation of Ribo-Seq coverage within segments, thus achieving reliable estimates of the regression slope. The papolarity software and the associated protocol can be conveniently used for Ribo-Seq data analysis in a command-line Linux environment. The papolarity package is available through the Python pip package manager. The source code is available at https://github.com/autosome-ru/papolarity .
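To illustrate what the classic polarity metric measures, here is a minimal Python sketch. It assumes the commonly used definition in which position i of a profile of length L gets the weight (2i - (L + 1)) / (L - 1), running from -1 at the 5' end to +1 at the 3' end, and the score is the coverage-weighted mean of these weights. This is not papolarity's code: the function names are hypothetical, and the Poisson segmentation and control-relative slope described above are not reproduced. The second function is a plain least-squares slope of mean-normalized coverage against relative position, included only to show what a regression slope of coverage along transcript length means.

```python
from typing import Sequence


def polarity_score(coverage: Sequence[float]) -> float:
    """Classic polarity score of a per-position coverage profile.

    Position i (1-based) of a profile of length L gets the weight
    (2*i - (L + 1)) / (L - 1), from -1 at the 5' end to +1 at the 3' end.
    The score is the coverage-weighted mean of these weights.
    """
    length = len(coverage)
    total = sum(coverage)
    if length < 2 or total <= 0:
        raise ValueError("need at least two positions and non-zero total coverage")
    weighted = sum(
        value * (2 * i - (length + 1)) / (length - 1)
        for i, value in enumerate(coverage, start=1)
    )
    return weighted / total


def coverage_slope(coverage: Sequence[float]) -> float:
    """Least-squares slope of mean-normalized coverage vs relative position in [0, 1]."""
    length = len(coverage)
    total = sum(coverage)
    if length < 2 or total <= 0:
        raise ValueError("need at least two positions and non-zero total coverage")
    xs = [i / (length - 1) for i in range(length)]
    ys = [value * length / total for value in coverage]  # mean coverage becomes 1
    mean_x = sum(xs) / length
    mean_y = sum(ys) / length
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


if __name__ == "__main__":
    print(polarity_score([5, 0, 0]))  # all coverage at the 5' end: -1.0
    print(polarity_score([0, 0, 5]))  # all coverage at the 3' end: 1.0
```

A positive score or slope indicates coverage shifted toward the 3' end of the profile; a negative one indicates a 5' bias.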

Source
DOI: http://dx.doi.org/10.1007/978-1-0716-1150-0_13
January 2021

GTRD: an integrated view of transcription regulation.

Nucleic Acids Res 2021 01;49(D1):D104-D111

BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.

The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778956
January 2021

A holistic view of mouse enhancer architectures reveals analogous pleiotropic effects and correlation with human disease.

BMC Genomics 2020 Nov 2;21(1):754. Epub 2020 Nov 2.

Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK.

Background: Efforts to elucidate the function of enhancers in vivo are underway, but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype.

Results: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome. Unexpectedly, we demonstrate that both enhancer types are preferentially associated with relevant 'tissue-type' phenotypes and exhibit no difference in phenotype effect size or pleiotropy. Modelling regulatory data alongside molecular data, we built a predictive model to infer gene-phenotype associations and used this model to predict potentially novel disease-associated genes.

Conclusion: Overall, our findings reveal that differing enhancer architectures have a similar impact on mammalian phenotypes whilst harbouring differing cellular and expression effects. Together, our results systematically characterise enhancers with predicted phenotypic traits, supporting a role for both types of enhancers in human disease and disorders.

Source
DOI: http://dx.doi.org/10.1186/s12864-020-07109-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7607678
November 2020

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Authors:
Jordan A Ramilowski, Chi Wai Yip, Saumya Agrawal, Jen-Chien Chang, Yari Ciani, Ivan V Kulakovskiy, Mickaël Mendez, Jasmine Li Ching Ooi, John F Ouyang, Nick Parkinson, Andreas Petri, Leonie Roos, Jessica Severin, Kayoko Yasuzawa, Imad Abugessaisa, Altuna Akalin, Ivan V Antonov, Erik Arner, Alessandro Bonetti, Hidemasa Bono, Beatrice Borsari, Frank Brombacher, Christopher JF Cameron, Carlo Vittorio Cannistraci, Ryan Cardenas, Melissa Cardon, Howard Chang, Josée Dostie, Luca Ducoli, Alexander Favorov, Alexandre Fort, Diego Garrido, Noa Gil, Juliette Gimenez, Reto Guler, Lusy Handoko, Jayson Harshbarger, Akira Hasegawa, Yuki Hasegawa, Kosuke Hashimoto, Norihito Hayatsu, Peter Heutink, Tetsuro Hirose, Eddie L Imada, Masayoshi Itoh, Bogumil Kaczkowski, Aditi Kanhere, Emily Kawabata, Hideya Kawaji, Tsugumi Kawashima, S Thomas Kelly, Miki Kojima, Naoto Kondo, Haruhiko Koseki, Tsukasa Kouno, Anton Kratz, Mariola Kurowska-Stolarska, Andrew Tae Jun Kwon, Jeffrey Leek, Andreas Lennartsson, Marina Lizio, Fernando López-Redondo, Joachim Luginbühl, Shiori Maeda, Vsevolod J Makeev, Luigi Marchionni, Yulia A Medvedeva, Aki Minoda, Ferenc Müller, Manuel Muñoz-Aguirre, Mitsuyoshi Murata, Hiromi Nishiyori, Kazuhiro R Nitta, Shuhei Noguchi, Yukihiko Noro, Ramil Nurtdinov, Yasushi Okazaki, Valerio Orlando, Denis Paquette, Callum J C Parr, Owen J L Rackham, Patrizia Rizzu, Diego Fernando Sánchez Martinez, Albin Sandelin, Pillay Sanjana, Colin A M Semple, Youtaro Shibayama, Divya M Sivaraman, Takahiro Suzuki, Suzannah C Szumowski, Michihira Tagami, Martin S Taylor, Chikashi Terao, Malte Thodberg, Supat Thongjuea, Vidisha Tripathi, Igor Ulitsky, Roberto Verardo, Ilya E Vorontsov, Chinatsu Yamamoto, Robert S Young, J Kenneth Baillie, Alistair R R Forrest, Roderic Guigó, Michael M Hoffman, Chung Chau Hon, Takeya Kasukawa, Sakari Kauppinen, Juha Kere, Boris Lenhard, Claudio Schneider, Harukazu Suzuki, Ken Yagi, Michiel J L de Hoon, Jay W Shin, Piero Carninci

Genome Res 2020 07 27;30(7):1060-1072. Epub 2020 Jul 27.

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for and .

Source
DOI: http://dx.doi.org/10.1101/gr.254219.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397864
July 2020

Multifaceted deregulation of gene expression and protein synthesis with age.

Proc Natl Acad Sci U S A 2020 07 23;117(27):15581-15590. Epub 2020 Jun 23.

Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115;

Protein synthesis represents a major metabolic activity of the cell. However, how it is affected by aging and how this in turn impacts cell function remains largely unexplored. To address this question, herein we characterized age-related changes in both the transcriptome and translatome of mouse tissues over the entire life span. We showed that the transcriptome changes govern those in the translatome and are associated with altered expression of genes involved in inflammation, extracellular matrix, and lipid metabolism. We also identified genes that may serve as candidate biomarkers of aging. At the translational level, we uncovered sustained down-regulation of a set of 5'-terminal oligopyrimidine (5'-TOP) transcripts encoding protein synthesis and ribosome biogenesis machinery and regulated by the mTOR pathway. For many of them, ribosome occupancy dropped twofold or even more. Moreover, with age, ribosome coverage gradually decreased in the vicinity of start codons and increased near stop codons, revealing complex age-related changes in the translation process. Taken together, our results reveal systematic and multidimensional deregulation of protein synthesis, showing how this major cellular process declines with age.

Source
DOI: http://dx.doi.org/10.1073/pnas.2001788117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7354943
July 2020

Brain-related genes are specifically enriched with long phase 1 introns.

PLoS One 2020 May 29;15(5):e0233978. Epub 2020 May 29.

Institute of Mathematical Problems of Biology RAS-the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region, Russia.

Intronic gene regions are mostly considered in the scope of gene expression regulation, such as alternative splicing. However, relations between basic statistical properties of introns are studied in detail much more rarely, despite the vast available data. In particular, little is known about the relationship between intron length and intron phase. The intron phase distribution differs significantly at different intron length thresholds. In this study, we performed GO enrichment analysis of gene sets with a particular intron phase at varying intron length thresholds using a list of 13823 orthologous human-mouse gene pairs. We found a specific group of 153 genes with phase 1 introns longer than 50 kilobases that were specifically expressed in the brain, functionally related to synaptic signaling, and strongly associated with schizophrenia and other mental disorders. We propose that the prevalence of long phase 1 introns arises from the presence of the signal peptide sequence and is connected with 1-1 exon shuffling.
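Intron phase is defined by where an intron interrupts the reading frame: phase 0 between codons, phase 1 after the first nucleotide of a codon, phase 2 after the second. The following minimal sketch assumes that all introns lie inside the coding sequence and that the coding length of each exon is known in transcript order. The function names are hypothetical, and the 50 kb default only mirrors the threshold quoted above; this is not the authors' pipeline.

```python
from typing import List, Sequence


def intron_phases(cds_exon_lengths: Sequence[int]) -> List[int]:
    """Phase of each intron, given coding lengths of exons in transcript order.

    The phase of the intron after exon k is the number of coding nucleotides
    upstream of it, modulo 3 (0 = between codons, 1 or 2 = inside a codon).
    """
    phases = []
    coding_so_far = 0
    for exon_length in cds_exon_lengths[:-1]:  # the last exon has no downstream intron
        coding_so_far += exon_length
        phases.append(coding_so_far % 3)
    return phases


def has_long_phase1_intron(
    cds_exon_lengths: Sequence[int],
    intron_lengths: Sequence[int],
    min_length: int = 50_000,
) -> bool:
    """True if any phase 1 intron is longer than min_length nucleotides."""
    phases = intron_phases(cds_exon_lengths)
    if len(phases) != len(intron_lengths):
        raise ValueError("expected one intron between each pair of adjacent exons")
    return any(
        phase == 1 and length > min_length
        for phase, length in zip(phases, intron_lengths)
    )


# Hypothetical gene: three coding exons of 100, 50 and 30 nt.
print(intron_phases([100, 50, 30]))  # [1, 0]
print(has_long_phase1_intron([100, 50, 30], [60_000, 1_000]))  # True
```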

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0233978
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7259759
August 2020

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: The positional weight matrix (PWM) is a de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
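For readers unfamiliar with how a PWM is applied to a sequence, here is a minimal sketch: per-column counts are converted to log2 odds against a uniform background, and a sequence is scored by the best window on either strand. The pseudocount, the uniform background, and the function names are assumptions chosen for illustration; this is not the benchmarking protocol used in the study.

```python
import math
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pwm_from_counts(
    counts: Sequence[Dict[str, float]],
    pseudocount: float = 0.25,
    background: float = 0.25,
) -> List[Dict[str, float]]:
    """Log-odds PWM (log2 of frequency over a uniform background) from per-column counts."""
    pwm = []
    for column in counts:
        total = sum(column.get(n, 0.0) for n in NUCLEOTIDES) + 4 * pseudocount
        pwm.append({
            n: math.log2((column.get(n, 0.0) + pseudocount) / total / background)
            for n in NUCLEOTIDES
        })
    return pwm


def best_site_score(pwm: Sequence[Dict[str, float]], sequence: str) -> float:
    """Highest PWM score over all windows on both strands (uppercase ACGT only)."""
    width = len(pwm)
    best = -math.inf
    # Scan the forward strand and its reverse complement.
    for strand in (sequence, sequence.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            score = sum(pwm[j][strand[start + j]] for j in range(width))
            best = max(best, score)
    return best


# Hypothetical 3-column motif that prefers "ACG".
motif = pwm_from_counts([{"A": 10}, {"C": 10}, {"G": 10}])
print(best_site_score(motif, "TTACGTT") > best_site_score(motif, "TTTTTTT"))  # True
```

In a benchmark of this kind, such scores would then be compared between bound and unbound sequences to compute performance measures.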

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020

Minor C allele of the SNP rs7873784 associated with rheumatoid arthritis and type-2 diabetes mellitus binds PU.1 and enhances TLR4 expression.

Biochim Biophys Acta Mol Basis Dis 2020 03 28;1866(3):165626. Epub 2019 Nov 28.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia; Biological Faculty, Lomonosov Moscow State University, 119234 Moscow, Russia.

Toll-like receptor 4 (TLR4) is an innate immunity receptor predominantly expressed on myeloid cells and involved in the development of various diseases, many of them with complex genetics. Here we present data on the functionality of the single nucleotide polymorphism rs7873784, located in the 3'-untranslated region (3'-UTR) of the TLR4 gene and associated with various pathologies involving chronic inflammation. We demonstrate that the TLR4 3'-UTR strongly enhanced the activity of the TLR4 promoter in the U937 human monocytic cell line, while the minor rs7873784(C) allele created a binding site for the transcription factor PU.1 (encoded by the SPI1 gene), a known regulator of TLR4 expression. Increased binding of PU.1 further augmented TLR4 transcription, while PU.1 knockdown or complete disruption of the PU.1 binding site abrogated the effect. We hypothesize that the additional functional PU.1 site may increase TLR4 expression in individuals carrying the minor C variant of rs7873784 and modulate the development of certain pathologies, such as rheumatoid arthritis and type-2 diabetes mellitus.

Source
DOI: http://dx.doi.org/10.1016/j.bbadis.2019.165626
March 2020

What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants.

Front Genet 2019 Oct 31;10:1078. Epub 2019 Oct 31.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
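The information leakage described above arises when variants from the same regulatory region end up in both the training and the validation data. One common remedy, sketched here as an assumption rather than the authors' code, is to hold out whole regions: each variant carries a hypothetical region identifier, and entire regions are assigned to either the training or the validation set.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Variant = Tuple[Hashable, int, str]  # (region_id, position, alternative_base)


def split_by_region(
    variants: Sequence[Variant],
    holdout_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Variant], List[Variant]]:
    """Assign whole regions to training or validation so that no region is shared."""
    regions = sorted({variant[0] for variant in variants}, key=str)
    rng = random.Random(seed)
    rng.shuffle(regions)
    n_holdout = max(1, round(len(regions) * holdout_fraction))
    held_out = set(regions[:n_holdout])
    train = [v for v in variants if v[0] not in held_out]
    validation = [v for v in variants if v[0] in held_out]
    return train, validation


variants = [
    (region, pos, "A")
    for region in ("enh1", "enh2", "prom1", "prom2", "prom3")
    for pos in range(3)
]
train, validation = split_by_region(variants)
print({v[0] for v in train} & {v[0] for v in validation})  # set()
```

A random per-variant split would instead place neighboring positions of the same element on both sides, which is the scenario the abstract identifies as inflating measured prediction quality.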

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.01078
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6834773
October 2019

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay.

Hum Mutat 2019 09 23;40(9):1280-1291. Epub 2019 Jun 23.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
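Because reporter expression is measured relative to plasmid DNA, a variant's effect is often summarized as the change in log2(RNA/DNA) between the variant and the reference construct. The following minimal sketch assumes library-size-normalized counts and a pseudocount of 1; both are assumptions, and the challenge's own processing may differ.

```python
import math


def reporter_activity(rna: float, dna: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of reporter RNA to plasmid DNA for one construct (normalized counts)."""
    return math.log2((rna + pseudocount) / (dna + pseudocount))


def variant_effect(ref_rna, ref_dna, alt_rna, alt_dna, pseudocount: float = 1.0) -> float:
    """Change in log2(RNA/DNA) of the variant construct relative to the reference."""
    return reporter_activity(alt_rna, alt_dna, pseudocount) - reporter_activity(
        ref_rna, ref_dna, pseudocount
    )


print(variant_effect(ref_rna=99, ref_dna=99, alt_rna=199, alt_dna=99))  # 1.0
```

A positive value means the substitution increases reporter output; a negative value means it decreases it.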

Source
DOI: http://dx.doi.org/10.1002/humu.23797
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879779
September 2019

The Reduced Level of Inorganic Polyphosphate Mobilizes Antioxidant and Manganese-Resistance Systems in .

Cells 2019 05 15;8(5). Epub 2019 May 15.

Skryabin Institute of Biochemistry and Physiology of Microorganisms, FRC Pushchino Center for Biological Research of the Russian Academy of Sciences, pr. Nauki 5, Pushchino 142290, Russia.

Inorganic polyphosphate (polyP) is crucial for adaptive reactions and stress response in microorganisms. A convenient model to study the role of polyP in yeast is the strain CRN/PPN1, which overexpresses the polyphosphatase Ppn1 and has a stably decreased polyphosphate level. In this study, we combined whole-transcriptome sequencing, fluorescence microscopy, and polyP quantification to characterize the CRN/PPN1 response to manganese and oxidative stresses. CRN/PPN1 exhibits enhanced resistance to manganese and peroxide due to a pre-adaptive state observed under normal conditions. The pre-adaptive state is characterized by up-regulated genes involved in response to an external stimulus, plasma membrane organization, and oxidation/reduction. The transcriptome-wide data allowed the identification of particular genes crucial for overcoming manganese excess. The key gene responsible for manganese resistance is encoding a low-affinity manganese transporter: Strong down-regulation in CRN/PPN1 increases manganese resistance by reduced manganese uptake. On the contrary, , the top up-regulated gene in CRN/PPN1, is also strongly up-regulated in the manganese-adapted parent strain. Phm7 is an unannotated protein, but manganese adaptation is significantly impaired in Δ, thus suggesting its essential function in manganese or phosphate transport.

Source
DOI: http://dx.doi.org/10.3390/cells8050461
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6562782
May 2019

svist4get: a simple visualization tool for genomic tracks from sequencing experiments.

BMC Bioinformatics 2019 Mar 6;20(1):113. Epub 2019 Mar 6.

Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye gory 1, Moscow, 119234, Russia.

Background: High-throughput sequencing often provides a foundation for experimental analyses in the life sciences. For many such methods, an intermediate layer of bioinformatics data analysis is the genomic signal track constructed by short read mapping to a particular genome assembly. There are many software tools to visualize genomic tracks in a web browser or with a stand-alone graphical user interface. However, there are only a few command-line applications suitable for automated usage or production of publication-ready visualizations.

Results: Here we present svist4get, a command-line tool for customizable generation of publication-quality figures based on data from genomic signal tracks. Similarly to generic genome browser software, svist4get visualizes signal tracks at a given genomic location and is able to aggregate data from several tracks on a single plot along with the transcriptome annotation. The resulting plots can be saved as vector or high-resolution bitmap images. We demonstrate practical use cases of svist4get for Ribo-Seq and RNA-Seq data.

Conclusions: svist4get is implemented in Python 3 and runs on Linux. The command-line interface of svist4get allows for easy integration into bioinformatics pipelines in a console environment. Extra customization is possible through configuration files and a Python API. For convenience, svist4get is provided as a PyPI package. The source code is available at https://bitbucket.org/artegorov/svist4get/.

Source
DOI: http://dx.doi.org/10.1186/s12859-019-2706-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404320
March 2019

Translatome and transcriptome analysis of TMA20 (MCT-1) and TMA64 (eIF2D) knockout yeast strains.

Data Brief 2019 Apr 2;23:103701. Epub 2019 Feb 2.

Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119234 Russia.

TMA20 (MCT-1), TMA22 (DENR) and TMA64 (eIF2D) are eukaryotic translation factors involved in ribosome recycling and re-initiation. They operate with P-site bound tRNA in post-termination or (re-)initiation translation complexes, thus participating in the removal of the 40S ribosomal subunit from mRNA stop codons after termination and controlling translation re-initiation on mRNAs with upstream open reading frames (uORFs), as well as initiation on some specific mRNAs. Here we report ribosomal profiling data of strains with individual deletions of , or both and genes. We provide RNA-Seq and Ribo-Seq data from yeast strains grown in the rich YPD or minimal SD medium. We illustrate our data by plotting the differential distribution of ribosome-bound mRNA fragments throughout uORFs in the 5'-untranslated region (5' UTR) of GCN4 mRNA and on mRNA transcripts encoded in the MAT locus in the mutant and wild-type strains, thus providing a basis for investigating the role of these factors in the stress response, mating and sporulation. We also document a shift of the transcription start site of the gene which occurs when the neighboring gene is replaced by the standard G418-resistance cassette used for the creation of the Yeast Deletion Library. This shift results in dramatic deregulation of the gene expression, as revealed by our Ribo-Seq data, which can probably be used to explain strong genetic interactions of with genes involved in the cell cycle and mitotic checkpoints. Raw RNA-Seq and Ribo-Seq data as well as all gene counts are available in the NCBI Gene Expression Omnibus (GEO) repository under GEO accession GSE122039 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE122039).

Source
DOI: http://dx.doi.org/10.1016/j.dib.2019.103701
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6378902
April 2019

An update to database TraVA: organ-specific cold stress response in Arabidopsis thaliana.

BMC Plant Biol 2019 Feb 15;19(Suppl 1):49. Epub 2019 Feb 15.

Institute for Information Transmission Problems of the Russian Academy of Sciences, Bolshoy Karetny per. 19, build.1, Moscow, 127051, Russia.

Background: A transcriptome map is a powerful tool for a variety of biological studies; transcriptome maps that include different organs, tissues, cells and stages of development are currently available for at least 30 plants. Some of them include samples treated with environmental or biotic stresses. However, most studies explore only a limited set of organs and developmental stages (leaves or seedlings). To provide a broader view of organ-specific strategies of cold stress response, we used RNA-seq to study expression changes that follow exposure to cold (+ 4 °C) in different aerial parts of the plant: cotyledons, hypocotyl, leaves, young flowers, mature flowers and seeds.

Results: The results on differential expression in leaves are congruent with current knowledge of stress response pathways, in particular, the role of CBF genes. In other organs, both the essence and the dynamics of gene expression changes are different. We show that genes confined to narrow expression patterns under non-stress conditions are involved in the stress response. In particular, the genes that control cell wall modification in pollen are activated in leaves. In seeds, the predominant pattern is a change in lipid metabolism.

Conclusions: Stress response is highly organ-specific; different pathways are involved in this process in each organ type. The results were integrated with the previously published transcriptome map of Arabidopsis thaliana and used for an update of the public database TraVA: http://travadb.org/browse/Species=AthStress .

Source
DOI: http://dx.doi.org/10.1186/s12870-019-1636-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6393959
February 2019

CpG traffic lights are markers of regulatory regions in human genome.

BMC Genomics 2019 Feb 1;20(1):102. Epub 2019 Feb 1.

Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, 119071, Russia.

Background: DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing based methods profile DNA methylation at single-CpG resolution, methylation levels are usually averaged over genomic regions in the downstream bioinformatic analysis.

Results: We demonstrate that, at the genome level, the methylation of a single CpG can serve as a more accurate predictor of gene expression than the average promoter or gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and the expression of a nearby gene. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting the involvement of DNA methylation in their regulation. Yet, the binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or their close proximity. For the majority of TFs, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding, with a CpG TL being a reliable marker of such inactivation.

Conclusions: CpG TL provide a promising insight into the mechanisms of enhancer activity and gene regulation, linking the methylation of a single CpG to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications such as the clinic, where measuring DNA methylation is easier than directly measuring gene expression due to the more stable nature of DNA.
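The CpG traffic light definition rests on a per-CpG correlation between methylation and the expression of a nearby gene across samples. The following simplified sketch assumes Spearman correlation and a fixed |rho| cutoff in place of a formal significance test; the cutoff, the sample values, and the function names are hypothetical and do not reproduce the study's statistics.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no rank information
    return sxy / math.sqrt(sxx * syy)


def candidate_traffic_lights(
    methylation: Dict[str, Sequence[float]],
    expression: Sequence[float],
    min_abs_rho: float = 0.8,
) -> List[str]:
    """CpGs whose methylation tracks expression strongly across samples."""
    return [
        cpg for cpg, levels in methylation.items()
        if abs(spearman(levels, expression)) >= min_abs_rho
    ]


# Hypothetical data: four samples, two CpGs near one gene.
methylation = {"cpg1": [0.9, 0.7, 0.5, 0.1], "cpg2": [0.5, 0.2, 0.6, 0.3]}
expression = [1.0, 2.0, 3.0, 4.0]
print(candidate_traffic_lights(methylation, expression))  # ['cpg1']
```

Here cpg1 is perfectly anti-correlated with expression (rho = -1), the typical pattern for repressive methylation, while cpg2 shows no rank correlation.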

Source
DOI: http://dx.doi.org/10.1186/s12864-018-5387-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6359853
February 2019

Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data.

BMC Res Notes 2018 Oct 23;11(1):756. Epub 2018 Oct 23.

Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991.

Objectives: Mammalian genomics studies, especially those focusing on transcriptional regulation, require information on genomic locations of regulatory regions, particularly, transcription factor (TF) binding sites. There are plenty of published ChIP-Seq data on in vivo binding of transcription factors in different cell types and conditions. However, handling of thousands of separate data sets is often impractical and it is desirable to have a single global map of genomic regions potentially bound by a particular TF in any of studied cell types and conditions.

Data Description: Here we report human and mouse cistromes, the maps of genomic regions that are routinely identified as TF binding sites, organized by TF. We provide cistromes for 349 mouse and 599 human TFs. Given a TF, its cistrome regions are supported by evidence from several ChIP-Seq experiments or several computational tools, and, as an optional filter, contain occurrences of sequence motifs recognized by the TF. Using the cistrome, we provide an annotation of TF binding sites in the vicinity of human and mouse transcription start sites. This information is useful for selecting potential gene targets of transcription factors and detecting co-regulated genes in differential gene expression data.
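The cistrome regions are described as supported by evidence from several ChIP-Seq experiments. The following sketch is a simplified illustration of such a support criterion, not the authors' procedure, which also draws on several computational tools and optional motif filtering. Peaks from all experiments for one TF are merged where they overlap, and merged regions are kept only if they come from at least a minimum number of distinct experiments. Intervals are assumed to be half-open (start, end) on named chromosomes; the threshold of two and the function name are arbitrary examples.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open


def supported_regions(
    peak_sets: Sequence[Sequence[Peak]],
    min_support: int = 2,
) -> List[Tuple[str, int, int, int]]:
    """Merge overlapping peaks across experiments and keep well-supported regions.

    Returns (chromosome, start, end, number_of_supporting_experiments).
    """
    tagged = sorted(
        (chrom, start, end, experiment)
        for experiment, peaks in enumerate(peak_sets)
        for chrom, start, end in peaks
    )
    merged = []  # each item: [chrom, start, end, set_of_experiment_indices]
    for chrom, start, end, experiment in tagged:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(experiment)
        else:
            merged.append([chrom, start, end, {experiment}])
    return [
        (chrom, start, end, len(experiments))
        for chrom, start, end, experiments in merged
        if len(experiments) >= min_support
    ]


experiments = [
    [("chr1", 100, 200), ("chr2", 50, 80)],
    [("chr1", 150, 250)],
    [("chr1", 1000, 1100)],
]
print(supported_regions(experiments))  # [('chr1', 100, 250, 2)]
```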

Source
DOI: http://dx.doi.org/10.1186/s13104-018-3856-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6199713
October 2018

De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus.

BMC Evol Biol 2017 12 28;17(Suppl 2):258. Epub 2017 Dec 28.

Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia.

Background: The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of the gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments.

Results: In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales.

Conclusions: This preliminary study of the gray whale genome and transcriptome provides new data for better understanding whale evolution and the mechanisms of whale adaptation to hypoxic conditions.

Source
DOI: http://dx.doi.org/10.1186/s12862-017-1103-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751776
December 2017

HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.

Nucleic Acids Res 2018 01;46(D1):D252-D259

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia.

We present a major update of the HOCOMOCO collection, which consists of patterns describing the DNA binding specificities of human and mouse transcription factors. In this release, we took advantage of a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based only on in vitro data, and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe the primary binding preferences of each transcription factor as well as reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO with MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru), which applies HOCOMOCO models to visualize binding sites in short DNA sequences.
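To show what distinguishes the dinucleotide matrices mentioned above from mononucleotide ones, here is a minimal scoring sketch. A dinucleotide PWM assigns a weight to each of the 16 dinucleotides at every pair of adjacent motif positions, so it can reward combinations of neighboring bases that a mononucleotide PWM scores independently. The data layout (a list of dicts), the function names and the toy weights are assumptions for illustration, not the HOCOMOCO file format or its tools.

```python
from itertools import product
from typing import Dict, List, Tuple

NUCS = "ACGT"
DINUCS = ["".join(p) for p in product(NUCS, repeat=2)]  # "AA", "AC", ..., "TT"

# One dict per pair of adjacent motif positions (i, i+1),
# mapping each of the 16 dinucleotides to a weight.
DiPWM = List[Dict[str, float]]


def dipwm_score(matrix: DiPWM, word: str) -> float:
    """Score a word whose length is len(matrix) + 1."""
    if len(word) != len(matrix) + 1:
        raise ValueError("word length must equal matrix length + 1")
    return sum(matrix[i][word[i:i + 2]] for i in range(len(matrix)))


def best_hit(matrix: DiPWM, seq: str) -> Tuple[int, float]:
    """Return (position, score) of the best-scoring window on the given strand only."""
    width = len(matrix) + 1
    scores = [(i, dipwm_score(matrix, seq[i:i + width])) for i in range(len(seq) - width + 1)]
    return max(scores, key=lambda hit: hit[1])


# Toy matrix for a 3-bp motif rewarding "CA" at positions 0-1 and "AT" at positions 1-2.
toy = [
    {d: (1.0 if d == "CA" else -1.0) for d in DINUCS},
    {d: (1.0 if d == "AT" else -1.0) for d in DINUCS},
]
print(dipwm_score(toy, "CAT"))  # 2.0
print(best_hit(toy, "GGCATG"))  # (2, 2.0)
```

A real scanner would also score the reverse complement and compare hits against a calibrated threshold.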

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1106
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753240
January 2018

Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network.

PLoS One 2017 12;12(9):e0184657. Epub 2017 Sep 12.

Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia.

Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.
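The sign rule stated above (variants that weaken repressor sites raise expression, variants that weaken activator sites lower it) follows from equilibrium occupancy. The minimal sketch below shows this. It is a deliberately simplified toy, not the gap gene model from the paper. The mapping from a PWM-like score to an association constant, `K = k_max * exp(lam * score)`, the multiplicative readout and all parameter values are assumptions.

```python
import math


def occupancy(score: float, concentration: float, lam: float = 1.0, k_max: float = 1.0) -> float:
    """Equilibrium fractional occupancy of one binding site.

    Assumed parametrization: K = k_max * exp(lam * score), where `score` is a
    PWM-like affinity score (higher means stronger binding).
    """
    x = k_max * math.exp(lam * score) * concentration
    return x / (1.0 + x)


def toy_expression(act_score: float, rep_score: float, act_conc: float, rep_conc: float) -> float:
    """Toy readout: activator site occupied AND repressor site free (independent sites)."""
    return occupancy(act_score, act_conc) * (1.0 - occupancy(rep_score, rep_conc))


baseline = toy_expression(act_score=-1.0, rep_score=-1.0, act_conc=1.0, rep_conc=1.0)
weaker_repressor_site = toy_expression(-1.0, -3.0, 1.0, 1.0)  # SNP lowers repressor affinity
weaker_activator_site = toy_expression(-3.0, -1.0, 1.0, 1.0)  # SNP lowers activator affinity
print(weaker_repressor_site > baseline, weaker_activator_site < baseline)  # True True
```

Because occupancy saturates, the same score change has a different effect at different TF concentrations. This hints at why the effect of a SNP can vary in time and space as the TF gradients change.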

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0184657
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5595321
October 2017

High-quality genome assembly of Capsella bursa-pastoris reveals asymmetry of regulatory elements at early stages of polyploid genome evolution.

Plant J 2017 Jul 12;91(2):278-291. Epub 2017 Jun 12.

A. N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia.

Polyploidization and subsequent sub- and neofunctionalization of duplicated genes represent a major mechanism of plant genome evolution. Capsella bursa-pastoris, a widespread ruderal plant, is a recent allotetraploid and is thus an ideal model organism for studying early changes following polyploidization. We constructed a high-quality assembly of the C. bursa-pastoris genome and a transcriptome atlas covering a broad sample of organs and developmental stages (available online at http://travadb.org/browse/Species=Cbp). We demonstrate that expression of homeologs is mostly symmetric between subgenomes, and we identify a set of homeolog pairs with discordant expression. Comparison of promoters within such pairs revealed an emerging asymmetry of regulatory elements. These include multiple binding sites for transcription factors that regulate photosynthesis and plant development in response to light (PIF3, HY5) and the cold stress response (CBF). These results suggest that polyploidization in C. bursa-pastoris enhanced the plasticity of its response to light and temperature and allowed a substantial expansion of its distribution range.

Source
DOI: http://dx.doi.org/10.1111/tpj.13563
July 2017

The single nucleotide variant rs12722489 determines differential estrogen receptor binding and enhancer properties of an IL2RA intronic region.

PLoS One 2017 24;12(2):e0172681. Epub 2017 Feb 24.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.

We studied the functional effect of the rs12722489 single nucleotide polymorphism, located in the first intron of the human IL2RA gene, on transcriptional regulation. This polymorphism is associated with multiple autoimmune conditions (rheumatoid arthritis, multiple sclerosis, Crohn's disease, and ulcerative colitis). In silico analysis suggested a significant difference in the affinity of an estrogen receptor (ER) binding site between the alternative allelic variants, with stronger predicted affinity for the risk (G) allele. An electrophoretic mobility shift assay showed that purified human ERα bound only the G variant of a 32-bp genomic sequence containing rs12722489. Chromatin immunoprecipitation demonstrated that endogenous human ERα interacted with the rs12722489 genomic region in vivo, and a DNA pull-down assay confirmed differential allelic binding of endogenous human ERα to amplified 189-bp genomic fragments containing rs12722489. In a luciferase reporter assay, a kilobase-long genomic segment containing the G but not the A allele of rs12722489 demonstrated enhancer properties in the MT-2 cell line, an HTLV-1 transformed human cell line with a regulatory T cell phenotype.
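The in silico step above compares predicted binding affinity between two alleles. A common way to do this is to compute the best PWM hit over all windows that overlap the SNP, on both strands, for each allele and take the difference. The sketch below follows that general approach under stated assumptions. The toy matrix is hypothetical and is not an estrogen receptor model, and the function names are illustrative.

```python
from typing import Dict, List

PWM = List[Dict[str, float]]  # one {base: weight} dict per motif position
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pwm_score(pwm: PWM, word: str) -> float:
    return sum(col[base] for col, base in zip(pwm, word))


def best_score_over_snp(pwm: PWM, seq: str, snp_pos: int) -> float:
    """Best score over all windows (both strands) that contain position snp_pos."""
    width = len(pwm)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(seq) - width)
    return max(
        max(pwm_score(pwm, w), pwm_score(pwm, revcomp(w)))
        for w in (seq[s:s + width] for s in range(first, last + 1))
    )


def allelic_delta(pwm: PWM, seq: str, snp_pos: int, ref: str, alt: str) -> float:
    """Best-hit score of the alt allele minus that of the ref allele (negative = weaker site)."""
    if seq[snp_pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    alt_seq = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    return best_score_over_snp(pwm, alt_seq, snp_pos) - best_score_over_snp(pwm, seq, snp_pos)


# Hypothetical 3-bp PWM preferring "AGG": +1 for the preferred base, -1 otherwise.
toy_pwm = [{b: (1.0 if b == pref else -1.0) for b in "ACGT"} for pref in "AGG"]
print(allelic_delta(toy_pwm, "TTAGGTT", snp_pos=3, ref="G", alt="A"))  # -2.0
```

Raw score differences are often converted to P-values before alleles are compared, because equal score changes do not carry the same significance across motifs of different lengths.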

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0172681
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5325477
August 2017

Multiple single nucleotide polymorphisms in the first intron of the IL2RA gene affect transcription factor binding and enhancer activity.

Gene 2017 Feb 19;602:50-56. Epub 2016 Nov 19.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Department Molecular and Biological Physics, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia. Electronic address:

The IL2RA gene encodes the alpha subunit of a high-affinity receptor for interleukin-2, which is expressed by several distinct populations of lymphocytes involved in autoimmune processes. A large number of polymorphic alleles of the IL2RA locus are associated with the development of various autoimmune diseases. Using bioinformatic analysis, we dissected the first intron of the IL2RA gene and selected several single nucleotide polymorphisms (SNPs) that may influence the regulation of the IL2RA gene in cell types relevant to autoimmune pathology. We described five enhancers containing the selected SNPs that stimulated activity of the IL2RA promoter in a cell-type specific manner, and tested the effect of specific SNP alleles on the activity of the respective enhancers (E1 to E5, labeled according to their distance from the promoter). The E4 enhancer with the minor T variant of the rs61839660 SNP demonstrated reduced activity due to disrupted binding of MEF2A/C transcription factors (TFs). Neither the rs706778 nor the rs706779 SNP, both associated with a number of autoimmune diseases, had any effect on the activity of enhancer E2. However, rare variants of several SNPs (rs139767239, rs115133228, rs12722502, rs12722635) genetically linked to rs706778 and/or rs706779 significantly influenced the activity of the E1, E3 and E5 enhancers, presumably by disrupting EBF1, GABPA and ELF1 binding sites.

Source
DOI: http://dx.doi.org/10.1016/j.gene.2016.11.032
February 2017

Early B-cell factor 1 (EBF1) is critical for transcriptional control of SLAMF1 gene in human B cells.

Biochim Biophys Acta 2016 10 14;1859(10):1259-68. Epub 2016 Jul 14.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia. Electronic address:

Signaling lymphocytic activation molecule family member 1 (SLAMF1)/CD150 is a co-stimulatory receptor expressed on a variety of hematopoietic cells, in particular on mature lymphocytes activated by specific antigen, costimulation and cytokines. Changes in CD150 expression level have been reported in association with autoimmunity and with B-cell chronic lymphocytic leukemia. We characterized the core promoter of the SLAMF1 gene in human B-cell lines and explored binding sites for a number of transcription factors involved in B cell differentiation and activation. Mutations of the SP1, STAT6, IRF4, NF-kB, ELF1, TCF3, and SPI1/PU.1 sites significantly decreased promoter activity to varying degrees, depending on the cell line tested. The most profound effect on promoter strength was observed upon mutation of the binding site for Early B-cell factor 1 (EBF1). This mutation produced a 10- to 20-fold drop in promoter activity and pinpointed EBF1 as the master regulator of the human SLAMF1 gene in B cells. We also identified three potent transcriptional enhancers in the human SLAMF1 locus, each containing functional EBF1 binding sites. Thus, EBF1 interacts with specific binding sites located in both the promoter and the enhancer regions of the SLAMF1 gene and is critical for its expression in human B cells.

Source
DOI: http://dx.doi.org/10.1016/j.bbagrm.2016.07.004
October 2016

Negative selection maintains transcription factor binding motifs in human cancer.

BMC Genomics 2016 06 23;17 Suppl 2:395. Epub 2016 Jun 23.

Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.

Background: Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, the expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer. Cancer somatic mutations in binding sites of selected transcription factors have been found to be under positive selection. However, the action and significance of negative selection in non-coding regions remain controversial.

Results: Here we present an analysis of transcription factor binding motifs co-localized with non-coding variants. To avoid statistical bias, we account for the mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of predicted binding sites than expected by chance. This stability of binding motifs is even more pronounced in DNase-accessible regions.
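The comparison described above, observed mutations in binding motifs versus the number expected by chance after accounting for mutation signatures, can be illustrated by a minimal sketch. The expected share of mutations hitting motif positions is obtained by weighting each trinucleotide context by a relative mutation rate, and depletion is then assessed with a one-sided exact binomial test. This is an illustrative simplification, not the statistical procedure used in the paper. The context counts, rates and function names are hypothetical.

```python
from math import comb
from typing import Dict


def expected_fraction(motif_ctx: Dict[str, int], all_ctx: Dict[str, int], rates: Dict[str, float]) -> float:
    """Expected share of mutations landing in motif positions, weighting every
    trinucleotide context by its (assumed, signature-derived) relative mutation rate."""
    hit = sum(motif_ctx.get(c, 0) * r for c, r in rates.items())
    total = sum(all_ctx.get(c, 0) * r for c, r in rates.items())
    return hit / total


def depletion_pvalue(observed: int, total: int, p: float) -> float:
    """One-sided exact binomial test: P(X <= observed) for X ~ Binomial(total, p)."""
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(observed + 1))


# Hypothetical counts: motif positions are rare in the context that mutates fastest ("TCG").
p = expected_fraction({"ACA": 10}, {"ACA": 100, "TCG": 100}, {"ACA": 1.0, "TCG": 3.0})
print(p)  # 0.025
print(depletion_pvalue(observed=0, total=200, p=p))  # about 0.0063
```

Weighting by context matters here. Under a uniform rate the motif would be expected to receive 10 of 200 positions' worth of mutations (5%), but because the hypothetical signature favors a context absent from the motif, only 2.5% are expected.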

Conclusions: Our data demonstrate negative selection against binding site alterations and suggest that such selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors with conserved binding motifs can reveal regulatory pathways crucial for the survival of various human cancers.

Source
DOI: http://dx.doi.org/10.1186/s12864-016-2728-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928157
June 2016

Architectural proteins Pita, Zw5, and ZIPIC contain homodimerization domain and support specific long-range interactions in Drosophila.

Nucleic Acids Res 2016 09 2;44(15):7228-41. Epub 2016 May 2.

Institute of Gene Biology, Russian Academy of Sciences, Vavilova str. 34/5, Moscow 119334, Russia

According to recent models, as yet poorly studied architectural proteins appear to be required for local regulation of enhancer-promoter interactions, as well as for global chromosome organization. The transcription factors ZIPIC, Pita and Zw5 belong to the class of chromatin insulator proteins, preferentially bind to promoters near the TSS, and extensively colocalize with cohesin and condensin complexes. ZIPIC, Pita and Zw5 are structurally similar: each contains an N-terminal zinc finger-associated domain (ZAD) and a different number of C2H2-type zinc fingers at the C-terminus. Here we show that the ZAD domains of ZIPIC, Pita and Zw5 form homodimers. In Drosophila transgenic lines, these proteins are able to support long-distance interaction between the GAL4 activator and the reporter gene promoter. However, no functional interaction between binding sites for different proteins was detected, suggesting that such interactions are highly specific. ZIPIC facilitates long-distance stimulation of the reporter gene by the GAL4 activator in a yeast model system. Many of the genomic binding sites of ZIPIC, Pita and Zw5 are located at the boundaries of topologically associated domains (TADs). Thus, ZAD-containing zinc-finger proteins can be attributed to the class of architectural proteins.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw371
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009728
September 2016

Analysis of functional importance of binding sites in the Drosophila gap gene network model.

BMC Genomics 2015 16;16 Suppl 13:S7. Epub 2015 Dec 16.

Background: The statistical thermodynamics-based approach provides a promising framework for constructing the genotype-phenotype map in many biological systems. Among the important aspects of a good model connecting DNA sequence information with a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor binding sites. Because the model may predict different levels of functional importance for specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions.

Results: We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and a concentration-dependent regulatory interaction between the gap genes hunchback and Kruppel. We show that the new variants of the model give more consistent gene expression predictions for various genetic constructs than those of previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions.

Conclusions: The assumption of a dual interaction between hb and Kr leads to the most consistent modeling results but, on the other hand, may obscure the existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a roughly uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-16-S13-S7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4686791
May 2016

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models.

Nucleic Acids Res 2016 Jan 19;44(D1):D116-25. Epub 2015 Nov 19.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia.

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns under different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, for 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
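A common way to obtain score thresholds like those mentioned above is to choose a score whose P-value (the chance that a random background word scores at least that high) is small. The minimal sketch below builds a log-odds PWM from position counts, computes that P-value by dynamic programming over discretized scores, and scans a sequence. It is not a reproduction of the HOCOMOCO command-line tools. The pseudocount, the uniform background, the discretization precision and all names are assumptions, and the P-value is approximate because of the rounding.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]


def counts_to_pwm(counts: List[Dict[str, float]], pseudocount: float = 0.25) -> PWM:
    """Log-odds PWM against a uniform background (0.25 per base)."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log((col[b] + pseudocount) / total / 0.25) for b in BASES})
    return pwm


def score_pvalue(pwm: PWM, threshold: float, precision: int = 100) -> float:
    """Approximate P(score >= threshold) for a random word under the uniform background,
    using dynamic programming over scores discretized to 1/precision."""
    dist = {0: 1.0}  # discretized partial score -> probability
    for col in pwm:
        nxt = defaultdict(float)
        for s, prob in dist.items():
            for b in BASES:
                nxt[s + round(col[b] * precision)] += prob * 0.25
        dist = nxt
    return sum(prob for s, prob in dist.items() if s >= threshold * precision)


def scan(pwm: PWM, seq: str, threshold: float) -> List[Tuple[int, float]]:
    """Forward-strand hits (position, score) with score >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(col[b] for col, b in zip(pwm, seq[i:i + width]))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical counts for a 2-bp motif that is almost always "AC".
counts = [{"A": 10, "C": 0, "G": 0, "T": 0}, {"A": 0, "C": 10, "G": 0, "T": 0}]
pwm = counts_to_pwm(counts)
print(score_pvalue(pwm, threshold=2.0))  # 0.0625: only "AC" of the 16 words passes
print([i for i, _ in scan(pwm, "GACAC", threshold=2.0)])  # [1, 3]
```

The dynamic program runs in time proportional to motif length times the number of distinct discretized scores, which avoids enumerating all 4^L words.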

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1249
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702883
January 2016

Single-Cell Analyses of ESCs Reveal Alternative Pluripotent Cell States and Molecular Mechanisms that Control Self-Renewal.

Stem Cell Reports 2015 Aug;5(2):207-20

Department of Regenerative and Developmental Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Department of Pharmacology and System Therapeutics, Icahn School of Medicine at Mount Sinai, Systems Biology Center New York, One Gustave L. Levy Place, New York, NY 10029, USA. Electronic address:

Analyses of gene expression in single mouse embryonic stem cells (mESCs) cultured in serum and LIF revealed the presence of two distinct cell subpopulations with individual gene expression signatures. Comparisons with published data revealed that cells in the first subpopulation are phenotypically similar to cells isolated from the inner cell mass (ICM). In contrast, cells in the second subpopulation appear to be more mature. Pluripotency Gene Regulatory Network (PGRN) reconstruction based on single-cell data and published data suggested antagonistic roles for Oct4 and Nanog in the maintenance of pluripotency states. Integrated analyses of published genomic binding (ChIP) data strongly supported this observation. Certain target genes alternatively regulated by OCT4 and NANOG, such as Sall4 and Zscan10, feed back into the top hierarchical regulator Oct4. Analyses of such incoherent feedforward loops with feedback (iFFL-FB) suggest a dynamic model for the maintenance of mESC pluripotency and self-renewal.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2015.07.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618835
August 2015

EpiFactors: a comprehensive database of human epigenetic factors and complexes.

Database (Oxford) 2015 7;2015:bav067. Epub 2015 Jul 7.

Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway.

Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has created the need for a resource that compiles, organizes and presents curated information to researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression (CAGE) technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians.

Source
DOI: http://dx.doi.org/10.1093/database/bav067
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494013
March 2016

Sequence-based model of gap gene regulatory network.

BMC Genomics 2014 19;15 Suppl 12:S6. Epub 2014 Dec 19.

Background: Detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts great interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. Knowledge of the molecular mechanisms involved in gap gene regulation is far less complete than knowledge of the genetics of the system. Mathematical modeling goes beyond the insights gained by genetic and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency.

Results: We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild-type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in describing the data. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded.
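The site regulatory weight described above can be written as a short function. The abstract does not state the exact normalization, so the sketch below assumes division by the RSS of the model with all sites, w = (RSS without site - RSS with all sites) / RSS with all sites. It also replaces the fitted gap gene model with a hypothetical additive toy model passed in as `predict`. Both are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence


def rss(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def regulatory_weight(
    observed: Sequence[float],
    predict: Callable[[Dict[str, Sequence[float]]], Sequence[float]],
    sites: Dict[str, Sequence[float]],
    site: str,
) -> float:
    """(RSS without `site` - RSS with all sites) / RSS with all sites.

    `predict` stands in for a fitted model; the normalization is an assumption.
    """
    rss_all = rss(observed, predict(sites))
    if rss_all == 0:
        raise ZeroDivisionError("full model fits perfectly; normalization undefined")
    without = {name: v for name, v in sites.items() if name != site}
    return (rss(observed, predict(without)) - rss_all) / rss_all


def additive_model(sites: Dict[str, Sequence[float]], n_points: int = 3) -> Sequence[float]:
    """Hypothetical toy model: expression = sum of per-site contributions at each point."""
    return [sum(v[i] for v in sites.values()) for i in range(n_points)]


observed = [1.0, 2.0, 3.0]
sites = {"a": [1.0, 1.0, 1.0], "b": [0.0, 1.0, 2.0], "c": [0.1, 0.0, 0.0]}
weights = {s: regulatory_weight(observed, additive_model, sites, s) for s in sites}
print(weights)  # "b" has the largest weight; "c" is negative because removing it improves the fit
```

In this formulation, a negative weight flags a site whose removal improves the fit. A large positive weight flags a site the model cannot do without.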

Conclusions: The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from the literature. We showed that 1) the regulatory weights of transcription factor binding sites show a very weak correlation with their PWM scores; 2) sites with low regulatory weight are important for the model output; 3) functionally important sites are not exclusively located in cis-regulatory elements but are rather dispersed throughout the regulatory region. Notably, some of the sites with high functional impact in the hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in DNase I footprint assays.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-15-S12-S6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4303948
August 2015