Publications by authors named "Vsevolod Makeev"

55 Publications

Assessing Ribosome Distribution Along Transcripts with Polarity Scores and Regression Slope Estimates.

Methods Mol Biol 2021;2252:269-294

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

During translation, the rate of ribosome movement along mRNA varies. This leads to a non-uniform ribosome distribution along the transcript, depending on local mRNA sequence, structure, tRNA availability, and translation factor abundance, as well as on the relationship between the overall rates of initiation, elongation, and termination. Stress, antibiotics, and genetic perturbations affecting the composition and properties of the translation machinery can alter the positional distribution of ribosomes dramatically. Here, we offer a computational protocol for analyzing positional distribution profiles using ribosome profiling (Ribo-Seq) data. The protocol uses papolarity, a new Python toolkit for the analysis of transcript-level short-read coverage profiles. For a single sample, papolarity computes the classic polarity metric for each transcript, which, in the case of Ribo-Seq, reflects ribosome positional preferences. For comparison against a control sample, papolarity estimates an improved metric, the relative linear regression slope of coverage along the transcript length. This involves de-noising by profile segmentation with a Poisson model and aggregation of Ribo-Seq coverage within segments, which yields reliable estimates of the regression slope. The papolarity software and the associated protocol can be conveniently used for Ribo-Seq data analysis in a command-line Linux environment. The papolarity package is available through the Python pip package manager. The source code is available at https://github.com/autosome-ru/papolarity .
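
The two metrics can be illustrated on plain per-position coverage lists. This is a minimal sketch, not papolarity's implementation: the polarity formula assumed here is the one commonly used in ribosome profiling studies (positions weighted linearly from -1 at the 5' end to +1 at the 3' end), the segment boundaries are taken as input rather than derived from papolarity's Poisson segmentation, and the normalization and length-weighted fit used for the relative slope are illustrative choices.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of transcript positions


def polarity_score(coverage: Sequence[float]) -> float:
    """Classic polarity of one coverage profile, in [-1, 1].

    Assumed definition: position i (1-based) of a length-l transcript gets the
    weight w_i = (2i - (l + 1)) / (l - 1); the score is the coverage-weighted mean of w_i.
    """
    length = len(coverage)
    total = float(sum(coverage))
    if length < 2 or total == 0:
        raise ValueError("need at least two positions and nonzero coverage")
    return sum(
        (2 * i - (length + 1)) / (length - 1) * value / total
        for i, value in enumerate(coverage, start=1)
    )


def _segment_levels(profile: Sequence[float], segments: List[Segment]) -> List[float]:
    """Mean coverage within each segment, relative to the profile-wide mean."""
    overall = sum(profile) / len(profile)
    if overall == 0:
        raise ValueError("profile has no coverage")
    return [sum(profile[s:e]) / (e - s) / overall for s, e in segments]


def relative_slope(sample: Sequence[float], control: Sequence[float],
                   segments: List[Segment]) -> float:
    """Length-weighted least-squares slope of the sample/control segment ratio
    against relative position (segment midpoint / transcript length)."""
    if len(sample) != len(control):
        raise ValueError("profiles must cover the same transcript")
    if any(e <= s for s, e in segments):
        raise ValueError("segments must be non-empty")
    sample_levels = _segment_levels(sample, segments)
    control_levels = _segment_levels(control, segments)
    if any(level == 0 for level in control_levels):
        raise ValueError("control has an empty segment; merge it with a neighbour")
    ratios = [a / b for a, b in zip(sample_levels, control_levels)]
    xs = [(s + e) / 2 / len(sample) for s, e in segments]
    ws = [e - s for s, e in segments]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ratios)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ratios))
    if sxx == 0:
        raise ValueError("need at least two segments at different positions")
    return sxy / sxx


if __name__ == "__main__":
    # Toy profiles: the sample accumulates coverage toward the 3' end, the control is flat.
    sample = list(range(1, 11))
    control = [1] * 10
    print(polarity_score(sample), polarity_score(control))
    print(relative_slope(sample, control, [(0, 5), (5, 10)]))
```

A positive slope means that, relative to the control, coverage shifts toward the 3' end; averaging within segments before the fit is what reduces the influence of noisy single positions.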

Source
DOI: http://dx.doi.org/10.1007/978-1-0716-1150-0_13
January 2021

GTRD: an integrated view of transcription regulation.

Nucleic Acids Res 2021 01;49(D1):D104-D111

BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.

The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778956
January 2021

A holistic view of mouse enhancer architectures reveals analogous pleiotropic effects and correlation with human disease.

BMC Genomics 2020 Nov 2;21(1):754. Epub 2020 Nov 2.

Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK.

Background: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype.

Results: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome. Unexpectedly, we demonstrate that both enhancer types are preferentially associated with relevant 'tissue-type' phenotypes and exhibit no difference in phenotype effect size or pleiotropy. Modelling regulatory data alongside molecular data, we built a predictive model to infer gene-phenotype associations and used this model to predict potentially novel disease-associated genes.

Conclusion: Overall, our findings reveal that differing enhancer architectures have a similar impact on mammalian phenotypes whilst harbouring differing cellular and expression effects. Together, our results systematically characterise enhancers with predicted phenotypic traits, supporting a role for both types of enhancers in human disease and disorders.

Source
DOI: http://dx.doi.org/10.1186/s12864-020-07109-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7607678
November 2020

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Authors:
Jordan A Ramilowski Chi Wai Yip Saumya Agrawal Jen-Chien Chang Yari Ciani Ivan V Kulakovskiy Mickaël Mendez Jasmine Li Ching Ooi John F Ouyang Nick Parkinson Andreas Petri Leonie Roos Jessica Severin Kayoko Yasuzawa Imad Abugessaisa Altuna Akalin Ivan V Antonov Erik Arner Alessandro Bonetti Hidemasa Bono Beatrice Borsari Frank Brombacher Christopher JF Cameron Carlo Vittorio Cannistraci Ryan Cardenas Melissa Cardon Howard Chang Josée Dostie Luca Ducoli Alexander Favorov Alexandre Fort Diego Garrido Noa Gil Juliette Gimenez Reto Guler Lusy Handoko Jayson Harshbarger Akira Hasegawa Yuki Hasegawa Kosuke Hashimoto Norihito Hayatsu Peter Heutink Tetsuro Hirose Eddie L Imada Masayoshi Itoh Bogumil Kaczkowski Aditi Kanhere Emily Kawabata Hideya Kawaji Tsugumi Kawashima S Thomas Kelly Miki Kojima Naoto Kondo Haruhiko Koseki Tsukasa Kouno Anton Kratz Mariola Kurowska-Stolarska Andrew Tae Jun Kwon Jeffrey Leek Andreas Lennartsson Marina Lizio Fernando López-Redondo Joachim Luginbühl Shiori Maeda Vsevolod J Makeev Luigi Marchionni Yulia A Medvedeva Aki Minoda Ferenc Müller Manuel Muñoz-Aguirre Mitsuyoshi Murata Hiromi Nishiyori Kazuhiro R Nitta Shuhei Noguchi Yukihiko Noro Ramil Nurtdinov Yasushi Okazaki Valerio Orlando Denis Paquette Callum J C Parr Owen J L Rackham Patrizia Rizzu Diego Fernando Sánchez Martinez Albin Sandelin Pillay Sanjana Colin A M Semple Youtaro Shibayama Divya M Sivaraman Takahiro Suzuki Suzannah C Szumowski Michihira Tagami Martin S Taylor Chikashi Terao Malte Thodberg Supat Thongjuea Vidisha Tripathi Igor Ulitsky Roberto Verardo Ilya E Vorontsov Chinatsu Yamamoto Robert S Young J Kenneth Baillie Alistair R R Forrest Roderic Guigó Michael M Hoffman Chung Chau Hon Takeya Kasukawa Sakari Kauppinen Juha Kere Boris Lenhard Claudio Schneider Harukazu Suzuki Ken Yagi Michiel J L de Hoon Jay W Shin Piero Carninci

Genome Res 2020 07 27;30(7):1060-1072. Epub 2020 Jul 27.

RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for and .

Source
DOI: http://dx.doi.org/10.1101/gr.254219.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397864
July 2020

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: The positional weight matrix (PWM) is the de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
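
Benchmarking of this kind rests on scoring sequences with a PWM and checking how well the scores separate bound from unbound sequences. Below is a minimal sketch of one common setup, assuming log2-odds PWMs against a uniform background, the best hit over both strands as the per-sequence score, and ROC AUC as the performance measure. The study's own protocols and measures are distributed by the authors via their web interface and docker images and may differ; the GATA-like counts and toy sequences are made up.

```python
import math
from typing import Dict, List, Sequence

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PWM = List[Dict[str, float]]


def counts_to_pwm(counts: Sequence[Sequence[float]], pseudocount: float = 0.25) -> PWM:
    """Per-position A/C/G/T counts -> log2-odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append({base: math.log2((c + pseudocount) / total / 0.25)
                    for base, c in zip(ALPHABET, row)})
    return pwm


def best_site_score(pwm: PWM, sequence: str) -> float:
    """Highest PWM score over all windows on both strands (non-ACGT windows are skipped)."""
    width = len(pwm)
    best = -math.inf
    reverse = sequence.translate(COMPLEMENT)[::-1]
    for strand in (sequence, reverse):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if set(window) <= set(ALPHABET):
                best = max(best, sum(col[b] for col, b in zip(pwm, window)))
    return best


def roc_auc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positive for n in negative)
    return wins / (len(positive) * len(negative))


if __name__ == "__main__":
    gata_like = counts_to_pwm([[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]])
    bound = [best_site_score(gata_like, s) for s in ["TTGATATT", "CCCGATACC"]]
    unbound = [best_site_score(gata_like, s) for s in ["TTTTTTTT", "CCCCCCCC"]]
    print(roc_auc(bound, unbound))  # prints 1.0 for these toy sequences
```

In an all-against-all setting, the same loop is simply repeated for every PWM against every experiment's positive and negative sequence sets.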

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020

Signaling Pathways Potentially Responsible for Foam Cell Formation: Cholesterol Accumulation or Inflammatory Response-What is First?

Int J Mol Sci 2020 Apr 14;21(8). Epub 2020 Apr 14.

Department of Biochemistry & Molecular Biology, Nippon Medical School, Tokyo 113-8602, Japan.

Accumulation of lipid-laden (foam) cells in the arterial wall is known to be the earliest step in the pathogenesis of atherosclerosis. There is almost no doubt that atherogenic modified low-density lipoproteins (LDL) are the main sources of accumulating lipids in foam cells. Atherogenic modified LDL are taken up by arterial cells, such as macrophages, pericytes, and smooth muscle cells in an unregulated manner bypassing the LDL receptor. The present study was conducted to reveal possible common mechanisms in the interaction of macrophages with associates of modified LDL and non-lipid latex particles of a similar size. To determine regulatory pathways that are potentially responsible for cholesterol accumulation in human macrophages after the exposure to naturally occurring atherogenic or artificially modified LDL, we used transcriptome analysis. Previous studies of our group demonstrated that any type of LDL modification facilitates the self-association of lipoprotein particles. The size of such self-associates hinders their interaction with a specific LDL receptor. As a result, self-associates are taken up by nonspecific phagocytosis bypassing the LDL receptor. That is why we used latex beads as a stimulator of macrophage phagocytotic activity. We revealed at least 12 signaling pathways that were regulated by the interaction of macrophages with the multiple-modified atherogenic naturally occurring LDL and with latex beads in a similar manner. Therefore, modified LDL was shown to stimulate phagocytosis through the upregulation of certain genes. We have identified at least three genes (, , and ) encoding inflammatory molecules and associated with signaling pathways that were upregulated in response to the interaction of modified LDL with macrophages. Knockdown of two of these genes, and , completely suppressed cholesterol accumulation in macrophages. Correspondingly, the upregulation of and promoted cholesterol accumulation. 
These data confirmed our hypothesis of the following chain of events in atherosclerosis: LDL particles undergo atherogenic modification; this is accompanied by the formation of self-associates; large LDL associates stimulate phagocytosis; as a result of phagocytosis stimulation, pro-inflammatory molecules are secreted; these molecules cause or at least contribute to the accumulation of intracellular cholesterol. This chain of events may explain the relationship between cholesterol accumulation and inflammation. The primary sequence of events in this chain is related to inflammatory response rather than cholesterol accumulation.

Source
DOI: http://dx.doi.org/10.3390/ijms21082716
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216009
April 2020

What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants.

Front Genet 2019 Oct 31;10:1078. Epub 2019 Oct 31.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

Many problems of modern genetics and functional genomics require assessing the functional effects of sequence variants, including gene expression changes. Machine learning is considered a promising approach to this task, but its practical application remains a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is the saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcriptional activity caused by sequence variants. Here, we explore computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in massively parallel reporter assays, based on data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation (CAGI) challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. In particular, training on sequence segments located next to the validation data results in "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios that prevent such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even reliable performance estimation will only become possible in the future, with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
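
The leakage-free scenario, in which whole regulatory regions are held out, can be illustrated with a region-grouped cross-validation split. This is a minimal sketch of that general idea, not the authors' code: each variant is assumed to carry a region label (the labels below are hypothetical), and all variants of one region are kept in the same fold, so no region contributes to both training and validation. A naive random split over variants, by contrast, lets neighbouring variants from the same region fall on both sides.

```python
import random
from typing import Dict, List, Sequence, Tuple


def region_grouped_folds(regions: Sequence[str], n_folds: int, seed: int = 0) -> List[int]:
    """Assign a fold index to every variant so that all variants from the same
    regulatory region land in the same fold (no region spans train and test)."""
    if n_folds < 2:
        raise ValueError("need at least two folds")
    unique = sorted(set(regions))
    random.Random(seed).shuffle(unique)
    fold_of: Dict[str, int] = {region: i % n_folds for i, region in enumerate(unique)}
    return [fold_of[region] for region in regions]


def train_test_indices(folds: Sequence[int], test_fold: int) -> Tuple[List[int], List[int]]:
    """Split variant indices into training and held-out sets for one fold."""
    train = [i for i, fold in enumerate(folds) if fold != test_fold]
    test = [i for i, fold in enumerate(folds) if fold == test_fold]
    return train, test


if __name__ == "__main__":
    # Hypothetical region labels, one per variant.
    regions = ["enhancer_1", "enhancer_1", "promoter_2", "promoter_2", "enhancer_3"]
    folds = region_grouped_folds(regions, n_folds=3)
    for k in range(3):
        print(k, train_test_indices(folds, k))
```

The measured performance on such held-out regions is what the abstract refers to as performance at independent regulatory regions.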

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.01078
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6834773
October 2019

Heteroplasmic Variants of Mitochondrial DNA in Atherosclerotic Lesions of Human Aortic Intima.

Biomolecules 2019 09 6;9(9). Epub 2019 Sep 6.

Institute for Atherosclerosis Research, Skolkovo Innovation Center, 143026 Moscow, Russia.

Mitochondrial dysfunction and oxidative stress are likely involved in atherogenesis. Since mitochondrial genome variation can alter the functional activity of cells, it is necessary to assess whether atherosclerotic lesions carry mitochondrial DNA (mtDNA) heteroplasmic mutations known to be associated with different pathological processes and ageing. In this study, mtDNA heteroplasmy and copy number (mtCN) were evaluated in autopsy-derived samples of aortic intima differing by the type of atherosclerotic lesion. Heteroplasmic mtDNA variants were detected by next-generation sequencing, and mtCN was measured by qPCR. mtDNA heteroplasmic mutations were found to be characteristic of particular areas of intimal tissue: 55 heteroplasmic variants were found in 83 intimal samples, with a mean minor allele frequency of 0.09 and a mean heteroplasmy level of 12%. The mtCN variance measured in adjacent areas of intima was high, but atherosclerotic lesions and unaffected intima did not differ significantly in mtCN values. Based on the ratio of minor to major nucleotide mtDNA variants, we conclude that the number of heteroplasmic mtDNA variants increases with the extent of the atherosclerotic morphological phenotype.
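
A minor allele frequency of the kind reported above is a per-position ratio of read counts. This minimal sketch assumes base counts per mtDNA position are already available (for example, from a pileup), and its depth and frequency thresholds are hypothetical placeholders, not the criteria used in the study.

```python
from typing import Dict, Optional, Tuple


def minor_allele(base_counts: Dict[str, int]) -> Tuple[Optional[str], float]:
    """Second most frequent base at one position and its fraction of the read depth."""
    depth = sum(base_counts.values())
    ranked = sorted(base_counts.items(), key=lambda item: item[1], reverse=True)
    if depth == 0 or len(ranked) < 2 or ranked[1][1] == 0:
        return None, 0.0
    base, count = ranked[1]
    return base, count / depth


def is_heteroplasmic(base_counts: Dict[str, int],
                     min_maf: float = 0.01, min_depth: int = 100) -> bool:
    """Flag a position whose minor allele passes hypothetical frequency and depth thresholds."""
    base, maf = minor_allele(base_counts)
    return base is not None and sum(base_counts.values()) >= min_depth and maf >= min_maf


if __name__ == "__main__":
    position_counts = {"A": 910, "G": 90, "C": 0, "T": 0}  # toy read counts at one position
    print(minor_allele(position_counts), is_heteroplasmic(position_counts))
```

Requiring a minimum depth guards against calling heteroplasmy from a handful of reads, where a single sequencing error can produce a large apparent frequency.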

Source
DOI: http://dx.doi.org/10.3390/biom9090455
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6770808
September 2019

Employing toxin-antitoxin genome markers for identification of and strains in human metagenomes.

PeerJ 2019 Mar 4;7:e6554. Epub 2019 Mar 4.

Vavilov Institute of General Genetics Russian Academy of Sciences, Moscow, Russia.

Recent research has indicated that, in addition to a unique genotype, each individual may also have a unique microbiota composition. Differences in microbiota composition may arise at both the species and the strain level. Knowing the precise composition is especially important for the gut microbiota (GM), since it can contribute to health assessment, personalized treatment, and disease prevention for individuals and groups (cohorts). Existing methods for determining the species and strain composition of the microbiota are not always precise and are often difficult to use. Probiotic bacteria of the genera and are an essential component of human GM. We previously showed that in certain and species, toxin-antitoxin (TA) systems of the RelBE and MazEF superfamilies may be used as functional biomarkers to differentiate these groups of bacteria at the species and strain levels. We compiled a database of TA genes of these superfamilies for all lactobacilli and bifidobacteria species with a complete genome sequence and confirmed that in all and species, TA gene composition is species- and strain-specific. To analyze the species and strain composition of the two bacterial genera, and , in human GM, we developed TAGMA (toxin antitoxin genes for metagenomes analyses) software based on polymorphism in TA genes. TAGMA was tested on gut metagenomic samples. The results of our analysis show that TAGMA can be used to characterize species and strains of and in metagenomes.
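
The abstract does not describe TAGMA's matching algorithm. The sketch below shows one generic way that polymorphic marker genes can separate strains in metagenomic reads: exact k-mer matching, where only k-mers unique to a single taxon's marker are informative. The marker sequences, taxon names, and k are toy placeholders, and reverse-complement reads are not handled.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def build_kmer_index(markers: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the single taxon whose marker sequence contains it.

    k-mers found in markers of more than one taxon are discarded as uninformative.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for taxon, sequence in markers.items():
        for i in range(len(sequence) - k + 1):
            owners[sequence[i:i + k]].add(taxon)
    return {kmer: next(iter(taxa)) for kmer, taxa in owners.items() if len(taxa) == 1}


def assign_reads(reads: Iterable[str], index: Dict[str, str], k: int) -> Counter:
    """Count reads whose informative k-mers all point to one taxon.

    Reads with no informative k-mer, or with k-mers from several taxa, are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        hits = {index[read[i:i + k]] for i in range(len(read) - k + 1)
                if read[i:i + k] in index}
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


if __name__ == "__main__":
    # Toy TA-gene fragments for two hypothetical strains that differ only at their 3' ends.
    markers = {"strain_A": "ACGTACGTTT", "strain_B": "ACGTACGAAA"}
    index = build_kmer_index(markers, k=4)
    print(assign_reads(["CGTTT", "CGAAA", "ACGTA"], index, k=4))
```

Discarding shared k-mers is what makes the assignment strain-specific: the read "ACGTA" matches both toy markers equally well and is therefore left unassigned.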

Source
DOI: http://dx.doi.org/10.7717/peerj.6554
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404652
March 2019

Defensin-like peptides in wheat analyzed by whole-transcriptome sequencing: a focus on structural diversity and role in induced resistance.

PeerJ 2019 Jan 8;7:e6125. Epub 2019 Jan 8.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

Antimicrobial peptides (AMPs) are the main components of the plant innate immune system. Defensins represent the most important AMP family involved in defense and non-defense functions. In this work, global RNA sequencing and transcriptome assembly were performed to explore the diversity of defensin-like (DEFL) genes in the wheat and to study their role in induced resistance (IR) mediated by the elicitor metabolites of a non-pathogenic strain FS-94 of . Using a combination of two pipelines for DEFL mining in transcriptome data sets, as many as 143 DEFL genes were identified, the vast majority of which represent novel genes. According to the number of cysteine residues and the cysteine motif, wheat DEFLs were classified into ten groups. Classical defensins with a characteristic 8-Cys motif, assigned to group 1, represent the most abundant group, comprising 52 family members. DEFLs with a characteristic 4-Cys motif CX{3,5}CX{8,17}CX{4,6}C, named group 4 DEFLs and previously found only in legumes, were discovered in wheat. Within DEFL groups, subgroups of similar sequences that originated from duplication events were identified. Variation among DEFLs within subgroups is due to amino acid substitutions and insertions/deletions of amino acid sequences. To identify IR-related DEFL genes, changes in DEFL gene expression during elicitor-mediated IR were monitored. Transcriptional diversity of DEFL genes in wheat seedlings in response to the fungus , FS-94 elicitors, and the combination of both (elicitors + fungus) was demonstrated, with specific sets of up- and down-regulated DEFL genes. DEFL expression profiling allowed us to gain insight into the mode of action of the elicitors from . We discovered that the elicitors up-regulated a set of 24 DEFL genes. After challenge inoculation with , another set of 22 DEFLs showed enhanced expression in IR-displaying seedlings.
These DEFLs, in concert with other defense molecules, are suggested to underlie the enhanced resistance of elicitor-pretreated wheat seedlings. In addition to providing a better understanding of how the FS-94 elicitors help control disease, the up-regulated IR-specific DEFL genes represent novel candidates for the genetic transformation of plants and the development of pathogen-resistant crops.
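
The group 4 cysteine motif CX{3,5}CX{8,17}CX{4,6}C is written in a regex-like notation in which X stands for any amino acid and {m,n} for a repeat range, so it translates directly into a regular expression. This minimal sketch uses made-up peptides; it only tests for the presence of the motif, whereas the classification described above also relies on the total number of cysteine residues.

```python
import re

# CX{3,5}CX{8,17}CX{4,6}C with X = any residue, as a Python regular expression.
GROUP4_MOTIF = re.compile(r"C.{3,5}C.{8,17}C.{4,6}C")


def has_group4_motif(peptide: str) -> bool:
    """True if the peptide contains the 4-Cys group 4 spacing pattern anywhere."""
    return GROUP4_MOTIF.search(peptide.upper()) is not None


def cysteine_count(peptide: str) -> int:
    """Number of cysteine residues, the other criterion used to group DEFLs."""
    return peptide.upper().count("C")


if __name__ == "__main__":
    # Made-up peptides: spacers of 4, 10 and 5 residues fit the pattern; a 2-residue first spacer does not.
    matching = "C" + "A" * 4 + "C" + "G" * 10 + "C" + "K" * 5 + "C"
    too_short = "C" + "A" * 2 + "C" + "G" * 10 + "C" + "K" * 5 + "C"
    for peptide in (matching, too_short):
        print(peptide, has_group4_motif(peptide), cysteine_count(peptide))
```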

Source
DOI: http://dx.doi.org/10.7717/peerj.6125
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329339
January 2019

Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data.

BMC Res Notes 2018 Oct 23;11(1):756. Epub 2018 Oct 23.

Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991.

Objectives: Mammalian genomics studies, especially those focusing on transcriptional regulation, require information on genomic locations of regulatory regions, particularly, transcription factor (TF) binding sites. There are plenty of published ChIP-Seq data on in vivo binding of transcription factors in different cell types and conditions. However, handling of thousands of separate data sets is often impractical and it is desirable to have a single global map of genomic regions potentially bound by a particular TF in any of studied cell types and conditions.

Data Description: Here we report human and mouse cistromes, the maps of genomic regions that are routinely identified as TF binding sites, organized by TF. We provide cistromes for 349 mouse and 599 human TFs. Given a TF, its cistrome regions are supported by evidence from several ChIP-Seq experiments or several computational tools, and, as an optional filter, contain occurrences of sequence motifs recognized by the TF. Using the cistrome, we provide an annotation of TF binding sites in the vicinity of human and mouse transcription start sites. This information is useful for selecting potential gene targets of transcription factors and detecting co-regulated genes in differential gene expression data.
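
The requirement that cistrome regions be supported by several experiments can be illustrated with a sweep over peak intervals. This is a minimal sketch, not the procedure used to build the published cistromes: it assumes peaks on a single chromosome given as half-open (start, end) intervals, treats each peak set (one experiment or one peak caller) as one vote, and uses a hypothetical `min_support` threshold.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def supported_regions(peak_sets: List[List[Interval]], min_support: int) -> List[Interval]:
    """Regions covered by peaks from at least `min_support` different peak sets."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    events = []
    for peaks in peak_sets:
        # Merge within a set first so that one set contributes at most one vote per position.
        for start, end in merge_intervals(peaks):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal coordinates, ends (-1) come before starts (+1)
    regions: List[Interval] = []
    depth, region_start = 0, 0
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = position
        elif depth < min_support <= previous and position > region_start:
            regions.append((region_start, position))
    return merge_intervals(regions)


if __name__ == "__main__":
    experiments = [[(0, 10)], [(5, 15)], [(8, 20)]]  # toy peaks from three data sets
    print(supported_regions(experiments, min_support=2))
```

Raising `min_support` trades coverage for reliability: with the toy peaks above, requiring all three sets shrinks the supported region to their common overlap.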

Source
DOI: http://dx.doi.org/10.1186/s13104-018-3856-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6199713
October 2018

Modified LDL Particles Activate Inflammatory Pathways in Monocyte-derived Macrophages: Transcriptome Analysis.

Curr Pharm Des 2018;24(26):3143-3151

GW School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, United States.

Background: A hallmark of atherosclerosis is its complex pathogenesis, which is dependent on altered cholesterol metabolism and inflammation. Both arms of pathogenesis involve myeloid cells. Monocytes migrating into the arterial walls interact with modified low-density lipoprotein (LDL) particles, accumulate cholesterol and convert into foam cells, which promote plaque formation and also contribute to inflammation by producing proinflammatory cytokines. A number of studies characterized transcriptomics of macrophages following interaction with modified LDL, and revealed alteration of the expression of genes responsible for inflammatory response and cholesterol metabolism. However, it is still unclear how these two processes are related to each other to contribute to atherosclerotic lesion formation.

Methods: We attempted to identify the main master regulator genes in macrophages treated with atherogenic modified LDL using a bioinformatics approach.

Results: We found that most of the identified genes were involved in inflammation, and none of them was implicated in cholesterol metabolism. Among the key identified genes were interleukin (IL)-7, IL-7 receptor, IL-15 and CXCL8.

Conclusion: Our results indicate that activation of the inflammatory pathway is the primary response of the immune cells to modified LDL, while the lipid metabolism genes may be a secondary response triggered by inflammatory signalling.

Source
DOI: http://dx.doi.org/10.2174/1381612824666180911120039
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302360
October 2019

HDL activates expression of genes stimulating cholesterol efflux in human monocyte-derived macrophages.

Exp Mol Pathol 2018 10 16;105(2):202-207. Epub 2018 Aug 16.

The George Washington University School of Medicine and Health Sciences, Washington, DC, USA. Electronic address:

High density lipoproteins (HDL) are key components of the reverse cholesterol transport pathway. HDL removes excess cholesterol from peripheral cells, including macrophages, protecting them from cholesterol accumulation and conversion into foam cells, a key event in the pathogenesis of atherosclerosis. The mechanism of cellular cholesterol efflux stimulation by HDL involves interaction with the ABCA1 lipid transporter and the ensuing transfer of cholesterol to HDL particles. In this study, we looked for additional proteins contributing to HDL-dependent cholesterol efflux. Using RNAseq, we analyzed mRNAs induced by HDL in human monocyte-derived macrophages and identified three genes, fatty acid desaturase 1 (FADS1), insulin induced gene 1 (INSIG1), and the low-density lipoprotein receptor (LDLR), whose expression was significantly upregulated by HDL. We individually knocked down these genes in THP-1 cells using siRNA gene silencing and measured cellular cholesterol efflux to HDL. Knockdown of FADS1 did not significantly change cholesterol efflux (p = 0.70), but knockdown of INSIG1 and LDLR resulted in a highly significant reduction of the efflux to HDL (67% and 75% of control, respectively, p < 0.001). Importantly, the suppression of cholesterol efflux was independent of the known effects of these genes on cellular cholesterol content, as cells were loaded with cholesterol using acetylated LDL. These results indicate that HDL particles stimulate the expression of genes that enhance cellular cholesterol transfer to HDL.

Source
DOI: http://dx.doi.org/10.1016/j.yexmp.2018.08.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247801
October 2018

The complete genome of the oil emulsifying strain Thalassolituus oleivorans K-188 from the Barents Sea.

Mar Genomics 2018 Feb 24;37:18-20. Epub 2017 Aug 24.

Vavilov Institute of General Genetics, Gubkina str., h. 3, Moscow 119991, Russian Federation; State Institute for Genetics and Selection of Industrial Microorganisms, 1-st Dorozhniy pr., h. 1, Moscow 117545, Russian Federation; Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region 141700, Russian Federation. Electronic address:

The gammaproteobacterium Thalassolituus oleivorans plays an important role in oil degradation in sea water by emulsifying crude oil and alkanes at low temperatures in polar sea environments. Here we report the complete genome sequence of strain K-188 (VKPM B-9394), isolated in the Barents Sea, and compare it with other known Thalassolituus oleivorans strains. The Thalassolituus strains differ in the number of orthologs of genes for alkane degradation, transport proteins, sugar utilization, endonucleases, signaling proteins, and transcriptional regulators, and in the presence of a CRISPR/Cas locus. In addition, only the K-188 genome contains the 3-hydroxyalkanoate synthetase.

Source
DOI: http://dx.doi.org/10.1016/j.margen.2017.08.005
February 2018

HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.

Nucleic Acids Res 2018 01;46(D1):D252-D259

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia.

We present a major update of the HOCOMOCO collection, which consists of patterns describing the DNA binding specificities of human and mouse transcription factors. In this release, we took advantage of a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only, and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe the primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO with MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru), which applies HOCOMOCO models to visualize binding sites in short DNA sequences.
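
Dinucleotide PWMs score each pair of adjacent nucleotides rather than each nucleotide independently, which lets them capture dependencies between neighbouring positions. A minimal scoring sketch, assuming a matrix held as a list of per-position dictionaries keyed by dinucleotide (a representation chosen here for illustration, not HOCOMOCO's file format) and uppercase ACGT input; the toy matrix values are made up.

```python
from itertools import product
from typing import Dict, List, Tuple

DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]  # AA, AC, ..., TT
DiPWM = List[Dict[str, float]]  # column i scores the dinucleotide at positions i, i + 1


def dipwm_score(dipwm: DiPWM, site: str) -> float:
    """Sum of dinucleotide scores for a site one letter longer than the column count."""
    if len(site) != len(dipwm) + 1:
        raise ValueError("site length must equal the number of columns + 1")
    return sum(column[site[i:i + 2]] for i, column in enumerate(dipwm))


def best_dipwm_hit(dipwm: DiPWM, sequence: str) -> Tuple[float, int]:
    """(best score, start position) over all forward-strand windows of the sequence."""
    width = len(dipwm) + 1
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max((dipwm_score(dipwm, sequence[s:s + width]), s)
               for s in range(len(sequence) - width + 1))


if __name__ == "__main__":
    # Toy 2-column matrix that rewards "GA" followed by "AT", i.e. the site GAT.
    toy = [{d: -1.0 for d in DINUCLEOTIDES}, {d: -1.0 for d in DINUCLEOTIDES}]
    toy[0]["GA"] = 2.0
    toy[1]["AT"] = 1.5
    print(best_dipwm_hit(toy, "CCGATCC"))
```

A mononucleotide PWM is the special case in which each dinucleotide score decomposes into independent per-letter contributions.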

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1106
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753240
January 2018

An integrated expression atlas of miRNAs and their promoters in human and mouse.

Nat Biotechnol 2017 Sep 21;35(9):872-878. Epub 2017 Aug 21.

Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Japan.

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

Source
http://dx.doi.org/10.1038/nbt.3947 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5767576 (PMC)
September 2017

The single nucleotide variant rs12722489 determines differential estrogen receptor binding and enhancer properties of an IL2RA intronic region.

PLoS One 2017 Feb 24;12(2):e0172681. Epub 2017 Feb 24.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.

We studied the effect on transcriptional regulation of the rs12722489 single nucleotide polymorphism, located in the first intron of the human IL2RA gene. This polymorphism is associated with multiple autoimmune conditions (rheumatoid arthritis, multiple sclerosis, Crohn's disease, and ulcerative colitis). In silico analysis suggested a significant difference in the affinity of an estrogen receptor (ER) binding site between the alternative allelic variants, with stronger predicted affinity for the risk (G) allele. An electrophoretic mobility shift assay showed that purified human ERα bound only the G variant of a 32-bp genomic sequence containing rs12722489. Chromatin immunoprecipitation demonstrated that endogenous human ERα interacted with the rs12722489 genomic region in vivo, and a DNA pull-down assay confirmed differential allelic binding of endogenous human ERα to amplified 189-bp genomic fragments containing rs12722489. In a luciferase reporter assay, a kilobase-long genomic segment containing the G but not the A allele of rs12722489 demonstrated enhancer properties in the MT-2 cell line, an HTLV-1-transformed human cell line with a regulatory T cell phenotype.
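
The in silico step compares predicted ER binding between the two alleles. The abstract does not specify the method, so the sketch below uses a simple stand-in: for each allele, take the best PWM score over all windows (both strands) that cover the variant, and treat the difference as a crude proxy for the affinity change. The motif `GGTCA`, the sequence and all function names are hypothetical and do not represent an actual ER binding model.

```python
from typing import Dict, List

PWM = List[Dict[str, float]]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlapping_score(pwm: PWM, seq: str, pos: int) -> float:
    """Best PWM score over all windows on either strand that cover seq[pos]."""
    k = len(pwm)
    best = float("-inf")
    # Window starts run from pos-k+1 to pos, clipped to the sequence bounds.
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        word = seq[start:start + k]
        for oriented in (word, revcomp(word)):
            best = max(best, sum(col[b] for col, b in zip(pwm, oriented)))
    return best


def allele_score_delta(pwm: PWM, seq: str, pos: int, ref: str, alt: str) -> float:
    """Alt-minus-ref change of the best site score overlapping a single-nucleotide variant."""
    if seq[pos] != ref:
        raise ValueError(f"base at {pos} is {seq[pos]!r}, not {ref!r}")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_overlapping_score(pwm, alt_seq, pos) - best_overlapping_score(pwm, seq, pos)


def toy_column(best: str) -> Dict[str, float]:
    """Hypothetical column: +1 for the preferred base, -1 otherwise."""
    return {base: (1.0 if base == best else -1.0) for base in "ACGT"}


toy_pwm = [toy_column(b) for b in "GGTCA"]  # hypothetical motif, not a real ER matrix
# Reference allele A at position 2; the alternative allele G completes GGTCA.
print(allele_score_delta(toy_pwm, "CCAGTCACC", pos=2, ref="A", alt="G"))  # 2.0
```

A positive value means the alternative allele creates a stronger predicted site. Raw score differences are hard to compare across motifs, so converting scores to P-values first is a common refinement.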

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0172681 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5325477 (PMC)
August 2017

Use of Primary Macrophages for Searching Novel Immunocorrectors.

Curr Pharm Des 2017 ;23(6):915-920

Department of Biophysics, Biological Faculty, Moscow State University, Moscow 119991, Russian Federation.

In this mini-review, the role of macrophage phenotypes in atherogenesis is considered. Recent studies on the distribution of M1 and M2 macrophages in different types of atherosclerotic lesions indicate that macrophages exhibit a high degree of phenotypic plasticity in response to various microenvironmental conditions. The effect of cholesterol accumulation, a key event in atherogenesis, on the macrophage phenotype is also discussed. The article presents the results of transcriptome analysis of cholesterol-loaded macrophages, revealing the immune response genes whose expression changed the most. The interaction of macrophages with modified LDL was found to increase the expression of both the pro-inflammatory marker TNF-α and the anti-inflammatory marker CCL18. The phenotypic profile of macrophage activation could be a good target for testing novel anti-atherogenic immunocorrectors. A number of anti-atherogenic drugs were tested as potential immunocorrectors using a primary macrophage-based model.

Source
http://dx.doi.org/10.2174/1381612823666170125110128 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446906 (PMC)
February 2018

Negative selection maintains transcription factor binding motifs in human cancer.

BMC Genomics 2016 Jun 23;17 Suppl 2:395. Epub 2016 Jun 23.

Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.

Background: Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, the expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer. Cancer somatic mutations in binding sites of selected transcription factors have been found to be under positive selection. However, the action and significance of negative selection in non-coding regions remain controversial.

Results: Here we present an analysis of transcription factor binding motifs co-localized with non-coding variants. To avoid statistical bias, we account for the mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of predicted binding sites than expected by chance. This stability of binding motifs is even more pronounced in DNase-accessible regions.

Conclusions: Our data demonstrate negative selection against binding site alterations and suggest that such selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors with conserved binding motifs can reveal cell regulatory pathways crucial for the survival of various human cancers.
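
The central comparison is between observed mutations in binding motifs and an expectation that accounts for cancer-type mutation signatures. The abstract does not give the exact statistics, so the sketch below assumes a simplified model: each candidate affinity-changing substitution gets a context-dependent probability from a signature, these are summed into an expected count, and a one-sided Poisson test checks for depletion. The signature values, candidate list and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical signature: (trinucleotide context, alt base) -> per-site probability.
Signature = Dict[Tuple[str, str], float]


def expected_hits(candidates: Iterable[Tuple[str, str]], signature: Signature) -> float:
    """Expected number of mutations among candidate substitutions.

    Each candidate is (trinucleotide context, alt base) for a substitution that
    would raise or lower the PWM score of a predicted binding site.
    """
    return sum(signature.get(candidate, 0.0) for candidate in candidates)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def depletion_test(observed: int, expected: float) -> Tuple[float, float]:
    """Return (observed/expected ratio, one-sided P-value of seeing <= observed)."""
    return observed / expected, poisson_cdf(observed, expected)


signature = {("TCG", "T"): 0.5, ("ACA", "G"): 0.25}
candidates = [("TCG", "T")] * 8 + [("ACA", "G")] * 4
expected = expected_hits(candidates, signature)
print(expected, depletion_test(observed=1, expected=expected))
```

A ratio well below 1 with a small P-value indicates fewer affinity-changing mutations than expected, the pattern interpreted as negative selection. The Poisson approximation assumes independent sites.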

Source
http://dx.doi.org/10.1186/s12864-016-2728-9 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928157 (PMC)
June 2016

Preservation of methylated CpG dinucleotides in human CpG islands.

Biol Direct 2016 Mar 22;11(1):11. Epub 2016 Mar 22.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, GSP-1, 119991, Russia.

Background: CpG dinucleotides are extensively underrepresented in mammalian genomes. It is widely accepted that genome-wide CpG depletion is predominantly caused by an elevated CpG > TpG mutation rate due to frequent cytosine methylation in the CpG context. Meanwhile, the CpG content in genomic regions called CpG islands (CGIs) is noticeably higher. This observation is usually explained by lower CpG > TpG substitution rates within CGIs due to reduced cytosine methylation levels.

Results: By combining genome-wide data on substitutions and methylation levels in several human cell types we have shown that cytosine methylation in human sperm cells was strongly and consistently associated with increased CpG > TpG substitution rates. In contrast, this correlation was not observed for embryonic stem cells or fibroblasts. Surprisingly, the decreased sperm CpG methylation level was insufficient to explain the reduced CpG > TpG substitution rates in CGIs.

Conclusions: While cytosine methylation in human sperm cells is strongly associated with increased CpG > TpG substitution rates, substitution rates are significantly reduced within CGIs even after sperm CpG methylation levels and local GC content are controlled for. Our findings are consistent with strong negative selection preserving methylated CpGs within CGIs including intergenic ones.
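
CpG depletion and CpG island enrichment are commonly quantified with the observed/expected CpG ratio, where the expectation comes from the mononucleotide C and G counts. A minimal sketch follows; it is not the substitution-rate analysis used in the paper, and island-calling rules (length and GC-content cut-offs) are omitted.

```python
def cpg_observed_expected(seq: str) -> float:
    """CpG observed/expected ratio: N_CpG * length / (N_C * N_G).

    Values well below 1 indicate CpG depletion; CpG islands show markedly
    higher ratios than bulk genomic DNA.
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # CG occurrences cannot overlap, so str.count is exact
    return n_cpg * len(seq) / (n_c * n_g)


print(cpg_observed_expected("CGCGCGCG"))  # 4 * 8 / (4 * 4) = 2.0
print(cpg_observed_expected("CATGCATG"))  # no CpG -> 0.0
```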

Source
http://dx.doi.org/10.1186/s13062-016-0113-x (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4804638 (PMC)
March 2016

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models.

Nucleic Acids Res 2016 Jan 19;44(D1):D116-25. Epub 2015 Nov 19.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
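
HOCOMOCO models come with precomputed score thresholds. One common way to derive such a threshold is to take the lowest score whose P-value (the probability that a random word scores at least that high) stays within a target. The sketch below computes the score distribution of a tiny toy PWM exactly by dynamic programming over columns, assuming a uniform nucleotide background; it illustrates the idea and is not the procedure or the tools distributed with HOCOMOCO. Function names and the toy matrix are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PWM = List[Dict[str, float]]
UNIFORM = {base: 0.25 for base in "ACGT"}  # assumed background model


def score_distribution(pwm: PWM, background: Dict[str, float] = UNIFORM,
                       precision: int = 2) -> Dict[float, float]:
    """Distribution of PWM scores over random words, via DP over columns.

    Partial scores are rounded to `precision` decimals so the number of
    distinct scores stays small; the result is exact for the toy weights.
    """
    dist: Dict[float, float] = {0.0: 1.0}
    for column in pwm:
        nxt: Dict[float, float] = defaultdict(float)
        for score, prob in dist.items():
            for base, weight in column.items():
                nxt[round(score + weight, precision)] += prob * background[base]
        dist = dict(nxt)
    return dist


def threshold_for_pvalue(pwm: PWM, pvalue: float) -> Tuple[Optional[float], float]:
    """Lowest score t with P(score >= t) <= pvalue, plus that tail probability.

    Returns (None, 0.0) if even the maximal score is more frequent than pvalue.
    """
    dist = score_distribution(pwm)
    tail, threshold = 0.0, None
    for score in sorted(dist, reverse=True):
        if tail + dist[score] > pvalue:
            break
        tail += dist[score]
        threshold = score
    return threshold, tail


def toy_column(best: str) -> Dict[str, float]:
    """Hypothetical column: +1 for the preferred base, -1 otherwise."""
    return {base: (1.0 if base == best else -1.0) for base in "ACGT"}


toy_pwm = [toy_column(b) for b in "ACG"]
print(threshold_for_pvalue(toy_pwm, 0.05))  # (3.0, 0.015625): only "ACG" scores 3
print(threshold_for_pvalue(toy_pwm, 0.2))   # (1.0, 0.15625)
```

Because scores are discrete, the achieved tail probability (0.015625 in the first call) can be well below the requested P-value.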

Source
http://dx.doi.org/10.1093/nar/gkv1249 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702883 (PMC)
January 2016

Single-Cell Analyses of ESCs Reveal Alternative Pluripotent Cell States and Molecular Mechanisms that Control Self-Renewal.

Stem Cell Reports 2015 Aug;5(2):207-20

Department of Regenerative and Developmental Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Department of Pharmacology and System Therapeutics, Icahn School of Medicine at Mount Sinai, Systems Biology Center New York, One Gustave L. Levy Place, New York, NY 10029, USA. Electronic address:

Analyses of gene expression in single mouse embryonic stem cells (mESCs) cultured in serum and LIF revealed the presence of two distinct cell subpopulations with individual gene expression signatures. Comparisons with published data revealed that cells in the first subpopulation are phenotypically similar to cells isolated from the inner cell mass (ICM). In contrast, cells in the second subpopulation appear to be more mature. Pluripotency Gene Regulatory Network (PGRN) reconstruction based on single-cell data and published data suggested antagonistic roles for Oct4 and Nanog in the maintenance of pluripotency states. Integrated analyses of published genomic binding (ChIP) data strongly supported this observation. Certain target genes alternatively regulated by OCT4 and NANOG, such as Sall4 and Zscan10, feed back into the top hierarchical regulator Oct4. Analyses of such incoherent feedforward loops with feedback (iFFL-FB) suggest a dynamic model for the maintenance of mESC pluripotency and self-renewal.

Source
http://dx.doi.org/10.1016/j.stemcr.2015.07.004 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618835 (PMC)
August 2015

Phenomenon of individual difference in human monocyte activation.

Exp Mol Pathol 2015 Aug 21;99(1):151-4. Epub 2015 Jun 21.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.

Macrophages play an important role in the pathogenesis of atherosclerosis, including the early pre-clinical stages of disease development. We explored the possibility that disease onset could be associated with an altered monocyte/macrophage response to activating pro- and anti-inflammatory stimuli. We evaluated the susceptibility of circulating monocytes from healthy individuals and patients with asymptomatic carotid atherosclerosis to M1 and M2 activation. The data indicated remarkable individual differences in susceptibility to activation among monocytes isolated from the blood of different subjects, regardless of the presence or absence of atherosclerosis. These differences in susceptibility to activation may explain individual peculiarities of the immune response in different subjects.

Source
http://dx.doi.org/10.1016/j.yexmp.2015.06.011 (DOI Listing)
August 2015

Complete Genome Sequence of Bifidobacterium longum GT15: Identification and Characterization of Unique and Global Regulatory Genes.

Microb Ecol 2015 Oct 17;70(3):819-34. Epub 2015 Apr 17.

Vavilov Institute of General Genetics, Gubkina str. 3, 119991, Moscow, Russia.

In this study, we report the first completely annotated genome sequence of the Russian-origin Bifidobacterium longum subsp. longum strain GT15. Comparative genomic analysis of this genome with other available completely annotated genome sequences of B. longum strains isolated in other countries revealed a high degree of conservation and synteny across the entire genomes. However, open reading frames of 35 genes were detected only in the B. longum GT15 genome and were absent from the genomes of other B. longum strains (not of Russian origin). These so-called unique genes (UGs) have a total length of 39,066 bp, with G + C content ranging from 37 to 65%. Interestingly, some of these genes were detected in other B. longum strains of Russian origin. In our analysis, we examined genes for global regulatory systems: proteins of type II toxin-antitoxin (TA) systems, eukaryotic-type serine/threonine protein kinases (STPKs), and WhiB-like family proteins. In addition, we performed an in silico analysis of all the most significant probiotic genes and considered genes involved in epigenetic regulation and genes responsible for producing various neuromediators. This genome sequence may elucidate the biology of this probiotic strain as a promising candidate for practical (pharmaceutical) applications.
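
The unique genes are summarized by their total length and G + C content range. The sketch below computes the same kind of summary for a set of gene sequences; the two sequences and the function names are hypothetical placeholders, not GT15 genes.

```python
from typing import Dict, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize_genes(genes: Dict[str, str]) -> Tuple[int, float, float]:
    """Total length and (min, max) G + C percentage over a set of gene sequences."""
    gc_values = [gc_percent(s) for s in genes.values()]
    return sum(len(s) for s in genes.values()), min(gc_values), max(gc_values)


# Hypothetical sequences standing in for strain-specific genes.
genes = {"geneA": "ATATGCAT", "geneB": "GGCCGCAT"}
print(summarize_genes(genes))  # (16, 25.0, 75.0)
```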

Source
http://dx.doi.org/10.1007/s00248-015-0603-x (DOI Listing)
October 2015

Complete Genome Sequence of Bifidobacterium longum GT15: Unique Genes for Russian Strains.

Genome Announc 2014 Dec 18;2(6). Epub 2014 Dec 18.

Vavilov Institute of General Genetics, Moscow, Russia

In this study, we report the first completely annotated genome sequence of the Russian-origin Bifidobacterium longum subsp. longum strain GT15. We discovered 35 unique genes (UGs) that were detected only in the B. longum GT15 genome and were absent from the genomes of other B. longum strains (not of Russian origin).

Source
http://dx.doi.org/10.1128/genomeA.01348-14 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271175 (PMC)
December 2014

A promoter-level mammalian expression atlas.

Authors:
Alistair R R Forrest Hideya Kawaji Michael Rehli J Kenneth Baillie Michiel J L de Hoon Vanja Haberle Timo Lassmann Ivan V Kulakovskiy Marina Lizio Masayoshi Itoh Robin Andersson Christopher J Mungall Terrence F Meehan Sebastian Schmeier Nicolas Bertin Mette Jørgensen Emmanuel Dimont Erik Arner Christian Schmidl Ulf Schaefer Yulia A Medvedeva Charles Plessy Morana Vitezic Jessica Severin Colin A Semple Yuri Ishizu Robert S Young Margherita Francescatto Intikhab Alam Davide Albanese Gabriel M Altschuler Takahiro Arakawa John A C Archer Peter Arner Magda Babina Sarah Rennie Piotr J Balwierz Anthony G Beckhouse Swati Pradhan-Bhatt Judith A Blake Antje Blumenthal Beatrice Bodega Alessandro Bonetti James Briggs Frank Brombacher A Maxwell Burroughs Andrea Califano Carlo V Cannistraci Daniel Carbajo Yun Chen Marco Chierici Yari Ciani Hans C Clevers Emiliano Dalla Carrie A Davis Michael Detmar Alexander D Diehl Taeko Dohi Finn Drabløs Albert S B Edge Matthias Edinger Karl Ekwall Mitsuhiro Endoh Hideki Enomoto Michela Fagiolini Lynsey Fairbairn Hai Fang Mary C Farach-Carson Geoffrey J Faulkner Alexander V Favorov Malcolm E Fisher Martin C Frith Rie Fujita Shiro Fukuda Cesare Furlanello Masaaki Furino Jun-ichi Furusawa Teunis B Geijtenbeek Andrew P Gibson Thomas Gingeras Daniel Goldowitz Julian Gough Sven Guhl Reto Guler Stefano Gustincich Thomas J Ha Masahide Hamaguchi Mitsuko Hara Matthias Harbers Jayson Harshbarger Akira Hasegawa Yuki Hasegawa Takehiro Hashimoto Meenhard Herlyn Kelly J Hitchens Shannan J Ho Sui Oliver M Hofmann Ilka Hoof Furni Hori Lukasz Huminiecki Kei Iida Tomokatsu Ikawa Boris R Jankovic Hui Jia Anagha Joshi Giuseppe Jurman Bogumil Kaczkowski Chieko Kai Kaoru Kaida Ai Kaiho Kazuhiro Kajiyama Mutsumi Kanamori-Katayama Artem S Kasianov Takeya Kasukawa Shintaro Katayama Sachi Kato Shuji Kawaguchi Hiroshi Kawamoto Yuki I Kawamura Tsugumi Kawashima Judith S Kempfle Tony J Kenna Juha Kere Levon M Khachigian Toshio Kitamura S Peter Klinken Alan J Knox Miki Kojima Soichi Kojima Naoto Kondo Haruhiko Koseki Shigeo Koyasu Sarah Krampitz Atsutaka Kubosaki Andrew T Kwon Jeroen F J Laros Weonju Lee Andreas Lennartsson Kang Li Berit Lilje Leonard Lipovich Alan Mackay-Sim Ri-ichiroh Manabe Jessica C Mar Benoit Marchand Anthony Mathelier Niklas Mejhert Alison Meynert Yosuke Mizuno David A de Lima Morais Hiromasa Morikawa Mitsuru Morimoto Kazuyo Moro Efthymios Motakis Hozumi Motohashi Christine L Mummery Mitsuyoshi Murata Sayaka Nagao-Sato Yutaka Nakachi Fumio Nakahara Toshiyuki Nakamura Yukio Nakamura Kenichi Nakazato Erik van Nimwegen Noriko Ninomiya Hiromi Nishiyori Shohei Noma Shohei Noma Tadasuke Noazaki Soichi Ogishima Naganari Ohkura Hiroko Ohimiya Hiroshi Ohno Mitsuhiro Ohshima Mariko Okada-Hatakeyama Yasushi Okazaki Valerio Orlando Dmitry A Ovchinnikov Arnab Pain Robert Passier Margaret Patrikakis Helena Persson Silvano Piazza James G D Prendergast Owen J L Rackham Jordan A Ramilowski Mamoon Rashid Timothy Ravasi Patrizia Rizzu Marco Roncador Sugata Roy Morten B Rye Eri Saijyo Antti Sajantila Akiko Saka Shimon Sakaguchi Mizuho Sakai Hiroki Sato Suzana Savvi Alka Saxena Claudio Schneider Erik A Schultes Gundula G Schulze-Tanzil Anita Schwegmann Thierry Sengstag Guojun Sheng Hisashi Shimoji Yishai Shimoni Jay W Shin Christophe Simon Daisuke Sugiyama Takaai Sugiyama Masanori Suzuki Naoko Suzuki Rolf K Swoboda Peter A C 't Hoen Michihira Tagami Naoko Takahashi Jun Takai Hiroshi Tanaka Hideki Tatsukawa Zuotian Tatum Mark Thompson Hiroo Toyodo Tetsuro Toyoda Elvind Valen Marc van de Wetering Linda M van den Berg Roberto Verado Dipti Vijayan Ilya E Vorontsov Wyeth W Wasserman Shoko Watanabe Christine A Wells Louise N Winteringham Ernst Wolvetang Emily J Wood Yoko Yamaguchi Masayuki Yamamoto Misako Yoneda Yohei Yonekura Shigehiro Yoshida Susan E Zabierowski Peter G Zhang Xiaobei Zhao Silvia Zucchelli Kim M Summers Harukazu Suzuki Carsten O Daub Jun Kawai Peter Heutink Winston Hide Tom C Freeman Boris Lenhard Vladimir B Bajic Martin S Taylor Vsevolod J Makeev Albin Sandelin David A Hume Piero Carninci Yoshihide Hayashizaki

Nature 2014 Mar;507(7493):462-70

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
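
The finding that many promoters are composite entities of several closely separated TSSs can be illustrated by grouping nearby CAGE TSS positions. The sketch below is a deliberately simple distance-based clustering (same chromosome and strand, gaps of at most `max_gap` bp); it is not the clustering procedure used by FANTOM5, and the 20 bp gap, the record types and the input positions are arbitrary assumptions.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple


class TSS(NamedTuple):
    chrom: str
    strand: str
    pos: int
    tags: int  # CAGE tag count at this position (hypothetical input)


class Cluster(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int
    tags: int


def cluster_tss(tss: Iterable[TSS], max_gap: int = 20) -> List[Cluster]:
    """Greedily merge same-strand TSSs whose neighbours lie within max_gap bp."""
    ordered = sorted(tss, key=lambda t: (t.chrom, t.strand, t.pos))
    clusters: List[Cluster] = []
    for (chrom, strand), group in groupby(ordered, key=lambda t: (t.chrom, t.strand)):
        members = list(group)
        start = end = members[0].pos
        tags = members[0].tags
        for t in members[1:]:
            if t.pos - end <= max_gap:
                end, tags = t.pos, tags + t.tags  # extend the current cluster
            else:
                clusters.append(Cluster(chrom, strand, start, end, tags))
                start = end = t.pos
                tags = t.tags
        clusters.append(Cluster(chrom, strand, start, end, tags))
    return clusters


sites = [TSS("chr1", "+", 100, 5), TSS("chr1", "+", 110, 3),
         TSS("chr1", "+", 300, 7), TSS("chr1", "-", 105, 2)]
for c in cluster_tss(sites):
    print(c)
```

The two plus-strand TSSs 10 bp apart form one composite cluster, while the distant plus-strand TSS and the minus-strand TSS remain separate.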

Source
http://dx.doi.org/10.1038/nature13182 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529748 (PMC)
March 2014

Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data.

BMC Genomics 2014 Jan 29;15:80. Epub 2014 Jan 29.

Institute of Cytology and Genetics of the Siberian Division of Russian Academy of Sciences, Lavrentieva Prospect 10, Novosibirsk 630090, Russia.

Background: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models.

Results: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells, we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. Conventional position weight matrix (PWM) models performed worst, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by combining SiteGA with diChIPMunk/ChIPMunk models, which properly identified FoxA BSs in up to 90% of loci in both the mouse and human ChIP-Seq datasets.
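
Experimentally tested sites make it possible to calibrate a model's prediction threshold. The abstract does not give the exact selection rule, so the sketch below assumes a simple one: choose the lowest score at which the false-positive rate among EMSA-tested sites stays within a chosen bound, then report the fraction of ChIP-Seq loci whose best site passes that threshold. All scores, EMSA outcomes and function names are invented for illustration.

```python
from typing import List, Optional, Tuple

# (predicted site score, EMSA outcome: True if the oligonucleotide bound the TF)
Labeled = List[Tuple[float, bool]]


def rates_at_threshold(sites: Labeled, threshold: float) -> Tuple[float, float]:
    """True- and false-positive rates when calling sites with score >= threshold."""
    tp = sum(1 for s, bound in sites if s >= threshold and bound)
    fp = sum(1 for s, bound in sites if s >= threshold and not bound)
    fn = sum(1 for s, bound in sites if s < threshold and bound)
    tn = sum(1 for s, bound in sites if s < threshold and not bound)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def lowest_threshold_with_fpr(sites: Labeled, max_fpr: float) -> Optional[float]:
    """Smallest observed score whose FPR does not exceed max_fpr (FPR falls as t rises)."""
    for t in sorted({s for s, _ in sites}):
        if rates_at_threshold(sites, t)[1] <= max_fpr:
            return t
    return None


def fraction_of_loci_with_site(best_scores: List[float], threshold: float) -> float:
    """Share of ChIP-Seq loci whose best-scoring predicted site passes the threshold."""
    return sum(1 for s in best_scores if s >= threshold) / len(best_scores)


emsa = [(9.0, True), (8.0, True), (7.5, False), (6.0, True), (5.0, False), (4.0, False)]
t = lowest_threshold_with_fpr(emsa, max_fpr=0.34)
print(t, rates_at_threshold(emsa, t))
print(fraction_of_loci_with_site([9.5, 6.2, 3.1, 7.0], t))
```

Comparing models at thresholds calibrated this way, rather than at each tool's default, is what makes a false-positive comparison between model classes meaningful.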

Conclusions: Experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS recognition in ChIP-Seq data analysis. For ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared with the more sophisticated SiteGA and de novo motif discovery methods. Combining models based on different principles allowed proper identification of TFBSs.

Source
http://dx.doi.org/10.1186/1471-2164-15-80 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234207 (PMC)
January 2014

Jaccard index based similarity measure to compare transcription factor binding site models.

Algorithms Mol Biol 2013 Sep 30;8(1):23. Epub 2013 Sep 30.

Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov str. 32, Moscow 119991, GSP-1, Russia.

Background: The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBSs), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually yield similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus, it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBSs and thus may differ without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds.

Results: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective score threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs with raw positional counts and log-odds PWMs. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation).
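
The Jaccard index between two TFBS models is the size of the intersection of their recognized site sets divided by the size of their union. For very short motifs of equal length, both sets can be enumerated directly, as in the brute-force sketch below (with a uniform background, counting words is equivalent to summing their probabilities). MACRO-APE itself relies on approximate P-value estimation rather than enumeration and handles models of different lengths; none of that is reproduced here, and the toy models and function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set

PWM = List[Dict[str, float]]


def site_set(pwm: PWM, threshold: float) -> Set[str]:
    """All words of the motif length whose score reaches the threshold."""
    return {
        "".join(word)
        for word in product("ACGT", repeat=len(pwm))
        if sum(col[b] for col, b in zip(pwm, word)) >= threshold
    }


def jaccard_similarity(pwm_a: PWM, thr_a: float, pwm_b: PWM, thr_b: float) -> float:
    """Intersection size over union size for the site sets of two TFBS models."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("this sketch only handles models of equal length")
    a, b = site_set(pwm_a, thr_a), site_set(pwm_b, thr_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def toy_column(best: str) -> Dict[str, float]:
    """Hypothetical column: +1 for the preferred base, -1 otherwise."""
    return {base: (1.0 if base == best else -1.0) for base in "ACGT"}


model_a = [toy_column(b) for b in "ACG"]
model_b = [toy_column(b) for b in "ACT"]
print(jaccard_similarity(model_a, 1.0, model_b, 1.0))  # 4 shared / 16 total = 0.25
print(jaccard_similarity(model_a, 3.0, model_a, 1.0))  # {"ACG"} vs 10 words = 0.1
```

The second call shows why thresholds matter: the same matrix at two thresholds yields quite dissimilar site sets. The value 1 - J can serve as a distance between models.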

Conclusions: MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query.

Availability And Implementation: MACRO-APE is implemented in ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.

Source
http://dx.doi.org/10.1186/1748-7188-8-23 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851813 (PMC)
September 2013

DNA sequence motif: a jack of all trades for ChIP-Seq data.

Adv Protein Chem Struct Biol 2013 ;91:135-71

Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.

Nowadays, chromatin immunoprecipitation followed by next-generation sequencing, often referred to as ChIP-Seq, has become the standard technique for studying the landscape of DNA-protein interactions in vivo. ChIP-Seq captures highly specific protein-DNA interactions, such as transcription factors (TFs) bound to their binding sites, as well as sparse patterns formed by different histone marks. In this review, we focus on DNA sequence analysis methods suitable for TF ChIP-Seq data. We discuss numerous tasks, from basic DNA motif finding and motif discovery as such to their further application in exploring various features of experimental data. We show how sequence analysis of ChIP-Seq data yields novel biological knowledge on multiple levels, from individual transcription factor binding sites to genome segments operating as regulatory modules. Finally, we provide an overview of existing software in the field.

Source
http://dx.doi.org/10.1016/B978-0-12-411637-5.00005-6 (DOI Listing)
October 2014

From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites.

J Bioinform Comput Biol 2013 Feb 16;11(1):1340004. Epub 2013 Jan 16.

Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.

Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become the method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis, including construction of models of transcription factor binding site (TFBS) sequences in DNA, which can be used to detect binding sites in ChIP-Seq data at single base pair resolution. The most popular TFBS model is the positional weight matrix (PWM), with statistically independent positional weights of nucleotides in different columns; such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models containing more parameters than a PWM. Yet many proposed multiparametric models provide only incremental improvement in TFBS recognition quality compared with traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs, thus accounting for correlations between neighboring nucleotides in input sequences. diChIPMunk inherits many advantages of ChIPMunk, its ancestor algorithm, accounting for ChIP-Seq base coverage profiles ("peak shape") and using an effective subsampling-based core procedure that allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website: http://autosome.ru/dichipmunk/
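
A dinucleotide PWM assigns a weight to each of the 16 dinucleotides at every pair of adjacent positions, so a word of length L is scored by summing L - 1 overlapping dinucleotide weights. The sketch below shows only this scoring step with a hypothetical toy matrix and hypothetical helper names; diChIPMunk's training procedure (peak-shape weighting, subsampling) is not reproduced.

```python
from itertools import product
from typing import Dict, List

# One {dinucleotide: weight} dict per pair of adjacent motif positions.
DiPWM = List[Dict[str, float]]
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def score_dipwm(dipwm: DiPWM, word: str) -> float:
    """Sum the weights of the overlapping dinucleotides word[i:i+2]."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must be the number of diPWM columns + 1")
    return sum(col[word[i:i + 2]] for i, col in enumerate(dipwm))


def toy_dicolumn(favoured: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical column: listed dinucleotides get their weight, all others -1."""
    return {d: favoured.get(d, -1.0) for d in DINUCLEOTIDES}


# Rewards the linked pairs AC->CG and TG->GA, i.e. the words ACG and TGA.
toy = [toy_dicolumn({"AC": 1.0, "TG": 1.0}), toy_dicolumn({"CG": 1.0, "GA": 1.0})]
for word in ("ACG", "TGA", "ACA", "TCG"):
    print(word, score_dipwm(toy, word))
```

Because a mononucleotide PWM scores each position independently, one that gives both ACG and TGA its top score must also give ACA and TCG that same top score; the toy diPWM separates them (2.0 versus 0.0).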

Source
http://dx.doi.org/10.1142/S0219720013400040 (DOI Listing)
February 2013