Publications by authors named "Dmitry Penzar"

8 Publications

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
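
For readers unfamiliar with the model named in this abstract, the following is a minimal sketch of how a PWM can score candidate binding sites. It is not the benchmarking protocol from the study: the toy counts, the pseudocount, the uniform background, and the function names are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log-odds weights (log2 of p / background)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the position-specific weights for one window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window, or None if the sequence is too short."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits) if hits else None

# Hypothetical toy motif (consensus TGAC); not taken from any database.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = pwm_from_counts(toy_counts)

if __name__ == "__main__":
    print(best_hit(toy_pwm, "AATGACGG"))
```

In this sketch only the forward strand is scanned; a practical scanner would usually also score the reverse complement.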

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020

What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants.

Front Genet 2019 10 31;10:1078. Epub 2019 Oct 31.

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.

Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
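
One validation scenario mentioned in this abstract, evaluating on regulatory regions entirely excluded from the training set, can be sketched as a leave-one-region-out split. This is an illustrative assumption about how such a split might be written, not the authors' code (which is at the Bitbucket link above); the record layout, region names, and function name are hypothetical.

```python
from collections import defaultdict

def leave_one_region_out(variants):
    """Yield (held_out_region, train, validation) so that no variant from the
    validation region, and hence none of its local context, is in training."""
    by_region = defaultdict(list)
    for variant in variants:
        by_region[variant["region"]].append(variant)
    for region in sorted(by_region):
        validation = by_region[region]
        train = [
            variant
            for other, members in by_region.items()
            if other != region
            for variant in members
        ]
        yield region, train, validation

# Hypothetical toy records; region names and effect values are made up.
toy_variants = [
    {"region": "enhancer_A", "position": 10, "effect": 0.2},
    {"region": "enhancer_A", "position": 11, "effect": -0.1},
    {"region": "promoter_B", "position": 5, "effect": 0.0},
    {"region": "promoter_B", "position": 6, "effect": 0.4},
    {"region": "promoter_C", "position": 7, "effect": -0.3},
]

if __name__ == "__main__":
    for region, train, validation in leave_one_region_out(toy_variants):
        print(region, len(train), len(validation))
```

A random split over individual variants would instead place neighbouring positions of the same region on both sides of the split, which is the kind of setup the abstract associates with information leakage.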

Source
DOI: http://dx.doi.org/10.3389/fgene.2019.01078
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6834773
October 2019

H3K4me3, H3K9ac, H3K27ac, H3K27me3 and H3K9me3 Histone Tags Suggest Distinct Regulatory Evolution of Open and Condensed Chromatin Landmarks.

Cells 2019 09 5;8(9). Epub 2019 Sep 5.

Omicsway Corp., Walnut, CA 91789, USA.

Background: Transposons are selfish genetic elements that self-reproduce in host DNA. They were active during evolutionary history and now occupy almost half of mammalian genomes. Close insertions of transposons reshaped structure and regulation of many genes considerably. Co-evolution of transposons and host DNA frequently results in the formation of new regulatory regions. Previously we published a concept that the proportion of functional features held by transposons positively correlates with the rate of regulatory evolution of the respective genes.

Methods: We ranked human genes and molecular pathways according to their regulatory evolution rates based on high throughput genome-wide data on five histone modifications (H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K9me3) linked with transposons for five human cell lines.

Results: Based on the total of approximately 1.5 million histone tags, we ranked regulatory evolution rates for 25075 human genes and 3121 molecular pathways and identified groups of molecular processes that showed signs of either fast or slow regulatory evolution. However, histone tags showed different regulatory patterns and formed two distinct clusters: promoter/active chromatin tags (H3K4me3, H3K9ac, H3K27ac) vs. heterochromatin tags (H3K27me3, H3K9me3).

Conclusion: In humans, transposon-linked histone marks evolved in a coordinated way depending on their functional roles.

Source
DOI: http://dx.doi.org/10.3390/cells8091034
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6770625
September 2019

Correction: Nikitin, D., et al. Retroelement-Linked Transcription Factor Binding Patterns Point to Quickly Developing Molecular Pathways in Human Evolution. Cells 2019, 8, 130.

Cells 2019 Aug 5;8(8). Epub 2019 Aug 5.

I.M. Sechenov First Moscow State Medical University, 119992 Moscow, Russia.

In the article 'Retroelement-Linked Transcription Factor Binding Patterns Point to Quickly Developing Molecular Pathways in Human Evolution,' the number of transcription factor binding sites (TFBS) mapped on all retroelement classes was incorrectly calculated as the sum of TFBS numbers separately mapped on LINEs, SINEs and LTR retrotransposons/endogenous retroviruses (LR/ERVs) [...].

Source
DOI: http://dx.doi.org/10.3390/cells8080832
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6721673
August 2019

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay.

Hum Mutat 2019 09 23;40(9):1280-1291. Epub 2019 Jun 23.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

Source
DOI: http://dx.doi.org/10.1002/humu.23797
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879779
September 2019

Retroelement-Linked Transcription Factor Binding Patterns Point to Quickly Developing Molecular Pathways in Human Evolution.

Cells 2019 02 6;8(2). Epub 2019 Feb 6.

I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia.

Background: Retroelements (REs) are transposable elements occupying ~40% of the human genome that can regulate genes by providing transcription factor binding sites (TFBS). RE-linked TFBS profile can serve as a marker of gene transcriptional regulation evolution. This approach allows for interrogating the regulatory evolution of organisms with RE-rich genomes. We aimed to characterize the evolution of transcriptional regulation for human genes and molecular pathways using RE-linked TFBS accumulation as a metric. We characterized human genes and molecular pathways either enriched or deficient in RE-linked TFBS regulation. We used the ENCODE database with mapped TFBS for 563 transcription factors in 13 human cell lines. For 24,389 genes and 3124 molecular pathways, we calculated the score of RE-linked TFBS regulation reflecting the regulatory evolution rate at the level of individual genes and molecular pathways. The major groups enriched by RE regulation deal with gene regulation by microRNAs, olfaction, color vision, fertilization, cellular immune response, and amino acids and fatty acids metabolism and detoxication. The deficient groups were involved in translation, RNA transcription and processing, chromatin organization, and molecular signaling. We identified genes and molecular processes that have characteristics of especially high or low evolutionary rates at the level of RE-linked TFBS regulation in human lineage.

Source
DOI: http://dx.doi.org/10.3390/cells8020130
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6406739
February 2019

PQ, a new program for phylogeny reconstruction.

BMC Bioinformatics 2018 Oct 12;19(1):374. Epub 2018 Oct 12.

Faculty of Bioengineering and Bioinformatics, Moscow State University, 1 Leninskiye Gory, bld. 73, Moscow, 119991, Russia.

Background: Many algorithms and programs are available for phylogenetic reconstruction of families of proteins. Methods used widely at present use either a number of distance-based principles or character-based principles of maximum parsimony or maximum likelihood.

Results: We developed a novel program, named PQ, for reconstructing protein and nucleic acid phylogenies following a new character-based principle. Being tested on natural sequences PQ improves upon the results of maximum parsimony and maximum likelihood. Working with alignments of 10 and 15 sequences, it also outperforms the FastME program, which is based on one of the distance-based principles. Among all tested programs PQ is proved to be the least susceptible to long branch attraction. FastME outperforms PQ when processing alignments of 45 sequences, however. We confirm a recent result that on natural sequences FastME outperforms maximum parsimony and maximum likelihood. At the same time, both PQ and FastME are inferior to maximum parsimony and maximum likelihood on simulated sequences. PQ is open source and available to the public via an online interface.

Conclusions: The software we developed offers an open-source alternative for phylogenetic reconstruction for relatively small sets of proteins and nucleic acids, with up to a few tens of sequences.

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2399-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186109
October 2018

Profiling of Human Molecular Pathways Affected by Retrotransposons at the Level of Regulation by Transcription Factor Proteins.

Front Immunol 2018 01 30;9:30. Epub 2018 Jan 30.

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.

Endogenous retroviruses and retrotransposons also termed retroelements (REs) are mobile genetic elements that were active until recently in human genome evolution. REs regulate gene expression by actively reshaping chromatin structure or by directly providing transcription factor binding sites (TFBSs). We aimed to identify molecular processes most deeply impacted by the REs in human cells at the level of TFBS regulation. By using ENCODE data, we identified ~2 million TFBS overlapping with putatively regulation-competent human REs located in 5-kb gene promoter neighborhood (~17% of all TFBS in promoter neighborhoods; ~9% of all RE-linked TFBS). Most of REs hosting TFBS were highly diverged repeats, and for the evolutionary young (0-8% diverged) elements we identified only ~7% of all RE-linked TFBS. The gene-specific distributions of RE-linked TFBS generally correlated with the distributions for all TFBS. However, several groups of molecular processes were highly enriched in the RE-linked TFBS regulation. They were strongly connected with the immunity and response to pathogens, with the negative regulation of gene transcription, ubiquitination, and protein degradation, extracellular matrix organization, regulation of STAT signaling, fatty acids metabolism, regulation of GTPase activity, protein targeting to Golgi, regulation of cell division and differentiation, development and functioning of perception organs and reproductive system. By contrast, the processes most weakly affected by the REs were linked with the conservative aspects of embryo development. We also identified differences in the regulation features by the younger and older fractions of the REs. The regulation by the older fraction of the REs was linked mainly with the immunity, cell adhesion, cAMP, IGF1R, Notch, Wnt, and integrin signaling, neuronal development, chondroitin sulfate and heparin metabolism, and endocytosis. The younger REs regulate other aspects of immunity, cell cycle progression and apoptosis, PDGF, TGF beta, EGFR, and p38 signaling, transcriptional repression, structure of nuclear lumen, catabolism of phospholipids, and heterocyclic molecules, insulin and AMPK signaling, retrograde Golgi-ER transport, and estrogen signaling. The immunity-linked pathways were highly represented in both categories, but their functional roles were different and did not overlap. Our results point to the most quickly evolving molecular pathways in the recent and ancient evolution of human genome.

Source
DOI: http://dx.doi.org/10.3389/fimmu.2018.00030
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5797644
January 2018