Publications by authors named "Mikhail S Gelfand"

163 Publications

Order and stochasticity in the folding of individual Drosophila genomes.

Nat Commun 2021 01 4;12(1):41. Epub 2021 Jan 4.

Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia.

Mammalian and Drosophila genomes are partitioned into topologically associating domains (TADs). Although this partitioning has been reported to be functionally relevant, it is unclear whether TADs represent true physical units located at the same genomic positions in each cell nucleus or emerge as an average of numerous alternative chromatin folding patterns in a cell population. Here, we use a single-nucleus Hi-C technique to construct high-resolution Hi-C maps in individual Drosophila genomes. These maps demonstrate chromatin compartmentalization at the megabase scale and partitioning of the genome into non-hierarchical TADs at the scale of 100 kb, which closely resembles the TAD profile in the bulk in situ Hi-C data. Over 40% of TAD boundaries are conserved between individual nuclei and possess a high level of active epigenetic marks. Polymer simulations demonstrate that chromatin folding is best described by the random walk model within TADs and is most suitably approximated by a crumpled globule built of Gaussian blobs at longer distances. We observe prominent cell-to-cell variability in the long-range contacts between either active genome loci or between Polycomb-bound regions, suggesting an important contribution of stochastic processes to the formation of the Drosophila 3D genome.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-20292-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7782554
January 2021

Phospho-islands and the evolution of phosphorylated amino acids in mammals.

PeerJ 2020 2;8:e10436. Epub 2020 Dec 2.

Skolkovo Institute of Science and Technology, Moscow, Russia.

Background: Protein phosphorylation is the best studied post-translational modification strongly influencing protein function. Phosphorylated amino acids not only differ in physico-chemical properties from non-phosphorylated counterparts, but also exhibit different evolutionary patterns, tending to mutate to and originate from negatively charged amino acids (NCAs). The distribution of phosphosites along protein sequences is non-uniform, as phosphosites tend to cluster, forming so-called phospho-islands.

Methods: Here, we have developed a hidden Markov model-based procedure for the identification of phospho-islands and studied the properties of the obtained phosphorylation clusters. To check robustness of evolutionary analysis, we consider different models for the reconstructions of ancestral phosphorylation states.

Results: Clustered phosphosites differ from individual phosphosites in several functional and evolutionary aspects, including underrepresentation of phosphotyrosines, higher conservation, and more frequent mutations to NCAs. The spectrum of tissues, frequencies of specific phosphorylation contexts, and mutational patterns observed near clustered sites are also different.
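The abstract above names a hidden Markov model-based procedure but does not describe its states, emissions, or training. As a rough illustration only, the sketch below decodes a two-state HMM (background versus island) over a binary per-residue sequence with the Viterbi algorithm. The function name `viterbi_islands` and all probabilities are hypothetical values chosen for the example, not the authors' model.

```python
import math

def viterbi_islands(observations, p_site_island=0.3,
                    p_site_background=0.03, p_switch=0.05):
    """Label each residue as island (1) or background (0).

    observations: list of 0/1 flags, 1 meaning the residue is a phosphosite.
    All probabilities are illustrative assumptions, not fitted parameters.
    """
    # Emission probabilities: emit[state][observation]
    emit = {
        0: {0: 1 - p_site_background, 1: p_site_background},
        1: {0: 1 - p_site_island, 1: p_site_island},
    }
    # Log transition probabilities between the two states
    trans = {
        (a, b): math.log(1 - p_switch) if a == b else math.log(p_switch)
        for a in (0, 1) for b in (0, 1)
    }
    # Uniform prior over the starting state
    score = {s: math.log(0.5) + math.log(emit[s][observations[0]])
             for s in (0, 1)}
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in (0, 1):
            # Best predecessor state for arriving in state s
            prev = max((0, 1), key=lambda p: score[p] + trans[(p, s)])
            new_score[s] = score[prev] + trans[(prev, s)] + math.log(emit[s][obs])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the most probable state path backwards
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy protein: a dense run of phosphosites flanked by unmodified residues
toy = [0] * 20 + [1, 0, 1, 1, 0, 1] + [0] * 20
labels = viterbi_islands(toy)
```

With these assumed parameters, a dense run of sites outweighs the cost of switching states, whereas a single isolated site does not. That is the intuition behind separating clustered from individual phosphosites.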

Source
DOI: http://dx.doi.org/10.7717/peerj.10436
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7718798
December 2020

Adaptive evolution at mRNA editing sites in soft-bodied cephalopods.

PeerJ 2020 27;8:e10456. Epub 2020 Nov 27.

Skolkovo Institute of Science and Technology, Moscow, Russian Federation.

Background: The bulk of variability in mRNA sequence arises due to mutation, a change in DNA sequence that is heritable if it occurs in the germline. However, variation in mRNA can also be achieved by post-transcriptional modification, including mRNA editing: changes in mRNA nucleotide sequence that mimic the effect of mutations. Such modifications are not inherited directly; however, as the processes affecting them are encoded in the genome, they have a heritable component, and therefore can be shaped by selection. In soft-bodied cephalopods, adenine-to-inosine RNA editing is very frequent, and much of it occurs at nonsynonymous sites, affecting the sequence of the encoded protein.

Methods: We study selection regimes at coleoid A-to-I editing sites, estimate the prevalence of positive selection, and analyze interdependencies between the editing level and contextual characteristics of editing site.

Results: Here, we show that mRNA editing of individual nonsynonymous sites in cephalopods originates in evolution through substitutions at regions adjacent to these sites. As such substitutions mimic the effect of the substitution at the edited site itself, we hypothesize that they are favored by selection if the inosine is selectively advantageous to adenine at the edited position. Consistent with this hypothesis, we show that edited adenines are more frequently substituted with guanine, an informational analog of inosine, in the course of evolution than their unedited counterparts, and for heavily edited adenines, these transitions are favored by positive selection. Our study shows that coleoid editing sites may enhance adaptation, which, together with recent observations on and human editing sites, points at a general role of RNA editing in the molecular evolution of metazoans.

Source
DOI: http://dx.doi.org/10.7717/peerj.10456
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703385
November 2020

Genome-Wide Transcription Start Site Mapping and Promoter Assignments to a Sigma Factor in the Human Enteropathogen .

Front Microbiol 2020 13;11:1939. Epub 2020 Aug 13.

Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France.

The emerging human enteropathogen is the main cause of diarrhea associated with antibiotherapy. Regulatory pathways underlying the adaptive responses remain understudied and the global view of promoter structure is still missing. In the genome of 630, 22 genes encoding sigma factors are present suggesting a complex pattern of transcription in this bacterium. We present here the first transcriptional map of the genome resulting from the identification of transcriptional start sites (TSS), promoter motifs and operon structures. By 5'-end RNA-seq approach, we mapped more than 1000 TSS upstream of genes. In addition to these primary TSS, this analysis revealed complex structure of transcriptional units such as alternative and internal promoters, potential RNA processing events and 5' untranslated regions. By following an iterative strategy that used as an input previously published consensus sequences and transcriptomic analysis, we identified candidate promoters upstream of most of protein-coding and non-coding RNAs genes. This strategy also led to refine consensus sequences of promoters recognized by major sigma factors of . Detailed analysis focuses on the transcription in the pathogenicity locus and regulatory genes, as well as regulons of transition phase and sporulation sigma factors as important components of regulatory network governing toxin gene expression and spore formation. Among the still uncharacterized regulons of the major sigma factors of , we defined the SigL regulon by combining transcriptome and analyses. We showed that the SigL regulon is largely involved in amino-acid degradation, a metabolism crucial for gut colonization. Finally, we combined our TSS mapping, identification of promoters and RNA-seq data to improve gene annotation and to suggest operon organization in . 
These data will considerably improve our knowledge of global regulatory circuits controlling gene expression in and will serve as a useful rich resource for scientific community both for the detailed analysis of specific genes and systems biology studies.

Source
DOI: http://dx.doi.org/10.3389/fmicb.2020.01939
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438776
August 2020

Cumulative contact frequency of a chromatin region is an intrinsic property linked to its function.

PeerJ 2020 10;8:e9566. Epub 2020 Aug 10.

Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia.

Regulation of gene transcription is a complex process controlled by many factors, including the conformation of chromatin in the nucleus. Insights into chromatin conformation on both local and global scales can be provided by the Hi-C (high-throughput chromosomes conformation capture) method. One of the drawbacks of Hi-C analysis and interpretation is the presence of systematic biases, such as different accessibility to enzymes, amplification, and mappability of DNA regions, which all result in different visibility of the regions. Iterative correction (IC) is one of the most popular techniques developed for the elimination of these systematic biases. IC is based on the assumption that all chromatin regions have an equal number of observed contacts in Hi-C. In other words, the IC procedure is equalizing the experimental visibility approximated by the cumulative contact frequency (CCF) for all genomic regions. However, the differences in experimental visibility might be explained by biological factors such as chromatin openness, which is characteristic of distinct chromatin states. Here we show that CCF is positively correlated with active transcription. It is associated with compartment organization, since compartment A demonstrates higher CCF and gene expression levels than compartment B. Notably, this observation holds for a wide range of species, including human, mouse, and . Moreover, we track the CCF state for syntenic blocks between human and mouse and conclude that active state assessed by CCF is an intrinsic property of the DNA region, which is independent of local genomic and epigenomic context. Our findings establish a missing link between Hi-C normalization procedures removing CCF from the data and poorly investigated and possibly relevant biological factors contributing to CCF.
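The idea behind iterative correction described above can be sketched in a few lines. The example below computes the cumulative contact frequency (CCF) of each region as a row sum of a symmetric contact matrix, then repeatedly rescales rows and columns until all CCFs are equal. This is a minimal pure-Python sketch under simplifying assumptions (dense matrix, no filtering of poorly covered bins). The function names and the toy matrix are invented for illustration and do not come from the paper.

```python
def cumulative_contact_frequency(contacts):
    """CCF of each region: the sum of its row in the contact matrix."""
    return [sum(row) for row in contacts]

def iterative_correction(contacts, n_iter=200, tol=1e-10):
    """Rescale a symmetric contact matrix until all row sums are equal.

    Returns the balanced matrix and the accumulated per-region bias,
    so that contacts[i][j] == balanced[i][j] * bias[i] * bias[j].
    """
    n = len(contacts)
    m = [[float(v) for v in row] for row in contacts]
    bias = [1.0] * n
    for _ in range(n_iter):
        ccf = cumulative_contact_frequency(m)
        nonzero = [c for c in ccf if c > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Regions with no contacts are left untouched (scale of 1)
        scale = [c / mean if c > 0 else 1.0 for c in ccf]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return m, bias

# Toy 4-region contact map with deliberately unequal visibility
toy_contacts = [
    [10, 4, 1, 1],
    [4, 20, 6, 2],
    [1, 6, 8, 3],
    [1, 2, 3, 5],
]
raw_ccf = cumulative_contact_frequency(toy_contacts)
balanced, bias = iterative_correction(toy_contacts)
```

The `bias` vector is the part of the signal that balancing removes. The abstract argues that this removed component may carry biological information rather than purely technical noise.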

Source
DOI: http://dx.doi.org/10.7717/peerj.9566
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7425636
August 2020

Simplification of Ribosomes in Bacteria with Tiny Genomes.

Mol Biol Evol 2021 01;38(1):58-66

Institute for Information Transmission Problems (Kharkevich Institute), Moscow, Russia.

The ribosome is an essential cellular machine performing protein biosynthesis. Its structure and composition are highly conserved in all species. However, some bacteria have been reported to have an incomplete set of ribosomal proteins. We have analyzed ribosomal protein composition in 214 small bacterial genomes (<1 Mb) and found that although the ribosome composition is fairly stable, some ribosomal proteins may be absent, especially in bacteria with dramatically reduced genomes. The protein composition of the large subunit is less conserved than that of the small subunit. We have identified the set of frequently lost ribosomal proteins and demonstrated that they tend to be positioned on the ribosome surface and have fewer contacts to other ribosome components. Moreover, some proteins are lost in an evolutionary correlated manner. The reduction of ribosomal RNA is also common, with deletions mostly occurring in free loops. Finally, the loss of the anti-Shine-Dalgarno sequence is associated with the loss of a higher number of ribosomal proteins.

Source
DOI: http://dx.doi.org/10.1093/molbev/msaa184
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7782861
January 2021

Translation at first sight: the influence of leading codons.

Nucleic Acids Res 2020 07;48(12):6931-6942

Skolkovo Institute of Science and Technology, Skolkovo, Moscow region 143025, Russia.

The first triplets of the mRNA coding region affect the yield of translation. We have applied the FlowSeq method to analyze >30 000 variants of codons 2-11 of a fluorescent protein reporter to identify factors affecting protein synthesis. While the negative influence of mRNA secondary structure on translation has been confirmed, a positive role of rare codons at the beginning of a coding sequence for gene expression has not been observed. The identity of triplets proximal to the start codon contributes more to the protein yield than that of more distant ones. Additional in-frame start codons enhance translation, while Shine-Dalgarno-like motifs downstream of the initiation codon are inhibitory. The metabolic cost of amino acids affects the yield of protein in the poor medium. The most efficient translation was observed for variants with features resembling those of native Escherichia coli genes.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa430
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7337518
July 2020

Draft genome sequences of Hirudo medicinalis and salivary transcriptome of three closely related medicinal leeches.

BMC Genomics 2020 Apr 29;21(1):331. Epub 2020 Apr 29.

Federal Research and Clinical Centre of Physical-Chemical Medicine of Federal Medical Biological Agency, 1a Malaya Pirogovskaya Str, Moscow, 119435, Russia.

Background: Salivary cell secretion (SCS) plays a critical role in blood feeding by medicinal leeches, making them of use for certain medical purposes even today.

Results: We annotated the Hirudo medicinalis genome and performed RNA-seq on salivary cells isolated from three closely related leech species, H. medicinalis, Hirudo orientalis, and Hirudo verbana. Differential expression analysis verified by proteomics identified genes with salivary cell-specific expression, many of which encode previously unknown salivary components. However, the genes encoding known anticoagulants have been found to be expressed not only in salivary cells. The function-related analysis of the unique salivary cell genes enabled an update of the concept of interactions between salivary proteins and components of haemostasis.

Conclusions: Here we report a draft genome of Hirudo medicinalis and describe the identification of novel salivary proteins and new homologs of genes encoding known anticoagulants in transcriptomes of three medicinal leech species. Our data provide new insights into the genetics of the blood-feeding lifestyle in leeches.

Source
DOI: http://dx.doi.org/10.1186/s12864-020-6748-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191736
April 2020

Influence of the spacer region between the Shine-Dalgarno box and the start codon for fine-tuning of the translation efficiency in Escherichia coli.

Microb Biotechnol 2020 07 23;13(4):1254-1261. Epub 2020 Mar 23.

Skolkovo Institute of Science and Technology, Moscow, 143025, Russia.

Translation efficiency contributes several orders of magnitude difference in the overall yield of exogenous gene expression in bacteria. In diverse bacteria, the translation initiation site, whose sequence is the primary determinant of the translation performance, is comprised of the start codon and the Shine-Dalgarno box located upstream. Here, we have examined how the sequence of a spacer between these main components of the translation initiation site contributes to the yield of synthesized protein. We have created a library of reporter constructs with the randomized spacer region, performed fluorescently activated cell sorting and applied next-generation sequencing analysis (the FlowSeq protocol). As a result, we have identified sequence motifs for the spacer region between the Shine-Dalgarno box and AUG start codon that may modulate the translation efficiency in a 100-fold range.

Source
DOI: http://dx.doi.org/10.1111/1751-7915.13561
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7264876
July 2020

Chlamydia pan-genomic analysis reveals balance between host adaptation and selective pressure to genome reduction.

BMC Genomics 2019 Sep 12;20(1):710. Epub 2019 Sep 12.

Kharkevich Institute for Information Transmission Problems, RAS, Moscow, Russia.

Background: Chlamydia are ancient intracellular pathogens with a reduced, though strikingly conserved, genome. Despite their parasitic lifestyle and isolated intracellular environment, these bacteria managed to avoid the accumulation of deleterious mutations and the subsequent genome degradation characteristic of many parasitic bacteria.

Results: We report pan-genomic analysis of sixteen species from genus Chlamydia including identification and functional annotation of orthologous genes, and characterization of gene gains, losses, and rearrangements. We demonstrate the overall genome stability of these bacteria as indicated by a large fraction of common genes with conserved genomic locations. On the other hand, extreme evolvability is confined to several paralogous gene families such as polymorphic membrane proteins and phospholipase D, and likely is caused by the pressure from the host immune system.

Conclusions: This combination of a large, conserved core genome and a small, evolvable periphery likely reflects the balance between the selective pressure towards genome reduction and the need to adapt to escape from the host immunity.

Source
DOI: http://dx.doi.org/10.1186/s12864-019-6059-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6740158
September 2019

Predictive models of protease specificity based on quantitative protease-activity profiling data.

Biochim Biophys Acta Proteins Proteom 2019 11 19;1867(11):140253. Epub 2019 Jul 19.

A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia; Skolkovo Institute of Science and Technology, Moscow 121205, Russia; Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117997, Russia.

Bioinformatics-based prediction of protease substrates can help to elucidate regulatory proteolytic pathways that control a broad range of biological processes such as apoptosis and blood coagulation. The majority of published predictive models are position weight matrices (PWM) reflecting specificity of proteases toward target sequence. These models are typically derived from experimental data on positions of hydrolyzed peptide bonds and show a reasonable predictive power. New emerging techniques that not only register the cleavage position but also measure catalytic efficiency of proteolysis are expected to improve the quality of predictions or at least substantially reduce the number of tested substrates required for confident predictions. The main goal of this study was to develop new prediction models based on such data and to estimate the performance of the constructed models. We used data on catalytic efficiency of proteolysis measured for eight major human matrix metalloproteinases to construct predictive models of protease specificity using a variety of regression analysis techniques. The obtained results suggest that efficiency-based (quantitative) models show a comparable performance with conventional PWM-based algorithms, while less training data are required. The derived list of candidate cleavage sites in human secreted proteins may serve as a starting point for experimental analysis.
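For readers unfamiliar with the conventional baseline mentioned above, the following sketch builds a log-odds position weight matrix from aligned cleavage-site peptides and scores a candidate sequence. It illustrates the PWM approach only, not the efficiency-based regression models developed in the study. The training peptides, the uniform background, and the pseudocount are hypothetical choices made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(sites, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned peptides.

    Assumes a uniform amino-acid background, which is a simplification.
    """
    length = len(sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm

def score_peptide(pwm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))

# Hypothetical aligned cleavage sites, invented for this example
training_sites = ["PLGL", "PAGL", "PLGI", "PQGL"]
pwm = build_pwm(training_sites)
site_like = score_peptide(pwm, "PLGL")
unrelated = score_peptide(pwm, "AAAA")
```

A PWM of this kind uses only the positions of observed cleavages. The quantitative models discussed in the abstract additionally use measured catalytic efficiency, which is why they can reach comparable performance with less training data.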

Source
DOI: http://dx.doi.org/10.1016/j.bbapap.2019.07.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6745255
November 2019

Comparative Genomic Analysis of the Regulation of Aromatic Metabolism in Betaproteobacteria.

Front Microbiol 2019 29;10:642. Epub 2019 Mar 29.

Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia.

Aromatic compounds are a common carbon and energy source for many microorganisms, some of which can even degrade toxic chloroaromatic xenobiotics. This comparative study of aromatic metabolism in 32 Betaproteobacteria species describes the links between several transcription factors (TFs) that control benzoate (BenR, BenM, BoxR, BzdR), catechol (CatR, CatM, BenM), chlorocatechol (ClcR), methylcatechol (MmlR), 2,4-dichlorophenoxyacetate (TfdR, TfdS), phenol (AphS, AphR, AphT), biphenyl (BphS), and toluene (TbuT) metabolism. We characterize the complexity and variability in the organization of aromatic metabolism operons and the structure of regulatory networks that may differ even between closely related species. Generally, the upper parts of pathways, rare pathway variants, and degradative pathways of exotic and complex, in particular, xenobiotic compounds are often controlled by a single TF, while the regulation of more common and/or central parts of the aromatic metabolism may vary widely and often involves several TFs with shared and/or dual, or cascade regulation. The most frequent and at the same time variable connections exist between AphS, AphR, AphT, and BenR. We have identified a novel LysR-family TF that regulates the metabolism of catechol (or some catechol derivative) and either substitutes CatR(M)/BenM, or shares functions with it. We have also predicted several new members of aromatic metabolism regulons, in particular, some COGs regulated by several different TFs.

Source
DOI: http://dx.doi.org/10.3389/fmicb.2019.00642
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449761
March 2019

Micro-evolution of three Streptococcus species: selection, antigenic variation, and horizontal gene inflow.

BMC Evol Biol 2019 03 27;19(1):83. Epub 2019 Mar 27.

Kharkevich Institute for Information Transmission Problems, 19, Bolshoy Karetny per., Moscow, 127051, Russia.

Background: The genus Streptococcus comprises pathogens that strongly influence the health of humans and animals. Genome sequencing of multiple Streptococcus strains demonstrated high variability in gene content and order even in closely related strains of the same species and created a newly emerged object for genomic analysis, the pan-genome. Here we analysed the genome evolution of 25 strains of Streptococcus suis, 50 strains of Streptococcus pyogenes and 28 strains of Streptococcus pneumoniae.

Results: Fractions of the pan-genome, unique, periphery, and universal genes differ in size, functional composition, the level of nucleotide substitutions, and predisposition to horizontal gene transfer and genomic rearrangements. The density of substitutions in intergenic regions appears to be correlated with selection acting on adjacent genes, implying that more conserved genes tend to have more conserved regulatory regions. The total pan-genome of the genus is open, but only due to strain-specific genes, whereas other pan-genome fractions reach saturation. We have identified the set of genes with phylogenies inconsistent with species and non-conserved location in the chromosome; these genes are rare in at least one species and have likely experienced recent horizontal transfer between species. The strain-specific fraction is enriched with mobile elements and hypothetical proteins, but also contains a number of candidate virulence-related genes, so it may have a strong impact on adaptability and pathogenicity. Mapping the rearrangements to the phylogenetic tree revealed large parallel inversions in all species. A parallel inversion of length 15 kb with breakpoints formed by genes encoding surface antigen proteins PhtD and PhtB in S. pneumoniae leads to replacement of gene fragments, which likely indicates the action of an antigen variation mechanism.

Conclusions: Members of the genus Streptococcus have a highly dynamic, open pan-genome that potentially gives them the ability to adapt to changing environmental conditions, i.e. antibiotic resistance or transmission between different hosts. Hence, integrated analysis of all aspects of genome evolution is important for the identification of potential pathogens and design of drugs and vaccines.

Source
DOI: http://dx.doi.org/10.1186/s12862-019-1403-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437910
March 2019

Nuclear lamina integrity is required for proper spatial organization of chromatin in Drosophila.

Nat Commun 2019 03 12;10(1):1176. Epub 2019 Mar 12.

Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, 123182, Russia.

How the nuclear lamina (NL) impacts on global chromatin architecture is poorly understood. Here, we show that NL disruption in Drosophila S2 cells leads to chromatin compaction and repositioning from the nuclear envelope. This increases the chromatin density in a fraction of topologically-associating domains (TADs) enriched in active chromatin and enhances interactions between active and inactive chromatin. Importantly, upon NL disruption the NL-associated TADs become more acetylated at histone H3 and less compact, while background transcription is derepressed. Two-colour FISH confirms that a TAD becomes less compact following its release from the NL. Finally, polymer simulations show that chromatin binding to the NL can per se compact attached TADs. Collectively, our findings demonstrate a dual function of the NL in shaping the 3D genome. Attachment of TADs to the NL makes them more condensed but decreases the overall chromatin density in the nucleus by stretching interphase chromosomes.

Source
DOI: http://dx.doi.org/10.1038/s41467-019-09185-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6414625
March 2019

Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

BMC Genomics 2018 Dec 27;19(1):965. Epub 2018 Dec 27.

Kharkevich Institute for Information Transmission Problems, Moscow, Russia.

Background: The genus Burkholderia consists of species that occupy remarkably diverse ecological niches. Its best known members are important pathogens, B. mallei and B. pseudomallei, which cause glanders and melioidosis, respectively. Burkholderia genomes are unusual due to their multichromosomal organization, generally comprised of 2-3 chromosomes.

Results: We performed integrated genomic analysis of 127 Burkholderia strains. The pan-genome is open with the saturation to be reached between 86,000 and 88,000 genes. The reconstructed rearrangements indicate a strong avoidance of intra-replichore inversions that is likely caused by selection against the transfer of large groups of genes between the leading and the lagging strands. Translocated genes also tend to retain their position in the leading or the lagging strand, and this selection is stronger for large syntenies. Integrated reconstruction of chromosome rearrangements in the context of strain phylogeny reveals parallel rearrangements that may indicate inversion-based phase variation and integration of new genomic islands. In particular, we detected parallel inversions in the second chromosomes of B. pseudomallei with breakpoints formed by genes encoding membrane components of a multidrug resistance complex, which may be linked to a phase variation mechanism. Two genomic islands, spreading horizontally between chromosomes, were detected in the B. cepacia group.

Conclusions: This study demonstrates the power of integrated analysis of pan-genomes, chromosome rearrangements, and selection regimes. Non-random inversion patterns indicate selective pressure. Inversions are particularly frequent in the recent pathogen B. mallei and, together with periods of positive selection at other branches, may indicate adaptation to new niches. One such adaptation could be a possible phase variation mechanism in B. pseudomallei.

Source
DOI: http://dx.doi.org/10.1186/s12864-018-5245-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6307245
December 2018

Microbiomes of gall-inducing copepod crustaceans from the corals Stylophora pistillata (Scleractinia) and Gorgonia ventalina (Alcyonacea).

Sci Rep 2018 08 1;8(1):11563. Epub 2018 Aug 1.

Naturalis Biodiversity Center, Leiden, 2332 AA, The Netherlands.

Corals harbor complex and diverse microbial communities that strongly impact host fitness and resistance to diseases, but these microbes themselves can be influenced by stresses, like those caused by the presence of macroscopic symbionts. In addition to directly influencing the host, symbionts may transmit pathogenic microbial communities. We analyzed two coral gall-forming copepod systems by using 16S rRNA gene metagenomic sequencing: (1) the sea fan Gorgonia ventalina with copepods of the genus Sphaerippe from the Caribbean and (2) the scleractinian coral Stylophora pistillata with copepods of the genus Spaniomolgus from the Saudi Arabian part of the Red Sea. We show that bacterial communities in these two systems were substantially different with Actinobacteria, Alphaproteobacteria, and Betaproteobacteria more prevalent in samples from Gorgonia ventalina, and Gammaproteobacteria in Stylophora pistillata. In Stylophora pistillata, normal coral microbiomes were enriched with the common coral symbiont Endozoicomonas and some unclassified bacteria, while copepod and gall-tissue microbiomes were highly enriched with the family ME2 (Oceanospirillales) or Rhodobacteraceae. In Gorgonia ventalina, no bacterial group had significantly different prevalence in the normal coral tissues, copepods, and injured tissues. The total microbiome composition of polyps injured by copepods was different. Contrary to our expectations, the microbial community composition of the injured gall tissues was not directly affected by the microbiome of the gall-forming symbiont copepods.

Source
DOI: http://dx.doi.org/10.1038/s41598-018-29953-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6070567
August 2018

Neanderthal and Denisovan ancestry in Papuans: A functional study.

J Bioinform Comput Biol 2018 04;16(2):1840011

Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute for Science and Technology, Moscow, Russia.

Sequencing of complete nuclear genomes of Neanderthal and Denisovan stimulated studies about their relationship with modern humans demonstrating, in particular, that DNA alleles from both Neanderthal and Denisovan genomes are present in genomes of modern humans. The Papuan genome is a unique object because it contains both Neanderthal and Denisovan alleles. Here, we have shown that the Papuan genomes contain different gene functional groups inherited from each of the ancient people. The Papuan genomes demonstrate a relative prevalence of Neanderthal alleles in genes responsible for the regulation of transcription and neurogenesis. The enrichment of specific functional groups with Denisovan alleles is less pronounced; these groups are responsible for bone and tissue remodeling. This analysis shows that introgression of alleles from Neanderthals and Denisovans to Papuans occurred independently and retention of these alleles may carry specific adaptive advantages.

Source
DOI: http://dx.doi.org/10.1142/S0219720018400115
April 2018

Comparative Genomic Analysis of spp., Intranuclear Symbionts of Paramecia.

Front Microbiol 2018 16;9:738. Epub 2018 Apr 16.

Skolkovo Institute of Science and Technology, Moscow, Russia.

While most endosymbiotic bacteria are transmitted only vertically, spp., an alphaproteobacterium from the order, can desert its host and invade a new one. All bacteria from the genus are intranuclear symbionts of ciliates spp. with strict species and nuclear specificity. Comparative metabolic reconstruction based on the newly sequenced genome of , a macronuclear symbiont of , and known genomes of other species shows that even though all spp. can persist outside the host, they cannot synthesize most of the essential small molecules, such as amino acids, and lack some central energy metabolic pathways, including glycolysis and the citric acid cycle. As the main energy source, spp. likely rely on nucleotides pirated from the host. -specific genes absent from other are possibly involved in the lifestyle switch from the infectious to the reproductive form and in cell invasion.

Source
DOI: http://dx.doi.org/10.3389/fmicb.2018.00738
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5911502
April 2018

Genome rearrangements and phylogeny reconstruction in Yersinia pestis.

PeerJ 2018 27;6:e4545. Epub 2018 Mar 27.

Kharkevich Institute for Information Transmission Problems, Moscow, Russia.

Genome rearrangements have played an important role in the evolution of Yersinia pestis from its progenitor Yersinia pseudotuberculosis. Traditional phylogenetic trees for Y. pestis based on sequence comparison have short internal branches and low bootstrap supports, as only a small number of nucleotide substitutions have occurred. On the other hand, even a small number of genome rearrangements may resolve topological ambiguities in a phylogenetic tree. We reconstructed phylogenetic trees based on genome rearrangements using several popular approaches such as Maximum likelihood for Gene Order and the Bayesian model of genome rearrangements by inversions. We also reconciled phylogenetic trees for each of the three CRISPR loci to obtain an integrated scenario of the CRISPR cassette evolution. Analysis of contradictions between the obtained evolutionary trees yielded numerous parallel inversions and gain/loss events. Our data indicate that an integrated analysis of sequence-based and inversion-based trees enhances the resolution of phylogenetic reconstruction. In contrast, reconstructions of strain relationships based solely on CRISPR loci may not be reliable, as the history is obscured by large deletions that obliterate the order of spacer gains. Similarly, numerous parallel gene losses preclude reconstruction of phylogeny based on gene content.

Source
DOI: http://dx.doi.org/10.7717/peerj.4545
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5877447
March 2018

Pangenomic Definition of Prokaryotic Species and the Phylogenetic Structure of Prochlorococcus spp.

Front Microbiol 2018 12;9:428. Epub 2018 Mar 12.

A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences (RAS), Moscow, Russia.

The pangenome is the collection of all groups of orthologous genes (OGGs) from a set of genomes. We apply pangenome analysis to propose a definition of prokaryotic species based on identification of lineage-specific gene sets. While similar to the classical biological definition based on allele flow, it does not rely on DNA similarity levels and does not require analysis of homologous recombination. Hence this definition is relatively objective and independent of arbitrary thresholds. A systematic analysis of 110 accepted species with the largest numbers of sequenced strains yields results largely consistent with the existing nomenclature. However, it has revealed that the abundant marine cyanobacteria Prochlorococcus marinus should be divided into two species. As a control, we have confirmed the paraphyletic origin of Yersinia pseudotuberculosis (with embedded, monophyletic Y. pestis) and Burkholderia pseudomallei (with B. mallei). We also demonstrate that, by our definition and in accordance with recent studies, Escherichia coli and Shigella spp. are one species.

Source
DOI: http://dx.doi.org/10.3389/fmicb.2018.00428
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857598
March 2018

Comparative genomic analysis of fungal TPP-riboswitches.

Fungal Genet Biol 2018 05 13;114:34-41. Epub 2018 Mar 13.

A.A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow 127051, Russia; Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow 119991, Russia; Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia; Faculty of Computer Science, Higher School of Economics, Kochnovsky pr. 3, Moscow 125319, Russia.

Riboswitches are conserved RNA structures located in non-coding regions of mRNA that bind small molecules (e.g. metabolites) and change conformation upon binding. This feature enables them to function as regulators of gene expression. The thiamin pyrophosphate (TPP) riboswitch is the only type of riboswitch found not only in bacteria, but also in eukaryotes: in plants, green algae, protists, and fungi. Two main mechanisms of fungal TPP riboswitch action, involving alternative splicing, have been established so far. Here, we report a large-scale bioinformatic study of riboswitch structural features, action mechanisms, and distribution across fungal taxonomic groups. For each putatively regulated gene, we reconstruct the riboswitch structure, identify other components of the regulation machinery, and establish mechanisms of riboswitch-mediated regulation. In addition to three genes known to be regulated by TPP riboswitches (thiazole synthase THI4, hydroxymethylpyrimidine synthase NMT1, and putative transporter NCU01977), we identify two new genes, a putative thiamin transporter THI9 and a transporter of unknown specificity. While the riboswitch sequence and structure remain highly conserved in all species and genes, the mode of riboswitch-mediated regulation varies between regulated genes. Riboswitch usage varies strongly between fungal taxa, with the largest number of riboswitch-regulated genes found in Pezizomycotina and no riboswitch-mediated regulation established in Saccharomycotina.

Source
DOI: http://dx.doi.org/10.1016/j.fgb.2018.03.004
May 2018

Cooption of heat shock regulatory system for anhydrobiosis in the sleeping chironomid Polypedilum vanderplanki.

Proc Natl Acad Sci U S A 2018 03 20;115(10):E2477-E2486. Epub 2018 Feb 20.

Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420012, Russian Federation.

Polypedilum vanderplanki is a striking and unique example of an insect that can survive almost complete desiccation. Its genome and a set of dehydration-rehydration transcriptomes, together with the genome of Polypedilum nubifer (a congeneric desiccation-sensitive midge), were recently released. Here, using published and newly generated datasets reflecting detailed transcriptome changes during anhydrobiosis, as well as a developmental series, we show that the TCTAGAA DNA motif, which closely resembles the binding motif of the heat shock transcription activator (Hsf), is significantly enriched in the promoter regions of desiccation-induced genes in P. vanderplanki, such as genes encoding late embryogenesis abundant (LEA) proteins, thioredoxins, or trehalose metabolism-related genes, but not in P. nubifer. Unlike P. nubifer, P. vanderplanki has double TCTAGAA sites upstream of the Hsf gene itself, which is probably responsible for the stronger activation of Hsf in P. vanderplanki during desiccation compared with P. nubifer. To confirm the role of Hsf in desiccation-induced gene activation, we used the Pv11 cell line, derived from P. vanderplanki embryo. After preincubation with trehalose, Pv11 cells can enter anhydrobiosis and survive desiccation. We showed that Hsf knockdown suppresses trehalose-induced activation of multiple predicted Hsf targets (including P. vanderplanki-specific LEA protein genes) and reduces the desiccation survival rate of Pv11 cells fivefold. Thus, cooption of the heat shock regulatory system has been an important evolutionary mechanism for adaptation to desiccation in P. vanderplanki.
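
As a rough illustration of the first step behind a motif-enrichment analysis like the one summarized above, the following Python sketch counts occurrences of the TCTAGAA motif on both strands of a promoter sequence. Only the motif itself comes from the abstract; the function names and the toy sequences are hypothetical, and the statistical comparison against a background gene set that an actual enrichment test requires is not shown.

```python
MOTIF = "TCTAGAA"  # motif named in the abstract

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str = MOTIF) -> int:
    """Count motif hits (overlaps allowed) on both strands of seq.

    A hit on the reverse strand appears on the forward strand as the
    reverse complement of the motif, so both strings are searched.
    """
    seq = seq.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


# Toy promoter sequences (hypothetical, for illustration only).
promoters = {
    "geneA": "GGTCTAGAACC",  # forward-strand hit
    "geneB": "AATTCTAGACC",  # reverse-strand hit
    "geneC": "ACGTACGTACG",  # no hit
}
hits = {name: count_motif(seq) for name, seq in promoters.items()}
```

Because TCTAGAA and its reverse complement overlap heavily, a site such as TTCTAGAA is counted once per strand; whether that is desirable depends on the analysis.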

Source
DOI: http://dx.doi.org/10.1073/pnas.1719493115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5877948
March 2018

The genes of the sulphoquinovose catabolism in Escherichia coli are also associated with a previously unknown pathway of lactose degradation.

Sci Rep 2018 02 16;8(1):3177. Epub 2018 Feb 16.

A. A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia.

Comparative genomic analysis of conserved gene cassettes demonstrated a resemblance between a recently described cassette of genes involved in sulphoquinovose degradation in Escherichia coli K-12 MG1655 and a Bacilli cassette linked with lactose degradation. Six genes from both cassettes had similar functions related to carbohydrate metabolism, namely, hydrolase, aldolase, kinase, isomerase, transporter, and transcription factor. The Escherichia coli sulphoglycolysis cassette was thus predicted to be associated with lactose degradation. This prediction was confirmed experimentally: expression of genes coding for aldolase (yihT), isomerase (yihS), and kinase (yihV) was dramatically increased during growth on lactose. These genes were previously shown to be activated during growth on sulphoquinovose, so our observation may indicate multi-functional capabilities of the respective proteins. Transcription starts for yihT, yihV and yihW were mapped in silico, in vitro and in vivo. Out of three promoters for yihT, one was active only during growth on lactose. We further showed that switches in yihT transcription are controlled by YihW, a DeoR-family transcription factor in the Escherichia coli cassette. YihW acted as a carbon source-dependent dual regulator involved in sustaining the baseline growth in the absence of the lac operon, with a function either complementary or opposite to that of cAMP-CRP, a global regulator of carbohydrate metabolism.

Source
DOI: http://dx.doi.org/10.1038/s41598-018-21534-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5816610
February 2018

Conservation, evolution, and regulation of splicing during prefrontal cortex development in humans, chimpanzees, and macaques.

RNA 2018 04 23;24(4):585-596. Epub 2018 Jan 23.

Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russia.

Changes in splicing are known to affect the function and regulation of genes. We analyzed splicing events that take place during the postnatal development of the prefrontal cortex in humans, chimpanzees, and rhesus macaques based on data obtained from 168 individuals. Our study revealed that among the 38,822 quantified alternative exons, 15% are differentially spliced among species, and more than 6% splice differently at different ages. Mutations in splicing acceptor and/or donor sites might explain more than 14% of all splicing differences among species and up to 64% of high-amplitude differences. A reconstructed trans-regulatory network containing 21 RNA-binding proteins explains a further 4% of splicing variations within species. While most age-dependent splicing patterns are conserved among the three species, developmental changes in intron retention are substantially more pronounced in humans.

Source
DOI: http://dx.doi.org/10.1261/rna.064931.117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5855957
April 2018

Sugar Lego: gene composition of bacterial carbohydrate metabolism genomic loci.

Biol Direct 2017 Nov 25;12(1):28. Epub 2017 Nov 25.

A.A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia.

Background: Bacterial carbohydrate metabolism is extremely diverse, since carbohydrates serve as a major energy source and are involved in a variety of cellular processes. Bacterial genes belonging to the same metabolic pathway are often co-localized in the chromosome, but this is not a strict rule. Gene co-localization is linked to co-evolution and co-regulation. This study focuses on a large-scale analysis of bacterial genomic loci related to carbohydrate metabolism.

Results: We demonstrate that only 53% of 148,000 studied genes from over six hundred bacterial genomes are co-localized with other carbohydrate metabolism genes, which points to a significant role of singleton genes. Co-localized genes form cassettes, ranging in size from two to fifteen genes. Two major factors influencing the cassette-forming tendency are gene function and bacterial phylogeny. We have obtained a comprehensive picture of co-localization preferences of genes for nineteen major carbohydrate metabolism functional classes, over two hundred gene orthologous clusters, and thirty bacterial classes, and characterized the cassette variety in size and content among different species, highlighting a significant role of short cassettes. The preference towards co-localization of carbohydrate metabolism genes varies between 40 and 76% for bacterial taxa. Analysis of frequently co-localized genes yielded forty-five significant pairwise links between genes belonging to different functional classes. The number of such links per class ranges from zero to eight, demonstrating varying preferences of the respective genes towards a specific chromosomal neighborhood. Genes from eleven functional classes tend to co-localize with genes from the same class, indicating an important role of clustering of genes with similar functions. Moreover, in most cases such co-localization does not originate from local duplication events.

Conclusions: Overall, we describe a complex web formed by evolutionary relationships of bacterial carbohydrate metabolism genes, manifested as co-localization patterns.

Reviewers: This article was reviewed by Daria V. Dibrova (A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia), nominated by Armen Mulkidjanian (University of Osnabrück, Germany), Igor Rogozin (NCBI, NLM, NIH, USA) and Yuri Wolf (NCBI, NLM, NIH, USA).
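
To make the notions of cassette and singleton used in this abstract concrete, here is a minimal Python sketch that groups genes into cassettes by chromosomal adjacency and reports the co-localized fraction. The adjacency criterion (gene indices differing by at most `max_gap`), the function names, and the toy positions are assumptions made for illustration; the abstract does not state the study's actual co-localization rule.

```python
def find_cassettes(positions, max_gap=1):
    """Group gene indices into runs of neighbouring genes.

    positions: ordinal gene indices along one chromosome (hypothetical input).
    max_gap:   largest index difference still treated as co-localized
               (an assumed criterion, not taken from the paper).
    """
    cassettes = []
    current = []
    for pos in sorted(positions):
        # Start a new group when the gap to the previous gene is too large.
        if current and pos - current[-1] > max_gap:
            cassettes.append(current)
            current = []
        current.append(pos)
    if current:
        cassettes.append(current)
    return cassettes


def colocalized_fraction(positions, max_gap=1):
    """Fraction of genes that fall in a cassette of two or more genes."""
    groups = find_cassettes(positions, max_gap)
    total = sum(len(g) for g in groups)
    in_cassette = sum(len(g) for g in groups if len(g) >= 2)
    return in_cassette / total if total else 0.0


# Toy example: six carbohydrate metabolism genes on one chromosome.
genes = [3, 4, 5, 10, 20, 21]
groups = find_cassettes(genes)          # [[3, 4, 5], [10], [20, 21]]
fraction = colocalized_fraction(genes)  # 5 of 6 genes are co-localized
```

In this toy case gene 10 is a singleton, and the two cassettes have sizes three and two.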

Source
DOI: http://dx.doi.org/10.1186/s13062-017-0200-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5702140
November 2017

Genome analysis of E. coli isolated from Crohn's disease patients.

BMC Genomics 2017 07 19;18(1):544. Epub 2017 Jul 19.

Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia.

Background: Escherichia coli (E. coli) has been increasingly implicated in the pathogenesis of Crohn's disease (CD). The phylogeny of E. coli isolated from Crohn's disease patients (CDEC) was controversial, and while genotyping results suggested heterogeneity, the sequenced strains of E. coli from CD patients were closely related.

Results: We performed shotgun genome sequencing of 28 E. coli isolates from ten CD patients and compared genomes from these isolates with already published genomes of CD strains and other pathogenic and non-pathogenic strains. CDEC was shown to belong to the A, B1, B2 and D phylogenetic groups. The plasmid and several operons from the reference CD-associated E. coli strain LF82 were demonstrated to be more often present in CDEC genomes belonging to different phylogenetic groups than in genomes of commensal strains. The operons include the carbon-source induced invasion GimA island, prophage I, iron uptake operons I and II, capsular assembly pathogenicity island IV, and propanediol and galactitol utilization operons.

Conclusions: Our findings suggest that CDEC are phylogenetically diverse. However, some strains isolated from independent sources possess highly similar chromosomes or plasmids. Though no CD-specific genes or functional domains were present in all CD-associated strains, some genes and operons are more often found in the genomes of CDEC than in commensal E. coli. They are principally linked to gut colonization and utilization of propanediol and other sugar alcohols.

Source
DOI: http://dx.doi.org/10.1186/s12864-017-3917-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5517970
July 2017

Activation of the alpha-globin gene expression correlates with dramatic upregulation of nearby non-globin genes and changes in local and large-scale chromatin spatial structure.

Epigenetics Chromatin 2017 07 11;10(1):35. Epub 2017 Jul 11.

Institute of Gene Biology of the Russian Academy of Sciences, Moscow, Russia 119334.

Background: In homeotherms, the alpha-globin gene clusters are located within permanently open genome regions enriched in housekeeping genes. Terminal erythroid differentiation results in dramatic upregulation of alpha-globin genes making their expression comparable to the rRNA transcriptional output. Little is known about the influence of the erythroid-specific alpha-globin gene transcription outburst on adjacent, widely expressed genes and large-scale chromatin organization. Here, we have analyzed the total transcription output, the overall chromatin contact profile, and CTCF binding within the 2.7 Mb segment of chicken chromosome 14 harboring the alpha-globin gene cluster in cultured lymphoid cells and cultured erythroid cells before and after induction of terminal erythroid differentiation.

Results: We found that, similarly to mammalian genomes, the chicken genome is organized in TADs and compartments. Full activation of the alpha-globin gene transcription in differentiated erythroid cells is correlated with upregulation of several adjacent housekeeping genes and the emergence of abundant intergenic transcription. An extended chromosome region encompassing the alpha-globin cluster becomes significantly decompacted in differentiated erythroid cells and depleted in CTCF binding and CTCF-anchored chromatin loops, while the sub-TAD harboring the alpha-globin gene cluster and the upstream major regulatory element (MRE) becomes highly enriched with chromatin interactions as compared to lymphoid and proliferating erythroid cells. The alpha-globin gene domain and the neighboring loci reside within the A-like chromatin compartment in both lymphoid and erythroid cells and become further segregated from the upstream gene desert upon terminal erythroid differentiation.

Conclusions: Our findings demonstrate that the effects of tissue-specific transcription activation are not restricted to the host genomic locus but affect the overall chromatin structure and transcriptional output of the encompassing topologically associating domain.

Source
DOI: http://dx.doi.org/10.1186/s13072-017-0142-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504709
July 2017

Genomic Analysis of Caldithrix abyssi, the Thermophilic Anaerobic Bacterium of the Novel Bacterial Phylum Calditrichaeota.

Front Microbiol 2017 20;8:195. Epub 2017 Feb 20.

Winogradsky Institute of Microbiology, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia.

The genome of Caldithrix abyssi, the first cultivated representative of a phylum-level bacterial lineage, was sequenced within the framework of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project. The genomic analysis revealed mechanisms allowing this anaerobic bacterium to ferment peptides or to implement nitrate reduction with acetate or molecular hydrogen as electron donors. The genome encoded five different [NiFe]- and [FeFe]-hydrogenases, one of which, a group 1 [NiFe]-hydrogenase, is presumably involved in lithoheterotrophic growth; three others produce H2 during fermentation, and one is apparently bidirectional. The ability to reduce nitrate is determined by a nitrate reductase of the Nap family, while nitrite reduction to ammonia is presumably catalyzed by an octaheme cytochrome c nitrite reductase εHao. The genome contained genes of respiratory polysulfide/thiosulfate reductase; however, elemental sulfur and thiosulfate were not used as the electron acceptors for anaerobic respiration with acetate or H2, probably due to the lack of the gene of the maturation protein. Nevertheless, elemental sulfur and thiosulfate stimulated growth on fermentable substrates (peptides), being reduced to sulfide, most probably through the action of the cytoplasmic sulfide dehydrogenase and/or NAD(P)-dependent [NiFe]-hydrogenase (sulfhydrogenase) encoded by the genome. Surprisingly, the genome of this anaerobic microorganism encoded all genes for cytochrome c oxidase; however, its maturation machinery seems to be non-operational due to genomic rearrangements of supplementary genes. Despite the fact that sugars were not among the substrates reported when C. abyssi was first described, our genomic analysis revealed multiple genes of glycoside hydrolases, and some of them were predicted to be secreted. This finding aided in bringing out four carbohydrates that supported the growth of C. abyssi: starch, cellobiose, glucomannan and xyloglucan. The genomic analysis demonstrated the ability of C. abyssi to synthesize nucleotides and most amino acids and vitamins. Finally, the genomic sequence allowed us to perform a phylogenomic analysis, based on 38 protein sequences, which confirmed the deep branching of this lineage and justified the proposal of a novel phylum Calditrichaeota.

Source
DOI: http://dx.doi.org/10.3389/fmicb.2017.00195
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5317091
February 2017

Application of sorting and next generation sequencing to study 5′-UTR influence on translation efficiency in Escherichia coli.

Nucleic Acids Res 2017 04;45(6):3487-3502

Department of Chemistry, Faculty of Bioinformatics and Bioengineering, Lomonosov Moscow State University, Moscow, 119992, Russia.

The yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on the translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding CER fluorescent protein preceded by randomized 5′ untranslated regions (5′-UTR) and red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells, separated by efficiency of CER mRNA translation by a cell sorter and subjected to next generation sequencing. We tested the efficiency of translation of the CER gene preceded by each of 48 natural 5′-UTR sequences and introduced random and designed mutations into natural and artificially selected 5′-UTRs. Several distinct properties could be ascribed to a group of 5′-UTRs most efficient in translation. In addition to known ones, several previously unrecognized features that contribute to translation enhancement were found, such as a low proportion of cytidine residues, multiple SD sequences and AG repeats. The latter could be identified as a translation enhancer, albeit less efficient than the SD sequence, in several natural 5′-UTRs.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1141
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389652
April 2017