Publications by authors named "Giovanni Bussotti"

28 Publications

Colonization and genetic diversification processes of Leishmania infantum in the Americas.

Commun Biol 2021 Jan 29;4(1):139. Epub 2021 Jan 29.

Laboratório de Pesquisa em Leishmaniose, Instituto Oswaldo Cruz, FIOCRUZ, 21040-365, Rio de Janeiro, Brazil.

Leishmania infantum causes visceral leishmaniasis, a deadly vector-borne disease introduced to the Americas during the colonial era. This non-native trypanosomatid parasite has since established widespread transmission cycles using alternative vectors, and human infection has become a significant concern to public health, especially in Brazil. A multi-kilobase deletion was recently detected in Brazilian L. infantum genomes and is suggested to reduce susceptibility to the anti-leishmanial drug miltefosine. We show that deletion-carrying strains occur in at least 15 Brazilian states and describe diversity patterns suggesting that these derive from common ancestral mutants rather than from recurrent independent mutation events. We also show that the deleted locus and associated enzymatic activity are restored by hybridization with non-deletion type strains. Genetic exchange appears common in areas of secondary contact but also among closely related parasites. We examine demographic and ecological scenarios underlying this complex L. infantum population structure and discuss implications for disease control.

Source
DOI: http://dx.doi.org/10.1038/s42003-021-01658-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7846609
January 2021

The transcriptional response of pathogenic Leptospira to peroxide reveals new defenses against infection-related oxidative stress.

PLoS Pathog 2020 Oct 6;16(10):e1008904. Epub 2020 Oct 6.

Unité de Biologie des Spirochètes, Department of Microbiology, Institut Pasteur, Paris, France.

Pathogenic Leptospira spp. are the causative agents of the waterborne zoonotic disease leptospirosis. Leptospira are challenged by numerous adverse conditions, including deadly reactive oxygen species (ROS), when infecting their hosts. Withstanding ROS produced by the host innate immunity is an important strategy evolved by pathogenic Leptospira for persisting in and colonizing hosts. In L. interrogans, genes encoding defenses against ROS are repressed by the peroxide stress regulator, PerR. In this study, RNA sequencing was performed to characterize both the L. interrogans response to low and high concentrations of hydrogen peroxide and the PerR regulon. We showed that Leptospira solicit three main peroxidase machineries (catalase, cytochrome C peroxidase and peroxiredoxin) and heme to detoxify oxidants produced during peroxide stress. In addition, canonical molecular chaperones of the heat shock response and DNA repair proteins from the SOS response were required for Leptospira to recover from oxidative damage. Identification of the PerR regulon upon exposure to H2O2 allowed us to define the contribution of this regulator to the oxidative stress response. This study has revealed a PerR-independent regulatory network involving other transcriptional regulators, two-component systems and sigma factors as well as non-coding RNAs that putatively orchestrate, in concert with PerR, the oxidative stress response. We have shown that PerR-regulated genes encoding a TonB-dependent transporter and a two-component system (VicKR) are involved in Leptospira tolerance to superoxide. This could represent the first defense mechanism against superoxide in L. interrogans, a bacterium lacking canonical superoxide dismutase. Our findings provide an insight into the mechanisms required by pathogenic Leptospira to overcome oxidative damage during infection-related conditions. This will help frame future hypothesis-driven studies to identify and decipher novel virulence mechanisms in this life-threatening pathogen.

Source
DOI: http://dx.doi.org/10.1371/journal.ppat.1008904
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7567364
October 2020

Nuclear and mitochondrial genome sequencing of North-African isolates from cured and relapsed visceral leishmaniasis patients reveals variations correlating with geography and phenotype.

Microb Genom 2020 Oct;6(10)

Laboratoire de recherche, LR 16IPT06, Parasitoses médicales, Biotechnologies et Biomolécules, Institut Pasteur de Tunis, Université Tunis El-Manar, 13 Place Pasteur, Tunis, Tunisie.

Although several studies have investigated genetic diversity of Leishmania infantum in North Africa, genome-wide analyses are lacking. Here, we conducted comparative analyses of nuclear and mitochondrial genomes of seven L. infantum isolates from Tunisia with the aim to gain insight into factors that drive genomic and phenotypic adaptation. Isolates were from cured (n=4) and recurrent (n=3) visceral leishmaniasis (VL) cases, originating from northern (n=2) and central (n=5) Tunisia, where respectively stable and emerging VL foci are observed. All isolates from relapsed patients were from Kairouan governorate (Centre), one of them showing resistance to the anti-leishmanial drug meglumine antimoniate. Nuclear genome diversity of the isolates was analysed by comparison to the JPCM5 reference genome. Kinetoplast maxi and minicircle sequences (1 and 59, respectively) were extracted from unmapped reads and identified by blast analysis against public data sets. The genome variation analysis grouped together isolates from the same geographical origins. Strains from the North were very different from the reference showing more than 34 587 specific single nucleotide variants, with one isolate representing a full genetic hybrid as judged by variant frequency. Composition of minicircle classes within isolates corroborated this geographical population structure. Read depth analysis revealed several significant gene copy number variations correlating with either geographical origin (amastin and Hsp33 genes) or relapse (CLN3 gene). However, no specific gene copy number variation was found in the drug-resistant isolate. In contrast, resistance was associated with a specific minicircle pattern suggesting mitochondrial DNA as a potential novel source for biomarker discovery.
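The read depth analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it only assumes that a gene's copy number can be approximated by dividing its mean read depth by the median depth of its chromosome. The function name, the gene and chromosome labels, and all depth values are hypothetical.

```python
from statistics import median


def relative_copy_number(gene_depth, chromosome_depths):
    """Divide each gene's mean read depth by the median depth of its chromosome.

    gene_depth maps a gene name to (chromosome, mean depth over the gene).
    chromosome_depths maps a chromosome to a list of per-window depths.
    A ratio near 1 suggests no change; about 2 suggests a duplication.
    """
    ratios = {}
    for gene, (chrom, depth) in gene_depth.items():
        baseline = median(chromosome_depths[chrom])
        # Guard against chromosomes with no coverage.
        ratios[gene] = depth / baseline if baseline > 0 else float("nan")
    return ratios


# Hypothetical toy data, not taken from the study.
gene_depth = {
    "geneA": ("chr1", 60.0),
    "geneB": ("chr1", 30.0),
    "geneC": ("chr2", 15.0),
}
chromosome_depths = {
    "chr1": [30.0, 30.0, 30.0, 60.0],
    "chr2": [30.0, 30.0, 30.0, 15.0],
}

ratios = relative_copy_number(gene_depth, chromosome_depths)
print(ratios)
```

Normalizing against the chromosome median rather than the genome-wide median keeps whole-chromosome amplification from being mistaken for a gene-level change; that is a design choice of this sketch, not a statement about the published method.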

Source
DOI: http://dx.doi.org/10.1099/mgen.0.000444
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660250
October 2020

Targeting Macrophage Histone H3 Modification as a Leishmania Strategy to Dampen the NF-κB/NLRP3-Mediated Inflammatory Response.

Cell Rep 2020 Feb;30(6):1870-1882.e4

INSERM U1201, Unité de Parasitologie Moléculaire et Signalisation, Département des Parasites et Insectes Vecteurs, Institut Pasteur, 25 Rue du Dr Roux, 75015 Paris, France; Institut Pasteur International Mixed Unit "Inflammation and Leishmania infection," Paris, France.

Aberrant macrophage activation during intracellular infection generates immunopathologies that can cause severe human morbidity. A better understanding of immune subversion strategies and macrophage phenotypic and functional responses is necessary to design host-directed intervention strategies. Here, we uncover a fine-tuned transcriptional response that is induced in primary and lesional macrophages infected by the parasite Leishmania amazonensis and dampens NF-κB and NLRP3 inflammasome activation. Subversion is amastigote-specific and characterized by a decreased expression of activating and increased expression of de-activating components of these pro-inflammatory pathways, thus revealing a regulatory dichotomy that abrogates the anti-microbial response. Changes in transcript abundance correlate with histone H3K9/14 hypoacetylation and H3K4 hypo-trimethylation in infected primary and lesional macrophages at promoters of NF-κB-related, pro-inflammatory genes. Our results reveal a Leishmania immune subversion strategy targeting host cell epigenetic regulation to establish conditions beneficial for parasite survival and open avenues for host-directed, anti-microbial drug discovery.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2020.01.030
February 2020

Trans-Atlantic Spill Over: Deconstructing the Ecological Adaptation of Leishmania infantum in the Americas.

Genes (Basel) 2019 Dec 19;11(1). Epub 2019 Dec 19.

Laboratory of Research on Leishmaniasis, Oswaldo Cruz Institute, FIOCRUZ, 21040-360 Rio de Janeiro, Brazil.

Pathogen fitness landscapes change when transmission cycles establish in non-native environments or spill over into new vectors and hosts. The introduction of Leishmania infantum into the Neotropics of the Americas during European colonization represents a unique case study to investigate the mechanisms of ecological adaptation of this important parasite. Defining the evolutionary trajectories that drive fitness in this new environment is of great public health importance, as it will allow unique insight into pathways of host/pathogen co-evolution and their consequences for region-specific changes in disease manifestation. This review summarizes current knowledge on L. infantum genetic and phenotypic diversity in the Americas and its possible role in the unique epidemiology of visceral leishmaniasis (VL) in the New World. We highlight the importance of appreciating adaptive molecular mechanisms in L. infantum to understand the parasite's successful establishment on the continent.

Source
DOI: http://dx.doi.org/10.3390/genes11010004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017240
December 2019

Genome Dynamics during Environmental Adaptation Reveal Strain-Specific Differences in Gene Copy Number Variation, Karyotype Instability, and Telomeric Amplification.

mBio 2018 Nov 6;9(6). Epub 2018 Nov 6.

Unité de Parasitologie moléculaire et Signalisation, Institut Pasteur, Paris, France.

Protozoan parasites of the genus Leishmania adapt to environmental change through chromosome and gene copy number variations. Little is known about external or intrinsic factors that govern genomic adaptation. Here, by conducting longitudinal genome analyses of 10 new clinical isolates, we uncovered important differences in gene copy number among genetically highly related strains and revealed gain and loss of gene copies as potential drivers of long-term environmental adaptation in the field. In contrast, chromosome rather than gene amplification was associated with short-term environmental adaptation to culture. Karyotypic solutions were highly reproducible but unique for a given strain, suggesting that chromosome amplification is under positive selection and dependent on species- and strain-specific intrinsic factors. We revealed a progressive increase in read depth towards the chromosome ends for various isolates, which may represent a nonclassical mechanism of telomere maintenance that can preserve integrity of chromosome ends during selection for fast growth. Together our data draw a complex picture of genomic adaptation in the field and in culture, which is driven by a combination of intrinsic genetic factors that generate strain-specific phenotypic variations, which are under environmental selection and allow for fitness gain.

Protozoan parasites of the genus Leishmania cause severe human and veterinary diseases worldwide, termed leishmaniases. A hallmark of Leishmania biology is its capacity to adapt to a variety of unpredictable fluctuations inside its human host, notably pharmacological interventions, thus causing drug resistance. Here we investigated mechanisms of environmental adaptation using a comparative genomics approach by sequencing 10 new clinical isolates from three Leishmania species complexes that were sampled across eight distinct geographical regions. Our data provide new evidence that parasites adapt to environmental change in the field and in culture through a combination of chromosome and gene amplification that likely causes phenotypic variation and drives parasite fitness gains in response to environmental constraints. This novel form of gene expression regulation through genomic change compensates for the absence of classical transcriptional control in these early-branching eukaryotes and opens new avenues for biomarker discovery.

Source
DOI: http://dx.doi.org/10.1128/mBio.01399-18
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222132
November 2018

Non-coding RNA Expression, Function, and Variation during Drosophila Embryogenesis.

Curr Biol 2018 Nov 1;28(22):3547-3561.e9. Epub 2018 Nov 1.

Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany.

Long non-coding RNAs (lncRNAs) can often function in the regulation of gene expression during development; however, their generality as essential regulators in developmental processes and organismal phenotypes remains unclear. Here, we performed a tailored investigation of lncRNA expression and function during Drosophila embryogenesis, interrogating multiple stages, tissue specificity, nuclear localization, and genetic backgrounds. Our results almost double the number of annotated lncRNAs expressed at these embryonic stages. lncRNA levels are generally positively correlated with those of their neighboring genes, with little evidence of transcriptional interference. Using fluorescent in situ hybridization, we report the spatiotemporal expression of 15 new lncRNAs, revealing very dynamic tissue-specific patterns. Despite this, deletion of selected lncRNA genes had no obvious developmental defects or effects on viability under standard and stressed conditions. However, two lncRNA deletions resulted in modest expression changes of a small number of genes, suggesting that they fine-tune expression of non-essential genes. Several lncRNAs have strain-specific expression, indicating that they are not fixed within the population. This intra-species variation across genetic backgrounds may thereby be a useful tool to distinguish rapidly evolving lncRNAs with as yet non-essential roles.
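The reported positive correlation between lncRNA levels and those of neighboring genes can be made concrete with a small sketch. It assumes a plain Pearson correlation across samples, which the abstract does not specify; the function name and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels across four embryonic stages.
lncrna_levels = [1.0, 2.0, 3.0, 4.0]
neighbor_levels = [2.0, 4.0, 6.0, 8.0]

r = pearson(lncrna_levels, neighbor_levels)
print(r)
```

A value of r close to +1 would indicate co-expression with the neighboring gene, whereas transcriptional interference would be expected to push r toward negative values.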

Source
DOI: http://dx.doi.org/10.1016/j.cub.2018.09.026
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6264527
November 2018

Haplotype selection as an adaptive mechanism in the protozoan pathogen Leishmania donovani.

Nat Ecol Evol 2017 Dec 6;1(12):1961-1969. Epub 2017 Nov 6.

Institut Pasteur, INSERM U1201, Unité de Parasitologie Moléculaire et Signalisation, 75015, Paris, France.

The parasite Leishmania donovani causes a fatal disease termed visceral leishmaniasis. The process through which the parasite adapts to environmental change remains largely unknown. Here we show that aneuploidy is integral for parasite adaptation and that karyotypic fluctuations allow for selection of beneficial haplotypes, which impact transcriptomic output and correlate with phenotypic variations in proliferation and infectivity. To avoid loss of diversity following karyotype and haplotype selection, L. donovani utilizes two mechanisms: polyclonal selection of beneficial haplotypes to create coexisting subpopulations that preserve the original diversity, and generation of new diversity as aneuploidy-prone chromosomes tolerate higher mutation rates. Our results reveal high aneuploidy turnover and haplotype selection as a unique evolutionary adaptation mechanism that L. donovani uses to preserve genetic diversity under strong selection. This unexplored process may function in other human diseases, including fungal infection and cancer, and stimulate innovative treatment options.

Source
DOI: http://dx.doi.org/10.1038/s41559-017-0361-x
December 2017

Transposon-driven transcription is a conserved feature of vertebrate spermatogenesis and transcript evolution.

EMBO Rep 2017 Jul 12;18(7):1231-1247. Epub 2017 May 12.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK

Spermatogenesis is associated with major and unique changes to chromosomes and chromatin. Here, we sought to understand the impact of these changes on spermatogenic transcriptomes. We show that long terminal repeats (LTRs) of specific mouse endogenous retroviruses (ERVs) drive the expression of many long non-coding transcripts (lncRNA). This process occurs post-mitotically predominantly in spermatocytes and round spermatids. We demonstrate that this transposon-driven lncRNA expression is a conserved feature of vertebrate spermatogenesis. We propose that transposon promoters are a mechanism by which the genome can explore novel transcriptional substrates, increasing evolutionary plasticity and allowing for the genesis of novel coding and non-coding genes. Accordingly, we show that a small fraction of these novel ERV-driven transcripts encode short open reading frames that produce detectable peptides. Finally, we find that distinct ERV elements from the same subfamilies act as differentially activated promoters in a tissue-specific context. In summary, we demonstrate that LTRs can act as tissue-specific promoters and contribute to post-mitotic spermatogenic transcriptome diversity.

Source
DOI: http://dx.doi.org/10.15252/embr.201744059
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5494522
July 2017

Intron retention-dependent gene regulation in Cryptococcus neoformans.

Sci Rep 2016 Aug 31;6:32252. Epub 2016 Aug 31.

Institut Pasteur, Unité Biologie des ARN des Pathogènes Fongiques, Département de Mycologie, F-75015, Paris, France.

The biological impact of alternative splicing is poorly understood in fungi, although recent studies have shown that these microorganisms are usually intron-rich. In this study, we re-annotated the genome of C. neoformans var. neoformans using RNA-Seq data. Comparison with C. neoformans var. grubii revealed that more than 99% of ORF-introns are in exactly the same position in the two varieties whereas UTR-introns are much less evolutionarily conserved. We also confirmed that alternative splicing is very common in C. neoformans, affecting nearly all expressed genes. We also observed specific regulation of alternative splicing by environmental cues in this yeast. However, alternative splicing does not appear to be an efficient method to diversify the C. neoformans proteome. Instead, our data suggest the existence of an intron retention-dependent mechanism of gene expression regulation that is not dependent on NMD. This regulatory process represents an additional layer of gene expression regulation in fungi and provides a mechanism to tune gene expression levels in response to any environmental modification.

Source
DOI: http://dx.doi.org/10.1038/srep32252
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006051
August 2016

Improved definition of the mouse transcriptome via targeted RNA sequencing.

Genome Res 2016 May;26(5):705-16

EMBL, European Bioinformatics Institute, Cambridge, CB10 1SD, United Kingdom.

Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage, accurate measurement of gene expression, and quantitative expression data. We applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splicing junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. We assemble more complete transcript isoforms than GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. We describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 GENCODE neighboring genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, we validate our observations using current FANTOM and Mouse ENCODE resources.

Source
DOI: http://dx.doi.org/10.1101/gr.199760.115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4864457
May 2016

Multiple sequence alignment modeling: methods and applications.

Brief Bioinform 2016 Nov 27;17(6):1009-1023. Epub 2015 Nov 27.

This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.

Source
DOI: http://dx.doi.org/10.1093/bib/bbv099
November 2016

CARMEN, a human super enhancer-associated long noncoding RNA controlling cardiac specification, differentiation and homeostasis.

J Mol Cell Cardiol 2015 Dec 28;89(Pt A):98-112. Epub 2015 Sep 28.

Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland.

Long noncoding RNAs (lncRNAs) are emerging as important regulators of developmental pathways. However, their roles in human cardiac precursor cells (CPCs) remain unexplored. To characterize the long noncoding transcriptome during human CPC cardiac differentiation, we profiled the lncRNA transcriptome in CPCs isolated from the human fetal heart and identified 570 lncRNAs that were modulated during cardiac differentiation. Many of these were associated with active cardiac enhancers and super enhancers (SE), with their expression being correlated with proximal cardiac genes. One of the most upregulated lncRNAs was a SE-associated lncRNA that was named CARMEN, (CAR)diac (M)esoderm (E)nhancer-associated (N)oncoding RNA. CARMEN exhibits RNA-dependent enhancing activity and is upstream of the cardiac mesoderm-specifying gene regulatory network. Interestingly, CARMEN interacts with SUZ12 and EZH2, two components of the polycomb repressive complex 2 (PRC2). We demonstrate that CARMEN knockdown inhibits cardiac specification and differentiation in cardiac precursor cells independently of MIR-143 and -145 expression, two microRNAs located proximal to the enhancer sequences. Importantly, CARMEN expression was activated during pathological remodeling in the mouse and human hearts, and was necessary for maintaining cardiac identity in differentiated cardiomyocytes. This study therefore demonstrates that CARMEN is a crucial regulator of cardiac cell differentiation and homeostasis.

Source
DOI: http://dx.doi.org/10.1016/j.yjmcc.2015.09.016
December 2015

Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing.

Nat Methods 2015 Apr 9;12(4):339-42. Epub 2015 Mar 9.

[1] Garvan Institute of Medical Research, Sydney, Australia. [2] St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, Australia.

We compared quantitative RT-PCR (qRT-PCR), RNA-seq and capture sequencing (CaptureSeq) in terms of their ability to assemble and quantify long noncoding RNAs and novel coding exons across 20 human tissues. CaptureSeq was superior for the detection and quantification of genes with low expression, showed little technical variation and accurately measured differential expression. This approach expands and refines previous annotations and simultaneously generates an expression atlas.

Source
DOI: http://dx.doi.org/10.1038/nmeth.3321
April 2015

Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression.

Nat Commun 2015 Jan 13;6:5903. Epub 2015 Jan 13.

Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA.

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.

Source
DOI: http://dx.doi.org/10.1038/ncomms6903
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4308717
January 2015

A comparative encyclopedia of DNA elements in the mouse genome.

Nature 2014 Nov;515(7527):355-64

Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain.

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

Source
DOI: http://dx.doi.org/10.1038/nature13992
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4266106
November 2014

SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments.

Nucleic Acids Res 2014 Jul 27;42(Web Server issue):W356-60. Epub 2014 Jun 27.

Comparative Bioinformatics, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain.

This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee.

Source
DOI: http://dx.doi.org/10.1093/nar/gku459
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086076
July 2014

T-Coffee: Tree-based consistency objective function for alignment evaluation.

Methods Mol Biol 2014;1079:117-29

Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain.

T-Coffee, for Tree-based consistency objective function for alignment evaluation, is a versatile multiple sequence alignment (MSA) method suitable for aligning virtually any type of biological sequences. T-Coffee provides more than a simple sequence aligner; rather it is a framework in which alternative alignment methods and/or extra information (i.e., structural, evolutionary, or experimental information) can be combined to reach more accurate and more meaningful MSAs. T-Coffee can be used either by running input data via the Web server (http://tcoffee.crg.cat/apps/tcoffee/index.html) or by downloading the T-Coffee package. Here, we present how the package can be used in its command line mode to carry out the most common tasks and multiply align proteins, DNA, and RNA sequences. This chapter particularly emphasizes the description of T-Coffee special flavors, also called "modes," designed to address particular biological problems.
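As a rough illustration of the command line usage and "modes" described above, the sketch below builds a T-Coffee invocation from Python. It assumes the `t_coffee` executable is installed and on the PATH, and it uses the `-mode` option with `rcoffee` (the R-Coffee flavor for RNA) as an example; the helper name and the input file name are hypothetical, and the chapter itself should be consulted for the full list of modes.

```python
import shlex


def tcoffee_command(fasta_path, mode=None):
    """Build the argument list for a T-Coffee run.

    fasta_path: path to a FASTA file of unaligned sequences (hypothetical here).
    mode: optional T-Coffee mode name, passed through the -mode option.
    """
    cmd = ["t_coffee", fasta_path]
    if mode is not None:
        cmd += ["-mode", mode]
    return cmd


cmd = tcoffee_command("sequences.fasta", mode="rcoffee")

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run it (requires T-Coffee to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.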

Source
DOI: http://dx.doi.org/10.1007/978-1-62703-646-7_7
May 2014

Detecting and comparing non-coding RNAs in the high-throughput era.

Int J Mol Sci 2013 Jul 24;14(8):15423-58. Epub 2013 Jul 24.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.

Source
DOI: http://dx.doi.org/10.3390/ijms140815423
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759867
July 2013

Using tertiary structure for the computation of highly accurate multiple RNA alignments with the SARA-Coffee package.

Bioinformatics 2013 May 28;29(9):1112-9. Epub 2013 Feb 28.

Bioinformatics and Genomics Program, Centre for Genomic Regulation, 08003 Barcelona, Spain.

Motivation: Aligning RNAs is useful to search for homologous genes, study evolutionary relationships, detect conserved regions and identify any patterns that may be of biological relevance. Poor levels of conservation among homologs, however, make it difficult to compare RNA sequences, even when considering evolutionarily closely related sequences.

Results: We describe SARA-Coffee, a tertiary structure-based multiple RNA aligner, which has been validated using BRAliDARTS, a new benchmark framework designed for evaluating tertiary structure-based multiple RNA aligners. We provide two methods to measure the capacity of alignments to match corresponding secondary and tertiary structure features. On this benchmark, SARA-Coffee outperforms both regular aligners and those using secondary structure information. Furthermore, we show that on sequences in which <60% of the nucleotides form base pairs, primary sequence methods usually perform better than secondary-structure aware aligners.

Availability And Implementation: The package and the datasets are available from http://www.tcoffee.org/Projects/saracoffee and http://structure.biofold.org/sara/.
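
The 60% base-pairing threshold mentioned in the Results can be illustrated with a minimal Python sketch. It is not part of SARA-Coffee: it assumes the secondary structure is given in dot-bracket notation, and the function names and the returned labels are hypothetical. The rule it encodes is only the "usually" tendency reported in the abstract, not a guarantee.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Return the fraction of positions that are base-paired.

    Assumes plain dot-bracket notation: '(' and ')' mark paired
    nucleotides, '.' marks unpaired ones.
    """
    if not dot_bracket:
        return 0.0
    paired = sum(1 for ch in dot_bracket if ch in "()")
    return paired / len(dot_bracket)


def suggested_aligner_family(dot_bracket: str, threshold: float = 0.60) -> str:
    """Hypothetical helper applying the abstract's rule of thumb.

    Below the threshold, primary-sequence methods usually performed
    better than secondary-structure-aware aligners on the benchmark.
    """
    if paired_fraction(dot_bracket) < threshold:
        return "primary-sequence"
    return "structure-aware"


if __name__ == "__main__":
    for structure in ("((..))....", "((((....))))"):
        print(structure, paired_fraction(structure), suggested_aligner_family(structure))
```

The threshold is exposed as a parameter because the abstract reports a trend on one benchmark rather than a fixed cutoff.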

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt096
May 2013

The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

Genome Res 2012 Sep;22(9):1775-89

Bioinformatics and Genomics, Centre for Genomic Regulation and UPF, 08003 Barcelona, Catalonia, Spain.

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.

Source
DOI: http://dx.doi.org/10.1101/gr.132159.111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431493
September 2012

Use of ChIP-Seq data for the design of a multiple promoter-alignment method.

Nucleic Acids Res 2012 Apr 9;40(7):e52. Epub 2012 Jan 9.

Bioinformatics and Genomics program, Centre for Genomic Regulation and UPF, 08003 Barcelona, Spain.

We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
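
The "16.5% above the default average" figure follows from the site counts quoted in the abstract. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only, and the inputs are the three averages stated above.

```python
def relative_gain_percent(value: float, baseline: float) -> float:
    """Percentage by which `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Average numbers of correctly aligned binding sites, as quoted in the abstract.
default_methods = 284
trained_methods = 316
pro_coffee = 331

if __name__ == "__main__":
    # Pro-Coffee relative to the default-method average: (331 - 284) / 284
    print(round(relative_gain_percent(pro_coffee, default_methods), 1))
    # Trained methods relative to the default-method average: (316 - 284) / 284
    print(round(relative_gain_percent(trained_methods, default_methods), 1))
```

The same helper shows that training alone accounts for roughly two thirds of the gain over default methods, with Pro-Coffee adding the remainder.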

Source
DOI: http://dx.doi.org/10.1093/nar/gkr1292
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326335
April 2012

Exploring the gonad transcriptome of two extreme male pigs with RNA-seq.

BMC Genomics 2011 Nov 8;12:552. Epub 2011 Nov 8.

Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain.

Background: Although RNA-seq greatly advances our understanding of complex transcriptome landscapes, such as those found in mammals, complete RNA-seq studies in livestock, and in particular in the pig, are still lacking. Here, we used high-throughput RNA sequencing to characterize the poly-A RNA fraction expressed in pig male gonads. An expression analysis comparing different mapping approaches and the detection of allele-specific expression are also discussed in this study.

Results: By sequencing testicle mRNA of two phenotypically extreme pigs, one Iberian and one Large White, we identified hundreds of unannotated protein-coding genes (PcGs) in intergenic regions, some of them presenting orthology with closely related species. Interestingly, we also detected 2047 putative long non-coding RNAs (lncRNAs), including 469 with human homologues. Two methods, DEGseq and Cufflinks, were used for analyzing expression. DEGseq identified 15% fewer expressed genes than Cufflinks, because DEGseq utilizes only unambiguously mapped reads. Moreover, a large fraction of the transcriptome is made up of transposable elements (14500 elements encountered), as has been reported in previous studies. Gene expression results between microarray and RNA-seq technologies were relatively well correlated (r = 0.71 across individuals). Differentially expressed genes between Large White and Iberian showed a significant overrepresentation of gamete production and lipid metabolism gene ontology categories. Finally, allelic imbalance was detected in ~4% of heterozygous sites.

Conclusions: RNA-seq is a powerful tool to gain insight into complex transcriptomes. In addition to uncovering many unannotated genes, our study allowed us to determine that a considerable fraction of the transcriptome is made up of long non-coding transcripts and transposable elements. Their biological roles remain to be determined in future studies. Differences in expression between Large White and Iberian pigs were largest for genes involved in spermatogenesis and lipid metabolism, which is consistent with the extreme phenotypic differences in prolificacy and fat deposition between these two breeds.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-12-552
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3221674
November 2011

Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures.

Nat Protoc 2011 Nov;6(11):1669-82

Comparative Bioinformatics Group, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain.

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.

Source
DOI: http://dx.doi.org/10.1038/nprot.2011.393
November 2011

BlastR--fast and accurate database searches for non-coding RNAs.

Nucleic Acids Res 2011 Sep 30;39(16):6886-95. Epub 2011 May 30.

Bioinformatics and Genomics program, Center for Genomic Regulation (CRG) and UPF, Barcelona, C/ D. Aiguader, 88, 08003 Barcelona, Spain.

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
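
The recoding step described above, turning an RNA sequence into a protein-like sequence of di-nucleotides, can be sketched in Python as follows. This is an illustration under stated assumptions, not the BlastR implementation: the use of overlapping di-nucleotides, the particular 16 amino-acid letters, and the `recode` function name are all hypothetical choices made here, and the BlosumR matrix itself is not reproduced.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Hypothetical alphabet: any 16 distinct amino-acid letters would do.
# The real BlastR letter assignment is not given in the abstract.
LETTERS = "ACDEFGHIKLMNPQRS"

# Map each of the 16 di-nucleotides (AA, AC, ..., UU) to one letter.
DINUCLEOTIDE_TO_LETTER = {
    first + second: LETTERS[index]
    for index, (first, second) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def recode(rna: str) -> str:
    """Recode an RNA sequence as a protein-like string.

    Assumption: overlapping di-nucleotides are used, so a sequence of
    length n yields n - 1 letters. DNA input is accepted by reading T
    as U; di-nucleotides containing other symbols become 'X'.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(
        DINUCLEOTIDE_TO_LETTER.get(rna[i:i + 2], "X")
        for i in range(len(rna) - 1)
    )


if __name__ == "__main__":
    print(recode("ACGU"))
```

Once sequences are in this form, a protein search tool can compare them with a 16-letter substitution matrix, which is the role the abstract assigns to BlosumR and BlastP.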

Source
DOI: http://dx.doi.org/10.1093/nar/gkr335
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3167602
September 2011

Long noncoding RNAs with enhancer-like function in human cells.

Cell 2010 Oct;143(1):46-58

The Wistar Institute, 3601 Spruce Street, Philadelphia, PA 19104, USA.

While the long noncoding RNAs (ncRNAs) constitute a large portion of the mammalian transcriptome, their biological functions have remained elusive. A few long ncRNAs that have been studied in any detail silence gene expression in processes such as X-inactivation and imprinting. We used a GENCODE annotation of the human genome to characterize over a thousand long ncRNAs that are expressed in multiple cell lines. Unexpectedly, we found an enhancer-like function for a set of these long ncRNAs in human cell lines. Depletion of a number of ncRNAs led to decreased expression of their neighboring protein-coding genes, including the master regulator of hematopoiesis, SCL (also called TAL1), Snai1 and Snai2. Using heterologous transcription assays we demonstrated a requirement for the ncRNAs in activation of gene expression. These results reveal an unanticipated role for a class of long ncRNAs in activation of critical regulators of development and differentiation.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2010.09.001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108080
October 2010

A novel, noncanonical mechanism of cytoplasmic polyadenylation operates in Drosophila embryogenesis.

Genes Dev 2010 Jan;24(2):129-34

Gene Regulation Programme, Centre de Regulació Genòmica (CRG-UPF), 08003 Barcelona, Spain.

Cytoplasmic polyadenylation is a widespread mechanism to regulate mRNA translation that requires two sequences in the 3' untranslated region (UTR) of vertebrate substrates: the polyadenylation hexanucleotide, and the cytoplasmic polyadenylation element (CPE). Using a cell-free Drosophila system, we show that these signals are not relevant for Toll polyadenylation but, instead, a "polyadenylation region" (PR) is necessary. Competition experiments indicate that PR-mediated polyadenylation is required for viability and is mechanistically distinct from the CPE/hexanucleotide-mediated process. These data indicate that Toll mRNA is polyadenylated by a noncanonical mechanism, and suggest that a novel machinery functions for cytoplasmic polyadenylation during Drosophila embryogenesis.

Source
DOI: http://dx.doi.org/10.1101/gad.568610
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2807348
January 2010

The MoVIN server for the analysis of protein interaction networks.

BMC Bioinformatics 2008 Mar 26;9 Suppl 2:S11. Epub 2008 Mar 26.

Department of Biochemical Sciences, "Sapienza" University, Rome, Italy.

Background: Protein-protein interactions are at the basis of most cellular processes and crucial for many bio-technological applications. During the last few years the development of high-throughput technologies has produced several large-scale protein-protein interaction data sets for various organisms. It is important to develop tools for dissecting their content and analysing the information they embed by data-integration and computational methods.

Results: Interactions can be mediated by the presence of specific features, such as motifs, surface patches and domains. The co-occurrence of these features on proteins interacting with the same protein can indicate mutually exclusive interactions and, therefore, can be used for inferring the involvement of the proteins in common biological processes. We present here a publicly available server that allows the user to investigate protein interaction data in light of other biological information, such as their sequences, presence of specific domains, process and component ontologies. The server can be effectively used to construct a high-confidence set of mutually exclusive interactions by identifying similar features in groups of proteins sharing a common interaction partner. As an example, we describe here the identification of common motifs, function, cellular localization and domains in different datasets of yeast interactions.

Conclusions: The server can be used to analyse user-supplied datasets; it contains pre-processed data for four yeast protein-protein interaction datasets and the results of their statistical analysis. These show that the presence of common motifs in proteins interacting with the same partner is a valuable source of information: it can be used to investigate the properties of the interacting proteins and provides information that can be effectively integrated with other sources. As more experimental interaction data become available, this tool will become increasingly useful for gaining a more detailed picture of the interactome.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-9-S2-S11
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2323660
March 2008