Publications by authors named "Emmanuel Mongin"

14 Publications

Mapping association between long-range cis-regulatory regions and their target genes using synteny.

J Comput Biol 2011 Sep;18(9):1115-30

McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada.

In chordates, long-range cis-regulatory regions are involved in the control of transcription initiation (either as repressors or enhancers). Their main characteristics are that (i) they can be located as far as 1 Mb away from the transcription start site of the target gene, (ii) they can regulate more than one gene, and (iii) they are usually orientation-independent. Therefore, proper characterization of functional interactions between long-range cis-regulatory regions and their target genes remains problematic. We present a novel method to predict such interactions based on the analysis of rearrangements between the human and 16 other vertebrate genomes. Our method is based on the assumption that genome rearrangements that would disrupt the functional interaction between a cis-regulatory region and its target gene are likely to be deleterious. Therefore, conservation of synteny through evolution would be an indication of a functional interaction. We use our algorithm to predict the association between a set of 123,905 human candidate regulatory regions and their target gene(s). This genome-wide map of interactions has many potential applications, including the selection of candidate regions prior to in vivo experimental characterization, a better characterization of regulatory regions involved in position effect diseases, and an improved understanding of the mechanisms and importance of long-range regulation.
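
The core assumption in this abstract (a regulatory region and its target gene tend to stay linked across species) can be illustrated with a toy score. This is only a sketch under strong assumptions, not the published algorithm: it presumes that orthologous positions of each region and gene are already known per species, and it reduces "conserved synteny" to "same chromosome, within 1 Mb". The names `Locus`, `is_syntenic` and `synteny_score`, and all coordinates, are hypothetical.

```python
from dataclasses import dataclass

MAX_DISTANCE = 1_000_000  # 1 Mb, the range quoted in the abstract


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int


def is_syntenic(region: Locus, gene: Locus, max_distance: int = MAX_DISTANCE) -> bool:
    """True if both loci lie on the same chromosome within max_distance."""
    return region.chrom == gene.chrom and abs(region.pos - gene.pos) <= max_distance


def synteny_score(region_by_species: dict, gene_by_species: dict) -> float:
    """Fraction of species holding both loci in which they remain linked."""
    shared = region_by_species.keys() & gene_by_species.keys()
    if not shared:
        return 0.0
    kept = sum(is_syntenic(region_by_species[s], gene_by_species[s]) for s in shared)
    return kept / len(shared)


# Made-up coordinates for one candidate region and two candidate target genes.
region = {"human": Locus("chr1", 100_000), "mouse": Locus("chr4", 250_000), "chicken": Locus("chr8", 50_000)}
gene_a = {"human": Locus("chr1", 600_000), "mouse": Locus("chr4", 700_000), "chicken": Locus("chr8", 400_000)}
gene_b = {"human": Locus("chr1", 300_000), "mouse": Locus("chr11", 90_000), "chicken": Locus("chr8", 60_000)}

print(synteny_score(region, gene_a))  # gene_a stays linked in all three species
print(synteny_score(region, gene_b))  # gene_b is separated from the region in mouse
```

In this toy data gene_b is closer to the region in human, yet gene_a scores higher because its linkage is never broken, which is the kind of signal the abstract describes.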

Source
DOI: http://dx.doi.org/10.1089/cmb.2011.0088
September 2011

Combining computational prediction of cis-regulatory elements with a new enhancer assay to efficiently label neuronal structures in the medaka fish.

PLoS One 2011;6(5):e19747. Epub 2011 May 27.

McGill Centre for Bioinformatics, McGill University, Montréal, Canada.

The developing vertebrate nervous system contains a remarkable array of neural cells organized into complex, evolutionarily conserved structures. The labeling of living cells in these structures is key for the understanding of brain development and function, yet the generation of stable lines expressing reporter genes in specific spatio-temporal patterns remains a limiting step. In this study we present a fast and reliable pipeline to efficiently generate a set of stable lines expressing a reporter gene in multiple neuronal structures in the developing nervous system in medaka. The pipeline combines both the accurate computational genome-wide prediction of neuronal specific cis-regulatory modules (CRMs) and a newly developed experimental setup to rapidly obtain transgenic lines in a cost-effective and highly reproducible manner. 95% of the CRMs tested in our experimental setup show enhancer activity in various and numerous neuronal structures belonging to all major brain subdivisions. This pipeline represents a significant step towards the dissection of embryonic neuronal development in vertebrates.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019747
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3103512
September 2011

Long-range regulation is a major driving force in maintaining genome integrity.

BMC Evol Biol 2009 Aug 15;9:203. Epub 2009 Aug 15.

McGill Centre for Bioinformatics, McGill University, Montreal, Canada.

Background: The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of the non-uniformity of breakpoints that occurred during vertebrate evolution is poorly understood.

Results: We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. Based on our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, likely to have simpler transcriptional regulation.

Conclusion: We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not allowed. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.

Source
DOI: http://dx.doi.org/10.1186/1471-2148-9-203
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741452
August 2009

BASC: an integrated bioinformatics system for Brassica research.

Nucleic Acids Res 2007 Jan 5;35(Database issue):D870-3. Epub 2006 Dec 5.

Plant Biotechnology Centre, Victorian AgriBiosciences Centre, 1 Park Drive, Bundoora, Victoria 3083, Australia.

The BASC system provides tools for the integrated mining and browsing of genetic, genomic and phenotypic data. This public resource hosts information on Brassica species supporting the Multinational Brassica Genome Sequencing Project, and is based upon five distinct modules, ESTDB, Microarray, MarkerQTL, CMap and EnsEMBL. ESTDB hosts expressed gene sequences and related annotation derived from comparison with GenBank, UniRef and the genome sequence of Arabidopsis. The Microarray module hosts gene expression information related to genes annotated within ESTDB. MarkerQTL is the most complex module and integrates information on genetic markers, maps, individuals, genotypes and traits. Two further modules include an Arabidopsis EnsEMBL genome viewer and the CMap comparative genetic map viewer for the visualization and integration of genetic and genomic data. The database is accessible at http://bioinformatics.pbcbasc.latrobe.edu.au.

Source
DOI: http://dx.doi.org/10.1093/nar/gkl998
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1761444
January 2007

Anopheles gambiae immune responses to human and rodent Plasmodium parasite species.

PLoS Pathog 2006 Jun 9;2(6):e52. Epub 2006 Jun 9.

W. Harry Feinstone Department of Molecular Microbiology and Immunology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.

Transmission of malaria is dependent on the successful completion of the Plasmodium lifecycle in the Anopheles vector. Major obstacles are encountered in the midgut tissue, where most parasites are killed by the mosquito's immune system. In the present study, DNA microarray analyses have been used to compare Anopheles gambiae responses to invasion of the midgut epithelium by the ookinete stage of the human pathogen Plasmodium falciparum and the rodent experimental model pathogen P. berghei. Invasion by P. berghei had a more profound impact on the mosquito transcriptome, including a variety of functional gene classes, while P. falciparum elicited a broader immune response at the gene transcript level. Ingestion of human malaria-infected blood lacking invasive ookinetes also induced a variety of immune genes, including several anti-Plasmodium factors. Twelve selected genes were assessed for effect on infection with both parasite species and bacteria using RNAi gene silencing assays, and seven of these genes were found to influence mosquito resistance to both parasite species. An MD2-like receptor, AgMDL1, and an immunolectin, FBN39, showed specificity in regulating only resistance to P. falciparum, while the antimicrobial peptide gambicin and a novel putative short secreted peptide, IRSP5, were more specific for defense against the rodent parasite P. berghei. While all the genes that affected Plasmodium development also influenced mosquito resistance to bacterial infection, four of the antimicrobial genes had no effect on Plasmodium development. Our study shows that the impact of P. falciparum and P. berghei infection on A. gambiae biology at the gene transcript level is quite diverse, and the defense against the two Plasmodium species is mediated by antimicrobial factors with both universal and Plasmodium-species specific activities. Furthermore, our data indicate that the mosquito is capable of sensing infected blood constituents in the absence of invading ookinetes, thereby inducing anti-Plasmodium immune responses.

Source
DOI: http://dx.doi.org/10.1371/journal.ppat.0020052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1475661
June 2006

SNPServer: a real-time SNP discovery tool.

Nucleic Acids Res 2005 Jul;33(Web Server issue):W493-5

Plant Biotechnology Centre, La Trobe University, Bundoora 3086, Victoria, Australia.

SNPServer is a real-time flexible tool for the discovery of SNPs (single nucleotide polymorphisms) within DNA sequence data. The program uses BLAST, to identify related sequences, and CAP3, to cluster and align these sequences. The alignments are parsed to the SNP discovery software autoSNP, a program that detects SNPs and insertion/deletion polymorphisms (indels). Alternatively, lists of related sequences or pre-assembled sequences may be entered for SNP discovery. SNPServer and autoSNP use redundancy to differentiate between candidate SNPs and sequence errors. For each candidate SNP, two measures of confidence are calculated, the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. SNPServer is available at http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html.
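
The redundancy idea mentioned in this abstract (a variant seen in several sequences is more likely a real SNP than a sequencing error) can be sketched as follows. This is an illustrative simplification, not the autoSNP implementation: it assumes the sequences are already aligned to equal length, it ignores the co-segregation measure, and the function name `candidate_snps` and the `min_redundancy` threshold are hypothetical.

```python
from collections import Counter


def candidate_snps(alignment, min_redundancy=2):
    """Return {column: allele counts} for columns where at least two alleles
    are each supported by min_redundancy or more sequences."""
    snps = {}
    for col in range(len(alignment[0])):
        # Count only unambiguous bases; gaps and N are skipped in this sketch.
        counts = Counter(seq[col] for seq in alignment if seq[col] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            snps[col] = supported
    return snps


# Made-up aligned reads.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACATAC",
    "ACATAC",
    "ACGTTC",  # lone mismatch at column 4: not redundant, so not reported
]
print(candidate_snps(reads))
```

Here column 2 is reported because both G and A are seen in at least two reads, while the single T at column 4 falls below the threshold.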

Source
DOI: http://dx.doi.org/10.1093/nar/gki462
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1160223
July 2005

The Ensembl automatic gene annotation system.

Genome Res 2004 May;14(5):942-50

The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.

Source
DOI: http://dx.doi.org/10.1101/gr.1858004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC479124
May 2004

The Ensembl analysis pipeline.

Genome Res 2004 May;14(5):934-41

The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
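
The wrapper pattern described in this abstract (modules that fetch sequence from a database, run an analysis, and write results back, all behind a common interface) might look roughly like the sketch below. The real pipeline is written in Perl; this Python analogy is not the Ensembl API. The class and method names, the dictionary standing in for the relational database, the toy GC-content analysis and the serial `run_jobs` loop (in place of the RuleManager and compute farm) are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Common interface: fetch input, run the analysis, write results back."""

    def __init__(self, db: dict, input_id: str):
        self.db = db              # stand-in for a relational database
        self.input_id = input_id
        self.sequence = None
        self.output = None

    def fetch_input(self) -> None:
        self.sequence = self.db["sequences"][self.input_id]

    @abstractmethod
    def run(self) -> None:
        """Subclasses wrap one analysis tool here."""

    def write_output(self) -> None:
        self.db["results"][(self.input_id, type(self).__name__)] = self.output


class GCContent(Runnable):
    """Toy analysis standing in for an external tool."""

    def run(self) -> None:
        gc = sum(base in "GC" for base in self.sequence)
        self.output = gc / len(self.sequence)


def run_jobs(db: dict, runnable_classes) -> None:
    """Simplified serial stand-in for a job submission system."""
    for input_id in db["sequences"]:
        for cls in runnable_classes:
            job = cls(db, input_id)
            job.fetch_input()
            job.run()
            job.write_output()


db = {"sequences": {"contig1": "GGCCAATT", "contig2": "ATAT"}, "results": {}}
run_jobs(db, [GCContent])
print(db["results"])
```

Because every wrapper shares the same three steps, adding a new analysis in this sketch only requires a new subclass with its own `run` method, which mirrors the abstract's point that a common interface simplifies writing new wrapper modules.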

Source
DOI: http://dx.doi.org/10.1101/gr.1859804
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC479123
May 2004

An overview of Ensembl.

Genome Res 2004 May 12;14(5):925-8. Epub 2004 Apr 12.

EMBL European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.

Source
DOI: http://dx.doi.org/10.1101/gr.1860604
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC479121
May 2004

Genome sequence of the Brown Norway rat yields insights into mammalian evolution.

Nature 2004 Apr;428(6982):493-521

Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, Texas 77030, USA. http://www.hgsc.bcm.tmc.edu

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.

Source
DOI: http://dx.doi.org/10.1038/nature02426
April 2004

The Anopheles gambiae genome: an update.

Trends Parasitol 2004 Feb;20(2):49-52

Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.

As a result of an international collaborative effort, the first draft of the Anopheles gambiae genome sequence and its preliminary annotation were published in October 2002. Since then, the assembly, annotation and means of accession of the An. gambiae genome have been under continuous development. This article reviews progress and considers limitations in the current sequence assembly and gene annotation, as well as approaches to address these problems and outstanding issues that users of the data must bear in mind.

Source
DOI: http://dx.doi.org/10.1016/j.pt.2003.11.003
February 2004

Initial sequencing and comparative analysis of the mouse genome.

Nature 2002 Dec;420(6915):520-62

Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Source
DOI: http://dx.doi.org/10.1038/nature01262
December 2002

The genome sequence of the malaria mosquito Anopheles gambiae.

Science 2002 Oct;298(5591):129-49

Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.

Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

Source
DOI: http://dx.doi.org/10.1126/science.1076181
October 2002

Genome annotation techniques: new approaches and challenges.

Drug Discov Today 2002 Jun;7(11):S70-6

European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, Cambridge, UK.

As more of the human genome draft sequence is finished, and genomes from other organisms begin to be sequenced, the demand for accurate and reliable genome annotation will increase significantly. To facilitate this industrial-scale genome annotation, automated bioinformatics solutions are increasingly required. As a result, automatic genome annotation systems have become more important in gene discovery within recent years. The design of such large-scale bioinformatics systems is an evolving and dynamic field, based on central cores of bioinformatics software tools and relational databases. Not only must these systems efficiently manage and integrate large volumes of genomic data, but they must also deliver accurate gene predictions and effectively distribute annotation data to the biosciences community.

Source
DOI: http://dx.doi.org/10.1016/s1359-6446(02)02289-4
June 2002