Publications by authors named "Thibaut Hourlier"

22 Publications

  • Page 1 of 1

GENCODE 2021.

Nucleic Acids Res 2021 01;49(D1):D916-D923

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkaa1087DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778937PMC
January 2021

Ensembl 2021.

Nucleic Acids Res 2021 01;49(D1):D884-D891

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkaa942DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778975PMC
January 2021

An improved pig reference genome sequence to enable pig genetics and genomics research.

Gigascience 2020 06;9(6)

Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK.

Background: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility.

Results: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2.

Conclusions: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/gigascience/giaa051DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7448572PMC
June 2020

Ensembl 2020.

Nucleic Acids Res 2020 01;48(D1):D682-D688

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license) and data updates made available four times a year.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkz966DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145704PMC
January 2020

Ensembl 2019.

Nucleic Acids Res 2019 01;47(D1):D745-D751

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as a through a RESTful webservice, Perl application programming interface and as data files for download.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gky1113DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323964PMC
January 2019

GENCODE reference annotation for the human and mouse genomes.

Nucleic Acids Res 2019 01;47(D1):D766-D773

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gky955DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323946PMC
January 2019

Ensembl 2018.

Nucleic Acids Res 2018 01;46(D1):D754-D761

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkx1098DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753206PMC
January 2018

A high-resolution mRNA expression time course of embryonic development in zebrafish.

Elife 2017 11 16;6. Epub 2017 Nov 16.

Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

We have produced an mRNA expression time course of zebrafish development across 18 time points from 1 cell to 5 days post-fertilisation sampling individual and pools of embryos. Using poly(A) pulldown stranded RNA-seq and a 3' end transcript counting method we characterise temporal expression profiles of 23,642 genes. We identify temporal and functional transcript co-variance that associates 5024 unnamed genes with distinct developmental time points. Specifically, a class of over 100 previously uncharacterised zinc finger domain containing genes, located on the long arm of chromosome 4, is expressed in a sharp peak during zygotic genome activation. In addition, the data reveal new genes and transcripts, differential use of exons and previously unidentified 3' ends across development, new primary microRNAs and temporal divergence of gene paralogues generated in the teleost genome duplication. To make this dataset a useful baseline reference, the data can be browsed and downloaded at Expression Atlas and Ensembl.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.7554/eLife.30860DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5690287PMC
November 2017

Ensembl 2017.

Nucleic Acids Res 2017 01 28;45(D1):D635-D642. Epub 2016 Nov 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkw1104DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210575PMC
January 2017

The Ensembl gene annotation system.

Database (Oxford) 2016 23;2016. Epub 2016 Jun 23.

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail.Database URL: http://www.ensembl.org/index.html.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/database/baw093DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4919035PMC
November 2017

Ensembl 2016.

Nucleic Acids Res 2016 Jan 19;44(D1):D710-6. Epub 2015 Dec 19.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkv1157DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702834PMC
January 2016

The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease.

Nat Biotechnol 2014 Dec 17;32(12):1250-5. Epub 2014 Nov 17.

Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, Florida, USA.

The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/nbt.3079DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262547PMC
December 2014

Ensembl 2015.

Nucleic Acids Res 2015 Jan 28;43(Database issue):D662-9. Epub 2014 Oct 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gku1010DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383879PMC
January 2015

The genomic substrate for adaptive radiation in African cichlid fish.

Nature 2014 Sep 3;513(7518):375-381. Epub 2014 Sep 3.

Zoological Institute, University of Basel, CH-4051 Basel, Switzerland.

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/nature13726DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4353498PMC
September 2014

The sheep genome illuminates biology of the rumen and lipid metabolism.

Science 2014 Jun;344(6188):1168-1173

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

Sheep (Ovis aries) are a major source of meat, milk, and fiber in the form of wool and represent a distinct class of animals that have a specialized digestive organ, the rumen, that carries out the initial digestion of plant material. We have developed and analyzed a high-quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants compared with nonruminant animals.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1126/science.1252806DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157056PMC
June 2014

Ensembl 2014.

Nucleic Acids Res 2014 Jan 6;42(Database issue):D749-55. Epub 2013 Dec 6.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkt1196DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964975PMC
January 2014

Ensembl 2013.

Nucleic Acids Res 2013 Jan 30;41(Database issue):D48-55. Epub 2012 Nov 30.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidenced-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gks1236DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531136PMC
January 2013

A gene-phenotype network based on genetic variability for drought responses reveals key physiological processes in controlled and natural environments.

PLoS One 2012 8;7(10):e45249. Epub 2012 Oct 8.

INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, Castanet-Tolosan, France.

Identifying the connections between molecular and physiological processes underlying the diversity of drought stress responses in plants is key for basic and applied science. Drought stress response involves a large number of molecular pathways and subsequent physiological processes. Therefore, it constitutes an archetypical systems biology model. We first inferred a gene-phenotype network exploiting differences in drought responses of eight sunflower (Helianthus annuus) genotypes to two drought stress scenarios. Large transcriptomic data were obtained with the sunflower Affymetrix microarray, comprising 32423 probesets, and were associated to nine morpho-physiological traits (integrated transpired water, leaf transpiration rate, osmotic potential, relative water content, leaf mass per area, carbon isotope discrimination, plant height, number of leaves and collar diameter) using sPLS regression. Overall, we could associate the expression patterns of 1263 probesets to six phenotypic traits and identify if correlations were due to treatment, genotype and/or their interaction. We also identified genes whose expression is affected at moderate and/or intense drought stress together with genes whose expression variation could explain phenotypic and drought tolerance variability among our genetic material. We then used the network model to study phenotypic changes in less tractable agronomical conditions, i.e. sunflower hybrids subjected to different watering regimes in field trials. Mapping this new dataset in the gene-phenotype network allowed us to identify genes whose expression was robustly affected by water deprivation in both controlled and field conditions. The enrichment in genes correlated to relative water content and osmotic potential provides evidence of the importance of these traits in agronomical conditions.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0045249PLOS
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3466295PMC
April 2013

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

Genome Res 2012 Sep;22(9):1698-710

Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1101/gr.134478.111DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431487PMC
September 2012

Ensembl 2012.

Nucleic Acids Res 2012 Jan 15;40(Database issue):D84-90. Epub 2011 Nov 15.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkr991DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245178PMC
January 2012

Transcriptomic analysis of the interaction between Helianthus annuus and its obligate parasite Plasmopara halstedii shows single nucleotide polymorphisms in CRN sequences.

BMC Genomics 2011 Oct 11;12:498. Epub 2011 Oct 11.

INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, F-31326 Castanet-Tolosan, France.

Background: Downy mildew in sunflowers (Helianthus annuus L.) is caused by the oomycete Plasmopara halstedii (Farl.) Berlese et de Toni. Despite efforts by the international community to breed mildew-resistant varieties, downy mildew remains a major threat to the sunflower crop. Very few genomic, genetic and molecular resources are currently available to study this pathogen. Using a 454 sequencing method, expressed sequence tags (EST) during the interaction between H. annuus and P. halstedii have been generated and a search was performed for sites in putative effectors to show polymorphisms between the different races of P. halstedii.

Results: A 454 pyrosequencing run of two infected sunflower samples (inbred lines XRQ and PSC8 infected with race 710 of P. halstedii, which exhibit incompatible and compatible interactions, respectively) generated 113,720 and 172,107 useable reads. From these reads, 44,948 contigs and singletons have been produced. A bioinformatic portal, HP, was specifically created for in-depth analysis of these clusters. Using in silico filtering, 405 clusters were defined as being specific to oomycetes, and 172 were defined as non-specific oomycete clusters. A subset of these two categories was checked using PCR amplification, and 86% of the tested clusters were validated. Twenty putative RXLR and CRN effectors were detected using PSI-BLAST. Using corresponding sequences from four races (100, 304, 703 and 710), 22 SNPs were detected, providing new information on pathogen polymorphisms.

Conclusions: This study identified a large number of genes that are expressed during H. annuus/P. halstedii compatible or incompatible interactions. It also reveals, for the first time, that an infection mechanism exists in P. halstedii similar to that in other oomycetes associated with the presence of putative RXLR and CRN effectors. SNPs discovered in CRN effector sequences were used to determine the genetic distances between the four races of P. halstedii. This work therefore provides valuable tools for further discoveries regarding the H. annuus/P. halstedii pathosystem.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/1471-2164-12-498DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3204308PMC
October 2011

Ensembl 2011.

Nucleic Acids Res 2011 Jan 2;39(Database issue):D800-6. Epub 2010 Nov 2.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkq1064DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013672PMC
January 2011