Publications by authors named "Magali Ruffier"

22 Publications

  • Page 1 of 1

GENCODE 2021.

Nucleic Acids Res 2021 01;49(D1):D916-D923

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkaa1087DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778937PMC
January 2021

Ensembl 2021.

Nucleic Acids Res 2021 01;49(D1):D884-D891

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkaa942DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778975PMC
January 2021

Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project.

Wellcome Open Res 2019 30;4:50. Epub 2019 Dec 30.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the "lift-overs" of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.12688/wellcomeopenres.15126.2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059836PMC
December 2019

Ensembl 2020.

Nucleic Acids Res 2020 01;48(D1):D682-D688

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license) and data updates made available four times a year.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkz966DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145704PMC
January 2020

Ensembl 2019.

Nucleic Acids Res 2019 01;47(D1):D745-D751

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as a through a RESTful webservice, Perl application programming interface and as data files for download.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gky1113DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323964PMC
January 2019

GENCODE reference annotation for the human and mouse genomes.

Nucleic Acids Res 2019 01;47(D1):D766-D773

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gky955DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323946PMC
January 2019

A comprehensive genomic history of extinct and living elephants.

Proc Natl Acad Sci U S A 2018 03 26;115(11):E2566-E2574. Epub 2018 Feb 26.

Broad Institute of MIT and Harvard, Cambridge, MA 02142.

Elephantids are the world's most iconic megafaunal family, yet there is no comprehensive genomic assessment of their relationships. We report a total of 14 genomes, including 2 from the American mastodon, which is an extinct elephantid relative, and 12 spanning all three extant and three extinct elephantid species including an ∼120,000-y-old straight-tusked elephant, a Columbian mammoth, and woolly mammoths. Earlier genetic studies modeled elephantid evolution via simple bifurcating trees, but here we show that interspecies hybridization has been a recurrent feature of elephantid evolution. We found that the genetic makeup of the straight-tusked elephant, previously placed as a sister group to African forest elephants based on lower coverage data, in fact comprises three major components. Most of the straight-tusked elephant's ancestry derives from a lineage related to the ancestor of African elephants while its remaining ancestry consists of a large contribution from a lineage related to forest elephants and another related to mammoths. Columbian and woolly mammoths also showed evidence of interbreeding, likely following a latitudinal cline across North America. While hybridization events have shaped elephantid history in profound ways, isolation also appears to have played an important role. Our data reveal nearly complete isolation between the ancestors of the African forest and savanna elephants for ∼500,000 y, providing compelling justification for the conservation of forest and savanna elephants as separate species.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1073/pnas.1720554115DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856550PMC
March 2018

Ensembl 2018.

Nucleic Acids Res 2018 01;46(D1):D754-D761

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkx1098DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753206PMC
January 2018

Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation.

Database (Oxford) 2017 01;2017(1)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list ( http://www.ensembl.org/info/about/contact/index.html ).

Database Url: http://www.ensembl.org.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/database/bax020DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467575PMC
January 2017

Ensembl 2017.

Nucleic Acids Res 2017 01 28;45(D1):D635-D642. Epub 2016 Nov 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkw1104DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210575PMC
January 2017

The Ensembl gene annotation system.

Database (Oxford) 2016 23;2016. Epub 2016 Jun 23.

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail.Database URL: http://www.ensembl.org/index.html.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/database/baw093DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4919035PMC
November 2017

Ensembl 2016.

Nucleic Acids Res 2016 Jan 19;44(D1):D710-6. Epub 2015 Dec 19.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkv1157DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702834PMC
January 2016

Ensembl 2015.

Nucleic Acids Res 2015 Jan 28;43(Database issue):D662-9. Epub 2014 Oct 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gku1010DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383879PMC
January 2015

The Ensembl REST API: Ensembl Data for Any Language.

Bioinformatics 2015 Jan 17;31(1):143-5. Epub 2014 Sep 17.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language.

Availability And Implementation: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/btu613DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271150PMC
January 2015

Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication.

Science 2014 Aug;345(6200):1074-1079

Wellcome Trust Sanger Institute, Hinxton, UK.

The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1126/science.1253714DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5421586PMC
August 2014

Ensembl 2014.

Nucleic Acids Res 2014 Jan 6;42(Database issue):D749-55. Epub 2013 Dec 6.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkt1196DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964975PMC
January 2014

Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution.

Nat Genet 2013 Apr 24;45(4):415-21, 421e1-2. Epub 2013 Feb 24.

Department of Biology, University of Kentucky, Lexington, Kentucky, USA.

Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ∼500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/ng.2568DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3709584PMC
April 2013

Ensembl 2013.

Nucleic Acids Res 2013 Jan 30;41(Database issue):D48-55. Epub 2012 Nov 30.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidenced-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gks1236DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531136PMC
January 2013

Ensembl 2012.

Nucleic Acids Res 2012 Jan 15;40(Database issue):D84-90. Epub 2011 Nov 15.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkr991DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245178PMC
January 2012

Ensembl 2011.

Nucleic Acids Res 2011 Jan 2;39(Database issue):D800-6. Epub 2010 Nov 2.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkq1064DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013672PMC
January 2011

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

PLoS Biol 2010 Sep 7;8(9). Epub 2010 Sep 7.

Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America.

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pbio.1000475DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935454PMC
September 2010

Ensembl's 10th year.

Nucleic Acids Res 2010 Jan 11;38(Database issue):D557-62. Epub 2009 Nov 11.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/nar/gkp972DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808936PMC
January 2010
-->