Publications by authors named "Denise Carvalho-Silva"

36 Publications

The ELIXIR Human Copy Number Variations Community: building bioinformatics infrastructure for research.

F1000Res 2020 13;9. Epub 2020 Oct 13.

Centre for Skin Sciences, University of Bradford, Bradford, UK.

Copy number variations (CNVs) are major causative contributors to both genetic diseases and human neoplasias. While high-throughput sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR's recently established Human Copy Number Variations Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks, to address a wide range of CNV-related challenges ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.

Source
DOI: http://dx.doi.org/10.12688/f1000research.24887.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8311797
August 2021

Open Targets Platform: supporting systematic drug-target identification and prioritisation.

Nucleic Acids Res 2021 01;49(D1):D1302-D1310

Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1027
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779013
January 2021

Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics.

Nucleic Acids Res 2021 01;49(D1):D1311-D1320

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa840
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778936
January 2021

Ten simple rules for making training materials FAIR.

PLoS Comput Biol 2020 05 21;16(5):e1007854. Epub 2020 May 21.

SIB Training group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1007854
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7241697
May 2020

Eleven quick tips to build a usable REST API for life sciences.

PLoS Comput Biol 2018 12 13;14(12):e1006542. Epub 2018 Dec 13.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1006542
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6292566
December 2018

Open Targets Platform: new developments and updates two years on.

Nucleic Acids Res 2019 01;47(D1):D1056-D1065

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score and rank target-disease associations for drug target identification. The associations are displayed in an intuitive user interface (https://www.targetvalidation.org), and are available through a REST-API (https://api.opentargets.io/v3/platform/docs/swagger-ui) and a bulk download (https://www.targetvalidation.org/downloads/data). In addition to target-disease associations, we also aggregate and display data at the target and disease levels to aid target prioritisation. Since our first publication two years ago, we have made eight releases, added new data sources for target-disease associations, started including causal genetic variants from non genome-wide targeted arrays, added new target and disease annotations, launched new visualisations and improved existing ones and released a new web tool for batch search of up to 200 targets. We have a new URL for the Open Targets Platform REST-API, new REST endpoints and also removed the need for authorisation for API fair use. Here, we present the latest developments of the Open Targets Platform, expanding the evidence and target-disease associations with new and improved data sources, refining data quality, enhancing website usability, and increasing our user base with our training workshops, user support, social media and bioinformatics forum engagement.

Source
DOI: http://dx.doi.org/10.1093/nar/gky1133
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324073
January 2019

Ten simple rules for delivering live distance training in bioinformatics across the globe using webinars.

PLoS Comput Biol 2018 11 15;14(11):e1006419. Epub 2018 Nov 15.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1006419
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237289
November 2018

Designing an intuitive web application for drug discovery scientists.

Drug Discov Today 2018 06 11;23(6):1169-1174. Epub 2018 Jan 11.

Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

We discuss how we designed the Open Targets Platform (www.targetvalidation.org), an intuitive application for bench scientists working in early drug discovery. To meet the needs of our users, we applied lean user experience (UX) design methods: we started engaging with users very early and carried out research, design and evaluation activities within an iterative development process. We also emphasize the collaborative nature of applying lean UX design, which we believe is a foundation for success in this and many other scientific projects.

Source
DOI: http://dx.doi.org/10.1016/j.drudis.2018.01.032
June 2018

Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.

Nucleic Acids Res 2018 01;46(D1):D802-D808

Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA.

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753204
January 2018

Open Targets: a platform for therapeutic target identification and validation.

Nucleic Acids Res 2017 01 29;45(D1):D985-D994. Epub 2016 Nov 29.

Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1055
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210543
January 2017

Ensembl 2017.

Nucleic Acids Res 2017 01 28;45(D1):D635-D642. Epub 2016 Nov 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1104
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210575
January 2017

Ensembl 2016.

Nucleic Acids Res 2016 Jan 19;44(D1):D710-6. Epub 2015 Dec 19.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1157
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702834
January 2016

Ensembl Genomes 2016: more genomes, more complexity.

Nucleic Acids Res 2016 Jan 17;44(D1):D574-80. Epub 2015 Nov 17.

Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1209
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702859
January 2016

The pig X and Y Chromosomes: structure, sequence, and evolution.

Genome Res 2016 Jan 11;26(1):130-9. Epub 2015 Nov 11.

Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes (both single copy and amplified) on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution.

Source
DOI: http://dx.doi.org/10.1101/gr.188839.114
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4691746
January 2016

Ensembl 2015.

Nucleic Acids Res 2015 Jan 28;43(Database issue):D662-9. Epub 2014 Oct 28.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.

Source
DOI: http://dx.doi.org/10.1093/nar/gku1010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383879
January 2015

A linguistically informed autosomal STR survey of human populations residing in the greater Himalayan region.

PLoS One 2014 10;9(3):e91534. Epub 2014 Mar 10.

MGC Department of Human and Clinical Genetics, Leiden University Medical Centre, Leiden, the Netherlands.

The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome polymorphisms and/or mitochondrial DNA polymorphisms, suggested a substantially reduced gene flow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study gene flow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe a remarkable genetic differentiation between the Tibeto-Burman speaking populations on the one hand and Indo-European speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0091534
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948894
March 2016

Gene conversion violates the stepwise mutation model for microsatellites in Y-chromosomal palindromic repeats.

Hum Mutat 2014 May;35(5):609-17

UMR5288 CNRS/UPS-AMIS-Université Paul Sabatier, Toulouse, France; Department of Genetics, University of Leicester, Leicester, UK.

The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high-sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. We show, by comparing observed data with simulated data under a SMM within haplogroups, that observed heteroallelic combinations in which the modal repeat number difference between copies was large, can give rise to homoallelic combinations with zero-repeats difference, equivalent to many single-step mutations. These are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes, but increasing overall haplotype diversity among chromosomes within related groups.

Source
DOI: http://dx.doi.org/10.1002/humu.22542
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4233959
May 2014

Ensembl 2014.

Nucleic Acids Res 2014 Jan 6;42(Database issue):D749-55. Epub 2013 Dec 6.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.

Source
DOI: http://dx.doi.org/10.1093/nar/gkt1196
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964975
January 2014

Structural and functional annotation of the porcine immunome.

BMC Genomics 2013 May 15;14:332. Epub 2013 May 15.

USDA-ARS, Beltsville Human Nutrition Research Center, Diet, Genomics, Immunology Laboratory, Beltsville, MD 20705, USA.

Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.

Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.

Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig's adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-14-332
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658956
May 2013

Ensembl 2013.

Nucleic Acids Res 2013 Jan 30;41(Database issue):D48-55. Epub 2012 Nov 30.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1236
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531136
January 2013

Analyses of pig genomes provide insight into porcine demography and evolution.

Nature 2012 Nov;491(7424):393-8

Animal Breeding and Genomics Centre, Wageningen University, De Elst 1, 6708 WD, Wageningen, The Netherlands.

For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.

Source
DOI: http://dx.doi.org/10.1038/nature11622
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3566564
November 2012

A systematic survey of loss-of-function variants in human protein-coding genes.

Science 2012 Feb;335(6070):823-8

Wellcome Trust Sanger Institute, Hinxton, UK.

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

Source
DOI: http://dx.doi.org/10.1126/science.1215040
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3299548
February 2012

Ensembl 2012.

Nucleic Acids Res 2012 Jan 15;40(Database issue):D84-90. Epub 2011 Nov 15.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.

Source
DOI: http://dx.doi.org/10.1093/nar/gkr991
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245178
January 2012

Genomic complexity of the Y-STR DYS19: inversions, deletions and founder lineages carrying duplications.

Int J Legal Med 2009 Jan 14;123(1):15-23. Epub 2008 Jun 14.

Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK.

The Y-STR DYS19 is firmly established in the repertoire of Y-chromosomal markers used in forensic analysis yet is poorly understood at the molecular level, lying in a complex genomic environment and exhibiting null alleles, as well as duplications and occasional triplications in population samples. Here, we analyse three null alleles and 51 duplications and show that DYS19 can also be involved in inversion events, so that even its location within the short arm of the Y chromosome is uncertain. Deletion mapping in the three chromosomes carrying null alleles shows that their deletions are less than approximately 300 kb in size. Haplotypic analysis with binary markers shows that they belong to three different haplogroups and so represent independent events. In contrast, a collection of 51 DYS19 duplication chromosomes belong to only four haplogroups: two are singletons and may represent somatic mutation in lymphoblastoid cell lines, but two, in haplogroups G and C3c, represent founder lineages that have spread widely in Central Europe/West Asia and East Asia, respectively. Consideration of candidate mechanisms underlying both deletions and duplications provides no evidence for the involvement of non-allelic homologous recombination, and they are likely to represent sporadic events with low mutation rates. Understanding the basis and population distribution of these DYS19 alleles will aid in the utilisation and interpretation of profiles that contain them.

Source
DOI: http://dx.doi.org/10.1007/s00414-008-0253-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2680205
January 2009

Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis.

Hum Mutat 2008 Oct;29(10):1171-80

Department of Genetics, University of Leicester, Leicester, United Kingdom.

The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of "ampliconic" repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an approximately 1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, in particular a deletion within haplogroup (hg) C(*)(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G.

Source
DOI: http://dx.doi.org/10.1002/humu.20757
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689608
October 2008

The Grandest Genetic Experiment Ever Performed on Man? - A Y-Chromosomal Perspective on Genetic Variation in India.

Int J Hum Genet 2008 May;8(1-2):21-29

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SA, UK.

We have analysed Y-chromosomal data from Indian caste, Indian tribal and East Asian populations in order to investigate the impact of the caste system on male genetic variation. We find that variation within populations is lower in India than in East Asia, while variation between populations is overall higher. This observation can be explained by greater subdivision within the Indian population, leading to more genetic drift. However, the effect is most marked in the tribal populations, and the level of variation between caste populations is similar to the level between Chinese populations. The caste system has therefore had a detectable impact on Y-chromosomal variation, but this has been less strong than the influence of the tribal system, perhaps because of larger population sizes in the castes, more gene flow or a shorter period of time.

Source
DOI: http://dx.doi.org/10.1080/09723757.2008.11886016
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987567
May 2008

Maternal footprints of Southeast Asians in North India.

Hum Hered 2008 28;66(1):1-9. Epub 2008 Jan 28.

Centre for Cellular and Molecular Biology, Hyderabad, India.

We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNA(Lys) region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.

Source
DOI: http://dx.doi.org/10.1159/000114160
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2588665
April 2008

Structural variation on the short arm of the human Y chromosome: recurrent multigene deletions encompassing Amelogenin Y.

Hum Mol Genet 2007 Feb 22;16(3):307-16. Epub 2006 Dec 22.

Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK.

Structural polymorphism is increasingly recognized as a major form of human genome variation, and is particularly prevalent on the Y chromosome. Assay of the Amelogenin Y gene (AMELY) on Yp is widely used in DNA-based sex testing, and sometimes reveals males who have interstitial deletions. In a collection of 45 deletion males from 12 populations, we used a combination of sequence-tagged site mapping, and binary-marker and Y-short tandem repeat haplotyping to understand the structural basis of this variation. Of the 45 deletion males, 41 carry indistinguishable deletions, 3.0-3.8 Mb in size. Breakpoint mapping strongly implicates a mechanism of non-allelic homologous recombination between the proximal major array of TSPY gene-containing repeats, and a single distal copy of TSPY; this is supported by the estimation of TSPY copy number in deleted and non-deleted males. The remaining four males carry three distinct non-recurrent deletions (2.5-4.0 Mb), which may be due to non-homologous mechanisms. Haplotyping shows that TSPY-mediated deletions have arisen seven times independently in the sample. One instance, represented by 30 chromosomes mostly of Indian origin within haplogroup J2e1*/M241, has a time-to-most-recent-common-ancestor of approximately 7700 ± 1300 years. In addition to AMELY, deletion males all lack the genes PRKY and TBL1Y, and the rarer deletion classes also lack PCDH11Y. The persistence and expansion of deletion lineages, together with direct phenotypic evidence, suggests that absence of these genes has no major deleterious effects.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddl465
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2590852
February 2007

Nepalese populations show no association between the distribution of malaria and protective alleles.

J Mol Genet Med 2006 Nov;2(1):101-106

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

Malaria is perhaps the most important parasitic infection and strongest known force for selection in the recent evolutionary history of the human genome. Genetically determined resistance to malaria has been well documented in some populations, mainly from Africa. The disease is also endemic in South Asia, the world's second most populous region, where resistance to malaria has also been observed, for example in Nepal. The biological basis of this resistance, however, remains unclear. We have therefore investigated whether known African resistance alleles also confer resistance in Asia. We typed seven single nucleotide polymorphisms (SNPs) from the genes HBB, FY, G6PD, TNFSF5, TNF, NOS2 and FCGR2A in 928 healthy individuals from Nepal. Five loci were found to be fixed for the non-resistant allele (HBB, FY, G6PD, TNFSF5 and NOS2). The remaining two (rs1800629 and rs1801274) showed the presence of the resistant allele at a frequency of 93% and 27% in TNF and FCGR2A, respectively. However, the frequencies of these alleles did not differ significantly between highland (susceptible) and lowland (resistant) populations. The observed differences in allele and genotype frequencies in Nepalese populations therefore seem to reflect demographic processes or other selective forces in the Himalayan region, rather than malaria selection pressure acting on these alleles.

Source
DOI: http://dx.doi.org/10.4172/1747-0862.1000020
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684443
November 2006

A shared Y-chromosomal heritage between Muslims and Hindus in India.

Hum Genet 2006 Nov 2;120(4):543-51. Epub 2006 Sep 2.

Department of Medicine, University of Texas Health Science Center, San Antonio, TX, USA.

Arab forces conquered the Indus Delta region in 711 AD and, although a Muslim state was established there, their influence was barely felt in the rest of South Asia at that time. By the end of the tenth century, Central Asian Muslims moved into India from the northwest and expanded throughout the subcontinent. Muslim communities are now the largest minority religion in India, comprising more than 138 million people in a predominantly Hindu population of over one billion. It is unclear whether the Muslim expansion in India was a purely cultural phenomenon or had a genetic impact on the local population. To address this question from a male perspective, we typed eight microsatellite loci and 16 binary markers from the Y chromosome in 246 Muslims from Andhra Pradesh, and compared them to published data on 4,204 males from East Asia, Central Asia, other parts of India, Sri Lanka, Pakistan, Iran, the Middle East, Turkey, Egypt and Morocco. We find that the Muslim populations in general are genetically closer to their non-Muslim geographical neighbors than to other Muslims in India, and that there is a highly significant correlation between genetics and geography (but not religion). Our findings indicate that, despite the documented practice of marriage between Muslim men and Hindu women, Islamization in India did not involve large-scale replacement of Hindu Y chromosomes. The Muslim expansion in India was predominantly a cultural change and was not accompanied by significant gene flow, as seen in other places, such as China and Central Asia.

Source
DOI: http://dx.doi.org/10.1007/s00439-006-0234-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2590854
November 2006
-->