Publications by authors named "Helen Parkinson"

116 Publications

The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation.

Nat Genet 2021 04;53(4):420-425

Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

Source
http://dx.doi.org/10.1038/s41588-021-00783-5
April 2021

Improving reporting standards for polygenic scores in risk prediction studies.

Nature 2021 Mar 10;591(7849):211-219. Epub 2021 Mar 10.

Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.

Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
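
As a minimal illustration of the kind of score these standards cover, the sketch below computes a polygenic score as a weighted sum of effect-allele dosages. This additive form is a common convention rather than something the abstract specifies, and the function name, variant identifiers and weights are hypothetical values chosen for illustration.

```python
def polygenic_score(weights, dosages):
    """Additive score: sum over variants of effect weight x effect-allele dosage.

    weights: dict mapping variant ID -> per-allele effect weight (e.g. from a GWAS)
    dosages: dict mapping variant ID -> effect-allele count for one individual (0, 1 or 2)
    """
    missing = set(weights) - set(dosages)
    if missing:
        # Refuse to score silently when genotypes are missing for scored variants.
        raise KeyError(f"no dosage for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical variants and weights, for illustration only.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 1.0}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(weights, dosages)
print(score)  # 0.75
```

Sharing the variant list, effect alleles and weights alongside a score is what allows a calculation like this to be repeated and compared by other groups.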

Source
http://dx.doi.org/10.1038/s41586-021-03243-6
March 2021

OpenStats: A robust and scalable software package for reproducible analysis of high-throughput phenotypic data.

PLoS One 2020 30;15(12):e0242933. Epub 2020 Dec 30.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

Reproducibility in the statistical analyses of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that allows modelling of different possible statistical scenarios. Regular challenges are scalability and extensibility of the analysis software. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline in the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of the results with the most similar implementation in the literature. OpenStats offers significant improvements in speed and scalability over existing software packages, including a 13-fold improvement in computational time over the current production analysis pipeline in the IMPC. Reduced complexity also promotes FAIR data analysis by providing transparency and helping other groups to reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242933
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773254
January 2021

Mouse mutant phenotyping at scale reveals novel genes controlling bone mineral density.

PLoS Genet 2020 12 28;16(12):e1009190. Epub 2020 Dec 28.

German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany.

The genetic landscape of diseases associated with changes in bone mineral density (BMD), such as osteoporosis, is only partially understood. Here, we explored data from 3,823 mutant mouse strains for BMD, a measure that is frequently altered in a range of bone pathologies, including osteoporosis. A total of 200 genes were found to significantly affect BMD. This pool of BMD genes comprised 141 genes with previously unknown functions in bone biology and was complementary to pools derived from recent human studies. Nineteen of the 141 genes also caused skeletal abnormalities. Examination of the BMD genes in osteoclasts and osteoblasts underscored BMD pathways, including vesicle transport, in these cells and together with in silico bone turnover studies resulted in the prioritization of candidate genes for further investigation. Overall, the results add novel pathophysiological and molecular insight into bone health and disease.

Source
http://dx.doi.org/10.1371/journal.pgen.1009190
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7822523
December 2020

Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics.

Nucleic Acids Res 2021 01;49(D1):D1311-D1320

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.

Source
http://dx.doi.org/10.1093/nar/gkaa840
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778936
January 2021

LifeTime and improving European healthcare through cell-based interceptive medicine.

Nature 2020 11 7;587(7834):377-386. Epub 2020 Sep 7.

Department of Human Genetics, KU Leuven, Leuven, Belgium.

Here we describe the LifeTime Initiative, which aims to track, understand and target human cells during the onset and progression of complex diseases, and to analyse their response to therapy at single-cell resolution. This mission will be implemented through the development, integration and application of single-cell multi-omics and imaging, artificial intelligence and patient-derived experimental disease models during the progression from health to disease. The analysis of large molecular and clinical datasets will identify molecular mechanisms, create predictive computational models of disease progression, and reveal new drug targets and therapies. The timely detection and interception of disease embedded in an ethical and patient-centred vision will be achieved through interactions across academia, hospitals, patient associations, health data management systems and industry. The application of this strategy to key medical challenges in cancer, neurological and neuropsychiatric disorders, and infectious, chronic inflammatory and cardiovascular diseases at the single-cell level will usher in cell-based interceptive medicine in Europe over the next decade.

Source
http://dx.doi.org/10.1038/s41586-020-2715-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7656507
November 2020

Gene Ontology Curation of Neuroinflammation Biology Improves the Interpretation of Alzheimer's Disease Gene Expression Data.

J Alzheimers Dis 2020 ;75(4):1417-1435

Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK.

Background: Gene Ontology (GO) is a major bioinformatic resource used for analysis of large biomedical datasets, for example from genome-wide association studies, applied universally across biological fields, including Alzheimer's disease (AD) research.

Objective: We aim to demonstrate the applicability of GO for interpretation of AD datasets to improve the understanding of the underlying molecular disease mechanisms, including the involvement of inflammatory pathways and dysregulated microRNAs (miRs).

Methods: We have undertaken a systematic full article GO annotation approach focused on microglial proteins implicated in AD and the miRs regulating their expression. PANTHER was used for enrichment analysis of previously published AD data. Cytoscape was used for visualizing and analyzing miR-target interactions captured from published experimental evidence.

Results: We contributed 3,084 new annotations for 494 entities, i.e., on average six new annotations per entity. This included a total of 1,352 annotations for 40 prioritized microglial proteins implicated in AD and 66 miRs regulating their expression, yielding an average of twelve annotations per prioritized entity. The updated GO resource was then used to re-analyze previously published data. The re-analysis showed novel processes associated with AD-related genes, not identified in the original study, such as 'gliogenesis', 'regulation of neuron projection development', or 'response to cytokine', demonstrating enhanced applicability of GO for neuroscience research.

Conclusions: This study highlights ongoing development of the neurobiological aspects of GO and demonstrates the value of biocuration activities in the area, thus helping to delineate the molecular bases of AD to aid the development of diagnostic tools and treatments.

Source
http://dx.doi.org/10.3233/JAD-200207
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7369085
January 2020

Human and mouse essentiality screens as a resource for disease gene discovery.

Nat Commun 2020 01 31;11(1):655. Epub 2020 Jan 31.

Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK.

The identification of causal variants in sequencing studies remains a considerable challenge that can be partially addressed by new gene-specific knowledge. Here, we integrate measures of how essential a gene is to supporting life, as inferred from viability and phenotyping screens performed on knockout mice by the International Mouse Phenotyping Consortium and essentiality screens carried out on human cell lines. We propose a cross-species gene classification across the Full Spectrum of Intolerance to Loss-of-function (FUSIL) and demonstrate that genes in five mutually exclusive FUSIL categories have differing biological properties. Most notably, Mendelian disease genes, particularly those associated with developmental disorders, are highly overrepresented among genes non-essential for cell survival but required for organism development. After screening developmental disorder cases from three independent disease sequencing consortia, we identify potentially pathogenic variants in genes not previously associated with rare diseases. We therefore propose FUSIL as an efficient approach for disease gene discovery.

Source
http://dx.doi.org/10.1038/s41467-020-14284-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994715
January 2020

High-throughput discovery of genetic determinants of circadian misalignment.

PLoS Genet 2020 01 13;16(1):e1008577. Epub 2020 Jan 13.

Cambridge-Suda Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical College of Soochow University, Suzhou, Jiangsu, China.

Circadian systems provide a fitness advantage to organisms by allowing them to adapt to daily changes of environmental cues, such as light/dark cycles. The molecular mechanism underlying the circadian clock has been well characterized. However, how internal circadian clocks are entrained with regular daily light/dark cycles remains unclear. By collecting and analyzing indirect calorimetry (IC) data from more than 2000 wild-type mice available from the International Mouse Phenotyping Consortium (IMPC), we show that the onset time and peak phase of activity and food intake rhythms are reliable parameters for screening defects of circadian misalignment. We developed a machine learning algorithm to quantify these two parameters in our misalignment screen (SyncScreener) with existing datasets and used it to screen 750 mutant mouse lines from five IMPC phenotyping centres. Mutants of five genes (Slc7a11, Rhbdl1, Spop, Ctc1 and Oxtr) were found to be associated with altered patterns of activity or food intake. By further studying the Slc7a11tm1a/tm1a mice, we confirmed its advanced activity phase phenotype in response to a simulated jetlag and skeleton photoperiod stimuli. Disruption of Slc7a11 affected the intercellular communication in the suprachiasmatic nucleus, suggesting a defect in synchronization of clock neurons. Our study has established a systematic phenotype analysis approach that can be used to uncover the mechanism of circadian entrainment in mice.

Source
http://dx.doi.org/10.1371/journal.pgen.1008577
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980734
January 2020

Soft windowing application to improve analysis of high-throughput phenotyping data.

Bioinformatics 2020 03;36(5):1492-1500

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Motivation: High-throughput phenomic projects generate complex data from small treatment and large control groups that increase the power of the analyses but introduce variation over time. A method is needed to utilize a set of temporally local controls that maximizes analytic power while minimizing noise from unspecified environmental factors.

Results: Here we introduce 'soft windowing', a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected proximally to mutants were assigned the maximal weight, while data collected earlier or later had less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation was performed using a resampling approach in which we demonstrate a 10% reduction of false positives from 2.5 million analyses. We applied the method to our production analysis pipeline that establishes genotype-phenotype associations by comparing mutant versus control data. We report an increase of 30% in significant P-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft-windowed and non-windowed approaches, respectively, from a set of 2082 mutant mouse lines. Our method is generalizable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources.

Availability And Implementation: The method is freely available in the R package SmoothWin, available on CRAN http://CRAN.R-project.org/package=SmoothWin.

Supplementary Information: Supplementary data are available at Bioinformatics online.
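
The weighting idea described in the Results can be sketched as follows. This is an illustrative sketch only: the logistic weight function, its parameters (`half_width`, `sharpness`) and the toy data are assumptions made here and are not taken from the SmoothWin package.

```python
import math


def soft_window_weight(day, mutant_day, half_width, sharpness):
    """Weight close to 1 for controls measured near the mutants, decaying
    smoothly towards 0 as the time gap grows beyond half_width days."""
    gap = abs(day - mutant_day)
    return 1.0 / (1.0 + math.exp(sharpness * (gap - half_width)))


def weighted_control_mean(controls, mutant_day, half_width=30.0, sharpness=0.2):
    """Weighted mean of control measurements given as (day, value) pairs."""
    weights = [
        soft_window_weight(day, mutant_day, half_width, sharpness)
        for day, _ in controls
    ]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, controls)) / total


# Toy control data: (day of measurement, phenotype value), drifting over time.
controls = [(10, 20.0), (60, 21.0), (95, 22.0), (105, 22.4), (180, 25.0)]

local_mean = weighted_control_mean(controls, mutant_day=100)
plain_mean = sum(value for _, value in controls) / len(controls)
print(local_mean, plain_mean)
```

With mutants measured on day 100, the controls from days 95 and 105 dominate the weighted mean, while the far-off measurements from days 10 and 180 contribute almost nothing.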

Source
http://dx.doi.org/10.1093/bioinformatics/btz744
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115897
March 2020

Leveraging European infrastructures to access 1 million human genomes by 2022.

Nat Rev Genet 2019 11 27;20(11):693-701. Epub 2019 Aug 27.

ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK.

Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.

Source
http://dx.doi.org/10.1038/s41576-019-0156-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115898
November 2019

Erratum: Author Correction: Identification of genes required for eye development by high-throughput screening of mouse knockouts.

Commun Biol 2019 7;2:97. Epub 2019 Mar 7.

Department of Ophthalmology & Vision Science, School of Medicine, U.C. Davis, Sacramento, CA, 95817, USA.

[This corrects the article DOI: 10.1038/s42003-018-0226-0.].

Source
http://dx.doi.org/10.1038/s42003-019-0349-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6405960
March 2019

Identification of genes required for eye development by high-throughput screening of mouse knockouts.

Commun Biol 2018 21;1:236. Epub 2018 Dec 21.

Department of Ophthalmology & Vision Science, School of Medicine, U.C. Davis, Sacramento, CA, 95817, USA.

Despite advances in next generation sequencing technologies, determining the genetic basis of ocular disease remains a major challenge due to the limited access and prohibitive cost of human forward genetics. Thus, fewer than 4,000 genes currently have available phenotype information for any organ system. Here we report the ophthalmic findings from the International Mouse Phenotyping Consortium, a large-scale functional genetic screen with the goal of generating and phenotyping a null mutant for every mouse gene. Of 4364 genes evaluated, 347 were identified to influence ocular phenotypes, 75% of which are entirely novel in ocular pathology. This discovery greatly increases the current number of genes known to contribute to ophthalmic disease, and it is likely that many of the genes will subsequently prove to be important in human ocular development and disease.

Source
http://dx.doi.org/10.1038/s42003-018-0226-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303268
December 2018

PDX Finder: A portal for patient-derived tumor xenograft model discovery.

Nucleic Acids Res 2019 01;47(D1):D1073-D1079

The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.

Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to genomic characteristics of individual patients' tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards for describing model characteristics presents a significant challenge to identifying PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration. The PDX Finder resource currently contains information for 1985 PDX models of diverse cancers including those from large resources such as the Patient-Derived Models Repository, PDXNet and EurOPDX. Individuals or organizations that generate and distribute PDXs are invited to increase the 'findability' of their models by participating in the PDX Finder initiative at www.pdxfinder.org.

Source
http://dx.doi.org/10.1093/nar/gky984
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323912
January 2019

Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer's Disease Data.

Genes (Basel) 2018 Nov 29;9(12). Epub 2018 Nov 29.

UCL Institute of Cardiovascular Science, University College London, Rayne Building, 5 University Street, London WC1E 6JF, UK.

The analysis and interpretation of high-throughput datasets relies on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer's Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms, and use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated and the resulting annotations are denoted in GO databases as 'ARUK-UCL'. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer's-relevant gene products that were being associated with neurological GO terms, such as 'amyloid-beta clearance' or 'learning or memory', as well as neuronal structures and their compartments. Of the total 2055 annotations that we contributed for the prioritised gene products, 526 have associated proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer's-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data.

Source
http://dx.doi.org/10.3390/genes9120593
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6315915
November 2018

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Nucleic Acids Res 2019 01;47(D1):D1005-D1012

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10^9 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

Source
http://dx.doi.org/10.1093/nar/gky1120
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323933
January 2019

BioSamples database: an updated sample metadata hub.

Nucleic Acids Res 2019 01;47(D1):D1172-D1178

EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK.

The BioSamples database at EMBL-EBI provides a central hub for sample metadata storage and linkage to other EMBL-EBI resources. BioSamples has recently undergone major changes, both in terms of data content and supporting infrastructure. The data content has more than doubled from around 2 million samples in 2014 to just over 5 million samples in 2018. Fast, reciprocal data exchange was fully established between sister Biosample databases and other INSDC partners, enabling a worldwide common representation and centralization of sample metadata. The BioSamples platform has been upgraded to accommodate anticipated increases in the number of submissions via GA4GH driver projects such as the Human Cell Atlas and the EGA, as well as from mirroring of NCBI dbGaP data. The BioSamples database is now the authoritative repository for all INSDC sample metadata, an ELIXIR Deposition Database for Biomolecular Data and the EMBL-EBI sample metadata hub. To support faster turnaround for sample submission, and to increase scalability and resilience, we have upgraded the BioSamples database backend storage, APIs and user interface. Finally, the website has been redesigned to allow search and retrieval of records based on specific filters, such as 'disease' or 'organism'. These changes are targeted at answering current use cases as well as providing functionalities for future emerging and anticipated developments. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use.

Source
http://dx.doi.org/10.1093/nar/gky1061
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323949
January 2019

The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation.

Conserv Genet 2018 19;19(4):995-1005. Epub 2018 May 19.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK.

The International Mouse Phenotyping Consortium (IMPC) is building a catalogue of mammalian gene function by producing and phenotyping a knockout mouse line for every protein-coding gene. To date, the IMPC has generated and characterised 5186 mutant lines. One-third of the lines have been found to be non-viable and over 300 new mouse models of human disease have been identified thus far. While current bioinformatics efforts are focused on translating results to better understand human disease processes, IMPC data also aid in understanding genetic function and processes in other species. Here we show, using gorilla genomic data, how genes essential to development in mice can be used to help assess the potentially deleterious impact of gene variants in other species. This type of analysis could be used to select optimal breeders in endangered species to maintain or increase fitness and avoid variants associated with impaired-health phenotypes or loss-of-function mutations in genes of critical importance. We also show, using selected examples from various mammal species, how IMPC data can aid in the identification of candidate genes for studying a condition of interest, deliver information about the mechanisms involved, or support predictions for the function of genes that may play a role in adaptation. With genotyping costs decreasing and the continued improvements of bioinformatics tools, the analyses we demonstrate can be routinely applied.

Source
http://dx.doi.org/10.1007/s10592-018-1072-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061128
May 2018

A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog.

Genome Biol 2018 02 15;19(1):21. Epub 2018 Feb 15.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.

Source
http://dx.doi.org/10.1186/s13059-018-1396-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815218
February 2018

Using OWL reasoning to support the generation of novel gene sets for enrichment analysis.

J Biomed Semantics 2018 02 14;9(1):10. Epub 2018 Feb 14.

Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, CH-4070 Basel, Switzerland.

Background: The Gene Ontology (GO) consists of over 40,000 terms for biological processes, cell components and gene product activities linked into a graph structure by over 90,000 relationships. It has been used to annotate the functions and cellular locations of several million gene products. The graph structure is used by a variety of tools to group annotated genes into sets whose products share function or location. These gene sets are widely used to interpret the results of genomics experiments by assessing which sets are significantly over- or under-represented in results lists. F Hoffmann-La Roche Ltd. has developed a bespoke, manually maintained controlled vocabulary (RCV) for use in over-representation analysis. Many terms in this vocabulary group GO terms in novel ways that cannot easily be derived using the graph structure of the GO. For example, some RCV terms group GO terms by the cell, chemical or tissue type they refer to. Recent improvements in the content and formal structure of the GO make it possible to use logical queries in Web Ontology Language (OWL) to automatically map these cross-cutting classifications to sets of GO terms. We used this approach to automate mapping between RCV and GO, largely replacing the increasingly unsustainable manual mapping process. We then tested the utility of the resulting groupings for over-representation analysis.

Results: We successfully mapped 85% of RCV terms to logical OWL definitions and showed that these could be used to recapitulate and extend manual mappings between RCV terms and the sets of GO terms subsumed by them. We also showed that gene sets derived from the resulting GO term sets can be used to detect the signatures of cell and tissue types in whole genome expression data.

Conclusions: The rich formal structure of the GO makes it possible to use reasoning to dynamically generate novel, biologically relevant groupings of GO terms. GO term groupings generated with this approach can be used in over-representation analysis to detect cell and tissue type signatures in whole genome expression data.
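
For readers unfamiliar with over-representation analysis, the following is a minimal sketch of the common one-sided hypergeometric test for a single gene set. It is an illustration only, not the method or code used in the paper; the function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: number of genes in the background
    K: number of background genes that belong to the gene set
    n: number of genes in the results list
    k: observed overlap between results list and gene set
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def over_representation(results, gene_set, background):
    """Return (overlap size, p-value) for one gene set (hypothetical helper)."""
    background = set(background)
    # Restrict both lists to the background so the counts are comparable.
    results = set(results) & background
    gene_set = set(gene_set) & background
    k = len(results & gene_set)
    p = hypergeom_upper_tail(k, len(gene_set), len(results), len(background))
    return k, p


# Toy example with made-up gene identifiers.
background = [f"g{i}" for i in range(20)]
gene_set = background[:5]               # g0..g4
results = ["g0", "g1", "g2", "g10"]     # 3 of 4 hits fall in the gene set
overlap, p_value = over_representation(results, gene_set, background)
print(overlap, round(p_value, 4))
```

With this toy input the overlap is 3 and the p-value is 155/4845, about 0.032. In practice the test is repeated for every gene set and the p-values are corrected for multiple testing.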

Source
DOI: http://dx.doi.org/10.1186/s13326-018-0175-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813370
February 2018

Harmonising phenomics information for a better interoperability in the rare disease field.

Eur J Med Genet 2018 Nov 7;61(11):706-714. Epub 2018 Feb 7.

INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France.

HIPBI-RD (Harmonising phenomics information for a better interoperability in the rare disease field) is a three-year project that started in 2016, funded via the E-Rare 3 ERA-NET program. This project builds on three resources widely adopted by the rare disease (RD) community: Orphanet and its ontology ORDO (the Orphanet Rare Disease Ontology), HPO (the Human Phenotype Ontology), and PhenoTips, software for the capture and sharing of structured phenotypic data for RD patients. Our project is further supported by resources developed by the European Bioinformatics Institute and the Garvan Institute. HIPBI-RD aims to provide the community with an integrated, RD-specific bioinformatics ecosystem that will harmonise the way phenomics information is stored in databases and patient files worldwide, and thereby contribute to interoperability. This ecosystem will consist of a suite of tools and ontologies, optimized to work together, and made available through commonly used software repositories. The project workplan follows three main objectives: The HIPBI-RD ecosystem will contribute to the interpretation of variants identified through exome and full genome sequencing by harmonising the way phenotypic information is collected, thus improving diagnostics and delineation of RD. The ultimate goal of HIPBI-RD is to provide a resource that will contribute to bridging genome-scale biology and a disease-centered view on human pathobiology. Achievements in Year 1.

Source
DOI: http://dx.doi.org/10.1016/j.ejmg.2018.01.013
November 2018

Identification of genetic elements in metabolism by high-throughput mouse phenotyping.

Nat Commun 2018 01 18;9(1):288. Epub 2018 Jan 18.

German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany.

Metabolic diseases are a worldwide problem but the underlying genetic factors and their relevance to metabolic disease remain incompletely understood. Genome-wide research is needed to characterize so-far unannotated mammalian metabolic genes. Here, we generate and analyze metabolic phenotypic data of 2016 knockout mouse strains under the aegis of the International Mouse Phenotyping Consortium (IMPC) and find 974 gene knockouts with strong metabolic phenotypes. 429 of those had no previous link to metabolism and 51 genes remain functionally completely unannotated. We compared human orthologues of these uncharacterized genes in five GWAS consortia and indeed 23 candidate genes are associated with metabolic disease. We further identify common regulatory elements in promoters of candidate genes. As each regulatory element is composed of several transcription factor binding sites, our data reveal an extensive metabolic phenotype-associated network of co-regulated genes. Our systematic mouse phenotype analysis thus paves the way for full functional annotation of the genome.

Source
DOI: http://dx.doi.org/10.1038/s41467-017-01995-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773596
January 2018

Comparison, alignment, and synchronization of cell line information between CLO and EFO.

BMC Bioinformatics 2017 12 21;18(Suppl 17):557. Epub 2017 Dec 21.

Center of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.

Background: The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables, including cell lines, that organizes and describes the diverse experimental variables and data residing in EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications. It is desirable that the community shares design patterns and therefore that EFO reuses the cell line representation from CLO. There are, however, challenges to be addressed when developing a common ontology design pattern for representing cell lines in both EFO and CLO.

Results: In this study, we developed a strategy to compare and map cell line terms between EFO and CLO. We examined Cellosaurus resources for EFO-CLO cross-references. Text labels of cell lines from both ontologies were verified by biological information axiomatized in each source. The study resulted in the identification of 873 EFO-CLO aligned and 344 EFO unique immortalized permanent cell lines. All of these cell lines were updated to CLO and the cell line related information was merged. A design pattern that integrates EFO and CLO was also developed.

Conclusion: Our study compared, aligned, and synchronized the cell line information between CLO and EFO. The final updated CLO will be examined as the candidate ontology to import and replace eligible EFO cell line classes, thereby supporting interoperability in the bio-ontology domain. Our mapping pipeline illustrates the use of ontology in aiding biological data standardization and integration through the biological and semantic content of cell lines.

Source
DOI: http://dx.doi.org/10.1186/s12859-017-1979-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5763470
December 2017

A Standard Nomenclature for Referencing and Authentication of Pluripotent Stem Cells.

Stem Cell Reports 2018 01;10(1):1-6

Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau, China.

Unambiguous cell line authentication is essential to avoid loss of association between data and cells. The risk of losing references increases with the rapidity with which new human pluripotent stem cell (hPSC) lines are generated, exchanged, and implemented. Ideally, a single name should be used as a generally applied reference for each cell line to access and unify cell-related information across publications, cell banks, cell registries, and databases and to ensure scientific reproducibility. We discuss the needs and requirements for such a unique identifier and implement a standard nomenclature for hPSCs, which can be automatically generated and registered by the human pluripotent stem cell registry (hPSCreg). To avoid ambiguities in PSC-line referencing, we strongly urge publishers to demand registration and use of the standard name when publishing research based on hPSC lines.

Source
DOI: http://dx.doi.org/10.1016/j.stemcr.2017.12.002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5768986
January 2018

Corrigendum: High-throughput discovery of novel developmental phenotypes.

Nature 2017 11 8;551(7680):398. Epub 2017 Nov 8.

This corrects the article DOI: 10.1038/nature19356.

Source
DOI: http://dx.doi.org/10.1038/nature24643
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5849394
November 2017

PDX-MI: Minimal Information for Patient-Derived Tumor Xenograft Models.

Cancer Res 2017 11;77(21):e62-e66

The Jackson Laboratory, Bar Harbor, Maine.

Patient-derived tumor xenograft (PDX) mouse models have emerged as an important oncology research platform for studying tumor evolution and mechanisms of drug response and resistance, and for tailoring chemotherapeutic approaches to individual patients. The lack of robust standards for reporting on PDX models has hampered the ability of researchers to find relevant PDX models and associated data. Here we present the PDX models minimal information standard (PDX-MI) for reporting on the generation, quality assurance, and use of PDX models. PDX-MI defines the minimal information for describing the clinical attributes of a patient's tumor, the processes of implantation and passaging of tumors in a host mouse strain, quality assurance methods, and the use of PDX models in cancer research. Adherence to PDX-MI standards will facilitate accurate search results for oncology models and their associated data across distributed repository databases and promote reproducibility in research studies using these models.

Source
DOI: http://dx.doi.org/10.1158/0008-5472.CAN-17-0582
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5738926
November 2017

A large scale hearing loss screen reveals an extensive unexplored genetic landscape for auditory dysfunction.

Nat Commun 2017 10 12;8(1):886. Epub 2017 Oct 12.

Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire, OX11 0RD, UK.

The developmental and physiological complexity of the auditory system is likely reflected in the underlying set of genes involved in auditory function. In humans, over 150 non-syndromic loci have been identified, and there are more than 400 human genetic syndromes with a hearing loss component. Over 100 non-syndromic hearing loss genes have been identified in mouse and human, but we remain ignorant of the full extent of the genetic landscape involved in auditory dysfunction. As part of the International Mouse Phenotyping Consortium, we undertook a hearing loss screen in a cohort of 3006 mouse knockout strains. In total, we identify 67 candidate hearing loss genes. We detect known hearing loss genes, but the vast majority, 52, of the candidate genes were novel. Our analysis reveals a large and unexplored genetic landscape involved with auditory function. The full extent of the genetic basis for hearing impairment is unknown. Here, as part of the International Mouse Phenotyping Consortium, the authors perform a hearing loss screen in 3006 mouse knockout strains and identify 52 new candidate genes for genetic hearing loss.

Source
DOI: http://dx.doi.org/10.1038/s41467-017-00595-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5638796
October 2017