Publications by authors named "Peter Woollard"

16 Publications

Ontology mapping for semantically enabled applications.

Drug Discov Today 2019 10 31;24(10):2068-2075. Epub 2019 May 31.

GlaxoSmithKline, Stevenage, UK.

In this review, we provide a summary of recent progress in ontology mapping (OM) at a crucial time when biomedical research is under a deluge of an increasing amount and variety of data. This is particularly important for realising the full potential of semantically enabled or enriched applications and for meaningful insights, such as drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.

Source
http://dx.doi.org/10.1016/j.drudis.2019.05.020 (DOI)
October 2019

Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform.

F1000Res 2018 17;7:75. Epub 2018 Jan 17.

Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands.

Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. Having high quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.

Source
http://dx.doi.org/10.12688/f1000research.13197.2 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206606 (PMC)
August 2019

Matching disease and phenotype ontologies in the ontology alignment evaluation initiative.

J Biomed Semantics 2017 Dec 2;8(1):55. Epub 2017 Dec 2.

Department of Informatics, University of Oslo, Oslo, Norway.

Background: The disease and phenotype track was designed to evaluate the relative performance of ontology matching systems that generate mappings between source ontologies. Disease and phenotype ontologies are important for applications such as data mining, data integration and knowledge management to support translational science in drug discovery and understanding the genetics of disease.

Results: Eleven systems (out of 21 OAEI participating systems) were able to cope with at least one of the tasks in the Disease and Phenotype track. AML, FCA-Map, LogMap(Bio) and PhenoMF systems produced the top results for ontology matching in comparison to consensus alignments. The results against manually curated mappings proved to be more difficult most likely because these mapping sets comprised mostly subsumption relationships rather than equivalence. Manual assessment of unique equivalence mappings showed that AML, LogMap(Bio) and PhenoMF systems have the highest precision results.

Conclusions: Four systems gave the highest performance for matching disease and phenotype ontologies. These systems coped well with the detection of equivalence matches, but struggled to detect semantic similarity. This deserves more attention in the future development of ontology matching systems. The findings of this evaluation show that such systems could help to automate equivalence matching in the workflow of curators, who maintain ontology mapping services in numerous domains such as disease and phenotype.
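
The comparison of system output against a reference alignment described above is commonly summarised as precision, recall and F-measure. The following is a minimal sketch of that calculation, not the evaluation code used in the track: the `Mapping` type, the `evaluate` function and all identifiers in the toy data are hypothetical, and only exact equivalence mappings are compared.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One mapping between two ontology terms (hypothetical structure)."""
    source: str
    target: str
    relation: str  # "=" marks an equivalence mapping


def evaluate(predicted: set, reference: set) -> dict:
    """Score a predicted mapping set against a reference alignment."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Toy data with made-up identifiers, for illustration only.
reference = {
    Mapping("src:0001", "tgt:0101", "="),
    Mapping("src:0002", "tgt:0102", "="),
    Mapping("src:0003", "tgt:0103", "="),
    Mapping("src:0004", "tgt:0104", "="),
}
predicted = {
    Mapping("src:0001", "tgt:0101", "="),
    Mapping("src:0002", "tgt:0102", "="),
    Mapping("src:0005", "tgt:0109", "="),  # not in the reference
}

scores = evaluate(predicted, reference)
print(scores)
```

Because the comparison is by exact set membership, a predicted equivalence never matches a reference subsumption between the same terms, which is one way the difficulty with manually curated subsumption mappings noted in the Results can show up in such scores.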

Source
http://dx.doi.org/10.1186/s13326-017-0162-9 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5712086 (PMC)
December 2017

An Integrated Data Driven Approach to Drug Repositioning Using Gene-Disease Associations.

PLoS One 2016 19;11(5):e0155811. Epub 2016 May 19.

Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom.

Drug development is increasing in cost whilst decreasing in productivity. There is a general acceptance that the current paradigm of R&D needs to change. One alternative approach is drug repositioning. With target-based approaches utilised heavily in the field of drug discovery, it becomes increasingly necessary to have a systematic method to rank gene-disease associations. Although methods already exist to collect, integrate and score these associations, they are often not a reliable reflection of expert knowledge. Furthermore, the amount of data available in all areas covered by bioinformatics is increasing dramatically year on year. It thus makes sense to move away from more generalised hypothesis driven approaches to research towards one that allows data to generate their own hypotheses. We introduce an integrated, data driven approach to drug repositioning. We first apply a Bayesian statistics approach to rank 309,885 gene-disease associations using existing knowledge. Ranked associations are then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, we show how our approach identifies diseases of the central nervous system (CNS) to be an area of interest. CNS disorders are identified due to the low numbers of such disorders that currently have marketed treatments, in comparison to other therapeutic areas. We then systematically mine our network for semantic subgraphs that allow us to infer drug-disease relations that are not captured in the network. We identify and rank 275,934 drug-disease has_indication associations after filtering those that are more likely to be side effects, whilst commenting on the top ranked associations in more detail. The dataset has been created in Neo4j and is available for download at https://bitbucket.org/ncl-intbio/genediseaserepositioning along with a Java implementation of the searching algorithm.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155811 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873016 (PMC)
July 2017

Mining integrated semantic networks for drug repositioning opportunities.

PeerJ 2016 19;4:e1558. Epub 2016 Jan 19.

Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, University of Newcastle-upon-Tyne, Newcastle upon Tyne, United Kingdom.

Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature. We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions than other state-of-the-art approaches for predicting such interactions.
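
To make the idea of searching a typed network for occurrences of a semantic subgraph concrete, here is a toy sketch. It is not DReSMin and makes no claim about how that algorithm works: it only matches a linear pattern of alternating node and edge types, and the node names, type labels, relation names and the `find_path_matches` function are all hypothetical.

```python
from collections import defaultdict

# Toy typed network; every name and label here is made up for illustration.
node_types = {
    "drugA": "Drug",
    "drugB": "Drug",
    "target1": "Target",
    "target2": "Target",
    "disease1": "Disease",
}
edges = [
    ("drugA", "binds_to", "target1"),
    ("drugB", "binds_to", "target2"),
    ("target1", "involved_in", "disease1"),
]


def find_path_matches(node_types, edges, pattern):
    """Return every path whose node and edge types follow `pattern`.

    `pattern` alternates node type and edge type, for example
    ["Drug", "binds_to", "Target", "involved_in", "Disease"].
    """
    adjacency = defaultdict(list)
    for source, relation, target in edges:
        adjacency[source].append((relation, target))

    wanted_node_types = pattern[0::2]
    wanted_edge_types = pattern[1::2]

    # Start from every node of the first type, then extend one step at a time.
    paths = [[name] for name, kind in node_types.items()
             if kind == wanted_node_types[0]]
    for edge_type, next_type in zip(wanted_edge_types, wanted_node_types[1:]):
        extended = []
        for path in paths:
            for relation, neighbour in adjacency[path[-1]]:
                if relation == edge_type and node_types.get(neighbour) == next_type:
                    extended.append(path + [neighbour])
        paths = extended
    return paths


matches = find_path_matches(
    node_types, edges,
    ["Drug", "binds_to", "Target", "involved_in", "Disease"],
)
print(matches)
```

Each match pairs a drug with a disease through an intermediate target, which is the kind of indirect relationship a curator would otherwise have to spot by hand in the graph.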

Source
http://dx.doi.org/10.7717/peerj.1558 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4736989 (PMC)
February 2016

A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

Drug Discov Today 2014 Jul 4;19(7):882-9. Epub 2013 Nov 4.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.

Source
http://dx.doi.org/10.1016/j.drudis.2013.10.024 (DOI)
July 2014

Minipig and beagle animal model genomes aid species selection in pharmaceutical discovery and development.

Toxicol Appl Pharmacol 2013 Jul 19;270(2):149-57. Epub 2013 Apr 19.

Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage, UK.

Improving drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and human. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research.

Source
http://dx.doi.org/10.1016/j.taap.2013.04.007 (DOI)
July 2013

Towards virtual knowledge broker services for semantic integration of life science literature and data sources.

Drug Discov Today 2013 May 12;18(9-10):428-34. Epub 2012 Dec 12.

Ian Harrow Consulting, UK.

Research in the life sciences requires ready access to primary data, derived information and relevant knowledge from a multitude of sources. Integration and interoperability of such resources are crucial for sharing content across research domains relevant to the life sciences. In this article we present a perspective review of data integration with emphasis on a semantics driven approach to data integration that pushes content into a shared infrastructure, reduces data redundancy and clarifies any inconsistencies. This enables much improved access to life science data from numerous primary sources. The Semantic Enrichment of the Scientific Literature (SESL) pilot project demonstrates feasibility for using already available open semantic web standards and technologies to integrate public and proprietary data resources, which span structured and unstructured content. This has been accomplished through a precompetitive consortium, which provides a cost effective approach for numerous stakeholders to work together to solve common problems.

Source
http://dx.doi.org/10.1016/j.drudis.2012.11.012 (DOI)
May 2013

An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people.

Science 2012 Jul 17;337(6090):100-4. Epub 2012 May 17.

Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.

Source
http://dx.doi.org/10.1126/science.1217876 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4319976 (PMC)
July 2012

Deep resequencing unveils genetic architecture of ADIPOQ and identifies a novel low-frequency variant strongly associated with adiponectin variation.

Diabetes 2012 May 8;61(5):1297-301. Epub 2012 Mar 8.

Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, North Carolina, USA.

Increased adiponectin levels have been shown to be associated with a lower risk of type 2 diabetes. To understand the relations between genetic variation at the adiponectin-encoding gene, ADIPOQ, and adiponectin levels, and subsequently its role in disease, we conducted a deep resequencing experiment of ADIPOQ in 14,002 subjects, including 12,514 Europeans, 594 African Americans, and 567 Indian Asians. We identified 296 single nucleotide polymorphisms (SNPs), including 30 amino acid changes, and carried out association analyses in a subset of 3,665 subjects from two independent studies. We confirmed multiple genome-wide association study findings and identified a novel association between a low-frequency SNP (rs17366653) and adiponectin levels (P = 2.2E-17). We show that seven SNPs exert independent effects on adiponectin levels. Together, they explained 6% of adiponectin variation in our samples. We subsequently assessed association between these SNPs and type 2 diabetes in the Genetics of Diabetes Audit and Research in Tayside Scotland (GO-DARTS) study, comprised of 5,145 case and 6,374 control subjects. No evidence of association with type 2 diabetes was found, but we were also unable to exclude the possibility of substantial effects (e.g., odds ratio 95% CI for rs7366653 [0.91-1.58]). Further investigation by large-scale and well-powered Mendelian randomization studies is warranted.

Source
http://dx.doi.org/10.2337/db11-0985 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3331741 (PMC)
May 2012

The application of next-generation sequencing technologies to drug discovery and development.

Drug Discov Today 2011 Jun 1;16(11-12):512-9. Epub 2011 Apr 1.

Computational Biology, Drug Discovery, GlaxoSmithKline Research and Development, Gunnels Wood Road, Stevenage, UK.

Next-generation sequencing (NGS) technologies represent a paradigm shift in sequencing capability. The technology has already been extensively applied to biological research, resulting in significant and remarkable insights into the molecular biology of cells. In this review, we focus on current and potential applications of the technology as applied to the drug discovery and development process. Early applications have focused on the oncology and infectious disease therapeutic areas, with emerging use in biopharmaceutical development and vaccine production in evidence. Although this technology has great potential, significant challenges remain, particularly around the storage, transfer and analysis of the substantial data sets generated.

Source
http://dx.doi.org/10.1016/j.drudis.2011.03.006 (DOI)
June 2011

Asking complex questions of the genome without programming.

Authors:
Peter M Woollard

Methods Mol Biol 2010 ;628:39-52

Computational Biology, Quantitative Sciences, GlaxoSmithKline Pharmaceuticals, Stevenage, Hertfordshire, UK.

Increasingly, vast amounts of genomics and genetic data are available. Although much of the data is largely accessible to relatively simple web queries, in some cases, more complex queries are required. This paper reviews the hierarchy of tools for querying genetic and genomic data. For querying multiple genes, variants or regions ENSEMBL BioMart and the UCSC Table Browser offer flexible interfaces. For more complex queries, GALAXY is a sophisticated tool for building workflows over existing internet resources. For the most challenging genome scale queries, programmatic access may be required through a defined application programming interface (API) - such as the one provided by Ensembl. All these tools allow one to rapidly ask many questions that were difficult to answer a few years ago, but choosing the appropriate tool for the job is critical.

Source
http://dx.doi.org/10.1007/978-1-60327-367-1_3 (DOI)
May 2010

A community standard format for the representation of protein affinity reagents.

Mol Cell Proteomics 2010 Jan 11;9(1):1-10. Epub 2009 Aug 11.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site.

Source
http://dx.doi.org/10.1074/mcp.M900185-MCP200 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808255 (PMC)
January 2010

Broadening the horizon--level 2.5 of the HUPO-PSI format for molecular interactions.

BMC Biol 2007 Oct 9;5:44. Epub 2007 Oct 9.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Background: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to these data very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions.

Results: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration.

Conclusion: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
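
As an illustration of why a tab-delimited format is easy to parse, the sketch below reads MITAB2.5-style text with Python's standard `csv` module. It assumes the 15-column layout of MITAB2.5, with `-` marking an empty field and `|` separating multiple values. The column labels are informal names chosen here, not official ones; the `parse_mitab25` function is hypothetical; and the accessions, author and publication identifier in the sample row are placeholders, not real records.

```python
import csv
import io

# Informal labels for the 15 MITAB2.5 columns (names chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_ids", "confidence",
]


def parse_mitab25(text):
    """Parse MITAB2.5-style text into a list of dicts of value lists."""
    # QUOTE_NONE keeps the double quotes used inside fields as literal text.
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if len(row) < len(MITAB25_COLUMNS):
            raise ValueError(f"expected 15 columns, got {len(row)}")
        record = {}
        for name, cell in zip(MITAB25_COLUMNS, row):
            # Naive split: assumes no "|" occurs inside a quoted description.
            record[name] = [] if cell == "-" else cell.split("|")
        records.append(record)
    return records


# One made-up row; identifiers are placeholders for illustration only.
sample = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2007)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000000", "-",
])

records = parse_mitab25(sample)
print(records[0]["id_a"], records[0]["interaction_type"])
```

The same file could equally be loaded into a spreadsheet, which is the trade-off the abstract describes: less detail than the XML format in exchange for trivial access.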

Source
http://dx.doi.org/10.1186/1741-7007-5-44 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715 (PMC)
October 2007

The minimum information required for reporting a molecular interaction experiment (MIMIx).

Nat Biotechnol 2007 Aug;25(8):894-8

European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.

Source
http://dx.doi.org/10.1038/nbt1324 (DOI)
August 2007

Characterization of a Mycobacterium tuberculosis H37Rv transposon library reveals insertions in 351 ORFs and mutants with altered virulence.

Microbiology (Reading) 2002 10;148(Pt 10):2975-2986

GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK.

A library of Mycobacterium tuberculosis insertional mutants was generated with the transposon Tn5370. The junction sequence between the transposon and the mycobacterial chromosome was determined, revealing the positions of 1329 unique insertions, 1189 of which were located in 351 different ORFs. Transposition was not completely random and examination of the most susceptible genome regions revealed a lower-than-average G+C content ranging from 54 to 62 mol%. Mutants were obtained in all of the recognized M. tuberculosis functional protein-coding gene classes. About 30% of the disrupted ORFs had matches elsewhere in the genome that suggested redundancy of function. The effect of gene disruption on the virulence of a selected set of defined mutants was investigated in a severe combined immune deficiency (SCID) mouse model. A range of phenotypes was observed in these mutants, the most notable being the severe attenuation in virulence of a strain disrupted in the Rv1290c gene, which encodes a protein of unknown function. The library described in this study provides a resource of defined mutant strains for use in functional analyses aimed at investigating the role of particular M. tuberculosis genes in virulence and defining their potential as targets for new anti-mycobacterial drugs or as candidates for deletion in a rationally attenuated live vaccine.

Source
http://dx.doi.org/10.1099/00221287-148-10-2975 (DOI)
October 2002