Publications by authors named "Midori A Harris"

28 Publications

Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns.

Open Biol 2020 Sep 2;10(9):200149. Epub 2020 Sep 2.

Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.

Source
DOI: http://dx.doi.org/10.1098/rsob.200149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536087
September 2020

Community curation in PomBase: enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications.

Database (Oxford) 2020 Jan;2020

Cell Cycle Laboratory, The Francis Crick Institute, Midland Rd, London NW1 1AT, UK.

Maximizing the impact and value of scientific research requires efficient knowledge distribution, which increasingly depends on the integration of standardized published data into online databases. To make data integration more comprehensive and efficient for fission yeast research, PomBase has pioneered a community curation effort that engages publication authors directly in FAIR-sharing of data representing detailed biological knowledge from hypothesis-driven experiments. Canto, an intuitive online curation tool that enables biologists to describe their detailed functional data using shared ontologies, forms the core of PomBase's system. With 8 years' experience, and as the author response rate reaches 50%, we review community curation progress and the insights we have gained from the project. We highlight incentives and nudges we deploy to maximize participation, and summarize project outcomes, which include increased knowledge integration and dissemination as well as the unanticipated added value arising from co-curation by publication authors and professional curators.

Source
DOI: http://dx.doi.org/10.1093/database/baaa028
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192550
January 2020

Hidden in plain sight: what remains to be discovered in the eukaryotic proteome?

Open Biol 2019 Feb;9(2):180241

Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK.

The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes (BP). To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology BP terms to define characterized and uncharacterized proteins for human, budding yeast and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalogue of proteins' biological roles.

Source
DOI: http://dx.doi.org/10.1098/rsob.180241
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6395881
February 2019

Annotation of gene product function from high-throughput studies using the Gene Ontology.

Database (Oxford) 2019 Jan 1;2019. Epub 2019 Jan 1.

Zebrafish Information Network, University of Oregon, Eugene, OR, USA.

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.

Source
DOI: http://dx.doi.org/10.1093/database/baz007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6355445
January 2019

PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information.

Nucleic Acids Res 2019 Jan;47(D1):D821-D827

Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.

PomBase (www.pombase.org), the model organism database for the fission yeast Schizosaccharomyces pombe, has undergone a complete redevelopment, resulting in a more fully integrated, better-performing service. The new infrastructure supports daily data updates as well as fast, efficient querying and smoother navigation within and between pages. New pages for publications and genotypes provide routes to all data curated from a single source and to all phenotypes associated with a specific genotype, respectively. For ontology-based annotations, improved displays balance comprehensive data coverage with ease of use. The default view now uses ontology structure to provide a concise, non-redundant summary that can be expanded to reveal underlying details and metadata. The phenotype annotation display also offers filtering options to allow users to focus on specific areas of interest. An instance of the JBrowse genome browser has been integrated, facilitating loading of and intuitive access to, genome-scale datasets. Taken together, the new data and pages, along with improvements in annotation display and querying, allow users to probe connections among different types of data to form a comprehensive view of fission yeast biology. The new PomBase implementation also provides a rich set of modular, reusable tools that can be deployed to create new, or enhance existing, organism-specific databases.

Source
DOI: http://dx.doi.org/10.1093/nar/gky961
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324063
January 2019

PomBase: The Scientific Resource for Fission Yeast.

Methods Mol Biol 2018;1757:49-68

Department of Biochemistry, Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK.

The fission yeast Schizosaccharomyces pombe has become well established as a model species for studying conserved cell-level biological processes, especially the mechanics and regulation of cell division. PomBase integrates the S. pombe genome sequence with traditional genetic, molecular, and cell biological experimental data as well as the growing body of large datasets generated by emerging high-throughput methods. This chapter provides insight into the curation philosophy and data organization at PomBase, and provides a guide to using PomBase for infrequent visitors and anyone considering exploring S. pombe in their research.

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-7737-6_4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6440643
January 2019

Model organism databases: essential resources that need the support of both funders and users.

BMC Biol 2016 06 22;14:49. Epub 2016 Jun 22.

Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge, CB2 1GA, UK.

Modern biomedical research depends critically on access to databases that house and disseminate genetic, genomic, molecular, and cell biological knowledge. Even as the explosion of available genome sequences and associated genome-scale data continues apace, the sustainability of professionally maintained biological databases is under threat due to policy changes by major funding agencies. Here, we focus on model organism databases to demonstrate the myriad ways in which biological databases not only act as repositories but actively facilitate advances in research. We present data that show that reducing financial support to model organism databases could prove to be not just scientifically, but also economically, unsound.

Source
DOI: http://dx.doi.org/10.1186/s12915-016-0276-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4918006
June 2016

PomBase 2015: updates to the fission yeast database.

Nucleic Acids Res 2015 Jan 31;43(Database issue):D656-61. Epub 2014 Oct 31.

Cambridge Systems Biology and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge, Cambridgeshire CB2 1GA, UK.

PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).

Source
DOI: http://dx.doi.org/10.1093/nar/gku1040
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383888
January 2015

Representing kidney development using the gene ontology.

PLoS One 2014 Jun 18;9(6):e99864. Epub 2014 Jun 18.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be utilized in the interpretation of data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development and thereby help towards alleviating renal disease.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099864
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062467
July 2015

A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

BMC Bioinformatics 2014 May 21;15:155. Epub 2014 May 21.

Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, CA 94720, USA.

Background: The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations.

Results: The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions.

Conclusions: The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-15-155
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4039540
May 2014

Canto: an online tool for community literature curation.

Bioinformatics 2014 Jun 25;30(12):1791-2. Epub 2014 Feb 25.

Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.

Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species.

Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/).

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btu103
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058955
June 2014

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

BMC Genomics 2013 Jul 29;14:513. Epub 2013 Jul 29.

Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA.

Background: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI.

Results: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI.

Conclusions: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-14-513
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3733925
July 2013

A guide to best practices for Gene Ontology (GO) manual annotation.

Database (Oxford) 2013 Jul 9;2013:bat054. Epub 2013 Jul 9.

Saccharomyces Genome Database, Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, CA 94305, USA.

The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374,000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. DATABASE URL: http://www.geneontology.org.

Source
DOI: http://dx.doi.org/10.1093/database/bat054
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706743
October 2013

FYPO: the fission yeast phenotype ontology.

Bioinformatics 2013 Jul 8;29(13):1671-8. Epub 2013 May 8.

Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.

Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.

Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.

Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt266
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694669
July 2013

Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology.

Bioinformatics 2012 Jul 26;28(13):1783-9. Epub 2012 Apr 26.

Department of Genetics, University of Cambridge, Downing Street, Cambridge, Cambridge CB2 3EH, UK.

Motivation: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain.

Results: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created.

Availability And Implementation: The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts250
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3381966
July 2012

PomBase: a comprehensive online resource for fission yeast.

Nucleic Acids Res 2012 Jan 28;40(Database issue):D695-9. Epub 2011 Oct 28.

Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.

PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.

Source
DOI: http://dx.doi.org/10.1093/nar/gkr853
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245111
January 2012

How the gene ontology evolves.

BMC Bioinformatics 2011 Aug 5;12:325. Epub 2011 Aug 5.

ESRC Centre for Genomics in Society, University of Exeter, EX4 4PJ Exeter, UK.

Background: Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource.

Results: Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms.

Conclusion: This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-12-325
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166943
August 2011

Cross-product extensions of the Gene Ontology.

J Biomed Inform 2011 Feb 10;44(1):80-6. Epub 2010 Feb 10.

Lawrence Berkeley National Laboratory, Mail Stop 64R0121, Berkeley, CA 94720, USA.

The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the use of these tools. In this paper, we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.

Source
DOI: http://dx.doi.org/10.1016/j.jbi.2010.02.002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2910209
February 2011

The Protein Feature Ontology: a tool for the unification of protein feature annotations.

Bioinformatics 2008 Dec 20;24(23):2767-72. Epub 2008 Oct 20.

EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Motivation: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed and so far this has not been possible as there was no common vocabulary available that could be used as a standard language.

Results: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure and comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO) and 40 non-positional terms which describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe.

Availability: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS).

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btn528
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912506
December 2008

The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis.

Curr Protoc Bioinformatics 2008 Sep;Chapter 7:Unit 7.2

The Jackson Laboratory, Bar Harbor, Maine, USA.

Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project goal is to compile the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, the project supports the use of these structured vocabularies in the annotation of gene products. Third, the gene product-to-GO annotation sets are provided by participating groups to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what is beyond the scope of the Gene Ontology project. It addresses the issue of how GO vocabularies are constructed and related to genes and gene products. It concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.

Source
DOI: http://dx.doi.org/10.1002/0471250953.bi0702s23
September 2008

The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis.

Curr Protoc Bioinformatics 2002 Nov;Chapter 7:Unit 7.2

The Jackson Laboratory, Bar Harbor, Maine, USA.

Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project goal is to compile the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, the project supports the use of these structured vocabularies in the annotation of gene products. Third, the gene product-to-GO annotation sets are provided by participating groups to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what is beyond the scope of the Gene Ontology project. It addresses the issue of how GO vocabularies are constructed and related to genes and gene products. It concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.

Source
DOI: http://dx.doi.org/10.1002/0471250953.bi0702s00
November 2002

Standards and ontologies for functional genomics 2.

Comp Funct Genomics 2004 ;5(8):618-22

European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Source
DOI: http://dx.doi.org/10.1002/cfg.448
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447478
June 2010

Standards and ontologies for functional genomics: towards unified ontologies for biology and biomedicine.

Comp Funct Genomics 2003 ;4(1):116-20

European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Source
DOI: http://dx.doi.org/10.1002/cfg.249
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447377
June 2010

Developing an ontology.

Authors:
Midori A Harris

Methods Mol Biol 2008 ;452:111-24

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

In recent years, biological ontologies have emerged as a means of representing and organizing biological concepts, enabling biologists, bioinformaticians, and others to derive meaning from large datasets. This chapter provides an overview of formal principles and practical considerations of ontology construction and application. Ontology development concepts are illustrated using examples drawn from the Gene Ontology (GO) and other OBO ontologies.

Source
DOI: http://dx.doi.org/10.1007/978-1-60327-159-2_5
July 2008

OBO-Edit--an ontology editor for biologists.

Bioinformatics 2007 Aug 1;23(16):2198-200. Epub 2007 Jun 1.

Berkeley Bioinformatics and Ontology Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

OBO-Edit is an open source, platform-independent ontology editor developed and maintained by the Gene Ontology Consortium. Implemented in Java, OBO-Edit uses a graph-oriented approach to display and edit ontologies. OBO-Edit is particularly valuable for viewing and editing biomedical ontologies.

Availability: https://sourceforge.net/project/showfiles.php?group_id=36855.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btm112
August 2007

The European Bioinformatics Institute's data resources.

Nucleic Acids Res 2003 Jan;31(1):43-50

EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC165513
DOI: http://dx.doi.org/10.1093/nar/gkg066
January 2003

Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO).

Nucleic Acids Res 2002 Jan;30(1):69-72

Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305-5120, USA.

The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S. cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at http://www.geneontology.org. SGD gene associations to GO can be found by visiting our site at http://genome-www.stanford.edu/Saccharomyces/.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC99086
DOI: http://dx.doi.org/10.1093/nar/30.1.69
January 2002