Publications by authors named "Robert Hoehndorf"

103 Publications

Improved characterisation of clinical text through ontology-based vocabulary expansion.

J Biomed Semantics 2021 04 12;12(1). Epub 2021 Apr 12.

Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.

Background: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks.

Results: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to the semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set.

Conclusions: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
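
The lexical-matching half of the label expansion described in the Results can be illustrated with a minimal Python sketch. It pools labels from classes in other ontologies that share at least one normalised label with a target class; the reasoner-enabled equivalency queries are not shown. The toy ontologies, the class identifiers, and the names `normalise` and `expand_labels` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalise(label: str) -> str:
    """Strict matching is assumed here to mean case- and whitespace-insensitive equality."""
    return " ".join(label.lower().split())


def expand_labels(target, others):
    """Return, for each target class, its own labels plus those of lexically matching classes."""
    # Index every class of the other ontologies by each of its normalised labels.
    index = defaultdict(list)
    for onto in others:
        for labels in onto.values():
            for label in labels:
                index[normalise(label)].append(labels)

    expanded = {}
    for cls, labels in target.items():
        pooled = {normalise(label) for label in labels}
        for label in labels:
            # Any class sharing this exact label contributes all of its labels.
            for matched in index.get(normalise(label), []):
                pooled |= {normalise(m) for m in matched}
        expanded[cls] = pooled
    return expanded


# Hypothetical toy ontologies: class id -> set of labels and synonyms.
disease_onto = {"D:1": {"Myocardial infarction"}}
other_onto = {"X:9": {"myocardial infarction", "heart attack", "MI"}}

result = expand_labels(disease_onto, [other_onto])
print(sorted(result["D:1"]))
```

In this toy case the single Disease Ontology style class gains two extra labels from the second ontology. A real pipeline would also need to handle ambiguous abbreviations, which is one reason the abstract reports a precision below 1.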

Source
DOI: http://dx.doi.org/10.1186/s13326-021-00241-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8042947
April 2021

Towards similarity-based differential diagnostics for common diseases.

Comput Biol Med 2021 Apr 1;133:104360. Epub 2021 Apr 1.

College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.

Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
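
As an illustration of patient-disease similarity over phenotype profiles, the following Python sketch ranks candidate diseases by a Jaccard similarity computed on ancestor-closed phenotype sets. The abstract does not state which similarity measure was used, so Jaccard is an assumption made here for simplicity, and the toy hierarchy, profiles, and function names are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def closure(profile, parents):
    """Ancestor-closed version of a phenotype profile."""
    closed = set()
    for term in profile:
        closed |= ancestors(term, parents)
    return closed


def jaccard(a, b, parents):
    """Jaccard similarity between two ancestor-closed profiles."""
    ca, cb = closure(a, parents), closure(b, parents)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


# Hypothetical toy phenotype hierarchy: term -> list of parent terms.
parents = {
    "fever": ["sign"],
    "cough": ["respiratory"],
    "dyspnoea": ["respiratory"],
    "respiratory": ["sign"],
    "rash": ["skin"],
    "skin": ["sign"],
}

patient = {"cough", "fever"}  # e.g. phenotypes mined from a clinical note
diseases = {"pneumonia": {"dyspnoea", "fever"}, "dermatitis": {"rash"}}

ranking = sorted(diseases, key=lambda d: jaccard(patient, diseases[d], parents), reverse=True)
print(ranking)
```

Because the profiles are closed under ancestors, "cough" and "dyspnoea" still contribute to the match through their shared parent, which is the kind of partial credit that makes ontology-based similarity useful for uncurated text-derived phenotypes.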

Source
DOI: http://dx.doi.org/10.1016/j.compbiomed.2021.104360
April 2021

DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes.

Bioinformatics 2021 Mar 2. Epub 2021 Mar 2.

Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.

Motivation: Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts.

Results: We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction.

Availability: Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab147
March 2021

A fast, accurate, and generalisable heuristic-based negation detection algorithm for clinical text.

Comput Biol Med 2021 Mar 16;130:104216. Epub 2021 Jan 16.

College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.

Negation detection is an important task in biomedical text mining. Particularly in clinical settings, it is of critical importance to determine whether findings mentioned in text are present or absent. Rule-based negation detection algorithms are a common approach to the task, and more recent investigations have resulted in the development of rule-based systems utilising the rich grammatical information afforded by typed dependency graphs. However, interacting with these complex representations inevitably necessitates complex rules, which are time-consuming to develop and do not generalise well. We hypothesise that a heuristic approach to determining negation via dependency graphs could offer a powerful alternative. We describe and implement an algorithm for negation detection based on grammatical distance from a negatory construct in a typed dependency graph. To evaluate the algorithm, we develop two testing corpora comprising sentences of clinical text extracted from the MIMIC-III database and documents related to hypertrophic cardiomyopathy (HCM) patients routinely collected at University Hospitals Birmingham NHS Trust. Gold-standard validation datasets were built by a combination of human annotation and examination of algorithm error. Finally, we compare the performance of our approach with four other rule-based algorithms on both gold-standard corpora. The presented algorithm exhibits the best performance by F-measure over the MIMIC-III dataset, and a similar performance to the syntactic negation detection systems over the HCM dataset. It is also the fastest of the dependency-based negation systems explored in this study. Our results show that while a single heuristic approach to dependency-based negation detection is ignorant of certain advanced cases, it nevertheless forms a powerful and stable method, requiring minimal training and adaptation between datasets. As such, it could present a drop-in replacement or augmentation for many-rule negation approaches in clinical text-mining pipelines, particularly for cases where adaptation and rule development are not required or possible.
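
The core heuristic, negation decided by grammatical distance from a negatory construct, can be sketched in Python as a breadth-first search over an undirected view of a dependency graph. The hand-written parse, the cue word, the `max_distance` threshold, and the function names below are all hypothetical; the published algorithm's actual cue list and distance rule are not given in the abstract.

```python
from collections import deque


def graph_distance(edges, source, target):
    """Shortest path length between two tokens, ignoring edge direction."""
    adjacency = {}
    for head, dependent in edges:
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)

    queue, seen = deque([(source, 0)]), {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the tokens are not connected


def is_negated(edges, cues, concept, max_distance=1):
    """Flag a concept as negated if any cue lies within max_distance edges of it."""
    for cue in cues:
        dist = graph_distance(edges, cue, concept)
        if dist is not None and dist <= max_distance:
            return True
    return False


# Hand-written (head, dependent) edges for the toy sentence
# "Patient denies chest pain but reports fever."
edges = [
    ("denies", "Patient"),
    ("denies", "pain"),
    ("pain", "chest"),
    ("denies", "reports"),
    ("reports", "fever"),
]

print(is_negated(edges, {"denies"}, "pain"))   # distance 1 from the cue
print(is_negated(edges, {"denies"}, "fever"))  # distance 2 from the cue
```

With the assumed threshold of one edge, "pain" is flagged as negated while "fever" is not. The appeal of such a single rule is that only the cue list and the threshold need tuning when moving between datasets.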

Source
DOI: http://dx.doi.org/10.1016/j.compbiomed.2021.104216
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7910278
March 2021

Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies.

BMC Med Inform Decis Mak 2020 12 15;20(Suppl 10):311. Epub 2020 Dec 15.

Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.

Background: Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions.

Methods: We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when it is combined with other ontologies. For this purpose, we combined sets of ontologies and used automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.

Results: We tested the mutual consistency of the OBO Foundry and the OBO ontologies and found that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that removing only 117 axioms accounted for all cases of unsatisfiability across all OBO ontologies.

Conclusions: We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we found that this large set of unsatisfiable classes is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.
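
A heavily simplified Python sketch can show how an unsatisfiable class stays hidden until two ontologies are combined. It supports only subclass and disjointness axioms and treats a class as unsatisfiable when its superclass closure contains two disjoint classes; this is a stand-in for, not a reproduction of, the OWL reasoning and justification algorithm described above. The two toy ontologies and all names are hypothetical.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its (transitive) superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sup for sub, sup in subclass_of if sub == c)
    return seen


def unsatisfiable(subclass_of, disjoint):
    """Classes that fall under both members of some disjoint pair."""
    classes = {c for pair in subclass_of for c in pair}
    bad = set()
    for cls in classes:
        sups = superclasses(cls, subclass_of)
        if any(a in sups and b in sups for a, b in disjoint):
            bad.add(cls)
    return bad


# Hypothetical axioms: (sub, super) pairs and disjoint pairs.
onto_a_sub = {("Function", "Continuant")}
onto_a_dis = {("Continuant", "Occurrent")}
onto_b_sub = {("Function", "Occurrent")}
onto_b_dis = set()

alone_a = unsatisfiable(onto_a_sub, onto_a_dis)
alone_b = unsatisfiable(onto_b_sub, onto_b_dis)
combined = unsatisfiable(onto_a_sub | onto_b_sub, onto_a_dis | onto_b_dis)

# Removing one conflicting axiom is enough to repair the combination.
repaired = unsatisfiable(
    (onto_a_sub | onto_b_sub) - {("Function", "Occurrent")},
    onto_a_dis | onto_b_dis,
)
print(alone_a, alone_b, combined, repaired)
```

Each toy ontology is coherent on its own, the union is not, and dropping a single axiom restores coherence. That mirrors, at toy scale, the finding that a small number of axioms explains a large number of unsatisfiable classes.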

Source
DOI: http://dx.doi.org/10.1186/s12911-020-01336-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7736131
December 2020

DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier.

PLoS Comput Biol 2020 11 18;16(11):e1008453. Epub 2020 Nov 18.

King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia.

Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype-phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state of the art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene-disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
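
One way to make multi-label predictions respect an ontology hierarchy is to propagate scores upwards, so that a class never scores lower than any of its descendants. The Python sketch below shows this generic post-processing step; it is an assumption made for illustration and not a description of the DeepPheno classifier itself, and the toy hierarchy, scores, and the name `propagate` are hypothetical.

```python
def propagate(scores, parents):
    """Raise every ancestor's score to at least the score of each of its descendants."""
    out = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, []))
        seen = set()
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.add(parent)
            # An annotation to a class implies annotation to all its superclasses.
            out[parent] = max(out.get(parent, 0.0), score)
            stack.extend(parents.get(parent, []))
    return out


# Hypothetical toy phenotype hierarchy: term -> list of parent terms.
parents = {
    "seizure": ["nervous system abnormality"],
    "nervous system abnormality": ["phenotypic abnormality"],
}

# Hypothetical raw classifier scores that violate the hierarchy.
raw = {
    "seizure": 0.9,
    "nervous system abnormality": 0.4,
    "phenotypic abnormality": 0.2,
}

consistent = propagate(raw, parents)
print(consistent)
```

After propagation, thresholding at any cut-off yields a set of predicted classes that is closed under superclasses, which is what hierarchy-aware evaluation measures such as those used in CAFA expect.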

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008453
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710064
November 2020

Predicting candidate genes from phenotypes, functions and anatomical site of expression.

Bioinformatics 2021 May;37(6):853-860

Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.

Motivation: Over the past years, many computational methods have been developed to incorporate information about phenotypes for the disease-gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene-phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is not complete in humans as well as in many model organisms, such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models.

Results: We developed a novel graph-based machine-learning method for biomedical ontologies, which is able to exploit axioms in ontologies and other graph-structured data. Using our machine-learning method, we embed genes based on their associated phenotypes, functions of the gene products and anatomical location of gene expression. We then develop a machine-learning model to predict gene-disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization methods to all genes that are associated with phenotypes, functions or site of expression.

Availability And Implementation: Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa879
May 2021

Semantic similarity and machine learning with ontologies.

Brief Bioinform 2020 Oct 13. Epub 2020 Oct 13.

King Abdullah University of Science and Technology.

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.

Source
DOI: http://dx.doi.org/10.1093/bib/bbaa199
October 2020

EMC10 homozygous variant identified in a family with global developmental delay, mild intellectual disability, and speech delay.

Clin Genet 2020 12 15;98(6):555-561. Epub 2020 Sep 15.

Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia.

In recent years, several genes have been implicated in the variable disease presentation of global developmental delay (GDD) and intellectual disability (ID). The endoplasmic reticulum membrane protein complex (EMC) family is known to be involved in GDD and ID. Homozygous variants of EMC1 are associated with GDD, scoliosis, and cerebellar atrophy, indicating the relevance of this pathway for neurogenetic disorders. EMC10 is a bone marrow-derived angiogenic growth factor that plays an important role in infarct vascularization and promoting tissue repair. However, this gene has not been previously associated with human disease. Herein, we describe a Saudi family with two individuals segregating a recessive neurodevelopmental disorder. Both of the affected individuals showed mild ID, speech delay, and GDD. Whole-exome sequencing (WES) and Sanger sequencing were performed to identify candidate genes. Further, to elucidate the functional effects of the variant, quantitative real-time PCR (RT-qPCR)-based expression analysis was performed. WES revealed a homozygous splice acceptor site variant (c.679-1G>A) in EMC10 (chromosome 19q13.33) that segregated perfectly within the family. RT-qPCR showed a substantial decrease in the relative EMC10 gene expression in the patients, indicating the pathogenicity of the identified variant. For the first time in the literature, the EMC10 gene variant was associated with mild ID, speech delay, and GDD. Thus, this gene plays a key role in developmental milestones, with the potential to cause neurodevelopmental disorders in humans.

Source
DOI: http://dx.doi.org/10.1111/cge.13842
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7756316
December 2020

What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations.

BMC Med Genomics 2020 07 17;13(1):103. Epub 2020 Jul 17.

Division of Genetics, Department of Pediatrics, King Abdulaziz Medical City, Riyadh, Saudi Arabia.

Background: Choosing a testing strategy is crucial for genetics clinics and testing laboratories. In this study, we compared the hit rate between solo, trio, and trio plus testing and between trio and sibship testing. Finally, we studied the impact of extended family analysis, mainly in complex and unsolved cases.

Methods: Three cohorts were used for this analysis: one cohort to assess the hit rate between solo, trio and trio plus testing, another cohort to examine the impact of the testing strategy of sibship genome vs trio-based analysis, and a third cohort to test the impact of an extended family analysis of up to eight family members to lower the number of candidate variants.

Results: The hit rates in solo, trio and trio plus testing were 39, 40, and 41%, respectively. The total number of candidate variants in the sibship testing strategy was 117 variants compared to 59 variants in the trio-based analysis. We noticed that the average number of candidate variants in trio-based analysis was 1192 coding variants and 26,454 noncoding variants, and this number was lowered by 50-75% after adding additional family members, with up to two coding and 66 noncoding homozygous variants only, in families with eight family members.

Conclusion: There was no difference in the hit rate between solo and extended family members. Trio-based analysis was a better approach than sibship testing, even in a consanguineous population. Finally, each additional family member helped to narrow down the number of variants by 50-75%. Our findings could help clinicians, researchers and testing laboratories select the most cost-effective and appropriate sequencing approach for their patients. Furthermore, using extended family analysis is a very useful tool for complex cases with novel genes.
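
The way additional family members narrow a candidate list can be illustrated with a simple recessive segregation filter in Python: keep variants that are homozygous in every affected member and homozygous in no unaffected member. This is a generic filter assumed here for illustration, not the study's actual pipeline, and the genotypes, variant names, and the function `recessive_candidates` are hypothetical.

```python
def recessive_candidates(genotypes, affected, unaffected):
    """Variants homozygous (alt count 2) in all affected and in no unaffected members."""
    candidates = None
    for member in affected:
        homozygous = {v for v, n in genotypes[member].items() if n == 2}
        candidates = homozygous if candidates is None else candidates & homozygous
    candidates = set(candidates or ())
    for member in unaffected:
        candidates -= {v for v, n in genotypes[member].items() if n == 2}
    return candidates


# Hypothetical genotypes: member -> {variant: number of alternate alleles}.
genotypes = {
    "proband": {"v1": 2, "v2": 2, "v3": 2},
    "father": {"v1": 1, "v2": 2, "v3": 1},
    "mother": {"v1": 1, "v2": 1, "v3": 1},
    "affected_sibling": {"v1": 2, "v2": 1, "v3": 1},
}

solo = recessive_candidates(genotypes, ["proband"], [])
trio = recessive_candidates(genotypes, ["proband"], ["father", "mother"])
extended = recessive_candidates(
    genotypes, ["proband", "affected_sibling"], ["father", "mother"]
)
print(len(solo), len(trio), len(extended))
```

In this toy family the candidate list shrinks from three variants to two and then to one as members are added, which is the qualitative effect the Results describe for extended family analysis.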

Source
DOI: http://dx.doi.org/10.1186/s12920-020-00743-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7368798
July 2020

DDIEM: drug database for inborn errors of metabolism.

Orphanet J Rare Dis 2020 06 11;15(1):146. Epub 2020 Jun 11.

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Kingdom of Saudi Arabia.

Background: Inborn errors of metabolism (IEM) represent a subclass of rare inherited diseases caused by a wide range of defects in metabolic enzymes or their regulation. Of over a thousand characterized IEMs, only about half are understood at the molecular level, and overall the development of treatment and management strategies has proved challenging. An overview of the changing landscape of therapeutic approaches is helpful in assessing strategic patterns in the approach to therapy, but the information is scattered throughout the literature and public data resources.

Results: We gathered data on therapeutic strategies for 300 diseases into the Drug Database for Inborn Errors of Metabolism (DDIEM). Therapeutic approaches, including both successful and ineffective treatments, were manually classified by their mechanisms of action using a new ontology.

Conclusions: We present a manually curated, ontologically formalized knowledgebase of drugs, therapeutic procedures, and mitigated phenotypes. DDIEM is freely available through a web interface and for download at http://ddiem.phenomebrowser.net.

Source
DOI: http://dx.doi.org/10.1186/s13023-020-01428-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7291537
June 2020

BioHackathon 2015: Semantics of data for life sciences and reproducible research.

F1000Res 2020 24;9:136. Epub 2020 Feb 24.

St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia.

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Source
DOI: http://dx.doi.org/10.12688/f1000research.18236.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7141167
February 2021

Combining lexical and context features for automatic ontology extension.

J Biomed Semantics 2020 01 13;11(1). Epub 2020 Jan 13.

Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Background: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient.

Results: We developed a method that uses machine learning and word embeddings to identify words and phrases that are used to refer to an ontology class in biomedical Europe PMC full-text articles. Once labels and synonyms of a class are known, we use machine learning to identify the super-classes of a class. For this purpose, we identify lexical term variants, use word embeddings to capture context information, and rely on automated reasoning over ontologies to generate features, and we use an artificial neural network as classifier. We demonstrate the utility of our approach in identifying terms that refer to diseases in the Human Disease Ontology and to distinguish between different types of diseases.

Conclusions: Our method is capable of discovering labels that refer to a class in an ontology but are not present in that ontology, and it can identify whether a class should be a subclass of some high-level ontology classes. Our approach can therefore be used for the semi-automatic extension and quality control of ontologies. The algorithm, corpora and evaluation datasets are available at https://github.com/bio-ontology-research-group/ontology-extension.

Source
DOI: http://dx.doi.org/10.1186/s13326-019-0218-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958746
January 2020

Formal axioms in biomedical ontologies improve analysis and interpretation of associated data.

Bioinformatics 2020 04;36(7):2229-2236

Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Motivation: Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling.

Results: We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein-protein interactions and gene-disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies.

Availability And Implementation: https://github.com/bio-ontology-research-group/tsoe.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz920
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7141863
April 2020

Ontology-based prediction of cancer driver genes.

Sci Rep 2019 11 22;9(1):17405. Epub 2019 Nov 22.

Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.

Identifying and distinguishing cancer driver genes among thousands of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and personalizing treatment based on accurate stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which makes it challenging to distinguish them from non-driver mutations. We have developed a novel method for identifying cancer driver genes. Our approach utilizes multiple complementary types of information, specifically cellular phenotypes, cellular locations, functions, and whole body physiological phenotypes as features. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their role in different types of cancer. In addition to confirming known driver genes, we identify several novel candidate driver genes. We demonstrate the utility of our method by validating its predictions in nasopharyngeal cancer and colorectal cancer using whole exome and whole genome sequencing.

Source
DOI: http://dx.doi.org/10.1038/s41598-019-53454-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6874647
November 2019

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Genome Biol 2019 11 19;20(1):244. Epub 2019 Nov 19.

Departments of Bioengineering and Mechanical Engineering, Berkeley, CA, USA.

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.

Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Source
DOI: http://dx.doi.org/10.1186/s13059-019-1835-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6864930
November 2019

Ontology based mining of pathogen-disease associations from literature.

J Biomed Semantics 2019 09 18;10(1):15. Epub 2019 Sep 18.

Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Background: Infectious diseases claim millions of lives each year, especially in developing countries. Identifying causative pathogens accurately and rapidly plays a key role in the success of treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations are available from the literature in unstructured form, and we need automated methods to extract the data.

Results: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3420 pathogen-disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research.

Conclusions: To the best of our knowledge, we present the first study focusing on extracting pathogen-disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/ .

Source
DOI: http://dx.doi.org/10.1186/s13326-019-0208-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6751637
September 2019

DeepGOPlus: improved protein function prediction from sequence.

Bioinformatics 2020 01;36(2):422-429

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.

Motivation: Protein function prediction is one of the major tasks of bioinformatics that can help in a wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence-based features, protein-protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins, thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than that of methods that incorporate multiple features, and predicting protein functions may require a lot of time.

Results: We developed a novel method for predicting protein functions from sequence alone which combines a deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins.

Availability And Implementation: http://deepgoplus.bio2vec.net/ .

Supplementary Information: Supplementary data are available at Bioinformatics online.
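
The Fmax values quoted in the Results above are protein-centric F-measures maximized over a score threshold. The following is a minimal sketch of that measure, not the CAFA3 or DeepGOPlus evaluation code: the function name `fmax`, the dictionary layout, and the 100-step threshold grid are assumptions made for illustration, and the propagation of annotations to ontology ancestors that a full evaluation performs is omitted.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over thresholds t = 1/steps .. 1.0.

    predictions: protein -> {term: score in [0, 1]} (hypothetical layout)
    truth:       protein -> non-empty set of true terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        p = sum(precisions) / len(precisions)
        r = recall_sum / len(truth)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Precision is averaged only over proteins that receive at least one prediction at the threshold, while recall is averaged over every benchmark protein, so a method that predicts for few proteins is penalized on recall rather than precision.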

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz595
January 2020

FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration.

NPJ Sci Food 2018 18;2:23. Epub 2018 Dec 18.

Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada.

The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.

Source
DOI: http://dx.doi.org/10.1038/s41538-018-0032-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550238
December 2018

PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research.

Sci Data 2019 Jun 3;6(1):79. Epub 2019 Jun 3.

Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.

Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, and on ontology-based text mining as well as manual curation to associate host disease phenotypes with infectious agents. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/ , and the data are freely available through a public SPARQL endpoint.

Source
DOI: http://dx.doi.org/10.1038/s41597-019-0090-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546783
June 2019

Hyaline Arteriolosclerosis in 30 Strains of Aged Inbred Mice.

Vet Pathol 2019 09 6;56(5):799-806. Epub 2019 May 6.

The Jackson Laboratory, Bar Harbor, ME, USA.

During a screen for vascular phenotypes in aged laboratory mice, a unique discrete phenotype of hyaline arteriolosclerosis of the intertubular arteries and arterioles of the testes was identified in several inbred strains. Lesions were limited to the testes and did not occur as part of any renal, systemic, or pulmonary arteriopathy or vasculitis phenotype. There was no evidence of systemic or pulmonary hypertension, and lesions did not occur in ovaries of females. Frequency was highest in males of the SM/J (27/30, 90%) and WSB/EiJ (19/26, 73%) strains, aged 383 to 847 days. Lesions were sporadically present in males from several other inbred strains at a much lower (<20%) frequency. The risk of testicular hyaline arteriolosclerosis is at least partially underpinned by a genetic predisposition that is not associated with other vascular lesions (including vasculitis), separating out the etiology of this form and site of arteriolosclerosis from other related conditions that often co-occur in other strains of mice and in humans. Because of their genetic uniformity and controlled dietary and environmental conditions, mice are an excellent model to dissect the pathogenesis of human disease conditions. In this study, a discrete genetically driven phenotype of testicular hyaline arteriolosclerosis in aging mice was identified. These observations open the possibility of identifying the underlying genetic variant(s) associated with the predisposition and therefore allowing future interrogation of the pathogenesis of this condition.

Source
DOI: http://dx.doi.org/10.1177/0300985819844822
September 2019

Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies.

Sci Rep 2019 03 11;9(1):4025. Epub 2019 Mar 11.

King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia.

Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
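
Over-representation analysis, one of the two applications named above, asks whether an ontology term is annotated to a group of samples more often than chance would predict. The abstract does not name the statistic used; the sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis. The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int,
                     sample: int, hits: int) -> float:
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: total number of items (e.g. all examined mice)
    annotated:  items in the population carrying the term
    sample:     size of the group being tested (e.g. one strain)
    hits:       items in the group carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

Under this assumption, the same data annotated with different design patterns would yield different `annotated` and `hits` counts for each combined class, which is one way the patterns could come to differ in what they flag as over-represented.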

Source
DOI: http://dx.doi.org/10.1038/s41598-019-40368-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6411989
March 2019

Ontology based text mining of gene-phenotype associations: application to candidate gene prediction.

Database (Oxford) 2019 01;2019

Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia.

Gene-phenotype associations play an important role in understanding disease mechanisms, which is a requirement for treatment development. A portion of gene-phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, there is still a vast amount of gene-phenotype associations buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden of manual curation of gene-phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene-phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene-phenotype associations from HPO and MGI respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology-based approach to extract gene-phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene-disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene-disease associations respectively. Our manual analysis of the text mined data revealed that our method can accurately extract gene-phenotype associations which are not currently covered by the existing public gene-phenotype resources. Overall, results indicate that our method can precisely extract known as well as new gene-phenotype associations from literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.

Source
DOI: http://dx.doi.org/10.1093/database/baz019
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391585
January 2019

DeepPVP: phenotype-based prioritization of causative variants using deep learning.

BMC Bioinformatics 2019 Feb 6;20(1):65. Epub 2019 Feb 6.

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia.

Background: Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful to identify causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient's phenotype.

Results: We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp .

Conclusions: DeepPVP further improves on existing variant prioritization methods both in terms of speed as well as accuracy.

Source
DOI: http://dx.doi.org/10.1186/s12859-019-2633-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364462
February 2019

A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals.

ILAR J 2018 12;59(1):29-39

Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principal consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine.

The need for international collaboration in rodent pathology has evolved since the 1970s and was initially driven by the new field of toxicologic pathology. First initiated by the World Health Organization's International Agency for Research on Cancer for rodents, it has evolved to include pathology of the major species (rats, mice, guinea pigs, nonhuman primates, pigs, dogs, fish, rabbits) used in medical research, safety assessment, and mouse pathology. The collaborative effort today is driven by the needs of the regulatory agencies in multiple countries, and by needs of research involving genetically engineered animals, for "basic" research and for more translational preclinical models of human disease. These efforts led to the establishment of an international rodent pathology nomenclature program. Since that time, multiple collaborations for standardization of laboratory animal pathology nomenclature and diagnostic criteria have been developed, and just a few are described herein. Recently, approaches to a nomenclature that is amenable to sophisticated computation have been made available and implemented for large-scale programs in functional genomics and aging. Most terminologies continue to evolve as the science of human and veterinary pathology continues to develop, but standardization and successful implementation remain critical for scientific communication now as ever in the history of veterinary nosology.

Source
DOI: http://dx.doi.org/10.1093/ilar/ily005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927895
December 2018

Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes.

Bioinformatics 2018 09;34(17):i901-i907

Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Motivation: In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes which is highly incomplete in humans as well as in many model organisms such as the mouse.

Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

Availability And Implementation: https://github.com/bio-ontology-research-group/SmuDGE.
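
Once diseases and genes are represented as vectors, as described in the Results above, prioritization reduces to ranking genes by their similarity to a disease vector. The sketch below assumes cosine similarity as the comparison and uses made-up embeddings; the abstract describes SmuDGE as a trainable similarity measure, so this illustrates only the simplest untrained variant, and the function names are hypothetical rather than taken from the SmuDGE repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec: Sequence[float],
               gene_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (gene, similarity) pairs, most similar gene first."""
    scored = [(gene, cosine(disease_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, for illustration only).
ranking = rank_genes([1.0, 0.0], {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]})
```

Because a gene without phenotype annotations can still receive a vector through its neighbours in the interaction network, every gene in `gene_vecs` can be ranked, which is the coverage gain the abstract describes.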

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty559
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129260
September 2018

Ontology-based validation and identification of regulatory phenotypes.

Bioinformatics 2018 09;34(17):i857-i865

Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Centre, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support to validate the mutual consistency of functional and phenotype annotations as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations.

Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. We also apply our method to the rule-based prediction of regulatory phenotypes from functions and demonstrate that we can predict these phenotypes with Fmax of up to 0.647.

Availability And Implementation: https://github.com/bio-ontology-research-group/phenogocon.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty605
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129279
September 2018

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction.

Bioinformatics 2019 06;35(12):2133-2140

Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Motivation: Ontologies are widely used in biology for data annotation, integration and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures.

Results: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on a corpus of either abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interaction on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology.

Availability And Implementation: https://github.com/bio-ontology-research-group/opa2vec.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty933
June 2019

OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants.

Sci Rep 2018 10 2;8(1):14681. Epub 2018 Oct 2.

Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical or non-human model organism research can provide useful information and improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.
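
The idea of using gene interactions to find variant combinations can be illustrated with a small sketch. It is not the OligoPVP algorithm: it assumes each variant already has a single prioritization score, that the interaction network is given as a set of unordered gene pairs, and that a pair is scored by summing its two variant scores. All names and the scoring rule are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def rank_variant_pairs(
    variant_scores: Dict[str, float],
    variant_gene: Dict[str, str],
    interactions: Set[FrozenSet[str]],
) -> List[Tuple[Tuple[str, str], float]]:
    """Rank pairs of variants that lie in two different, interacting genes.

    The pair score is the sum of the two per-variant scores (an assumption
    made for this sketch, not a rule taken from the abstract).
    """
    pairs = []
    for a, b in combinations(sorted(variant_scores), 2):
        gene_a, gene_b = variant_gene[a], variant_gene[b]
        # Keep only digenic candidates whose genes are known to interact.
        if gene_a != gene_b and frozenset((gene_a, gene_b)) in interactions:
            pairs.append(((a, b), variant_scores[a] + variant_scores[b]))
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Toy input (assumed values): v3 scores highest alone but has no interacting partner.
ranked = rank_variant_pairs(
    {"v1": 0.9, "v2": 0.8, "v3": 0.95},
    {"v1": "G1", "v2": "G2", "v3": "G3"},
    {frozenset(("G1", "G2"))},
)
```

Extending the same filter from pairs to larger connected sets of genes would correspond to the oligogenic case mentioned in the abstract.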

Source
DOI: http://dx.doi.org/10.1038/s41598-018-32876-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6168481
October 2018

Nail abnormalities identified in an ageing study of 30 inbred mouse strains.

Exp Dermatol 2019 04 4;28(4):383-390. Epub 2018 Sep 4.

The Jackson Laboratory, Bar Harbor, Maine.

In a large-scale ageing study, 30 inbred mouse strains were systematically screened for histologic evidence of lesions in all organ systems. Ten strains were diagnosed with similar nail abnormalities. The highest frequency was noted in NON/ShiLtJ mice. Lesions identified fell into two main categories: acute to chronic penetration of the third phalangeal bone through the hyponychium with associated inflammation and bone remodelling or metaplasia of the nail matrix and nail bed associated with severe orthokeratotic hyperkeratosis replacing the nail plate. Penetration of the distal phalanx through the hyponychium appeared to be the initiating feature resulting in nail abnormalities. The accompanying acute to subacute inflammatory response was associated with osteolysis of the distal phalanx. Evaluation of young NON/ShiLtJ mice revealed that these lesions were not often found, or affected only one digit. The only other nail unit abnormality identified was sporadic subungual epidermoid inclusion cysts which closely resembled similar lesions in human patients. These abnormalities, being age-related developments, may have contributed to weight loss due to impacts upon feeding and should be a consideration for future research due to the potential to interact with other experimental factors in ageing studies using the affected strains of mice.

Source
DOI: http://dx.doi.org/10.1111/exd.13759
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360140
April 2019