Publications by authors named "Peter N Robinson"

231 Publications

Characterizing Long COVID: Deep Phenotype of a Complex Condition.

EBioMedicine 2021 Nov 25;74:103722. Epub 2021 Nov 25.

Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 920 Madison Ave. Suite 518N, Memphis TN 38613.

Background: Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies.

Methods: The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19.

Findings: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies.

Interpretation: Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID.

Funding: U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
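
The idea of linking patient wording to standard terminology can be sketched as a simple lookup. In the Python sketch below, the lay synonyms and the `normalize` helper are hypothetical and are not taken from the paper's curated mapping. The HPO identifiers for Fatigue and Nausea are given as recalled and should be checked against a current HPO release.

```python
# Hypothetical lay-term lookup. The synonyms are illustrative only and do not
# come from the paper; verify the HPO identifiers against a current release.
LAY_TO_HPO = {
    "tiredness": ("HP:0012378", "Fatigue"),
    "exhaustion": ("HP:0012378", "Fatigue"),
    "feeling sick": ("HP:0002018", "Nausea"),
    "queasiness": ("HP:0002018", "Nausea"),
}


def normalize(report):
    """Map free-text symptom strings to unique HPO terms.

    Returns a sorted list of (id, label) pairs and a list of phrases
    that had no entry in the lookup table.
    """
    found = set()
    unmapped = []
    for phrase in report:
        key = phrase.strip().lower()
        if key in LAY_TO_HPO:
            found.add(LAY_TO_HPO[key])
        else:
            unmapped.append(phrase)
    return sorted(found), unmapped


# Two different lay phrases collapse onto the same HPO term.
terms, unmapped = normalize(["Tiredness", "exhaustion ", "brain fog"])
```

Once reports from different sources are reduced to the same identifiers, the frequency of a feature can be compared across cohorts even when each source used different words for it.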

DOI: http://dx.doi.org/10.1016/j.ebiom.2021.103722
November 2021

Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments.

IEEE Access 2020 26;8:196299-196325. Epub 2020 Oct 26.

Department of Computer Science "Giovanni degli Antoni," Università degli Studi di Milano, 20133 Milan, Italy.

Between January and October of 2020, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus infected more than 34 million persons in a worldwide pandemic, leading to over one million deaths (data from the Johns Hopkins University). Since the virus began to spread, emergency departments have been busy with COVID-19 patients for whom a quick decision regarding in- or outpatient care was required. The virus can cause characteristic abnormalities in chest radiographs (CXR), but, due to the low sensitivity of CXR, additional variables and criteria are needed to accurately predict risk. Here, we describe a computerized system primarily aimed at extracting the most relevant radiological, clinical, and laboratory variables for improving patient risk prediction, and secondarily at presenting an explainable machine learning system, which may provide simple decision criteria to be used by clinicians as a support for assessing patient risk. To achieve robust and reliable variable selection, Boruta and Random Forest (RF) are combined in a 10-fold cross-validation scheme to produce a variable importance estimate not biased by the presence of surrogates. The most important variables are then selected to train an RF classifier, whose rules may be extracted, simplified, and pruned to finally build an associative tree, particularly appealing for its simplicity. Results show that the radiological score automatically computed through a neural network is highly correlated with the score computed by radiologists, and that laboratory variables, together with the number of comorbidities, aid risk prediction. The prediction performance of our approach was compared to that of generalized linear models and shown to be effective and robust. The proposed machine learning-based computational system can be easily deployed and used in emergency departments for rapid and accurate risk prediction in COVID-19 patients.

DOI: http://dx.doi.org/10.1109/ACCESS.2020.3034032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545262
October 2020

100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report.

N Engl J Med 2021 11;385(20):1868-1880

From Genomics England (D.S., K.R.S., A.M., E.A.T., E.M.M., A.T., G.C., K.I., L.M., M. Wielscher, A.N., M. Bale, E.B., C.B., H.B., M. Bleda, A. Devereau, D.H., E. Haraldsdottir, Z.H., D.K., C. Patch, D.P., A.M., R. Sultana, M.R., A.L.T.T., C. Tregidgo, C. Turnbull, M. Welland, S. Wood, C.S., E.W., S.L., R.E.F., L.C.D., O.N., I.U.S.L., C.F.W., J.C., R.H.S., T.F., A.R., M.C.), the William Harvey Research Institute, Queen Mary University of London (D.S., K.R.S., V.C., A.T., L.M., M.R.B., D.K., S. Wood, P.C., J.O.J., T.F., M.C.), University College London (UCL) Institute of Ophthalmology (V.C., G.A., M.M., A.T.M., S. Malka, N.P., P.Y.-W.-M., A.R.W.), UCL Genetics Institute (V.C., N.W.W.), GOSgene (H.J.W.), Genetics and Genomic Medicine Programme (L.V., M.R., M.D., L.C., P. Beales, M.B.-G.), National Institute for Health Research (NIHR) Great Ormond Street Hospital Biomedical Research Centre (BRC) (M.R., S. Grunewald, S.C.-L., F.M., C. Pilkington, L.R.W., L.C., P. Beales, M.B.-G.), Infection, Immunity, and Inflammation Research and Teaching Department (P.A., L.R.W.), Stem Cells and Regenerative Medicine (N.T.), and Mitochondrial Research Group (S. Rahman), UCL Great Ormond Street Institute of Child Health, UCL Ear Institute (L.V.), the Department of Renal Medicine (D. Bockenhauer), and Institute of Cardiovascular Science (P.E.), UCL, Moorfields Eye Hospital National Health Service (NHS) Foundation Trust (V.C., G.A., M.M., A.T.M., S. Malka, N.P., A.R.W.), the National Hospital for Neurology and Neurosurgery (J.V., E.O., J.Y., K. Newland, H.R.M., J.P., N.W.W., H.H.), the Metabolic Unit (L.A., S. Grunewald, S. Rahman), London Centre for Paediatric Endocrinology and Diabetes (M.D.), and the Department of Gastroenterology (N.T.), Great Ormond Street Hospital for Children NHS Foundation Trust (L.V., D. Bockenhauer, A. Broomfield, M.A.C., T. Lam, E.F., V.G., S.C.-L., F.M., C. Pilkington, R. Quinlivan, C.W., L.R.W., A. Worth, L.C., P. 
Beales, M.B.-G., R.H.S.), the Clinical Genetics Department (M.R., T.B., C. Compton, C.D., E. Haque, L.I., D.J., S. Mohammed, L.R., S. Rose, D.R., G.S., A.C.S., F.F., M.I.) and St. John's Institute of Dermatology (H.F., R. Sarkany), Guy's and St. Thomas' NHS Foundation Trust, the Division of Genetics and Epidemiology, Institute of Cancer Research (C. Turnbull), Florence Nightingale Faculty of Nursing, Midwifery, and Palliative Care (T.B.), Division of Genetics and Molecular Medicine (M.A.S.), and Division of Medical and Molecular Genetics (M.I.), King's College London, NIHR BRC at Moorfields Eye Hospital (P.Y.-W.-M.), NHS England and NHS Improvement, Skipton House (V.D., A. Douglas, S. Hill), and Imperial College Healthcare NHS Trust, Hammersmith Hospital (K. Naresh), London, Open Targets and European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton (E.M.M.), the Division of Evolution and Genomic Sciences, Faculty of Biology, Medicine, and Health, University of Manchester (J.M.E., S.B., J.C.-S., S.D., G.H., H.B.T., R.T.O., G. Black, W.N.), and the Manchester Centre for Genomic Medicine, St. Mary's Hospital, Manchester University NHS Foundation Trust (J.M.E., Z.H., S.B., J.C.-S., S.D., G.H., G. Black, W.N.), Manchester, the Department of Genetic and Genomic Medicine, Institute of Medical Genetics, Cardiff University, Cardiff (H.J.W.), the Department of Clinical Neurosciences (T.R., W.W., R.H., P.F.C.), the Medical Research Council (MRC) Mitochondrial Biology Unit (T.R., W.W., P.Y.-W.-M., P.F.C.), the Department of Paediatrics (T.R.), the Department of Haematology (K.S., C. Penkett, S. Gräf, R.M., W.H.O., A.R.), the School of Clinical Medicine (K.R., E.L., R.A.F., K.P., F.L.R.), the Department of Medicine (S. Gräf), and Cambridge Centre for Brain Repair, Department of Clinical Neurosciences (P.Y.-W.-M.), University of Cambridge, NIHR BioResource, Cambridge University Hospitals (K.S., S.A., R.J., C. Penkett, E.D., S. 
Gräf, R.M., M.K., J.R.B., P.F.C., W.H.O., F.L.R.), and Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust (G.F., P.T., O.S.-B., S. Halsall, K.P., A. Wagner, S.G.M., N.B., M.K.), Cambridge Biomedical Campus, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge BRC (M.G.), Congenica (A.H., H.S.), Illumina Cambridge (A. Wolejko, B.H., G. Burns, S. Hunter, R.J.G., S.J.H., D. Bentley), NHS Blood and Transplant (W.H.O.), and Wellcome Sanger Institute (W.H.O.), Cambridge, the Health Economics Research Centre (J. Buchanan, S. Wordsworth) and the Wellcome Centre for Human Genetics (C. Camps, J.C.T.), University of Oxford, NIHR Oxford BRC (J. Buchanan, S. Wordsworth, J.D., C. Crichton, J.W., K.W., C. Camps, S.P., N.B.A.R., A.S., J.T., J.C.T.), the Oxford Centre for Genomic Medicine (A. de Burca, A.H.N.), and the Departments of Haematology (N.B.A.R.) and Neurology (A.S.), Oxford University Hospitals NHS Foundation Trust, Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital (C. Campbell, K.G., T. Lester, J.T.), the MRC Weatherall Institute of Molecular Medicine (N.K., N.B.A.R., A.O.M.W.) and the Oxford Epilepsy Research Group (A.S.), Nuffield Department of Clinical Neurosciences (A.H.N.), University of Oxford, and the Department of Clinical Immunology (S.P.), John Radcliffe Hospital, Oxford, Peninsula Clinical Genetics Service, Royal Devon and Exeter NHS Foundation Trust (E.B.), and the University of Exeter Medical School (E.B., C.F.W.), Royal Devon and Exeter Hospital (S.E.), Exeter, Newcastle Eye Centre, Royal Victoria Infirmary (A.C.B.), the Institute of Genetic Medicine, Newcastle University, International Centre for Life (V.S., P. Brennan), Wellcome Centre for Mitochondrial Research, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University (G.S.G., R.H., A.M.S., D.M.T., R. 
Quinton, R.M., R.W.T., J.A.S.), Highly Specialised Mitochondrial Service (G.S.G., A.M.S., D.M.T., R.M., R.W.T.) and Northern Genetics Service (J. Burn), Newcastle upon Tyne Hospitals NHS Foundation Trust (J.A.S.), and NIHR Newcastle BRC (G.S.G., D.M.T., J.A.S.), Newcastle upon Tyne, the Institute of Cancer and Genomic Sciences, Institute of Biomedical Research, University of Birmingham (C. Palles), and Birmingham Women's Hospital (D.M.), Birmingham, the Genomic Informatics Group (E.G.S.), University Hospital Southampton (I.K.T.), and the University of Southampton (I.K.T.), Southampton, Liverpool Women's NHS Foundation Trust, Liverpool (A. Douglas), the School of Cellular and Molecular Medicine, University of Bristol, Bristol (A.D.M.), and Yorkshire and Humber, Sheffield Children's Hospital, Sheffield (G.W.) - all in the United Kingdom; Fabric Genomics, Oakland (M. Babcock, M.G.R.), and the Ophthalmology Department, University of California, San Francisco School of Medicine, San Francisco (A.T.M.) - both in California; the Jackson Laboratory for Genomic Medicine, Farmington, CT (P.N.R.); and the Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis (M.H.).

Background: The U.K. 100,000 Genomes Project is in the process of investigating the role of genome sequencing in patients with undiagnosed rare diseases after usual care and the alignment of this research with health care implementation in the U.K. National Health Service. Other parts of this project focus on patients with cancer and infection.

Methods: We conducted a pilot study involving 4660 participants from 2183 families, among whom 161 disorders covering a broad spectrum of rare diseases were present. We collected data on clinical features with the use of Human Phenotype Ontology terms, undertook genome sequencing, applied automated variant prioritization on the basis of applied virtual gene panels and phenotypes, and identified novel pathogenic variants through research analysis.

Results: Diagnostic yields varied among family structures and were highest in family trios (both parents and a proband) and families with larger pedigrees. Diagnostic yields were much higher for disorders likely to have a monogenic cause (35%) than for disorders likely to have a complex cause (11%). Diagnostic yields for intellectual disability, hearing disorders, and vision disorders ranged from 40 to 55%. We made genetic diagnoses in 25% of the probands. A total of 14% of the diagnoses were made by means of the combination of research and automated approaches, which was critical for cases in which we found etiologic noncoding, structural, and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohortwide burden testing across 57,000 genomes enabled the discovery of three new disease genes and 19 new associations. Of the genetic diagnoses that we made, 25% had immediate ramifications for clinical decision making for the patients or their relatives.

Conclusions: Our pilot study of genome sequencing in a national health care system showed an increase in diagnostic yield across a range of rare diseases. (Funded by the National Institute for Health Research and others.)

DOI: http://dx.doi.org/10.1056/NEJMoa2035790
November 2021

Response to Biesecker et al.

Am J Hum Genet 2021 09;108(9):1807-1808

Departments of Pediatrics, Obstetrics and Gynecology, and Epidemiology, University of Florida College of Medicine and College of Public Health and Health Professions, Gainesville, FL 32610, USA.


DOI: http://dx.doi.org/10.1016/j.ajhg.2021.07.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456153
September 2021

Interpretable prioritization of splice variants in diagnostic next-generation sequencing.

Am J Hum Genet 2021 09 21;108(9):1564-1577. Epub 2021 Jul 21.

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA.

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.
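
The information-content features mentioned in the abstract can be illustrated with a small sketch. It scores a wild-type and a variant sequence against a position frequency matrix using one common formulation of individual information, R_i = sum over positions of (2 + log2 f(b, l)), and reports the change. The 4-nucleotide matrix, the sequences, and the function name are invented for illustration. This is not the SQUIRLS implementation, and real splice-site matrices are longer and estimated from annotated sites.

```python
import math

# Hypothetical per-position nucleotide frequencies for a toy 4-nt "site".
# All frequencies are non-zero, so no pseudocounts are needed here.
TOY_FREQS = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.25, "T": 0.5},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq, freqs=TOY_FREQS):
    """Sum of (2 + log2 f(base, position)) over the sequence, in bits."""
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(freqs[i][base]) for i, base in enumerate(seq))


wild_type = individual_information("AGTA")
variant = individual_information("AGCA")  # T>C at the third position
delta = variant - wild_type               # negative: the site is weakened
```

A drop in information content at a canonical site, or a gain at a nearby cryptic site, is the kind of small interpretable feature that a downstream classifier could consume.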

DOI: http://dx.doi.org/10.1016/j.ajhg.2021.06.014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456162
September 2021

HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction.

Bioinformatics 2021 Jul 7. Epub 2021 Jul 7.

AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy.

Motivation: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). "Hierarchy-unaware" classifiers, also known as "flat" methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while "hierarchy-aware" approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO.

Results: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide "TPR-safe" predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges.

Availability: Fully-tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io.

Supplementary Information: Supplementary data are available at Bioinformatics online.
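
To make the True-Path-Rule constraint concrete, the sketch below applies a plain top-down correction to flat scores: each term's score is clipped to the minimum corrected score of its parents, so a child is never predicted more confidently than an ancestor. This is a deliberately simple stand-in and not the isotonic-regression or TPR ensembles described in the abstract. The toy DAG, the scores, and the function name are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def top_down_correct(scores, parents):
    """Clip each term's score to the minimum corrected score of its parents.

    `parents` maps every term to a list of its parent terms (empty for roots).
    """
    # static_order() yields predecessors first, so parents precede children.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        parent_scores = [corrected[p] for p in parents.get(term, ())]
        corrected[term] = min([scores[term]] + parent_scores)
    return corrected


# Hypothetical four-term DAG: C has two parents, A and B.
toy_parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat_scores = {"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.8}

corrected = top_down_correct(flat_scores, toy_parents)
```

In the toy example the flat score of C (0.8) exceeds that of its parent A (0.4), which violates the rule, so the correction lowers C to 0.4 while leaving the other terms unchanged.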

DOI: http://dx.doi.org/10.1093/bioinformatics/btab485
July 2021

Betacoronavirus-specific alternate splicing.

bioRxiv 2021 Jul 2. Epub 2021 Jul 2.

Viruses can subvert a number of cellular processes in order to block innate antiviral responses, and many viruses interact with cellular splicing machinery. SARS-CoV-2 infection was shown to suppress global mRNA splicing, and at least 10 SARS-CoV-2 proteins bind specifically to one or more human RNAs. Here, we investigate 17 published experimental and clinical datasets related to SARS-CoV-2 infection, as well as datasets from the betacoronaviruses SARS-CoV and MERS and from Streptococcus pneumoniae, HCV, Zika virus, Dengue virus, influenza H3N2, and RSV. We show that genes showing differential alternative splicing in SARS-CoV-2 have a similar functional profile to those of SARS-CoV and MERS and affect a diverse set of genes and biological functions, including many closely related to virus biology. Additionally, the differentially spliced transcripts of cells infected by coronaviruses were more likely to undergo intron retention, to contain a pseudouridine modification, and to have fewer exons than differentially spliced transcripts in the control groups. Viral load in clinical COVID-19 samples was correlated with isoform distribution of differentially spliced genes. A significantly higher number of ribosomal genes are affected by differential alternative splicing (DAS) and differential gene expression (DGE) in betacoronavirus samples, and the betacoronavirus differentially spliced genes are depleted for binding sites of RNA-binding proteins. Our results demonstrate characteristic patterns of differential splicing in cells infected by SARS-CoV-2, SARS-CoV, and MERS, potentially affecting a broad range of genes and cellular functions.

DOI: http://dx.doi.org/10.1101/2021.07.02.450920
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8259905
July 2021

E2F6 initiates stable epigenetic silencing of germline genes during embryonic development.

Nat Commun 2021 06 11;12(1):3582. Epub 2021 Jun 11.

University of Strasbourg, Strasbourg, France.

In mouse development, long-term silencing by CpG island DNA methylation is specifically targeted to germline genes; however, the molecular mechanisms of this specificity remain unclear. Here, we demonstrate that the transcription factor E2F6, a member of the polycomb repressive complex 1.6 (PRC1.6), is critical to target and initiate epigenetic silencing at germline genes in early embryogenesis. Genome-wide, E2F6 binds preferentially to CpG islands in embryonic cells. E2F6 cooperates with MGA to silence a subgroup of germline genes in mouse embryonic stem cells and in embryos, a function that critically depends on the E2F6 marked box domain. Inactivation of E2f6 leads to a failure to deposit CpG island DNA methylation at these genes during implantation. Furthermore, E2F6 is required to initiate epigenetic silencing in early embryonic cells but becomes dispensable for the maintenance in differentiated cells. Our findings elucidate the mechanisms of epigenetic targeting of germline genes and provide a paradigm for how transient repression signals by DNA-binding factors in early embryonic cells are translated into long-term epigenetic silencing during mouse development.

DOI: http://dx.doi.org/10.1038/s41467-021-23596-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8195999
June 2021

Curation and expansion of Human Phenotype Ontology for defined groups of inborn errors of immunity.

J Allergy Clin Immunol 2021 May 12. Epub 2021 May 12.

Department of Immunology, Great Ormond Street (GOS) Hospital for Children NHS Foundation Trust, London, United Kingdom.

Background: Accurate, detailed, and standardized phenotypic descriptions are essential to support diagnostic interpretation of genetic variants and to discover new diseases. The Human Phenotype Ontology (HPO), extensively used in rare disease research, provides a rich collection of vocabulary with standardized phenotypic descriptions in a hierarchical structure. However, to date, the use of HPO has not yet been widely implemented in the field of inborn errors of immunity (IEIs), mainly due to a lack of comprehensive IEI-related terms.

Objectives: We sought to systematically review available terms in HPO for the depiction of IEIs, to expand HPO, yielding more comprehensive sets of terms, and to reannotate IEIs with HPO terms to provide accurate, standardized phenotypic descriptions.

Methods: We initiated a collaboration involving expert clinicians, geneticists, researchers working on IEIs, and bioinformaticians. Multiple branches of the HPO tree were restructured and extended on the basis of expert review. Our ontology-guided machine learning coupled with a 2-tier expert review was applied to reannotate defined subgroups of IEIs.

Results: We revised and expanded 4 main branches of the HPO tree. Here, we reannotated 73 diseases from 4 International Union of Immunological Societies-defined IEI disease subgroups with HPO terms. We achieved a 4.7-fold increase in the number of phenotypic terms per disease. Given the new HPO annotations, we demonstrated improved ability to computationally match selected IEI cases to their known diagnosis, and improved phenotype-driven disease classification.

Conclusions: Our targeted expansion and reannotation presents enhanced precision of disease annotation, will enable superior HPO-based IEI characterization, and hence benefit both IEI diagnostic and research activities.

DOI: http://dx.doi.org/10.1016/j.jaci.2021.04.033
May 2021

User testing of a diagnostic decision support system with machine-assisted chart review to facilitate clinical genomic diagnosis.

BMJ Health Care Inform 2021 May;28(1)

SimulConsult, Inc, Chestnut Hill, Massachusetts, USA.

Objectives: There is a need in clinical genomics for systems that assist in clinical diagnosis, analysis of genomic information and periodic reanalysis of results, and can use information from the electronic health record to do so. Such systems should be built using the concepts of human-centred design, fit within clinical workflows and provide solutions to priority problems.

Methods: We adapted a commercially available diagnostic decision support system (DDSS) to use extracted findings from a patient record and combine them with genomic variant information in the DDSS interface. Three representative patient cases were created in a simulated clinical environment for user testing. A semistructured interview guide was created to illuminate factors relevant to human factors in clinical decision support (CDS) design and organisational implementation.

Results: Six individuals completed the user testing process. Tester responses were positive and noted good fit with real-world clinical genetics workflow. Technical issues related to interface, interaction and design were minor and fixable. Testers suggested solving issues related to terminology and usability through training and infobuttons. Time savings were estimated at 30%-50%, and additional uses such as in-house clinical variant analysis were suggested to increase fit with workflow and to further address priority problems.

Conclusion: This study provides preliminary evidence for usability, workflow fit, acceptability and implementation potential of a modified DDSS that includes machine-assisted chart review. Continued development and testing using principles from human-centred design and implementation science are necessary to improve technical functionality and acceptability for multiple stakeholders and organisational implementation potential to improve the genomic diagnosis process.

DOI: http://dx.doi.org/10.1136/bmjhci-2021-100331
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8108675
May 2021

Modeling seizures in the Human Phenotype Ontology according to contemporary ILAE concepts makes big phenotypic data tractable.

Epilepsia 2021 06 5;62(6):1293-1305. Epub 2021 May 5.

Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Objective: The clinical features of epilepsy determine how it is defined, which in turn guides management. Therefore, consideration of the fundamental clinical entities that comprise an epilepsy is essential in the study of causes, trajectories, and treatment responses. The Human Phenotype Ontology (HPO) is used widely in clinical and research genetics for concise communication and modeling of clinical features, allowing extracted data to be harmonized using logical inference. We sought to redesign the HPO seizure subontology to improve its consistency with current epileptological concepts, supporting the use of large clinical data sets in high-throughput clinical and research genomics.

Methods: We created a new HPO seizure subontology based on the 2017 International League Against Epilepsy (ILAE) Operational Classification of Seizure Types, and integrated concepts of status epilepticus, febrile, reflex, and neonatal seizures at different levels of detail. We compared the HPO seizure subontology prior to, and following, our revision, according to the information that could be inferred about the seizures of 791 individuals from three independent cohorts: 2 previously published and 150 newly recruited individuals. Each cohort's data were provided in a different format and harmonized using the two versions of the HPO.

Results: The new seizure subontology increased the number of descriptive concepts for seizures 5-fold. The number of seizure descriptors that could be annotated to the cohort increased by 40% and the total amount of information about individuals' seizures increased by 38%. The most important qualitative difference was the relationship of focal to bilateral tonic-clonic seizure to generalized-onset and focal-onset seizures.

Significance: We have generated a detailed contemporary conceptual map for harmonization of clinical seizure data, implemented in the official 2020-12-07 HPO release and freely available at hpo.jax.org. This will help to overcome the phenotypic bottleneck in genomics, facilitate reuse of valuable data, and ultimately improve diagnostics and precision treatment of the epilepsies.
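
Harmonization by logical inference relies on the fact that an annotation to a specific ontology term implies all of its ancestors. The sketch below shows that propagation on a toy hierarchy. The term labels and is_a edges are illustrative assumptions and are not copied from an HPO release; the `with_ancestors` helper is likewise hypothetical.

```python
# Toy is_a edges (child -> parents). Labels and edges are illustrative
# assumptions, not taken from an official HPO release.
PARENTS = {
    "Seizure": [],
    "Focal-onset seizure": ["Seizure"],
    "Generalized-onset seizure": ["Seizure"],
    "Focal to bilateral tonic-clonic seizure": ["Focal-onset seizure"],
}


def with_ancestors(term, parents=PARENTS):
    """Return the term together with every ancestor reachable via is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# A single specific annotation yields three usable descriptors.
inferred = with_ancestors("Focal to bilateral tonic-clonic seizure")
```

Two cohorts recorded at different levels of detail can then be compared at the most specific ancestor they share, which is why the placement of a term in the hierarchy changes how much can be inferred from the same raw data.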

DOI: http://dx.doi.org/10.1111/epi.16908
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272408
June 2021

Cyclooxygenase inhibitor use is associated with increased COVID-19 severity.

medRxiv 2021 Apr 20. Epub 2021 Apr 20.

Background: Cyclooxygenase (COX) inhibitors including non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community acquired pneumonia and other respiratory tract infections (RTIs). Conclusive data are not available about potential beneficial or adverse effects of COX inhibitors on COVID-19 patients.

Methods: We conducted a retrospective, multi-center observational study by leveraging the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative (N3C). Potential associations of eight COX inhibitors with COVID-19 severity were assessed using ordinal logistic regression (OLR) on treatment with the medication in question after matching by treatment propensity as predicted by age, race, ethnicity, gender, smoking status, comorbidities, and BMI. Cox proportional hazards analysis was used to estimate the correlation of medication use with morbidity for eight subcohorts defined by common indications for COX inhibitors.

Results: OLR revealed statistically significant associations between use of any of five COX inhibitors and increased severity of COVID-19. For instance, the odds ratio of aspirin use in the osteoarthritis cohort (n=2266 patients) was 3.25 (95% CI 2.76 - 3.83). Aspirin and acetaminophen were associated with increased mortality.

Conclusions: The association between use of COX inhibitors and COVID-19 severity was consistent across five COX inhibitors and multiple indication subcohorts. Our results align with earlier reports associating NSAID use with complications in RTI patients. Further research is needed to characterize the precise risk of individual COX inhibitors in COVID-19 patients.

DOI: http://dx.doi.org/10.1101/2021.04.13.21255438
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077581
April 2021

Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information.

medRxiv 2021 Mar 26. Epub 2021 Mar 26.

Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust studies of Long COVID that consistently capture variation in long-term outcomes. Even the condition itself goes by three terms, most widely "Long COVID", but also "post-acute COVID-19 syndrome (PACS)" or "post-acute sequelae of SARS-CoV-2 infection (PASC)". In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic itself. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.
Source
DOI: http://dx.doi.org/10.1101/2021.03.20.21253896
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8010765
March 2021

PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology.

Bioinformatics 2021 Jan 20. Epub 2021 Jan 20.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.

Motivation: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation.

Results: In this paper, we propose PhenoTagger, a hybrid method that combines dictionary-based and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (an n-gram from the input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition to investigate the effect of training on different ontologies. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring manually annotated training data, achieves performance competitive with state-of-the-art supervised methods.

Availability: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.

Supplementary Information: Supplementary data are available at Bioinformatics online.
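
The dictionary half of the hybrid approach described in this abstract can be illustrated with a minimal sketch. This is not PhenoTagger's implementation: the three-entry dictionary, the function name `match_ngrams`, and the longest-match-first lookup are assumptions made for illustration, and the machine learning step is omitted entirely.

```python
import re

# Toy dictionary mapping lowercase synonyms to HPO identifiers.
# Illustrative subset only; a real dictionary would be built from the full HPO.
HPO_DICT = {
    "seizure": "HP:0001250",
    "fatigue": "HP:0012378",
    "muscle weakness": "HP:0001324",
}

def match_ngrams(sentence, dictionary, max_n=3):
    """Return (phrase, hpo_id) pairs found by longest-first n-gram lookup."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate n-gram starting at token i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                hits.append((phrase, dictionary[phrase]))
                i += n  # skip past the matched phrase
                matched = True
                break
        if not matched:
            i += 1
    return hits
```

Exact lookup of this kind gives the high precision and lower recall that the abstract attributes to dictionary-based methods, since any synonym missing from the dictionary is silently skipped.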
Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab019
January 2021

The case for open science: rare diseases.

JAMIA Open 2020 Oct 11;3(3):472-486. Epub 2020 Sep 11.

Linus Pauling Institute, Oregon State University, Corvallis, Oregon, USA.

The premise of Open Science is that research and medical management will progress faster if data and knowledge are openly shared. The value of Open Science is nowhere more important and appreciated than in the rare disease (RD) community. Research into RDs has been limited by insufficient patient data and resources, a paucity of trained disease experts, and lack of therapeutics, leading to long delays in diagnosis and treatment. These issues can be ameliorated by following the principles and practices of sharing that are intrinsic to Open Science. Here, we describe how the RD community has adopted the core pillars of Open Science, adding new initiatives to promote care and research for RD patients and, ultimately, for all of medicine. We also present recommendations that can advance Open Science more globally.
Source
DOI: http://dx.doi.org/10.1093/jamiaopen/ooaa030
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660964
October 2020

A CRISPR-Cas9-engineered mouse model for GPI-anchor deficiency mirrors human phenotypes and exhibits hippocampal synaptic dysfunctions.

Proc Natl Acad Sci U S A 2021 01;118(2)

Institute for Genomic Statistics and Bioinformatics, University of Bonn, 53127 Bonn, Germany.

Pathogenic germline mutations in lead to glycosylphosphatidylinositol biosynthesis deficiency (GPIBD). Individuals with pathogenic biallelic mutations in genes of the glycosylphosphatidylinositol (GPI)-anchor pathway exhibit cognitive impairments, motor delay, and often epilepsy. Thus far, the pathophysiology underlying the disease remains unclear, and suitable rodent models that mirror all symptoms observed in human patients have not been available. Therefore, we used CRISPR-Cas9 to introduce the most prevalent hypomorphic missense mutation in European patients, :c.1022C > A (p.A341E), at a site that is conserved in mice. Mirroring the human pathology, mutant mice exhibited deficits in motor coordination, cognitive impairments, and alterations in sociability and sleep patterns, as well as increased seizure susceptibility. Furthermore, immunohistochemistry revealed reduced synaptophysin immunoreactivity in mice, and electrophysiology recordings showed decreased hippocampal synaptic transmission that could underlie impaired memory formation. In single-cell RNA sequencing, -hippocampal cells exhibited changes in gene expression, most prominently in a subtype of microglia and subicular neurons. A significant reduction in transcript levels in several cell clusters suggested a link to the signaling pathway of GPI-anchored ephrins. We also observed elevated levels of transcripts, which might affect histamine metabolism with consequences for circadian rhythm. This mouse model will not only open the doors to further investigation into the pathophysiology of GPIBD, but will also deepen our understanding of the role of GPI-anchor-related pathways in brain development.
Source
DOI: http://dx.doi.org/10.1073/pnas.2014481118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7812744
January 2021

The Human Phenotype Ontology in 2021.

Nucleic Acids Res 2021 01;49(D1):D1207-D1217

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367 Belvaux, Luxembourg.

The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines, and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.
Source
DOI: http://dx.doi.org/10.1093/nar/gkaa1043
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778952
January 2021

KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.

Patterns (N Y) 2021 Jan 9;2(1):100155. Epub 2020 Nov 9.

Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
Source
DOI: http://dx.doi.org/10.1016/j.patter.2020.100155
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7649624
January 2021

Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project.

Nat Rev Nephrol 2020 11 16;16(11):686-696. Epub 2020 Sep 16.

Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.

An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National collaborative efforts such as the Kidney Precision Medicine Project are working towards this goal through the collection and integration of large, disparate clinical, biological and imaging data from patients with kidney disease. Ontologies are powerful tools that facilitate these efforts by enabling researchers to organize and make sense of different data elements and the relationships between them. Ontologies are critical to support the types of big data analysis necessary for kidney precision medicine, where heterogeneous clinical, imaging and biopsy data from diverse sources must be combined to define a patient's phenotype. The development of two new ontologies - the Kidney Tissue Atlas Ontology and the Ontology of Precision Medicine and Investigation - will support the creation of the Kidney Tissue Atlas, which aims to provide a comprehensive molecular, cellular and anatomical map of the kidney. These ontologies will improve the annotation of kidney-relevant data, and eventually lead to new definitions of kidney disease in support of precision medicine.
Source
DOI: http://dx.doi.org/10.1038/s41581-020-00335-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8012202
November 2020

KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response.

bioRxiv 2020 Aug 18. Epub 2020 Aug 18.

Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks - the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates biomedical data to produce knowledge graphs (KGs) for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.

Bigger Picture: An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications including machine learning tasks, hypothesis-based querying, and browsable user interface to enable researchers to explore COVID-19 data and discover relationships.
Source
DOI: http://dx.doi.org/10.1101/2020.08.17.254839
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7444288
August 2020

Ontologies, Knowledge Representation, and Machine Learning for Translational Research: Recent Contributions.

Yearb Med Inform 2020 Aug 21;29(1):159-162. Epub 2020 Aug 21.

Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR, USA.

Objectives: To select, present, and summarize the most relevant papers published in 2018 and 2019 in the field of Ontologies and Knowledge Representation, with a particular focus on the intersection between Ontologies and Machine Learning.

Methods: A comprehensive review of the medical informatics literature was performed to select the most interesting papers published in 2018 and 2019 that document the utility of ontologies for computational analysis, including machine learning.

Results: Fifteen articles were selected for inclusion in this survey paper. The chosen articles belong to three major themes: (i) the identification of phenotypic abnormalities in electronic health record (EHR) data using the Human Phenotype Ontology; (ii) word and node embedding algorithms to supplement natural language processing (NLP) of EHRs and other medical texts; and (iii) hybrid ontology and NLP-based approaches to extracting structured and unstructured components of EHRs.

Conclusion: Unprecedented amounts of clinically relevant data are now available for clinical and research use. Machine learning is increasingly being applied to these data sources for predictive analytics, precision medicine, and differential diagnosis. Ontologies have become an essential component of software pipelines designed to extract, code, and analyze clinical information by machine learning algorithms. The intersection of machine learning and semantics is proving to be an innovative space in clinical research.
Source
DOI: http://dx.doi.org/10.1055/s-0040-1701991
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442528
August 2020

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.

J Am Med Inform Assoc 2021 03;28(3):427-443

IQVIA, Durham, North Carolina, USA.

Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.

Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.

Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.

Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Source
DOI: http://dx.doi.org/10.1093/jamia/ocaa196
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7454687
March 2021

Interpretable Clinical Genomics with a Likelihood Ratio Paradigm.

Am J Hum Genet 2020 09 4;107(3):403-417. Epub 2020 Aug 4.

William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK.

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
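
The likelihood ratio framework mentioned in this abstract rests on a standard identity: posttest odds equal pretest odds multiplied by the likelihood ratio of each observation. A minimal sketch of that arithmetic follows. It is not LIRICAL's code; the function name is hypothetical, and multiplying the individual ratios assumes the observations are conditionally independent, which is an assumption of this sketch.

```python
from math import prod

def posttest_probability(pretest_prob, likelihood_ratios):
    """Combine a pretest probability with per-observation likelihood ratios.

    Assumes 0 < pretest_prob < 1 and conditionally independent observations,
    so that the individual likelihood ratios can simply be multiplied.
    """
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)
```

For example, a candidate disease with a pretest probability of 1% and two observed features that each carry a likelihood ratio of 10 ends up with posttest odds of 100/99, or a posttest probability of 100/199 (about 50%). Because each feature contributes its own factor, the effect of any single observation on the final estimate can be read off directly.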
Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.06.021
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7477017
September 2020

HBA-DEALS: accurate and simultaneous identification of differential expression and splicing using hierarchical Bayesian analysis.

Genome Biol 2020 07 13;21(1):171. Epub 2020 Jul 13.

The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA.

We present Hierarchical Bayesian Analysis of Differential Expression and ALternative Splicing (HBA-DEALS), which simultaneously characterizes differential expression and splicing in cohorts. HBA-DEALS attains state of the art or better performance for both expression and splicing and allows genes to be characterized as having differential gene expression, differential alternative splicing, both, or neither. HBA-DEALS analysis of GTEx data demonstrated sets of genes that show predominant DGE or DAST across multiple tissue types. These sets have pervasive differences with respect to gene structure, function, membership in protein complexes, and promoter architecture.
Source
DOI: http://dx.doi.org/10.1186/s13059-020-02072-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7358203
July 2020

Correction: Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes.

Genet Med 2020 Aug;22(8):1427

Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada.

An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Source
DOI: http://dx.doi.org/10.1038/s41436-020-0866-6
August 2020

parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

Gigascience 2020 05;9(5)

Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy.

Background: Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data.

Results: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.

Conclusions: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.
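
The combination of partitioning, oversampling, and undersampling described in this abstract can be sketched in a few lines. This is an illustration only and not parSMURF's code, which is written in C++ with MPI/OpenMP: the function name and parameters are hypothetical, plain duplication stands in for whatever oversampling technique the tool actually uses, and training one learner per partition is left out.

```python
import random

def balanced_partitions(positives, negatives, n_parts,
                        over_factor=2, under_ratio=1, seed=0):
    """Split negatives into n_parts and build one balanced training set per part.

    positives and negatives are lists of examples. Each partition pairs the
    oversampled positives (simple duplication, an assumption of this sketch)
    with a random undersample of that partition's negatives.
    """
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    parts = []
    for p in range(n_parts):
        chunk = shuffled[p::n_parts]           # disjoint slice of the negatives
        pos = positives * over_factor          # oversample the rare class
        n_neg = min(len(chunk), under_ratio * len(pos))
        neg = rng.sample(chunk, n_neg)         # undersample the common class
        parts.append((pos, neg))
    return parts
```

Since every partition is built independently of the others, each could in principle be handed to a separate worker, which is the property that makes this style of ensemble straightforward to parallelize.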
Source
DOI: http://dx.doi.org/10.1093/gigascience/giaa052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7244787
May 2020

Supplementation of the ESID registry working definitions for the clinical diagnosis of inborn errors of immunity with encoded human phenotype ontology (HPO) terms.

J Allergy Clin Immunol Pract 2020 05;8(5):1778

Research Unit for Pediatric Hematology and Immunology, Division of Pediatric Hemato-Oncology, Department of Pediatrics and Adolescent Medicine, Medical University Graz, Graz, Austria.

Source
DOI: http://dx.doi.org/10.1016/j.jaip.2020.02.019
May 2020

Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes.

Genet Med 2020 08 5;22(8):1391-1400. Epub 2020 May 5.

Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada.

Purpose: Computational documentation of genetic disorders is highly reliant on structured data for differential diagnosis, pathogenic variant identification, and patient matchmaking. However, most information on rare diseases (RDs) exists in freeform text, such as academic literature. To increase availability of structured RD data, we developed a crowdsourcing approach for collecting phenotype information using student assignments.

Methods: We developed Phenotate, a web application for crowdsourcing disease phenotype annotations through assignments for undergraduate genetics students. Using student-collected data, we generated composite annotations for each disease through a machine learning approach. These annotations were compared with those from clinical practitioners and gold standard curated data.

Results: Deploying Phenotate in five undergraduate genetics courses, we collected annotations for 22 diseases. Student-sourced annotations showed strong similarity to gold standards, with F-measures ranging from 0.584 to 0.868. Furthermore, clinicians used Phenotate annotations to identify diseases with comparable accuracy to other annotation sources and gold standards. For six disorders, no gold standards were available, allowing us to create some of the first structured annotations for them, while students demonstrated ability to research RDs.

Conclusion: Phenotate enables crowdsourcing RD phenotypic annotations through educational assignments. Presented as an intuitive web-based tool, it offers pedagogical benefits and augments the computable RD knowledgebase.
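
The F-measures reported in the Results compare student annotations with gold standards. The abstract does not say how the comparison was scored, so the following is a minimal sketch that assumes exact matching between two sets of term identifiers; the function name is hypothetical, and an ontology-aware comparison would give partial credit that this version does not.

```python
def f_measure(predicted, gold):
    """F1 score between two collections of annotation identifiers (exact match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With three predicted terms, four gold terms, and two in common, precision is 2/3, recall is 1/2, and the F-measure is 4/7.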
Source
DOI: http://dx.doi.org/10.1038/s41436-020-0812-7
August 2020

An Improved Phenotype-Driven Tool for Rare Mendelian Variant Prioritization: Benchmarking Exomiser on Real Patient Whole-Exome Data.

Genes (Basel) 2020 04 23;11(4). Epub 2020 Apr 23.

William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK.

Next-generation sequencing has revolutionized rare disease diagnostics, but many patients remain without a molecular diagnosis, particularly because many candidate variants usually survive despite strict filtering. Exomiser was launched in 2014 as a Java tool that performs an integrative analysis of patients' sequencing data and their phenotypes encoded with Human Phenotype Ontology (HPO) terms. It prioritizes variants by leveraging information on variant frequency, predicted pathogenicity, and gene-phenotype associations derived from human diseases, model organisms, and protein-protein interactions. Early published releases of Exomiser were able to prioritize disease-causative variants as top candidates in up to 97% of simulated whole-exomes. The real patient datasets tested and published so far are very limited in size. Here, we present the latest Exomiser version 12.0.1 with many new features. We assessed the performance using a set of 134 whole-exomes from patients with a range of rare retinal diseases and known molecular diagnosis. Using default settings, Exomiser ranked the correctly diagnosed variants as the top candidate in 74% of the dataset and within the top 5 in 94%; not using the patients' HPO profiles (i.e., variant-only analysis) decreased the performance to 3% and 27%, respectively. In conclusion, Exomiser is an effective support tool for rare Mendelian phenotype-driven variant prioritization.
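
Benchmark figures such as "top candidate" and "top 5" are top-k rates over per-case ranks of the known diagnostic variant. A minimal sketch of that calculation follows; the function name is hypothetical, and the ranks in the usage note are invented for illustration, not taken from the study.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose known diagnostic variant is ranked within the top k.

    ranks holds one 1-based rank per case, or None when the variant was
    filtered out and never ranked.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)
```

For the hypothetical ranks `[1, 1, 3, 7, None]`, `top_k_rate(ranks, 1)` is 0.4 and `top_k_rate(ranks, 5)` is 0.6. Counting unranked cases as misses keeps the denominator equal to the full cohort, which matters when comparing settings that filter variants differently.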
Source
DOI: http://dx.doi.org/10.3390/genes11040460
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7230372
April 2020