Pubfacts - Scientific Publication Data
SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data.

Authors:
Chao Pang, Annet Sollie, Anna Sijtsma, Dennis Hendriksen, Bart Charbon, Mark de Haan, Tommy de Boer, Fleur Kelpin, Jonathan Jetten, Joeri K van der Velde, Nynke Smidt, Rolf Sijmons, Hans Hillege, Morris A Swertz

Database (Oxford) 2015 Sep 18;2015. Epub 2015 Sep 18.

University of Groningen, University Medical Centre Groningen, Genomics Coordination Centre, Department of Genetics, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Department of Epidemiology, Groningen, The Netherlands; and LifeLines Cohort Study and Biobank, Groningen, The Netherlands

There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Diseases) and HPO (Human Phenotype Ontology). This data curation is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon-delimited format) to a target coding system, which is uploaded as an Excel spreadsheet or in OWL (Web Ontology Language) or OBO (Open Biomedical Ontologies) format. It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram-based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Considering only the rank 1 candidate in each shortlist, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects.
Database URL: http://molgenis.org/sorta or as an open source download from http://www.molgenis.org/wiki/SORTA.
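
The n-gram shortlisting step described in the abstract can be illustrated with a minimal Python sketch. This is not SORTA's implementation: the abstract does not specify the n-gram size or scoring formula, so the example assumes character bigrams scored with a Dice coefficient, and it leaves out the Lucene search and the learning from expert-confirmed matches. The function names (`bigrams`, `similarity`, `shortlist`) and the codes `X001` to `X003` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def bigrams(text: str) -> Set[str]:
    """Collect character bigrams per lower-cased word, padding each word with ^ and $."""
    grams: Set[str] = set()
    for word in text.lower().split():
        padded = f"^{word}$"
        grams.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return grams


def similarity(a: str, b: str) -> float:
    """Dice coefficient over the two bigram sets, scaled to 0-100 (an assumed scoring choice)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return 100.0 * 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def shortlist(value: str, codes: Dict[str, str], top_k: int = 3) -> List[Tuple[str, str, float]]:
    """Rank every target code by similarity of its label to the input value; keep the top_k."""
    scored = [(code, label, similarity(value, label)) for code, label in codes.items()]
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_k]


# Hypothetical target coding system (code -> label), not real MET or HPO codes.
codes = {
    "X001": "running",
    "X002": "cycling",
    "X003": "swimming",
}

# A misspelled free-text value still ranks the intended label first.
for code, label, score in shortlist("runing", codes):
    print(f"{code}\t{label}\t{score:.1f}")
```

In a workflow like the one the abstract describes, a human expert would then confirm or reject the rank 1 candidate instead of searching the whole coding system by hand.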

Source
DOI: http://dx.doi.org/10.1093/database/bav089
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4574036
May 2016

Publication Analysis

Top Keywords

coding system: 12
data: 8
data values: 8
locally coded: 8
free text: 8
sorta: 7
values: 5
n-gram based: 4
based matching: 4
algorithms learn: 4
matching algorithms: 4
data lucene: 4
shortlists candidate: 4
automatically shortlists: 4
candidate codes: 4
codes data: 4
learn matches: 4
lucene n-gram: 4
human experts: 4
biobank sorta: 4

Similar Publications

Long non-coding RNA THRIL inhibits miRNA-24-3p to upregulate neuropilin-1 to aggravate cerebral ischemia-reperfusion injury through regulating the nuclear factor κB p65 signaling.

Authors:
Feng Kuai, Liang Zhou, Jianping Zhou, Xuemei Sun, Wanli Dong

Aging (Albany NY) 2021 Mar 6;13. Epub 2021 Mar 6.

Department of Neurology, The First Affiliated Hospital of Soochow University, Suzhou 215006, China.

Purpose: The aim of this study was to investigate the role of the tumor necrosis factor and HNRNPL related immunoregulatory long non-coding RNA (THRIL) in cerebral ischemia-reperfusion injury.

Methods: A rat middle cerebral artery occlusion/ischemia-reperfusion (MCAO/IR) model and an oxygen glucose deprivation/reoxygenation (OGD/R) cell model were constructed. THRIL was knocked down using siTHRIL.

March 2021

Whole genomes reveal multiple candidate genes and pathways involved in the immune response of dolphins to a highly infectious virus.

Authors:
Kimberley C Batley, Jonathan Sandoval-Castillo, Catherine Kemper, Nikki Zanardo, Ikuko Tomo, Luciano B Beheregaray, Luciana M Möller

Mol Ecol 2021 Mar 6. Epub 2021 Mar 6.

Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Adelaide, South Australia, 5042, Australia.

Wildlife species are challenged by various infectious diseases that act as important demographic drivers of populations and have become a great conservation concern particularly under growing environmental changes. The new era of whole genome sequencing provides new opportunities and avenues to explore the role of genetic variants in the plasticity of immune responses, particularly in non-model systems. Cetacean morbillivirus (CeMV) has emerged as a major viral threat to cetacean populations worldwide, contributing to the death of thousands of individuals of multiple dolphin and whale species.

March 2021

Characteristics of the MACS-WIHS Combined Cohort Study: Opportunities for Research on Aging With HIV in the Longest US Observational Study of HIV.

Authors:
Gypsyamber D'Souza, Fiona Bhondoekhan, Lorie Benning, Joseph B Margolick, Adebola A Adedimeji, Adaora A Adimora, Maria L Alcaide, Mardge H Cohen, Roger Detels, M Reuel Friedman, Susan Holman, Deborah J Konkle-Parker, Daniel Merenstein, Igho Ofotokun, Frank Palella, Sean Altekruse, Todd T Brown, Phyllis C Tien

Am J Epidemiol 2021 Mar 2. Epub 2021 Mar 2.

Department of Medicine, University of California San Francisco, San Francisco, California.

In 2019, NIH combined the Multicenter AIDS Cohort Study and the Women's Interagency HIV Study into the MACS/WIHS Combined Cohort Study (MWCCS). Participants completing a visit October 2018-September 2019 (targeted for MWCCS enrollment) are described by HIV serostatus and compared to people living with HIV (PLWH) in the U.S.

March 2021

Comparative analysis of chloroplast genomes of kenaf cytoplasmic male sterile line and its maintainer line.

Authors:
Danfeng Tang, Fan Wei, Ruiyang Zhou

Sci Rep 2021 Mar 5;11(1):5301. Epub 2021 Mar 5.

College of Agriculture, Guangxi University, Nanning, 530004, China.

Kenaf is a great source of bast fiber and is of significant industrial interest. Cytoplasmic male sterility (CMS) is the basis of heterosis utilization in kenaf. The chloroplast, an important organelle for photosynthesis, could be associated with CMS.

March 2021

Improving the accuracy of ICD-10 coding of morbidity/mortality data through the introduction of an electronic diagnostic terminology tool at the general hospitals in Lagos, Nigeria.

Authors:
Olawunmi Olagundoye, Kees van Boven, Olufunmilola Daramola, Kendra Njoku, Adenike Omosun

BMJ Open Qual 2021 Mar;10(1)

Planning, Research and Statistics, Lagos State Ministry of Health, Ikeja, Nigeria.

Background: Reliable information, which can only be derived from accurate data, is crucial to the success of the health system. Since encoded data on diagnoses and procedures are put to a broad range of uses, the accuracy of coding is imperative. Accuracy of coding with the International Classification of Diseases, 10th revision (ICD-10) is impeded by a manual coding process that is dependent on the medical records officers' level of experience/knowledge of medical terminologies.

March 2021
© 2021 PubFacts.