Publications by authors named "Todd Lingren"

34 Publications

The power of genetic diversity in genome-wide association studies of lipids.

Nature 2021 Dec 9;600(7890):675-679. Epub 2021 Dec 9.

Department of Clinical Biochemistry, Landspitali-National University Hospital of Iceland, Reykjavik, Iceland.

Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels, heart disease remains the leading cause of death worldwide. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine, we anticipate that increased diversity of participants will lead to more accurate and equitable application of polygenic scores in clinical practice.
December 2021

Evaluation of the MC4R gene across eMERGE network identifies many unreported obesity-associated variants.

Int J Obes (Lond) 2021 01 20;45(1):155-169. Epub 2020 Sep 20.

Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, USA.

Background/objectives: Melanocortin-4 receptor (MC4R) plays an essential role in food intake and energy homeostasis. More than 170 MC4R variants have been described over the past two decades, with conflicting reports regarding the prevalence and phenotypic effects of these variants in diverse cohorts. To determine the frequency of MC4R variants in a large cohort of different ancestries, we evaluated the MC4R coding region for 20,537 eMERGE participants with sequencing data plus an additional 77,454 independent individuals with genome-wide genotyping data at this locus.

Subjects/methods: The sequencing data were obtained from the eMERGE phase III study, in which multisample variant call format calls have been generated, curated, and annotated. In addition to penetrance estimation using body mass index (BMI) as a binary outcome, GWAS and PheWAS were performed using median BMI in linear regression analyses. All results were adjusted for principal components, age, sex, and sites of genotyping.

Results: Targeted sequencing data of MC4R revealed 125 coding variants in 1839 eMERGE participants including 30 unreported coding variants that were predicted to be functionally damaging. Highly penetrant unreported variants included L325I, E308K, D298N, S270F, F261L, T248A, D111V, and Y80F, in which seven participants had obesity class III defined as BMI ≥ 40 kg/m². In GWAS analysis, in addition to the known risk haplotype upstream of MC4R (best variant rs6567160; P = 5.36 × 10, Beta = 0.37), a novel rare haplotype was detected which was protective against obesity and encompassed the V103I variant with known gain-of-function properties (P = 6.23 × 10, Beta = -0.62). PheWAS analyses extended this protective effect of V103I to type 2 diabetes, diabetic nephropathy, and chronic renal failure independent of BMI.

Conclusions: MC4R screening in a large eMERGE cohort confirmed many previous findings, extended the MC4R pleiotropic effects, and discovered additional MC4R rare alleles that probably contribute to obesity.
January 2021

Integrating and Evaluating the Data Quality and Utility of Smart Pump Information in Detecting Medication Administration Errors: Evaluation Study.

JMIR Med Inform 2020 Sep 2;8(9):e19774. Epub 2020 Sep 2.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States.

Background: At present, electronic health records (EHRs) are the central focus of clinical informatics given their role as the primary source of clinical data. Despite their granularity, the EHR data heavily rely on manual input and are prone to human errors. Many other sources of data exist in the clinical setting, including digital medical devices such as smart infusion pumps. When incorporated with prescribing data from EHRs, smart pump records (SPRs) are capable of shedding light on actions that take place during the medication use process. However, harmonizing the 2 sources is hindered by multiple technical challenges, and the data quality and utility of SPRs have not been fully realized.

Objective: This study aims to evaluate the quality and utility of SPRs incorporated with EHR data in detecting medication administration errors. Our overarching hypothesis is that SPRs would contribute unique information in the medication use process, enabling more comprehensive detection of discrepancies and potential errors in medication administration.

Methods: We evaluated the medication use process of 9 high-risk medications for patients admitted to the neonatal intensive care unit during a 1-year period. An automated algorithm was developed to align SPRs with their medication orders in the EHRs using patient ID, medication name, and timestamp. The aligned data were manually reviewed by a clinical research coordinator and 2 pediatric physicians to identify discrepancies in medication administration. The data quality of SPRs was assessed with the proportion of information that was linked to valid EHR orders. To evaluate their utility, we compared the frequency and severity of discrepancies captured by the SPR and EHR data, respectively. A novel concordance assessment was also developed to understand the detection power and capabilities of SPR and EHR data.

Results: Approximately 70% of the SPRs contained valid patient IDs and medication names, making them feasible for data integration. After combining the 2 sources, the investigative team reviewed 2307 medication orders with 10,575 medication administration records (MARs) and 23,397 SPRs. A total of 321 MAR and 682 SPR discrepancies were identified, with vasopressors showing the highest discrepancy rates, followed by narcotics and total parenteral nutrition. Compared with EHR MARs, substantial dosing discrepancies were more commonly detectable using the SPRs. The concordance analysis showed little overlap between MAR and SPR discrepancies, with most discrepancies captured by the SPR data.

Conclusions: We integrated smart infusion pump information with EHR data to analyze the most error-prone phases of the medication lifecycle. The findings suggested that SPRs could be a more reliable data source for medication error detection. Ultimately, it is imperative to integrate SPR information with EHR data to fully detect and mitigate medication administration errors in the clinical setting.
September 2020
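
The alignment step described in the Methods above (matching each SPR to a medication order by patient ID, medication name, and timestamp) can be sketched roughly as follows. This is only an illustrative sketch, not the study's algorithm: the record fields, the case-insensitive name match, and the 24-hour matching window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Order:
    """Hypothetical EHR medication order (field names are assumptions)."""
    order_id: str
    patient_id: str
    medication: str
    start: datetime


@dataclass
class PumpRecord:
    """Hypothetical smart pump record (SPR); identifiers may be missing."""
    patient_id: Optional[str]
    medication: Optional[str]
    timestamp: datetime


def align(record: PumpRecord, orders: list[Order],
          window: timedelta = timedelta(hours=24)) -> Optional[Order]:
    """Return the order closest in time that matches patient and medication.

    Returns None when the record lacks identifiers or nothing matches
    inside the (assumed) time window.
    """
    if not record.patient_id or not record.medication:
        return None  # record cannot be linked to any order
    candidates = [
        o for o in orders
        if o.patient_id == record.patient_id
        and o.medication.lower() == record.medication.lower()
        and abs(record.timestamp - o.start) <= window
    ]
    # Pick the candidate with the smallest time gap, if any exist.
    return min(candidates,
               key=lambda o: abs(record.timestamp - o.start),
               default=None)
```

Records for which `align` returns `None` would count against the share of SPRs that can be linked to valid EHR orders; a real implementation would also need to handle medication name variants, which this sketch ignores.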

Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network.

J Biomed Inform 2019 11 19;99:103293. Epub 2019 Sep 19.

Department of Biomedical Informatics, Columbia University, New York, NY, United States.

Background: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms.

Methods: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category.

Results: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.

Conclusion: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
November 2019
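
The KIP arithmetic reported in the Results above (three aspects, each scored 0 to 2, summed to a total of 0 to 6) can be illustrated with a short sketch. The function names and the example clause scores are hypothetical; only the scoring range comes from the abstract.

```python
from statistics import mean


def kip_score(knowledge: int, interpretation: int, programming: int) -> int:
    """Sum the three aspect scores (each 0 = easy, 1 = moderate, 2 = difficult)."""
    aspects = {
        "knowledge": knowledge,
        "interpretation": interpretation,
        "programming": programming,
    }
    for name, value in aspects.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} score must be 0, 1, or 2, got {value}")
    return knowledge + interpretation + programming


def mean_kip(clauses: list[tuple[int, int, int]]) -> float:
    """Average clause-wise KIP score over (K, I, P) triples."""
    return mean(kip_score(k, i, p) for k, i, p in clauses)
```

Because the total is a plain sum, the mean total equals the sum of the mean aspect scores, which agrees with the reported averages: 0.64 + 0.33 + 0.40 = 1.37.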

Facilitating phenotype transfer using a common data model.

J Biomed Inform 2019 08 17;96:103253. Epub 2019 Jul 17.

Department of Biomedical Informatics, Columbia University, New York, NY, United States.

Background: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process.

Methods: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network.

Results: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode.

Conclusion: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.
August 2019

GWAS and enrichment analyses of non-alcoholic fatty liver disease identify new trait-associated genes and pathways across eMERGE Network.

BMC Med 2019 07 17;17(1):135. Epub 2019 Jul 17.

Division of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati School of Medicine, Cincinnati, OH, USA.

Background: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition.

Methods: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls and histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI).

Results: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed strongest association (best SNP rs738409 p = 1.70 × 10). This effect was consistent in both pediatric (p = 9.92 × 10) and adult (p = 9.73 × 10) cohorts. Additionally, this variant was also associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10, beta = 0.85). PheWAS analysis linked this locus to a spectrum of liver diseases beyond NAFLD with a novel negative correlation with gout (p = 1.09 × 10). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS score near IL17RA (rs5748926, p = 3.80 × 10), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10). Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses.

Conclusions: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.
July 2019

Data Challenges With Real-Time Safety Event Detection And Clinical Decision Support.

J Med Internet Res 2019 05 22;21(5):e13047. Epub 2019 May 22.

Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States.

Background: The continued digitization and maturation of health care information technology has made access to real-time data easier and feasible for more health care organizations. With this increased availability, the promise of using data to algorithmically detect health care-related events in real-time has become more of a reality. However, as more researchers and clinicians utilize real-time data delivery capabilities, it has become apparent that simply gaining access to the data is not a panacea, and some unique data challenges have emerged to the forefront in the process.

Objective: The aim of this viewpoint was to highlight some of the challenges that are germane to real-time processing of health care system-generated data and the accurate interpretation of the results.

Methods: Distinct challenges related to the use and processing of real-time data for safety event detection were compiled and reported by several informatics and clinical experts at a quaternary pediatric academic institution. The challenges were collated from the experiences of the researchers implementing real-time event detection on more than half a dozen distinct projects. The challenges are presented in a three-level format: challenge category, specific challenge, and example.

Results: In total, 8 major types of challenge categories were reported, with 13 specific challenges and 9 specific examples detailed to provide a context for the challenges. The examples reported are anchored to a specific project using medication order, medication administration record, and smart infusion pump data to detect discrepancies and errors between the 3 datasets.

Conclusions: The use of real-time data to drive safety event detection and clinical decision support is extremely powerful, but it presents its own set of challenges that include data quality and technical complexity. These challenges must be recognized and accommodated if the full promise of accurate, real-time safety event clinical decision support is to be realized.
May 2019

Electronic medical records as a replacement for prospective research data collection in postoperative pain and opioid response studies.

Int J Med Inform 2018 Mar 17;111:45-50. Epub 2017 Dec 17.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA.

Background And Aim: Many clinical research studies claim to collect data that are also captured in the electronic medical record (EMR). We evaluate the potential for EMR data to replace prospective research data collection.

Methods: Using a dataset of 358 surgical patients enrolled in a prospective study, we examined the completeness and agreement of EMR and study entries for several variables, including the patient's stay in the post-operative care unit (PACU), surgical pain relief and pain medication side effects.

Results: For all variables with a completeness percentage, values were greater than 96%. For the adverse event variables, we found slight to substantial agreement (Cohen's kappa), ranging from 0.19 (nausea) to 0.48 (respiratory depression) to 0.73 (emesis).

Conclusion: The potential to use EMR data as a replacement for prospective research data collection shows promise, but for now, should be evaluated on a variable-by-variable basis.
March 2018
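
The agreement statistic cited in the Results above, Cohen's kappa, compares observed agreement with the agreement expected by chance. A minimal sketch for a binary variable (for example, an adverse event recorded or not recorded in each source) is shown below; the cell names and the function itself are illustrative and not taken from the study.

```python
def cohens_kappa(both_yes: int, emr_only: int, study_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary raters from a 2x2 agreement table.

    both_yes:   event recorded in both EMR and study data
    emr_only:   event recorded in EMR only
    study_only: event recorded in study data only
    both_no:    event recorded in neither
    """
    n = both_yes + emr_only + study_only + both_no
    if n == 0:
        raise ValueError("empty table")
    observed = (both_yes + both_no) / n
    # Chance agreement from each source's marginal yes/no rates.
    emr_yes = both_yes + emr_only
    study_yes = both_yes + study_only
    expected = (emr_yes * study_yes + (n - emr_yes) * (n - study_yes)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why the reported values from 0.19 to 0.73 span such different levels of reliability across variables.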

Designing and evaluating an automated system for real-time medication administration error detection in a neonatal intensive care unit.

J Am Med Inform Assoc 2018 05;25(5):555-563

Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Background: Timely identification of medication administration errors (MAEs) promises great benefits for mitigating medication errors and associated harm. Despite previous efforts utilizing computerized methods to monitor medication errors, sustaining effective and accurate detection of MAEs remains challenging. In this study, we developed a real-time MAE detection system and evaluated its performance prior to system integration into institutional workflows.

Methods: Our prospective observational study included automated MAE detection of 10 high-risk medications and fluids for patients admitted to the neonatal intensive care unit at Cincinnati Children's Hospital Medical Center during a 4-month period. The automated system extracted real-time medication use information from the institutional electronic health records and identified MAEs using logic-based rules and natural language processing techniques. The MAE summary was delivered via a real-time messaging platform to promote reduction of patient exposure to potential harm. System performance was validated using a physician-generated gold standard of MAE events, and results were compared with those of current practice (incident reporting and trigger tools).

Results: Physicians identified 116 MAEs from 10 104 medication administrations during the study period. Compared to current practice, the sensitivity with automated MAE detection was improved significantly from 4.3% to 85.3% (P = .009), with a positive predictive value of 78.0%. Furthermore, the system showed potential to reduce patient exposure to harm, from 256 min to 35 min (P < .001).

Conclusions: The automated system demonstrated improved capacity for identifying MAEs while guarding against alert fatigue. It also showed promise for reducing patient exposure to potential harm following MAE events.
May 2018
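
The two performance figures in the Results above, sensitivity and positive predictive value (PPV), are both simple ratios against the physician-generated gold standard. The sketch below shows the standard definitions; the function names are hypothetical and no counts from the study are assumed beyond what the abstract reports.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of gold-standard events that the system detected."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no gold-standard events")
    return true_pos / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of system alerts that were real events."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no alerts raised")
    return true_pos / total
```

Reporting both matters here: high sensitivity alone could be achieved by alerting on everything, whereas a PPV of 78.0% indicates that most alerts were true events, which is what limits alert fatigue.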

Decentralized and reproducible geocoding and characterization of community and environmental exposures for multisite studies.

J Am Med Inform Assoc 2018 Mar;25(3):309-314

Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Objective: Geocoding and characterizing geographic, community, and environmental characteristics of study participants is frequently done in epidemiological studies. However, participant addresses are identifiable protected health information (PHI) and geocoding must be conducted in a Health Insurance Portability and Accountability Act-compliant manner. Our objective was to create a software application for this process that addresses limitations in current approaches.

Materials And Methods: We used a containerization platform to create DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies), a software application that facilitates reproducible geocoding and geomarker assessment while maintaining the confidentiality of PHI. To validate the software, 215 350 addresses in Hamilton County, Ohio, were geocoded using DeGAUSS, ArcGIS, Google, and SAS and compared to a gold-standard approach. We distributed the DeGAUSS software to sites in an ongoing multisite study (Electronic Medical Records and Genomics, or eMERGE), and individual sites independently geocoded and assigned median census tract-level income and distance to nearest major roadway to their participants' addresses, removed associated PHI, and returned deidentified data.

Results: Within a multisite study, 52 244 study participants' addresses across 5 sites were geocoded with a median distance to roadway of 10 022m and a median census tract income of $57 266, demonstrating the feasibility of DeGAUSS within a multisite study. Compared to other commonly used geocoding platforms, DeGAUSS had similar geocoding and geomarker assessment accuracies.

Conclusion: The open source DeGAUSS software overcomes multiple challenges in the use of address data in multisite studies and also serves as a more general reproducible research tool for geocoding and geomarker assessment.
March 2018

Leveraging Food and Drug Administration Adverse Event Reports for the Automated Monitoring of Electronic Health Records in a Pediatric Hospital.

Biomed Inform Insights 2017 8;9:1178222617713018. Epub 2017 Jun 8.

Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, USA.

The objective of this study was to determine whether the Food and Drug Administration's Adverse Event Reporting System (FAERS) data set could serve as the basis of automated electronic health record (EHR) monitoring for the adverse drug reaction (ADR) subset of adverse drug events. We retrospectively collected EHR entries for 71 909 pediatric inpatient visits at Cincinnati Children's Hospital Medical Center. Natural language processing (NLP) techniques were used to identify positive diseases/disorders and signs/symptoms (DDSSs) from the patients' clinical narratives. We downloaded all FAERS reports submitted by medical providers and extracted the reported drug-DDSS pairs. For each patient, we aligned the drug-DDSS pairs extracted from their clinical notes with the corresponding drug-DDSS pairs from the FAERS data set to identify Drug-Reaction Pair Sentences (DRPSs). The DRPSs were processed by NLP techniques to identify ADR-related DRPSs. We used clinician annotated, real-world EHR data as reference standard to evaluate the proposed algorithm. During evaluation, the algorithm achieved promising performance and showed great potential in identifying ADRs accurately for pediatric patients.
June 2017

Suboptimal Clinical Documentation in Young Children with Severe Obesity at Tertiary Care Centers.

Int J Pediatr 2016 6;2016:4068582. Epub 2016 Sep 6.

Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

The prevalence of severe obesity in children has doubled in the past decade. The objective of this study is to identify the clinical documentation of obesity in young children with a BMI ≥ 99th percentile at two large tertiary care pediatric hospitals. We used a standardized algorithm utilizing data from electronic health records to identify children with severe early onset obesity (BMI ≥ 99th percentile at age <6 years). We extracted descriptive terms and ICD-9 codes to evaluate documentation of obesity at Boston Children's Hospital and Cincinnati Children's Hospital and Medical Center between 2007 and 2014. A total of 9887 visit records of 2588 children with severe early onset obesity were identified. Based on predefined criteria for documentation of obesity, 21.5% of children (13.5% of visits) had positive documentation, which varied by institution. Documentation in children first seen under 2 years of age was lower than in older children (15% versus 26%). Documentation was significantly higher in girls (29% versus 17%, P < 0.001), African American children (27% versus 19% in whites, P < 0.001), and the obesity-focused specialty clinics (70% versus 15% in primary care and 9% in other subspecialty clinics, P < 0.001). There is significant opportunity for improvement in documentation of obesity in young children, even years after the 2007 AAP guidelines for management of obesity.
September 2016

Identification of Four Novel Loci in Asthma in European American and African American Populations.

Am J Respir Crit Care Med 2017 Feb;195(4):456-463

Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania.

Rationale: Despite significant advances in knowledge of the genetic architecture of asthma, specific contributors to the variability in the burden between populations remain uncovered.

Objectives: To identify additional genetic susceptibility factors of asthma in European American and African American populations.

Methods: A phenotyping algorithm mining electronic medical records was developed and validated to recruit cases with asthma and control subjects from the Electronic Medical Records and Genomics network. Genome-wide association analyses were performed in pediatric and adult asthma cases and control subjects with European American and African American ancestry followed by metaanalysis. Nominally significant results were reanalyzed conditioning on allergy status.

Measurements And Main Results: The validation of the algorithm yielded an average of 95.8% positive predictive values for both cases and control subjects. The algorithm accrued 21,644 subjects (65.83% European American and 34.17% African American). We identified four novel population-specific associations with asthma after metaanalyses: loci 6p21.31, 9p21.2, and 10q21.3 in the European American population, and the PTGES gene in African Americans. TEK at 9p21.2, which encodes TIE2, has been shown to be involved in remodeling the airway wall in asthma, and the association remained significant after conditioning by allergy. PTGES, which encodes the prostaglandin E synthase, has also been linked to asthma, where deficient prostaglandin E synthesis has been associated with airway remodeling.

Conclusions: This study adds to understanding of the genetic architecture of asthma in European Americans and African Americans and reinforces the need to study populations of diverse ethnic backgrounds to identify shared and unique genetic predictors of asthma.
February 2017

Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder.

PLoS One 2016 29;11(7):e0159621. Epub 2016 Jul 29.

Harvard Medical School, Pediatrics, Boston, Massachusetts, United States of America.

Objective: Cohort selection is challenging for large-scale electronic health record (EHR) analyses, as International Classification of Diseases 9th edition (ICD-9) diagnostic codes are notoriously unreliable disease predictors. Our objective was to develop, evaluate, and validate an automated algorithm for determining an Autism Spectrum Disorder (ASD) patient cohort from EHR. We demonstrate its utility via the largest investigation to date of the co-occurrence patterns of medical comorbidities in ASD.

Methods: We extracted ICD-9 codes and concepts derived from the clinical notes. A gold standard patient set was labeled by clinicians at Boston Children's Hospital (BCH) (N = 150) and Cincinnati Children's Hospital and Medical Center (CCHMC) (N = 152). Two algorithms were created: (1) a rule-based algorithm implementing the ASD criteria from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, and (2) a predictive classifier. The positive predictive values (PPV) achieved by these algorithms were compared to an ICD-9 code baseline. We clustered the patients based on grouped ICD-9 codes and evaluated subgroups.

Results: The rule-based algorithm produced the best PPV: (a) BCH: 0.885 vs. 0.273 (baseline); (b) CCHMC: 0.840 vs. 0.645 (baseline); (c) combined: 0.864 vs. 0.460 (baseline). A validation at Children's Hospital of Philadelphia yielded 0.848 (PPV). Clustering analyses of comorbidities on the three-site large cohort (N = 20,658 ASD patients) identified psychiatric, developmental, and seizure disorder clusters.

Conclusions: In a large cross-institutional cohort, co-occurrence patterns of comorbidities in ASDs provide further hypothetical evidence for distinct courses in ASD. The proposed automated algorithms for cohort selection open avenues for other large-scale EHR studies and individualized treatment of ASD.

August 2017

Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers.

Appl Clin Inform 2016 07 20;7(3):693-706. Epub 2016 Jul 20.

Todd Lingren, Cincinnati Children's Hospital Medical Center, Biomedical Informatics, 3333 Burnet Avenue, MLC 7024 Cincinnati, OH 45229-3039, Phone: 513-803-9032, Fax: 513-636-2056.

Objective: The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1-5.99 years) using structured and unstructured data from the electronic health record (EHR).

Introduction: Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high-precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants causing severe early onset obesity.

Data And Methods: Rule based and machine learning based algorithms were developed using structured and unstructured data from two EHR databases from Boston Children's Hospital (BCH) and Cincinnati Children's Hospital and Medical Center (CCHMC). Exclusion criteria including medications or comorbid diagnoses were defined. Machine learning algorithms were developed using cross-site training and testing in addition to experimenting with natural language processing features.

Results: Precision was emphasized for a high fidelity cohort. The rule-based algorithm performed the best overall, with precision of 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes.

Conclusions: Detecting severe early childhood obesity is essential for intervention in children at the highest long-term risk of developing obesity-related comorbidities. Excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further phenotyping efforts of this kind inform future practical application in health care environments utilizing clinical decision support.

July 2016

PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.

J Am Med Inform Assoc 2016 11 28;23(6):1046-1052. Epub 2016 Mar 28.

Vanderbilt University Medical Center, Nashville, TN, USA.

Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.

Materials and Methods: We report the current status and impact of the Phenotype KnowledgeBase (PheKB), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.

Results: As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).

Discussion: These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.

Conclusion: By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.

November 2016

A GWAS Study on Liver Function Test Using eMERGE Network Participants.

PLoS One 2015 28;10(9):e0138677. Epub 2015 Sep 28.

Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, United States of America; University of Cincinnati, College of Medicine, Cincinnati, OH, United States of America; U.S. Department of Veterans Affairs Medical Center, Cincinnati, OH, United States of America.

Introduction: Liver enzyme levels and total serum bilirubin are under genetic control and in recent years genome-wide population-based association studies have identified different susceptibility loci for these traits. We conducted a genome-wide association study in European ancestry participants from the Electronic Medical Records and Genomics (eMERGE) Network dataset of patient medical records with available genotyping data in order to identify genetic contributors to variability in serum bilirubin levels and other liver function tests and to compare the effects between adult and pediatric populations.

Methods: The process of whole genome imputation of eMERGE samples with standard quality control measures has been described previously. After removing missing data and outliers based on principal components (PC) analyses, 3294 samples of European ancestry were used for the GWAS study. The association between each single nucleotide polymorphism (SNP) and total serum bilirubin and other liver function tests was tested using linear regression, adjusting for age, gender, site, platform and ancestry principal components (PC).

Results: Consistent with previous results, a strong association signal was detected for the UGT1A gene cluster (best SNP rs887829, beta = 0.15, p = 1.30 × 10^-118) for total serum bilirubin level. In this region, more than 176 SNPs (or indels) had p < 10^-8, spanning 150 kb on the long arm of chromosome 2 (2q37.1). In addition, we found an effect of similar magnitude in a pediatric group (p = 8.26 × 10^-47, beta = 0.17). Further imputation using sequencing data as a reference panel revealed association of other markers, including the known TA7 repeat indel (rs8175347) (p = 9.78 × 10^-117) and rs111741722 (p = 5.41 × 10^-119), which were in proxy (r^2 = 0.99) with rs887829. Among rare variants, two Asian subjects homozygous for coding SNP rs4148323 (G71R) were identified. Additional known effects for total serum bilirubin were also confirmed, including organic anion transporters SLCO1B1-SLCO1B3, TDRP and ZMYND8 at FDR < 0.05, with no gene-gene interaction effects. Phenome-wide association studies (PheWAS) suggest a protective effect of the TA7 repeat against cerebrovascular disease in an adult cohort (OR = 0.75, p = 0.0008). Among other liver function tests, we also confirmed the previous effect of the ABO blood group locus for variation in serum alkaline phosphatase (rs579459, p = 9.44 × 10^-15).

Conclusions: Taken together, our data present interesting findings with strong confirmation of previous effects by simply using the eMERGE electronic health record phenotyping. In addition, our findings indicate that, as in the adult population, UGT1A1 is the main locus responsible for normal variation of serum bilirubin in pediatric populations.

June 2016

Desiderata for computable representations of electronic health records-driven phenotype algorithms.

J Am Med Inform Assoc 2015 Nov 5;22(6):1220-30. Epub 2015 Sep 5.

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Background: Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).

Methods: A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms.

Results: We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility.
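
Desideratum (4), set operations for modeling phenotype algorithms, can be illustrated with a small sketch. The patient identifiers, criteria names, and case/control logic below are all hypothetical; they show only how intersection and difference combine per-criterion patient sets.

```python
# Hypothetical patient sets, each produced by one query against the EHR.
all_patients = {"p1", "p2", "p3", "p4", "p5", "p6"}
has_diagnosis_code = {"p1", "p2", "p3", "p4"}
has_medication = {"p2", "p3", "p5"}
has_exclusion = {"p3"}

# Cases: diagnosis AND medication, minus anyone meeting an exclusion criterion.
cases = (has_diagnosis_code & has_medication) - has_exclusion

# Controls: no diagnosis, no medication, and no exclusion criterion.
controls = all_patients - has_diagnosis_code - has_medication - has_exclusion

print(sorted(cases))
print(sorted(controls))
```

Expressing the criteria as sets keeps the human-readable rule ("diagnosis and medication, without exclusions") and the computable form close together, which is the point of desideratum (3).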

Conclusion: A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.

November 2015

Automated detection of medication administration errors in neonatal intensive care.

J Biomed Inform 2015 Oct 17;57:124-33. Epub 2015 Jul 17.

Division of Neonatology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States.

Objective: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high alert medications including narcotics, vasoactive medication, intravenous fluids, parenteral nutrition, and insulin using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms to traditional incident reporting for error detection.

Methods: We developed novel computerized algorithms to identify MAEs within the EHR of all neonatal patients treated in a level four neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance to incident reporting. Performance was evaluated by physician chart review.

Results: In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0 (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications including: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20124); and lipid administration, 1.3% (203/15227). We also found 13 insulin administration errors with a resulting rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and fluids administered concurrently. The algorithms identified many previously unidentified errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting for error recognition.
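
The percentages above follow directly from the reported counts. As a quick check, the sketch below recomputes a subset of them; the dictionary and function names are illustrative, while the counts are copied from the abstract.

```python
# (errors, administration records) as reported in the abstract.
mae_counts = {
    "fentanyl": (4, 1005),
    "morphine": (11, 4009),
    "dopamine": (5, 43),
    "epinephrine": (289, 2890),
    "vasopressin": (54, 421),
    "insulin": (13, 456),
}


def error_rate_percent(errors: int, records: int) -> float:
    """Medication administration error rate as a percentage of records."""
    return 100.0 * errors / records


rates = {
    drug: round(error_rate_percent(errors, records), 1)
    for drug, (errors, records) in mae_counts.items()
}

for drug, rate in rates.items():
    print(f"{drug}: {rate}%")
```

Note how small denominators (for example, 43 dopamine records) make a rate far less stable than one based on thousands of records.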

Conclusions: Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. Automated algorithms may be useful for real-time error identification and mitigation.

October 2015

An end-to-end hybrid algorithm for automated medication discrepancy detection.

BMC Med Inform Decis Mak 2015 May 6;15:37. Epub 2015 May 6.

Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA.

Background: In this study we implemented and developed state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims are: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data.

Methods: We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) a ML algorithm to identify medication entities from clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) a NLP-based, hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed.

Results: The hybrid algorithm achieved 95.0%/91.6%/93.3% of P/R/F on medication entity detection and 98.7%/99.4%/99.1% of P/R/F on attribute linkage. The medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold-standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. By combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively. The error analysis on algorithm outputs identified challenges to be addressed in order to improve medication discrepancy detection.
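
The F-value in each P/R/F triple is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages in the abstract; the function name is illustrative, and because the inputs are already rounded, a recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported (in percent) for two of the evaluations.
entity_f = f_measure(95.0, 91.6)        # medication entity detection
discrepant_f = f_measure(71.5, 65.2)    # end-to-end discrepant medications

print(f"entity detection F = {entity_f:.1f}")
print(f"end-to-end discrepancy F = {discrepant_f:.1f}")
```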

Conclusion: By leveraging ML and NLP technologies, an end-to-end, computerized algorithm achieves promising results in reconciling medications between clinical notes and discharge prescriptions.

May 2015

Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients.

BMC Med Inform Decis Mak 2015 Apr 14;15:28. Epub 2015 Apr 14.

Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA.

Background: Manual eligibility screening (ES) for a clinical trial typically requires a labor-intensive review of patient records that utilizes many resources. Leveraging state-of-the-art natural language processing (NLP) and information extraction (IE) technologies, we sought to improve the efficiency of physician decision-making in clinical trial enrollment. In order to markedly reduce the pool of potential candidates for staff screening, we developed an automated ES algorithm to identify patients who meet core eligibility characteristics of an oncology clinical trial.

Methods: We collected narrative eligibility criteria from for 55 clinical trials actively enrolling oncology patients in our institution between 12/01/2009 and 10/31/2011. In parallel, our ES algorithm extracted clinical and demographic information from the Electronic Health Record (EHR) data fields to represent profiles of all 215 oncology patients admitted to cancer treatment during the same period. The automated ES algorithm then matched the trial criteria with the patient profiles to identify potential trial-patient matches. Matching performance was validated on a reference set of 169 historical trial-patient enrollment decisions, and workload, precision, recall, negative predictive value (NPV) and specificity were calculated.

Results: Without automation, an oncologist would need to review 163 patients per trial on average to replicate the historical patient enrollment for each trial. This workload is reduced by 85% to 24 patients when using automated ES (precision/recall/NPV/specificity: 12.6%/100.0%/100.0%/89.9%). Without automation, an oncologist would need to review 42 trials per patient on average to replicate the patient-trial matches that occur in the retrospective data set. With automated ES this workload is reduced by 90% to four trials (precision/recall/NPV/specificity: 35.7%/100.0%/100.0%/95.5%).
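
The workload reductions quoted above are the relative drop in the number of records a reviewer must read. A minimal sketch, using the averages from the abstract and an illustrative function name:

```python
def workload_reduction(baseline: float, automated: float) -> float:
    """Fraction of manual review effort removed by automated pre-screening."""
    if baseline <= 0:
        raise ValueError("baseline workload must be positive")
    return 1.0 - automated / baseline


# Averages reported in the abstract.
per_trial = workload_reduction(baseline=163, automated=24)   # patients per trial
per_patient = workload_reduction(baseline=42, automated=4)   # trials per patient

print(f"per-trial reduction: {per_trial:.0%}")
print(f"per-patient reduction: {per_patient:.0%}")
```

The low precision alongside 100% recall reflects the design goal: shrink the review pool without dropping any patient who was actually enrolled.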

Conclusion: By leveraging NLP and IE technologies, automated ES could dramatically increase the trial screening efficiency of oncologists and enable participation of small practices, which are often left out from trial enrollment. The algorithm has the potential to significantly reduce the effort to execute clinical research at a point in time when new initiatives of the cancer care community intend to greatly expand both the access to trials and the number of available trials.

April 2015

Phenome-wide association study (PheWAS) in EMR-linked pediatric cohorts, genetically links PLCL1 to speech language development and IL5-IL13 to Eosinophilic Esophagitis.

Front Genet 2014 18;5:401. Epub 2014 Nov 18.

Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA; U.S. Department of Veterans Affairs Medical Center, Cincinnati, OH, USA.

Objective: We report the first pediatric specific Phenome-Wide Association Study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts in which associations between a previously known genetic variant and a wide range of clinical or physiological traits were evaluated. Although computationally intensive, this approach has potential to reveal disease mechanistic relationships between a variant and a network of phenotypes.

Method: Data on 5049 samples of European ancestry were obtained from the EMRs of two large academic centers in five different genotyped cohorts. Recently, these samples have undergone whole genome imputation. After standard quality controls, removing missing data and outliers based on principal components analyses (PCA), 4268 samples were used for the PheWAS study. We scanned for associations between 2476 single-nucleotide polymorphisms (SNP) with available genotyping data from previously published GWAS studies and 539 EMR-derived phenotypes. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented.
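
The abstract does not say which false discovery rate procedure was used. As an assumption for illustration only, the sketch below applies the widely used Benjamini-Hochberg adjustment to a list of p-values; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four SNP-phenotype tests.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(example_q)
```

With 2476 SNPs tested against 539 phenotypes, some multiple-testing control of this kind is essential, since raw p-values below 0.05 would arise many times by chance.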

Results: This PheWAS found a variety of common variants (MAF > 10%) with prior GWAS associations in our pediatric cohorts including Juvenile Rheumatoid Arthritis (JRA), Asthma, Autism and Pervasive Developmental Disorder (PDD) and Type 1 Diabetes with a false discovery rate < 0.05 and power of study above 80%. In addition, several new PheWAS findings were identified including a cluster of association near the NDFIP1 gene for mental retardation [best SNP rs10057309, p = 4.33 × 10^-7, OR = 1.70, 95% CI = (1.38 - 2.09)]; association near the PLCL1 gene for developmental delays and speech disorder [best SNP rs1595825, p = 1.13 × 10^-8, OR = 0.65, 95% CI = (0.57 - 0.76)]; a cluster of associations in the IL5-IL13 region with Eosinophilic Esophagitis (EoE) [best at rs12653750, p = 3.03 × 10^-9, OR = 1.73, 95% CI = (1.44 - 2.07)], previously implicated in asthma, allergy, and eosinophilia; and association of variants in GCKR and JAZF1 with allergic rhinitis in our pediatric cohorts [best SNP rs780093, p = 2.18 × 10^-5, OR = 1.39, 95% CI = (1.19 - 1.61)], previously demonstrated in metabolic disease and diabetes in adults.

Conclusion: The PheWAS approach with re-mapping ICD-9 structured codes for our European-origin pediatric cohorts, as with the previous adult studies, finds many previously reported associations as well as presents the discovery of associations with potentially important clinical implications.

December 2014

Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department.

J Am Med Inform Assoc 2015 Jan 16;22(1):166-78. Epub 2014 Jul 16.

Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; James M Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Objectives: (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials.

Data And Methods: We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields including demographics, laboratory data, and clinical notes from the electronic health record (EHR) to represent profiles of all 202795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria to reduce the pool of candidates for staff screening. The performance was validated on both a physician-generated gold standard of trial-patient matches and a reference standard of historical trial-patient enrollment decisions, where workload, mean average precision (MAP), and recall were assessed.

Results: Compared with the case without automation, the workload with automated ES was reduced by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial-patient matches.

Discussion And Conclusion: By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES could successfully identify patients for clinical trials.

January 2015

Developing and evaluating a machine learning based algorithm to predict the need of pediatric intensive care unit transfer for newly hospitalized children.

Resuscitation 2014 Aug 9;85(8):1065-71. Epub 2014 May 9.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; James M. Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Background: Early warning scores (EWS) are designed to identify early clinical deterioration by combining physiologic and/or laboratory measures to generate a quantified score. Current EWS leverage only a small fraction of Electronic Health Record (EHR) content. The planned widespread implementation of EHRs brings the promise of abundant data resources for prediction purposes. The three specific aims of our research are: (1) to develop an EHR-based automated algorithm to predict the need for Pediatric Intensive Care Unit (PICU) transfer in the first 24h of admission; (2) to evaluate the performance of the new algorithm on a held-out test data set; and (3) to compare the effectiveness of the new algorithm with that of two published Pediatric Early Warning Scores (PEWS).

Methods: The cases comprised 526 encounters with 24-h Pediatric Intensive Care Unit (PICU) transfer. In addition to the cases, we randomly selected 6772 control encounters from 62516 inpatient admissions that were never transferred to the PICU. We used 29 variables in a logistic regression and compared our algorithm against two published PEWS on a held-out test data set.

Results: The logistic regression algorithm achieved 0.849 (95% CI 0.753-0.945) sensitivity, 0.859 (95% CI 0.850-0.868) specificity and 0.912 (95% CI 0.905-0.919) area under the curve (AUC) in the test set. Our algorithm's AUC was significantly higher, by 11.8 and 22.6% in the test set, than two published PEWS.
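
Sensitivity, specificity, and AUC can all be computed from the risk scores a model assigns to cases and controls. The sketch below shows one way to do so; the function names, scores, and threshold are hypothetical, and the AUC is computed by direct pairwise comparison rather than by whatever method the study used.

```python
def sensitivity_specificity(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_positives = sum(score >= threshold for score in case_scores)
    true_negatives = sum(score < threshold for score in control_scores)
    return true_positives / len(case_scores), true_negatives / len(control_scores)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(cases, controls, threshold=0.5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(cases, controls):.3f}")
```

Unlike sensitivity and specificity, AUC does not depend on a chosen threshold, which makes it convenient for comparing a regression model against fixed-cutoff scores such as PEWS.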

Conclusion: The novel algorithm achieved higher sensitivity, specificity, and AUC than the two PEWS reported in the literature.

August 2014

Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

J Biomed Inform 2014 Aug 17;50:173-183. Epub 2014 Feb 17.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Objective: The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus under a Data Use Agreement, we would like to encourage other computational linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes: (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; and (3) the tests to show the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development.

Materials And Methods: In a previous study we built an original de-identification gold standard corpus annotated with true Protected Health Information (PHI) from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system, when trained on the new de-identification gold standard corpus, with the performance of the same system, when trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes.

Results: Performance of the de-identification system using the new gold standard corpus as a training set was very close to training on the original corpus (92.56 vs. 93.48 overall F-measures). Best i2b2/PhysioNet/CCHMC cross-training performances were obtained when training on the new shared CCHMC gold standard corpus, although performances were still lower than corpus-specific trainings.

Discussion And Conclusion: We successfully modified a de-identification dataset for external sharing while preserving the de-identification research value of the modified gold standard corpus with limited drop in machine learning de-identification performance.

August 2014

Phenotyping for patient safety: algorithm development for electronic health record based automated adverse event and medical error detection in neonatal intensive care.

J Am Med Inform Assoc 2014 Sep-Oct;21(5):776-84. Epub 2014 Jan 8.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; James M Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Background: Although electronic health records (EHRs) have the potential to provide a foundation for quality and safety algorithms, few studies have measured their impact on automated adverse event (AE) and medical error (ME) detection within the neonatal intensive care unit (NICU) environment.

Objective: This paper presents two phenotyping AE and ME detection algorithms (ie, IV infiltrations, narcotic medication oversedation and dosing errors) and describes manual annotation of airway management and medication/fluid AEs from NICU EHRs.

Methods: From 753 NICU patient EHRs from 2011, we developed two automatic AE/ME detection algorithms, and manually annotated 11 classes of AEs in 3263 clinical notes. Performance of the automatic AE/ME detection algorithms was compared to trigger tool and voluntary incident reporting results. AEs in clinical notes were double annotated and consensus achieved under neonatologist supervision. Sensitivity, positive predictive value (PPV), and specificity are reported.

Results: Twelve severe IV infiltrates were detected. The algorithm identified one more infiltrate than the trigger tool and eight more than incident reporting. One narcotic oversedation was detected demonstrating 100% agreement with the trigger tool. Additionally, 17 narcotic medication MEs were detected, an increase of 16 cases over voluntary incident reporting.

Conclusions: Automated AE/ME detection algorithms provide higher sensitivity and PPV than currently used trigger tools or voluntary incident-reporting systems, including identification of potential dosing and frequency errors that current methods are unequipped to detect.

October 2014

EMR-linked GWAS study: investigation of variation landscape of loci for body mass index in children.

Front Genet 2013 3;4:268. Epub 2013 Dec 3.

Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; School of Medicine, University of Cincinnati, Cincinnati, OH, USA; Department of Veteran Affairs Medical Center, Cincinnati, OH, USA.

Unlabelled: Common variations at the loci harboring the fat mass and obesity gene (FTO), MC4R, and TMEM18 are consistently reported as being associated with obesity and body mass index (BMI), especially in adult populations. In order to confirm this effect in a pediatric population, five European ancestry cohorts from the pediatric eMERGE-II network (CCHMC-BCH) were evaluated.

Method: Data on 5049 samples of European ancestry were obtained from the Electronic Medical Records (EMRs) of two large academic centers in five different genotyped cohorts. For all available samples, gender, age, height, and weight were collected and BMI was calculated. To account for age and sex differences in BMI, BMI z-scores were generated using the 2000 Centers for Disease Control and Prevention (CDC) growth charts. A genome-wide association study (GWAS) was performed with BMI z-score. After removing missing data and outliers based on principal components (PC) analyses, 2860 samples were used for the GWAS study. The association between each single nucleotide polymorphism (SNP) and BMI was tested using linear regression adjusting for age, gender, and PC by cohort. The effects of SNPs were modeled assuming additive, recessive, and dominant effects of the minor allele. Meta-analysis was conducted using a weighted z-score approach.
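
The abstract names a weighted z-score meta-analysis without giving the weights. The sketch below assumes the common convention of weighting each cohort by the square root of its sample size; that choice, the function names, and the example cohort values are assumptions for illustration.

```python
import math


def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-cohort z-scores, weighting each by sqrt(sample size)."""
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-cohort results for one SNP (not from the study).
combined_z = weighted_z_meta([2.1, 1.8, 2.6], [900, 600, 1360])
print(f"combined z = {combined_z:.2f}, p = {two_sided_p(combined_z):.2e}")
```

Because the method combines signed z-scores, cohorts with opposite effect directions partly cancel, which is one reason heterogeneity between cohorts is examined alongside the combined result.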

Results: The mean age of subjects was 9.8 years (range 2-19). The proportion of male subjects was 56%. In these cohorts, 14% of samples had a BMI ≥ 95th percentile and 28% had a BMI ≥ 85th percentile. Meta-analyses produced a signal at the 16q12 genomic region, with the best result of p = 1.43 × 10⁻⁷ [p(rec) = 7.34 × 10⁻⁸] for the SNP rs8050136 in the first intron of the FTO gene (z = 5.26) and with no heterogeneity between cohorts (p = 0.77). Under a recessive model, another published SNP at this locus, rs1421085, generated the best result [z = 5.782, p(rec) = 8.21 × 10⁻⁹]. Imputation in this region using dense 1000 Genomes and HapMap CEU samples revealed 71 SNPs with p < 10⁻⁶, all in the first intron of the FTO locus. When heterogeneity was permitted between cohorts, signals were also obtained in other previously identified loci, including MC4R (rs12964056, p = 6.87 × 10⁻⁷, z = -4.98), cholecystokinin [CCK (rs8192472, p = 1.33 × 10⁻⁶, z = -4.85)], interleukin 15 (rs2099884, p = 1.27 × 10⁻⁵, z = 4.34), low density lipoprotein receptor-related protein 1B [LRP1B (rs7583748, p = 0.00013, z = -3.81)] and near transmembrane protein 18 [TMEM18 (rs7561317, p = 0.001, z = -3.17)]. We also detected a novel locus on chromosome 3 at COL6A5 [best SNP = rs1542829, minor allele frequency (MAF) of 5%, p = 4.35 × 10⁻⁹, z = 5.89].

Conclusion: An EMR-linked cohort study demonstrates that BMI z-score measurements can be successfully extracted and linked to genomic data with meaningful confirmatory results. We verified the high prevalence of childhood overweight and obesity in our cohort (28%). In addition, our data indicate that genetic variants in the first intron of FTO, a known adult genetic risk factor for BMI, are also robustly associated with BMI in the pediatric population.

Source Listing
December 2013

Developing and evaluating an automated appendicitis risk stratification algorithm for pediatric patients in the emergency department.

J Am Med Inform Assoc 2013 Dec 15;20(e2):e212-20. Epub 2013 Oct 15.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Objective: To evaluate a proposed natural language processing (NLP) and machine-learning based automated method to risk stratify abdominal pain patients by analyzing the content of the electronic health record (EHR).

Methods: We analyzed the EHRs of a random sample of 2100 pediatric emergency department (ED) patients with abdominal pain, including all with a final diagnosis of appendicitis. We developed an automated system to extract relevant elements from ED physician notes and lab values and to automatically assign a risk category for acute appendicitis (high, equivocal, or low), based on the Pediatric Appendicitis Score. We evaluated the performance of the system against a manually created gold standard (chart reviews by ED physicians) for recall, specificity, and precision.

Results: The system achieved an average F-measure of 0.867 (0.869 recall and 0.863 precision) for risk classification, which was comparable to physician experts. Recall/precision were 0.897/0.952 in the low-risk category, 0.855/0.886 in the high-risk category, and 0.854/0.766 in the equivocal-risk category. The information that the system required as input to achieve high F-measure was available within the first 4 h of the ED visit.
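As a rough check on how the per-category figures relate to the average, the sketch below computes the F-measure (harmonic mean of precision and recall) for each risk category and then takes the unweighted mean across categories. Whether the authors averaged in exactly this way is an assumption; the function names are illustrative only.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (the balanced F-measure)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def macro_f(per_category):
    """Unweighted mean of the per-category F-measures."""
    scores = [f_measure(r, p) for r, p in per_category.values()]
    return sum(scores) / len(scores)


# (recall, precision) pairs as reported in the Results above.
reported = {
    "low": (0.897, 0.952),
    "high": (0.855, 0.886),
    "equivocal": (0.854, 0.766),
}

for name, (r, p) in reported.items():
    print(f"{name}: F = {f_measure(r, p):.3f}")
print(f"macro-averaged F = {macro_f(reported):.3f}")
```

Under this assumption the macro-averaged value rounds to 0.867, which agrees with the reported average F-measure. The equivocal-risk category, with the lowest precision, contributes the lowest per-category score.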

Conclusions: Automated appendicitis risk categorization based on EHR content, including information from clinical notes, shows comparable performance to physician chart reviewers as measured by their inter-annotator agreement and represents a promising new approach for computerized decision support to promote application of evidence-based medicine at the point of care.

Source Listing
December 2013

Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements.

J Am Med Inform Assoc 2014 May-Jun;21(3):406-13. Epub 2013 Sep 3.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Objective: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias, if pre-annotation is utilized.

Methods: To build the gold standard, 1400 clinical trial announcements from the website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated the annotation time and potential bias through F-measures and ANOVA tests and implemented Bonferroni correction.

Results: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. There was no statistically significant difference for IAA and annotator performance in pre-annotations.

Conclusions: In every experiment pair, the annotator with the pre-annotated text needed less time to annotate than the annotator with non-labeled text. The time savings were statistically significant. Moreover, the pre-annotation did not reduce the IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotation of clinical named entity recognition in the eligibility sections of clinical trial announcements without introducing bias in the annotation process.

Source Listing
June 2014

Mining FDA drug labels for medical conditions.

BMC Med Inform Decis Mak 2013 Apr 24;13:53. Epub 2013 Apr 24.

Division of Biomedical Informatics, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA.

Background: Cincinnati Children's Hospital Medical Center (CCHMC) has built the initial Natural Language Processing (NLP) component to extract medications with their corresponding medical conditions (Indications, Contraindications, Overdosage, and Adverse Reactions) as triples of medication-related information ([(1) drug name]-[(2) medical condition]-[(3) LOINC section header]) for an intelligent database system, in order to improve patient safety and the quality of health care. The Food and Drug Administration's (FDA) drug labels are used to demonstrate the feasibility of building the triples as an intelligent database system task.

Methods: This paper discusses a hybrid NLP system, called AutoMCExtractor, to collect medical conditions (including disease/disorder and sign/symptom) from drug labels published by the FDA. Altogether, 6,611 medical conditions in a manually annotated gold standard were used for the system evaluation. The pre-processing step extracted the plain text from the XML files and detected eight related LOINC sections (e.g. Adverse Reactions, Warnings and Precautions) for medical condition extraction. Conditional Random Fields (CRF) classifiers, trained on token, linguistic, and semantic features, were then used for medical condition extraction. Lastly, dictionary-based post-processing corrected boundary-detection errors of the CRF step. We evaluated AutoMCExtractor on manually annotated FDA drug labels and report the results at both the token and span levels.

Results: Precision, recall, and F-measure were 0.90, 0.81, and 0.85, respectively, for the span level exact match; for the token-level evaluation, precision, recall, and F-measure were 0.92, 0.73, and 0.82, respectively.
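The difference between span-level exact match and token-level evaluation can be sketched as follows. This is an illustrative scoring routine, not the evaluation code used for AutoMCExtractor; spans are assumed to be half-open `(start, end)` token offsets, and the example annotations are hypothetical.

```python
def precision_recall_f(true_positives, n_predicted, n_gold):
    """Return (precision, recall, F-measure) from raw counts."""
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


def span_level(gold_spans, predicted_spans):
    """Exact match: a predicted span counts only if both boundaries agree."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    return precision_recall_f(len(gold & predicted), len(predicted), len(gold))


def token_level(gold_spans, predicted_spans):
    """Each token inside a span is scored individually."""
    gold = {i for start, end in gold_spans for i in range(start, end)}
    predicted = {i for start, end in predicted_spans for i in range(start, end)}
    return precision_recall_f(len(gold & predicted), len(predicted), len(gold))


# Hypothetical annotations: the second prediction overshoots by one token.
gold = [(0, 2), (5, 6)]
predicted = [(0, 2), (5, 7)]

print("span level: ", span_level(gold, predicted))
print("token level:", token_level(gold, predicted))
```

In this toy case the boundary error costs a full span at the span level but only one extra token at the token level, which is why the two evaluations can report different precision and recall for the same system output.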

Conclusions: The results demonstrate that (1) medical conditions can be extracted from FDA drug labels with high performance; and (2) it is feasible to develop a framework for an intelligent database system.

Source Listing
April 2013