Publications by authors named "Juan M Banda"

30 Publications

Characterizing all-cause excess mortality patterns during COVID-19 pandemic in Mexico.

BMC Infect Dis 2021 May 7;21(1):432. Epub 2021 May 7.

Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA.

Background: Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all-cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relative mortality impact of the COVID-19 pandemic in Mexico.

Methods: We obtained weekly all-cause mortality time series for Mexico, overall and by gender and geographic region, from 2015 to 2020. We also compiled surveillance data on COVID-19 cases and deaths to assess the timing and intensity of the pandemic, and assembled weekly series of the proportion of tweets about 'death' from Mexico to assess the correlation between people's media interaction about 'death' and the rise in pandemic deaths. We estimated all-cause excess mortality rates and the mortality rate ratio increase over baseline by fitting Serfling regression models, and forecasted the total excess deaths for Mexico for the first 4 weeks of 2021 using the generalized logistic growth model.
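
The Serfling approach named in the Methods can be illustrated with a minimal sketch: fit a linear trend plus one annual sine/cosine pair to pre-pandemic weeks, then subtract the fitted baseline from observed deaths. Everything below is an assumption made for illustration (the single harmonic, plain least squares, the 52.18-week period, and the synthetic numbers); the abstract does not give the authors' exact model specification, and the upper confidence bound often used in such analyses is omitted.

```python
import math

WEEKS_PER_YEAR = 52.18  # approximate period; an assumption of this sketch


def design_row(t):
    """Intercept, linear trend, and one annual harmonic for week index t."""
    angle = 2.0 * math.pi * t / WEEKS_PER_YEAR
    return [1.0, float(t), math.sin(angle), math.cos(angle)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_baseline(weeks, deaths):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    rows = [design_row(t) for t in weeks]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, deaths)) for i in range(k)]
    return solve(xtx, xty)


def expected(coef, t):
    """Baseline deaths predicted for week index t."""
    return sum(c * x for c, x in zip(coef, design_row(t)))


def excess_deaths(coef, weeks, deaths):
    """Observed minus expected deaths for each week."""
    return [y - expected(coef, t) for t, y in zip(weeks, deaths)]


# Synthetic demo with made-up numbers, not data from the study.
true_coef = [13000.0, 2.0, 800.0, 500.0]
baseline_weeks = list(range(260))  # roughly five pre-pandemic years
baseline = [expected(true_coef, t) for t in baseline_weeks]
pandemic_weeks = list(range(260, 312))
observed = [expected(true_coef, t) + 4000.0 for t in pandemic_weeks]

coef = fit_baseline(baseline_weeks, baseline)
excess = excess_deaths(coef, pandemic_weeks, observed)
print(f"total excess over {len(excess)} weeks: {sum(excess):,.0f}")
```

Because the synthetic baseline is generated from the same functional form, the fit recovers it almost exactly and the weekly excess equals the injected 4,000 deaths. Real mortality series are noisy, so a practical analysis would also need uncertainty bounds around the baseline.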

Results: We estimated the all-cause excess mortality rate associated with the COVID-19 pandemic in Mexico in 2020 at 26.10 per 10,000 population, which corresponds to 333,538 excess deaths. Males had an excess mortality rate about 2-fold higher (33.99) than females (18.53). Mexico City reported the highest excess death rate (63.54) and RR (2.09) compared to the rest of the country (excess rate = 23.25, RR = 1.62). While COVID-19 deaths accounted for only 38.64% of total excess deaths in Mexico, our forecast estimates that Mexico accumulated a total of ~61,610 [95% PI: 60,003, 63,216] excess deaths in the first 4 weeks of 2021. The proportion of tweets about 'death' was significantly correlated with excess mortality (ρ = 0.508 [95% CI: 0.245, 0.701], p-value = 0.0004).
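
The headline figures in the Results are linked by simple arithmetic, sketched below. The input numbers are taken from the abstract; the derived quantities (implied population, count of lab-confirmed deaths, male-to-female ratio) are back-of-the-envelope calculations and are not values reported by the authors.

```python
# Figures reported in the abstract.
excess_deaths = 333_538      # all-cause excess deaths in 2020
excess_rate_per_10k = 26.10  # excess deaths per 10,000 population
covid_share = 0.3864         # lab-confirmed COVID-19 share of excess deaths
male_rate, female_rate = 33.99, 18.53

# Derived quantities (illustrative, not reported in the abstract).
implied_population = excess_deaths / excess_rate_per_10k * 10_000
confirmed_covid_deaths = covid_share * excess_deaths
male_to_female = male_rate / female_rate

print(f"implied population: {implied_population:,.0f}")
print(f"lab-confirmed COVID-19 deaths: {confirmed_covid_deaths:,.0f}")
print(f"male/female excess-rate ratio: {male_to_female:.2f}")
```

The implied population of roughly 128 million is in line with Mexico's population, and the male-to-female ratio of about 1.8 is what the abstract rounds to "about 2-fold".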

Conclusion: The COVID-19 pandemic has heavily affected Mexico. Lab-confirmed COVID-19 deaths accounted for only 38.64% of total all-cause excess deaths (333,538) in Mexico in 2020. This reflects either the effect of low testing rates in Mexico or a surge in the number of deaths due to other causes during the pandemic. A model-based forecast indicates that an estimated 61,610 excess deaths occurred in the first 4 weeks of 2021.

Source
http://dx.doi.org/10.1186/s12879-021-06122-7 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8104040 (PMC)
May 2021

Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study.

AMIA Annu Symp Proc 2020 25;2020:1441-1450. Epub 2021 Jan 25.

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

The normalization of clinical documents is essential for health information management, given the enormous amount of clinical documentation generated each year. The LOINC Document Ontology (DO) is a universal clinical document standard with a hierarchical structure. The objective of this study is to investigate the feasibility and generalizability of LOINC DO by mapping clinical note titles from five institutions to five DO axes. We first developed an annotation framework based on the definition of LOINC DO axes and manually mapped 4,000 titles. Then we introduced a pre-trained deep learning model named Bidirectional Encoder Representations from Transformers (BERT) to enable automatic mapping from titles to LOINC DO axes. The results showed that the BERT-based automatic mapping achieved improved performance compared with the baseline model. By analyzing both manual annotations and predicted results, we discuss ambiguities in the definitions of the LOINC DO axes.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075502 (PMC)
January 2021

A Minimal Information Model for Potential Drug-Drug Interactions.

Front Pharmacol 2020 8;11:608068. Epub 2021 Mar 8.

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States.

Despite the significant health impacts of adverse events associated with drug-drug interactions, no standard models exist for managing and sharing evidence describing potential interactions between medications. Minimal information models have been used in other communities to establish community consensus around simple models capable of communicating useful information. This paper reports on a new minimal information model for describing potential drug-drug interactions. A task force of the Semantic Web in Health Care and Life Sciences Community Group of the World Wide Web Consortium engaged informaticians and drug-drug interaction experts in an in-depth examination of recent literature and specific potential interactions. A consensus set of information items was identified, along with example descriptions of selected potential drug-drug interactions (PDDIs). User profiles and use cases were developed to demonstrate the applicability of the model. Ten core information items were identified: drugs involved, clinical consequences, seriousness, operational classification statement, recommended action, mechanism of interaction, contextual information/modifying factors, evidence about a suspected drug-drug interaction, frequency of exposure, and frequency of harm to exposed persons. Eight best practice recommendations suggest how PDDI knowledge artifact creators can best use the 10 information items when synthesizing drug interaction evidence into artifacts intended to aid clinicians. This model has been included in a proposed implementation guide developed by the HL7 Clinical Decision Support Workgroup and in PDDIs published in the CDS Connect repository. The complete description of the model can be found at https://w3id.org/hclscg/pddi.

Source
http://dx.doi.org/10.3389/fphar.2020.608068 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7982727 (PMC)
March 2021

ACE: the Advanced Cohort Engine for searching longitudinal patient records.

J Am Med Inform Assoc 2021 Mar 13. Epub 2021 Mar 13.

Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA.

Objective: To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm.

Materials And Methods: The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership Common Data Model, and is configurable to balance performance with compute cost. ACE's temporal query language supports automatic query expansion using clinical knowledge graphs. The ACE API can be used with R, Python, Java, HTTP, and a Web UI.

Results: ACE offers an expressive query language for complex temporal search across many clinical data types with multiple output options. ACE enables electronic phenotyping and cohort-building with subsecond response times in searching the data of millions of patients for a variety of use cases.

Discussion: ACE enables fast, time-aware search using a patient object-centric datastore, thereby overcoming many technical and design shortcomings of relational algebra-based querying. Integrating electronic phenotype development with cohort-building enables a variety of high-value uses for a learning health system. Tradeoffs include the need to learn a new query language and the technical setup burden.

Conclusion: ACE is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses.

Source
http://dx.doi.org/10.1093/jamia/ocab027 (DOI Listing)
March 2021

Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data.

Pac Symp Biocomput 2021 ;26:95-106

Data Science to Patient Value, University of Colorado School of Medicine, Aurora, CO 80045, USA.

Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system. To better understand physicians' perspectives on the COVID-19 response, we extracted Twitter data from a specific user group that allows physicians to stay anonymous while expressing their perspectives about the COVID-19 pandemic. All tweets were in English. We measured the most frequent bigrams and trigrams, compared sentiment analysis methods, and compared our findings to a larger Twitter dataset containing general COVID-19 related discourse. We found significant differences between the two datasets for specific topical phrases. No statistically significant difference was found in sentiments between the two datasets, and both trended slightly more positive than negative. Upon comparison to manual sentiment analysis, it was determined that these sentiment analysis methods should be improved to accurately capture sentiments of anonymous physician data. Anonymous physician social media data is a unique source of information that provides important insights into COVID-19 perspectives.
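
The bigram and trigram measurement mentioned in this abstract can be sketched as a simple frequency count. The tokenizer, function names, and example tweets below are assumptions made for illustration; the abstract does not describe the authors' preprocessing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(texts, n):
    """Count n-grams within each text; n-grams never span two texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up example tweets, not data from the study.
tweets = [
    "We need more personal protective equipment now",
    "Personal protective equipment is running low",
    "Still waiting on personal protective equipment",
]
bigrams = ngram_counts(tweets, 2)
trigrams = ngram_counts(tweets, 3)

for gram, count in trigrams.most_common(3):
    print(" ".join(gram), count)
```

Comparing such counts between two corpora, for example after normalizing by corpus size, is one way to surface the topical phrase differences the abstract reports.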

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7958992 (PMC)
March 2021

Unraveling COVID-19: a large-scale characterization of 4.5 million COVID-19 cases using CHARYBDIS.

Res Sq 2021 Mar 1. Epub 2021 Mar 1.

Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. We conducted a descriptive cohort study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11 June 2020 and are iteratively updated via GitHub [4]. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical 886,193 , and 113,627 . All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts, and are available in an interactive website: https://data.ohdsi.org/Covid19CharacterizationCharybdis/. CHARYBDIS findings provide benchmarks that contribute to our understanding of COVID-19 progression, management and evolution over time. This can enable timely assessment of real-world outcomes of preventative and therapeutic options as they are introduced in clinical practice.

Source
http://dx.doi.org/10.21203/rs.3.rs-279400/v1 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7941629 (PMC)
March 2021

Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media.

ArXiv 2021 Feb 13. Epub 2021 Feb 13.

The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data. We present an unsupervised, iterative approach to mine clinically relevant information from social media data, which begins by heuristically filtering for HCP-authored texts and incorporates topic modeling and concept extraction with MetaMap. This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets from January to mid-June 2020. We also show that because the technique does not require manual labeling, it can be used to identify emerging topics on a week-to-week basis. Our method can aid in future public-health emergencies by facilitating knowledge transfer among healthcare workers in a rapidly-changing information environment, and by providing an efficient and unsupervised way of highlighting potential areas for clinical research.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885911 (PMC)
February 2021

Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data.

Lancet Digit Health 2019 12 21;1(8):e393-e402. Epub 2019 Oct 21.

The Familial Hypercholesterolemia Foundation, Pasadena, CA, USA; Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA; Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.

Background: Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets.

Methods: We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings, from 939 clinically diagnosed individuals with familial hypercholesterolaemia (395 of whom had a molecular diagnosis) and 83 136 individuals presumed free of familial hypercholesterolaemia, sampled from four US institutions. The model was then applied to a national health-care encounter database (170 million individuals) and an integrated health-care delivery system dataset (174 000 individuals). Individuals used in model training and those evaluated by the model were required to have at least one cardiovascular disease risk factor (eg, hypertension, hypercholesterolaemia, or hyperlipidemia). A Health Insurance Portability and Accountability Act of 1996-compliant programme was developed to allow providers to receive identification of individuals likely to have familial hypercholesterolaemia in their practice.

Findings: Using a model with a measured precision (positive predictive value) of 0·85, recall (sensitivity) of 0·45, area under the precision-recall curve of 0·55, and area under the receiver operating characteristic curve of 0·89, we flagged 1 331 759 of 170 416 201 patients in the national database and 866 of 173 733 individuals in the health-care delivery system dataset as likely to have familial hypercholesterolaemia. Familial hypercholesterolaemia experts reviewed a sample of flagged individuals (45 from the national database and 103 from the health-care delivery system dataset) and applied clinical familial hypercholesterolaemia diagnostic criteria. Of those reviewed, 87% (95% CI 73-100) in the national database and 77% (68-86) in the health-care delivery system dataset were categorised as having a high enough clinical suspicion of familial hypercholesterolaemia to warrant guideline-based clinical evaluation and treatment.
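
As a reminder of how the reported metrics are defined, the sketch below computes precision and recall from confusion-matrix counts and the share of each population that was flagged. The true-positive, false-positive, and false-negative counts are hypothetical, chosen only so that they reproduce the reported 0·85 and 0·45; the flagged and total population counts are from the abstract.

```python
def precision(tp, fp):
    """Positive predictive value: flagged cases that are true cases."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Sensitivity: true cases that the model flags."""
    return tp / (tp + fn)


# Hypothetical counts, picked to match the reported metrics.
tp, fp, fn = 153, 27, 187
model_precision = precision(tp, fp)
model_recall = recall(tp, fn)

# Flagged and total counts as reported in the abstract.
national_flag_rate = 1_331_759 / 170_416_201
system_flag_rate = 866 / 173_733

print(f"precision: {model_precision:.2f}, recall: {model_recall:.2f}")
print(f"flagged, national database: {national_flag_rate:.2%}")
print(f"flagged, delivery system: {system_flag_rate:.2%}")
```

A precision of 0·85 with a recall of 0·45 describes a conservative model: most flagged individuals are likely true cases, while more than half of true cases go unflagged.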

Interpretation: The FIND FH model successfully scans large, diverse, and disparate health-care encounter databases to identify individuals with familial hypercholesterolaemia.

Funding: The FH Foundation funded this study. Support was received from Amgen, Sanofi, and Regeneron.

Source
http://dx.doi.org/10.1016/S2589-7500(19)30150-5 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086528 (PMC)
December 2019

Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study.

Nat Commun 2020 10 6;11(1):5009. Epub 2020 Oct 6.

Clinical Pharmacology Unit, Zealand University Hospital, Køge, Denmark.

Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19) but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8362, South Korea: 7341, Spain: 18,425) COVID-19 patients, summarising between 4811 and 11,643 unique aggregate characteristics. COVID-19 patients have been majority male in the US and Spain, but predominantly female in South Korea. Age profiles vary across data sources. Compared to 84,585 individuals hospitalised with influenza in 2014-19, COVID-19 patients have more typically been male, younger, and with fewer comorbidities and lower medication use. While protecting groups vulnerable to influenza is likely a useful starting point in the response to COVID-19, strategies will likely need to be broadened to reflect the particular characteristics of individuals being hospitalised with COVID-19.

Source
http://dx.doi.org/10.1038/s41467-020-18849-z (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538555 (PMC)
October 2020

Risk of hydroxychloroquine alone and in combination with azithromycin in the treatment of rheumatoid arthritis: a multinational, retrospective study.

Lancet Rheumatol 2020 Nov 21;2(11):e698-e711. Epub 2020 Aug 21.

Janssen Research and Development, Titusville, NJ, USA.

Background: Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the risk associated with its use in routine care in patients with rheumatoid arthritis.

Methods: In this multinational, retrospective study, new user cohort studies in patients with rheumatoid arthritis aged 18 years or older and initiating hydroxychloroquine were compared with those initiating sulfasalazine and followed up over 30 days, with 16 severe adverse events studied. Self-controlled case series were done to further establish safety in wider populations, and included all users of hydroxychloroquine regardless of rheumatoid arthritis status or indication. Separately, severe adverse events associated with hydroxychloroquine plus azithromycin (compared with hydroxychloroquine plus amoxicillin) were studied. Data comprised 14 sources of claims data or electronic medical records from Germany, Japan, the Netherlands, Spain, the UK, and the USA. Propensity score stratification and calibration using negative control outcomes were used to address confounding. Cox models were fitted to estimate calibrated hazard ratios (HRs) according to drug use. Estimates were pooled where the I² value was less than 0·4.

Findings: The study included 956 374 users of hydroxychloroquine, 310 350 users of sulfasalazine, 323 122 users of hydroxychloroquine plus azithromycin, and 351 956 users of hydroxychloroquine plus amoxicillin. No excess risk of severe adverse events was identified when 30-day hydroxychloroquine and sulfasalazine use were compared. Self-controlled case series confirmed these findings. However, long-term use of hydroxychloroquine appeared to be associated with increased cardiovascular mortality (calibrated HR 1·65 [95% CI 1·12-2·44]). Addition of azithromycin appeared to be associated with an increased risk of 30-day cardiovascular mortality (calibrated HR 2·19 [95% CI 1·22-3·95]), chest pain or angina (1·15 [1·05-1·26]), and heart failure (1·22 [1·02-1·45]).

Interpretation: Hydroxychloroquine treatment appears to have no increased risk in the short term among patients with rheumatoid arthritis, but in the long term it appears to be associated with excess cardiovascular mortality. The addition of azithromycin increases the risk of heart failure and cardiovascular mortality even in the short term. We call for careful consideration of the benefit-risk trade-off when counselling those on hydroxychloroquine treatment.

Funding: National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, NIHR Senior Research Fellowship programme, US National Institutes of Health, US Department of Veterans Affairs, Janssen Research and Development, IQVIA, Korea Health Industry Development Institute through the Ministry of Health and Welfare Republic of Korea, Versus Arthritis, UK Medical Research Council Doctoral Training Partnership, Foundation Alfonso Martin Escudero, Innovation Fund Denmark, Novo Nordisk Foundation, Singapore Ministry of Health's National Medical Research Council Open Fund Large Collaborative Grant, VINCI, Innovative Medicines Initiative 2 Joint Undertaking, EU's Horizon 2020 research and innovation programme, and European Federation of Pharmaceutical Industries and Associations.

Source
http://dx.doi.org/10.1016/S2665-9913(20)30276-9 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442425 (PMC)
November 2020

Clinical decision support tool for phototherapy initiation in preterm infants.

J Perinatol 2020 10 13;40(10):1518-1523. Epub 2020 Aug 13.

Division of Neonatal-Perinatal Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA.

Objective: Adherence to guidelines for phototherapy initiation in preterm infants was 39% in our academic NICU (61% of phototherapy was initiated at total bilirubin (TB) levels below recommended thresholds). We hypothesized that adoption of an electronic health record integrated clinical decision support (CDS) tool would improve adherence to phototherapy guidelines.

Study Design: We developed and implemented Premie BiliRecs (PBR), a novel CDS tool for phototherapy initiation in preterm infants from 27 through 34 weeks postmenstrual age. The primary outcome measure was the proportion of phototherapy initiation events consistent with recommended TB thresholds.

Result: Following the implementation of PBR, adherence to guidelines for phototherapy initiation in preterm infants increased to 69.8% (p < 0.001), an improvement of 77%. There was no increase in the incidence of severe hyperbilirubinemia or exchange transfusions.
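
The "improvement of 77%" is a relative change, not a difference in percentage points. The sketch below separates the two using the rounded figures from the abstract. With the rounded 39% baseline the relative change comes out near 79%; the small gap from the reported 77% presumably reflects an unrounded baseline, which is an assumption here rather than something the abstract states.

```python
def relative_change(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


baseline_adherence = 39.0  # percent, as rounded in the abstract
post_adherence = 69.8      # percent, after implementing the tool

absolute_gain = post_adherence - baseline_adherence
relative_gain = relative_change(baseline_adherence, post_adherence)

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1%}")
```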

Conclusion: The adoption of PBR was associated with improved adherence to phototherapy guidelines in preterm infants without increased adverse events.

Source
http://dx.doi.org/10.1038/s41372-020-00782-0 (DOI Listing)
October 2020

Social Media Mining Toolkit (SMMT).

Genomics Inform 2020 Jun 15;18(2):e16. Epub 2020 Jun 15.

Georgia State University, Atlanta, GA 30303, USA.

There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minimal exceptions, the few that do place the burden on the researcher to figure out how to fetch the data, how to best format it, and how to create automatic and manual annotations on the acquired data. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools that aims to encapsulate the cumbersome details of acquiring, preprocessing, annotating and standardizing social media data. The purpose of our toolkit is for researchers to focus on answering research questions, and not the technical aspects of using social media data. By using a standard toolkit, researchers will be able to acquire, use, and release data in a consistent way that is transparent for everybody using the toolkit, thereby simplifying research reproducibility and accessibility in the social media domain.

Source
http://dx.doi.org/10.5808/GI.2020.18.2.e16 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362951 (PMC)
June 2020

A large-scale COVID-19 Twitter chatter dataset for open scientific research -- an international collaboration.

ArXiv 2020 Apr 7. Epub 2020 Apr 7.

As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated in the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique world-wide event into biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 152 million tweets, growing daily, related to COVID-19 chatter generated from January 1st to April 4th at the time of writing. This open dataset will allow researchers to conduct a number of research projects relating to the emotional and mental responses to social distancing measures, the identification of sources of misinformation, and the stratified measurement of sentiment towards the pandemic in near real time.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7280901 (PMC)
April 2020

An international characterisation of patients hospitalised with COVID-19 and a comparison with those previously hospitalised with influenza.

medRxiv 2020 Apr 25. Epub 2020 Apr 25.

Science Policy and Research, National Institute for Health and Care Excellence, UK.

Background: To better understand the profile of individuals with severe coronavirus disease 2019 (COVID-19), we characterised individuals hospitalised with COVID-19 and compared them to individuals previously hospitalised with influenza.

Methods: We report the characteristics (demographics, prior conditions and medication use) of patients hospitalised with COVID-19 between December 2019 and April 2020 in the US (Columbia University Irving Medical Center [CUIMC], STAnford Medicine Research data Repository [STARR-OMOP], and the Department of Veterans Affairs [VA OMOP]) and Health Insurance Review & Assessment [HIRA] of South Korea. Patients hospitalised with COVID-19 were compared with patients previously hospitalised with influenza in 2014-19.

Results: 6,806 (US: 1,634, South Korea: 5,172) individuals hospitalised with COVID-19 were included. Patients in the US were majority male (VA OMOP: 94%, STARR-OMOP: 57%, CUIMC: 52%), but were majority female in HIRA (56%). Age profiles varied across data sources. Prevalence of asthma ranged from 7% to 14%, diabetes from 18% to 43%, and hypertensive disorder from 22% to 70% across data sources, while between 9% and 39% were taking drugs acting on the renin-angiotensin system in the 30 days prior to their hospitalisation. Compared to 52,422 individuals hospitalised with influenza, patients admitted with COVID-19 were more likely male, younger, and, in the US, had fewer comorbidities and lower medication use.

Conclusions: Rates of comorbidities and medication use are high among individuals hospitalised with COVID-19. However, COVID-19 patients are more likely to be male and appear to be younger and, in the US, generally healthier than those typically admitted with influenza.

Source
http://dx.doi.org/10.1101/2020.04.22.20074336 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7239064 (PMC)
April 2020

Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network.

J Am Med Inform Assoc 2020 06;27(6):877-883

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.

Objective: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network.

Materials And Methods: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site.
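
The multiple-mentions labeling heuristic described above can be sketched as follows: a patient is labeled positive when a disease-specific code appears in their record at least a threshold number of times. The function name, record layout, threshold of two, and example codes are all assumptions made for illustration; this is not the APHRODITE package's interface.

```python
from collections import Counter


def label_by_multiple_mentions(records, disease_codes, min_mentions=2):
    """Label each patient True if they have enough disease-code mentions.

    records: iterable of (patient_id, code) pairs, one per coded event.
    disease_codes: set of codes treated as disease-specific.
    min_mentions: hypothetical threshold for "multiple mentions".
    """
    records = list(records)  # allow a one-shot iterator as input
    mentions = Counter(pid for pid, code in records if code in disease_codes)
    patients = {pid for pid, _ in records}
    return {pid: mentions[pid] >= min_mentions for pid in patients}


# Made-up records; the codes are illustrative only.
records = [
    ("p1", "E11.9"), ("p1", "E11.9"), ("p1", "I10"),
    ("p2", "E11.9"), ("p2", "I10"),
    ("p3", "I10"),
]
labels = label_by_multiple_mentions(records, disease_codes={"E11.9"})
print(sorted(labels.items()))
```

Labels produced this way are imperfect by design: requiring repeated mentions tends to favor precision over recall, which is consistent with the recall gain the classifiers show over the heuristic in the Results.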

Results: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site.

Discussion And Conclusion: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA but limited portability internationally, indicating that classifier generalizability may have geographic limitations. Consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
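
The multiple-mentions labeling heuristic described in the Materials and Methods can be illustrated with a minimal Python sketch. This is not the APHRODITE implementation (APHRODITE is an R package); the function name, the threshold of two mentions, and the example codes below are assumptions made purely for illustration.

```python
from collections import Counter


def label_by_multiple_mentions(patient_codes, target_codes, min_mentions=2):
    """Assign a noisy label of 1 to patients whose record mentions any of the
    disease-specific codes at least `min_mentions` times, otherwise 0.

    patient_codes: dict mapping patient id -> list of recorded codes
    target_codes:  set of disease-specific codes (hypothetical values here)
    min_mentions:  assumed threshold; the abstract only says "multiple"
    """
    labels = {}
    for patient_id, codes in patient_codes.items():
        counts = Counter(codes)
        # Counter returns 0 for codes that never appear in the record.
        mentions = sum(counts[code] for code in target_codes)
        labels[patient_id] = 1 if mentions >= min_mentions else 0
    return labels


# Hypothetical records: only p1 has two mentions of the target code.
records = {
    "p1": ["E11.9", "I10", "E11.9"],
    "p2": ["E11.9", "I10"],
    "p3": ["I10", "J45"],
}
noisy_labels = label_by_multiple_mentions(records, {"E11.9"})
```

Labels produced this way are imperfect by design; in the approach the abstract describes, they serve as training targets for a supervised classifier rather than as a final phenotype definition.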

Source
DOI: http://dx.doi.org/10.1093/jamia/ocaa032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309227
June 2020

Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data.

Authors:
Juan M Banda

Genomics Inform 2019 Jun 11;17(2):e13. Epub 2019 Jun 11.

Panacea Laboratory, Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.

The usage of controlled biomedical vocabularies is the cornerstone that enables seamless interoperability when using a common data model across multiple data sites. The Observational Health Data Science and Informatics (OHDSI) initiative combines over 100 controlled vocabularies into its own. However, the OHDSI vocabulary is limited in the sense that it combines multiple terminologies and does not provide a direct way to link them outside of their own self-contained scope. This issue makes the tasks of enriching feature sets by using external resources extremely difficult. In order to address these shortcomings, we have created a linked data version of the OHDSI vocabulary, connecting it with already established linked resources like bioportal, bio2rdf, etc. with the ultimate purpose of enabling the interoperability of resources previously foreign to the OHDSI universe.

Source
DOI: http://dx.doi.org/10.5808/GI.2019.17.2.e13
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808628
June 2019

Finding missed cases of familial hypercholesterolemia in health systems using machine learning.

NPJ Digit Med 2019 11;2:23. Epub 2019 Apr 11.

Cardiovascular Medicine and Cardiovascular Institute, Stanford University, Stanford, CA, USA.

Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition affecting approximately 0.4% of the population and carrying up to a 20-fold increased risk of coronary artery disease if untreated. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a classifier to identify potential FH patients using electronic health record (EHR) data at Stanford Health Care. We trained a random forest classifier using data from known patients (n = 197) and matched non-cases (n = 6590). Our classifier obtained a positive predictive value (PPV) of 0.88 and sensitivity of 0.75 on a held-out test set. We evaluated the accuracy of the classifier's predictions by chart review of 100 patients at risk of FH not included in the original dataset. The classifier correctly flagged 84% of patients at the highest probability threshold, with decreasing performance as the threshold lowers. In external validation on 466 FH patients (236 with genetically proven FH) and 5000 matched non-cases from the Geisinger Healthcare System, our FH classifier achieved a PPV of 0.85. Our EHR-derived FH classifier is effective in finding candidate patients for further FH screening. Such machine learning guided strategies can lead to effective identification of the highest risk patients for enhanced management strategies.
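
For readers less familiar with the two metrics reported above, the following Python sketch shows how positive predictive value and sensitivity are computed from confusion-matrix counts. The counts are invented for illustration and are not taken from the study.

```python
def positive_predictive_value(true_pos, false_pos):
    """PPV: fraction of patients flagged by the classifier who are true cases."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Sensitivity (recall): fraction of true cases that the classifier flags."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, NOT from the paper: 30 true positives,
# 10 false positives, 20 false negatives.
example_ppv = positive_predictive_value(30, 10)   # 30 / 40
example_sensitivity = sensitivity(30, 20)         # 30 / 50
```

A high PPV with moderate sensitivity, as in the abstract, means that most flagged patients are worth following up even though some true cases are missed.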

Source
DOI: http://dx.doi.org/10.1038/s41746-019-0101-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550268
April 2019

Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models.

Annu Rev Biomed Data Sci 2018 Jul 23;1:53-68. Epub 2018 May 23.

Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA.

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.

Source
DOI: http://dx.doi.org/10.1146/annurev-biodatasci-080917-013315
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6583807
July 2018

Identifying Cases of Metastatic Prostate Cancer Using Machine Learning on Electronic Health Records.

AMIA Annu Symp Proc 2018 5;2018:1498-1504. Epub 2018 Dec 5.

Department of Biomedical Informatics, Stanford School of Medicine, CA.

Cancer stage is rarely captured in structured form in the electronic health record (EHR). We evaluate the performance of a classifier, trained on structured EHR data, in identifying prostate cancer patients with metastatic disease. Using EHR data for a cohort of 5,861 prostate cancer patients mapped to the Observational Health Data Sciences and Informatics (OHDSI) data model, we constructed feature vectors containing frequency counts of conditions, procedures, medications, observations and laboratory values. Staging information from the California Cancer Registry was used as the ground truth. For identifying patients with metastatic disease, a random forest model achieved a precision of 0.90 and a recall of 0.40 using data within 12 months of diagnosis. This compares with a precision of 0.33 and a recall of 0.54 for an ICD code-based query. High-precision classifiers using hundreds of structured data elements significantly outperform ICD queries, and may assist in identifying cohorts for observational research or clinical trial matching.
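
The frequency-count feature vectors mentioned above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the authors' pipeline; the event identifiers and the function name are hypothetical.

```python
from collections import Counter


def build_feature_vectors(patient_events):
    """Turn each patient's list of coded events into a vector of counts.

    patient_events: dict mapping patient id -> list of event identifiers
                    (conditions, procedures, medications, and so on).
    Returns the sorted vocabulary and a dict of count vectors aligned to it.
    """
    vocabulary = sorted({e for events in patient_events.values() for e in events})
    vectors = {}
    for patient_id, events in patient_events.items():
        counts = Counter(events)
        # One column per vocabulary entry; absent events count as 0.
        vectors[patient_id] = [counts[e] for e in vocabulary]
    return vocabulary, vectors


# Hypothetical event identifiers, made up for this example.
events = {
    "a": ["cond:1", "drug:2", "cond:1"],
    "b": ["drug:2", "lab:3"],
}
vocab, feature_vectors = build_feature_vectors(events)
```

Vectors of this shape can then be passed to any standard classifier, such as the random forest named in the abstract.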

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371284
January 2020

Scalable Electronic Phenotyping For Studying Patient Comorbidities.

AMIA Annu Symp Proc 2018 5;2018:740-749. Epub 2018 Dec 5.

Biomedical Informatics Training Program, Stanford University, Stanford, CA.

Over 75 million Americans have multiple concurrent chronic conditions and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved comparable performance to models created using keywords as labeling heuristic. We found that expert input and higher training sample sizes could compensate for the lack of clinical text derived features. However, our best performing model included clinical text as features with a large training sample size.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371288
November 2019

Association of Hemoglobin A1c Levels With Use of Sulfonylureas, Dipeptidyl Peptidase 4 Inhibitors, and Thiazolidinediones in Patients With Type 2 Diabetes Treated With Metformin: Analysis From the Observational Health Data Sciences and Informatics Initiative.

JAMA Netw Open 2018 08 3;1(4):e181755. Epub 2018 Aug 3.

Observational Health Data Sciences and Informatics, New York, New York.

Importance: Consensus around an efficient second-line treatment option for type 2 diabetes (T2D) remains ambiguous. The availability of electronic medical records and insurance claims data, which capture routine medical practice, accessed via the Observational Health Data Sciences and Informatics network presents an opportunity to generate evidence for the effectiveness of second-line treatments.

Objective: To identify which drug classes among sulfonylureas, dipeptidyl peptidase 4 (DPP-4) inhibitors, and thiazolidinediones are associated with reduced hemoglobin A1c (HbA1c) levels and lower risk of myocardial infarction, kidney disorders, and eye disorders in patients with T2D treated with metformin as a first-line therapy.

Design, Setting, And Participants: Three retrospective, propensity-matched, new-user cohort studies with replication across 8 sites were performed from 1975 to 2017. Medical data of 246 558 805 patients from multiple countries from the Observational Health Data Sciences and Informatics (OHDSI) initiative were included and medical data sets were transformed into a unified common data model, with analysis done using open-source analytical tools. Participants included patients with T2D receiving metformin with at least 1 prior HbA1c laboratory test who were then prescribed either sulfonylureas, DPP-4 inhibitors, or thiazolidinediones. Data analysis was conducted from 2015 to 2018.

Exposures: Treatment with sulfonylureas, DPP-4 inhibitors, or thiazolidinediones starting at least 90 days after the initial prescription of metformin.

Main Outcomes And Measures: The primary outcome is the first observation of the reduction of HbA1c level to 7% of total hemoglobin or less after prescription of a second-line drug. Secondary outcomes are myocardial infarction, kidney disorder, and eye disorder after prescription of a second-line drug.

Results: A total of 246 558 805 patients (126 977 785 women [51.5%]) were analyzed. Effectiveness of sulfonylureas, DPP-4 inhibitors, and thiazolidinediones prescribed after metformin to lower HbA1c level to 7% or less of total hemoglobin remained indistinguishable in patients with T2D. Patients treated with sulfonylureas compared with DPP-4 inhibitors had a small increased consensus hazard ratio of myocardial infarction (1.12; 95% CI, 1.02-1.24) and eye disorders (1.15; 95% CI, 1.11-1.19) in the meta-analysis. Hazard of observing kidney disorders after treatment with sulfonylureas, DPP-4 inhibitors, or thiazolidinediones was equally likely.

Conclusions And Relevance: The examined drug classes did not differ in lowering HbA1c and in hazards of kidney disorders in patients with T2D treated with metformin as a first-line therapy. Sulfonylureas had a small, higher observed hazard of myocardial infarction and eye disorders compared with DPP-4 inhibitors in the meta-analysis. The OHDSI collaborative network can be used to conduct a large international study examining the effectiveness of second-line treatment choices made in clinical management of T2D.

Source
DOI: http://dx.doi.org/10.1001/jamanetworkopen.2018.1755
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324274
August 2018

Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network.

AMIA Jt Summits Transl Sci Proc 2017 26;2017:48-57. Epub 2017 Jul 26.

Stanford Univ., Stanford, CA.

The widespread usage of electronic health records (EHRs) for clinical research has produced multiple electronic phenotyping approaches. Methods for electronic phenotyping range from those needing extensive specialized medical expert supervision to those based on semi-supervised learning techniques. We present Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE), an R package phenotyping framework that combines noisy labeling and anchor learning. APHRODITE makes these cutting-edge phenotyping approaches available for use with the Observational Health Data Sciences and Informatics (OHDSI) data model for standardized and scalable deployment. APHRODITE uses EHR data available in the OHDSI Common Data Model to build classification models for electronic phenotyping. We demonstrate the utility of APHRODITE by comparing its performance versus traditional rule-based phenotyping approaches. Finally, the resulting phenotype models and model construction workflows built with APHRODITE can be shared between multiple OHDSI sites. Such sharing allows their application on large and diverse patient populations.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543379
July 2017

A large-scale solar dynamics observatory image dataset for computer vision applications.

Sci Data 2017 25;4:170096. Epub 2017 Jul 25.

Department of Computer Science, Georgia State University, Atlanta 30302-3987, USA.

The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate a wider adoption and interest from the computer vision to the solar physics community.

Source
DOI: http://dx.doi.org/10.1038/sdata.2017.96
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525637
March 2018

Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network.

Epilepsia 2017 08 6;58(8):e101-e106. Epub 2017 Jul 6.

Observational Health Data Sciences and Informatics (OHDSI) Collaborative.

Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new-user cohort study of seizure patients exposed to levetiracetam (n = 276,665) across 10 databases. With phenytoin users (n = 74,682) as a comparator group, propensity score-matching was conducted and hazard ratios computed for angioedema events by per-protocol and intent-to-treat analyses. Angioedema events were rare in both the levetiracetam and phenytoin groups (54 vs. 71 in per-protocol and 248 vs. 435 in intent-to-treat). No significant increase in angioedema risk with levetiracetam was seen in any individual database (hazard ratios ranging from 0.43 to 1.31). Meta-analysis showed a summary hazard ratio of 0.72 (95% confidence interval [CI] 0.39-1.31) and 0.64 (95% CI 0.52-0.79) for the per-protocol and intent-to-treat analyses, respectively. The results suggest that levetiracetam has the same or lower risk for angioedema than phenytoin, which does not currently carry a labeled warning for angioedema. Further studies are warranted to evaluate angioedema risk across all antiepileptic drugs.
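
The abstract does not state which pooling method produced the summary hazard ratios. As a rough illustration of how per-database hazard ratios can be combined, the Python sketch below uses fixed-effect inverse-variance weighting on the log scale, recovering each standard error from its 95% confidence interval. Both the choice of method and the input values are assumptions; the inputs are not the study's per-database results.

```python
import math


def pooled_hazard_ratio(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    estimates: list of (hazard_ratio, ci_low, ci_high) tuples, where the
               interval is assumed to be a 95% CI symmetric on the log scale.
    Returns (pooled_hr, pooled_ci_low, pooled_ci_high).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, ci_low, ci_high in estimates:
        # Standard error of log(HR), recovered from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    log_pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_pooled),
        math.exp(log_pooled - z * se_pooled),
        math.exp(log_pooled + z * se_pooled),
    )


# Two hypothetical databases with identical estimates: the pooled point
# estimate is unchanged, while the pooled interval is narrower.
pooled, pooled_low, pooled_high = pooled_hazard_ratio(
    [(1.0, 0.5, 2.0), (1.0, 0.5, 2.0)]
)
```

A random-effects model would widen the interval when databases disagree; the fixed-effect version is shown here only because it is the simplest to follow.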

Source
DOI: http://dx.doi.org/10.1111/epi.13828
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6632067
August 2017

Characterizing treatment pathways at scale using the OHDSI network.

Proc Natl Acad Sci U S A 2016 07 6;113(27):7329-36. Epub 2016 Jun 6.

Observational Health Data Sciences and Informatics, New York, NY 10032; Department of Statistics, Columbia University, New York, NY 10027.

Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.

Source
DOI: http://dx.doi.org/10.1073/pnas.1510502113
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4941483
July 2016

A curated and standardized adverse drug event resource to accelerate drug safety research.

Sci Data 2016 05 10;3:160026. Epub 2016 May 10.

Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA.

Identification of adverse drug reactions (ADRs) during the post-marketing phase is one of the most important goals of drug safety surveillance. Spontaneous reporting systems (SRS) data, which are the mainstay of traditional drug safety surveillance, are used for hypothesis generation and to validate the newer approaches. The publicly available US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data requires substantial curation before they can be used appropriately, and applying different strategies for data cleaning and normalization can have material impact on analysis results. We provide a curated and standardized version of FAERS removing duplicate case records, applying standardized vocabularies with drug names mapped to RxNorm concepts and outcomes mapped to SNOMED-CT concepts, and pre-computed summary statistics about drug-outcome relationships for general consumption. This publicly available resource, along with the source code, will accelerate drug safety research by reducing the amount of time spent performing data management on the source FAERS reports, improving the quality of the underlying data, and enabling standardized analyses using common vocabularies.

Source
DOI: http://dx.doi.org/10.1038/sdata.2016.26
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4872271
May 2016

Learning statistical models of phenotypes using noisy labeled training data.

J Am Med Inform Assoc 2016 11 12;23(6):1166-1173. Epub 2016 May 12.

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford CA 94305-5479, USA.

Objective: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.

Methods: We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard.

Results: Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively. We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach.

Conclusions: Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.

Source
DOI: http://dx.doi.org/10.1093/jamia/ocw028
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5070523
November 2016

Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records.

Drug Saf 2016 Jan;39(1):45-57

Stanford Center for Biomedical Informatics Research, 1265 Welch Road, MSOB, Stanford, CA, 94305, USA.

Background And Objective: Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug-drug-adverse event associations derived from electronic health records (EHRs).

Methods: We prioritized drug-drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug-drug interaction (DDI) prediction methods. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization.

Results: We collected information for 5983 putative EHR-derived drug-drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. Only seven drug-drug-event associations (<0.5%) had support from the majority of evidence sources, and about one third (1777) had support from at least one of the evidence sources.

Conclusions: Our proof-of-concept method for scoring putative drug-drug-event associations from EHRs offers a systematic and reproducible way of prioritizing associations for further study. Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug-drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations.
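
The pre-filter-then-rank procedure outlined in the Methods can be sketched as follows. The abstract does not specify the scoring scheme, so ranking by a simple count of supporting sources is an assumption, as are the function name, the record layout, and the example data.

```python
def prioritize_associations(associations, known_in_public_databases):
    """Drop associations already listed in public databases, then rank the
    remainder by how many of the other evidence sources support them.

    associations: list of dicts with an "id" and a "support" set naming the
                  supporting sources (hypothetical layout).
    known_in_public_databases: set of association ids to filter out.
    """
    candidates = [
        a for a in associations if a["id"] not in known_in_public_databases
    ]
    # sorted() is stable, so ties keep their original input order.
    return sorted(candidates, key=lambda a: len(a["support"]), reverse=True)


# Hypothetical associations and evidence sources, for illustration only.
example_associations = [
    {"id": "A", "support": {"literature"}},
    {"id": "B", "support": {"spontaneous_reports", "literature", "ddi_prediction"}},
    {"id": "C", "support": set()},
    {"id": "D", "support": {"spontaneous_reports", "ddi_prediction"}},
]
ranked = prioritize_associations(example_associations, {"D"})
ranked_ids = [a["id"] for a in ranked]
```

Here association D is removed by the pre-filter, and the rest are ordered B, A, C by descending number of supporting sources.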

Source
DOI: http://dx.doi.org/10.1007/s40264-015-0352-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712252
January 2016