Publications by authors named "Anoop D Shah"

30 Publications

An informatics consult approach for generating clinical evidence for treatment decisions.

BMC Med Inform Decis Mak 2021 Oct 12;21(1):281. Epub 2021 Oct 12.

Institute of Health Informatics, University College London, London, UK.

Background: An Informatics Consult has been proposed in which clinicians request novel evidence from large scale health data resources, tailored to the treatment of a specific patient. However, such consultations are not yet routinely available. We seek to provide an Informatics Consult for a situation where a treatment indication and contraindication coexist in the same patient, i.e., anti-coagulation use for stroke prevention in a patient with both atrial fibrillation (AF) and liver cirrhosis.

Methods: We examined four sources of evidence for the effect of warfarin on stroke risk or all-cause mortality: (1) randomised controlled trials (RCTs), (2) meta-analysis of prior observational studies, (3) trial emulation using population-based electronic health records (N = 3,854,710), and (4) genetic evidence (Mendelian randomisation). We developed prototype forms to request an Informatics Consult and return of results in electronic health record systems.
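
As a hedged illustration of the trial-emulation step (a sketch on simulated data, not the study's code or cohort), a Cox proportional hazards model can estimate the association between warfarin use and mortality:

    # Sketch only: simulated cohort, invented column names and effect sizes.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 300
    warfarin = rng.integers(0, 2, n)                         # 1 = treated
    age = rng.normal(75, 8, n)
    latent = rng.exponential(3.0 * np.exp(0.4 * warfarin))   # longer survival if treated
    df = pd.DataFrame({
        "time": np.minimum(latent, 5.0),                     # censor at 5 years
        "event": (latent < 5.0).astype(int),
        "warfarin": warfarin,
        "age": age,
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")
    print(cph.hazard_ratios_["warfarin"])                    # age-adjusted HR for warfarin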

Results: We found 0 RCT reports and 0 trials recruiting patients with AF and cirrhosis. We found broad concordance across the three new sources of evidence we generated. Meta-analysis of prior observational studies showed that warfarin use was associated with lower stroke risk (hazard ratio [HR] = 0.71, CI 0.39-1.29). In a target trial emulation, warfarin was associated with lower risks of all-cause mortality (HR = 0.61, CI 0.49-0.76) and ischaemic stroke (HR = 0.27, CI 0.08-0.91). Mendelian randomisation served as a drug target validation, where we found that lower levels of vitamin K1 (warfarin is a vitamin K1 antagonist) are associated with lower stroke risk. A pilot survey of an independent sample of 34 clinicians revealed that 85% found information on prognosis useful and 79% thought that they should have access to the Informatics Consult as a service within their healthcare systems. We identified candidate steps for automation to scale evidence generation and accelerate the return of results.

Conclusion: We performed a proof-of-concept Informatics Consult for evidence generation, which may inform treatment decisions in situations where there is a dearth of randomised trials. Patients are surprised to learn that their clinicians are currently not able to learn in clinic from data on 'patients like me'. We identify the key challenges in offering such an Informatics Consult as a service.

http://dx.doi.org/10.1186/s12911-021-01638-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8506488

Descriptors of Sepsis Using the Sepsis-3 Criteria: A Cohort Study in Critical Care Units Within the U.K. National Institute for Health Research Critical Care Health Informatics Collaborative.

Crit Care Med 2021 Nov;49(11):1883-1894

University College London Hospitals NHS Foundation Trust, London, United Kingdom.

Objectives: To describe the epidemiology of sepsis in critical care by applying the Sepsis-3 criteria to electronic health records.

Design: Retrospective cohort study using electronic health records.

Setting: Ten ICUs from four U.K. National Health Service hospital trusts contributing to the National Institute for Health Research Critical Care Health Informatics Collaborative.

Patients: A total of 28,456 critical care admissions (14,332 emergency medical, 4,585 emergency surgical, and 9,539 elective surgical).

Measurements And Main Results: Twenty-nine thousand three hundred forty-three episodes of clinical deterioration were identified with a rise in Sequential Organ Failure Assessment score of at least 2 points, of which 14,869 (50.7%) were associated with antibiotic escalation and thereby met the Sepsis-3 criteria for sepsis. A total of 4,100 episodes of sepsis (27.6%) were associated with vasopressor use and lactate greater than 2.0 mmol/L, and therefore met the Sepsis-3 criteria for septic shock. ICU mortality by source of sepsis was highest for ICU-acquired sepsis (23.7%; 95% CI, 21.9-25.6%), followed by hospital-acquired sepsis (18.6%; 95% CI, 17.5-19.9%), and community-acquired sepsis (12.9%; 95% CI, 12.1-13.6%) (p for comparison less than 0.0001).
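
The operational definitions above translate directly into data logic; as a minimal sketch (invented column names, not the study dataset):

    # Sketch of the Sepsis-3 rules described above; columns are hypothetical.
    import pandas as pd

    episodes = pd.DataFrame({
        "sofa_rise":             [2, 3, 1, 4],     # rise in SOFA score (points)
        "antibiotic_escalation": [True, False, True, True],
        "vasopressors":          [True, False, False, True],
        "peak_lactate_mmol_l":   [3.1, 1.0, 1.8, 2.4],
    })

    sepsis = (episodes["sofa_rise"] >= 2) & episodes["antibiotic_escalation"]
    septic_shock = (sepsis & episodes["vasopressors"]
                    & (episodes["peak_lactate_mmol_l"] > 2.0))
    print(int(sepsis.sum()), int(septic_shock.sum()))    # episode counts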

Conclusions: We successfully operationalized the Sepsis-3 criteria to an electronic health record dataset to describe the characteristics of critical care patients with sepsis. This may facilitate sepsis research using electronic health record data at scale without relying on human coding.

http://dx.doi.org/10.1097/CCM.0000000000005169
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8508729

Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit.

Artif Intell Med 2021 Jul;117:102083. Epub 2021 May 1.

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK. Electronic address:

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
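
A minimal usage sketch for the toolkit follows (the model pack path is a placeholder, and the API shown is from MedCAT v1; details may vary by version):

    # Illustrative only: load a MedCAT model pack and annotate one document.
    from medcat.cat import CAT

    cat = CAT.load_model_pack("models/umls_model_pack.zip")  # hypothetical path
    text = "Patient has type 2 diabetes mellitus and chronic kidney disease."
    result = cat.get_entities(text)
    for ent in result["entities"].values():
        print(ent["pretty_name"], ent["cui"])  # concept name and UMLS CUI
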

http://dx.doi.org/10.1016/j.artmed.2021.102083

Data gaps in electronic health record (EHR) systems: An audit of problem list completeness during the COVID-19 pandemic.

Int J Med Inform 2021 Jun;150:104452. Epub 2021 Apr 1.

Clinical and Research Informatics Unit, UCL/UCLH NIHR Biomedical Research Centre, UCL Institute of Health Informatics, 222 Euston Road, London, NW1 2DA, UK. Electronic address:

Objective: To evaluate the completeness of diagnosis recording in problem lists in a hospital electronic health record (EHR) system during the COVID-19 pandemic.

Design: Retrospective chart review with manual review of free text electronic case notes.

Setting: Major teaching hospital trust in London, one year after the launch of a comprehensive EHR system (Epic), during the first peak of the COVID-19 pandemic in the UK.

Participants: 516 patients with suspected or confirmed COVID-19.

Main Outcome Measures: Percentage of diagnoses already included in the structured problem list.

Results: Prior to review, these patients had a combined total of 2841 diagnoses recorded in their EHR problem lists. A further 1722 diagnoses were identified on review, increasing the mean number of recorded problems per patient from 5.51 to 8.84. The overall percentage of diagnoses originally included in the problem list was 62.3% (2841/4563, 95% confidence interval 60.8%, 63.7%).
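
The headline completeness figure and its confidence interval can be reproduced from the quoted counts (a quick arithmetic check, not study code):

    # 2841 of 4563 diagnoses (2841 + 1722) were already on the problem list.
    from statsmodels.stats.proportion import proportion_confint

    recorded, total = 2841, 2841 + 1722
    low, high = proportion_confint(recorded, total, alpha=0.05, method="normal")
    print(round(recorded / total, 3), round(low, 3), round(high, 3))
    # -> 0.623 0.609 0.637 (matches the 62.3%, CI 60.8-63.7% above, within rounding)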

Conclusions: Diagnoses and other clinical information stored in a structured way in electronic health records are extremely useful for supporting clinical decisions, improving patient care and enabling better research. However, recording of medical diagnoses on the structured problem list for inpatients is incomplete, with almost 40% of important diagnoses mentioned only in the free text notes.

http://dx.doi.org/10.1016/j.ijmedinf.2021.104452

A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems.

JAMIA Open 2020 Dec 5;3(4):545-556. Epub 2020 Dec 5.

Institute of Health Informatics, University College London, London, UK.

Objectives: The UK Biobank (UKB) is making primary care electronic health records (EHRs) for 500 000 participants available for COVID-19-related research. Data are extracted from four sources, recorded using five clinical terminologies and stored in different schemas. The aims of our research were to: (a) develop a semi-supervised approach for bootstrapping EHR phenotyping algorithms in UKB EHR, and (b) evaluate our approach by implementing and evaluating phenotypes for 31 common biomarkers.

Materials And Methods: We describe an algorithmic approach to phenotyping biomarkers in primary care EHR involving (a) bootstrapping definitions using existing phenotypes, (b) excluding generic, rare, or semantically distant terms, (c) forward-mapping terminology terms, (d) expert review, and (e) data extraction. We evaluated the phenotypes by assessing the ability to reproduce known epidemiological associations with all-cause mortality using Cox proportional hazards models.
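
One step of this pipeline, excluding semantically distant candidate terms, can be sketched as follows (the terms, threshold and string-similarity measure are invented for illustration and stand in for the study's semantic-distance criteria):

    # Toy filter: keep candidate terminology terms similar to a seed phrase.
    from difflib import SequenceMatcher

    seed = "serum creatinine level"
    candidates = ["serum creatinine", "corrected serum creatinine level",
                  "serum ferritin level", "creatinine clearance test"]

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    kept = [t for t in candidates if similarity(seed, t) >= 0.6]
    print(kept)  # distant terms drop out before expert review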

Results: We created and evaluated phenotyping algorithms for 31 biomarkers, many of which are directly related to COVID-19 complications, for example diabetes, cardiovascular disease, and respiratory disease. Our algorithm identified 1651 Read v2 and Clinical Terms Version 3 terms and automatically excluded 1228 terms. Clinical review excluded 103 terms and included 44 terms, resulting in 364 terms for data extraction (sensitivity 0.89, specificity 0.92). We extracted 38 190 682 events and identified 220 978 participants with at least one biomarker measured.

Discussion And Conclusion: Bootstrapping phenotyping algorithms from similar EHR can potentially address pre-existing methodological concerns that undermine the outputs of biomarker discovery pipelines and provide research-quality phenotyping algorithms.

http://dx.doi.org/10.1093/jamiaopen/ooaa047
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7717266

Invasive versus non-invasive management of older patients with non-ST elevation myocardial infarction (SENIOR-NSTEMI): a cohort study based on routine clinical data.

Lancet 2020 Aug;396(10251):623-634

National Institute for Health Research Imperial Biomedical Research Centre, Imperial College London and Imperial College Healthcare NHS Trust, London, UK. Electronic address:

Background: Previous trials suggest lower long-term risk of mortality after invasive rather than non-invasive management of patients with non-ST elevation myocardial infarction (NSTEMI), but the trials excluded very elderly patients. We aimed to estimate the effect of invasive versus non-invasive management within 3 days of peak troponin concentration on the survival of patients aged 80 years or older with NSTEMI.

Methods: Routine clinical data for this study were obtained from five collaborating hospitals hosting NIHR Biomedical Research Centres in the UK (all tertiary centres with emergency departments). Eligible patients were 80 years old or older when they underwent troponin measurements and were diagnosed with NSTEMI between 2010 (2008 for University College Hospital) and 2017. Propensity scores (patients' estimated probability of receiving invasive management) based on pretreatment variables were derived using logistic regression; patients with high probabilities of non-invasive or invasive management were excluded. Patients who died within 3 days of peak troponin concentration without receiving invasive management were assigned to the invasive or non-invasive management groups based on their propensity scores, to mitigate immortal time bias. We estimated mortality hazard ratios comparing invasive with non-invasive management, and compared the rate of hospital admissions for heart failure.
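
As a hedged sketch of the propensity score step (simulated data and invented covariates, not the SENIOR-NSTEMI pipeline):

    # Logistic-regression propensity scores, then trimming extreme values.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.DataFrame({
        "age":                [81, 85, 90, 83, 88, 92, 84, 87],
        "peak_troponin_xuln": [12.0, 3.5, 8.0, 20.0, 1.5, 6.0, 9.5, 2.0],
        "invasive":           [1, 0, 1, 1, 0, 0, 1, 0],
    })

    X = df[["age", "peak_troponin_xuln"]]
    df["pscore"] = LogisticRegression().fit(X, df["invasive"]).predict_proba(X)[:, 1]

    analysed = df[df["pscore"].between(0.05, 0.95)]   # exclude extreme scores
    print(len(df) - len(analysed), "patients excluded")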

Findings: Of the 1976 patients with NSTEMI, 101 died within 3 days of their peak troponin concentration and 375 were excluded because of extreme propensity scores. The remaining 1500 patients had a median age of 86 years (IQR 82-89), of whom 845 (56%) received non-invasive management. During median follow-up of 3·0 (IQR 1·2-4·8) years, 613 (41%) patients died. The adjusted cumulative 5-year mortality was 36% in the invasive management group and 55% in the non-invasive management group (adjusted hazard ratio 0·68, 95% CI 0·55-0·84). Invasive management was associated with a lower incidence of hospital admissions for heart failure (adjusted rate ratio compared with non-invasive management 0·67, 95% CI 0·48-0·93).

Interpretation: The survival advantage of invasive compared with non-invasive management appears to extend to patients with NSTEMI who are aged 80 years or older.

Funding: NIHR Imperial Biomedical Research Centre, as part of the NIHR Health Informatics Collaborative.

http://dx.doi.org/10.1016/S0140-6736(20)30930-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7456783

Prognostic significance of troponin level in 3121 patients presenting with atrial fibrillation (The NIHR Health Informatics Collaborative TROP-AF study).

J Am Heart Assoc 2020 Apr;9(7):e013684. Epub 2020 Mar 26.

NIHR Imperial Biomedical Research Centre Imperial College London and Imperial College Healthcare NHS Trust London United Kingdom.

Background: Patients presenting with atrial fibrillation (AF) often undergo a blood test to measure troponin, but interpretation of the result is impeded by uncertainty about its clinical importance. We investigated the relationship between troponin level, coronary angiography, and all-cause mortality in real-world patients presenting with AF.

Methods And Results: We used National Institute for Health Research Health Informatics Collaborative data to identify patients admitted between 2010 and 2017 at 5 tertiary centers in the United Kingdom with a primary diagnosis of AF. Peak troponin results were scaled as multiples of the upper limit of normal. A total of 3121 patients were included in the analysis. Over a median follow-up of 1462 (interquartile range, 929-1975) days, there were 586 deaths (18.8%). The adjusted hazard ratio for mortality associated with a positive troponin (value above the upper limit of normal) was 1.20 (95% CI, 1.01-1.43; P<0.05). Higher troponin levels were associated with higher risk of mortality, reaching a maximum hazard ratio of 2.6 (95% CI, 1.9-3.4) at ≈250 multiples of the upper limit of normal. There was an exponential relationship between higher troponin levels and increased odds of coronary angiography. The mortality risk was 36% lower in patients who underwent coronary angiography than in those who did not (adjusted hazard ratio, 0.61; 95% CI, 0.42-0.89; P=0.01).

Conclusions: Increased troponin was associated with increased risk of mortality in patients presenting with AF. The lower hazard ratio in patients undergoing invasive management raises the possibility that the clinical importance of troponin release in AF may be mediated by coronary artery disease, which may be responsive to revascularization.
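
The troponin standardisation described above (peak values expressed as multiples of each laboratory's upper limit of normal) can be sketched as follows, with invented values:

    # Scale peak troponin by each hospital's assay-specific ULN.
    import pandas as pd

    results = pd.DataFrame({
        "hospital": ["A", "A", "B", "B"],
        "peak_troponin_ng_l": [50.0, 14.0, 120.0, 30.0],
    })
    uln_ng_l = {"A": 14.0, "B": 34.0}    # hypothetical 99th-centile ULNs

    results["troponin_xuln"] = (results["peak_troponin_ng_l"]
                                / results["hospital"].map(uln_ng_l))
    results["positive"] = results["troponin_xuln"] > 1.0
    print(results)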

http://dx.doi.org/10.1161/JAHA.119.013684
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7428631

Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial.

IEEE J Biomed Health Inform 2020 Oct;24(10):2950-2959. Epub 2020 Mar 9.

Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to free text EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study, a clinical trial aiming to reduce organ dysfunction in septic shock. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared patients identified by our approach with those actually screened and recruited for the trial, for the time period that data were available. We manually reviewed records of a random sample of patients identified by the algorithm but not screened in the original trial. Our method identified 376 patients, including 34 patients with EHR data available who were actually recruited to LeoPARDS in our centre. The sensitivity of CogStack for identifying patients screened was 90% (95% CI 85%, 93%). Of the 203 patients identified by both manual screening and CogStack, the index date matched in 95 (47%) and CogStack was earlier in 94 (47%). In conclusion, analysis of EHR data using NLP could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage, potentially improving trial recruitment if implemented in real time.
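
As a hedged sketch of the moving-window screening idea (simplified stand-in criteria, not the actual LeoPARDS eligibility rules):

    # Evaluate simplified eligibility over hourly windows of observations.
    import pandas as pd

    obs = pd.DataFrame({
        "time": pd.to_datetime(["2020-01-01 10:05", "2020-01-01 10:40",
                                "2020-01-01 12:10"]),
        "on_vasopressors": [True, True, False],
        "lactate_mmol_l": [2.5, 2.2, 1.1],
    }).set_index("time")

    hourly = obs.resample("1h").max()    # worst value in each 1-hour window
    eligible = hourly[(hourly["on_vasopressors"] == True)
                      & (hourly["lactate_mmol_l"] > 2.0)]
    print(eligible.index.tolist())       # windows in which the patient qualifies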

http://dx.doi.org/10.1109/JBHI.2020.2977925

Association of troponin level and age with mortality in 250 000 patients: cohort study across five UK acute care centres.

BMJ 2019 Nov 20;367:l6055. Epub 2019 Nov 20.

NIHR Imperial Biomedical Research Centre, Imperial College London and Imperial College Healthcare NHS Trust, Hammersmith Hospital, London W12 0HS, UK

Objective: To determine the relation between age and troponin level and its prognostic implication.

Design: Retrospective cohort study.

Setting: Five cardiovascular centres in the UK National Institute for Health Research Health Informatics Collaborative (UK-NIHR HIC).

Participants: 257 948 consecutive patients undergoing troponin testing for any clinical reason between 2010 and 2017.

Main Outcome Measure: All cause mortality.

Results: 257 948 patients had troponin measured during the study period. Analyses were performed using the peak troponin level, which was the highest troponin level measured during the patient's hospital stay. Troponin levels were standardised as a multiple of each laboratory's 99th centile upper limit of normal (ULN). During a median follow-up of 1198 days (interquartile range 514-1866 days), 55 850 (21.7%) deaths occurred. A positive troponin result (that is, higher than the upper limit of normal) signified a 3.2-fold higher mortality hazard (95% confidence interval 3.1 to 3.2) over three years. Mortality varied noticeably with age, with a hazard ratio of 10.6 (8.5 to 13.3) in 18-29 year olds and 1.5 (1.4 to 1.6) in those older than 90. A positive troponin result was associated with an approximately 15 percentage points higher absolute three year mortality across all age groups. The excess mortality with a positive troponin result was heavily concentrated in the first few weeks. Results were analysed using multivariable adjusted restricted cubic spline Cox regression. A direct relation was seen between troponin level and mortality in patients without acute coronary syndrome (ACS, n=120 049), whereas an inverted U shaped relation was found in patients with ACS (n=14 468), with a paradoxical decline in mortality at peak troponin levels >70×ULN. In the group with ACS, the inverted U shaped relation persisted after multivariable adjustment in those who were managed invasively; however, a direct positive relation was found between troponin level and mortality in patients managed non-invasively.
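
A minimal sketch of a restricted cubic spline Cox model of this kind, on simulated data (not the study code), using a natural cubic spline basis:

    # Natural cubic spline basis for troponin, then a Cox model on the basis.
    import numpy as np
    import pandas as pd
    from patsy import dmatrix
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 500
    trop = rng.gamma(2.0, 5.0, n)                       # peak troponin, xULN
    time = rng.exponential(1000 / (1 + 0.02 * trop))    # follow-up, days
    event = (rng.random(n) < 0.5).astype(int)

    basis = dmatrix("cr(trop, df=4)", {"trop": trop}, return_type="dataframe")
    df = basis.drop(columns="Intercept").assign(time=time, event=event)

    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    print(cph.summary[["coef", "p"]])    # nonlinear troponin-mortality shape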

Conclusions: A positive troponin result was associated with a clinically important increased mortality, regardless of age, even if the level was only slightly above normal. The excess mortality with a raised troponin was heavily concentrated in the first few weeks.

Study Registration: ClinicalTrials.gov NCT03507309.

http://dx.doi.org/10.1136/bmj.l6055
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6865859

Bleeding in cardiac patients prescribed antithrombotic drugs: electronic health record phenotyping algorithms, incidence, trends and prognosis.

BMC Med 2019 Nov 20;17(1):206. Epub 2019 Nov 20.

Health Data Research UK, University College London, 222 Euston Road, London, NW1 2DA, UK.

Background: Clinical guidelines and public health authorities lack recommendations on scalable approaches to defining and monitoring the occurrence and severity of bleeding in populations prescribed antithrombotic therapy.

Methods: We examined linked primary care, hospital admission and death registry electronic health records (CALIBER 1998-2010, England) of patients with newly diagnosed atrial fibrillation, acute myocardial infarction, unstable angina or stable angina, with the aim of developing phenotyping algorithms for bleeding events. Using the developed bleeding phenotypes, we estimated the incidence of bleeding events with Kaplan-Meier plots and used Cox regression models to assess prognosis for all-cause mortality, atherothrombotic events and further bleeding.
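
A hedged sketch of the incidence step (simulated times, not CALIBER data): Kaplan-Meier estimation of the 5-year risk of a first bleeding event:

    # Kaplan-Meier estimate of 5-year bleeding risk on simulated data.
    import numpy as np
    from lifelines import KaplanMeierFitter

    rng = np.random.default_rng(1)
    latent_years = rng.exponential(15.0, 1000)    # simulated time to bleed
    observed = latent_years < 5.0                 # administrative censoring
    time = np.minimum(latent_years, 5.0)

    kmf = KaplanMeierFitter().fit(time, event_observed=observed)
    risk_5y = 1 - kmf.survival_function_at_times(5.0).iloc[0]
    print(round(risk_5y, 3))                      # estimated 5-year risk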

Results: We present electronic health record phenotyping algorithms for bleeding based on bleeding diagnosis in primary or hospital care, symptoms, transfusion, surgical procedures and haemoglobin values. In validation of the phenotype, we estimated a positive predictive value of 0.88 (95% CI 0.64, 0.99) for hospitalised bleeding. Amongst 128,815 patients, 27,259 (21.2%) had at least 1 bleeding event, with 5-year risks of bleeding of 29.1%, 21.9%, 25.3% and 23.4% following diagnoses of atrial fibrillation, acute myocardial infarction, unstable angina and stable angina, respectively. Rates of hospitalised bleeding per 1000 patients more than doubled from 1.02 (95% CI 0.83, 1.22) in January 1998 to 2.68 (95% CI 2.49, 2.88) in December 2009, coinciding with increased rates of antiplatelet and vitamin K antagonist prescribing. Patients with hospitalised bleeding and primary care bleeding, with or without markers of severity, were at increased risk of all-cause mortality and atherothrombotic events compared with those with no bleeding. For example, the hazard ratio for all-cause mortality was 1.98 (95% CI 1.86, 2.11) for primary care bleeding with markers of severity and 1.99 (95% CI 1.92, 2.05) for hospitalised bleeding without markers of severity, compared with patients with no bleeding.

Conclusions: Electronic health record bleeding phenotyping algorithms offer a scalable approach to monitoring bleeding in the population. Bleeding has doubled in incidence since 1998, affects one in four cardiovascular disease patients, and is associated with poor prognosis. Efforts are required to tackle this iatrogenic epidemic.

http://dx.doi.org/10.1186/s12916-019-1438-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6864929

Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death.

J Biomed Semantics 2019 Nov 12;10(Suppl 1):20. Epub 2019 Nov 12.

Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK.

Background: Free text in electronic health records (EHR) may contain additional phenotypic information beyond structured (coded) information. For major health events - heart attack and death - there is a lack of studies evaluating the extent to which free text in the primary care record might add information. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory results and symptoms; and recording of cause of death. We used the CALIBER EHR research platform, which contains primary care data from the Clinical Practice Research Datalink (CPRD) linked to hospital admission data, the MINAP registry of acute coronary syndromes and the death registry. In CALIBER we randomly selected 2000 patients with MI and 1800 deaths. We implemented a rule-based natural language engine, the Freetext Matching Algorithm, on site at CPRD to analyse free text in the primary care record without raw data being released to researchers. We analysed text recorded within 90 days before or 90 days after the MI, and on or after the date of death.

Results: We extracted 10,927 diagnoses, 3658 test results, 3313 statements of negation, and 850 suspected diagnoses from the myocardial infarction patients. Inclusion of free text increased the recorded proportion of patients with chest pain in the week prior to MI from 19% to 27%, and differentiated between MI subtypes in a quarter more patients than structured data alone. Cause of death was incompletely recorded in primary care; in 36% of patients the cause was in coded data and in 21% it was in free text. Only 47% of patients had exactly the same cause of death in primary care and the death registry, but this did not differ between coded and free text causes of death.

Conclusions: Among patients who suffer MI or die, unstructured free text in primary care records contains much information that is potentially useful for research such as symptoms, investigation results and specific diagnoses. Access to large scale unstructured data in electronic health records (millions of patients) might yield important insights.

http://dx.doi.org/10.1186/s13326-019-0214-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6849160

UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER.

J Am Med Inform Assoc 2019 Dec;26(12):1545-1559

Institute of Health Informatics, University College London, London, United Kingdom.

Objective: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHR in the United Kingdom with applications for translational research.

Materials And Methods: We implemented a rule-based phenotyping framework, with up to 6 approaches of validation. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure; medication information; coded diagnoses, symptoms, procedures, and referrals), recorded using 5 controlled clinical terminologies: (1) Read (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases, Ninth and Tenth Revisions (secondary care diagnoses and cause of mortality), (3) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (4) DM+D prescription codes.

Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provide up to 6 validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications.

Conclusions: We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research.

http://dx.doi.org/10.1093/jamia/ocz105
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857510

Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances.

J Biomed Inform 2018 Dec;88:11-19. Epub 2018 Oct 24.

Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK. Electronic address:

The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized in recent years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient or population level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between the scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose that more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.

http://dx.doi.org/10.1016/j.jbi.2018.10.005
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986921

Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease.

PLoS One 2018;13(8):e0202344. Epub 2018 Aug 31.

The Francis Crick Institute, London, United Kingdom.

Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is both high, and may carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests with and without imputation on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values. An elastic net Cox regression based on 586 unimputed variables, with continuous values discretised, achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared with 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning, and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.
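
A hedged sketch of this modelling strategy on simulated data (not the study pipeline): discretise a continuous variable into quantile bins, keep missingness as its own category, and fit an elastic-net-penalised Cox model:

    # Discretised covariate with a "missing" category, elastic net Cox model.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(2)
    n = 400
    x = rng.normal(50, 10, n)
    x[rng.random(n) < 0.2] = np.nan                     # 20% missing values

    bins = pd.qcut(x, 4, labels=False)                  # quantile bins
    cat = pd.Series(bins).fillna(-1).astype(int)        # missing = -1 category
    design = pd.get_dummies(cat, prefix="x", drop_first=True).astype(float)
    design["time"] = rng.exponential(5.0, n)
    design["event"] = (rng.random(n) < 0.6).astype(int)

    cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)      # elastic net penalty
    cph.fit(design, duration_col="time", event_col="event")
    print(cph.concordance_index_)                       # C-index on this data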

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0202344
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118376

An electronic health records cohort study on heart failure following myocardial infarction in England: incidence and predictors.

BMJ Open 2018 Mar 3;8(3):e018331. Epub 2018 Mar 3.

Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK.

Objectives: To investigate the incidence and determinants of heart failure (HF) following a myocardial infarction (MI) in a contemporary cohort of patients with MI using routinely collected primary and hospital care electronic health records (EHRs).

Methods: Data were used from the CALIBER programme, linking EHRs in England from primary care, hospital admissions, an MI registry and mortality data. Subjects were eligible if they were 18 years or older, did not have a history of HF and survived a first MI. Factors associated with time to HF were examined using Cox proportional hazard models.

Results: Of the 24 479 patients with MI, 5775 (23.6%) developed HF during a median follow-up of 3.7 years (incidence rate per 1000 person-years: 63.8, 95% CI 62.2 to 65.5). Baseline characteristics significantly associated with developing HF were: atrial fibrillation (HR 1.62, 95% CI 1.51 to 1.75), age (per 10-year increase: 1.45, 1.41 to 1.49), diabetes (1.45, 1.35 to 1.56), peripheral arterial disease (1.38, 1.26 to 1.51), chronic obstructive pulmonary disease (1.28, 1.17 to 1.40), greater socioeconomic deprivation (5th vs 1st quintile: 1.27, 1.13 to 1.41), ST-segment elevation MI at presentation (1.19, 1.11 to 1.27) and hypertension (1.16, 1.09 to 1.23). Results were robust to various sensitivity analyses such as competing risk analysis and multiple imputation.

Conclusion: In England, one in four survivors of a first MI develop HF within 4 years. This contemporary study demonstrates that patients with MI are at considerable risk of HF. Baseline patient characteristics associated with time until HF were identified, which may be used to target preventive strategies.

http://dx.doi.org/10.1136/bmjopen-2017-018331
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5855447

Association between clinically recorded alcohol consumption and initial presentation of 12 cardiovascular diseases: population based cohort study using linked health records.

BMJ 2017 Mar 22;356:j909. Epub 2017 Mar 22.

Farr Institute of Health Informatics Research (London), University College London, London NW1 2DA, UK.

Objective: To investigate the association between alcohol consumption and cardiovascular disease at higher resolution by examining the initial lifetime presentation of 12 cardiac, cerebrovascular, abdominal, or peripheral vascular diseases among five categories of consumption.

Design: Population based cohort study of linked electronic health records covering primary care, hospital admissions, and mortality in 1997-2010 (median follow-up six years).

Setting: CALIBER (ClinicAl research using LInked Bespoke studies and Electronic health Records).

Participants: 1 937 360 adults (51% women), aged ≥30 who were free from cardiovascular disease at baseline.

Main Outcome Measures: 12 common symptomatic manifestations of cardiovascular disease, including chronic stable angina, unstable angina, acute myocardial infarction, unheralded coronary heart disease death, heart failure, sudden coronary death/cardiac arrest, transient ischaemic attack, ischaemic stroke, intracerebral and subarachnoid haemorrhage, peripheral arterial disease, and abdominal aortic aneurysm.

Results: 114 859 individuals received an incident cardiovascular diagnosis during follow-up. Non-drinking was associated with an increased risk of unstable angina (hazard ratio 1.33, 95% confidence interval 1.21 to 1.45), myocardial infarction (1.32, 1.24 to 1.41), unheralded coronary death (1.56, 1.38 to 1.76), heart failure (1.24, 1.11 to 1.38), ischaemic stroke (1.12, 1.01 to 1.24), peripheral arterial disease (1.22, 1.13 to 1.32), and abdominal aortic aneurysm (1.32, 1.17 to 1.49) compared with moderate drinking (consumption within contemporaneous UK weekly/daily guidelines of 21/3 and 14/2 units for men and women, respectively). Heavy drinking (exceeding guidelines) conferred an increased risk of presenting with unheralded coronary death (1.21, 1.08 to 1.35), heart failure (1.22, 1.08 to 1.37), cardiac arrest (1.50, 1.26 to 1.77), transient ischaemic attack (1.11, 1.02 to 1.37), ischaemic stroke (1.33, 1.09 to 1.63), intracerebral haemorrhage (1.37, 1.16 to 1.62), and peripheral arterial disease (1.35, 1.23 to 1.48), but a lower risk of myocardial infarction (0.88, 0.79 to 1.00) or stable angina (0.93, 0.86 to 1.00).

Conclusions: Heterogeneous associations exist between level of alcohol consumption and the initial presentation of cardiovascular diseases. This has implications for counselling patients, public health communication, and clinical research, suggesting that a more nuanced approach to the role of alcohol in prevention of cardiovascular disease is necessary.

Registration: clinicaltrials.gov (NCT01864031).

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5594422
http://dx.doi.org/10.1136/bmj.j909

Prognostic burden of heart failure recorded in primary care, acute hospital admissions, or both: a population-based linked electronic health record cohort study in 2.1 million people.

Eur J Heart Fail 2017 Sep;19(9):1119-1127. Epub 2016 Dec 23.

Farr Institute of Health Informatics Research, London, UK.

Aims: The prognosis of patients hospitalized for worsening heart failure (HF) is well described, but not that of patients managed solely in non-acute settings such as primary care or secondary outpatient care. We assessed the distribution of HF across levels of healthcare, and assessed the prognostic differences for patients with HF either recorded in primary care (including secondary outpatient care) (PC), hospital admissions alone, or known in both contexts.

Methods And Results: This study was part of the CALIBER programme, which comprises linked data from primary care, hospital admissions, and death certificates for 2.1 million inhabitants of England. We identified 89 554 patients with newly recorded HF, of whom 23 547 (26%) were recorded in PC but never hospitalized, 30 629 (34%) in hospital admissions but not known in PC, 23 681 (27%) in both, and 11 697 (13%) in death certificates only. The highest prescription rates of ACE inhibitors, beta-blockers, and mineralocorticoid receptor antagonists were found in patients known in both contexts. The respective 5-year survival in the first three groups was 43.9% [95% confidence interval (CI) 43.2-44.6%], 21.7% (95% CI 21.1-22.2%), and 39.8% (95% CI 39.2-40.5%), compared with 88.1% (95% CI 87.9-88.3%) in the age- and sex-matched general population.

Conclusion: In the general population, one in four patients with HF will not be hospitalized for worsening HF within a median follow-up of 1.7 years, yet they still have a poor 5-year prognosis. Patients admitted to hospital with worsening HF but not known to have HF in primary care have the worst prognosis and management. Mitigating the prognostic burden of HF requires greater consistency across primary and secondary care in the identification, profiling, and treatment of patients.

Trial Registration: NCT02551016.

http://dx.doi.org/10.1002/ejhf.709
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5420446

Using electronic health records to predict costs and outcomes in stable coronary artery disease.

Heart 2016 May;102(10):755-62. Epub 2016 Feb 10.

Centre for Health Economics, University of York, York, UK.

Objectives: To use electronic health records (EHR) to predict lifetime costs and health outcomes of patients with stable coronary artery disease (stable-CAD) stratified by their risk of future cardiovascular events, and to evaluate the cost-effectiveness of treatments targeted at these populations.

Methods: The analysis was based on 94 966 patients with stable-CAD in England between 2001 and 2010, identified in four prospectively collected, linked EHR sources. Markov modelling was used to estimate lifetime costs and quality-adjusted life years (QALYs) stratified by baseline cardiovascular risk.
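
As a hedged illustration of the general technique (a two-state discounted Markov cohort model with invented inputs, far simpler than the study's risk-stratified model):

    # Toy Markov cohort model: states are alive-with-stable-CAD and dead.
    annual_death_prob = 0.06      # invented transition probability
    annual_cost = 2500.0          # invented cost per year alive (GBP)
    annual_utility = 0.75         # invented utility per year alive
    discount = 0.035              # commonly used UK discount rate

    alive, total_cost, total_qalys = 1.0, 0.0, 0.0
    for year in range(60):        # approximate lifetime horizon
        weight = (1 + discount) ** -year
        total_cost += alive * annual_cost * weight
        total_qalys += alive * annual_utility * weight
        alive *= 1 - annual_death_prob    # one-year transition to death

    print(round(total_cost), round(total_qalys, 1))  # discounted lifetime totals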

Results: For the lowest risk tenth of patients with stable-CAD, predicted discounted remaining lifetime healthcare costs and QALYs were £62 210 (95% CI £33 724 to £90 043) and 12.0 (95% CI 11.5 to 12.5) years, respectively. For the highest risk tenth of the population, the equivalent costs and QALYs were £35 549 (95% CI £31 679 to £39 615) and 2.9 (95% CI 2.6 to 3.1) years, respectively. A new treatment with a hazard reduction of 20% for myocardial infarction, stroke and cardiovascular disease death and no side-effects would be cost-effective if priced below £72 per year for the lowest risk patients and £646 per year for the highest risk patients.

Conclusions: Existing EHRs may be used to estimate lifetime healthcare costs and outcomes of patients with stable-CAD. The stable-CAD model developed in this study lends itself to informing decisions about commissioning, pricing and reimbursement. At current prices, to be cost-effective some established as well as future stable-CAD treatments may require stratification by patient risk.

http://dx.doi.org/10.1136/heartjnl-2015-308850
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4849559

Defining disease phenotypes using national linked electronic health records: a case study of atrial fibrillation.

PLoS One 2014;9(11):e110900. Epub 2014 Nov 4.

Farr Institute of Health Informatics Research, University College London, London, United Kingdom, and Clinical Epidemiology, Department of Epidemiology and Public Health, University College London, London, United Kingdom.

Background: National electronic health records (EHR) are increasingly used for research but identifying disease cases is challenging due to differences in information captured between sources (e.g. primary and secondary care). Our objective was to provide a transparent, reproducible model for integrating these data using atrial fibrillation (AF), a chronic condition diagnosed and managed in multiple ways in different healthcare settings, as a case study.

Methods: Potentially relevant codes for AF screening, diagnosis, and management were identified in four coding systems: Read (primary care diagnoses and procedures), British National Formulary (BNF; primary care prescriptions), ICD-10 (secondary care diagnoses) and OPCS-4 (secondary care procedures). From these we developed a phenotype algorithm via expert review and analysis of linked EHR data from 1998 to 2010 for a cohort of 2.14 million UK patients aged ≥ 30 years. The cohort was also used to evaluate the phenotype by examining associations between incident AF and known risk factors.

Results: The phenotype algorithm incorporated 286 codes: 201 Read, 63 BNF, 18 ICD-10, and four OPCS-4. Incident AF diagnoses were recorded for 72,793 patients, but only 39.6% (N = 28,795) were recorded in both primary and secondary care. An additional 7,468 potential cases were inferred from data on treatment and pre-existing conditions. The proportion of cases identified from each source differed by diagnosis age; inferred diagnoses contributed a greater proportion of younger cases (≤ 60 years), while older patients (≥ 80 years) were mainly diagnosed in secondary care. Associations of risk factors (hypertension, myocardial infarction, heart failure) with incident AF defined using different EHR sources were comparable in magnitude to those from traditional consented cohorts.

Conclusions: A single EHR source is not sufficient to identify all patients, nor will it provide a representative sample. Combining multiple data sources and integrating information on treatment and comorbid conditions can substantially improve case identification.

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0110900
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4219705

Blood pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy life-years lost, and age-specific associations in 1·25 million people.

Lancet 2014 May;383(9932):1899-911

The Farr Institute of Health Informatics Research, London, UK; Epidemiology and Public Health, University College London, London, UK.

Background: The associations of blood pressure with the different manifestations of incident cardiovascular disease in a contemporary population have not been compared. In this study, we aimed to analyse the associations of blood pressure with 12 different presentations of cardiovascular disease.

Methods: We used linked electronic health records from 1997 to 2010 in the CALIBER (CArdiovascular research using LInked Bespoke studies and Electronic health Records) programme to assemble a cohort of 1·25 million patients, 30 years of age or older and initially free from cardiovascular disease, a fifth of whom received blood pressure-lowering treatments. We studied the heterogeneity in the age-specific associations of clinically measured blood pressure with 12 acute and chronic cardiovascular diseases, and estimated the lifetime risks (up to 95 years of age) and cardiovascular disease-free life-years lost adjusted for other risk factors at index ages 30, 60, and 80 years. This study is registered at ClinicalTrials.gov, number NCT01164371.

Findings: During 5·2 years median follow-up, we recorded 83,098 initial cardiovascular disease presentations. In each age group, the lowest risk for cardiovascular disease was in people with systolic blood pressure of 90-114 mm Hg and diastolic blood pressure of 60-74 mm Hg, with no evidence of a J-shaped increased risk at lower blood pressures. The effect of high blood pressure varied by cardiovascular disease endpoint, from strongly positive to no effect. Associations with high systolic blood pressure were strongest for intracerebral haemorrhage (hazard ratio 1·44 [95% CI 1·32-1·58]), subarachnoid haemorrhage (1·43 [1·25-1·63]), and stable angina (1·41 [1·36-1·46]), and weakest for abdominal aortic aneurysm (1·08 [1·00-1·17]). Compared with diastolic blood pressure, raised systolic blood pressure had a greater effect on angina, myocardial infarction, and peripheral arterial disease, whereas raised diastolic blood pressure had a greater effect on abdominal aortic aneurysm than did raised systolic pressure. Pulse pressure associations were inverse for abdominal aortic aneurysm (HR per 10 mm Hg 0·91 [95% CI 0·86-0·98]) and strongest for peripheral arterial disease (1·23 [1·20-1·27]). People with hypertension (blood pressure ≥140/90 mm Hg or those receiving blood pressure-lowering drugs) had a lifetime risk of overall cardiovascular disease at 30 years of age of 63·3% (95% CI 62·9-63·8) compared with 46·1% (45·5-46·8) for those with normal blood pressure, and developed cardiovascular disease 5·0 years earlier (95% CI 4·8-5·2). Stable and unstable angina accounted for most (43%) of the cardiovascular disease-free years of life lost associated with hypertension from index age 30 years, whereas heart failure and stable angina accounted for the largest proportion (19% each) of years of life lost from index age 80 years.

Interpretation: The widely held assumptions that blood pressure has strong associations with the occurrence of all cardiovascular diseases across a wide age range, and that diastolic and systolic associations are concordant, are not supported by the findings of this high-resolution study. Despite modern treatments, the lifetime burden of hypertension is substantial. These findings emphasise the need for new blood pressure-lowering strategies, and will help to inform the design of randomised trials to assess them.

Funding: Medical Research Council, National Institute for Health Research, and Wellcome Trust.

http://dx.doi.org/10.1016/S0140-6736(14)60685-1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4042017

Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

Am J Epidemiol 2014 Mar 12;179(6):764-74. Epub 2014 Jan 12.

Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
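
An analogous random-forest chained-equations imputer can be sketched with scikit-learn's IterativeImputer (a single-imputation analogue for illustration, not the authors' implementation, and the study used multiple imputation):

    # Chained-equations imputation with a random forest regressor.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3))
    X[:, 2] += X[:, 0] ** 2                     # nonlinear dependence
    X[rng.random(X.shape) < 0.2] = np.nan       # make 20% of values missing

    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        max_iter=10, random_state=0,
    )
    X_completed = imputer.fit_transform(X)
    print(int(np.isnan(X_completed).sum()))     # 0 missing values remain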

http://dx.doi.org/10.1093/aje/kwt312
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3939843

Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER).

Int J Epidemiol 2012 Dec 5;41(6):1625-38. Epub 2012 Dec 5.

Department of Epidemiology and Public Health, Clinical Epidemiology, University College London, London, UK.

The goal of cardiovascular disease (CVD) research using linked bespoke studies and electronic health records (CALIBER) is to provide evidence to inform health care and public health policy for CVDs across different stages of translation, from discovery, through evaluation in trials to implementation, where linkages to electronic health records provide new scientific opportunities. The initial approach of the CALIBER programme is characterized as follows: (i) Linkages of multiple electronic health record sources: examples include linkages between the longitudinal primary care data from the Clinical Practice Research Datalink, the national registry of acute coronary syndromes (Myocardial Ischaemia National Audit Project), hospitalization and procedure data from Hospital Episode Statistics and cause-specific mortality and social deprivation data from the Office for National Statistics. Current cohort analyses involve a million people in initially healthy populations and disease registries with ∼10⁵ patients. (ii) Linkages of bespoke investigator-led cohort studies (e.g. UK Biobank) to registry data (e.g. Myocardial Ischaemia National Audit Project), providing new means of ascertaining, validating and phenotyping disease. (iii) A common data model in which routine electronic health record data are made research ready, and sharable, by defining and curating with meta-data >300 variables (categorical, continuous, event) on risk factors, CVDs and non-cardiovascular comorbidities. (iv) Transparency: all CALIBER studies have an analytic protocol registered in the public domain, and data are available (safe haven model) for use subject to approvals. For more information, e-mail [email protected]

http://dx.doi.org/10.1093/ije/dys188
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3535749

The Freetext Matching Algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records.

BMC Med Inform Decis Mak 2012 Aug 7;12:88. Epub 2012 Aug 7.

Clinical Epidemiology Group, Department of Epidemiology and Public Health, University College London, London, UK.

Background: Electronic health records are invaluable for medical research, but much information is stored as free text rather than in a coded form. For example, in the UK General Practice Research Database (GPRD), causes of death and test results are sometimes recorded only in free text. Free text can be difficult to use for research if it requires time-consuming manual review. Our aim was to develop an automated method for extracting coded information from free text in electronic patient records.

Methods: We reviewed the electronic patient records in GPRD of a random sample of 3310 patients who died in 2001, to identify the cause of death. We developed a computer program called the Freetext Matching Algorithm (FMA) to map diagnoses in text to the Read Clinical Terminology. The program uses lookup tables of synonyms and phrase patterns to identify diagnoses, dates and selected test results. We tested it on two random samples of free text from GPRD (1000 texts associated with death in 2001, and 1000 general texts from cases and controls in a coronary artery disease study), comparing the output to the U.S. National Library of Medicine's MetaMap program and the gold standard of manual review.
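
A toy sketch of the lookup-table idea follows (invented synonyms and Read-style codes; the real FMA's tables and phrase patterns are far richer):

    # Map free text to diagnostic codes via a synonym lookup table.
    import re

    synonyms = {
        "myocardial infarction": "G30..",   # hypothetical Read-style codes
        "heart attack": "G30..",
        "ovarian cancer": "B44..",
    }

    def extract_codes(text: str) -> set:
        found = set()
        for phrase, code in synonyms.items():
            if re.search(r"\b" + re.escape(phrase) + r"\b", text.lower()):
                found.add(code)
        return found

    print(extract_codes("Died of a heart attack in 2001."))  # {'G30..'}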

Results: Among 3310 patients registered in the GPRD who died in 2001, the cause of death was recorded in coded form in 38.1% of patients, and in the free text alone in 19.4%. On the 1000 texts associated with death, FMA coded 683 of the 735 positive diagnoses, with precision (positive predictive value) 98.4% (95% confidence interval (CI) 97.2, 99.2) and recall (sensitivity) 92.9% (95% CI 90.8, 94.7). On the general sample, FMA detected 346 of the 447 positive diagnoses, with precision 91.5% (95% CI 88.3, 94.1) and recall 77.4% (95% CI 73.2, 81.2), which was similar to MetaMap.

Conclusions: We have developed an algorithm to extract coded information from free text in GP records with good precision. It may facilitate research using free text in electronic patient records, particularly for extracting the cause of death.

http://dx.doi.org/10.1186/1472-6947-12-88
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3483188

A healthy volunteer study to investigate trace element contamination of blood samples by stainless steel venepuncture needles.

Clin Toxicol (Phila) 2012 Feb;50(2):99-107

King's College London, UK.

Context: The trace elements cobalt (Co), chromium (Cr), manganese (Mn) and nickel (Ni) are normally present at low concentrations in blood. There has been a concern that stainless steel venepuncture needles typically used for collection of blood samples may contaminate these samples, leading to the masking of deficiency states or causing potential clinical confusion as to whether an individual has a "toxic" concentration.

Objective: To determine whether there is any difference between the concentrations of the trace elements obtained by different methods of blood sampling.

Methods: We took blood samples using a standard venepuncture needle, a "butterfly" winged infusion needle (three consecutive samples) and a plastic intravenous cannula (three consecutive samples) from 10 healthy volunteers. We measured the concentrations of Co, Cr, Mn and Ni in the samples using Inductively Coupled Plasma Mass Spectrometry, and used analysis of variance (ANOVA) to investigate if there was any difference between the methods of blood sampling.
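
For readers unfamiliar with the analysis, the comparison amounts to a one-way ANOVA across sampling methods. A minimal sketch using SciPy on made-up cobalt concentrations follows; note that the actual study took repeated samples per volunteer, a structure this simplified example ignores.

```python
# One-way ANOVA across three sampling methods, on invented data (μg/l).
from scipy.stats import f_oneway

needle    = [0.31, 0.28, 0.40, 0.35, 0.30]   # standard venepuncture needle
butterfly = [0.33, 0.30, 0.38, 0.36, 0.29]   # "butterfly" winged infusion needle
cannula   = [0.29, 0.32, 0.37, 0.34, 0.31]   # plastic intravenous cannula

f_stat, p_value = f_oneway(needle, butterfly, cannula)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# A large p-value would give no evidence of a sampling-method effect.
```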

Results: The mean ± standard deviation blood metal concentrations were: Co 0.33 ± 0.2 μg/l, Cr 2.43 ± 1.55 μg/l, Mn 8.07 ± 7.74 μg/l and Ni 10.4 ± 4.69 μg/l. There was considerable variation between blood metal concentrations of individual subjects and a few sporadic high values. By ANOVA, there was no significant difference between the metal concentrations measured using different methods of blood collection.

Conclusions: It is not necessary to routinely use a plastic cannula for blood sampling for trace element analysis. However, sporadic contamination from stainless steel needles may occur, so we would recommend that unexpectedly high concentrations are verified by taking a second sample through a plastic cannula.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.3109/15563650.2011.654146DOI Listing
February 2012

Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning.

PLoS One 2012 Jan 19;7(1):e30412. Epub 2012 Jan 19.

Department of Computer Science, University College London, London, United Kingdom.

Background: Electronic health records are invaluable for medical research, but much of the information is recorded as unstructured free text which is time-consuming to review manually.

Aim: To develop an algorithm to identify relevant free texts automatically based on labelled examples.

Methods: We developed a novel machine learning algorithm, the 'Semi-supervised Set Covering Machine' (S3CM), and tested its ability to detect the presence of coronary angiogram results and ovarian cancer diagnoses in free text in the General Practice Research Database. For training the algorithm, we used texts classified as positive and negative according to their associated Read diagnostic codes, rather than by manual annotation. We evaluated the precision (positive predictive value) and recall (sensitivity) of S3CM in classifying unlabelled texts against the gold standard of manual review. We compared the performance of S3CM with the Transductive Support Vector Machine (TSVM), the original fully-supervised Set Covering Machine (SCM) and our 'Freetext Matching Algorithm' natural language processor.
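
The S3CM implementation itself is not reproduced here, but the training setup in the Methods — labels derived from Read codes plus a pool of unlabelled texts — can be sketched with a generic self-training classifier from scikit-learn. This is a baseline illustration of the semi-supervised idea, not the authors' algorithm.

```python
# Generic self-training baseline (NOT S3CM): labels come from diagnostic
# codes rather than manual annotation, and -1 marks unlabelled texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

texts = [
    "coronary angiogram showed severe stenosis of the LAD",  # code-labelled positive
    "angiogram: no significant coronary disease",            # code-labelled positive
    "seen in clinic for routine asthma review",              # code-labelled negative
    "blood pressure well controlled on current medication",  # code-labelled negative
    "cath lab report attached, two vessel disease",          # unlabelled
]
labels = [1, 1, 0, 0, -1]

model = make_pipeline(
    TfidfVectorizer(),
    SelfTrainingClassifier(LogisticRegression(), threshold=0.7),
)
model.fit(texts, labels)
print(model.predict(["angiogram result: mild disease only"]))
```

Self-training iteratively pseudo-labels confident unlabelled examples, which is loosely analogous to how S3CM exploits texts that carry no manual annotation.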

Results: Only 60% of texts with Read codes for angiogram actually contained angiogram results. However, the S3CM algorithm achieved 87% recall with 64% precision on detecting coronary angiogram results, outperforming the fully-supervised SCM (recall 78%, precision 60%) and TSVM (recall 2%, precision 3%). For ovarian cancer diagnoses, S3CM had higher recall than the other algorithms tested (86%). The Freetext Matching Algorithm had better precision than S3CM (85% versus 74%) but lower recall (62%).

Conclusions: Our novel S3CM machine learning algorithm effectively detected free texts in primary care records associated with angiogram results and ovarian cancer diagnoses, after training on pre-classified test sets. It should be easy to adapt to other disease areas as it does not rely on linguistic rules, but needs further testing in other electronic health record datasets.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030412PLOS
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261909PMC
June 2012

Threshold haemoglobin levels and the prognosis of stable coronary disease: two new cohorts and a systematic review and meta-analysis.

PLoS Med 2011 May 31;8(5):e1000439. Epub 2011 May 31.

Clinical Epidemiology Group, Department of Epidemiology and Public Health, University College London, London, UK.

Background: Low haemoglobin concentration has been associated with adverse prognosis in patients with angina and myocardial infarction (MI), but the strength and shape of the association and the presence of any threshold have not been precisely evaluated.

Methods And Findings: A retrospective cohort study was carried out using the UK General Practice Research Database. 20,131 people with a new diagnosis of stable angina and no previous acute coronary syndrome, and 14,171 people with first MI who survived for at least 7 days were followed up for a mean of 3.2 years. Using semi-parametric Cox regression and multiple adjustment, there was evidence of threshold haemoglobin values below which mortality increased in a graded continuous fashion. For men with MI, the threshold value was 13.5 g/dl (95% confidence interval [CI] 13.2-13.9); the 29.5% of patients with haemoglobin below this threshold had an associated hazard ratio for mortality of 2.00 (95% CI 1.76-2.29) compared to those with haemoglobin values in the lowest risk range. Women tended to have lower threshold haemoglobin values (e.g., for MI 12.8 g/dl; 95% CI 12.1-13.5) but the shape and strength of association did not differ between the genders, nor between patients with angina and MI. We did a systematic review and meta-analysis that identified ten previously published studies, reporting a total of only 1,127 endpoints, but none evaluated thresholds of risk.
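
To illustrate the kind of model behind these hazard ratios, here is a minimal Cox regression sketch using the lifelines library, with an indicator for haemoglobin below a candidate threshold on entirely synthetic data (column names and values are invented, not GPRD variables).

```python
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic example data; the indicator flags haemoglobin < 13.5 g/dl.
df = pd.DataFrame({
    "years_followup": [1.2, 3.4, 0.8, 2.9, 3.1, 0.5, 2.2, 1.7],
    "died":           [1,   0,   1,   0,   0,   1,   0,   1],
    "hb_below_13_5":  [1,   0,   1,   0,   1,   1,   1,   0],
    "age":            [72,  61,  80,  58,  66,  77,  69,  63],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_followup", event_col="died")
cph.print_summary()  # exp(coef) on hb_below_13_5 is the hazard ratio
```

The study itself modelled haemoglobin flexibly, finding a graded continuous increase in mortality below the threshold; a single binary indicator is the simplest version of that idea.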

Conclusions: There is an association between low haemoglobin concentration and increased mortality. A large proportion of patients with coronary disease have haemoglobin concentrations below the thresholds of risk defined here. Intervention trials would clarify whether increasing the haemoglobin concentration reduces mortality.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pmed.1000439DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3104976PMC
May 2011

Understanding lactic acidosis in paracetamol (acetaminophen) poisoning.

Br J Clin Pharmacol 2011 Jan;71(1):20-8

Clinical Toxicology, Guy's and St Thomas' NHS Foundation Trust, Guy's Hospital, Great Maze Pond, London, UK.

Paracetamol (acetaminophen) is one of the most commonly taken drugs in overdose in many areas of the world, and the most common cause of acute liver failure in both the UK and USA. Paracetamol poisoning can result in lactic acidosis in two different scenarios. The first occurs early in the course of poisoning, before the onset of hepatotoxicity, in patients with massive ingestion; in this setting, lactic acidosis is usually associated with coma. Experimental evidence from studies in whole animals, perfused liver slices and cell cultures has shown that the toxic metabolite of paracetamol, N-acetyl-p-benzoquinone imine, inhibits electron transfer in the mitochondrial respiratory chain and thus inhibits aerobic respiration. This occurs only at very high concentrations of paracetamol, and precedes cellular injury by several hours. The second scenario arises later in the course of paracetamol poisoning, as a consequence of established liver failure. In these patients lactate is elevated primarily because of reduced hepatic clearance, but in shocked patients peripheral anaerobic respiration due to tissue hypoperfusion may also contribute. In patients admitted to a liver unit with paracetamol hepatotoxicity, the post-resuscitation arterial lactate concentration has been shown to be a strong predictor of mortality, and is included in the modified King's College criteria for consideration of liver transplantation. We would therefore recommend that post-resuscitation lactate be measured in all patients with a severe paracetamol overdose resulting in either reduced conscious level or hepatic failure.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1111/j.1365-2125.2010.03765.xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3018022PMC
January 2011

An unusual case of transient dermatological reaction to bortezomib in AL amyloidosis.

Int J Hematol 2010 Jan 18;91(1):121-3. Epub 2009 Dec 18.

Department of Haematology, Royal Free Hospital, London, UK.

We report an unusual dermatological reaction to bortezomib in a 61-year-old man with AL amyloidosis. Systemic AL amyloidosis is a rare complication of monoclonal gammopathy or myeloma in which abnormally unstable free light chains form fibrillary deposits in organs, leading to multisystem disease. The treatment of AL amyloidosis is directed at the underlying plasma cell dyscrasia and most regimens have been adapted from those used in myeloma, but drug toxicity is more common in AL amyloidosis because of the more extensive nature of the disease. Our patient developed asymptomatic purple discolouration of the veins of his left arm several days after receiving the bortezomib infusion into his left hand, although the infusion itself had been uncomplicated with no extravasation. The discolouration resolved completely within 2 weeks; it recurred after a subsequent dose of bortezomib but again subsided spontaneously. This reaction may have been transient phlebitis or a local vasogenic reaction; its transient nature and the lack of systemic features suggest it is a benign phenomenon. There appears to be no indication for discontinuation of bortezomib treatment or dose alteration in such cases.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1007/s12185-009-0460-9DOI Listing
January 2010

An algorithm to derive a numerical daily dose from unstructured text dosage instructions.

Pharmacoepidemiol Drug Saf 2006 Mar;15(3):161-6

General Practice Research Database Division, Medicines and Healthcare products Regulatory Agency, London, UK.

Purpose: The General Practice Research Database (GPRD) is a database of longitudinal patient records from general practices in the United Kingdom. It is an important data source for pharmacoepidemiology studies, but until now it has been tedious to calculate the daily dose and duration of exposure to drugs prescribed. This is because general practitioners routinely record dosage instructions as free text rather than in a structured way. The objective was to develop and assess the validity of an automated algorithm to derive the daily dose from text dosage instructions.

Methods: A computer program was developed to derive numerical information from unstructured text dosage instructions. It was tested on dosage texts from a random sample of one million prescription entries. A random sample of 1,000 of these converted texts was manually checked for accuracy.
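
A stripped-down illustration of this kind of dose extraction is shown below. The vocabulary and rules here are toy stand-ins; the actual program handles a far wider range of phrasings, units and abbreviations.

```python
import re

# Toy lookups only; invented for illustration.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
FREQUENCIES = {"once daily": 1, "twice daily": 2, "three times daily": 3,
               "od": 1, "bd": 2, "tds": 3}

def daily_dose(instruction: str):
    """Return tablets per day from a free-text dosage instruction, if stated."""
    text = instruction.lower()
    # Quantity per administration: a digit or a number word.
    m = re.search(r"\b(\d+|one|two|three)\b", text)
    if not m:
        return None
    qty = int(m.group(1)) if m.group(1).isdigit() else NUMBER_WORDS[m.group(1)]
    # Administrations per day, from a small frequency lookup.
    for phrase, per_day in FREQUENCIES.items():
        if re.search(rf"\b{phrase}\b", text):
            return qty * per_day
    return None

print(daily_dose("Take ONE tablet twice daily"))  # 2
print(daily_dose("2 tabs tds"))                   # 6
print(daily_dose("as directed"))                  # None
```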

Results: Out of the sample of one million prescription entries, 74.5% had text containing the daily dose, 14.5% had text but did not include a quantitative daily dose statement and 11.0% had no text entered. Of the 1,000 texts which were checked manually, 767 stated the daily dose. The program interpreted 758 (98.8%) of these correctly, produced errors in four cases and failed to extract the dose from five texts.

Conclusions: An automated algorithm has been developed which can accurately extract the daily dose from almost 99% of general practitioners' text dosage instructions. It increases the utility of GPRD and other prescription data sources by enabling researchers to estimate the duration of drug exposure more efficiently.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1002/pds.1151DOI Listing
March 2006