Publications by authors named "Olivier Q Groot"

16 Publications

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review.

Acta Orthop 2021 Apr 18:1-9. Epub 2021 Apr 18.

Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA.

Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines.

Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting.

Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies.

Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.

Source
http://dx.doi.org/10.1080/17453674.2021.1910448
April 2021

Machine learning prediction models in orthopedic surgery: A systematic review in transparent reporting.

J Orthop Res 2021 Mar 18. Epub 2021 Mar 18.

Orthopedic Oncology Service, Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Machine learning (ML) studies are becoming increasingly popular in orthopedics but lack a critical appraisal of their adherence to peer-reviewed guidelines. The objective of this review was to (1) evaluate quality and transparent reporting of ML prediction models in orthopedic surgery based on the transparent reporting of multivariable prediction models for individual prognosis or diagnosis (TRIPOD), and (2) assess risk of bias with the Prediction model Risk Of Bias ASsessment Tool. A systematic review was performed to identify all ML prediction studies published in orthopedic surgery through June 18th, 2020. After screening 7138 studies, 59 studies met the study criteria and were included. Two reviewers independently extracted data and discrepancies were resolved by discussion with at least two additional reviewers present. Across all studies, the overall median completeness for the TRIPOD checklist was 53% (interquartile range 47%-60%). The overall risk of bias was low in 44% (n = 26), high in 41% (n = 24), and unclear in 15% (n = 9). High overall risk of bias was driven by incomplete reporting of performance measures, inadequate handling of missing data, and use of small datasets with inadequate outcome numbers. Although the number of ML studies in orthopedic surgery is increasing rapidly, over 40% of the existing models are at high risk of bias. Furthermore, over half incompletely reported their methods and/or performance measures. Until these issues are adequately addressed to give patients and providers trust in ML models, a considerable gap remains between the development of ML prediction models and their implementation in orthopedic practice.

Source
http://dx.doi.org/10.1002/jor.25036
March 2021

International external validation of the SORG machine learning algorithms for predicting 90-day and 1-year survival of patients with spine metastases using a Taiwanese cohort.

Spine J 2021 Feb 2. Epub 2021 Feb 2.

Department of Orthopedics, National Taiwan University College of Medicine and National Taiwan University Hospital, Taipei, Taiwan.

Background Context: Accurately predicting the survival of patients with spinal metastases is important for guiding surgical intervention. The SORG machine-learning (ML) algorithm for the 90-day and 1-year mortality of patients with metastatic cancer to the spine has been multiply validated, with a high degree of accuracy in both internal and external validation studies. However, prior external validations were conducted using patient groups located on the east coast of the United States, representing a generally homogeneous population. The aim of this study was to externally validate the SORG algorithms with a Taiwanese population.

Study Design/setting: Retrospective study at a single tertiary care center in Taiwan.

Patient Sample: Four hundred and twenty-seven patients who underwent surgery for metastatic spine disease from November 1, 2010 to December 31, 2018.

Outcome Measures: 90-day and 1-year mortality.

Methods: The baseline characteristics of our validation cohort were compared with those of the previously published developmental and external validation cohorts. Discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithms in this cohort.
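
The calibration intercept named in the methods can be illustrated with a small sketch. It assumes the common "calibration-in-the-large" definition: fit only an intercept `a` in `logit(P(outcome)) = a + logit(predicted probability)`, so that a positive `a` means the model underestimates risk. The abstract does not say how the authors computed it, and the function names and toy inputs below are hypothetical.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def calibration_intercept(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-10,
) -> float:
    """Intercept-only logistic fit with logit(y_prob) as a fixed offset.

    Solved with one-dimensional Newton steps. Requires at least one event
    and one non-event, and predictions strictly inside (0, 1).
    """
    offsets = [logit(p) for p in y_prob]
    a = 0.0
    for _ in range(max_iter):
        fitted = [1.0 / (1.0 + math.exp(-(a + o))) for o in offsets]
        score = sum(y - f for y, f in zip(y_true, fitted))
        information = sum(f * (1.0 - f) for f in fitted)
        step = score / information
        a += step
        if abs(step) < tol:
            break
    return a


# Toy case: every patient is predicted at 50% risk, but 3 of 4 have the event,
# so the model underestimates risk and the intercept comes out positive.
underestimating = calibration_intercept([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5])

# Toy case: predictions match the observed event rate, so the intercept is 0.
well_calibrated = calibration_intercept([1, 0], [0.5, 0.5])
```

The calibration slope would need a two-parameter fit (intercept plus a coefficient on the logit of the prediction), which is usually done with a statistics package rather than by hand.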

Results: Ninety-day and 1-year mortality rates were 110 of 427 (26%) and 256 of 427 (60%), respectively. The external validation cohort and the developmental cohort differed in body mass index (BMI), preoperative performance status, American Spinal Injury Association impairment scale, primary tumor histology and in several laboratory measurements. The SORG ML algorithm for 90-day and 1-year mortality demonstrated a high level of discriminative ability (c-statistics of 0.73 [95% confidence interval [CI], 0.67-0.78] and 0.74 [95% CI, 0.69-0.79]), overall performance, and had a positive net benefit throughout the range of threshold probabilities in decision curve analysis. The algorithm for 1-year mortality had a calibration intercept of 0.08, representing a good calibration. However, the 90-day mortality algorithm underestimated mortality for the lowest predicted probabilities, with an overall intercept of 0.81.

Conclusions: The SORG algorithms for predicting 90-day and 1-year mortality in patients with spinal metastatic disease generally performed well on international external validation in a predominantly Taiwanese population. However, 90-day mortality was underestimated in this group. Whether this inconsistency was due to different primary tumor characteristics, body mass index, selection bias, or other factors remains unclear, and may be better understood with further validation studies in international and/or diverse populations.

Source
http://dx.doi.org/10.1016/j.spinee.2021.01.027
February 2021

Do Cohabitants Reliably Complete Questionnaires for Patients in a Terminal Cancer Stage when Assessing Quality of Life, Pain, Depression, and Anxiety?

Clin Orthop Relat Res 2021 Apr;479(4):792-801

O. Q. Groot, N. R. P. Pereira, M. E. R. Bongers, P. T. Ogink, E. T. Newman, K. A. Raskin, S. A. Lozano-Calderon, J. H. Schwab, Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: Patients with bone metastases often are unable to complete quality of life (QoL) questionnaires, and cohabitants (such as spouses, domestic partners, offspring older than 18 years, or other people who live with the patient) could be a reliable alternative. However, the extent of reliability in this complicated patient population remains undefined, and the influence of the cohabitant's condition on their assessment of the patient's QoL is unknown.

Questions/purposes: (1) Do QoL scores, measured by the 5-level EuroQol-5D (EQ-5D-5L) version and the Patient-reported Outcomes Measurement Information System (PROMIS) version 1.0 in three domains (anxiety, pain interference, and depression), reported by patients differ markedly from scores as assessed by their cohabitants? (2) Do cohabitants' PROMIS-Depression scores correlate with differences in measured QoL results?

Methods: This cross-sectional study included patients and cohabitants older than 18 years of age. Patients included those with presence of histologically confirmed bone metastases (including lymphoma and multiple myeloma), and cohabitants must have been present at the clinic visit. Patients were eligible for inclusion in the study regardless of comorbidities, prognosis, prior surgery, or current treatment. Between June 1, 2016 and March 1, 2017 and between October 1, 2017 and February 26, 2018, all 96 eligible patients were approached, of whom 49% (47) met the selection criteria and were willing to participate. The included 47 patient-cohabitant pairs independently completed the EQ-5D-5L and the eight-item PROMIS for three domains (anxiety, pain, and depression) with respect to the patients' symptoms. The cohabitants also completed the four-item PROMIS-Depression survey with respect to their own symptoms.

Results: There were no clinically important differences between the scores of patients and their cohabitants for all questionnaires, and the agreement between patient and cohabitant scores was moderate to strong (Spearman correlation coefficients ranging from 0.52 to 0.72 on the four questionnaires; all p values < 0.05). However, despite the good agreement in QoL scores, an increased cohabitant's depression score was correlated with an overestimation of the patient's symptom burden for the anxiety and depression domains (weak Spearman correlation coefficient of 0.33 [95% confidence interval 0.08 to 0.58]; p = 0.01 and moderate Spearman correlation coefficient of 0.52 [95% CI 0.29 to 0.74]; p < 0.01, respectively).

Conclusion: The present findings support that cohabitants might be reliable raters of the QoL of patients with bone metastases. However, if a patient's cohabitant has depression, the cohabitant may overestimate a patient's symptoms in emotional domains such as anxiety and depression, warranting further research that includes cohabitants with and without depression to elucidate the effect of depression on the level of agreement. For now, clinicians may want to reconsider using the cohabitant's judgement if depression is suspected.

Clinical Relevance: These findings suggest that a cohabitant's impressions of a patient's quality of life are, in most instances, accurate; this is potentially helpful in situations where the patient cannot weigh in. Future studies should employ longitudinal designs to see how or whether our findings change over time and with disease progression, and how specific interventions, such as different chemotherapeutic regimens or surgery, may factor in.

Source
http://dx.doi.org/10.1097/CORR.0000000000001525
April 2021

Postoperative adverse events secondary to iatrogenic vascular injury during anterior lumbar spinal surgery.

Spine J 2021 May 3;21(5):795-802. Epub 2020 Nov 3.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, USA.

Background: Anterior lumbar spine surgery (ALSS) requires mobilization of the great vessels, resulting in a high risk of iatrogenic vascular injury (VI). It remains unclear whether VI is associated with increased risk of postoperative complications and other related adverse outcomes.

Purpose: The purpose of this study was to (1) assess the incidence of postoperative complications attributable to VI during ALSS, and (2) assess outcomes secondary to VI such as procedural blood loss, transfusion of blood products, length of stay (LOS), and in-hospital mortality.

Study Design: Retrospective propensity-score matched, case-control study at 2 academic and 3 community medical centers.

Patient Sample: Patients 18 years of age or older, undergoing ALSS between January 1st, 2000 and July 31st, 2019 were included in this analysis.

Outcome Measures: The primary outcome was the incidence of postoperative complications attributable to VI, such as venous thromboembolism, compartment syndrome, transfusion reaction, limb ischemia, and reoperations. The secondary outcomes included estimated operative blood loss (milliliter), transfused blood products, LOS (days), and in-hospital mortality.

Methods: In total, 1,035 patients were identified, of whom 75 (7.2%) had a VI. For comparative analyses, the 75 VI patients were paired with 75 comparable non-VI patients by propensity-score matching. The adequacy of the matching was assessed by testing the standardized mean differences (SMD) between the VI and non-VI groups (>0.25 SMD).
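
As a rough illustration of the balance check described in the methods, the sketch below computes a standardized mean difference for one continuous covariate. It uses the common pooled-standard-deviation definition; the abstract does not state which variant the authors used, and reading 0.25 as the imbalance cutoff is an assumption based on the "(>0.25 SMD)" note. The function names and example ages are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def standardized_mean_difference(
    treated: Sequence[float], control: Sequence[float]
) -> float:
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2.0) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


def is_imbalanced(smd: float, cutoff: float = 0.25) -> bool:
    """Flag a covariate whose absolute SMD exceeds the assumed cutoff."""
    return abs(smd) > cutoff


# Hypothetical ages (years) for a few matched VI and non-VI patients.
vi_age = [50.0, 60.0, 70.0]
non_vi_age = [46.0, 56.0, 66.0]

age_smd = standardized_mean_difference(vi_age, non_vi_age)
age_flagged = is_imbalanced(age_smd)
```

In this toy example the group means differ by 4 years against a pooled standard deviation of 10 years, so the covariate would be flagged and the matching revisited.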

Results: Two patients (2.7%) had VI-related postoperative complications in the studied period, which consisted of two deep venous thromboembolisms (DVTs) occurring on days 3 and 7 postoperatively. Both DVTs were located in the distal left common iliac vein (CIV). The VIs these patients suffered were to the distal inferior vena cava and the left CIV, respectively. Neither patient developed additional complications as a consequence of the DVT; however, both required systemic anticoagulation and placement of an inferior vena cava filter. There was no statistical difference with the non-VI group, where no instances (0%) of postoperative complications were reported (p=.157). No differences were found in LOS or in-hospital mortality between the two groups (p=.157 and p=.999, respectively). Intraoperative blood loss and blood transfusion were both higher in the VI group than in the non-VI group (650 mL, interquartile range [IQR] 300-1400 vs. 150 mL, IQR 50-425, p≤.001; 0 units, IQR 0-3 vs. 0 units, IQR 0-1, p=.012, respectively).

Conclusion: This study found a low number of serious postoperative complications related to VI in ALSS. In addition, these complications were not significantly different between the VI and matched non-VI ALSS cohorts. Although not significant, the observed DVT incidence of 2.7% after VI in ALSS warrants vigilance and preventive measures during the postoperative course of these patients.

Source
http://dx.doi.org/10.1016/j.spinee.2020.10.031
May 2021

Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports.

Acta Oncol 2020 Dec 12;59(12):1455-1460. Epub 2020 Sep 12.

Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The widespread use of electronic patient-generated health data has led to unprecedented opportunities for automated extraction of clinical features from free-text medical notes. However, processing this rich resource of data for clinical and research purposes depends on labor-intensive and potentially error-prone manual review. The aim of this study was to develop a natural language processing (NLP) algorithm for binary classification (single metastasis versus two or more metastases) in bone scintigraphy reports of patients undergoing surgery for bone metastases.

Material And Methods: Bone scintigraphy reports of patients undergoing surgery for bone metastases were labeled each by three independent reviewers using a binary classification (single metastasis versus two or more metastases) to establish a ground truth. A stratified 80:20 split was used to develop and test an extreme-gradient boosting supervised machine learning NLP algorithm.

Results: A total of 704 free-text bone scintigraphy reports from 704 patients were included in this study and 617 (88%) had multiple bone metastases. In the independent test set (n = 141) not used for model development, the NLP algorithm achieved an AUC-ROC of 0.97 (95% confidence interval [CI], 0.92-0.99) for classification of multiple bone metastases and an AUC-PRC of 0.99 (95% CI, 0.99-0.99). At a threshold of 0.90, the NLP algorithm correctly identified multiple bone metastases in 117 of the 124 patients who had multiple bone metastases in the testing cohort (sensitivity 0.94) and yielded 3 false positives (specificity 0.82). At the same threshold, the NLP algorithm had a positive predictive value of 0.97 and an F1-score of 0.96.
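
The threshold metrics in these results follow from four confusion-matrix counts, and the sketch below recomputes them as a consistency check. The true-negative count is not stated in the abstract; it is derived here from the 141 test reports, the 124 patients with multiple metastases, and the 3 false positives. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # multiple metastases, flagged by the algorithm
    fn: int  # multiple metastases, missed by the algorithm
    fp: int  # single metastasis, wrongly flagged as multiple
    tn: int  # single metastasis, correctly not flagged


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, positive predictive value, and F1-score."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# Counts implied by the abstract (tn is derived, not reported directly).
test_counts = ConfusionCounts(tp=117, fn=124 - 117, fp=3, tn=(141 - 124) - 3)
metrics = binary_metrics(test_counts)
```

With these counts the sensitivity is 117/124, the specificity 14/17, the positive predictive value 117/120, and the F1-score 234/244, which agree with the reported values to within rounding.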

Conclusions: NLP has the potential to automate clinical data extraction from free text radiology notes in orthopedics, thereby optimizing the speed, accuracy, and consistency of clinical chart review. Pending external validation, the NLP algorithm developed in this study may be implemented as a means to aid researchers in tackling large amounts of data.

Source
http://dx.doi.org/10.1080/0284186X.2020.1819563
December 2020

Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.

Clin Orthop Relat Res 2020 Dec;478(12):2751-2764

O. Q. Groot, M. E. R. Bongers, A. V. Karhade, J. H. Schwab, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Background: Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images.

Questions/purposes: This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models.

Methods: A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity.

Results: ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images.

Conclusions: At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions.

Level Of Evidence: Level III, diagnostic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000001360
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7899420
December 2020

How Does the Skeletal Oncology Research Group Algorithm's Prediction of 5-year Survival in Patients with Chondrosarcoma Perform on International Validation?

Clin Orthop Relat Res 2020 Oct;478(10):2300-2308

M. E. R. Bongers, A. V. Karhade, O. Q. Groot, J. H. Schwab, Department of Orthopaedic Surgery, Division of Orthopaedic Oncology, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The Skeletal Oncology Research Group (SORG) machine learning algorithm for predicting survival in patients with chondrosarcoma was developed using data from the Surveillance, Epidemiology, and End Results (SEER) registry. This algorithm was externally validated on a dataset of patients from the United States in an earlier study, where it demonstrated generally good performance but overestimated 5-year survival. In addition, this algorithm has not yet been validated in patients outside the United States; doing so would be important because external validation is necessary as algorithm performance may be misleading when applied in different populations.

Questions/purposes: Does the SORG algorithm retain validity in patients who underwent surgery for primary chondrosarcoma outside the United States, specifically in Italy?

Methods: A total of 737 patients were treated for chondrosarcoma between January 2000 and October 2014 at the Italian tertiary care center which was used for international validation. We excluded patients whose first surgical procedure was performed elsewhere (n = 25), patients who underwent nonsurgical treatment (n = 27), patients with a chondrosarcoma of the soft tissue or skull (n = 60), and patients with peripheral, periosteal, or mesenchymal chondrosarcoma (n = 161). Thus, 464 patients were ultimately included in this external validation study, as the earlier performed SEER study was used as the training set. Therefore, this study, unlike most of this type, does not have a training and validation set. Although the earlier study overestimated 5-year survival, we did not modify the algorithm in this report, as this is the first international validation and the prior performance in the single-institution validation study from the United States may have been driven by a small sample or non-generalizable patterns related to its single-center setting. Variables needed for the SORG algorithm were manually collected from electronic medical records. These included sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location. By inputting these variables into the algorithm, we calculated the predicted probabilities of survival for each patient. The performance of the SORG algorithm was assessed in this study through discrimination (the ability of a model to distinguish between a binary outcome), calibration (the agreement of observed and predicted outcomes), overall performance (the accuracy of predictions), and decision curve analysis (whether decisions made with the model are better than decisions made without it).
For discrimination, the c-statistic (commonly known as the area under the receiver operating characteristic curve for binary classification) was calculated; this ranged from 0.5 (no better than chance) to 1.0 (excellent discrimination). The agreement between predicted and observed outcomes was visualized with a calibration plot, and the calibration slope and intercept were calculated. Perfect calibration results in a slope of 1 and an intercept of 0. For overall performance, the Brier score and the null-model Brier score were calculated. The Brier score ranges from 0 (perfect prediction) to 1 (poorest prediction). Appropriate interpretation of the Brier score requires comparison with the null-model Brier score. The null-model Brier score is the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for every patient. A decision curve analysis was performed to compare the potential net benefit of the algorithm versus other means of decision support, such as treating all or none of the patients. There were several differences between this study and the earlier SEER study, and such differences are important because they help us to determine the performance of the algorithm in a group different from the initial study population. In this study from Italy, 5-year survival was different from the earlier SEER study (71% [319 of 450 patients] versus 76% [1131 of 1487 patients]; p = 0.03). There were more patients with dedifferentiated chondrosarcoma than in the earlier SEER study (25% [118 of 464 patients] versus 8.5% [131 of 1544 patients]; p < 0.001). 
In addition, in this study patients were older, tumor size was larger, and there were higher proportions of high-grade tumors than the earlier SEER study (age: 56 years [interquartile range {IQR} 42 to 67] versus 52 years [IQR 40 to 64]; p = 0.007; tumor size: 80 mm [IQR 50 to 120] versus 70 mm [IQR 42 to 105]; p < 0.001; tumor grade: 22% [104 of 464 had Grade 1], 42% [196 of 464 had Grade 2], and 35% [164 of 464 had Grade 3] versus 41% [592 of 1456 had Grade 1], 40% [588 of 1456 had Grade 2], and 19% [276 of 1456 had Grade 3]; p ≤ 0.001).
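
The discrimination and overall-performance measures defined in the methods can be written out in a few lines. This is a minimal sketch of the standard definitions (pairwise c-statistic, mean squared error for the Brier score, and a null model that predicts the outcome prevalence for everyone), not the authors' code; the function names and toy data are hypothetical.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_model_brier(y_true: Sequence[int]) -> float:
    """Brier score when every patient is assigned the outcome prevalence."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


# Toy data: 1 = survived 5 years, with hypothetical predicted probabilities.
survived = [1, 1, 1, 0]
predicted = [0.9, 0.8, 0.6, 0.7]

toy_c = c_statistic(survived, predicted)   # 2 of 3 pairs ranked correctly
toy_brier = brier_score(survived, predicted)
toy_null = null_model_brier(survived)

# For a prevalence p the null-model Brier score reduces to p * (1 - p).
null_at_71_percent = 0.71 * (1 - 0.71)
```

A model adds value on this measure only when its Brier score is below the null-model score. With the 71% 5-year survival reported for this cohort, the null-model score works out to roughly 0.21.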

Results: Validation of the SORG algorithm in a primarily Italian population achieved a c-statistic of 0.86 (95% confidence interval 0.82 to 0.89), suggesting good-to-excellent discrimination. The calibration plot showed good agreement between the predicted probability and observed survival in the probability thresholds of 0.8 to 1.0. With predicted survival probabilities lower than 0.8, however, the SORG algorithm underestimated the observed proportion of patients with 5-year survival, reflected in the overall calibration intercept of 0.82 (95% CI 0.67 to 0.98) and calibration slope of 0.68 (95% CI 0.42 to 0.95). The Brier score for 5-year survival was 0.15, compared with a null-model Brier of 0.21. The algorithm showed a favorable decision curve analysis in the validation cohort.

Conclusions: The SORG algorithm to predict 5-year survival for patients with chondrosarcoma held good discriminative ability and overall performance on international external validation; however, it underestimated 5-year survival for patients with predicted probabilities from 0 to 0.8 because the calibration plot was not perfectly aligned for the observed outcomes, which resulted in a maximum underestimation of 20%. The differences may reflect the baseline differences noted between the two study populations. The overall performance of the algorithm supports the utility of the algorithm and validation presented here. The freely available digital application for the algorithm is available here: https://sorg-apps.shinyapps.io/extremitymetssurvival/.

Level Of Evidence: Level III, prognostic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000001305
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7491905
October 2020

Does the SORG algorithm generalize to a contemporary cohort of patients with spinal metastases on external validation?

Spine J 2020 Oct 16;20(10):1646-1652. Epub 2020 May 16.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.

Background Context: The SORG machine-learning algorithms were previously developed for preoperative prediction of overall survival in spinal metastatic disease. On sub-group analysis of a previous external validation, these algorithms were found to have diminished performance on patients treated after 2010.

Purpose: The purpose of this study was to assess the performance of these algorithms on a large contemporary cohort of consecutive spinal metastatic disease patients.

Study Design/setting: Retrospective study performed at a tertiary care referral center.

Patient Sample: Patients of 18 years and older treated with surgery for metastatic spinal disease between 2014 and 2016.

Outcome Measures: Ninety-day and one-year mortality.

Methods: Baseline patient and tumor characteristics of the validation cohort were compared to the development cohort using bivariate logistic regression. Performance of the SORG algorithms on external validation in the contemporary cohort was assessed with discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score compared to the null-model Brier score), and decision curve analysis.
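
Decision curve analysis, listed last in the methods, compares the net benefit of acting on model predictions with the default strategies of treating everyone or no one. The sketch below uses the standard net-benefit formula at a single threshold probability; the abstract gives no implementation details, so the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit when every patient is flagged regardless of prediction."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = died within the horizon, with hypothetical predicted risks.
died = [1, 1, 0, 0, 1, 0, 0, 0]
risk = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]

model_nb = net_benefit(died, risk, threshold=0.5)
treat_all_nb = net_benefit_treat_all(died, threshold=0.5)
treat_none_nb = 0.0  # flagging nobody yields no true or false positives
```

A full decision curve repeats this calculation across a range of thresholds and plots the three strategies together; the model is considered useful where its curve lies above both alternatives.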

Results: Overall, 200 patients were included with 90-day and 1-year mortality rates of 55 (27.6%) and 124 (62.9%), respectively. The contemporary external validation cohort and the developmental cohort differed significantly on primary tumor histology, presence of visceral metastases, American Spinal Injury Association impairment scale, and preoperative laboratory values. The SORG algorithms for 90-day and 1-year mortality retained good discriminative ability (c-statistic of 0.81 [95% confidence interval [CI], 0.74-0.87] and 0.84 [95% CI, 0.77-0.89]), overall performance, and decision curve analysis. The algorithm for 90-day mortality showed almost perfect calibration reflected in an overall calibration intercept of -0.07 (95% CI: -0.50, 0.35). The 1-year mortality algorithm underestimated mortality mainly for the lowest predicted probabilities with an overall intercept of 0.57 (95% CI: 0.18, 0.96).

Conclusions: The SORG algorithms for survival in spinal metastatic disease generalized well to a contemporary cohort of consecutively treated patients from an external institution. Further validation in international cohorts and large, prospective multi-institutional trials is required to confirm or refute the findings presented here. The open-access algorithms are available here: https://sorg-apps.shinyapps.io/spinemetssurvival/.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.05.003
October 2020

Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery.

Spine J 2020 Apr 12. Epub 2020 Apr 12.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA. Electronic address:

Background: Intraoperative vascular injury (VI) may be an unavoidable complication of anterior lumbar spine surgery; however, vascular injury has implications for quality and safety reporting as this intraoperative complication may result in serious bleeding, thrombosis, and postoperative stricture.

Purpose: The purpose of this study was to (1) develop machine learning algorithms for preoperative prediction of VI and (2) develop natural language processing (NLP) algorithms for automated surveillance of intraoperative VI from free-text operative notes.

Patient Sample: Adult patients, 18 years of age or older, undergoing anterior lumbar spine surgery at two academic and three community medical centers were included in this analysis.

Outcome Measures: The primary outcome was unintended VI during anterior lumbar spine surgery.

Methods: Manual review of free-text operative notes was used to identify patients who had unintended VI. The available population was split into training and testing cohorts. Five machine learning algorithms were developed for preoperative prediction of VI. An NLP algorithm was trained for automated detection of intraoperative VI from free-text operative notes. Performance of the NLP algorithm was compared to current procedural terminology and international classification of diseases codes.

Results: In all, 1035 patients underwent anterior lumbar spine surgery and the rate of intraoperative VI was 7.2% (n=75). Variables used for preoperative prediction of VI were age, male sex, body mass index, diabetes, L4-L5 exposure, and surgery for infection (discitis, osteomyelitis). The best performing machine learning algorithm achieved a c-statistic of 0.73 for preoperative prediction of VI (https://sorg-apps.shinyapps.io/lumbar_vascular_injury/). For automated detection of intraoperative VI from free-text notes, the NLP algorithm achieved a c-statistic of 0.92. The NLP algorithm identified 18 of the 21 patients (sensitivity 0.86) who had a VI whereas current procedural terminology and international classification of diseases codes identified 6 of the 21 (sensitivity 0.29) patients. At this threshold, the NLP algorithm had a specificity of 0.93, negative predictive value of 0.99, positive predictive value of 0.51, and F1-score of 0.64.
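As a worked check of the reported figures, the sketch below recomputes the sensitivities from the stated counts and the F1-score from the reported positive predictive value. The function names are hypothetical; only the counts (18 and 6 of 21) and the positive predictive value (0.51) come from the abstract.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual events that were detected (also called recall)."""
    return true_positives / (true_positives + false_negatives)


def f1_score(precision, recall):
    """Harmonic mean of precision (positive predictive value) and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Counts from the abstract: 21 patients with a vascular injury in the comparison.
nlp_sensitivity = sensitivity(true_positives=18, false_negatives=3)
codes_sensitivity = sensitivity(true_positives=6, false_negatives=15)

# F1 combines the reported positive predictive value with the sensitivity.
nlp_f1 = f1_score(precision=0.51, recall=nlp_sensitivity)

print(round(nlp_sensitivity, 2), round(codes_sensitivity, 2), round(nlp_f1, 2))
```

Because F1 is a harmonic mean, the low positive predictive value pulls it well below the sensitivity.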

Conclusion: Relying on administrative procedural and diagnosis codes may underestimate the rate of unintended intraoperative VI in anterior lumbar spine surgery. External and prospective validation of the algorithms presented here may improve quality and safety reporting.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.04.001
April 2020

Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy?

Spine J 2020 10 4;20(10):1602-1609. Epub 2020 Mar 4.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA. Electronic address:

Background: Surgical site infections are a major driver of morbidity and increased costs in the postoperative period after spine surgery. Current tools for surveillance of these adverse events rely on prospective clinical tracking, manual retrospective chart review, or administrative procedural and diagnosis codes.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated reporting of postoperative wound infection requiring reoperation after lumbar discectomy.

Patient Sample: Adult patients undergoing discectomy at two academic and three community medical centers between January 1, 2000 and July 31, 2019 for lumbar disc herniation.

Outcome Measures: Reoperation for wound infection within 90 days after surgery.

Methods: Free-text notes of patients who underwent surgery from January 1, 2000 to December 31, 2015 were used for algorithm training. Free-text notes of patients who underwent surgery after January 1, 2016 were used for algorithm testing. Manual chart review was used to label which patients had reoperation for wound infection. An extreme gradient-boosting NLP algorithm was developed to detect reoperation for postoperative wound infection.
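The temporal split described in the methods (training on earlier surgeries, testing on later ones) can be sketched as follows. The record layout, the field name `surgery_date`, and the example records are hypothetical; only the December 31, 2015 cutoff is taken from the abstract.

```python
from datetime import date

# Last surgery date that still belongs to the training period (from the abstract).
TRAIN_END = date(2015, 12, 31)


def temporal_split(records, train_end=TRAIN_END):
    """Assign each record to training or testing by surgery date alone,
    so that the test set is strictly later in time than the training set."""
    train = [r for r in records if r["surgery_date"] <= train_end]
    test = [r for r in records if r["surgery_date"] > train_end]
    return train, test


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "surgery_date": date(2004, 5, 17)},
    {"id": 2, "surgery_date": date(2015, 12, 31)},
    {"id": 3, "surgery_date": date(2016, 1, 2)},
    {"id": 4, "surgery_date": date(2019, 7, 31)},
]
train, test = temporal_split(records)
print([r["id"] for r in train], [r["id"] for r in test])
```

Unlike a random split, this design tests whether a model trained on older notes still works on more recent documentation.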

Results: Overall, 5,860 patients were included in this study and 62 (1.1%) had a reoperation for wound infection. In patients who underwent surgery after January 1, 2016 (n=1,377), the NLP algorithm detected 15 of the 16 patients (sensitivity=0.94) who had reoperation for infection. In comparison, current procedural terminology and international classification of disease codes detected 12 of these 16 patients (sensitivity=0.75). At a threshold of 0.05, the NLP algorithm had positive predictive value of 0.83 and F1-score of 0.88.

Conclusion: Temporal validation of the algorithm developed in this study demonstrates a proof-of-concept application of NLP for automated reporting of adverse events after spine surgery. Adapting this methodology for other procedures and outcomes in spine and orthopedics has the potential to dramatically improve and automatize quality and safety reporting.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.02.021
October 2020

Natural language processing for automated detection of incidental durotomy.

Spine J 2020 05 23;20(5):695-700. Epub 2019 Dec 23.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114, USA. Electronic address:

Background: Incidental durotomy is a common intraoperative complication during spine surgery with potential implications for postoperative recovery, patient-reported outcomes, length of stay, and costs. To our knowledge, there are no processes available for automated surveillance of incidental durotomy.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated detection of incidental durotomies in free-text operative notes of patients undergoing lumbar spine surgery.

Patient Sample: Adult patients 18 years or older undergoing lumbar spine surgery between January 1, 2000 and June 31, 2018 at two academic and three community medical centers.

Outcome Measures: The primary outcome was defined as intraoperative durotomy recorded in free-text operative notes.

Methods: An 80:20 stratified split was undertaken to create training and testing populations. An extreme gradient-boosting NLP algorithm was developed to detect incidental durotomy. Discrimination was assessed via area under receiver-operating curve (AUC-ROC), precision-recall curve, and Brier score. Performance of this algorithm was compared with current procedural terminology (CPT) and international classification of diseases (ICD) codes for durotomy.
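An 80:20 stratified split keeps the outcome rate roughly equal in the training and testing populations. The sketch below is one simple way to do this and is not the authors' implementation; the function name, the rounding rule, and the seed are assumptions, and the exact per-class counts it yields depend on that rounding rule and need not match the study's test set.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train and test so that each class contributes
    about `test_fraction` of its members to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Cohort size and event count from the abstract: 1,000 patients, 93 durotomies.
labels = [1] * 93 + [0] * 907
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx), "events in the test split")
```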

Results: Overall, 1,000 patients were included in the study and 93 (9.3%) had a recorded incidental durotomy in the free-text operative report. In the independent testing set (n=200) not used for model development, the NLP algorithm achieved AUC-ROC of 0.99 for detection of durotomy. In comparison, the CPT/ICD codes had AUC-ROC of 0.64. In the testing set, the NLP algorithm detected 16 of 18 patients with incidental durotomy (sensitivity 0.89) whereas the CPT and ICD codes detected 5 of 18 (sensitivity 0.28). At a threshold of 0.05, the NLP algorithm had specificity of 0.99, positive predictive value of 0.89, and negative predictive value of 0.99.
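The abstract gives the test-set size (200), the detected and missed durotomies (16 and 2), and rounded values for specificity and the predictive values, but not the number of false positives. As a hedged consistency check, the sketch below searches for false-positive counts that reproduce the reported rounded metrics; the function name is hypothetical and the result is an inference, not a figure reported by the authors.

```python
def consistent_false_positives(n_total, tp, fn, ppv, specificity, npv):
    """Return every false-positive count whose metrics, rounded to two
    decimals, equal the reported PPV, specificity, and NPV."""
    n_negative = n_total - tp - fn
    matches = []
    for fp in range(n_negative + 1):
        tn = n_negative - fp
        if tp + fp == 0 or tn + fn == 0:
            continue  # predictive values are undefined for empty groups
        observed = (
            round(tp / (tp + fp), 2),   # positive predictive value
            round(tn / n_negative, 2),  # specificity
            round(tn / (tn + fn), 2),   # negative predictive value
        )
        if observed == (ppv, specificity, npv):
            matches.append(fp)
    return matches


# Reported: n=200, 16 of 18 detected, PPV 0.89, specificity 0.99, NPV 0.99.
print(consistent_false_positives(200, tp=16, fn=2, ppv=0.89, specificity=0.99, npv=0.99))
```

Under these assumptions only one false-positive count fits all three reported values, which suggests the reported metrics are internally consistent.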

Conclusions: Internal validation of the NLP algorithm developed in this study indicates promising results for future NLP applications in spine surgery. Pending external validation, the NLP algorithm developed in this study may be used by entities including national spine registries or hospital quality and safety departments to automate tracking of incidental durotomies.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2019.12.006
May 2020

High Risk of Symptomatic Venous Thromboembolism After Surgery for Spine Metastatic Bone Lesions: A Retrospective Study.

Clin Orthop Relat Res 2019 07;477(7):1674-1686

O. Q. Groot, P. T. Ogink, N. R. P. Pereira, S. A. Lozano-Calderon, J. H. Schwab, Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA; M. L. Ferrone, M. B. Harris, A. J. Schoenfield, Department of Orthopaedic Surgery, Orthopaedic Spine Service, Brigham and Women's Hospital - Harvard Medical School, Boston, MA, USA.

Background: Cancer and spinal surgery are both considered risk factors for venous thromboembolism (VTE). However, the risk of symptomatic VTE for patients undergoing surgery for spine metastases remains undefined.

Questions/purposes: The purposes of this study were to: (1) identify the proportion of patients who develop symptomatic VTE within 90 days of surgical treatment for spine metastases; (2) identify the factors associated with the development of symptomatic VTE among patients receiving surgery for spine metastases; (3) assess the association between the development of postoperative symptomatic VTE and 1-year survival among patients who underwent surgery for spine metastases; and (4) assess if chemoprophylaxis increases the risk of wound complications among patients who underwent surgery for spine metastases.

Methods: Between 2002 and 2014, 637 patients at two hospitals underwent spine surgery for metastases. We considered eligible for analysis adult patients whose procedures were to treat cervical, thoracic, or lumbar metastases (including lymphoma and multiple myeloma). At followup after 90 days and 1 year, respectively, 21 of 637 patients (3%) and 41 of 637 patients (6%) were lost to followup. In general, we used 40 mg of enoxaparin or 5000 IUs subcutaneous heparin every 12 hours. Patients on preoperative chemoprophylaxis continued their initial medication postoperatively. All chemoprophylaxis was started 48 hours after surgery and continued day to day but was discontinued if a bleeding complication developed. Low-molecular-weight heparin (including enoxaparin and dalteparin, in general dosages of respectively 40 mg and 5000 IUs daily) was the most commonly used chemoprophylaxis in 308 patients (48%). Subcutaneous heparin was injected into 127 patients (20%); aspirin was used for 92 patients (14%); and warfarin was administered in 21 patients (3.3%). No form of chemoprophylaxis was prescribed for 89 patients (14%). The primary outcome variable, VTE, was defined as any symptomatic pulmonary embolism (PE) or symptomatic deep venous thromboembolism (DVT) within 90 days of surgery as determined by chart review. The secondary outcome was defined as any documented wound complication within 90 days of surgery that might be attributable to chemoprophylaxis. Statistical analysis was performed using multivariable logistic and Cox regression and Kaplan-Meier.

Results: Overall, 72 of 637 patients (11%) had symptomatic VTE; 38 (6%) developed a PE, eight (1.3%) of which were fatal, and 40 (6%) developed a DVT. After controlling for relevant confounding variables such as age, the modified Charlson Comorbidity Index, visceral metastases, and chemoprophylaxis, longer duration of surgery was independently associated with an increased risk of symptomatic VTE (odds ratio 1.15 for each additional hour of surgery; 95% confidence interval [CI], 1.04-1.28; p = 0.009). After controlling for relevant confounding variables such as age, the modified Charlson Comorbidity Index, visceral metastases, and primary tumor type, patients with symptomatic VTE had a worse 1-year survival rate (VTE, 38%; 95% CI, 27-49 versus nonVTE, 47%; 95% CI, 42-51; p = 0.044). After controlling for relevant confounding variables, no association was found between wound complications and the use of chemoprophylaxis (odds ratio, 1.34; 95% CI, 0.62-2.90; p = 0.459). The overall proportion of patients who developed a wound complication was 10% (66 of 637), including 1.1% (seven of 637) spinal epidural hematomas.
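A per-hour odds ratio compounds multiplicatively when surgery is several hours longer. The sketch below illustrates this arithmetic under the assumption that the logistic model treats duration as linear on the log-odds scale, so the same scaling applies to the confidence limits; the function name and the choice of three extra hours are hypothetical.

```python
import math


def odds_ratio_for_hours(per_hour_or, extra_hours):
    """Scale a per-hour odds ratio to `extra_hours` additional hours by
    multiplying the log-odds coefficient by the number of hours."""
    return math.exp(math.log(per_hour_or) * extra_hours)


# Reported per-hour odds ratio and 95% CI limits from the abstract.
point, lower, upper = 1.15, 1.04, 1.28

extra_hours = 3  # hypothetical example duration
print(
    round(odds_ratio_for_hours(point, extra_hours), 2),
    round(odds_ratio_for_hours(lower, extra_hours), 2),
    round(odds_ratio_for_hours(upper, extra_hours), 2),
)
```

These are odds ratios, not risk ratios, and extrapolating beyond the range of surgical durations observed in the cohort would be an additional assumption.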

Conclusions: The risk of both symptomatic PE and fatal PE is high in this patient population, and those with symptomatic VTE were less likely to survive 1 year than those who did not, though this may reflect overall infirmity as much as anything else, because many of these patients did not die from VTE-related complications. Further studies, such as randomized controlled trials with consistent postoperative VTE screening comparing different chemoprophylaxis regimens, are needed to identify better VTE prevention strategies.

Level Of Evidence: Level III, therapeutic study.

Source
DOI: http://dx.doi.org/10.1097/CORR.0000000000000733
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6999978
July 2019

Non-HLA Genetic Factors and Their Influence on Heart Transplant Outcomes: A Systematic Review.

Transplant Direct 2019 Feb 21;5(2):e422. Epub 2019 Jan 21.

Division Heart and Lungs, Department of Cardiology, University Medical Center Utrecht, University of Utrecht, Utrecht, the Netherlands.

Background: Improvement of immunosuppressive therapies and surgical techniques has increased the survival rate after heart transplantation. Nevertheless, a large number of patients still experience complications, such as allograft rejection, vasculopathy, kidney dysfunction, and diabetes in response to immunosuppressive therapy. Variants in HLA genes have been extensively studied for their role in clinical outcomes after transplantation, whereas the knowledge about non-HLA genetic variants in this setting is still limited. Non-HLA polymorphisms are involved in the metabolism of major immunosuppressive therapeutics and may play a role in clinical outcomes after cardiac transplantation. This systematic review summarizes the existing knowledge of associations between non-HLA genetic variation and heart transplant outcomes.

Methods: The current evidence available on genetic polymorphisms associated with outcomes after heart transplantation was identified by a systematic search in PubMed and Embase. Studies reporting on polymorphisms significantly associated with clinical outcomes after cardiac transplantation were included.

Results: A total of 56 studies were included, all of which were candidate gene studies. These studies identified 58 polymorphisms in 36 genes that were associated with outcomes after cardiac transplantation. Variants in , and are consistently replicated across multiple studies for various transplant outcomes.

Conclusions: The research currently available supports the hypothesis that non-HLA polymorphisms are associated with clinical outcomes after heart transplantation. However, many genetic variants were only identified in a single study, questioning their true effect on the clinical outcomes tested. Further research in larger cohorts with well-defined phenotypes is warranted.

Source
DOI: http://dx.doi.org/10.1097/TXD.0000000000000859
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6415970
February 2019

High Risk of Venous Thromboembolism After Surgery for Long Bone Metastases: A Retrospective Study of 682 Patients.

Clin Orthop Relat Res 2018 10;476(10):2052-2061

Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital, Boston, MA, USA.

Background: Previous studies have shown that venous thromboembolism (VTE) is a complication associated with neoplastic disease and major orthopaedic surgery. However, many potential risk factors remain undefined.

Questions/purposes: (1) What proportion of patients develop symptomatic VTE after surgery for long bone metastases? (2) What factors are associated with the development of symptomatic VTE among patients receiving surgery for long bone metastases? (3) Is there an association between the development of symptomatic VTE and 1-year survival among patients undergoing surgery for long bone metastases? (4) Does chemoprophylaxis increase the risk of wound complications among patients undergoing surgery for long bone metastases?

Methods: A retrospective study identified 682 patients undergoing surgical treatment of long bone metastases between 2002 and 2013 at the Massachusetts General Hospital and Brigham and Women's Hospital. We included patients 18 years of age or older who had a surgical procedure for impending or pathologic metastatic long bone fracture. We considered the humerus, radius, ulna, femur, tibia, and fibula as long bones; metastatic disease was defined as metastases from solid organs, multiple myeloma, or lymphoma. In general, we used 40 mg enoxaparin daily for lower extremity surgery and 325 mg aspirin daily for lower or upper extremity surgery. The primary outcome was a VTE defined as any symptomatic pulmonary embolism (PE) or symptomatic deep vein thrombosis (DVT; proximal and distal) within 90 days of surgery as determined by chart review. The tertiary outcome was defined as any documented wound complication that might be attributable to chemoprophylaxis within 90 days of surgery. At followup after 90 days and 1 year, respectively, 4% (25 of 682) and 8% (53 of 682) were lost to followup. Statistical analysis was performed using multivariable logistic and Cox regression and Kaplan-Meier.

Results: Overall, 6% (44 of 682) of patients had symptomatic VTE; 22 patients sustained a DVT, and 22 developed a PE. After controlling for relevant confounding variables, higher preoperative hemoglobin level was independently associated (odds ratio [OR], 0.75; 95% confidence interval [CI], 0.60-0.93; p = 0.011) with decreased symptomatic VTE risk, the presence of symptomatic VTE was associated with a worse 1-year survival rate (VTE: 27% [95% CI, 14%-40%] and non-VTE: 39% [95% CI, 35%-43%]; p = 0.041), and no association was found between wound complications and the use of chemoprophylaxis (OR, 3.29; 95% CI, 0.43-25.17; p = 0.252).

Conclusions: The risk of symptomatic 90-day VTE is high in patients undergoing surgery for long bone metastases. Further study would be needed to determine the VTE prevention strategy that best balances risks and benefits to address this complication.

Level Of Evidence: Level III, therapeutic study.

Source
DOI: http://dx.doi.org/10.1097/CORR.0000000000000463
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6259821
October 2018

Complications and reoperations after surgery for 647 patients with spine metastatic disease.

Spine J 2019 01 1;19(1):144-156. Epub 2018 Jun 1.

Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital-Harvard Medical School, 55 Fruit St, Boston, MA 02114, USA. Electronic address:

Background Context: Postoperative morbidity may offset the potential benefits of surgical treatment for spine metastatic disease; hence, risk factors for postoperative complications and reoperations should be taken into consideration during surgical decision-making. In addition, it remains unknown whether complications and reoperations shorten these patients' survival.

Purpose: We aimed to describe and identify factors associated with having a complication within 30 days of index surgery as well as factors associated with having a subsequent reoperation. Furthermore, we assessed the effect of 30-day complications and reoperations on the patients' postoperative survival and described neurologic changes after surgery.

Study Design: Retrospective cohort study.

Patient Sample: We included 647 patients 18 years and older who had surgery for metastatic disease in the spine between January 2002 and January 2014 in one of two affiliated tertiary care centers.

Outcome Measures: Our primary outcomes were complications within 30 days after surgery and reoperations until final follow-up or death.

Methods: We used multivariate logistic regression to identify risk factors for 30-day complications and reoperations. We used the Cox regression analysis to assess the effect of postoperative complications and reoperations on survival.

Results: Of the 647 included patients, 205 (32%) had a complication within 30 days. The following variables were independently associated with 30-day complications: lower albumin levels (odds ratio [OR]: 0.69, 95% confidence interval [CI]=0.49-0.96, p=.021), additional comorbidities (OR=1.42, 95% CI=1.00-2.01, p=.048), pathologic fracture (OR=1.41, 95% CI=0.97-2.05, p=.031), three or more spine levels operated upon (OR=1.64, 95% CI=1.02-2.64, p=.027), and combined surgical approach (OR=2.44, 95% CI=1.06-5.60, p=.036). One hundred and fifteen patients (18%) had at least one reoperation after the initial surgery; prior radiotherapy (OR=1.56, 95% CI=1.07-2.29, p=.021) to the spinal tumor was independently associated with reoperation. 30-day complications were associated with worse survival (hazard ratio [HR]=1.40, 95% CI=1.17-1.68, p<.001), and reoperation was not significantly associated with worse survival (HR=0.80, 95% CI=0.09-1.00, p=.054). Neurologic status worsened in 42 (6.7%), remained stable in 445 (71%), and improved in 140 (22%) patients after surgery.

Conclusions: Three or more spine levels operated upon and prior radiotherapy should prompt consideration of a preoperative plastic surgery consultation regarding soft tissue coverage. Furthermore, if time allows, aggressive nutritional supplementation should be considered for patients with low preoperative serum albumin levels. Surgeons should be aware of the increase in complications in patients presenting with pathologic fracture, undergoing a combined approach, and with any additional preoperative comorbidities. Importantly, 30-day complications were associated with worsened survival.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2018.05.037
January 2019