Publications by authors named "Michiel E R Bongers"

16 Publications

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review.

Acta Orthop 2021 Apr 18:1-9. Epub 2021 Apr 18.

Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA.

Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines.

Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting.

Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies.

Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.

Source
http://dx.doi.org/10.1080/17453674.2021.1910448
April 2021

Machine learning prediction models in orthopedic surgery: A systematic review in transparent reporting.

J Orthop Res 2021 Mar 18. Epub 2021 Mar 18.

Orthopedic Oncology Service, Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Machine learning (ML) studies are becoming increasingly popular in orthopedics but lack a critical appraisal of their adherence to peer-reviewed guidelines. The objective of this review was to (1) evaluate quality and transparent reporting of ML prediction models in orthopedic surgery based on the transparent reporting of multivariable prediction models for individual prognosis or diagnosis (TRIPOD), and (2) assess risk of bias with the Prediction model Risk Of Bias ASsessment Tool. A systematic review was performed to identify all ML prediction studies published in orthopedic surgery through June 18th, 2020. After screening 7138 studies, 59 studies met the study criteria and were included. Two reviewers independently extracted data and discrepancies were resolved by discussion with at least two additional reviewers present. Across all studies, the overall median completeness for the TRIPOD checklist was 53% (interquartile range 47%-60%). The overall risk of bias was low in 44% (n = 26), high in 41% (n = 24), and unclear in 15% (n = 9). High overall risk of bias was driven by incomplete reporting of performance measures, inadequate handling of missing data, and use of small datasets with inadequate outcome numbers. Although the number of ML studies in orthopedic surgery is increasing rapidly, over 40% of the existing models are at high risk of bias. Furthermore, over half incompletely reported their methods and/or performance measures. Until these issues are adequately addressed to give patients and providers trust in ML models, a considerable gap remains between the development of ML prediction models and their implementation in orthopedic practice.

Source
http://dx.doi.org/10.1002/jor.25036
March 2021

International external validation of the SORG machine learning algorithms for predicting 90-day and 1-year survival of patients with spine metastases using a Taiwanese cohort.

Spine J 2021 Feb 2. Epub 2021 Feb 2.

Department of Orthopedics, National Taiwan University College of Medicine and National Taiwan University Hospital, Taipei, Taiwan.

Background Context: Accurately predicting the survival of patients with spinal metastases is important for guiding surgical intervention. The SORG machine-learning (ML) algorithm for the 90-day and 1-year mortality of patients with metastatic cancer to the spine has been multiply validated, with a high degree of accuracy in both internal and external validation studies. However, prior external validations were conducted using patient groups located on the east coast of the United States, representing a generally homogeneous population. The aim of this study was to externally validate the SORG algorithms with a Taiwanese population.

Study Design/Setting: Retrospective study at a single tertiary care center in Taiwan.

Patient Sample: Four hundred and twenty-seven patients who underwent surgery for metastatic spine disease from November 1, 2010 to December 31, 2018.

Outcome Measures: 90-day and 1-year mortality.

Methods: The baseline characteristics of our validation cohort were compared with those of the previously published developmental and external validation cohorts. Discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithms in this cohort.

Results: Ninety-day and 1-year mortality rates were 110 of 427 (26%) and 256 of 427 (60%), respectively. The external validation cohort and the developmental cohort differed in body mass index (BMI), preoperative performance status, American Spinal Injury Association impairment scale, primary tumor histology and in several laboratory measurements. The SORG ML algorithm for 90-day and 1-year mortality demonstrated a high level of discriminative ability (c-statistics of 0.73 [95% confidence interval [CI], 0.67-0.78] and 0.74 [95% CI, 0.69-0.79]), overall performance, and had a positive net benefit throughout the range of threshold probabilities in decision curve analysis. The algorithm for 1-year mortality had a calibration intercept of 0.08, representing a good calibration. However, the 90-day mortality algorithm underestimated mortality for the lowest predicted probabilities, with an overall intercept of 0.81.

Conclusions: The SORG algorithms for predicting 90-day and 1-year mortality in patients with spinal metastatic disease generally performed well on international external validation in a predominately Taiwanese population. However, 90-day mortality was underestimated in this group. Whether this inconsistency was due to different primary tumor characteristics, body mass index, selection bias or other factors remains unclear, and may be better understood with further validative works that utilize international and/or diverse populations.

Source
http://dx.doi.org/10.1016/j.spinee.2021.01.027
February 2021

Do Cohabitants Reliably Complete Questionnaires for Patients in a Terminal Cancer Stage when Assessing Quality of Life, Pain, Depression, and Anxiety?

Clin Orthop Relat Res 2021 04;479(4):792-801

O. Q. Groot, N. R. P. Pereira, M. E. R. Bongers, P. T. Ogink, E. T. Newman, K. A. Raskin, S. A. Lozano-Calderon, J. H. Schwab, Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: Patients with bone metastases often are unable to complete quality of life (QoL) questionnaires, and cohabitants (such as spouses, domestic partners, offspring older than 18 years, or other people who live with the patient) could be a reliable alternative. However, the extent of reliability in this complicated patient population remains undefined, and the influence of the cohabitant's condition on their assessment of the patient's QoL is unknown.

Questions/purposes: (1) Do QoL scores, measured by the 5-level EuroQol-5D (EQ-5D-5L) version and the Patient-reported Outcomes Measurement Information System (PROMIS) version 1.0 in three domains (anxiety, pain interference, and depression), reported by patients differ markedly from scores as assessed by their cohabitants? (2) Do cohabitants' PROMIS-Depression scores correlate with differences in measured QoL results?

Methods: This cross-sectional study included patients and cohabitants older than 18 years of age. Patients included those with presence of histologically confirmed bone metastases (including lymphoma and multiple myeloma), and cohabitants must have been present at the clinic visit. Patients were eligible for inclusion in the study regardless of comorbidities, prognosis, prior surgery, or current treatment. Between June 1, 2016 and March 1, 2017 and between October 1, 2017 and February 26, 2018, all 96 eligible patients were approached, of whom 49% (47) met the selection criteria and were willing to participate. The included 47 patient-cohabitant pairs independently completed the EQ-5D-5L and the eight-item PROMIS for three domains (anxiety, pain, and depression) with respect to the patients' symptoms. The cohabitants also completed the four-item PROMIS-Depression survey with respect to their own symptoms.

Results: There were no clinically important differences between the scores of patients and their cohabitants for all questionnaires, and the agreement between patient and cohabitant scores was moderate to strong (Spearman correlation coefficients ranging from 0.52 to 0.72 on the four questionnaires; all p values < 0.05). However, despite the good agreement in QoL scores, an increased cohabitant's depression score was correlated with an overestimation of the patient's symptom burden for the anxiety and depression domains (weak Spearman correlation coefficient of 0.33 [95% confidence interval 0.08 to 0.58]; p = 0.01 and moderate Spearman correlation coefficient of 0.52 [95% CI 0.29 to 0.74]; p < 0.01, respectively).
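
The agreement statistic reported above is the Spearman correlation, which is the Pearson correlation computed on ranks rather than raw scores. The following is a minimal sketch of that calculation in plain Python; the function names and the example scores are hypothetical and are not taken from the study data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical patient and cohabitant questionnaire scores (illustration only).
patient_scores = [52, 60, 47, 71, 65, 58]
cohabitant_scores = [55, 63, 50, 68, 70, 54]
rho = spearman(patient_scores, cohabitant_scores)
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of the questionnaires, which is one reason it suits ordinal patient-reported measures.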

Conclusion: The present findings support that cohabitants might be reliable raters of the QoL of patients with bone metastases. However, if a patient's cohabitant has depression, the cohabitant may overestimate a patient's symptoms in emotional domains such as anxiety and depression, warranting further research that includes cohabitants with and without depression to elucidate the effect of depression on the level of agreement. For now, clinicians may want to reconsider using the cohabitant's judgement if depression is suspected.

Clinical Relevance: These findings suggest that a cohabitant's impressions of a patient's quality of life are, in most instances, accurate; this is potentially helpful in situations where the patient cannot weigh in. Future studies should employ longitudinal designs to see how or whether our findings change over time and with disease progression, and how specific interventions, such as different chemotherapeutic regimens or surgery, may factor in.

Source
http://dx.doi.org/10.1097/CORR.0000000000001525
April 2021

The use of autologous free vascularized fibula grafts in reconstruction of the mobile spine following tumor resection: surgical technique and outcomes.

J Neurosurg Spine 2020 Nov 6:1-10. Epub 2020 Nov 6.

Department of Orthopedic Surgery, Orthopedic Oncology Service.

Objective: Reconstruction of the mobile spine following total en bloc spondylectomy (TES) of one or multiple vertebral bodies in patients with malignant spinal tumors is a challenging procedure with high failure rates. A common reason for reconstructive failure is nonunion, which becomes more problematic when using local radiation therapy. Radiotherapy is an integral part of the management of primary malignant osseous tumors in the spine. Vascularized grafts may help prevent nonunion in the radiotherapy setting. The authors have utilized free vascularized fibular grafts (FVFGs) for reconstruction of the spine following TES. The purpose of this article is to describe the surgical technique for vascularized reconstruction of defects after TES. Additionally, the outcomes of consecutive cases treated with this technique are reported.

Methods: Thirty-nine patients were treated at the authors' tertiary care institution for malignant tumors in the mobile spine using FVFG following TES between 2010 and 2018. Postoperative union, reoperations, complications, neurological outcome, and survival were reported. The median follow-up duration was 50 months (range 14-109 months).

Results: The cohort consisted of 26 males (67%), and the median age was 58 years. Chordoma was the most prevalent tumor (67%), and the lumbar spine was most affected (46%). Complete union was seen in 26 patients (76%), the overall complication rate was 54%, and implant failure was the most common complication, with 13 patients (33%) affected. In 18 patients (46%), one or more reoperations were needed, and the fixation was surgically revised 15 times (42% of reoperations) in 10 patients (26%). A reconstruction below the L1 vertebra had a higher proportion of implant failure (67%; 8 of 12 patients) compared with higher resections (21%; 5 of 24 patients) (p = 0.011). Graft length, number of resected vertebrae, and docking the FVFG on the endplate or cancellous bone were not associated with union or implant failure on univariate analysis.

Conclusions: The FVFG is an effective reconstruction technique, particularly in the cervicothoracic spine. However, high implant failure rates in the lumbar spine have been seen, which occurred even in cases in which the graft completely healed. Methods to increase the weight-bearing capacity of the graft in the lumbar spine should be considered in these reconstructions. Overall, the rates of failure and revision surgery for FVFG compare with previous reports on reconstruction after TES.

Source
http://dx.doi.org/10.3171/2020.6.SPINE20521
November 2020

Postoperative adverse events secondary to iatrogenic vascular injury during anterior lumbar spinal surgery.

Spine J 2020 Nov 3. Epub 2020 Nov 3.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, USA.

Background: Anterior lumbar spine surgery (ALSS) requires mobilization of the great vessels, resulting in a high risk of iatrogenic vascular injury (VI). It remains unclear whether VI is associated with increased risk of postoperative complications and other related adverse outcomes.

Purpose: The purpose of this study was to assess (1) the incidence of postoperative complications attributable to VI during ALSS, and (2) outcomes secondary to VI such as procedural blood loss, transfusion of blood products, length of stay (LOS), and in-hospital mortality.

Study Design: Retrospective propensity-score matched, case-control study at 2 academic and 3 community medical centers.

Patient Sample: Patients 18 years of age or older, undergoing ALSS between January 1st, 2000 and July 31st, 2019 were included in this analysis.

Outcome Measures: The primary outcome was the incidence of postoperative complications attributable to VI, such as venous thromboembolism, compartment syndrome, transfusion reaction, limb ischemia, and reoperations. The secondary outcomes included estimated operative blood loss (milliliter), transfused blood products, LOS (days), and in-hospital mortality.

Methods: In total, 1,035 patients were identified, of which 75 (7.2%) had a VI. For comparative analyses, the 75 VI patients were paired with 75 comparable non-VI patients by propensity-score matching. The adequacy of the matching was assessed by testing the standardized mean differences (SMD) between VI and non-VI group (>0.25 SMD).
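
The balance check described above relies on the standardized mean difference (SMD) with a 0.25 cutoff. The abstract does not state which SMD formula was used; the sketch below assumes the common pooled-standard-deviation form for a continuous covariate, and the function names and ages are hypothetical.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous covariate, using the pooled standard deviation.

    Assumption: SMD = (mean_a - mean_b) / sqrt((var_a + var_b) / 2),
    with sample variances. Other definitions exist.
    """
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd


def is_imbalanced(group_a, group_b, cutoff=0.25):
    """Flag a covariate whose absolute SMD exceeds the cutoff."""
    return abs(standardized_mean_difference(group_a, group_b)) > cutoff


# Hypothetical ages in a matched VI group and non-VI group (illustration only).
vi_ages = [54, 61, 47, 66, 58]
non_vi_ages = [55, 60, 49, 64, 57]
smd_age = standardized_mean_difference(vi_ages, non_vi_ages)
```

An SMD compares group means in units of spread rather than testing for significance, so it does not shrink or grow with sample size the way a p value does.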

Results: Two patients (2.7%) had VI-related postoperative complications in the studied period, which consisted of two deep venous thromboembolisms (DVTs) occurring on days 3 and 7 postoperatively. Both DVTs were located in the distal left common iliac vein (CIV). The VIs these patients sustained were to the distal inferior vena cava and the left CIV, respectively. Neither patient developed additional complications as a consequence of the DVT; however, both required systemic anticoagulation and placement of an inferior vena cava filter. There was no statistical difference with the non-VI group, where no instances (0%) of postoperative complications were reported (p=.157). No differences were found in LOS or in-hospital mortality between the two groups (p=.157 and p=.999, respectively). Intraoperative blood loss and blood transfusion were both found to be higher in the VI group in comparison to the non-VI group (650 mL, interquartile range [IQR] 300-1400 vs. 150 mL, IQR 50-425, p≤.001; 0 units, IQR 0-3 vs. 0 units, IQR 0-1, p=.012, respectively).

Conclusion: This study found a low number of serious postoperative complications related to VI in ALSS. In addition, these complications were not significantly different between the VI and matched non-VI ALSS cohorts. Although not significant, the observed DVT incidence of 2.7% after VI in ALSS warrants vigilance and preventive measures during the postoperative course of these patients.

Source
http://dx.doi.org/10.1016/j.spinee.2020.10.031
November 2020

Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports.

Acta Oncol 2020 Dec 12;59(12):1455-1460. Epub 2020 Sep 12.

Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The widespread use of electronic patient-generated health data has led to unprecedented opportunities for automated extraction of clinical features from free-text medical notes. However, processing this rich resource of data for clinical and research purposes depends on labor-intensive and potentially error-prone manual review. The aim of this study was to develop a natural language processing (NLP) algorithm for binary classification (single metastasis versus two or more metastases) in bone scintigraphy reports of patients undergoing surgery for bone metastases.

Material And Methods: Bone scintigraphy reports of patients undergoing surgery for bone metastases were each labeled by three independent reviewers using a binary classification (single metastasis versus two or more metastases) to establish a ground truth. A stratified 80:20 split was used to develop and test an extreme-gradient boosting supervised machine learning NLP algorithm.
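
A stratified split keeps the class ratio roughly equal in the development and test sets, which matters when one class is rare. The sketch below shows one way to do this in plain Python; it is an illustration under stated assumptions, not the authors' pipeline. The function name, the seed, and the label counts are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train and test sets, sampling each class separately."""
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class so proportions are preserved.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels: 1 = multiple metastases, 0 = single metastasis.
labels = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
```

With these hypothetical labels, the test set holds 16 positive and 4 negative reports, the same 80:20 class ratio as the full set. Exact counts on real data depend on how each implementation rounds the per-class test size.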

Results: A total of 704 free-text bone scintigraphy reports from 704 patients were included in this study and 617 (88%) had multiple bone metastases. In the independent test set (n = 141) not used for model development, the NLP algorithm achieved an AUC-ROC of 0.97 (95% confidence interval [CI], 0.92-0.99) for classification of multiple bone metastases and an AUC-PRC of 0.99 (95% CI, 0.99-0.99). At a threshold of 0.90, the NLP algorithm correctly identified multiple bone metastases in 117 of the 124 who had multiple bone metastases in the testing cohort (sensitivity 0.94) and yielded 3 false positives (specificity 0.82). At the same threshold, the NLP algorithm had a positive predictive value of 0.97 and F1-score of 0.96.
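
The threshold metrics above can be recomputed from the reported counts. The sketch below uses 117 true positives, 7 false negatives (124 minus 117), and 3 false positives as stated in the abstract. The true-negative count of 14 is not reported; it is an inference, assuming the test set holds 141 reports of which 124 are positive. The function name is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, positive predictive value, and F1."""
    sensitivity = tp / (tp + fn)   # share of true positives that were found
    specificity = tn / (tn + fp)   # share of true negatives that were found
    ppv = tp / (tp + fp)           # share of positive calls that were correct
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


# tp, fn, fp come from the abstract; tn = 141 - 124 - 3 = 14 is an assumption.
metrics = confusion_metrics(tp=117, fn=7, fp=3, tn=14)
```

Under that assumption the recomputed sensitivity (about 0.94), specificity (about 0.82), and F1-score (about 0.96) agree with the reported values, and the positive predictive value comes out at 0.975.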

Conclusions: NLP has the potential to automate clinical data extraction from free text radiology notes in orthopedics, thereby optimizing the speed, accuracy, and consistency of clinical chart review. Pending external validation, the NLP algorithm developed in this study may be implemented as a means to aid researchers in tackling large amounts of data.

Source
http://dx.doi.org/10.1080/0284186X.2020.1819563
December 2020

Free Vascularized Fibula Graft with Femoral Allograft Sleeve for Lumbar Spine Defects After Spondylectomy of Malignant Tumors: A Case Report.

JBJS Case Connect 2020 Jul-Sep;10(3):e2000075

(1) Department of Orthopedic Surgery, Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; (2) Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; (3) Department of Vascular Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; (4) Division of Thoracic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; (5) Department of Orthopedic Surgery, Hand and Upper Extremity Service, Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.

Case: We present a 65-year-old man with an L4 conventional chordoma. Total en bloc spondylectomy (TES) of the involved vertebral bodies and surrounding soft tissues with reconstruction of the spine using a free vascularized fibula autograft (FVFG) is a proven technique, limiting complications and recurrence. However, graft fracture has occurred only in the lumbar spine in our institutional cases. We used a technique in our patient to ensure extra stability and support, with the addition of a femoral allograft sleeve encasing the FVFG.

Conclusions: Our technique for the reconstruction of the lumbar spine after TES of primary malignant spinal disease using a femoral allograft sleeve encasing the FVFG is viable to consider.

Source
http://dx.doi.org/10.2106/JBJS.CC.20.00075
April 2021

Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.

Clin Orthop Relat Res 2020 12;478(12):2751-2764

O. Q. Groot, M. E. R. Bongers, A. V. Karhade, J. H. Schwab, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Background: Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images.

Questions/purposes: This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models.

Methods: A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity.

Results: ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images.

Conclusions: At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions.

Level Of Evidence: Level III, diagnostic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000001360
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7899420
December 2020

How Does the Skeletal Oncology Research Group Algorithm's Prediction of 5-year Survival in Patients with Chondrosarcoma Perform on International Validation?

Clin Orthop Relat Res 2020 Oct;478(10):2300-2308

M. E. R. Bongers, A. V. Karhade, O. Q. Groot, J. H. Schwab, Department of Orthopaedic Surgery, Division of Orthopaedic Oncology, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The Skeletal Oncology Research Group (SORG) machine learning algorithm for predicting survival in patients with chondrosarcoma was developed using data from the Surveillance, Epidemiology, and End Results (SEER) registry. This algorithm was externally validated on a dataset of patients from the United States in an earlier study, where it demonstrated generally good performance but overestimated 5-year survival. In addition, this algorithm has not yet been validated in patients outside the United States; doing so would be important because external validation is necessary as algorithm performance may be misleading when applied in different populations.

Questions/purposes: Does the SORG algorithm retain validity in patients who underwent surgery for primary chondrosarcoma outside the United States, specifically in Italy?

Methods: A total of 737 patients were treated for chondrosarcoma between January 2000 and October 2014 at the Italian tertiary care center which was used for international validation. We excluded patients whose first surgical procedure was performed elsewhere (n = 25), patients who underwent nonsurgical treatment (n = 27), patients with a chondrosarcoma of the soft tissue or skull (n = 60), and patients with peripheral, periosteal, or mesenchymal chondrosarcoma (n = 161). Thus, 464 patients were ultimately included in this external validation study, as the earlier performed SEER study was used as the training set. Therefore, this study, unlike most of this type, does not have a training and validation set. Although the earlier study overestimated 5-year survival, we did not modify the algorithm in this report, as this is the first international validation and the prior performance in the single-institution validation study from the United States may have been driven by a small sample or non-generalizable patterns related to its single-center setting. Variables needed for the SORG algorithm were manually collected from electronic medical records. These included sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location. By inputting these variables into the algorithm, we calculated the predicted probabilities of survival for each patient. The performance of the SORG algorithm was assessed in this study through discrimination (the ability of a model to distinguish between a binary outcome), calibration (the agreement of observed and predicted outcomes), overall performance (the accuracy of predictions), and decision curve analysis (whether decisions made with the model are better than decisions made without it).
For discrimination, the c-statistic (commonly known as the area under the receiver operating characteristic curve for binary classification) was calculated; this ranged from 0.5 (no better than chance) to 1.0 (excellent discrimination). The agreement between predicted and observed outcomes was visualized with a calibration plot, and the calibration slope and intercept were calculated. Perfect calibration results in a slope of 1 and an intercept of 0. For overall performance, the Brier score and the null-model Brier score were calculated. The Brier score ranges from 0 (perfect prediction) to 1 (poorest prediction). Appropriate interpretation of the Brier score requires comparison with the null-model Brier score. The null-model Brier score is the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for every patient. A decision curve analysis was performed to compare the potential net benefit of the algorithm versus other means of decision support, such as treating all or none of the patients. There were several differences between this study and the earlier SEER study, and such differences are important because they help us to determine the performance of the algorithm in a group different from the initial study population. In this study from Italy, 5-year survival was different from the earlier SEER study (71% [319 of 450 patients] versus 76% [1131 of 1487 patients]; p = 0.03). There were more patients with dedifferentiated chondrosarcoma than in the earlier SEER study (25% [118 of 464 patients] versus 8.5% [131 of 1544 patients]; p < 0.001). 
In addition, in this study patients were older, tumor size was larger, and there were higher proportions of high-grade tumors than the earlier SEER study (age: 56 years [interquartile range {IQR} 42 to 67] versus 52 years [IQR 40 to 64]; p = 0.007; tumor size: 80 mm [IQR 50 to 120] versus 70 mm [IQR 42 to 105]; p < 0.001; tumor grade: 22% [104 of 464 had Grade 1], 42% [196 of 464 had Grade 2], and 35% [164 of 464 had Grade 3] versus 41% [592 of 1456 had Grade 1], 40% [588 of 1456 had Grade 2], and 19% [276 of 1456 had Grade 3]; p ≤ 0.001).
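
The overall-performance and decision-curve measures described above can be sketched in a few lines. The code below is a minimal illustration, not the SORG implementation. The net-benefit formula (true-positive rate minus false-positive rate weighted by the odds of the threshold) is the standard decision-curve definition and is assumed here because the abstract does not spell it out. Function names and the small example arrays are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    n = len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / n


def null_model_brier_score(outcomes):
    """Brier score of a model that predicts the outcome prevalence for everyone."""
    prevalence = sum(outcomes) / len(outcomes)
    return brier_score([prevalence] * len(outcomes), outcomes)


def net_benefit(probabilities, outcomes, threshold):
    """Net benefit at one threshold probability (assumed standard definition)."""
    n = len(outcomes)
    pairs = list(zip(probabilities, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Null-model check using the reported 5-year survival of 319 of 450 patients.
reported_outcomes = [1] * 319 + [0] * 131
null_brier = null_model_brier_score(reported_outcomes)

# Hypothetical predictions for four patients (illustration only).
example_probs = [0.9, 0.8, 0.3, 0.2]
example_outcomes = [1, 0, 1, 0]
example_brier = brier_score(example_probs, example_outcomes)
example_net_benefit = net_benefit(example_probs, example_outcomes, threshold=0.5)
```

For a prevalence p the null-model Brier score reduces to p(1 - p), so 319 of 450 survivors gives roughly 0.21, which is consistent with the null-model value reported in the Results. A model adds information only to the extent that its Brier score falls below that baseline.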

Results: Validation of the SORG algorithm in a primarily Italian population achieved a c-statistic of 0.86 (95% confidence interval 0.82 to 0.89), suggesting good-to-excellent discrimination. The calibration plot showed good agreement between the predicted probability and observed survival in the probability thresholds of 0.8 to 1.0. With predicted survival probabilities lower than 0.8, however, the SORG algorithm underestimated the observed proportion of patients with 5-year survival, reflected in the overall calibration intercept of 0.82 (95% CI 0.67 to 0.98) and calibration slope of 0.68 (95% CI 0.42 to 0.95). The Brier score for 5-year survival was 0.15, compared with a null-model Brier of 0.21. The algorithm showed a favorable decision curve analysis in the validation cohort.
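
As a rough check on the Brier scores reported above, the null-model Brier score can be recomputed from the prevalence, because a model that predicts the prevalence p for every patient has a Brier score of p(1 - p). The sketch below assumes the null model was evaluated on the 450 patients with known 5-year status (319 survivors); the abstract does not state this explicitly, and the function names are illustrative.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def null_model_brier(prevalence):
    """Brier score of a model that predicts the prevalence for every patient."""
    return prevalence * (1.0 - prevalence)


# Assumption: 319 of 450 patients survived 5 years (figures from the Methods).
survivors, total = 319, 450
prevalence = survivors / total
null_brier = null_model_brier(prevalence)

# The same value obtained the long way, with a constant prediction per patient.
outcomes = [1] * survivors + [0] * (total - survivors)
constant_model = brier_score([prevalence] * total, outcomes)

print(round(null_brier, 2))  # 0.21, consistent with the reported null-model score
```

A model Brier score of 0.15 below this null value of about 0.21 is what the abstract interprets as good overall performance.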

Conclusions: The SORG algorithm to predict 5-year survival for patients with chondrosarcoma retained good discriminative ability and overall performance on international external validation. However, it underestimated 5-year survival for patients with predicted probabilities from 0 to 0.8, where the calibration plot was not aligned with the observed outcomes; the maximum underestimation was 20%. This miscalibration may reflect the baseline differences noted between the two study populations. The overall performance of the algorithm supports the utility of the algorithm and of the validation presented here. The digital application for the algorithm is freely available here: https://sorg-apps.shinyapps.io/extremitymetssurvival/.

Level Of Evidence: Level III, prognostic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000001305
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7491905
October 2020

Does the SORG algorithm generalize to a contemporary cohort of patients with spinal metastases on external validation?

Spine J 2020 10 16;20(10):1646-1652. Epub 2020 May 16.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.

Background Context: The SORG machine-learning algorithms were previously developed for preoperative prediction of overall survival in spinal metastatic disease. On sub-group analysis of a previous external validation, these algorithms were found to have diminished performance on patients treated after 2010.

Purpose: The purpose of this study was to assess the performance of these algorithms on a large contemporary cohort of consecutive spinal metastatic disease patients.

Study Design/setting: Retrospective study performed at a tertiary care referral center.

Patient Sample: Patients of 18 years and older treated with surgery for metastatic spinal disease between 2014 and 2016.

Outcome Measures: Ninety-day and one-year mortality.

Methods: Baseline patient and tumor characteristics of the validation cohort were compared to the development cohort using bivariate logistic regression. Performance of the SORG algorithms on external validation in the contemporary cohort was assessed with discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score compared to the null-model Brier score), and decision curve analysis.
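
The c-statistic used here for discrimination can be sketched in a few lines of Python. This is a minimal illustration assuming the usual pairwise definition for a binary outcome (the share of event/non-event pairs in which the patient with the event received the higher predicted probability, ties counting one half). The function name and toy data are hypothetical.

```python
def c_statistic(predicted, observed):
    """Concordance statistic (area under the ROC curve) for a binary outcome."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: predicted mortality risk and observed death (1 = died).
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
observed = [1, 1, 0, 1, 0, 0]
print(round(c_statistic(predicted, observed), 3))  # 8 of 9 pairs concordant: 0.889
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.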

Results: Overall, 200 patients were included with 90-day and 1-year mortality rates of 55 (27.6%) and 124 (62.9%), respectively. The contemporary external validation cohort and the developmental cohort differed significantly on primary tumor histology, presence of visceral metastases, American Spinal Injury Association impairment scale, and preoperative laboratory values. The SORG algorithms for 90-day and 1-year mortality retained good discriminative ability (c-statistic of 0.81 [95% confidence interval [CI], 0.74-0.87] and 0.84 [95% CI, 0.77-0.89]), overall performance, and decision curve analysis. The algorithm for 90-day mortality showed almost perfect calibration reflected in an overall calibration intercept of -0.07 (95% CI: -0.50, 0.35). The 1-year mortality algorithm underestimated mortality mainly for the lowest predicted probabilities with an overall intercept of 0.57 (95% CI: 0.18, 0.96).

Conclusions: The SORG algorithms for survival in spinal metastatic disease generalized well to a contemporary cohort of consecutively treated patients from an external institution. Further validation in international cohorts and large, prospective multi-institutional trials is required to confirm or refute the findings presented here. The open-access algorithms are available here: https://sorg-apps.shinyapps.io/spinemetssurvival/.

Source
http://dx.doi.org/10.1016/j.spinee.2020.05.003
October 2020

Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery.

Spine J 2020 Apr 12. Epub 2020 Apr 12.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA. Electronic address:

Background: Intraoperative vascular injury (VI) may be an unavoidable complication of anterior lumbar spine surgery; however, vascular injury has implications for quality and safety reporting as this intraoperative complication may result in serious bleeding, thrombosis, and postoperative stricture.

Purpose: The purpose of this study was to (1) develop machine learning algorithms for preoperative prediction of VI and (2) develop natural language processing (NLP) algorithms for automated surveillance of intraoperative VI from free-text operative notes.

Patient Sample: Adult patients, 18 years of age or older, undergoing anterior lumbar spine surgery at two academic and three community medical centers were included in this analysis.

Outcome Measures: The primary outcome was unintended VI during anterior lumbar spine surgery.

Methods: Manual review of free-text operative notes was used to identify patients who had unintended VI. The available population was split into training and testing cohorts. Five machine learning algorithms were developed for preoperative prediction of VI. An NLP algorithm was trained for automated detection of intraoperative VI from free-text operative notes. Performance of the NLP algorithm was compared to current procedural terminology and international classification of diseases codes.

Results: In all, 1035 patients underwent anterior lumbar spine surgery and the rate of intraoperative VI was 7.2% (n=75). Variables used for preoperative prediction of VI were age, male sex, body mass index, diabetes, L4-L5 exposure, and surgery for infection (discitis, osteomyelitis). The best performing machine learning algorithm achieved a c-statistic of 0.73 for preoperative prediction of VI (https://sorg-apps.shinyapps.io/lumbar_vascular_injury/). For automated detection of intraoperative VI from free-text notes, the NLP algorithm achieved a c-statistic of 0.92. The NLP algorithm identified 18 of the 21 patients (sensitivity 0.86) who had a VI, whereas current procedural terminology and international classification of diseases codes identified 6 of the 21 patients (sensitivity 0.29). At this threshold, the NLP algorithm had a specificity of 0.93, negative predictive value of 0.99, positive predictive value of 0.51, and F1-score of 0.64.
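
The sensitivities and the F1-score reported above follow directly from the stated counts. The short Python sketch below reproduces the arithmetic, assuming the F1-score is the harmonic mean of positive predictive value (precision) and sensitivity (recall); the function names are illustrative.

```python
def sensitivity(detected, total_with_outcome):
    """Share of patients with the outcome that the method identified."""
    return detected / total_with_outcome


def f1_score(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2.0 * precision * recall / (precision + recall)


nlp_sensitivity = sensitivity(18, 21)    # NLP algorithm: 18 of 21 injuries found
codes_sensitivity = sensitivity(6, 21)   # CPT/ICD codes: 6 of 21 injuries found
nlp_f1 = f1_score(0.51, nlp_sensitivity)  # 0.51 is the reported PPV

print(round(nlp_sensitivity, 2), round(codes_sensitivity, 2), round(nlp_f1, 2))
# 0.86 0.29 0.64
```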

Conclusion: Relying on administrative procedural and diagnosis codes may underestimate the rate of unintended intraoperative VI in anterior lumbar spine surgery. External and prospective validation of the algorithms presented here may improve quality and safety reporting.

Source
http://dx.doi.org/10.1016/j.spinee.2020.04.001
April 2020

Surgical Strategies for Chordoma.

Neurosurg Clin N Am 2020 Apr 16;31(2):251-261. Epub 2020 Jan 16.

Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Yawkey Building, Room 3.922, 55 Fruit Street, Boston, MA 02114, USA.

Chordomas are rare tumors of the axial skeleton whose slow growth belies a relentless tumor with a propensity for recurrence and late metastasis. Local control remains an issue with chordoma in spite of aggressive operative management. High local failure rates have led to the exploration of alternative methods of treatment. Radiation continues to gain acceptance as an adjuvant to surgery and, in some cases, as a standalone treatment. However, the use of radiation remains controversial, and operative management remains the standard of care in spite of relatively high morbidity.

Source
http://dx.doi.org/10.1016/j.nec.2019.11.007
April 2020

Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy?

Spine J 2020 10 4;20(10):1602-1609. Epub 2020 Mar 4.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA. Electronic address:

Background: Surgical site infections are a major driver of morbidity and increased costs in the postoperative period after spine surgery. Current tools for surveillance of these adverse events rely on prospective clinical tracking, manual retrospective chart review, or administrative procedural and diagnosis codes.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated reporting of postoperative wound infection requiring reoperation after lumbar discectomy.

Patient Sample: Adult patients undergoing discectomy at two academic and three community medical centers between January 1, 2000 and July 31, 2019 for lumbar disc herniation.

Outcome Measures: Reoperation for wound infection within 90 days after surgery.

Methods: Free-text notes of patients who underwent surgery from January 1, 2000 to December 31, 2015 were used for algorithm training. Free-text notes of patients who underwent surgery after January 1, 2016 were used for algorithm testing. Manual chart review was used to label which patients had reoperation for wound infection. An extreme gradient-boosting NLP algorithm was developed to detect reoperation for postoperative wound infection.
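
The temporal split described in these methods (training on earlier surgeries, testing on later ones) can be sketched as follows. This is an illustration only: the record layout and field names are hypothetical, and the sketch assumes that surgeries on the cutoff date itself belong to the testing set, which the abstract does not spell out.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records into training (before cutoff) and testing (on or after cutoff)."""
    training = [r for r in records if r["surgery_date"] < cutoff]
    testing = [r for r in records if r["surgery_date"] >= cutoff]
    return training, testing


# Hypothetical records; real notes would carry free text and a reviewed label.
records = [
    {"patient_id": 1, "surgery_date": date(2015, 12, 31)},
    {"patient_id": 2, "surgery_date": date(2016, 1, 1)},
    {"patient_id": 3, "surgery_date": date(2004, 6, 15)},
]

training, testing = temporal_split(records, cutoff=date(2016, 1, 1))
print([r["patient_id"] for r in training], [r["patient_id"] for r in testing])
```

Splitting by date rather than at random keeps every testing note later than every training note, which is what makes the evaluation a temporal validation.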

Results: Overall, 5,860 patients were included in this study and 62 (1.1%) had a reoperation for wound infection. In patients who underwent surgery after January 1, 2016 (n=1,377), the NLP algorithm detected 15 of the 16 patients (sensitivity=0.94) who had a reoperation for infection. In comparison, current procedural terminology and international classification of disease codes detected 12 of these 16 patients (sensitivity=0.75). At a threshold of 0.05, the NLP algorithm had a positive predictive value of 0.83 and an F1-score of 0.88.

Conclusion: Temporal validation of the algorithm developed in this study demonstrates a proof-of-concept application of NLP for automated reporting of adverse events after spine surgery. Adapting this methodology for other procedures and outcomes in spine and orthopedics has the potential to dramatically improve and automate quality and safety reporting.

Source
http://dx.doi.org/10.1016/j.spinee.2020.02.021
October 2020

Natural language processing for automated detection of incidental durotomy.

Spine J 2020 05 23;20(5):695-700. Epub 2019 Dec 23.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114, USA. Electronic address:

Background: Incidental durotomy is a common intraoperative complication during spine surgery with potential implications for postoperative recovery, patient-reported outcomes, length of stay, and costs. To our knowledge, there are no processes available for automated surveillance of incidental durotomy.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated detection of incidental durotomies in free-text operative notes of patients undergoing lumbar spine surgery.

Patient Sample: Adult patients 18 years or older undergoing lumbar spine surgery between January 1, 2000 and June 31, 2018 at two academic and three community medical centers.

Outcome Measures: The primary outcome was defined as intraoperative durotomy recorded in free-text operative notes.

Methods: An 80:20 stratified split was undertaken to create training and testing populations. An extreme gradient-boosting NLP algorithm was developed to detect incidental durotomy. Discrimination was assessed via area under receiver-operating curve (AUC-ROC), precision-recall curve, and Brier score. Performance of this algorithm was compared with current procedural terminology (CPT) and international classification of diseases (ICD) codes for durotomy.
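
An 80:20 stratified split of the kind described above can be sketched in plain Python. The sketch assumes stratification on the binary durotomy label only; the function name, the seed, and the rounding rule are illustrative choices, not the authors' procedure.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each class separately
    so that both sets keep roughly the same outcome rate."""
    rng = random.Random(seed)
    train_indices, test_indices = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Labels shaped like the cohort in the Results: 93 durotomies among 1,000 patients.
labels = [1] * 93 + [0] * 907
train_indices, test_indices = stratified_split(labels)
test_positives = sum(labels[i] for i in test_indices)
print(len(train_indices), len(test_indices), test_positives)
```

With this rounding rule the sketch places 19 of the 93 durotomies in a 200-patient testing set, whereas the study reports 18, so the exact allocation evidently depends on implementation details that the abstract does not give.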

Results: Overall, 1,000 patients were included in the study and 93 (9.3%) had a recorded incidental durotomy in the free-text operative report. In the independent testing set (n=200) not used for model development, the NLP algorithm achieved AUC-ROC of 0.99 for detection of durotomy. In comparison, the CPT/ICD codes had AUC-ROC of 0.64. In the testing set, the NLP algorithm detected 16 of 18 patients with incidental durotomy (sensitivity 0.89) whereas the CPT and ICD codes detected 5 of 18 (sensitivity 0.28). At a threshold of 0.05, the NLP algorithm had specificity of 0.99, positive predictive value of 0.89, and negative predictive value of 0.99.
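
The reported testing-set metrics can be cross-checked by reconstructing the confusion matrix. The sketch below takes the stated counts (200 patients, 18 durotomies, 16 detected) and infers the number of false positives from the rounded positive predictive value of 0.89. That inferred count is an assumption for illustration and is not reported in the abstract.

```python
n_test = 200
with_durotomy = 18
true_pos = 16
false_neg = with_durotomy - true_pos        # durotomies the algorithm missed
without_durotomy = n_test - with_durotomy   # patients with no durotomy

# Assumption: PPV = TP / (TP + FP) = 0.89, so TP + FP is about 16 / 0.89.
false_pos = round(true_pos / 0.89) - true_pos
true_neg = without_durotomy - false_pos

sensitivity = true_pos / with_durotomy
specificity = true_neg / without_durotomy
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

print(false_pos, true_neg)
print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2), round(npv, 2))
# 0.89 0.99 0.89 0.99, matching the values reported above
```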

Conclusions: Internal validation of the NLP algorithm developed in this study indicates promising results for future NLP applications in spine surgery. Pending external validation, the NLP algorithm developed in this study may be used by entities including national spine registries or hospital quality and safety departments to automate tracking of incidental durotomies.

Source
http://dx.doi.org/10.1016/j.spinee.2019.12.006
May 2020

Does the SORG Algorithm Predict 5-year Survival in Patients with Chondrosarcoma? An External Validation.

Clin Orthop Relat Res 2019 Oct;477(10):2296-2303

M. E. R. Bongers, Q. C. B. S. Thio, A.V. Karhade, M. L. Stor, K. A. Raskin, S. A. Lozano-Calderon, Department of Orthopaedic Surgery, Division of Orthopaedic Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA T. F. DeLaney, Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA, USA M. L. Ferrone, Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA J. H. Schwab, Department of Orthopaedic Surgery, Division of Orthopaedic Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Background: We developed a machine learning algorithm to predict the survival of patients with chondrosarcoma. The algorithm demonstrated excellent discrimination and calibration on internal validation in a derivation cohort based on data from the Surveillance, Epidemiology, and End Results (SEER) registry. However, the algorithm has not been validated in an independent external dataset.

Questions/purposes: Does the Skeletal Oncology Research Group (SORG) algorithm accurately predict 5-year survival in an independent patient population surgically treated for chondrosarcoma?

Methods: The SORG algorithm was developed using the SEER registry, which contains demographic data, tumor characteristics, treatment, and outcome values and includes approximately 30% of the cancer patients in the United States. The SEER registry was ideal for creating the derivation cohort, and consequently the SORG algorithm, because of the high number of eligible patients and the availability of most (explanatory) variables of interest. Between 1992 and 2013, 326 patients were treated surgically for extracranial chondrosarcoma of the bone at two tertiary care referral centers. Of those, 179 were accounted for at a minimum of 5 years after diagnosis in a clinical note at one of the two institutions, unless they died earlier, and were included in the validation cohort. In all, 147 (45%) did not meet the minimum 5 years of followup at the institution and were not included in the validation of the SORG algorithm. The outcome (survival at 5 years) was checked for all 326 patients in the Social Security death index, and all 326 were included in the supplemental validation cohort to also ascertain validity for patients with less than 5 years of institutional followup. Variables used in the SORG algorithm to predict 5-year survival, including sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location, were collected manually from medical records. The tumor characteristics were collected from the postoperative musculoskeletal pathology report. Predicted probabilities of 5-year survival were calculated for each patient in the validation cohort using the SORG algorithm, followed by an assessment of performance using the same metrics as used for internal validation, namely discrimination, calibration, and overall performance.
Discrimination was calculated using the concordance statistic (or the area under the Receiver Operating Characteristic (ROC) curve) to determine how well the algorithm discriminates between outcomes; it ranges from 0.5 (no better than a coin-toss) to 1.0 (perfect discrimination). Calibration was assessed using the calibration slope and intercept from a calibration plot to measure the agreement between predicted and observed outcomes. A perfect calibration plot should show a 45° upwards line. Overall performance was determined using the Brier score, ranging from 0 (excellent prediction) to 1 (worst prediction). The Brier score was compared with the null-model Brier score, which showed the performance of a model that ignored all the covariates. A Brier score lower than the null-model Brier score indicated better performance of the algorithm. For the external validation an F1-score was added to measure the overall accuracy of the algorithm, which ranges between 0 (total failure of an algorithm) and 1 (perfect algorithm). The 5-year survival was lower in the validation cohort than it was in the derivation cohort from SEER (61.5% [110 of 179] versus 76% [1131 of 1544]; p < 0.001). This difference was driven by a higher proportion of dedifferentiated chondrosarcoma in the institutional population than in the derivation cohort (27% [49 of 179] versus 9% [131 of 1544]; p < 0.001). Patients in the validation cohort also had larger tumor sizes, higher grades, and more nonextremity tumor locations than did those in the derivation cohort. These differences between the study groups emphasize that the external validation cohort differs from the derivation cohort not only in its patients but also in disease characteristics. Within the subpopulations of patients with conventional chondrosarcomas and those with dedifferentiated chondrosarcomas, 5-year survival did not differ between the two patient groups.
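
The calibration slope and intercept described above can be estimated by regressing the observed outcome on the logit of the predicted probability. The Python sketch below fits that two-parameter logistic model with Newton-Raphson. It assumes slope and intercept are estimated jointly; the abstract does not state the estimation procedure, and some authors estimate the intercept with the slope fixed at 1. The function names and toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(predicted, observed, iterations=25):
    """Fit observed ~ intercept + slope * logit(predicted) by Newton-Raphson."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient with respect to the intercept
            g1 += (yi - mu) * xi     # gradient with respect to the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical toy cohort: two risk groups of 10 patients each.
predicted = [0.2] * 10 + [0.8] * 10
well_calibrated = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2   # 20% and 80% observed
too_extreme = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4       # 40% and 60% observed

print(calibration_intercept_slope(predicted, well_calibrated))
print(calibration_intercept_slope(predicted, too_extreme))
```

In the second toy cohort the predictions are more extreme than the observed rates, so the fitted slope falls well below 1, which is the pattern a calibration slope is meant to flag.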

Results: The concordance statistic for the validation cohort was 0.87 (95% CI, 0.80-0.91). Evaluation of the algorithm's calibration in the institutional population resulted in a calibration slope of 0.97 (95% CI, 0.68-1.3) and calibration intercept of -0.58 (95% CI, -0.97 to -0.20). Finally, on overall performance, the algorithm had a Brier score of 0.152 compared with a null-model Brier score of 0.237, indicating a high level of overall performance. The F1-score was 0.836. For the supplementary validation in the total of 326 patients, the SORG algorithm had a concordance statistic of 0.89 (95% CI, 0.85-0.93). The calibration slope was 1.13 (95% CI, 0.87-1.39) and the calibration intercept was -0.26 (95% CI, -0.57 to 0.06). The Brier score was 0.11, with a null-model Brier score of 0.19. The F1-score was 0.901.

Conclusions: On external validation, the SORG algorithm retained good discriminative ability and overall performance but overestimated 5-year survival in patients surgically treated for chondrosarcoma. This internet-based tool can help guide patient counseling and shared decision making.

Level Of Evidence: Level III, prognostic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000000748
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6999936
October 2019