Publications by authors named "Aditya V Karhade"

75 Publications

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review.

Acta Orthop 2021 Apr 18:1-9. Epub 2021 Apr 18.

Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA;

Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines.

Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting.

Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies.

Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.

Source
http://dx.doi.org/10.1080/17453674.2021.1910448
April 2021

Value-based health care in spine: Where do we go from here?

Spine J 2021 Apr 12. Epub 2021 Apr 12.

Institute for Strategy and Competitiveness, Harvard Business School, Boston, MA.

Source
http://dx.doi.org/10.1016/j.spinee.2021.04.006
April 2021

Updated external validation of the SORG machine learning algorithms for prediction of ninety-day and one-year mortality after surgery for spinal metastasis.

Spine J 2021 Mar 31. Epub 2021 Mar 31.

Department of Orthopaedic Surgery, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.

Background Context: Surgical decompression and stabilization in the setting of spinal metastasis is performed to relieve pain and preserve functional status. These potential benefits must be weighed against the risks of perioperative morbidity and mortality. Accurate prediction of a patient's postoperative survival is a crucial component of patient counseling.

Purpose: To externally validate the SORG machine learning algorithms for prediction of 90-day and 1-year mortality after surgery for spinal metastasis.

Study Design/setting: Retrospective cohort study.

Patient Sample: Patients 18 years or older at a tertiary care medical center treated surgically for spinal metastasis.

Outcome Measures: Mortality within 90 days of surgery, mortality within 1 year of surgery.

Methods: This is a retrospective cohort study of 298 adult patients at a tertiary care medical center treated surgically for spinal metastasis between 2004 and 2020. Baseline characteristics of the validation cohort were compared to the derivation cohort for the SORG algorithms. The following metrics were used to assess the performance of the algorithms: discrimination, calibration, overall model performance, and decision curve analysis.

Results: Sixty-one patients died within 90 days of surgery and 133 died within 1 year of surgery. The validation cohort differed significantly from the derivation cohort. The SORG algorithms for 90-day mortality and 1-year mortality performed excellently with respect to discrimination; the algorithm for 1-year mortality was well-calibrated. At both postoperative time points, the SORG algorithms showed greater net benefit than the default strategies of changing management for no patients or for all patients.
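The net-benefit comparison in these results can be illustrated with a minimal Python sketch. It assumes the usual decision-curve definition, net benefit = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability. The function names, example labels, and predicted probabilities are hypothetical and are not taken from the study; only the 61 of 298 ninety-day deaths come from the abstract.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the default strategy of changing management for all patients."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# The default strategy of changing management for no patients has net benefit 0.
NET_BENEFIT_TREAT_NONE = 0.0

# Hypothetical toy cohort (illustration only, not study data).
toy_outcomes = [1, 1, 0, 0]
toy_predictions = [0.9, 0.4, 0.6, 0.1]
toy_model_nb = net_benefit(toy_outcomes, toy_predictions, threshold=0.2)
toy_all_nb = net_benefit_treat_all(prevalence=0.5, threshold=0.2)

# Treat-all reference using the reported 90-day mortality (61 of 298)
# at an assumed, purely illustrative threshold of 0.2.
cohort_all_nb = net_benefit_treat_all(prevalence=61 / 298, threshold=0.2)
```

A model is considered useful at a given threshold when its net benefit exceeds both default strategies, which is the pattern the abstract reports for both time points.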

Conclusions: With an independent, contemporary, and geographically distinct population, we report successful external validation of SORG algorithms for preoperative risk prediction of 90-day and 1-year mortality after surgery for spinal metastasis. By providing accurate prediction of intermediate and long-term mortality risk, these externally validated algorithms may inform shared decision-making with patients in determining management of spinal metastatic disease.

Source
http://dx.doi.org/10.1016/j.spinee.2021.03.026
March 2021

Introduction to the special issue of The Spine Journal on artificial intelligence and machine learning.

Spine J 2021 Mar 27. Epub 2021 Mar 27.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

In the last 5 years, artificial intelligence (AI) algorithms have made rapid advances for diagnosis and prognosis in fields ranging from dermatology to anesthesiology. How do we make sense of the rise of AI in healthcare and specifically in spine? How much of what we see today is "hype" and what will remain when the dust settles? In this special issue, several reviews and original articles help us understand the state of AI in healthcare today, the avenues for future progress, and the implications for spine care. Continued engagement, skepticism, and collaboration with technical experts will allow for the development of AI systems that complement and expand our abilities to diagnose, predict, and operate.

Source
http://dx.doi.org/10.1016/j.spinee.2021.03.028
March 2021

Machine learning prediction models in orthopedic surgery: A systematic review in transparent reporting.

J Orthop Res 2021 Mar 18. Epub 2021 Mar 18.

Orthopedic Oncology Service, Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Machine learning (ML) studies are becoming increasingly popular in orthopedics but lack a critical appraisal of their adherence to peer-reviewed guidelines. The objective of this review was to (1) evaluate quality and transparent reporting of ML prediction models in orthopedic surgery based on the transparent reporting of multivariable prediction models for individual prognosis or diagnosis (TRIPOD), and (2) assess risk of bias with the Prediction model Risk Of Bias ASsessment Tool. A systematic review was performed to identify all ML prediction studies published in orthopedic surgery through June 18th, 2020. After screening 7138 studies, 59 studies met the study criteria and were included. Two reviewers independently extracted data and discrepancies were resolved by discussion with at least two additional reviewers present. Across all studies, the overall median completeness for the TRIPOD checklist was 53% (interquartile range 47%-60%). The overall risk of bias was low in 44% (n = 26), high in 41% (n = 24), and unclear in 15% (n = 9). High overall risk of bias was driven by incomplete reporting of performance measures, inadequate handling of missing data, and use of small datasets with inadequate outcome numbers. Although the number of ML studies in orthopedic surgery is increasing rapidly, over 40% of the existing models are at high risk of bias. Furthermore, over half incompletely reported their methods and/or performance measures. Until these issues are adequately addressed to give patients and providers trust in ML models, a considerable gap remains between the development of ML prediction models and their implementation in orthopedic practice.

Source
http://dx.doi.org/10.1002/jor.25036
March 2021

Erratum to: Development and Internal Validation of Machine Learning Algorithms for Preoperative Survival Prediction of Extremity Metastatic Disease.

Clin Orthop Relat Res 2021 04;479(4):862

Q. C. B. S. Thio, A. V. Karhade, B. Bindels, P. T. Ogink, S. A. Lozano Calderón, K. A. Raskin, J. H. Schwab, Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Source
http://dx.doi.org/10.1097/CORR.0000000000001678
April 2021

Development of machine learning model algorithm for prediction of 5-year soft tissue myxoid liposarcoma survival.

J Surg Oncol 2021 Mar 8. Epub 2021 Mar 8.

Department of Orthopedic Surgery, Musculoskeletal Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.

Background: Predicting survival in myxoid liposarcoma (MLS) patients is very challenging given its propensity to metastasize and the controversial role of adjuvant therapy. The purpose of this study was to develop a machine-learning algorithm for the prediction of survival at five years for patients with MLS and externally validate it using our institutional cohort.

Methods: Two databases, the surveillance, epidemiology, and end results program (SEER) database and an institutional database, were used in this study. Five machine learning models were created based on the SEER database and performance was rated using the TRIPOD criteria. The model that performed best on the SEER data was again tested on our institutional database.

Results: The elastic-net penalized logistic regression model was the best according to our performance indicators. This model had an area under the curve (AUC) of 0.85 on the SEER testing data and an AUC of 0.76 when tested against the institutional database. An application to use this calculator is available at https://sorg-apps.shinyapps.io/myxoid_liposarcoma/.

Conclusion: MLS is a soft-tissue sarcoma with adjunct treatment options that are, in part, decided by prognostic survival. We developed the first machine-learning predictive algorithm specifically for MLS using the SEER registry that retained performance during external validation with institutional data.

Source
http://dx.doi.org/10.1002/jso.26398
March 2021

International external validation of the SORG machine learning algorithms for predicting 90-day and 1-year survival of patients with spine metastases using a Taiwanese cohort.

Spine J 2021 Feb 2. Epub 2021 Feb 2.

Department of Orthopedics, National Taiwan University College of Medicine and National Taiwan University Hospital, Taipei, Taiwan.

Background Context: Accurately predicting the survival of patients with spinal metastases is important for guiding surgical intervention. The SORG machine-learning (ML) algorithm for the 90-day and 1-year mortality of patients with metastatic cancer to the spine has been multiply validated, with a high degree of accuracy in both internal and external validation studies. However, prior external validations were conducted using patient groups located on the east coast of the United States, representing a generally homogeneous population. The aim of this study was to externally validate the SORG algorithms with a Taiwanese population.

Study Design/setting: Retrospective study at a single tertiary care center in Taiwan.

Patient Sample: Four hundred and twenty-seven patients who underwent surgery for metastatic spine disease from November 1, 2010 to December 31, 2018.

Outcome Measures: 90-day and 1-year mortality.

Methods: The baseline characteristics of our validation cohort were compared with those of the previously published developmental and external validation cohorts. Discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithms in this cohort.
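Two of the performance measures named in these methods, the c-statistic and the Brier score, can be sketched in a few lines of Python. This is an illustrative sketch, not the study's analysis code: the function names and example values are hypothetical, and calibration intercept and slope are omitted because they require fitting a logistic regression.

```python
def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                score += 1.0
            elif p_event == p_non_event:
                score += 0.5
    return score / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = died) and predicted risks, for illustration only.
example_outcomes = [0, 0, 1, 1]
example_risks = [0.1, 0.4, 0.35, 0.8]
example_c = c_statistic(example_outcomes, example_risks)
example_brier = brier_score(example_outcomes, example_risks)
```

A c-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking; a lower Brier score indicates better overall performance.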

Results: Ninety-day and 1-year mortality rates were 110 of 427 (26%) and 256 of 427 (60%), respectively. The external validation cohort and the developmental cohort differed in body mass index (BMI), preoperative performance status, American Spinal Injury Association impairment scale, primary tumor histology and in several laboratory measurements. The SORG ML algorithm for 90-day and 1-year mortality demonstrated a high level of discriminative ability (c-statistics of 0.73 [95% confidence interval [CI], 0.67-0.78] and 0.74 [95% CI, 0.69-0.79]), overall performance, and had a positive net benefit throughout the range of threshold probabilities in decision curve analysis. The algorithm for 1-year mortality had a calibration intercept of 0.08, representing a good calibration. However, the 90-day mortality algorithm underestimated mortality for the lowest predicted probabilities, with an overall intercept of 0.81.

Conclusions: The SORG algorithms for predicting 90-day and 1-year mortality in patients with spinal metastatic disease generally performed well on international external validation in a predominately Taiwanese population. However, 90-day mortality was underestimated in this group. Whether this inconsistency was due to different primary tumor characteristics, body mass index, selection bias or other factors remains unclear, and may be better understood with further validative works that utilize international and/or diverse populations.

Source
http://dx.doi.org/10.1016/j.spinee.2021.01.027
February 2021

Systematic Review of Sleep Quality Before and After Arthroscopic Rotator Cuff Repair: Are Improvements Experienced and Maintained?

Orthop J Sports Med 2020 Dec 29;8(12):2325967120969224. Epub 2020 Dec 29.

Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois, USA.

Background: Poor sleep quality is prevalent among patients with rotator cuff tears (RCTs) and negatively influences the potential for healing and quality of life. However, there is a paucity of literature describing the magnitude and timing of changes in sleep quality after arthroscopic rotator cuff repair (RCR).

Purpose: (1) To evaluate the prevalence of poor sleep quality in patients undergoing arthroscopic RCR and (2) to determine the timing and magnitude of changes in sleep quality after RCR.

Study Design: Systematic review; Level of evidence, 4.

Methods: PubMed, OVID/Medline, and Cochrane databases were queried in January 2020 for literature investigating the prevalence of poor sleep quality in patients with RCTs or changes in sleep quality after arthroscopic RCR. Data pertaining to study characteristics, risk of bias, sleep quality assessments, and clinical outcomes were extracted. A qualitative analysis of the prevalence of poor sleep quality and changes in sleep quality was performed.

Results: A total of 8 studies (1034 patients) were included. The mean Pittsburgh Sleep Quality Index (PSQI) ranged from 5.2 to 15.0 preoperatively among all studies, while the frequency of patients experiencing poor sleep quality ranged from 40.8% to 89.0% in 4 studies. Four studies reported the mean PSQI at a minimum of 6 months postoperatively, which ranged from 4.2 to 7.1. Four studies did not report the PSQI score or the proportion of patients who experienced poor postoperative sleep quality. One study evaluated the PSQI at 12 months postoperatively, which decreased to 4.2 from 5.8 at 6 months. One study evaluated the PSQI at 24 months postoperatively, which decreased to 5.5 from 6.2 at 6 months.

Conclusion: Patients with RCTs have a high prevalence of poor sleep quality. Consistent improvements in sleep quality are observed in the 6 months after arthroscopic RCR, but there is limited evidence based on the available data to characterize changes in sleep quality beyond this time. More evidence is needed to characterize changes in sleep quality beyond 6 months and how these changes are perceived by this patient population.

Source
http://dx.doi.org/10.1177/2325967120969224
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7780319
December 2020

Postoperative adverse events secondary to iatrogenic vascular injury during anterior lumbar spinal surgery.

Spine J 2020 Nov 3. Epub 2020 Nov 3.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, USA.

Background: Anterior lumbar spine surgery (ALSS) requires mobilization of the great vessels, resulting in a high risk of iatrogenic vascular injury (VI). It remains unclear whether VI is associated with increased risk of postoperative complications and other related adverse outcomes.

Purpose: The purpose of this study was to (1) assess the incidence of postoperative complications attributable to VI during ALSS, and (2) evaluate outcomes secondary to VI such as procedural blood loss, transfusion of blood products, length of stay (LOS), and in-hospital mortality.

Study Design: Retrospective propensity-score matched, case-control study at 2 academic and 3 community medical centers.

Patient Sample: Patients 18 years of age or older, undergoing ALSS between January 1st, 2000 and July 31st, 2019 were included in this analysis.

Outcome Measures: The primary outcome was the incidence of postoperative complications attributable to VI, such as venous thromboembolism, compartment syndrome, transfusion reaction, limb ischemia, and reoperations. The secondary outcomes included estimated operative blood loss (milliliter), transfused blood products, LOS (days), and in-hospital mortality.

Methods: In total, 1,035 patients were identified, of which 75 (7.2%) had a VI. For comparative analyses, the 75 VI patients were paired with 75 comparable non-VI patients by propensity-score matching. The adequacy of the matching was assessed by testing the standardized mean differences (SMD) between VI and non-VI group (>0.25 SMD).
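The balance check described in these methods can be sketched as follows. This is a minimal illustration assuming the common definition of the standardized mean difference for a continuous covariate (difference in means divided by the pooled standard deviation). The function names and example ages are hypothetical, and treating an absolute SMD above 0.25 as a sign of imbalance is this sketch's reading of the cut-off mentioned in the abstract.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous covariate: mean difference over pooled SD."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(group_a) - mean(group_b)) / pooled_sd


def is_imbalanced(group_a, group_b, cutoff=0.25):
    """Flag a covariate whose absolute SMD exceeds the cut-off (assumed 0.25)."""
    return abs(standardized_mean_difference(group_a, group_b)) > cutoff


# Hypothetical ages for matched VI and non-VI patients (illustration only).
vi_ages = [52, 60, 47, 66, 58]
non_vi_ages = [53, 59, 48, 65, 57]
age_smd = standardized_mean_difference(vi_ages, non_vi_ages)
```

In practice the check would be repeated for every covariate used in the propensity-score model, with categorical covariates handled through proportions instead of means.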

Results: Two patients (2.7%) had VI-related postoperative complications in the studied period, which consisted of two deep venous thromboembolisms (DVTs) occurring on day 3 and 7 postoperatively. Both DVTs were located in the distal left common iliac vein (CIV). The VIs these patients suffered were to the distal inferior vena cava and the left CIV, respectively. Neither patient developed additional complications as a consequence of the DVT; however, both required systemic anticoagulation and placement of an inferior vena cava filter. There was no statistical difference with the non-VI group where no instances (0%) of postoperative complications were reported (p=.157). No differences were found in LOS or in-hospital mortality between the two groups (p=.157 and p=.999, respectively). Intraoperative blood loss and blood transfusion were both found to be higher in the VI group in comparison to the non-VI group (650 mL, interquartile range [IQR] 300-1400 vs. 150 mL, IQR 50-425, p≤.001; 0 units, IQR 0-3 vs. 0 units, IQR 0-1, p=.012, respectively).

Conclusion: This study found a low number of serious postoperative complications related to VI in ALSS. In addition, these complications were not significantly different between the VI and matched non-VI ALSS cohorts. Although not significant, the observed DVT incidence of 2.7% after VI in ALSS warrants vigilance and preventive measures during the postoperative course of these patients.

Source
http://dx.doi.org/10.1016/j.spinee.2020.10.031
November 2020

Development of prediction models for clinically meaningful improvement in PROMIS scores after lumbar decompression.

Spine J 2021 Mar 31;21(3):397-404. Epub 2020 Oct 31.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA.

Background: The ability to preoperatively predict which patients will achieve a minimal clinically important difference (MCID) after lumbar spine decompression surgery can help determine the appropriateness and timing of surgery. Patient-Reported Outcome Measurement Information System (PROMIS) scores are an increasingly popular outcome instrument.

Purpose: The purpose of this study was to develop algorithms predictive of achieving MCID after primary lumbar decompression surgery.

Patient Sample: This was a retrospective study at two academic medical centers and three community medical centers including adult patients 18 years or older undergoing one or two level posterior decompression for lumbar disc herniation or lumbar spinal stenosis between January 1, 2016 and April 1, 2019.

Outcome Measures: The primary outcome, MCID, was defined using distribution-based methods as one half the standard deviation of postoperative patient-reported outcomes (PROMIS physical function, pain interference, pain intensity).
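The distribution-based definition above can be sketched in Python. This is an illustration under stated assumptions, not the study's code: the function names and scores are hypothetical, the sample standard deviation is assumed (the abstract does not say which estimator was used), and the direction of improvement is passed explicitly because higher scores are better for physical function while lower scores are better for the pain instruments.

```python
from statistics import stdev


def mcid_threshold(postoperative_scores):
    """Distribution-based MCID: one half of the SD of postoperative scores."""
    return 0.5 * stdev(postoperative_scores)


def achieved_mcid(pre, post, threshold, higher_is_better=True):
    """True when the change in the improving direction meets the threshold."""
    change = post - pre if higher_is_better else pre - post
    return change >= threshold


# Hypothetical postoperative physical function scores (illustration only).
example_post_scores = [40, 50, 60]
example_threshold = mcid_threshold(example_post_scores)

# Physical function: higher is better, so a rise of 6 points meets the threshold.
improved_function = achieved_mcid(38, 44, example_threshold)
# Pain interference: lower is better, so a drop of 3 points does not.
improved_pain = achieved_mcid(60, 57, example_threshold, higher_is_better=False)
```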

Methods: Five machine learning algorithms were developed to predict MCID on these surveys and assessed by discrimination, calibration, Brier score, and decision curve analysis. The final model was incorporated into an open access digital application.

Results: Overall, 906 patients completed at least one PROMs survey in the 90 days before surgery and at least one PROMs survey in the year after surgery. Attainment of MCID during the study period by PROMIS instrument was 74.3% for physical function, 75.8% for pain interference, and 79.2% for pain intensity. Factors identified for preoperative prediction of MCID attainment on these outcomes included preoperative PROs, percent unemployment in neighborhood of residence, comorbidities, body mass index, private insurance, preoperative opioid use, surgery for disc herniation, and federal poverty level in neighborhood of residence. The discrimination (c-statistic) of the final algorithms for these outcomes was 0.79 for physical function, 0.74 for pain interference, and 0.69 for pain intensity with good calibration. The open access digital application for these algorithms can be found here: https://sorg-apps.shinyapps.io/promis_pld_mcid/

Conclusion: Lower preoperative PROMIS scores, fewer comorbidities, and certain sociodemographic factors increase the likelihood of achieving MCID for PROMIS after lumbar spine decompression.

Source
http://dx.doi.org/10.1016/j.spinee.2020.10.026
March 2021

Surgeon-level variance in achieving clinical improvement after lumbar decompression: the importance of adequate risk adjustment.

Spine J 2021 Mar 9;21(3):405-410. Epub 2020 Oct 9.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Department of Orthopaedic Surgery, Newton Wellesley Hospital, Newton, MA 02462, USA.

Background Context: Patient-Reported Outcome Measurement Information System (PROMIS) scores are increasingly utilized in clinical care. However, it is unclear if PROMIS can discriminate surgeon performance on an individual level.

Purpose: The purpose of this study was to examine surgeon-level variance in rates of achieving minimal clinically important difference (MCID) after lumbar decompression.

Patient Sample: This is a prospective, observational cohort study performed across a healthcare enterprise (two academic medical centers and three community centers). Patients 18 years or older undergoing one- to two-level primary decompression for lumbar disc herniation (LDH) or lumbar spinal stenosis (LSS) were included.

Outcome Measures: The primary outcome was achievement of MCID, using a distribution-based method, on paired PROMIS physical function scores.

Methods: Descriptive statistics were generated to examine the baseline characteristics of the study cohort. Bivariate analyses were used to examine the impact of surgeon-level variance on rates of MCID. Multivariable analyses were used to examine the risk-adjusted impact of surgeon-level variance on rates of MCID.

Results: Overall, 636 patients treated by nine surgeons were included. The median patient age was 58 [interquartile range (IQR): 46-70] and 62.3% (n=396) were female. Among all patients, 56.9% (n=362) underwent surgery for LDH. The overall rate of achieving MCID was 75.8% (n=482). Of the surgeons, the median years in practice were 12 (range 4-31) and 55.6% (n=5) were in academic practice settings. On bivariate analysis, patients treated by one of the surgeons had lower rates of achieving MCID (odds ratio=0.37, 95% confidence interval: 0.15-0.91, p=.03). However, on multivariable analysis adjusting for operative indication (LDH vs. LSS), body mass index, number of comorbidities, percent unemployment in patient zip code, and preoperative PROMIS physical function scores, all surgeons were equally likely to obtain MCID.

Conclusions: In this cohort, variance in PROMIS scores after primary lumbar decompression is influenced by patient-related factors and not by individual surgeon. Adequate risk adjustment is needed if ascertaining clinical improvement on an individual surgeon basis.

Level Of Evidence: 2.

Source
http://dx.doi.org/10.1016/j.spinee.2020.10.005
March 2021

Diagnostic Performance of Artificial Intelligence for Detection of Anterior Cruciate Ligament and Meniscus Tears: A Systematic Review.

Arthroscopy 2021 02 18;37(2):771-781. Epub 2020 Sep 18.

Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois, U.S.A.

Purpose: To (1) determine the diagnostic efficacy of artificial intelligence (AI) methods for detecting anterior cruciate ligament (ACL) and meniscus tears and to (2) compare the efficacy to human clinical experts.

Methods: PubMed, OVID/Medline, and Cochrane libraries were queried in November 2019 for research articles pertaining to AI use for detection of ACL and meniscus tears. Information regarding AI model, prediction accuracy/area under the curve (AUC), sample sizes of testing/training sets, and imaging modalities were recorded.

Results: A total of 11 AI studies were identified: 5 investigated ACL tears, 5 investigated meniscal tears, and 1 investigated both. The AUC of AI models for detecting ACL tears ranged from 0.895 to 0.980, and the prediction accuracy ranged from 86.7% to 100%. Of these studies, 3 compared AI models to clinical experts. Two found no significant differences in diagnostic capability, whereas one found that radiologists had a significantly greater sensitivity for detecting ACL tears (P = .002) and statistically similar specificity and accuracy. Of the 5 studies investigating the meniscus, the AUC for AI models ranged from 0.847 to 0.910 and prediction accuracy ranged from 75.0% to 90.0%. Of these studies, 2 compared AI models with clinical experts. One found no significant differences in diagnostic accuracy, whereas one found that the AI model had a significantly lower specificity (P = .003) and accuracy (P = .015) than radiologists. Two studies reported that the addition of AI models significantly increased the diagnostic performance of clinicians compared to their efforts without these models.

Conclusions: AI prediction capabilities were excellent and may enhance the diagnosis of ACL and meniscal pathology; however, AI did not outperform clinical experts.

Clinical Relevance: AI models promise to improve diagnosing certain pathologies as well as or better than human experts, are excellent for detecting ACL and meniscus tears, and may enhance the diagnostic capabilities of human experts; however, when compared with these experts, they may not offer any significant advantage.

Source
http://dx.doi.org/10.1016/j.arthro.2020.09.012
February 2021

Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports.

Acta Oncol 2020 Dec 12;59(12):1455-1460. Epub 2020 Sep 12.

Department of Orthopaedic Surgery, Orthopaedic Oncology Service, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The widespread use of electronic patient-generated health data has led to unprecedented opportunities for automated extraction of clinical features from free-text medical notes. However, processing this rich resource of data for clinical and research purposes depends on labor-intensive and potentially error-prone manual review. The aim of this study was to develop a natural language processing (NLP) algorithm for binary classification (single metastasis versus two or more metastases) in bone scintigraphy reports of patients undergoing surgery for bone metastases.

Material And Methods: Bone scintigraphy reports of patients undergoing surgery for bone metastases were labeled each by three independent reviewers using a binary classification (single metastasis versus two or more metastases) to establish a ground truth. A stratified 80:20 split was used to develop and test an extreme-gradient boosting supervised machine learning NLP algorithm.
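A stratified 80:20 split keeps the ratio of the two classes roughly the same in the development and test sets. The following standard-library Python sketch shows the idea. It is not the study's implementation: the function name, seed, rounding rule, and toy labels are assumptions, and a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with each class split separately."""
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    train, test = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        # Rounding per class is an assumption; libraries may allocate differently.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 1 = two or more metastases, 0 = single metastasis (hypothetical).
toy_labels = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(toy_labels)
```

With these toy labels, the test set holds 16 reports of class 1 and 4 of class 0, preserving the 80:20 class ratio of the full set.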

Results: A total of 704 free-text bone scintigraphy reports from 704 patients were included in this study and 617 (88%) had multiple bone metastases. In the independent test set (n = 141) not used for model development, the NLP algorithm achieved an AUC-ROC of 0.97 (95% confidence interval [CI], 0.92-0.99) for classification of multiple bone metastases and an AUC-PRC of 0.99 (95% CI, 0.99-0.99). At a threshold of 0.90, the NLP algorithm correctly identified multiple bone metastases in 117 of the 124 patients who had multiple bone metastases in the testing cohort (sensitivity 0.94) and yielded 3 false positives (specificity 0.82). At the same threshold, the NLP algorithm had a positive predictive value of 0.97 and F1-score of 0.96.
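The threshold-based figures in these results can be reproduced from the reported counts. In the Python sketch below, the true-negative count is not stated in the abstract; it is derived here as 141 - 124 - 3 = 14, on the assumption that every test report without multiple metastases is either a false positive or a true negative. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, positive predictive value, and F1 from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts from the abstract: 141 test reports, 124 with multiple metastases,
# 117 of those correctly identified, and 3 false positives.
reported = classification_metrics(
    tp=117,
    fn=124 - 117,       # 7 missed reports with multiple metastases
    fp=3,
    tn=141 - 124 - 3,   # 14, derived rather than reported
)
```

The resulting values (sensitivity about 0.944, specificity about 0.824, positive predictive value 0.975, F1 about 0.959) agree with the published figures to within rounding.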

Conclusions: NLP has the potential to automate clinical data extraction from free text radiology notes in orthopedics, thereby optimizing the speed, accuracy, and consistency of clinical chart review. Pending external validation, the NLP algorithm developed in this study may be implemented as a means to aid researchers in tackling large amounts of data.

Source
http://dx.doi.org/10.1080/0284186X.2020.1819563
December 2020

Serum alkaline phosphatase is a prognostic marker in bone metastatic disease of the extremity.

J Orthop 2020 Nov-Dec;22:346-351. Epub 2020 Aug 17.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Purpose: The purpose of this study was to determine the prognostic value of serum alkaline phosphatase for treatment decision making in metastatic bone disease.

Methods: 1090 patients who underwent surgery for extremity metastatic disease were retrospectively identified at two tertiary care centers. The association between alkaline phosphatase and mortality was assessed by bivariate and multivariate analyses.

Results: Three-month and one-year mortality rates were 305 (29%) and 639 (62%), respectively. Alkaline phosphatase was associated with mortality at both three months and one year.

Conclusion: Serum alkaline phosphatase may be a useful marker in prognostic algorithms for patients with extremity metastatic disease.

Source
DOI: http://dx.doi.org/10.1016/j.jor.2020.08.008
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7475062
August 2020

Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.

Clin Orthop Relat Res 2020 Dec;478(12):2751-2764

O. Q. Groot, M. E. R. Bongers, A. V. Karhade, J. H. Schwab, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Background: Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images.

Questions/purposes: This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models.

Methods: A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity.

Results: ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images.

Conclusions: At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions.

Level Of Evidence: Level III, diagnostic study.

Source
DOI: http://dx.doi.org/10.1097/CORR.0000000000001360
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7899420
December 2020

CORR Synthesis: When Should We Be Skeptical of Clinical Prediction Models?

Clin Orthop Relat Res 2020 Dec;478(12):2722-2728

A. V. Karhade, J. H. Schwab, Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Source
DOI: http://dx.doi.org/10.1097/CORR.0000000000001367
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7899395
December 2020

SMART on FHIR in spine: integrating clinical prediction models into electronic health records for precision medicine at the point of care.

Spine J 2020 Jun 26. Epub 2020 Jun 26.

Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Ste 140, Salt Lake City, UT 84108, USA.

Recent applications of artificial intelligence have shown great promise for improving the quality and efficiency of clinical care. Numerous clinical decision support tools exist in today's electronic health records (EHRs), such as medication dosing support, order facilitators (eg, procedure-specific order sets), and point-of-care alerts. However, less has been done to integrate artificial intelligence (AI)-enabled risk predictors into EHRs despite wide availability of validated risk prediction tools. An interoperability standard known as SMART on FHIR (Substitutable Medical Applications and Reusable Technologies on Fast Healthcare Interoperability Resources) offers a promising path forward, enabling digital innovations to be seamlessly integrated with the EHR with regard to the user interface and patient data. As the next step towards the goal of learning healthcare and informatics-enabled spine surgery, we propose the application of SMART on FHIR to integrate existing and new risk prediction tools in spine surgery through an EHR add-on application.
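
To make the data-access side of this proposal concrete, the sketch below shows how an add-on application might request a laboratory value through the FHIR REST search interface and read it from the returned Bundle before passing it to a risk model. It is an illustration under stated assumptions, not the architecture described in the article: the base URL, patient identifier, and LOINC code are hypothetical placeholders, and the SMART authorization (OAuth 2.0 launch) step that a real application would need is omitted.

```python
from urllib.parse import urlencode


def observation_search_url(base_url: str, patient_id: str, loinc_code: str) -> str:
    """Build a FHIR search URL for a patient's most recent Observation with a given code."""
    params = {
        "patient": patient_id,                    # restrict to one patient
        "code": f"http://loinc.org|{loinc_code}",  # token search: system|code
        "_sort": "-date",                         # newest first
        "_count": "1",                            # only the latest result
    }
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


def latest_quantity(bundle: dict):
    """Return (value, unit) of the first Observation carrying a valueQuantity, else None."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Observation" and "valueQuantity" in resource:
            quantity = resource["valueQuantity"]
            return quantity.get("value"), quantity.get("unit")
    return None


# Hypothetical server, patient, and code, used only for illustration.
url = observation_search_url("https://ehr.example.org/fhir/", "123", "0000-0")

# Hand-written stand-in for the JSON a FHIR server would return.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Observation",
                      "valueQuantity": {"value": 88, "unit": "U/L"}}}
    ],
}
print(url)
print(latest_quantity(sample_bundle))
```

Keeping URL construction and Bundle parsing as separate functions lets the parsing logic be tested against stored responses without a live EHR connection.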

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.06.014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7762727
June 2020

Comparison of the Stopping Opioids after Surgery (SOS) score to preoperative morphine milligram equivalents (MME) for prediction of opioid prescribing after lumbar spine surgery.

Spine J 2020 11 11;20(11):1798-1804. Epub 2020 Jun 11.

Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Background Context: Reliable estimation of the likelihood for prolonged postoperative opioid use may aid targeted interventions for high-risk patients. Previous studies have recommended differing methodologies for prediction of sustained postoperative opioid use.

Purpose: To compare the performance of the Stopping Opioids after Surgery (SOS) score and preoperative morphine milligram equivalents (MME) for postoperative opioid prescription exposure in a contemporary cohort of lumbar surgery patients.

Patient Sample: Adult patients undergoing posterior decompression with or without fusion for degenerative lumbar conditions between January 31, 2016 and May 31, 2019.

Study Design/setting: Retrospective review at two academic medical centers and three community hospitals.

Outcome Measures: The primary outcome was sustained postoperative prescription opioid exposure at 3 months and 6 months. Reoperations and readmissions were considered secondarily.

Methods: The Stopping Opioids after Surgery score and MME were assigned to patients based on data from their preoperative surgical evaluation. Performance for both measures was assessed for all outcomes by discrimination, including c-statistic and receiver operating characteristic curve analysis. Calibration of the low-, medium-, and high-risk strata with the observed rates of postoperative adverse events was examined.

Results: Overall, 4,165 patients were included in this study. Preoperative prevalence of prescription opioid use was 31%. Rates of postoperative opioid prescriptions at 3 months and 6 months were 3.3% (n=136) and 1.5% (n=61), respectively. The c-statistics of preoperative oral MME and SOS score for 3-month sustained opioid prescriptions were 0.64 and 0.78, respectively. The c-statistics of preoperative oral MME and SOS score for 6-month sustained opioid prescriptions were 0.64 and 0.82, respectively. C-statistics of preoperative oral MME and SOS score were much lower for reoperation and readmission, although SOS score outperformed MME for both outcomes.

Conclusions: The SOS score clinically outperformed oral MME as a predictive measure for outcomes following lumbar spine surgery. The SOS score may be valuable for identifying individuals at high risk for sustained prescription opioid use and associated adverse events following spine surgery.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.06.005
November 2020

Survival After Surgery for Renal Cell Carcinoma Metastatic to the Spine: Impact of Modern Systemic Therapies on Outcomes.

Neurosurgery 2020 Jun 10. Epub 2020 Jun 10.

Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts.

Background: Modern medical management of metastatic renal cell carcinoma (RCC) includes therapies targeting tyrosine kinases, growth pathways (mammalian target of rapamycin (mTOR)), and immune checkpoints.

Objective: To test our hypothesis that patients with spinal metastases would benefit from postoperative systemic therapy despite presenting with disease that, in many cases, was resistant to prior systemic therapy.

Methods: This is an Institutional Review Board-approved clinical retrospective cohort analysis. A sample of adult patients with RCC metastatic to the spine who underwent operative intervention between January 2010 and December 2017 at 2 large academic medical centers was used in this study.

Results: We identified 78 patients with metastatic RCC; instrumented stabilization was performed in 79% of patients and postoperative stereotactic radiosurgery in 41%. Of patients presenting with weakness or myelopathy, 93% noted postoperative improvement, and 78% reported improvement in radicular and axial paraspinal pain severity. Increased overall survival (OS) following surgery (913 d (95% CI: 633-1975 d, n = 49) vs 222 d (95% CI: 143-1005 d, n = 29), P = .003) was noted in patients who received postoperative systemic therapy a median of 80 d (interquartile range 48-227 d) after the surgical intervention.

Conclusion: Postoperative outcomes and palliation of symptoms for metastatic RCC without targeted therapies in this cohort are similar to those reported in earlier series prior to the adoption of these systemic therapies. We observed a significantly longer OS among patients who received modern systemic therapies postoperatively. These findings have implications for the preoperative evaluation of patients with systemic disease who may have been deemed poor surgical candidates prior to the availability of these systemic therapies.

Source
DOI: http://dx.doi.org/10.1093/neuros/nyaa224
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7666885
June 2020

How Does the Skeletal Oncology Research Group Algorithm's Prediction of 5-year Survival in Patients with Chondrosarcoma Perform on International Validation?

Clin Orthop Relat Res 2020 Oct;478(10):2300-2308

M. E. R. Bongers, A. V. Karhade, O. Q. Groot, J. H. Schwab, Department of Orthopaedic Surgery, Division of Orthopaedic Oncology, Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Background: The Skeletal Oncology Research Group (SORG) machine learning algorithm for predicting survival in patients with chondrosarcoma was developed using data from the Surveillance, Epidemiology, and End Results (SEER) registry. This algorithm was externally validated on a dataset of patients from the United States in an earlier study, where it demonstrated generally good performance but overestimated 5-year survival. In addition, this algorithm has not yet been validated in patients outside the United States; doing so would be important because external validation is necessary as algorithm performance may be misleading when applied in different populations.

Questions/purposes: Does the SORG algorithm retain validity in patients who underwent surgery for primary chondrosarcoma outside the United States, specifically in Italy?

Methods: A total of 737 patients were treated for chondrosarcoma between January 2000 and October 2014 at the Italian tertiary care center that was used for international validation. We excluded patients whose first surgical procedure was performed elsewhere (n = 25), patients who underwent nonsurgical treatment (n = 27), patients with a chondrosarcoma of the soft tissue or skull (n = 60), and patients with peripheral, periosteal, or mesenchymal chondrosarcoma (n = 161). Thus, 464 patients were ultimately included in this external validation study. Because the earlier SEER study served as the training set, this study, unlike most of this type, does not have separate training and validation sets. Although the earlier study overestimated 5-year survival, we did not modify the algorithm in this report, as this is the first international validation and the prior performance in the single-institution validation study from the United States may have been driven by a small sample or non-generalizable patterns related to its single-center setting. Variables needed for the SORG algorithm were manually collected from electronic medical records. These included sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location. By inputting these variables into the algorithm, we calculated the predicted probabilities of survival for each patient. The performance of the SORG algorithm was assessed in this study through discrimination (the ability of a model to distinguish between the two classes of a binary outcome), calibration (the agreement of observed and predicted outcomes), overall performance (the accuracy of predictions), and decision curve analysis (an assessment of whether decisions made using the model are better than those made without it).
For discrimination, the c-statistic (commonly known as the area under the receiver operating characteristic curve for binary classification) was calculated; this ranged from 0.5 (no better than chance) to 1.0 (excellent discrimination). The agreement between predicted and observed outcomes was visualized with a calibration plot, and the calibration slope and intercept were calculated. Perfect calibration results in a slope of 1 and an intercept of 0. For overall performance, the Brier score and the null-model Brier score were calculated. The Brier score ranges from 0 (perfect prediction) to 1 (poorest prediction). Appropriate interpretation of the Brier score requires comparison with the null-model Brier score. The null-model Brier score is the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for every patient. A decision curve analysis was performed to compare the potential net benefit of the algorithm versus other means of decision support, such as treating all or none of the patients. There were several differences between this study and the earlier SEER study, and such differences are important because they help us to determine the performance of the algorithm in a group different from the initial study population. In this study from Italy, 5-year survival was different from the earlier SEER study (71% [319 of 450 patients] versus 76% [1131 of 1487 patients]; p = 0.03). There were more patients with dedifferentiated chondrosarcoma than in the earlier SEER study (25% [118 of 464 patients] versus 8.5% [131 of 1544 patients]; p < 0.001). 
In addition, in this study patients were older, tumor size was larger, and there were higher proportions of high-grade tumors than the earlier SEER study (age: 56 years [interquartile range {IQR} 42 to 67] versus 52 years [IQR 40 to 64]; p = 0.007; tumor size: 80 mm [IQR 50 to 120] versus 70 mm [IQR 42 to 105]; p < 0.001; tumor grade: 22% [104 of 464 had Grade 1], 42% [196 of 464 had Grade 2], and 35% [164 of 464 had Grade 3] versus 41% [592 of 1456 had Grade 1], 40% [588 of 1456 had Grade 2], and 19% [276 of 1456 had Grade 3]; p ≤ 0.001).
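
The discrimination and overall-performance measures defined in these methods are simple to compute directly. The sketch below is illustrative only and is not the study's analysis code: the function names and the six-patient toy data are assumptions, while the 319-of-450 survival count is taken from the abstract.

```python
def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher prediction.

    Tied predictions count as half a concordant pair.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true):
    """Brier score of a model that predicts the outcome prevalence for every patient."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


# Hypothetical toy data: 1 = alive at 5 years, with made-up predicted probabilities.
toy_outcomes = [1, 1, 1, 0, 1, 0]
toy_predictions = [0.9, 0.8, 0.6, 0.4, 0.7, 0.65]
print(c_statistic(toy_outcomes, toy_predictions))
print(brier_score(toy_outcomes, toy_predictions))

# Null-model Brier score implied by the reported 5-year survival (319 of 450 patients).
reported_outcomes = [1] * 319 + [0] * 131
print(round(null_brier_score(reported_outcomes), 2))
```

For a binary outcome with prevalence p, the null-model Brier score reduces to p(1 - p), so a prevalence near 71% gives roughly 0.21, which is the benchmark the reported Brier score of 0.15 is compared against.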

Results: Validation of the SORG algorithm in a primarily Italian population achieved a c-statistic of 0.86 (95% confidence interval 0.82 to 0.89), suggesting good-to-excellent discrimination. The calibration plot showed good agreement between the predicted probability and observed survival in the probability thresholds of 0.8 to 1.0. With predicted survival probabilities lower than 0.8, however, the SORG algorithm underestimated the observed proportion of patients with 5-year survival, reflected in the overall calibration intercept of 0.82 (95% CI 0.67 to 0.98) and calibration slope of 0.68 (95% CI 0.42 to 0.95). The Brier score for 5-year survival was 0.15, compared with a null-model Brier of 0.21. The algorithm showed a favorable decision curve analysis in the validation cohort.

Conclusions: The SORG algorithm to predict 5-year survival for patients with chondrosarcoma held good discriminative ability and overall performance on international external validation; however, it underestimated 5-year survival for patients with predicted probabilities from 0 to 0.8 because the calibration plot was not perfectly aligned for the observed outcomes, which resulted in a maximum underestimation of 20%. The differences may reflect the baseline differences noted between the two study populations. The overall performance of the algorithm supports the utility of the algorithm and validation presented here. The digital application for the algorithm is freely available here: https://sorg-apps.shinyapps.io/extremitymetssurvival/.

Level Of Evidence: Level III, prognostic study.

Source
DOI: http://dx.doi.org/10.1097/CORR.0000000000001305
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7491905
October 2020

Does the SORG algorithm generalize to a contemporary cohort of patients with spinal metastases on external validation?

Spine J 2020 10 16;20(10):1646-1652. Epub 2020 May 16.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.

Background Context: The SORG machine-learning algorithms were previously developed for preoperative prediction of overall survival in spinal metastatic disease. On sub-group analysis of a previous external validation, these algorithms were found to have diminished performance on patients treated after 2010.

Purpose: The purpose of this study was to assess the performance of these algorithms on a large contemporary cohort of consecutive spinal metastatic disease patients.

Study Design/setting: Retrospective study performed at a tertiary care referral center.

Patient Sample: Patients 18 years and older treated with surgery for metastatic spinal disease between 2014 and 2016.

Outcome Measures: Ninety-day and one-year mortality.

Methods: Baseline patient and tumor characteristics of the validation cohort were compared to the development cohort using bivariate logistic regression. Performance of the SORG algorithms on external validation in the contemporary cohort was assessed with discrimination (c-statistic and receiver operating curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score compared to the null-model Brier score), and decision curve analysis.
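
Decision curve analysis, the last of the measures listed above, compares the net benefit of acting on model predictions with the default strategies of treating everyone or no one. The following is a minimal sketch of the standard net-benefit formula, not the study's code; the function names and the five-patient toy data are assumptions for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of prediction."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = died within the horizon, with made-up predicted risks.
toy_outcomes = [1, 0, 1, 0, 0]
toy_risks = [0.9, 0.2, 0.6, 0.7, 0.1]

for threshold in (0.2, 0.5):
    model = net_benefit(toy_outcomes, toy_risks, threshold)
    treat_all = net_benefit_treat_all(toy_outcomes, threshold)
    # The "treat none" strategy always has a net benefit of 0.
    print(f"threshold={threshold}: model={model:.3f}, treat all={treat_all:.3f}, treat none=0.000")
```

A decision curve plots these quantities across a range of thresholds; a model is favorable where its curve lies above both default strategies.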

Results: Overall, 200 patients were included with 90-day and 1-year mortality rates of 55 (27.6%) and 124 (62.9%), respectively. The contemporary external validation cohort and the developmental cohort differed significantly on primary tumor histology, presence of visceral metastases, American Spinal Injury Association impairment scale, and preoperative laboratory values. The SORG algorithms for 90-day and 1-year mortality retained good discriminative ability (c-statistic of 0.81 [95% confidence interval [CI], 0.74-0.87] and 0.84 [95% CI, 0.77-0.89]), overall performance, and decision curve analysis. The algorithm for 90-day mortality showed almost perfect calibration reflected in an overall calibration intercept of -0.07 (95% CI: -0.50, 0.35). The 1-year mortality algorithm underestimated mortality mainly for the lowest predicted probabilities with an overall intercept of 0.57 (95% CI: 0.18, 0.96).

Conclusions: The SORG algorithms for survival in spinal metastatic disease generalized well to a contemporary cohort of consecutively treated patients from an external institution. Further validation in international cohorts and large, prospective multi-institutional trials is required to confirm or refute the findings presented here. The open-access algorithms are available here: https://sorg-apps.shinyapps.io/spinemetssurvival/.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.05.003
October 2020

Incidental durotomy: predictive risk model and external validation of natural language process identification algorithm.

J Neurosurg Spine 2020 May 1:1-7. Epub 2020 May 1.

Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, Maryland.

Objective: Incidental durotomy is a common complication of elective lumbar spine surgery seen in up to 11% of cases. Prior studies have suggested patient age and body habitus along with a history of prior surgery as being associated with an increased risk of dural tear. To date, no calculator has been developed for quantifying risk. Here, the authors' aim was to identify independent predictors of incidental durotomy, present a novel predictive calculator, and externally validate a novel method to identify incidental durotomies using natural language processing (NLP).

Methods: The authors retrospectively reviewed all patients who underwent elective lumbar spine procedures at a tertiary academic hospital for degenerative pathologies between July 2016 and November 2018. Data were collected regarding surgical details, patient demographic information, and patient medical comorbidities. The primary outcome was incidental durotomy, which was identified both through manual extraction and the NLP algorithm. Multivariable logistic regression was used to identify independent predictors of incidental durotomy. Bootstrapping was then employed to estimate optimism in the model, which was corrected for; this model was converted to a calculator and deployed online.

Results: Of the 1279 elective lumbar surgery patients included in this study, incidental durotomy occurred in 108 (8.4%). Risk factors for incidental durotomy on multivariable logistic regression were increased surgical duration, older age, revision versus index surgery, and case starts after 4 pm. This model had an area under the curve (AUC) of 0.73 in predicting incidental durotomies. The previously established NLP method was used to identify cases of incidental durotomy, for which it demonstrated excellent discrimination (AUC 0.97).

Conclusions: Using multivariable analysis, the authors found that increased surgical duration, older patient age, cases started after 4 pm, and a history of prior spine surgery are all independent positive predictors of incidental durotomy in patients undergoing elective lumbar surgery. Additionally, the authors put forth the first version of a clinical calculator for durotomy risk that could be used prospectively by spine surgeons when counseling patients about their surgical risk. Lastly, the authors presented an external validation of an NLP algorithm used to identify incidental durotomies through the review of free-text operative notes. The authors believe that these tools can aid clinicians and researchers in their efforts to prevent this costly complication in spine surgery.

Source
DOI: http://dx.doi.org/10.3171/2020.2.SPINE20127
May 2020

Development and validation of machine learning algorithms for postoperative opioid prescriptions after TKA.

J Orthop 2020 Nov-Dec;22:95-99. Epub 2020 Mar 28.

Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Objective: The aims of this study were to develop machine learning algorithms for preoperative prediction of prolonged opioid prescriptions after TKA and to identify variables that can predict the probability of this adverse outcome.

Methods: Five algorithms were developed for prediction of prolonged postoperative opioid prescriptions.

Results: The stochastic gradient boosting (SGB) model had the best performance. Age, history of preoperative opioid use, marital status, diagnosis of diabetes, and several preoperative medications were predictive of prolonged postoperative opioid prescriptions.

Conclusion: The SGB algorithm developed could help improve preoperative identification of TKA patients at risk for prolonged postoperative opioid prescriptions.

Source
DOI: http://dx.doi.org/10.1016/j.jor.2020.03.052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152687
March 2020

Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery.

Spine J 2020 Apr 12. Epub 2020 Apr 12.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA.

Background: Intraoperative vascular injury (VI) may be an unavoidable complication of anterior lumbar spine surgery; however, vascular injury has implications for quality and safety reporting as this intraoperative complication may result in serious bleeding, thrombosis, and postoperative stricture.

Purpose: The purpose of this study was to (1) develop machine learning algorithms for preoperative prediction of VI and (2) develop natural language processing (NLP) algorithms for automated surveillance of intraoperative VI from free-text operative notes.

Patient Sample: Adult patients, 18 years of age or older, undergoing anterior lumbar spine surgery at two academic and three community medical centers were included in this analysis.

Outcome Measures: The primary outcome was unintended VI during anterior lumbar spine surgery.

Methods: Manual review of free-text operative notes was used to identify patients who had unintended VI. The available population was split into training and testing cohorts. Five machine learning algorithms were developed for preoperative prediction of VI. An NLP algorithm was trained for automated detection of intraoperative VI from free-text operative notes. Performance of the NLP algorithm was compared to current procedural terminology and international classification of diseases codes.

Results: In all, 1035 patients underwent anterior lumbar spine surgery and the rate of intraoperative VI was 7.2% (n=75). Variables used for preoperative prediction of VI were age, male sex, body mass index, diabetes, L4-L5 exposure, and surgery for infection (discitis, osteomyelitis). The best performing machine learning algorithm achieved a c-statistic of 0.73 for preoperative prediction of VI (https://sorg-apps.shinyapps.io/lumbar_vascular_injury/). For automated detection of intraoperative VI from free-text notes, the NLP algorithm achieved a c-statistic of 0.92. The NLP algorithm identified 18 of the 21 patients (sensitivity 0.86) who had a VI, whereas current procedural terminology and international classification of diseases codes identified 6 of the 21 (sensitivity 0.29) patients. At this threshold, the NLP algorithm had a specificity of 0.93, negative predictive value of 0.99, positive predictive value of 0.51, and F1-score of 0.64.

Conclusion: Relying on administrative procedural and diagnosis codes may underestimate the rate of unintended intraoperative VI in anterior lumbar spine surgery. External and prospective validation of the algorithms presented here may improve quality and safety reporting.

Source
DOI: http://dx.doi.org/10.1016/j.spinee.2020.04.001
April 2020

Development of Machine Learning Algorithms to Predict Clinically Meaningful Improvement for the Patient-Reported Health State After Total Hip Arthroplasty.

J Arthroplasty 2020 08 18;35(8):2119-2123. Epub 2020 Mar 18.

Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL.

Background: Failure to achieve clinically significant outcome (CSO) improvement after total hip arthroplasty (THA) imposes a potential cost-to-risk imbalance in the context of bundle payment models. Patient perception of their health state is one component of such risk. The purpose of the current study is to develop machine learning algorithms to predict CSO for the patient-reported health state (PRHS) and build a clinical decision-making tool based on risk factors.

Methods: A retrospective review of primary THA patients between 2014 and 2017 was performed. Variables considered for prediction included demographics, medical history, preoperative PRHS, and modified Harris Hip Score. The minimal clinically important difference (MCID) for the PRHS was calculated using a distribution-based method. Five supervised machine learning algorithms were developed and assessed by discrimination, calibration, Brier score, and decision curve analysis.
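
The abstract does not state which distribution-based method was used to derive the MCID. One widely used convention sets the threshold at half the standard deviation of the baseline scores; the sketch below assumes that convention purely for illustration, with hypothetical function names and made-up scores.

```python
from statistics import stdev


def mcid_half_sd(baseline_scores):
    """Distribution-based MCID estimate: half the sample SD of baseline scores.

    This half-SD rule is an assumed convention, not necessarily the study's method.
    """
    return 0.5 * stdev(baseline_scores)


def achieved_mcid(pre_score, post_score, mcid):
    """True when the improvement from pre to post meets or exceeds the MCID."""
    return (post_score - pre_score) >= mcid


# Hypothetical preoperative and postoperative health-state scores.
pre_scores = [50, 60, 70, 80, 90]
post_scores = [65, 62, 85, 95, 91]

mcid = mcid_half_sd(pre_scores)
labels = [achieved_mcid(pre, post, mcid) for pre, post in zip(pre_scores, post_scores)]
print(round(mcid, 2), labels)
```

The resulting boolean labels are the kind of binary target that a supervised classifier, such as those described in this abstract, would be trained to predict.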

Results: Of 616 patients, a total of 407 (69.2%) achieved the MCID for the PRHS. The random forest algorithm achieved the best performance in the independent testing set not used for algorithm development (c-statistic 0.97, calibration intercept -0.05, calibration slope 1.45, Brier score 0.054). The most important factors for achieving the MCID were preoperative PRHS, preoperative opioid use, age, and body mass index. Individual patient-level explanations were provided for the algorithm predictions and the algorithms were incorporated into an open access digital application available here: https://sorg-apps.shinyapps.io/THA_PRHS_mcid/.

Conclusion: The current study created a clinical decision-making tool based on partially modifiable risk factors for predicting CSO after THA. The tool demonstrates excellent discriminative capacity for identifying those at greatest risk for failing to achieve CSO in their current health state and may allow for preoperative health optimization.

Source
DOI: http://dx.doi.org/10.1016/j.arth.2020.03.019
August 2020

Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy?

Spine J 2020 10 4;20(10):1602-1609. Epub 2020 Mar 4.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Orthopedic Surgery, Newton Wellesley Hospital, Newton, MA, USA.

Background: Surgical site infections are a major driver of morbidity and increased costs in the postoperative period after spine surgery. Current tools for surveillance of these adverse events rely on prospective clinical tracking, manual retrospective chart review, or administrative procedural and diagnosis codes.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated reporting of postoperative wound infection requiring reoperation after lumbar discectomy.

Patient Sample: Adult patients undergoing discectomy at two academic and three community medical centers between January 1, 2000 and July 31, 2019 for lumbar disc herniation.

Outcome Measures: Reoperation for wound infection within 90 days after surgery.

Methods: Free-text notes of patients who underwent surgery from January 1, 2000 to December 31, 2015 were used for algorithm training. Free-text notes of patients who underwent surgery after January 1, 2016 were used for algorithm testing. Manual chart review was used to label which patients had reoperation for wound infection. An extreme gradient-boosting NLP algorithm was developed to detect reoperation for postoperative wound infection.

Results: Overall, 5,860 patients were included in this study and 62 (1.1%) had a reoperation for wound infection. In patients who underwent surgery after January 1, 2016 (n=1,377), the NLP algorithm detected 15 of the 16 patients (sensitivity=0.94) who had reoperation for infection. In comparison, current procedural terminology and international classification of disease codes detected 12 of these 16 patients (sensitivity=0.75). At a threshold of 0.05, the NLP algorithm had positive predictive value of 0.83 and F1-score of 0.88.
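The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below recomputes the two sensitivities from the reported counts and derives the F1-score as the harmonic mean of sensitivity and positive predictive value. The helper name `f1_from` is an assumption for illustration, and the positive predictive value is used as the rounded figure from the abstract (0.83), so the result is approximate.

```python
def f1_from(sensitivity, ppv):
    """F1-score: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Counts reported in the abstract for the temporal test set (n=1,377).
infections_in_test_set = 16
detected_by_nlp = 15
detected_by_codes = 12

nlp_sensitivity = detected_by_nlp / infections_in_test_set
codes_sensitivity = detected_by_codes / infections_in_test_set

# PPV is taken as reported (rounded); the underlying counts are not given.
nlp_ppv = 0.83
nlp_f1 = f1_from(nlp_sensitivity, nlp_ppv)

print(f"NLP sensitivity:   {nlp_sensitivity:.2f}")
print(f"Codes sensitivity: {codes_sensitivity:.2f}")
print(f"NLP F1-score:      {nlp_f1:.2f}")
```

The recomputed values agree with the reported sensitivity of 0.94 versus 0.75 and the F1-score of 0.88.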

Conclusion: Temporal validation of the algorithm developed in this study demonstrates a proof-of-concept application of NLP for automated reporting of adverse events after spine surgery. Adapting this methodology for other procedures and outcomes in spine and orthopedics has the potential to dramatically improve and automate quality and safety reporting.

Source
http://dx.doi.org/10.1016/j.spinee.2020.02.021
October 2020

Predicting prolonged opioid prescriptions in opioid-naïve lumbar spine surgery patients.

Spine J 2020 06 31;20(6):888-895. Epub 2019 Dec 31.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Importance: Preoperative determination of the potential for postoperative opioid dependence in previously naïve patients undergoing elective spine surgery may facilitate targeted interventions.

Objective: The purpose of this study was to develop supervised machine learning algorithms for preoperative prediction of prolonged opioid prescription use in opioid-naïve patients following lumbar spine surgery.

Design: Retrospective review of clinical registry data. Variables considered for prediction included demographics, insurance status, preoperative medications, surgical factors, laboratory values, comorbidities, and neighborhood characteristics. Five supervised machine learning algorithms were developed and assessed by discrimination, calibration, Brier score, and decision curve analysis.

Setting: One healthcare entity (two academic medical centers, three community hospitals), 2000 to 2018.

Participants: Opioid-naïve patients undergoing decompression and/or fusion for lumbar disk herniation, stenosis, and spondylolisthesis.

Main Outcome: Sustained prescription opioid use exceeding 90 days after surgery.

Results: Overall, of 8,435 patients included, 359 (4.3%) were found to have prolonged postoperative opioid prescriptions. The elastic-net penalized logistic regression achieved the best performance in the independent testing set not used for algorithm development with c-statistic=0.70, calibration intercept=0.06, calibration slope=1.02, and Brier score=0.039. The five most important factors for prolonged opioid prescriptions were use of instrumented spinal fusion, preoperative benzodiazepine use, preoperative antidepressant use, preoperative gabapentin use, and uninsured status. Individual patient-level explanations were provided for the algorithm predictions and the algorithms were incorporated into an open access digital application available here: https://sorg-apps.shinyapps.io/lumbaropioidnaive/.
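With an outcome this rare, a Brier score is easier to interpret next to a reference value. The sketch below computes the Brier score of a hypothetical "null" model that assigns every patient the overall event rate. It uses the full-cohort counts from the abstract as an assumption; the reported Brier score of 0.039 comes from the held-out testing set, whose event rate is not given and may differ slightly.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Full-cohort counts reported in the abstract.
n_patients = 8435
n_prolonged = 359
prevalence = n_patients and n_prolonged / n_patients

# Outcome vector with the reported number of events (order is irrelevant).
outcomes = [1] * n_prolonged + [0] * (n_patients - n_prolonged)

# Null model (assumption for illustration): predict the prevalence for everyone.
null_brier = brier_score(outcomes, [prevalence] * n_patients)

print(f"Prevalence:       {prevalence:.3f}")
print(f"Null-model Brier: {null_brier:.3f}")
```

For a constant prediction equal to the prevalence p, the Brier score reduces to p(1 - p), so a model is informative on this measure only to the extent that its score falls below that value.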

Conclusion And Relevance: The clinician decision aid developed in this study may be helpful to preoperatively risk-stratify opioid-naïve patients undergoing lumbar spine surgery. The tool demonstrates moderate discriminative capacity for identifying those at greatest risk of prolonged prescription opioid use. External validation is required to further support the potential utility of this tool in practice.

Source
http://dx.doi.org/10.1016/j.spinee.2019.12.019
June 2020

Natural language processing for automated detection of incidental durotomy.

Spine J 2020 05 23;20(5):695-700. Epub 2019 Dec 23.

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114, USA.

Background: Incidental durotomy is a common intraoperative complication during spine surgery with potential implications for postoperative recovery, patient-reported outcomes, length of stay, and costs. To our knowledge, there are no processes available for automated surveillance of incidental durotomy.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated detection of incidental durotomies in free-text operative notes of patients undergoing lumbar spine surgery.

Patient Sample: Adult patients 18 years or older undergoing lumbar spine surgery between January 1, 2000 and June 31, 2018 at two academic and three community medical centers.

Outcome Measures: The primary outcome was defined as intraoperative durotomy recorded in free-text operative notes.

Methods: An 80:20 stratified split was undertaken to create training and testing populations. An extreme gradient-boosting NLP algorithm was developed to detect incidental durotomy. Discrimination was assessed via area under receiver-operating curve (AUC-ROC), precision-recall curve, and Brier score. Performance of this algorithm was compared with current procedural terminology (CPT) and international classification of diseases (ICD) codes for durotomy.
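A stratified split keeps the proportion of events roughly equal in the training and testing populations, which matters when the outcome is uncommon. The following is a minimal sketch using only the Python standard library; the function `stratified_split`, the seed, and the rounding rule are assumptions for illustration, and the authors' actual tooling is not described in the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while preserving class proportions.

    Each class is shuffled separately and the same fraction of it is
    assigned to the test set.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(test_fraction * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Cohort shape from the abstract: 1,000 patients, 93 with incidental durotomy.
labels = [1] * 93 + [0] * 907
train_idx, test_idx = stratified_split(labels)

test_events = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_events)
```

Because 20% of 93 is not a whole number, the event count in the test set depends on the rounding rule; the abstract reports 18 events in its testing set of 200.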

Results: Overall, 1,000 patients were included in the study and 93 (9.3%) had a recorded incidental durotomy in the free-text operative report. In the independent testing set (n=200) not used for model development, the NLP algorithm achieved AUC-ROC of 0.99 for detection of durotomy. In comparison, the CPT/ICD codes had AUC-ROC of 0.64. In the testing set, the NLP algorithm detected 16 of 18 patients with incidental durotomy (sensitivity 0.89) whereas the CPT and ICD codes detected 5 of 18 (sensitivity 0.28). At a threshold of 0.05, the NLP algorithm had specificity of 0.99, positive predictive value of 0.89, and negative predictive value of 0.99.
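The threshold-based metrics in this paragraph can be reproduced from a single 2x2 table. In the sketch below, the true-positive and false-negative counts come from the abstract (16 of 18 detected in a testing set of 200). The false-positive count of 2 is not reported; it is an inference, being the only whole number consistent with 16 true positives and a positive predictive value that rounds to 0.89.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from 2x2 confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


test_set_size = 200
durotomies = 18          # reported
tp = 16                  # reported: detected by the NLP algorithm
fn = durotomies - tp
fp = 2                   # inferred from the rounded PPV, not reported
tn = test_set_size - durotomies - fp

nlp_metrics = classification_metrics(tp, fn, fp, tn)
codes_sensitivity = 5 / durotomies  # reported: CPT/ICD codes detected 5 of 18

for name, value in nlp_metrics.items():
    print(f"{name}: {value:.2f}")
print(f"codes sensitivity: {codes_sensitivity:.2f}")
```

Under that inferred count, all four recomputed values round to the figures given in the abstract.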

Conclusions: Internal validation of the NLP algorithm developed in this study indicates promising results for future NLP applications in spine surgery. Pending external validation, the NLP algorithm developed in this study may be used by entities including national spine registries or hospital quality and safety departments to automate tracking of incidental durotomies.

Source
http://dx.doi.org/10.1016/j.spinee.2019.12.006
May 2020

Development and Internal Validation of Machine Learning Algorithms for Preoperative Survival Prediction of Extremity Metastatic Disease.

Clin Orthop Relat Res 2020 02;478(2):322-333

Department of Orthopedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Background: A preoperative estimation of survival is critical for deciding on the operative management of metastatic bone disease of the extremities. Several tools have been developed for this purpose, but there is room for improvement. Machine learning is an increasingly popular and flexible method of prediction model building based on a data set. It raises some skepticism, however, because of the complex structure of these models.

Questions/purposes: The purposes of this study were (1) to develop machine learning algorithms for 90-day and 1-year survival in patients who received surgical treatment for a bone metastasis of the extremity, and (2) to use these algorithms to identify those clinical factors (demographic, treatment related, or surgical) that are most closely associated with survival after surgery in these patients.

Methods: All 1090 patients who underwent surgical treatment for a long-bone metastasis at two institutions between 1999 and 2017 were included in this retrospective study. The median age of the patients in the cohort was 63 years (interquartile range [IQR] 54 to 72 years), 56% of patients (610 of 1090) were female, and the median BMI was 27 kg/m² (IQR 23 to 30 kg/m²). The most affected location was the femur (70%), followed by the humerus (22%). The most common primary tumors were breast (24%) and lung (23%). Intramedullary nailing was the most commonly performed type of surgery (58%), followed by endoprosthetic reconstruction (22%), and plate screw fixation (14%). Missing data were imputed using the missForest methods. Features were selected by random forest algorithms, and five different models were developed on the training set (80% of the data): stochastic gradient boosting, random forest, support vector machine, neural network, and penalized logistic regression. These models were chosen as a result of their classification capability in binary datasets. Model performance was assessed on both the training set and the validation set (20% of the data) by discrimination, calibration, and overall performance.

Results: We found no differences among the five models for discrimination, with an area under the curve ranging from 0.86 to 0.87. All models were well calibrated, with intercepts ranging from -0.03 to 0.08 and slopes ranging from 1.03 to 1.12. Brier scores ranged from 0.13 to 0.14. The stochastic gradient boosting model was chosen to be deployed as a freely available web-based application, and explanations on both a global and an individual level were provided. For 90-day survival, the three most important factors associated with poorer survivorship were lower albumin level, higher neutrophil-to-lymphocyte ratio, and rapid growth primary tumor. For 1-year survival, the three most important factors associated with poorer survivorship were lower albumin level, rapid growth primary tumor, and lower hemoglobin level.
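Calibration intercept and slope are commonly obtained by logistic recalibration, that is, by regressing the observed outcome on the logit of the predicted probability; a perfectly calibrated model gives an intercept of 0 and a slope of 1. The abstract does not say how the authors computed these values, so the following is only a sketch under stated assumptions: both parameters are fit jointly by Newton-Raphson (in practice the intercept is often estimated with the slope fixed at 1), and the data are simulated so that the predictions are deliberately too extreme.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_fit(y_true, p_pred, iterations=25):
    """Fit P(y=1) = sigmoid(a + b * logit(p_pred)); return (intercept, slope)."""
    x = [logit(p) for p in p_pred]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu              # gradient w.r.t. intercept
            g_b += (yi - mu) * xi       # gradient w.r.t. slope
            h_aa += w                   # entries of the 2x2 information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Simulated data (assumption): true log-odds z, but the "model" reports
# sigmoid(2z), i.e. predictions that are too extreme by a factor of 2.
rng = random.Random(0)
true_log_odds = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
outcomes = [1 if rng.random() < sigmoid(z) else 0 for z in true_log_odds]
overconfident = [sigmoid(2.0 * z) for z in true_log_odds]

intercept, slope = calibration_fit(outcomes, overconfident)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

In this simulation the fitted slope comes out near 0.5, signalling predictions that are too extreme. Slopes slightly above 1, such as the 1.03 to 1.12 range reported above, indicate the opposite tendency: predictions that are marginally too conservative.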

Conclusions: Although the final models must be externally validated, the algorithms showed good performance on internal validation. The final models have been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/extremitymetssurvival/. Pending external validation, clinicians may use this tool to predict survival for their individual patients to help in shared treatment decision making.

Level Of Evidence: Level III, therapeutic study.

Source
http://dx.doi.org/10.1097/CORR.0000000000000997
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438151
February 2020