Publications by authors named "Praveer Singh"

11 Publications


Single-Examination Risk Prediction of Severe Retinopathy of Prematurity.

Pediatrics 2021 Nov 23. Epub 2021 Nov 23.

Departments of Ophthalmology.

Background And Objectives: Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Screening and treatment reduce this risk, but require multiple examinations of infants, most of whom will not develop severe disease. Previous work has suggested that artificial intelligence may be able to detect incident severe disease (treatment-requiring retinopathy of prematurity [TR-ROP]) before clinical diagnosis. We aimed to build a risk model that combined artificial intelligence with clinical demographics to reduce the number of examinations without missing cases of TR-ROP.

Methods: Infants undergoing routine ROP screening examinations (1579 total eyes, 190 with TR-ROP) were recruited from 8 North American study centers. A vascular severity score (VSS) was derived from retinal fundus images obtained at 32 to 33 weeks' postmenstrual age. Seven ElasticNet logistic regression models were trained on all combinations of birth weight, gestational age, and VSS. The area under the precision-recall curve was used to identify the highest-performing model.
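As an illustration of the modeling approach, here is a minimal scikit-learn sketch of one of the seven candidate models (gestational age + VSS), selected by area under the precision-recall curve; the column names and input file are hypothetical, not the study's actual data pipeline.

```python
# Sketch: ElasticNet logistic regression for TR-ROP risk from gestational age + VSS.
# Hypothetical column names; the study trained 7 models on all feature combinations.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import average_precision_score

df = pd.read_csv("rop_cohort.csv")          # hypothetical file
X = df[["gestational_age", "vss"]].values   # one of the 7 feature combinations
y = df["tr_rop"].values                     # 1 = treatment-requiring ROP

model = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=10000
)
# Out-of-fold probabilities give an unbiased estimate of the precision-recall AUC.
probs = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
print("AUPRC:", average_precision_score(y, probs))
```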

Results: The gestational age + VSS model had the highest performance (mean ± SD area under the precision-recall curve: 0.35 ± 0.11). On 2 different test data sets (n = 444 and n = 132), sensitivity was 100% (positive predictive value: 28.1% and 22.6%) and specificity was 48.9% and 80.8% (negative predictive value: 100.0%).

Conclusions: Using a single examination, this model identified all infants who developed TR-ROP, on average, >1 month before diagnosis with moderate to high specificity. This approach could lead to earlier identification of incident severe ROP, reducing late diagnosis and treatment while simultaneously reducing the number of ROP examinations and unnecessary physiologic stress for low-risk infants.

Source
http://dx.doi.org/10.1542/peds.2021-051772
November 2021

Diagnosability of Synthetic Retinal Fundus Images for Plus Disease Detection in Retinopathy of Prematurity.

AMIA Annu Symp Proc 2021 Jan 25;2020:329-337. Epub 2021 Jan 25.

Medical Informatics & Clinical Epidemiology.

Advances in generative adversarial networks have allowed for the engineering of highly realistic images. Many studies have applied these techniques to medical images. However, evaluation of generated medical images often relies on image quality and reconstruction metrics and on subjective evaluation by laypersons. This is acceptable for generation of images depicting everyday objects, but not for medical images, where there may be subtle features experts rely upon for diagnosis. We implemented the pix2pix generative adversarial network for retinal fundus image generation and evaluated the ability of experts to identify generated images as such and to form accurate diagnoses of plus disease in retinopathy of prematurity. We found that, while experts could discern between real and generated images, the diagnoses between image sets were similar. By directly evaluating and confirming physicians' abilities to diagnose generated retinal fundus images, this work supports conclusions that generated images may be viable for dataset augmentation and physician training.
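For readers unfamiliar with pix2pix, a compressed PyTorch-style sketch of its generator objective follows (conditional adversarial loss plus an L1 reconstruction term, weighted by λ = 100 in the original pix2pix formulation). The generator G, discriminator D, and the use of a vessel map as the conditioning input are illustrative assumptions rather than the exact configuration used in this study.

```python
# Sketch of the pix2pix generator objective: conditional GAN loss + L1 term.
# G and D are assumed to be predefined nn.Modules (e.g., U-Net generator, PatchGAN discriminator).
import torch
import torch.nn.functional as F

def generator_step(G, D, vessel_map, real_fundus, g_optimizer, lam=100.0):
    g_optimizer.zero_grad()
    fake_fundus = G(vessel_map)                        # condition -> synthetic fundus image
    pred_fake = D(torch.cat([vessel_map, fake_fundus], dim=1))
    adv_loss = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))         # try to fool the discriminator
    l1_loss = F.l1_loss(fake_fundus, real_fundus)      # stay close to the paired real image
    loss = adv_loss + lam * l1_loss
    loss.backward()
    g_optimizer.step()
    return loss.item()
```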

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075515
June 2021

Automated Assessment and Tracking of COVID-19 Pulmonary Disease Severity on Chest Radiographs using Convolutional Siamese Neural Networks.

Radiol Artif Intell 2020 Jul 22;2(4):e200079. Epub 2020 Jul 22.

Athinoula A. Martinos Center for Biomedical Imaging (M.D.L., N.T.A., M.G., K.C., P.S., J.K.C.), Department of Radiology (F.D., M.L.), Division of Thoracic Imaging and Intervention (B.P.L, D.P.M.), Division of Abdominal Imaging (S.I.L., A.O., A.P.), and MGH and BWH Center for Clinical Data Science (J.K.) of the Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Purpose: To develop an automated measure of COVID-19 pulmonary disease severity on chest radiographs (CXRs), for longitudinal disease tracking and outcome prediction.

Materials And Methods: A convolutional Siamese neural network-based algorithm was trained to output a measure of pulmonary disease severity on CXRs (pulmonary x-ray severity (PXS) score), using weakly-supervised pretraining on ∼160,000 anterior-posterior images from CheXpert and transfer learning on 314 frontal CXRs from COVID-19 patients. The algorithm was evaluated on internal and external test sets from different hospitals (154 and 113 CXRs respectively). PXS scores were correlated with radiographic severity scores independently assigned by two thoracic radiologists and one in-training radiologist (Pearson r). For 92 internal test set patients with follow-up CXRs, PXS score change was compared to radiologist assessments of change (Spearman ρ). The association between PXS score and subsequent intubation or death was assessed. Bootstrap 95% confidence intervals (CI) were calculated.
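A minimal sketch of the Siamese scoring idea: a shared encoder embeds a patient CXR and a pool of reference CXRs, and the severity measure is derived from embedding distances. The DenseNet backbone and the use of the median Euclidean distance are assumptions for illustration, not the authors' exact implementation.

```python
# Sketch: Siamese-style severity scoring of a CXR against a pool of normal references.
import torch
import torchvision

encoder = torchvision.models.densenet121(weights=None)
encoder.classifier = torch.nn.Identity()    # use pooled features as the embedding

@torch.no_grad()
def pxs_like_score(cxr: torch.Tensor, normal_pool: torch.Tensor) -> float:
    """cxr: (1, 3, H, W); normal_pool: (N, 3, H, W) of normal reference CXRs."""
    encoder.eval()
    z = encoder(cxr)                        # (1, 1024) embedding
    z_ref = encoder(normal_pool)            # (N, 1024) reference embeddings
    dists = torch.norm(z - z_ref, dim=1)    # Euclidean distance to each reference
    return dists.median().item()            # larger distance ~ more severe disease
```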

Results: PXS scores correlated with radiographic pulmonary disease severity scores assigned to CXRs in the internal and external test sets (r=0.86 (95%CI 0.80-0.90) and r=0.86 (95%CI 0.79-0.90) respectively). The direction of change in PXS score in follow-up CXRs agreed with radiologist assessment (ρ=0.74 (95%CI 0.63-0.81)). In patients not intubated on the admission CXR, the PXS score predicted subsequent intubation or death within three days of hospital admission (area under the receiver operating characteristic curve=0.80 (95%CI 0.75-0.85)).
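The bootstrap 95% CIs reported above can be obtained by resampling patient-level score pairs; a sketch with 1,000 resamples (the exact resampling scheme used by the authors is assumed, not stated here).

```python
# Sketch: bootstrap 95% CI for a Pearson correlation between model and radiologist scores.
import numpy as np
from scipy.stats import pearsonr

def bootstrap_pearson_ci(x, y, n_boot=1000, seed=0):
    """x, y: 1-D numpy arrays of paired scores."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # resample pairs with replacement
        rs.append(pearsonr(x[idx], y[idx])[0])
    return np.percentile(rs, [2.5, 97.5])            # 95% percentile interval
```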

Conclusion: A Siamese neural network-based severity score automatically measures radiographic COVID-19 pulmonary disease severity, which can be used to track disease change and predict subsequent intubation or death.

Source
http://dx.doi.org/10.1148/ryai.2020200079
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392327
July 2020

Radiomics Repeatability Pitfalls in a Scan-Rescan MRI Study of Glioblastoma.

Radiol Artif Intell 2021 Jan 16;3(1):e190199. Epub 2020 Dec 16.

Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology (K.V.H., J.B.P., A.L.B., K.C., P.S., J.M.B., M.C.P., B.R.R., J.K.C.), and Stephen E. and Catherine Pappas Center for Neuro-Oncology (T.T.B., E.R.G.), Massachusetts General Hospital, 149 13th St, Charlestown, MA 02129; and Harvard-MIT Division of Health Sciences and Technology, Cambridge, Mass (K.V.H., J.B.P., K.C.).

Purpose: To determine the influence of preprocessing on the repeatability and redundancy of radiomics features extracted using a popular open-source radiomics software package in a scan-rescan glioblastoma MRI study.

Materials And Methods: In this secondary analysis, T2-weighted fluid-attenuated inversion recovery (FLAIR) and T1-weighted postcontrast images from 48 patients (mean age, 56 years [range, 22-77 years]) diagnosed with glioblastoma were included from two prospective studies (ClinicalTrials.gov NCT00662506 [2009-2011] and NCT00756106 [2008-2011]). All patients underwent two baseline scans 2-6 days apart using identical imaging protocols on 3-T MRI systems. No treatment occurred between scan and rescan, and tumors were essentially unchanged visually. Radiomic features were extracted using PyRadiomics (https://pyradiomics.readthedocs.io/) under varying conditions, including normalization strategies and intensity quantization. Subsequently, intraclass correlation coefficients were determined between feature values of the scan and rescan.
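The preprocessing conditions compared in this study map directly onto PyRadiomics extractor settings; below is a sketch under assumed settings and hypothetical file paths, not the authors' exact configuration.

```python
# Sketch: PyRadiomics extraction under two of the preprocessing conditions compared here.
from radiomics import featureextractor

# Condition A: intensity normalization with a fixed bin count (relative binning).
extractor_norm = featureextractor.RadiomicsFeatureExtractor(normalize=True, binCount=64)
# Condition B: no normalization, default absolute bin width (run the same paired extraction).
extractor_raw = featureextractor.RadiomicsFeatureExtractor(normalize=False, binWidth=25)

# Hypothetical scan/rescan image and mask paths for one patient.
features_scan = extractor_norm.execute("flair_scan.nii.gz", "tumor_mask_scan.nii.gz")
features_rescan = extractor_norm.execute("flair_rescan.nii.gz", "tumor_mask_rescan.nii.gz")
```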

Results: Shape features showed a higher repeatability than intensity (adjusted P < .001) and texture features (adjusted P < .001) for both T2-weighted FLAIR and T1-weighted postcontrast images. Normalization improved the overlap between the region-of-interest intensity histograms of scan and rescan (adjusted P < .001 for both T2-weighted FLAIR and T1-weighted postcontrast images), except in scans where brain extraction failed. As such, normalization significantly improved the repeatability of intensity features from T2-weighted FLAIR scans (adjusted P = .003 [z score normalization] and adjusted P = .002 [histogram matching]). The use of a relative intensity binning strategy, as opposed to the default absolute intensity binning, reduced correlation between gray-level co-occurrence matrix features after normalization.
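Scan-rescan repeatability of an individual feature can be summarized with an intraclass correlation coefficient; here is a sketch using the pingouin package with a small hypothetical long-format table standing in for the real feature values.

```python
# Sketch: ICC between scan and rescan values of a single radiomic feature (pingouin).
import pandas as pd
import pingouin as pg

# Long format: one row per (patient, session) with the feature value (toy values).
df = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2", "p3", "p3"],
    "session": ["scan", "rescan"] * 3,
    "value":   [1.02, 0.98, 2.10, 2.25, 0.55, 0.60],
})
icc = pg.intraclass_corr(data=df, targets="patient", raters="session", ratings="value")
print(icc[["Type", "ICC", "CI95%"]])
```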

Conclusion: Both normalization and intensity quantization have an effect on the level of repeatability and redundancy of features, emphasizing the importance of both accurate reporting of methodology in radiomics articles and understanding the limitations of choices made in pipeline design. © RSNA, 2020. See also the commentary by Tiwari and Verma in this issue.

Source
http://dx.doi.org/10.1148/ryai.2020190199
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7845781
January 2021

Applications of Artificial Intelligence for Retinopathy of Prematurity Screening.

Pediatrics 2021 Mar;147(3).

Athinoula A. Martinos Center for Biomedical Imaging and Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts.

Objectives: Childhood blindness from retinopathy of prematurity (ROP) is increasing as a result of improvements in neonatal care worldwide. We evaluated the effectiveness of artificial intelligence (AI)-based screening in an Indian ROP telemedicine program and whether differences in ROP severity between neonatal care units (NCUs) identified using AI were related to differences in oxygen-titrating capability.

Methods: External validation study of an existing AI-based quantitative severity scale for ROP on a data set of images from the Retinopathy of Prematurity Eradication Save Our Sight ROP telemedicine program in India. All images were assigned an ROP severity score (1-9) by using the Imaging and Informatics in Retinopathy of Prematurity Deep Learning system. We calculated the area under the receiver operating characteristic curve and sensitivity and specificity for treatment-requiring retinopathy of prematurity. Using multivariable linear regression, we evaluated the mean and median ROP severity in each NCU as a function of mean birth weight, gestational age, and the presence of oxygen blenders and pulse oxygenation monitors.
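A sketch of the NCU-level multivariable regression described above, using statsmodels; the unit-level summary file and column names are hypothetical.

```python
# Sketch: NCU-level multivariable linear regression of ROP severity on unit resources.
import pandas as pd
import statsmodels.formula.api as smf

ncu = pd.read_csv("ncu_summary.csv")   # hypothetical: one row per neonatal care unit
model = smf.ols(
    "median_severity ~ mean_birth_weight + mean_gestational_age"
    " + has_oxygen_blender + has_pulse_oximeter",
    data=ncu,
).fit()
print(model.summary())
```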

Results: The area under the receiver operating characteristic curve for detection of treatment-requiring retinopathy of prematurity was 0.98, with 100% sensitivity and 78% specificity. We found higher median (interquartile range) ROP severity in NCUs without oxygen blenders and pulse oxygenation monitors, most apparent in larger infants (>1500 g and 31 weeks' gestation: 2.7 [2.5-3.0] vs 3.1 [2.4-3.8]; P = .007, with adjustment for birth weight and gestational age).

Conclusions: Integration of AI into ROP screening programs may lead to improved access to care for secondary prevention of ROP and may facilitate assessment of disease epidemiology and NCU resources.

Source
http://dx.doi.org/10.1542/peds.2020-016618
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924138
March 2021

Deep Learning for the Diagnosis of Stage in Retinopathy of Prematurity: Accuracy and Generalizability across Populations and Cameras.

Ophthalmol Retina 2021 Oct;5(10):1027-1035. Epub 2021 Feb 6.

Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon. Electronic address:

Purpose: Stage is an important feature to identify in retinal images of infants at risk of retinopathy of prematurity (ROP). The purpose of this study was to implement a convolutional neural network (CNN) for binary detection of stages 1, 2, and 3 in ROP and to evaluate its generalizability across different populations and camera systems.

Design: Diagnostic validation study of CNN for stage detection.

Participants: Retinal fundus images obtained from preterm infants during routine ROP screenings.

Methods: Two datasets were used: 5943 fundus images obtained by RetCam camera (Natus Medical, Pleasanton, CA) from 9 North American institutions and 5049 images obtained by 3nethra camera (Forus Health Incorporated, Bengaluru, India) from 4 hospitals in Nepal. Images were labeled based on the presence of stage by 1 to 3 expert graders. Three CNN models were trained using 5-fold cross-validation on datasets from North America alone, Nepal alone, and a combined dataset and were evaluated on 2 held-out test sets consisting of 708 and 247 images from the Nepali and North American datasets, respectively.
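A compressed sketch of the 5-fold cross-validation setup, with a standard ResNet backbone standing in for the authors' CNN; the label file is hypothetical and the training loop body is omitted.

```python
# Sketch: 5-fold cross-validation splits for binary stage detection with a CNN backbone.
import numpy as np
import torch
import torchvision
from sklearn.model_selection import StratifiedKFold

labels = np.load("stage_labels.npy")    # hypothetical: 1 = stage 1-3 present, 0 = absent
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    model = torchvision.models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)   # binary: stage present / absent
    # ... build DataLoaders from train_idx / val_idx and train as usual ...
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val images")
```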

Main Outcome Measures: Convolutional neural network performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity.
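All four outcome measures can be computed from predicted probabilities on a held-out test set; a scikit-learn sketch with an explicit (assumed) operating threshold follows.

```python
# Sketch: AUROC, AUPRC, sensitivity, and specificity from held-out predictions.
from sklearn.metrics import roc_auc_score, average_precision_score, confusion_matrix

def summarize(y_true, y_prob, threshold=0.5):
    """y_true, y_prob: 1-D numpy arrays; threshold is an assumed operating point."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_prob >= threshold).ravel()
    return {
        "AUROC": roc_auc_score(y_true, y_prob),
        "AUPRC": average_precision_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```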

Results: Both the North American- and Nepali-trained models demonstrated high performance on a test set from the same population (North American model: AUROC, 0.99; AUPRC, 0.98; sensitivity, 94%; Nepali model: AUROC, 0.97; AUPRC, 0.91; sensitivity, 73%). However, the performance of each model decreased to an AUROC of 0.96 and AUPRC of 0.88 (sensitivity, 52%) and an AUROC of 0.62 and AUPRC of 0.36 (sensitivity, 44%), respectively, when evaluated on a test set from the other population. Compared with the models trained on individual datasets, the model trained on the combined dataset achieved improved performance on each respective test set: sensitivity improved from 94% to 98% on the North American test set and from 73% to 82% on the Nepali test set.

Conclusions: A CNN can accurately identify the presence of ROP stage in retinal images, but performance depends on the similarity between training and testing populations. We demonstrated that internal and external performance can be improved by increasing the heterogeneity of the training dataset, in this case by combining images from different populations and cameras.

Source
http://dx.doi.org/10.1016/j.oret.2020.12.013
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364291
October 2021

Improvement and Multi-Population Generalizability of a Deep Learning-Based Chest Radiograph Severity Score for COVID-19.

medRxiv 2020 Sep 18. Epub 2020 Sep 18.

Purpose: To improve and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations.

Materials And Methods: A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity (pulmonary x-ray severity (PXS) score). The model was evaluated on CXRs from four test sets, including 3 from the United States (patients hospitalized at an academic medical center (N=154), patients hospitalized at a community hospital (N=113), and outpatients (N=108)) and 1 from Brazil (patients at an academic medical center emergency department (N=303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson r). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results.
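A sketch of the UMAP visualization step, assuming penultimate-layer embeddings and reference severity scores are available as arrays; the file names are hypothetical.

```python
# Sketch: 2-D UMAP projection of CXR embeddings, colored by reference severity score.
import numpy as np
import umap
import matplotlib.pyplot as plt

embeddings = np.load("cxr_embeddings.npy")      # hypothetical (n_images, n_features)
severity = np.load("reference_scores.npy")      # radiologist-assigned severity scores

xy = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=severity, s=8)
plt.colorbar(label="reference severity")
plt.show()
```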

Results: Tuning the deep learning model with outpatient data improved model performance in two United States hospitalized patient datasets (r=0.88 and r=0.90, compared to baseline r=0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (r=0.86 and r=0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets.

Conclusions: Performance of a deep learning-based model that extracts a COVID-19 severity score on CXRs improved using training data from a different patient cohort (outpatient versus hospitalized) and generalized across multiple populations.

Source
http://dx.doi.org/10.1101/2020.09.15.20195453
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523150
September 2020

Multi-Institutional Assessment and Crowdsourcing Evaluation of Deep Learning for Automated Classification of Breast Density.

J Am Coll Radiol 2020 Dec 24;17(12):1653-1662. Epub 2020 Jun 24.

Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts; Scientific Director (CCDS), Director (QTIM lab and the Center for Machine Learning), Associate Professor of Radiology, MGH/Harvard Medical School, Boston, Massachusetts. Electronic address:

Objective: We developed deep learning algorithms to automatically assess BI-RADS breast density.

Methods: Using a large multi-institution patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and obtained a crowdsourced evaluation from attendees of the ACR 2019 Annual Meeting.

Results: Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class κ of 0.667. When training was performed with randomly sampled images from the data set rather than sampling an equal number of images from each density category, the model predictions were biased away from the low-prevalence categories, such as extremely dense breasts. The net result was an increase in sensitivity and a decrease in specificity for predicting dense breasts with equal-class sampling compared with random sampling. We also found that model performance degrades when evaluated on digital mammography data formats that differ from the one used for training, emphasizing the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists.
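The random-versus-equal sampling comparison corresponds to the choice of sampler at training time; below is a PyTorch sketch of per-class balanced sampling across the four density categories (the label file is hypothetical, and all four categories are assumed to be present).

```python
# Sketch: equal-class sampling of mammograms across the 4 BI-RADS density categories.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

density_labels = np.load("birads_density.npy")          # hypothetical int labels, values 0-3
class_counts = np.bincount(density_labels, minlength=4)
weights = 1.0 / class_counts[density_labels]            # rare classes get sampled more often
sampler = WeightedRandomSampler(
    torch.as_tensor(weights, dtype=torch.double),
    num_samples=len(density_labels), replacement=True)
# loader = DataLoader(dataset, batch_size=64, sampler=sampler)  # vs shuffle=True (random)
```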

Conclusion: We demonstrated the possible parameters that can influence the performance of the model and how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.

Source
http://dx.doi.org/10.1016/j.jacr.2020.05.015
December 2020

Automated assessment of COVID-19 pulmonary disease severity on chest radiographs using convolutional Siamese neural networks.

medRxiv 2020 May 26. Epub 2020 May 26.

Purpose: To develop an automated measure of COVID-19 pulmonary disease severity on chest radiographs (CXRs) for longitudinal disease evaluation and clinical risk stratification.

Materials And Methods: A convolutional Siamese neural network-based algorithm was trained to output a measure of pulmonary disease severity on anterior-posterior CXRs (pulmonary x-ray severity (PXS) score), using weakly supervised pretraining on ~160,000 images from CheXpert and transfer learning on 314 CXRs from patients with COVID-19. The algorithm was evaluated on internal and external test sets from different hospitals, containing 154 and 113 CXRs respectively. The PXS score was correlated with a radiographic severity score independently assigned by two thoracic radiologists and one in-training radiologist. For 92 internal test set patients with follow-up CXRs, the change in PXS score was compared to radiologist assessments of change. The association between PXS score and subsequent intubation or death was assessed.

Results: The PXS score correlated with the radiographic pulmonary disease severity score assigned to CXRs in the COVID-19 internal and external test sets (ρ=0.84 and ρ=0.78 respectively). The direction of change in PXS score in follow-up CXRs agreed with radiologist assessment (ρ=0.74). In patients not intubated on the admission CXR, the PXS score predicted subsequent intubation or death within three days of hospital admission (area under the receiver operating characteristic curve=0.80 (95%CI 0.75-0.85)).

Conclusion: A Siamese neural network-based severity score automatically measures COVID-19 pulmonary disease severity on chest radiographs, which can be scaled and rapidly deployed for clinical triage and workflow optimization.
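A small sketch of the longitudinal comparison step described here: the change in PXS score between admission and follow-up CXRs is correlated with an ordinal radiologist change assessment. The -1/0/+1 change coding and the file names are assumptions for illustration.

```python
# Sketch: agreement between PXS score change and radiologist change assessment.
import numpy as np
from scipy.stats import spearmanr

pxs_admission = np.load("pxs_admission.npy")        # hypothetical per-patient scores
pxs_followup = np.load("pxs_followup.npy")
radiologist_change = np.load("change_labels.npy")   # assumed: -1 improved, 0 unchanged, +1 worse

rho, p = spearmanr(pxs_followup - pxs_admission, radiologist_change)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```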

Source
http://dx.doi.org/10.1101/2020.05.20.20108159
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7274251
May 2020

Siamese neural networks for continuous disease severity evaluation and change detection in medical imaging.

NPJ Digit Med 2020 Mar 26;3:48. Epub 2020 Mar 26.

Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA.

Using medical images to evaluate disease severity and change over time is a routine and important task in clinical decision making. Grading systems are often used, but are unreliable, as domain experts disagree on disease severity category thresholds. These discrete categories also do not reflect the underlying continuous spectrum of disease severity. To address these issues, we developed a convolutional Siamese neural network approach to evaluate disease severity at single time points and change between longitudinal patient visits on a continuous spectrum. We demonstrate this in two medical imaging domains: retinopathy of prematurity (ROP) in retinal photographs and osteoarthritis in knee radiographs. Our patient cohorts consist of 4861 images from 870 patients in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) cohort study and 10,012 images from 3021 patients in the Multicenter Osteoarthritis Study (MOST), both of which feature longitudinal imaging data. Multiple expert clinician raters ranked 100 retinal images and 100 knee radiographs from excluded test sets for severity of ROP and osteoarthritis, respectively. The Siamese neural network output for each image, in comparison to a pool of normal reference images, correlates with disease severity rank (ρ = 0.87 for ROP and ρ = 0.89 for osteoarthritis), both within and between the clinical grading categories. Thus, this output can represent the continuous spectrum of disease severity at any single time point. The difference in these outputs can be used to show change over time. Alternatively, paired images from the same patient at two time points can be directly compared using the Siamese neural network, resulting in an additional continuous measure of change between images. Importantly, our approach does not require manual localization of the pathology of interest and requires only a binary label for training (same versus different). The location of disease and site of change detected by the algorithm can be visualized using an occlusion sensitivity map-based approach. For a longitudinal binary change detection task, our Siamese neural networks achieve test set areas under the receiver operating characteristic curve (AUCs) of up to 0.90 in evaluating ROP or knee osteoarthritis change, depending on the change detection strategy. The overall performance on this binary task is similar to that of a conventional convolutional deep neural network trained for multi-class classification. Our results demonstrate that convolutional Siamese neural networks can be a powerful tool for evaluating the continuous spectrum of disease severity and change in medical imaging.
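The occlusion-sensitivity visualization mentioned here can be sketched as sliding a blanked patch over the image and recording how much the Siamese distance to a reference image changes; the embed function, patch size, and stride below are assumptions, not the authors' exact settings.

```python
# Sketch: occlusion sensitivity for a Siamese distance score (single image vs. a reference).
# `embed` is an assumed callable returning an embedding tensor for a (1, C, H, W) image.
import torch

@torch.no_grad()
def occlusion_map(image, reference, embed, patch=32, stride=16):
    base = torch.norm(embed(image) - embed(reference))
    _, _, H, W = image.shape
    heat = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, :, y:y + patch, x:x + patch] = 0   # blank out a patch
            score = torch.norm(embed(occluded) - embed(reference))
            heat[i, j] = (base - score).abs()              # how much the score moves
    return heat
```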

Source
http://dx.doi.org/10.1038/s41746-020-0255-1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7099081
March 2020

Deep Tone Mapping Operator for High Dynamic Range Images.

IEEE Trans Image Process 2019 Sep 2. Epub 2019 Sep 2.

A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone map only a limited range of HDR content and require extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution, high-subjective-quality tone-mapped output. Based on a conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic content (e.g., outdoor, indoor, human, structures, etc.) but also tackles HDR-related scene-specific challenges such as contrast and brightness, while preserving fine-grained details. We explore four possible combinations of generator-discriminator architectural designs to specifically address prominent issues in HDR-related deep-learning frameworks, such as blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss functions and normalization layers under a cGAN setting, we conclude by adopting a multi-scale model for our task. To further leverage the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely the Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and show that DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pairwise subjective study, which confirms the versatility of our method.
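For context, a classical global TMO of the kind learned operators like DeepTMO are compared against takes only a few lines; here is a sketch of the Reinhard global operator (whether it was among the baselines in this paper is not stated in the abstract).

```python
# Sketch: a classical global tone mapping operator (Reinhard et al., 2002) for comparison
# with a learned TMO. `hdr` is a linear-light RGB array of shape (H, W, 3).
import numpy as np

def reinhard_tmo(hdr: np.ndarray, key: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))         # log-average (scene) luminance
    scaled = key / log_avg * lum                          # map the scene key to mid-gray
    lum_display = scaled / (1.0 + scaled)                 # compress luminance into [0, 1)
    ldr = hdr * (lum_display / (lum + eps))[..., None]    # rescale color channels
    return np.clip(ldr, 0.0, 1.0)
```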

Source
http://dx.doi.org/10.1109/TIP.2019.2936649
September 2019