Publications by authors named "Riccardo Bellazzi"

196 Publications

A Reliable Machine Learning Approach applied to Single-Cell Classification in Acute Myeloid Leukemia.

AMIA Annu Symp Proc 2020 25;2020:925-932. Epub 2021 Jan 25.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

Machine Learning research applied to the medical field is increasing. However, few of the proposed approaches are actually deployed in clinical settings. One reason is that current methods may not be able to generalize on new unseen instances which differ from the training population, thus providing unreliable classifications. Approaches to measure classification reliability could be useful to assess whether to trust prediction on new cases. Here, we propose a new reliability measure based on the similarity of a new instance to the training set. In particular, we evaluate whether this example would be selected as informative by an instance selection method, in comparison with the available training set. We show that this method distinguishes reliable examples, for which we can trust the classifier's prediction, from unreliable ones, both on simulated data and in a real-case scenario, to distinguish tumor and normal cells in Acute Myeloid Leukemia patients.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075526
January 2021
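
The idea of scoring how far a new instance lies from the training population can be illustrated with a much simpler stand-in than the instance-selection method the abstract describes. The sketch below is not the authors' measure: it assumes a plain nearest-neighbour distance check, and the function names and the 0.95 quantile are hypothetical choices made only for illustration.

```python
import math

def nn_distance(point, others):
    """Euclidean distance from point to its nearest neighbour in others."""
    return min(math.dist(point, other) for other in others)

def reliability_threshold(train, quantile=0.95):
    """Distance cut-off taken from leave-one-out nearest-neighbour
    distances inside the training set (assumed heuristic)."""
    dists = sorted(
        nn_distance(p, train[:i] + train[i + 1:]) for i, p in enumerate(train)
    )
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]

def is_reliable(point, train, threshold):
    """Flag a prediction as trustworthy only if the new instance is
    at least as close to the training set as typical training points are."""
    return nn_distance(point, train) <= threshold
```

A new case that falls far outside the region covered by the training data would be flagged as unreliable, which is the behaviour the abstract asks of a reliability measure.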

Digital Health during COVID-19: Informatics Dialogue with the World Health Organization.

Yearb Med Inform 2021 Apr 21. Epub 2021 Apr 21.

eHealth Development Association, Amman, Jordan.

Background: On December 16, 2020 representatives of the International Medical Informatics Association (IMIA), a Non-Governmental Organization in official relations with the World Health Organization (WHO), along with its International Academy for Health Sciences Informatics (IAHSI), held an open dialogue with WHO Director General (WHO DG) Tedros Adhanom Ghebreyesus about the opportunities and challenges of digital health during the COVID-19 global pandemic.

Objectives: The aim of this paper is to report the outcomes of the dialogue and discussions with more than 200 participants representing different civil society organizations (CSOs).

Methods: The dialogue was held in the form of a webinar. After an initial address by the WHO DG, short presentations by the panelists and live discussions among the panelists, the WHO DG, and WHO representatives took place. The audience was able to post questions in writing. These written discussions were saved with participants' consent and summarized in this paper.

Results: The main themes that were brought up by the audience for discussion were: (a) opportunities and challenges in general; (b) ethics and artificial intelligence; (c) digital divide; (d) education. Proposed actions included the development of a roadmap based on the lessons learned from the COVID-19 pandemic.

Conclusions: Decision making by policy makers needs to be evidence-based and health informatics research should be used to support decisions surrounding digital health, and we further propose next steps in the collaboration between IMIA and WHO such as future engagement in the World Health Assembly.

Source
http://dx.doi.org/10.1055/s-0041-1726480
April 2021

Exploring the inter-subject variability in the relationship between glucose monitoring metrics and glycated hemoglobin for pediatric patients with type 1 diabetes.

J Pediatr Endocrinol Metab 2021 May 7;34(5):619-625. Epub 2021 Apr 7.

Department of Electrical, Computer and Biomedical Engineering, Università degli Studi di Pavia, Pavia, Italy.

Objectives: Despite the widespread diffusion of continuous glucose monitoring (CGM) systems, which include both real-time CGM (rtCGM) and intermittently scanned CGM (isCGM), effective application of CGM technology in clinical practice is still limited. The study aimed to investigate the relationship between isCGM-derived glycemic metrics and glycated hemoglobin (HbA1c), identifying overall CGM targets and exploring the inter-subject variability.

Methods: A group of 27 children and adolescents with type 1 diabetes under multiple daily injection insulin-therapy was enrolled. All participants used the isCGM Abbott's FreeStyle Libre system on average for eight months, and clinical data were collected from the Advanced Intelligent Distant-Glucose Monitoring platform. Starting from each HbA1c exam date, windows of past 30, 60, and 90 days were considered to compute several CGM metrics. The relationships between HbA1c and each metric were explored through linear mixed models, adopting an HbA1c target of 7%.

Results: Time in Range and Time in Target Range show a negative relationship with HbA1c (R>0.88) whereas Time Above Range and Time Severely Above Range show a positive relationship (R>0.75). Focusing on Time in Range in 30-day windows, random effect represented by the patient's specific intercept reveals a high variability compared to the overall population intercept.

Conclusions: This study confirms the relationship between several CGM metrics and HbA1c; it also highlights the importance of an individualized interpretation of the CGM data.

Source
http://dx.doi.org/10.1515/jpem-2020-0725
May 2021
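
The windowed CGM metrics in the study above (30, 60, and 90 days before each HbA1c exam) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the 70-180 mg/dL band for Time in Range is an assumption taken from common consensus targets and is not stated in the abstract, and the function name and data layout are hypothetical.

```python
from datetime import datetime, timedelta

def time_in_range(readings, exam_date, window_days=30, low=70.0, high=180.0):
    """Percentage of glucose readings within [low, high] mg/dL during the
    window_days preceding exam_date.

    readings is a list of (timestamp, glucose_mg_dl) pairs.
    Returns None when the window contains no readings.
    """
    start = exam_date - timedelta(days=window_days)
    window = [glucose for stamp, glucose in readings if start <= stamp < exam_date]
    if not window:
        return None
    in_range = sum(1 for glucose in window if low <= glucose <= high)
    return 100.0 * in_range / len(window)
```

Calling the function with `window_days=30`, `60`, and `90` for each exam date yields the per-window values that could then be related to HbA1c, for example with a mixed model that gives each patient their own intercept.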

What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask.

J Med Internet Res 2021 03 2;23(3):e22219. Epub 2021 Mar 2.

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.

Source
http://dx.doi.org/10.2196/22219
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7927948
March 2021

Using Case-Based Reasoning in a Learning System: A Prototype of a Pedagogical Nurse Tool for Evidence-Based Diabetic Foot Ulcer Care.

J Diabetes Sci Technol 2021 Feb 15:1932296821991127. Epub 2021 Feb 15.

Department of Health Science and Technology, Aalborg University, Denmark.

Background: Currently, evidence-based learning systems to increase knowledge and evidence level of wound care are unavailable to wound care nurses in Denmark, which means that they need to learn about diabetic foot ulcers from experience and peer-to-peer training, or by asking experienced colleagues. Interactive evidence-based learning systems built on case-based reasoning (CBR) have the potential to increase wound care nurses' diabetic foot ulcer knowledge and evidence levels.

Method: A prototype of a CBR-interactive, evidence-based algorithm-operated learning system calculates a dissimilarity score (DS) that gives a quantitative measure of similarity between a new case and cases stored in a case base in relation to six variables: necrosis, wound size, granulation, fibrin, dry skin, and age. Based on the DS, cases are selected by matching the six variables with the best predictive power and by weighing the impact of each variable according to its contribution to the prediction. The cases are ranked, and the six cases with the lowest DS are visualized in the system.

Results: Conventional education, that is, evidence-based learning material such as books and lectures, may be less motivating and pedagogical than peer-to-peer training, which is, however, often less evidence-based. The CBR interactive learning systems presented in this study may bridge the two approaches. Showing wound care nurses how individual variables affect outcomes may help them achieve greater insights into pathophysiological processes.

Conclusion: A prototype of a CBR-interactive, evidence-based learning system that is centered on diabetic foot ulcers and related treatments bridges the gap between traditional evidence-based learning and more motivating and interactive learning approaches.

Source
http://dx.doi.org/10.1177/1932296821991127
February 2021
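
The case retrieval step described in the Method paragraph above (score each stored case, rank, show the six with the lowest dissimilarity score) can be sketched as below. The abstract does not give the formula or the weights, so the weighted sum of absolute differences, the weight values, and the assumption that all six variables are scaled to the range 0 to 1 are illustrative choices only.

```python
# Hypothetical weights: the abstract names the six variables but not
# how much each one contributes to the prediction.
WEIGHTS = {
    "necrosis": 0.25,
    "wound_size": 0.25,
    "granulation": 0.15,
    "fibrin": 0.15,
    "dry_skin": 0.10,
    "age": 0.10,
}

def dissimilarity_score(new_case, stored_case, weights=WEIGHTS):
    """Weighted sum of absolute differences over the six variables.
    A score of 0 means the two cases match on every variable."""
    return sum(
        weight * abs(new_case[name] - stored_case[name])
        for name, weight in weights.items()
    )

def most_similar_cases(new_case, case_base, k=6, weights=WEIGHTS):
    """Return the k stored cases with the lowest dissimilarity score,
    most similar first."""
    ranked = sorted(
        case_base,
        key=lambda case: dissimilarity_score(new_case, case, weights),
    )
    return ranked[:k]
```

Keeping the weights in one dictionary makes it easy to show a learner how changing the importance of a single variable reorders the retrieved cases.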

Validation of an Internationally Derived Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data.

J Am Med Inform Assoc 2021 Feb 10. Epub 2021 Feb 10.

IRCCS ICS Maugeri, Pavia, Italy.

Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data.

Objective: We sought to develop and validate a computable phenotype for COVID-19 severity.

Methods: Twelve 4CE sites participated. First we developed an EHR-based severity phenotype consisting of six code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity to the 4CE phenotype at one site.

Results: The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability - up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean AUC 0.903 (95% CI: 0.886, 0.921), compared to AUC 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared to chart review.

Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions.

Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.

Source
http://dx.doi.org/10.1093/jamia/ocab018
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7928835
February 2021

International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study.

medRxiv 2021 Feb 5. Epub 2021 Feb 5.

BIOMERIS (BIOMedical Research Informatics Solutions).

Objectives: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and identify optimal timing of laboratory value collection to predict severity across hospitals and regions.

Design: Retrospective cohort study.

Setting: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe.

Participants: Patients hospitalized with COVID-19, admitted before or after PCR-confirmed result for SARS-CoV-2.

Primary and secondary outcome measures: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction.

Results: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients (78.7%) were 50 years of age or older and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin showed association with disease severity. Significant differences of laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Sub-group analysis using age, D-dimer, CRP, and lymphocyte count as predictive of severity at admission showed similar discrimination to a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. On average, no difference in severity prediction was found between North American and European sites.

Conclusions: Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.

Source
http://dx.doi.org/10.1101/2020.12.16.20247684
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872369
February 2021

Cytoplasmic movements of the early human embryo: imaging and artificial intelligence to predict blastocyst development.

Reprod Biomed Online 2021 Mar 24;42(3):521-528. Epub 2020 Dec 24.

Department of Biology and Biotechnology 'Lazzaro Spallanzani', University of Pavia, Via Ferrata, 9 27100, Italy; Centre for Health Technology, University of Pavia, Pavia, Italy.

Research Question: Can artificial intelligence and advanced image analysis extract and harness novel information derived from cytoplasmic movements of the early human embryo to predict development to blastocyst?

Design: In a proof-of-principle study, 230 human preimplantation embryos were retrospectively assessed using an artificial neural network. After intracytoplasmic sperm injection, embryos underwent time-lapse monitoring for 44 h. For comparison, standard assessment of each embryo by a single embryologist was carried out to predict development to blastocyst stage based on a single picture frame taken at 42 h of development. In the experimental approach, in embryos that developed to blastocyst or were destined to arrest, cytoplasm movement velocity was recorded by time-lapse monitoring during the first 44 h of culture and analysed with a Particle Image Velocimetry algorithm to extract quantitative information. Three main artificial intelligence approaches, the k-Nearest Neighbour, the Long Short-Term Memory Neural Network and the hybrid ensemble classifier, were used to classify the embryos.

Results: Blind operator assessment classified each embryo in terms of ability to develop to blastocyst, with 75.4% accuracy, 76.5% sensitivity, 74.3% specificity, 74.3% precision and 75.4% F1 score. Integration of results from artificial intelligence models with the blind operator classification, resulted in 82.6% accuracy, 79.4% sensitivity, 85.7% specificity, 84.4% precision and 81.8% F1 score.

Conclusions: The present study suggests the possibility of predicting human blastocyst development at early cleavage stages by detection of cytoplasm movement velocity and artificial intelligence analysis. This indicates the importance of the dynamics of the cytoplasm as a novel and valuable source of data to assess embryo viability.

Source
http://dx.doi.org/10.1016/j.rbmo.2020.12.008
March 2021
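
The five figures reported in the Results of the embryo study above are tied together by standard definitions, and the F1 scores can be re-derived from the reported precision and sensitivity. The sketch below shows those definitions; the function names are hypothetical, and the confusion-matrix counts in the abstract are not given, so only the published percentages are used in the check.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }

def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Consistency check against the percentages reported in the abstract.
operator_f1 = round(f1_score(0.743, 0.765), 3)    # 0.754, reported as 75.4%
integrated_f1 = round(f1_score(0.844, 0.794), 3)  # 0.818, reported as 81.8%
```

Both recomputed values agree with the reported F1 scores to one decimal place.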

Evolving determinants of carotid atherosclerosis vulnerability in asymptomatic patients from the MAGNETIC observational study.

Sci Rep 2021 Jan 27;11(1):2327. Epub 2021 Jan 27.

Molecular Cardiology, Istituti Clinici Scientifici Maugeri, Pavia, Italy.

MRI can assess plaque composition and has demonstrated an association between some atherosclerotic risk factors (RF) and markers of plaque vulnerability in naive patients. We aimed at investigating this association in medically treated asymptomatic patients. This is a cross-sectional interim analysis (August 2013-September 2016) of a single center prospective study on carotid plaque vulnerability (MAGNETIC study). We recruited patients with asymptomatic carotid atherosclerosis (US stenosis > 30%, ECST criteria), receiving medical treatments at a tertiary cardiac rehabilitation. Atherosclerotic burden and plaque composition were quantified with 3.0 T MRI. The association between baseline characteristics and extent of lipid-rich necrotic core (LRNC), fibrous cap (CAP) and intraplaque hemorrhage (IPH) was studied with multiple regression analysis. We enrolled 260 patients (198 male, 76%) with a median age of 71 years (interquartile range: 65-76). Patients were on antiplatelet therapy, ACE-inhibitors/angiotensin receptor blockers and statins (196-229, 75-88%). Median LDL-cholesterol was 78 mg/dl (59-106), blood pressure 130/70 mmHg (111-140/65-80), glycosylated hemoglobin 46 mmol/mol (39-51) and BMI 25 kg/m² (23-28); moreover, 125 out of 187 (67%) patients were ex-smokers. Multivariate analysis of a data-set of 487 (94%) carotid arteries showed that a history of hypercholesterolemia, diabetes, hypertension or smoking did not correlate with LRNC, CAP or IPH. Conversely, maximum stenosis was the strongest independent predictor of LRNC, CAP and IPH (p < 0.001). MRI assessment of plaque composition in patients on treatment for asymptomatic carotid atherosclerosis shows no correlation between plaque vulnerability and the most well-controlled modifiable RF. Conversely, maximum stenosis exhibits a strong correlation with vulnerable features despite treatment.

Source
http://dx.doi.org/10.1038/s41598-021-81247-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7840938
January 2021

The DNA-helicase HELLS drives ALK⁻ALCL proliferation by the transcriptional control of a cytokinesis-related program.

Cell Death Dis 2021 Jan 27;12(1):130. Epub 2021 Jan 27.

Laboratory of Translational Research, Azienda USL-IRCCS di Reggio Emilia, Reggio Emilia, 42123, Italy.

Deregulation of chromatin modifiers, including DNA helicases, is emerging as one of the mechanisms underlying the transformation of anaplastic lymphoma kinase negative (ALK⁻) anaplastic large cell lymphoma (ALCL). We recently identified the DNA-helicase HELLS as central for proficient ALK⁻ALCL proliferation and progression. Here we assessed in detail its function by performing RNA-sequencing profiling coupled with bioinformatic prediction to identify HELLS targets and transcriptional cooperators. We demonstrated that HELLS, together with the transcription factor YY1, contributes to an appropriate cytokinesis via the transcriptional regulation of genes involved in cleavage furrow regulation. Binding target promoters, HELLS primes YY1 recruitment and transcriptional activation of cytoskeleton genes including the small GTPases RhoA and RhoU and their effector kinase Pak2. Single or multiple knockdowns of these genes reveal that RhoA and RhoU mediate HELLS effects on cell proliferation and cell division of ALK⁻ALCLs. Collectively, our work demonstrates the transcriptional role of HELLS in orchestrating a complex transcriptional program sustaining neoplastic features of ALK⁻ALCL.

Source
http://dx.doi.org/10.1038/s41419-021-03425-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7840974
January 2021

Health informatics and EHR to support clinical research in the COVID-19 pandemic: an overview.

Brief Bioinform 2021 03;22(2):812-822

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy.

The coronavirus disease 2019 (COVID-19) pandemic has clearly shown that major challenges and threats for humankind need to be addressed with global answers and shared decisions. Data and their analytics are crucial components of such decision-making activities. Rather interestingly, one of the most difficult aspects is the reuse and sharing of accurate and detailed clinical data collected by Electronic Health Records (EHR), even though these data are of paramount importance. EHR data, in fact, are not only essential for supporting day-by-day activities, but can also be leveraged for research and support critical decisions about the effectiveness of drugs and therapeutic strategies. In this paper, we concentrate on collaborative data infrastructures to support COVID-19 research and on the open issues of data sharing and data governance that COVID-19 has brought to light. Data interoperability, healthcare processes modelling and representation, shared procedures to deal with different data privacy regulations, and data stewardship and governance are seen as the most important aspects to boost collaborative research. Lessons learned from the COVID-19 pandemic can be a strong element to improve international research and our future capability of dealing with fast developing emergencies and needs, which are likely to be more frequent in the future in our connected and intertwined world.

Source
http://dx.doi.org/10.1093/bib/bbaa418
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7929411
March 2021

Continuous Glucose and Heart Rate Monitoring in Young People with Type 1 Diabetes: An Exploratory Study about Perspectives in Nocturnal Hypoglycemia Detection.

Metabolites 2020 Dec 24;11(1). Epub 2020 Dec 24.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.

A combination of information from blood glucose (BG) and heart rate (HR) measurements has been proposed to investigate the HR changes related to nocturnal hypoglycemia (NH) episodes in pediatric subjects with type 1 diabetes (T1D), examining whether they could improve hypoglycemia prediction. We enrolled seventeen children and adolescents with T1D, monitored on average for 194 days. BG was detected by flash glucose monitoring devices, and HR was measured by wrist-worn fitness trackers. For each subject, we compared HR values recorded in the hour before NH episodes (before-hypoglycemia) with HR values recorded during sleep intervals without hypoglycemia (no-hypoglycemia). Furthermore, we investigated the behavior after the end of NH. Nine participants (53%) experienced at least three NH. Among these nine subjects, six (67%) showed a statistically significant difference between the before-hypoglycemia HR distribution and the no-hypoglycemia HR distribution. In all these six cases, the before-hypoglycemia HR median value was higher than the no-hypoglycemia HR median value. In almost all cases, HR values after the end of hypoglycemia remained higher compared to no-hypoglycemia sleep intervals. This exploratory study supports the hypothesis that HR modifications occur during NH in T1D subjects. The identification of specific HR patterns can be helpful to improve NH detection and prevent fatal events.

Source
http://dx.doi.org/10.3390/metabo11010005
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7824609
December 2020

Using topological data analysis and pseudo time series to infer temporal phenotypes from electronic health records.

Artif Intell Med 2020 08 15;108:101930. Epub 2020 Jul 15.

Department of Computer Science, Brunel University London, United Kingdom.

Temporal phenotyping enables clinicians to better understand observable characteristics of a disease as it progresses. Modelling disease progression that captures interactions between phenotypes is inherently challenging. Temporal models that capture change in disease over time can identify the key features that characterize disease subtypes that underpin these trajectories. These models will enable clinicians to identify early warning signs of progression in specific sub-types and therefore to make informed decisions tailored to individual patients. In this paper, we explore two approaches to building temporal phenotypes based on the topology of data: topological data analysis and pseudo time-series. Using type 2 diabetes data, we show that the topological data analysis approach is able to identify disease trajectories and that pseudo time-series can infer a state space model characterized by transitions between hidden states that represent distinct temporal phenotypes. Both approaches highlight lipid profiles as key factors in distinguishing the phenotypes.

Source
http://dx.doi.org/10.1016/j.artmed.2020.101930
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536308
August 2020

SCOR: A secure international informatics infrastructure to investigate COVID-19.

J Am Med Inform Assoc 2020 11;27(11):1721-1726

Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve the problem with our finest models and tools. The SCOR Consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a change and accelerate research in future pandemics with broad and diverse samples on an international scale.

Source
http://dx.doi.org/10.1093/jamia/ocaa172
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7454652
November 2020

International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium.

NPJ Digit Med 2020 19;3:109. Epub 2020 Aug 19.

Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA.

We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.

Source
http://dx.doi.org/10.1038/s41746-020-00308-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438496
August 2020

Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools.

Front Oncol 2020 30;10:1030. Epub 2020 Jun 30.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

In recent years, high-throughput sequencing technologies have provided an unprecedented opportunity to depict cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a crucial and critical step to gain actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to respond to major challenges of stratified medicine in oncology, including patients' phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journals published from 2014 to 2019, then selected and thoroughly described the tools presenting the most promising innovations regarding the integration of heterogeneous data, the machine learning methodologies that successfully tackled the complexity of multi-omics data, and the frameworks to deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep learning, Network-based methods, Clustering, Features Extraction and Transformation, Factorization. We provide an overview of the tools available in each methodological group and underline the relationship among the different categories. Our analysis revealed how multi-omics datasets could be exploited to drive precision oncology, but also current limitations in the development of multi-omics data integration.

Source
http://dx.doi.org/10.3389/fonc.2020.01030
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338582
June 2020

A survey on single and multi omics data mining methods in cancer data classification.

J Biomed Inform 2020 07 7;107:103466. Epub 2020 Jun 7.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; IRCCS ICS Maugeri, Pavia, Italy.

Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies have been performed on cancer based upon single and multi-omics data. For example, single-omics-based studies have employed a diverse set of data, such as gene expression, DNA methylation, or miRNA, to name only a few instances. Despite that, a significant part of the literature reports studies on gene expression with microarray datasets. Single-omics data have high numbers of attributes and very low sample counts. This characteristic makes them paradigmatic of an under-sampled, small-n large-p machine learning problem. An important goal of single-omics data analysis is to find the most relevant genes, in terms of their potential use in clinics and research, in the batch of available data. This problem has been addressed in gene selection as one of the pre-processing steps in data mining. Analyses that use only one type of data (single-omics) often miss the complexity of the landscape of molecular phenomena underlying the disease. As a result, they provide limited and sometimes poorly reliable information about the disease mechanisms. Therefore, in recent years, researchers have been eager to build models that are more complex, obtaining more reliable results using multi-omics data. However, to achieve this, the most important challenge is data integration. In this paper, we provide a comprehensive overview of the challenges in single and multi-omics data analysis of cancer data, focusing on gene selection and data integration methods.

Source
http://dx.doi.org/10.1016/j.jbi.2020.103466
July 2020

Mining post-surgical care processes in breast cancer patients.

Artif Intell Med 2020 05 15;105:101855. Epub 2020 Apr 15.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy.

In this work we describe the application of a careflow mining algorithm to detect the most frequent patterns of care in a cohort of 3000 breast cancer patients. The applied method relies on longitudinal data extracted from electronic health records, recorded from the first surgical procedure after a breast cancer diagnosis. Careflows are mined from event data recorded for administrative purposes, including procedures from ICD-9-CM billing codes and chemotherapy treatments. Event data have been pre-processed with Topic Modelling to create composite events based on concurrent procedures. The results of the careflow mining algorithm allow the discovery of electronic temporal phenotypes across the studied population. These phenotypes are further characterized on the basis of clinical traits and tumour histopathology, as well as in terms of relapses, metastasis occurrence and 5-year survival rates. Results are highly significant from a clinical perspective, since phenotypes describe well-characterized pathology classes, and the careflows are well matched with existing clinical guidelines. The analysis thus facilitates deriving real-world evidence that can inform clinicians as well as hospital decision makers.

Source
http://dx.doi.org/10.1016/j.artmed.2020.101855
May 2020

Linc00941 Is a Novel Transforming Growth Factor β Target That Primes Papillary Thyroid Cancer Metastatic Behavior by Regulating the Expression of Cadherin 6.

Thyroid 2021 02 1;31(2):247-263. Epub 2020 Jul 1.

Laboratory of Translational Research, Azienda USL-IRCCS di Reggio Emilia, Reggio Emilia, Italy.

Papillary thyroid cancers (PTCs) are common, usually indolent malignancies. Still, a small but significant percentage of patients have aggressive tumors and develop distant metastases leading to death. Currently, it is not possible to discriminate aggressive lesions due to lack of prognostic markers. Long noncoding RNAs (lncRNAs), which are selectively expressed in a context-dependent manner, are expected to represent a new landscape to search for molecular discriminants. Transforming growth factor β (TGFβ) is a multifunctional cytokine that fosters epithelial-to-mesenchymal transition and metastatic spreading. In PTCs, it triggers the expression of the metastatic marker Cadherin 6 (CDH6). Here, we investigated the TGFβ-dependent lncRNAs that may cooperate to potentiate PTC aggressiveness. We used a genome-wide approach to map enhancer (ENH)-associated lncRNAs under TGFβ control. Linc00941 was selected and validated using functional assays. A combined approach using bioinformatic analyses of the thyroid cancer (THCA) dataset of The Cancer Genome Atlas (TCGA) and RNA-seq analysis was used to identify the processes in which linc00941 is involved and the genes under its regulation. Correlation with clinical data was performed to evaluate the potential of this lncRNA and its targets as prognostic markers in THCA. Linc00941 was identified as transcribed starting from one of the TGFβ-induced ENHs. Linc00941 expression was significantly higher in aggressive cancer both in the TCGA dataset and in a separate validation cohort from our institution. Loss-of-function assays for linc00941 showed that it promotes response to stimuli and invasiveness while restraining proliferation in PTC cells, a typical phenotype of metastatic cells. From the integration of TCGA data and linc00941 knockdown RNA-seq profiling, we identified 77 genes under the regulation of this lncRNA. Among these, we found the prometastatic gene CDH6. Linc00941 knockdown partially recapitulates the effects observed upon CDH6 silencing, promoting cell cytoskeleton and membrane adhesion rearrangements and autophagy. The combined expression of CDH6 and linc00941 is a distinctive feature of highly aggressive PTC lesions. Our data provide new insights into the biology driving metastasis in PTCs and highlight how lncRNAs cooperate with coding transcripts to sustain these processes.

Source
http://dx.doi.org/10.1089/thy.2020.0001
February 2021

A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

BMC Bioinformatics 2020 May 29;21(1):219. Epub 2020 May 29.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy.

Background: Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in Systems Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information.

Results: In this work, we propose a data fusion approach that exploits the integration of complementary omics-data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TFs ChIP-Sequencing data and gene expression compendia to reconstruct TRNs in a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic TFs regulatory activity. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods.

Conclusions: This Bayesian omics-data fusion methodology provides a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions that could be subsequently investigated. It represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high-dimensional data.

Source
http://dx.doi.org/10.1186/s12859-020-3510-1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7257163
May 2020

Progress in Characterizing the Human Exposome: a Key Step for Precision Medicine.

Yearb Med Inform 2020 Aug 17;29(1):115-120. Epub 2020 Apr 17.

The University of Manchester, UK.

Objective: Most diseases result from the complex interplay between genetic and environmental factors. The exposome can be defined as a systematic approach to acquiring large data sets corresponding to the environmental exposures of an individual throughout her/his life. The objective of this contribution is to raise awareness within the health informatics community about the importance of dealing with data related to the contribution of environmental factors to individual health, particularly in the context of precision medicine informatics.

Methods: This article summarizes the main findings after a panel organized by the International Medical Informatics Association - Exposome Informatics Working Group held during the last MEDINFO, in Lyon (France) in August 2019.

Results: The members of our community presented four initiatives (PULSE, Digital exposome, Cloudy with a chance of pain, Wearable clinics), providing a detailed view of current challenges and accomplishments in processing environmental and social data from a health research perspective. Projects illustrate a wide range of research methods, digital data collection technologies, and analytics and visualization tools. This reinforces the idea that this area is now ready for health informaticians to step in and contribute their expertise, leading the application of informatics strategies to understand environmental health problems.

Conclusions: The featured projects illustrate applications that use exposome data for the investigation of the causes of diseases, health care, patient empowerment, and public health. They offer a rich overview of the expanding range of applications that informatics is finding in the field of environmental health, with potential impact in precision medicine.

Source
http://dx.doi.org/10.1055/s-0040-1701975
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442499
August 2020

Comparative Study of Salivary, Duodenal, and Fecal Microbiota Composition Across Adult Celiac Disease.

J Clin Med 2020 Apr 13;9(4). Epub 2020 Apr 13.

Gastroenterology Unit, Department of Medicine, A.O.U.I. Borgo Roma and University of Verona, 37134 Verona, Italy.

Background: Growing evidence suggests that an altered microbiota composition contributes to the pathogenesis and clinical features in celiac disease (CD). We performed a comparative analysis of the gut microbiota in adulthood CD to evaluate whether: (i) dysbiosis anticipates mucosal lesions, (ii) gluten-free diet restores eubiosis, (iii) refractory CD has a peculiar microbial signature, and (iv) salivary and fecal communities overlap the mucosal one.

Methods: This is a cross-sectional study where a total of 52 CD patients, including 13 active CD, 29 treated CD, 4 refractory CD, and 6 potential CD, were enrolled in a tertiary center together with 31 controls. A 16S rRNA-based amplicon metagenomics approach was applied to determine the microbiota structure and composition of salivary, duodenal mucosa, and stool samples, followed by appropriate bioinformatic analyses.

Results: A reduction of both α- and β-diversity was observed in CD, already evident in the potential form and reaching its nadir in refractory CD. Taxonomically, mucosa displayed a significant abundance of and an expansion of , especially in active patients, while treated celiacs showed an intermediate profile between active disease and controls. The saliva community mirrored the mucosal one better than stool.

Conclusion: Expansion of pathobiontic species anticipates villous atrophy and achieves the maximal divergence from controls in refractory CD. Gluten-free diet results in incomplete recovery. The overlapping results between mucosal and salivary samples indicate the use of saliva as a diagnostic fluid.

Source
http://dx.doi.org/10.3390/jcm9041109
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7231226
April 2020

Deep Learning to Unveil Correlations between Urban Landscape and Population Health.

Sensors (Basel) 2020 Apr 8;20(7). Epub 2020 Apr 8.

Department of Electrical, Computer and Biomedical Engineering, via Ferrata 5, 27100 Pavia, Italy.

The global healthcare landscape is continuously changing as technology advances, leading to a gradual change in lifestyle. Several diseases such as asthma and cardiovascular conditions are becoming more widespread, due to a rise in pollution exposure and a more sedentary lifestyle. Healthcare providers deal with a growing number of new challenges which, thanks to fast-developing big data technologies, can be addressed with systems that provide direct support to citizens. In this context, within the EU-funded Participatory Urban Living for Sustainable Environments (PULSE) project, we are implementing a data analytic platform designed to provide public health decision makers with advanced approaches to jointly analyze maps and geospatial information with healthcare and air pollution data. In this paper we describe a component of this platform, which couples deep learning analysis of urban geospatial images with healthcare indexes collected by the 500 Cities project. By applying a pre-trained deep neural network architecture, satellite images of New York City are analyzed and latent feature variables are extracted. These features are used to derive clusters, which are correlated with healthcare indicators by means of a multivariate classification model. Thanks to this pipeline, it is possible to show that, in New York City, healthcare indexes are significantly correlated with the urban landscape. This pipeline can serve as a basis to ease urban planning, since the same interventions can be organized in similar areas, even if geographically distant.

Source
http://dx.doi.org/10.3390/s20072105
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7181035
April 2020

The Search for Molecular Markers in a Gene-Orphan Case Study of a Pediatric Spinal Cord Pilocytic Astrocytoma.

Cancer Genomics Proteomics 2020 Mar-Apr;17(2):117-130.

Department of Biology and Biotechnology, University of Pavia, Pavia, Italy.

Background/aim: We herein present a case of pediatric spinal cord pilocytic astrocytoma diagnosed on the basis of histopathological and clinical findings.

Materials And Methods: Given the paucity of data on genetic features for this tumor, we performed exome, array CGH and RNA sequencing analysis on nucleic acids isolated from a unique, non-repeatable, very small amount of a formalin-fixed, paraffin-embedded (FFPE) specimen.

Results: DNA mutation analysis, comparing tumor DNA and normal peripheral lymphocyte DNA, revealed a few tumor-specific single nucleotide variants in the DEFB119, MUC5B, NUDT1, LTBP3 and CPSF3L genes. In contrast, tumor DNA did not carry the main pilocytic astrocytoma gene variations, including BRAFV600E. An in-frame trinucleotide insertion involving the DLX6 or lnc DLX6-AS1 genes was scored in 44.9% of sequenced reads; the temporal profile of the effect of this variation on the expression of DLX6-AS1 was investigated in the patient's urine-derived exosomes, reporting no significant variation in the one-year molecular follow-up. Array CGH identified a tumor microdeletion at the 6q25.3 chromosomal region, spanning 1.01 Mb and comprising the ZDHHC14, SNX9, TULP4 and SYTL3 genes. The expression of these genes did not change in urine-derived exosomes during the one-year investigation period. Finally, RNAseq did not reveal any of the common pilocytic BRAF-KIAA1549 gene fusion events.

Conclusion: To our knowledge, the present report is one of the first described gene-orphan case studies of a pediatric spinal cord pilocytic astrocytoma.

Source
http://dx.doi.org/10.21873/cgp.20172
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7078841
September 2020

Dataset on linear and non-linear indices for discriminating healthy and IUGR fetuses.

Data Brief 2020 Apr 29;29:105164. Epub 2020 Jan 29.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100 Pavia, Italy.

The presented collection of data comprises a set of 12 linear and nonlinear indices computed at different time scales and extracted from Fetal Heart Rate (FHR) traces acquired through Hewlett Packard CTG fetal monitors (series 1351A), connected to a PC. The sampling frequency of the recorded FHR signal is 2 Hz. The recorded populations consist of two groups of fetuses: 60 healthy and 60 Intra Uterine Growth Restricted (IUGR) fetuses. IUGR is a fetal condition defined as an abnormal rate of fetal growth. In clinical practice, diagnosis is confirmed at birth and may only be suspected during pregnancy. The pathology is a documented cause of fetal and neonatal morbidity and mortality. The described database was employed in a set of machine learning approaches for the early detection of the IUGR condition: "Integrating machine learning techniques and physiology based heart rate features for antepartum fetal monitoring" [1]. The added value of the proposed indices is their interpretability and close connection to physiological and pathological aspects of FHR regulation. Additional information on data acquisition, feature extraction and potential relevance in clinical practice is discussed in [1].

Source
http://dx.doi.org/10.1016/j.dib.2020.105164
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015997
April 2020

Clustering Cardiovascular Risk Trajectories of Patients with Type 2 Diabetes Using Process Mining.

Annu Int Conf IEEE Eng Med Biol Soc 2019 Jul;2019:341-344.

Patients with type 2 diabetes have a higher chance of developing cardiovascular diseases and increased odds of mortality. The reliability of randomized clinical trials is continuously questioned because of selection, attrition and reporting bias. Moreover, cardiovascular risk is frequently assessed in cross-sectional studies instead of observing the evolution of risk in longitudinal cohorts. In order to correctly assess the course of cardiovascular risk in patients with type 2 diabetes, we applied process mining techniques based on the principles of evidence-based medicine. Using a validated formulation of the cardiovascular risk, process mining made it possible to cluster frequent risk pathways and produced 3 major trajectories related to risk management: high risk, medium risk and low risk. This enables the extraction of meaningful distributions, such as the gender of the patients per cluster, in a human-understandable manner, leading to more insights to improve the management of cardiovascular diseases in type 2 diabetes patients.
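
The idea of grouping patients by frequent risk pathways can be illustrated with a much simplified sketch. It is not the process mining pipeline used in the paper: the thresholds in `risk_level`, the patient identifiers, and the risk scores are all hypothetical, and grouping is done only by identical sequences of risk levels.

```python
from collections import Counter


def risk_level(score, low=0.10, high=0.20):
    """Discretize a risk score; both thresholds are hypothetical."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


def to_trace(scores):
    """Map scores to risk levels and collapse consecutive repeats."""
    trace = []
    for s in scores:
        level = risk_level(s)
        if not trace or trace[-1] != level:
            trace.append(level)
    return tuple(trace)


# Hypothetical longitudinal risk scores, one list per patient
patients = {
    "p1": [0.05, 0.08, 0.12, 0.15],
    "p2": [0.06, 0.11, 0.13],
    "p3": [0.25, 0.30, 0.22],
    "p4": [0.07, 0.09],
    "p5": [0.21, 0.27],
}

# Count how many patients follow each distinct pathway
variants = Counter(to_trace(s) for s in patients.values())
for trace, count in variants.most_common():
    print(" -> ".join(trace), count)
```

Each distinct pathway acts as a cluster label, so per-cluster distributions (for example gender) can then be computed by grouping patients on their trace.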

Source
http://dx.doi.org/10.1109/EMBC.2019.8856507
July 2019

Patient-Generated Health Data Integration and Advanced Analytics for Diabetes Management: The AID-GM Platform.

Sensors (Basel) 2019 Dec 24;20(1). Epub 2019 Dec 24.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.

Diabetes is a high-prevalence disease that leads to an alteration in the patient's blood glucose (BG) values. Several factors influence the subject's BG profile over the day, including meals, physical activity, and sleep. Wearable devices are available for monitoring the patient's BG value around the clock, while activity trackers can be used to record his/her sleep and physical activity. However, few tools are available to jointly analyze the collected data, and only a minority of them provide functionalities for performing advanced and personalized analyses. In this paper, we present AID-GM, a web application that enables the patient to share with his/her diabetologist both the raw BG data collected by a flash glucose monitoring device, and the information collected by activity trackers, including physical activity, heart rate, and sleep. AID-GM provides several data views for summarizing the subject's metabolic control over time, and for complementing the BG profile with the information given by the activity tracker. AID-GM also allows the identification of complex temporal patterns in the collected heterogeneous data. In this paper, we also present the results of a real-world pilot study aimed to assess the usability of the proposed system. The study involved 30 pediatric patients receiving care at the Fondazione IRCCS Policlinico San Matteo Hospital in Pavia, Italy.

Source
http://dx.doi.org/10.3390/s20010128
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6983021
December 2019

Integrating machine learning techniques and physiology based heart rate features for antepartum fetal monitoring.

Comput Methods Programs Biomed 2020 Mar 17;185:105015. Epub 2019 Oct 17.

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100 Pavia, Italy.

Background And Objectives: Intrauterine Growth Restriction (IUGR) is a fetal condition defined as the abnormal rate of fetal growth. The pathology is a documented cause of fetal and neonatal morbidity and mortality. In clinical practice, diagnosis is confirmed at birth and may only be suspected during pregnancy. Therefore, designing an accurate model for the early and prompt identification of pathology in the antepartum period is crucial in view of pregnancy management.

Methods: We tested the performance of 15 machine learning techniques in discriminating healthy versus IUGR fetuses. The various models were trained with a set of 12 physiology based heart rate features extracted from a single antepartum CardioTocographic (CTG) recording. The reason for the utilization of time, frequency, and nonlinear indices is based on their standalone documented ability to describe several physiological and pathological fetal conditions.

Results: We validated our approach on a database of 60 healthy and 60 IUGR fetuses. The machine learning methodology achieving the best performance was Random Forests. Specifically, we obtained a mean classification accuracy of 0.911 (95% confidence interval [0.860, 0.961]) averaged over 10 test sets (10-fold cross-validation). Similar results were provided by Classification Trees, Logistic Regression, and Support Vector Machines. A feature ranking procedure highlighted that nonlinear indices showed the highest capability to discriminate between the considered fetal conditions. Nevertheless, it is the combination of features investigating the CTG signal in different domains that contributes to an increase in classification accuracy.

Conclusions: We provided validation of an accurate artificial intelligence framework for the diagnosis of the IUGR condition in the antepartum period. The employed physiology based heart rate features constitute an interpretable link between the machine learning results and the quantitative estimators of fetal wellbeing.
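
The Results above report a mean accuracy with a 95% confidence interval over 10 cross-validation folds. The abstract does not state how the interval was computed; the sketch below assumes a Student t interval on the per-fold accuracies, which is one common choice. The fold accuracies are hypothetical placeholders, not values from the study, and `mean_with_ci` is an illustrative name.

```python
from statistics import mean, stdev

# Hypothetical per-fold test accuracies; NOT the values from the study
fold_accuracies = [0.92, 0.83, 1.00, 0.92, 0.83, 0.92, 1.00, 0.92, 0.83, 0.92]


def mean_with_ci(values, t_crit=2.262):
    """Return (mean, lower, upper) for a two-sided 95% interval.

    t_crit defaults to the Student t quantile for 9 degrees of freedom,
    so the default is only appropriate for exactly 10 folds.
    """
    m = mean(values)
    half_width = t_crit * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


m, lo, hi = mean_with_ci(fold_accuracies)
print(f"accuracy = {m:.3f} [{lo:.3f}, {hi:.3f}]")
```

Because folds share training data, the per-fold accuracies are not independent, so an interval of this kind is best read as a rough indication of variability rather than an exact coverage guarantee.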

Source
http://dx.doi.org/10.1016/j.cmpb.2019.105015
March 2020

Autologous micrograft accelerates endogenous wound healing response through ERK-induced cell migration.

Cell Death Differ 2020 05 25;27(5):1520-1538. Epub 2019 Oct 25.

Department of Development and Regeneration, Stem Cell Institute, KU Leuven, B-3000, Leuven, Belgium.

Defective cell migration causes delayed wound healing (WH) and chronic skin lesions. Autologous micrograft (AMG) therapies have recently emerged as a new, effective and affordable treatment able to improve wound healing capacity. However, the precise molecular mechanism through which AMG exerts its beneficial effects remains unknown. Herein we show that AMG improves skin re-epithelialization by accelerating the migration of fibroblasts and keratinocytes. More specifically, AMG-treated wounds showed improvement of indispensable events associated with successful wound healing such as granulation tissue formation, organized collagen content, and newly formed blood vessels. We demonstrate that AMG is enriched with a pool of WH-associated growth factors that may provide the starting signal for a faster endogenous wound healing response. This work links the increased cell migration rate to the activation of the extracellular signal-regulated kinase (ERK) signaling pathway, which is followed by an increase in matrix metalloproteinase expression and extracellular enzymatic activity. Overall, we reveal the AMG-mediated wound healing transcriptional signature and shed light on the AMG molecular mechanism, supporting its potential to trigger a highly improved wound healing process. In this way, we present a framework for future improvements in AMG therapy for skin tissue regeneration applications.

Source
http://dx.doi.org/10.1038/s41418-019-0433-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206041
May 2020