Predicting outcomes of chronic kidney disease from EMR data based on Random Forest Regression.

Math Biosci 2019 04 12;310:24-30. Epub 2019 Feb 12.

Bioinformatics and Mathematical Biosciences Lab, Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA.

Chronic kidney disease (CKD) is prevalent worldwide, and kidney function is well defined by the estimated glomerular filtration rate (eGFR). The progression of kidney disease can be predicted if future eGFR can be accurately estimated using predictive analytics. In this study, we developed and validated a prediction model of eGFR using data extracted from a regional health system. The dataset includes demographic, clinical, and laboratory information from primary care clinics. The model was built using Random Forest regression and evaluated using goodness-of-fit statistics and discrimination metrics. After data preprocessing, the patient cohort for model development and validation contained 61,740 patients. The final model included eGFR, age, gender, body mass index (BMI), obesity, hypertension, and diabetes, and achieved a mean coefficient of determination of 0.95. The estimated eGFRs were used to classify patients into CKD stages with high macro-averaged and micro-averaged metrics. In conclusion, a model using real-world electronic medical records (EMR) data can accurately predict future kidney function and provide clinical decision support.
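The evaluation steps named in the abstract (coefficient of determination, staging from eGFR, macro- and micro-averaged metrics) can be illustrated with a minimal sketch. It does not fit a Random Forest and is not the authors' code; it only shows how predicted eGFR values might be scored once a model has produced them. The stage cut-offs are assumed KDIGO-style thresholds, since the abstract does not state which ones were used, and all function names and numbers below are hypothetical.

```python
from collections import Counter

# Assumed KDIGO-style eGFR cut-offs (mL/min/1.73 m^2); the abstract does not
# state which staging thresholds the authors used.
STAGE_FLOORS = [(90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")]


def ckd_stage(egfr):
    """Map an eGFR value to a CKD stage label (hypothetical helper)."""
    for floor, label in STAGE_FLOORS:
        if egfr >= floor:
            return label
    return "G5"


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def macro_micro_precision_recall(true_labels, pred_labels):
    """Return macro- and micro-averaged precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = sorted(set(true_labels) | set(pred_labels))

    def ratio(num, den):
        # Treat an undefined ratio (no predictions or no cases) as 0.
        return num / den if den else 0.0

    # Macro: average the per-class scores, weighting every class equally.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in classes) / len(classes)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, then compute one score.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))
    return {"macro_precision": macro_p, "macro_recall": macro_r,
            "micro_precision": micro_p, "micro_recall": micro_r}


# Made-up illustrative values, not data from the study.
observed_egfr = [95.0, 72.0, 50.0, 38.0, 20.0, 10.0]
predicted_egfr = [92.0, 58.0, 52.0, 35.0, 22.0, 12.0]

true_stages = [ckd_stage(v) for v in observed_egfr]
pred_stages = [ckd_stage(v) for v in predicted_egfr]
metrics = macro_micro_precision_recall(true_stages, pred_stages)
r2 = r_squared(observed_egfr, predicted_egfr)
```

Macro-averaging gives each stage equal weight, so a rare stage that is predicted poorly pulls the score down noticeably, whereas micro-averaging pools all patients and is dominated by the common stages. Reporting both, as the abstract does, guards against a model that only performs well on the most frequent stages.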

Source
DOI: http://dx.doi.org/10.1016/j.mbs.2019.02.001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435377
April 2019
