A systematic literature review of machine learning in online personal health data.

Authors:
Zhijun Yin
Vanderbilt University
Nashville | United States
Lina M Sulieman
Vanderbilt University Medical Center
Nashville | United States
Bradley A Malin
Vanderbilt University
Lake Success | United States

J Am Med Inform Assoc 2019 Jun;26(6):561-576

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Objective: User-generated content (UGC) in online environments provides opportunities to learn about an individual's health status outside of clinical settings. However, the nature of UGC poses challenges for both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations.

Materials And Methods: We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review.

Results: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support.
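Naive Bayes is one of the off-the-shelf models the review counts among reviewed studies. Purely as an illustration of that model family, and not as the method of any reviewed study, the sketch below trains a multinomial naive Bayes classifier on bag-of-words features drawn from a few posts. The posts, the labels `distress` and `neutral`, and the helper names `tokenize`, `train_nb`, and `predict_nb` are hypothetical, invented for this example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical, deliberately simple features: lowercase whitespace tokens.
    return text.lower().split()

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors, likelihoods = {}, {}
    for label, count in class_counts.items():
        # Log prior: fraction of training posts carrying this label.
        priors[label] = math.log(count / len(docs))
        # Smoothed denominator: class token count plus alpha per vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        likelihoods[label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return {"priors": priors, "likelihoods": likelihoods, "vocab": vocab}

def predict_nb(model, text):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, prior in model["priors"].items():
        score = prior
        for w in tokenize(text):
            if w in model["vocab"]:  # out-of-vocabulary tokens are ignored
                score += model["likelihoods"][label][w]
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy posts, not data from any reviewed study.
posts = [
    "feeling hopeless and cannot sleep again",
    "so anxious about everything lately",
    "great run this morning feeling strong",
    "new recipe for dinner turned out great",
]
labels = ["distress", "distress", "neutral", "neutral"]

model = train_nb(posts, labels)
print(predict_nb(model, "cannot sleep feeling anxious"))  # prints "distress"
```

Working in log space avoids numeric underflow when many token probabilities are multiplied. Real studies of UGC typically relied on much richer features and far larger labeled cohorts than this toy setup.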

Conclusions: The systematic review indicated that ML can be effectively applied to UGC to facilitate the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and improving model interpretability.


Source
https://academic.oup.com/jamia/advance-article/doi/10.1093/j
Publisher Site
http://dx.doi.org/10.1093/jamia/ocz009
June 2019
