Applying under-sampling techniques and cost-sensitive learning methods on risk assessment of breast cancer.

J Med Syst 2015 Apr 25;39(4):210. Epub 2015 Feb 25.

Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, Republic of China.

Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening could significantly reduce breast cancer mortality. However, most screening methods may consume large amounts of resources. We propose a computational model for breast cancer risk assessment that is based solely on personal health information. Our model can serve as a pre-screening program in low-cost settings. The data set used in our study, consisting of 3976 records, was collected from Taipei City Hospital between January 1, 2008 and December 31, 2008. Based on this data set, we first apply sampling techniques and a dimension reduction method to preprocess the testing data. We then construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier achieves a recall (sensitivity) of 100%. At this recall, the precision (positive predictive value, PPV) and specificity of the cost-sensitive random forest were 2.9% and 14.87%, respectively. In summary, we build a breast cancer risk assessment model using data mining techniques, and the model has the potential to serve as an assisting tool in breast cancer screening.
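The title refers to under-sampling, and the abstract does not say which variant was used. As an illustration only, the sketch below shows plain random under-sampling in Python: majority-class (negative) records are randomly dropped until a chosen negative-to-positive ratio is reached. The function name `random_undersample`, the `ratio` and `seed` parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_undersample(records, labels, minority_label=1, ratio=1.0, seed=0):
    """Hypothetical helper: random under-sampling of the majority class.

    Keeps every minority-class record and a random subset of majority-class
    records so that len(majority) is at most ratio * len(minority).
    Assumes binary labels; in practice this is applied to training data only.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority_idx), round(ratio * len(minority_idx)))
    kept_majority = rng.sample(majority_idx, n_keep)  # sample without replacement
    selected = sorted(minority_idx + kept_majority)   # preserve original order
    return [records[i] for i in selected], [labels[i] for i in selected]


if __name__ == "__main__":
    # Made-up toy data (not the study's data): 5 positives among 100 records.
    X = [[i] for i in range(100)]
    y = [1 if i % 20 == 0 else 0 for i in range(100)]
    X_bal, y_bal = random_undersample(X, y, ratio=1.0)
    print(Counter(y), "->", Counter(y_bal))
```

Random under-sampling is the simplest choice because it needs no model; other variants (for example, cluster-based or distance-based selection of negatives) would replace only the `rng.sample` step.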
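The abstract reports recall, precision (PPV), and specificity for a cost-sensitive random forest but does not give its cost matrix. The following is a minimal sketch, assuming a standard minimum-expected-cost decision rule: predict positive when p · C_FN > (1 − p) · C_FP, i.e. when p > C_FP / (C_FP + C_FN). It then computes the three reported metrics from the confusion matrix. The cost values, function names, and probabilities are hypothetical; the scores stand in for a classifier's predicted probabilities and are not outputs of the authors' model.

```python
def cost_sensitive_predict(prob_positive, cost_fp=1.0, cost_fn=10.0):
    """Assumed minimum-expected-cost rule: label 1 when p * cost_fn > (1 - p) * cost_fp."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if p > threshold else 0 for p in prob_positive]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Recall (sensitivity), precision (PPV), and specificity; NaN if undefined."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    nan = float("nan")
    return {
        "recall": tp / (tp + fn) if tp + fn else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    # Made-up labels and predicted probabilities for illustration only.
    y_true = [1, 0, 0, 0, 1, 0]
    probs = [0.30, 0.05, 0.12, 0.60, 0.15, 0.02]
    for c_fn in (1.0, 10.0):  # equal costs vs. missed cancers 10x as costly
        preds = cost_sensitive_predict(probs, cost_fp=1.0, cost_fn=c_fn)
        print(c_fn, preds, screening_metrics(y_true, preds))
```

Raising `cost_fn` lowers the decision threshold, which trades specificity and precision for recall. This is consistent in direction with the reported operating point (100% recall at 14.87% specificity and 2.9% PPV), though the paper's actual cost settings are not stated in the abstract.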


Source: http://dx.doi.org/10.1007/s10916-015-0210-x (April 2015)
