Artif Intell Med 2017 Oct 20;82:1-10. Epub 2017 Sep 20.
Department of Industrial Engineering - Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99-5° andar, Porto Alegre, RS, Brazil. Electronic address:
Colorectal cancer (CRC) a leading cause of death by cancer, and screening programs for its early identification are at the heart of the increasing survival rates. To motivate population participation, non-invasive, accurate, scalable and cost-effective diagnosis methods are required. Blood fluorescence spectroscopy provides rich information that can be used for cancer identification. The main challenges in analyzing blood fluorescence data for CRC classification are related to its high dimensionality and inherent variability, especially when analyzing a small number of samples. In this paper, we present a hierarchical classification method based on plasma fluorescence to identify not only CRC, but also adenomas and other non-malignant colorectal findings that may require further medical investigation. A feature selection algorithm is proposed to deal with the high dimensionality and select discriminant fluorescence wavelengths. These are used to train a binary support vector machine (SVM) in the first level to identify the CRC samples. The remaining samples are then presented to a one-class SVM trained on healthy subjects to detect deviant samples, and thus non-malignant findings. This hierarchical design, together with the one class-SVM, aims to reduce the effects of small samples and high variability. Using a dataset analyzed in previous studies comprised of 12,341 wavelengths, we achieved much superior results. Sensitivity and specificity are 0.87 and 0.95 for CRC detection, and 0.60 and 0.79 for non-malignant findings, respectively. Compared to related work, the proposed method presented a better accuracy, required fewer features, and provides a unified approach that expands CRC detection to non-malignant findings.