IEEE/ACM Trans Comput Biol Bioinform 2019 Mar 7. Epub 2019 Mar 7.
Feature selection (FS) is one of the fundamental data processing techniques in machine learning, especially for the classification of healthcare data. It is challenging because of the large search space. Binary Particle Swarm Optimization (BPSO) is an efficient evolutionary computation technique that has been widely used for FS. However, in traditional BPSO-based FS schemes, each particle's historical best position and the global best position of the swarm are iteratively updated according to the overall fitness of the particle, without taking into account the fine-grained impact of each dimension of the particle. In addition, the acquisition cost naturally differs from feature to feature, especially for medical data. To address these two issues, this paper proposes the Confidence-based and Cost-effective Feature Selection (CCFS) method using BPSO. First, CCFS improves search effectiveness through a new updating mechanism that explicitly considers the confidence of each feature, which comprises the correlation between the feature and the categories and the frequency with which the feature has historically been selected. Second, the feature cost is deliberately incorporated into the design of the fitness function. CCFS has been evaluated on various public UCI datasets. The experimental results show the effectiveness of the proposed method in terms of accuracy and the cost of the selected features.
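As an illustration of how a feature cost might enter a BPSO fitness function, the following is a minimal Python sketch. It is not the CCFS formulation, which the abstract does not specify: the weighted-sum form, the weight `alpha`, and the function names `cost_aware_fitness` and `update_position` are all assumptions made for this example. The position update shown is the standard BPSO sigmoid rule, not the confidence-based mechanism proposed in the paper.

```python
import math
import random


def cost_aware_fitness(error_rate, mask, costs, alpha=0.9):
    """Hypothetical fitness (lower is better): a weighted sum of the
    classification error and the normalized cost of the selected features.
    The weighted-sum form and alpha are assumptions, not taken from the paper.
    """
    total_cost = sum(costs)
    selected_cost = sum(c for bit, c in zip(mask, costs) if bit)
    cost_ratio = selected_cost / total_cost if total_cost > 0 else 0.0
    return alpha * error_rate + (1.0 - alpha) * cost_ratio


def sigmoid(v):
    """Logistic function mapping a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def update_position(velocity, rng):
    """Standard BPSO position rule: bit d becomes 1 with probability
    sigmoid(velocity[d]), so each bit marks a feature as selected or not.
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


# Usage: a particle selecting features 0 and 2 out of three features.
mask = [1, 0, 1]
costs = [1.0, 2.0, 1.0]  # hypothetical per-feature acquisition costs
fitness = cost_aware_fitness(error_rate=0.2, mask=mask, costs=costs)

rng = random.Random(0)
new_mask = update_position([100.0, -100.0], rng)
```

With this form, two feature subsets that reach the same classification error are ranked by their acquisition cost, so the cheaper subset is preferred. A larger `alpha` shifts the emphasis toward accuracy.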