Comprehensive decision tree models in bioinformatics.

PLoS One 2012;7(3):e33812. Epub 2012 Mar 30.

Faculty of Health Sciences, University of Maribor, Maribor, Slovenia.

Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible.

Methods: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models with the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model; the tuning is constrained exclusively by the dimensions of the produced decision tree.
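
The idea of tuning by tree dimensions alone can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which parameters are adjusted or how the visual boundaries are defined, so `tune_by_size`, `toy_builder`, the `Node` type, and the bounds `max_leaves` and `max_depth` are all hypothetical names chosen for illustration. The loop tries candidate settings in order and accepts the first tree that fits the bounds, without ever computing accuracy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary decision tree node; a node with no children is a leaf."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def depth(node: Node) -> int:
    """Number of edges on the longest root-to-leaf path."""
    children = [c for c in (node.left, node.right) if c is not None]
    return 0 if not children else 1 + max(depth(c) for c in children)


def leaf_count(node: Node) -> int:
    """Number of leaves, used here as a proxy for the drawn width of the tree."""
    children = [c for c in (node.left, node.right) if c is not None]
    return 1 if not children else sum(leaf_count(c) for c in children)


def tune_by_size(build_tree, settings, max_leaves, max_depth):
    """Return the first (setting, tree) whose dimensions fit the bounds.

    Only tree dimensions are inspected; no accuracy or other
    classification performance measure is computed (assumed reading
    of the abstract, not the published procedure).
    """
    for setting in settings:
        tree = build_tree(setting)
        if leaf_count(tree) <= max_leaves and depth(tree) <= max_depth:
            return setting, tree
    return None


def toy_builder(min_leaf_size: int) -> Node:
    """Hypothetical stand-in for a learner: stronger pruning gives a smaller tree."""
    def grow(n_samples: int) -> Node:
        if n_samples < 2 * min_leaf_size:
            return Node()
        return Node(grow(n_samples // 2), grow(n_samples - n_samples // 2))
    return grow(64)


# Settings are ordered from weakest to strongest pruning.
result = tune_by_size(toy_builder, settings=[1, 2, 4, 8, 16],
                      max_leaves=8, max_depth=3)
```

In practice `toy_builder` would be replaced by a call to a real decision tree learner, and the bounds would be derived from the display area in which the tree has to fit.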

Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method than for manual tuning of the decision tree.

Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

Source
http://ri.fzv.uni-mb.si/gstiglic/pdfs/plosone2012.pdf (Web Search)
http://dx.plos.org/10.1371/journal.pone.0033812 (Publisher Site)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0033812 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3316502 (PMC)
November 2012
