Genes sharing the protein family domain decrease the performance of classification with RNA-seq genomic signatures.
Biol Direct 2018 Feb 21;13(1). Epub 2018 Feb 21.
Scientific IT Services, ETH Zurich, Weinbergstrasse 11, Zürich, 8092, Switzerland.
Background: The experience with running various types of classification on the CAMDA neuroblastoma dataset has led us to the conclusion that the results are not always obvious and may differ depending on the type of analysis and the selection of genes used for classification. This paper aims to point out several factors that may influence the downstream machine learning analysis. In particular, those factors are: the type of the primary analysis, the type of the classifier, and the increased correlation between genes sharing a protein domain.
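To make the last factor concrete, the sketch below compares the mean absolute pairwise Pearson correlation within a group of genes against that of an unrelated group. It is only an illustration under stated assumptions: the expression matrix is synthetic (not the CAMDA neuroblastoma data), the "shared domain" group is simulated by adding a common signal, and the helper name `mean_abs_correlation` is hypothetical rather than taken from the paper.

```python
import numpy as np


def mean_abs_correlation(expr, gene_idx):
    """Mean absolute pairwise Pearson correlation among the selected genes.

    expr: array of shape (n_samples, n_genes).
    gene_idx: list of at least two column indices.
    """
    sub = expr[:, gene_idx]
    corr = np.corrcoef(sub, rowvar=False)        # genes x genes matrix
    upper = np.triu_indices(len(gene_idx), k=1)  # each gene pair once
    return float(np.mean(np.abs(corr[upper])))


# Synthetic expression data (assumption, for illustration only).
rng = np.random.default_rng(0)
n_samples = 200

# Five genes that share a common signal, standing in for a shared domain.
shared_signal = rng.normal(size=(n_samples, 1))
domain_genes = shared_signal + 0.5 * rng.normal(size=(n_samples, 5))

# Five independent genes with no common signal.
other_genes = rng.normal(size=(n_samples, 5))

expr = np.hstack([domain_genes, other_genes])

within_domain = mean_abs_correlation(expr, [0, 1, 2, 3, 4])
unrelated = mean_abs_correlation(expr, [5, 6, 7, 8, 9])

print(f"within simulated domain group: {within_domain:.2f}")
print(f"unrelated genes:               {unrelated:.2f}")
```

On real data, the column groups would come from a domain annotation rather than from simulation. Strongly correlated features of this kind carry redundant information, which is one plausible way they could affect a classifier built on the selected genes.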