Gaining confidence in biological interpretation of the microarray data: the functional consistence of the significant GO categories.

Bioinformatics 2008 Jan 15;24(2):265-71. Epub 2007 Nov 15.

Department of Bioinformatics, Bio-pharmaceutical Key Laboratory of Heilongjiang Province-Incubator of State Key Laboratory, Harbin Medical University, Harbin 150086, China.

Motivation: In microarray studies, numerous tools are available for functional enrichment analysis based on GO categories. Because most of these tools require a prior threshold for designating genes as differentially expressed genes (DEGs), they are classed as threshold-dependent methods, and they are often criticized because their results change with the threshold chosen.

Results: In the present article, by considering the inherent correlation structure of the GO categories, a continuous measure based on semantic similarity of GO categories is proposed to investigate the functional consistence (or stability) of threshold-dependent methods. The results from several datasets show that, when overlapping categories are simply counted, the groups of significant categories selected under different DEG thresholds appear very different. However, under the semantic similarity measure proposed in this article, the results are functionally rather consistent over a wide range of DEG thresholds. Moreover, we find that the functional consistence of gene lists ranked by the SAM metric is relatively robust to changes in the DEG threshold.
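
The contrast between counting overlapping categories and scoring semantic similarity can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy hierarchy and its term names are hypothetical (not real GO identifiers), the term-level score is a simple Jaccard index of ancestor sets, and the group-level score is a best-match average. None of this is the measure defined in the article, whose details are not given in the abstract.

```python
# Toy is-a hierarchy (child -> parents). Term names are hypothetical, not real GO IDs.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "lipid_metabolism": ["metabolism"],
    "fatty_acid_metabolism": ["lipid_metabolism"],
    "steroid_metabolism": ["lipid_metabolism"],
    "signaling": ["root"],
}


def ancestors(term):
    """Return the term together with all of its ancestors in the toy hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a, b):
    """Jaccard index of the two ancestor sets (an assumed stand-in measure)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def overlap_fraction(group_a, group_b):
    """Share of categories common to both groups (simple counting)."""
    a, b = set(group_a), set(group_b)
    return len(a & b) / len(a | b)


def group_similarity(group_a, group_b):
    """Best-match average: pair each term with its most similar term in the other group."""
    best_a = [max(term_similarity(a, b) for b in group_b) for a in group_a]
    best_b = [max(term_similarity(a, b) for a in group_a) for b in group_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical significant-category groups obtained under two DEG thresholds.
strict = ["fatty_acid_metabolism"]
lenient = ["steroid_metabolism", "lipid_metabolism"]

print(overlap_fraction(strict, lenient))
print(group_similarity(strict, lenient))
```

In this toy case the two groups share no category, so the overlap fraction is 0, while the similarity score is about 0.7 because all three terms sit in the same branch of the hierarchy. That is the kind of discrepancy the abstract describes: groups that look different by overlap can still be functionally close.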

Availability: Source code in R is available on request from the authors.

DOI: http://dx.doi.org/10.1093/bioinformatics/btm558
