Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.

Authors:
Yinglei Lai
The George Washington University
United States
Fanni Zhang
The George Washington University
Tapan K Nayak
National Cancer Institute
United States
Reza Modarres
The George Washington University
United States
Norman H Lee
The George Washington University Medical Center

BMC Genomics 2017 Jan 25;18(Suppl 1):1050. Epub 2017 Jan 25.

Department of Medicine, Division of Genomic Medicine, The George Washington University Medical Center, Washington, D.C. 20037, USA.

Background: With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest.

Methods: In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient: its parameter space grows only linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection.

Results: We applied our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). Our purpose was to understand whether any gene sets are enriched in discordant behaviors among these subsets (as the log-ratio increases from negative to positive). We focused on KEGG pathways. The detected pathways will be useful for further understanding the role of gene PNLIP in pancreatic cancer research. Among the top detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. We then considered gene TP53, which is well known for its role as a tumor suppressor in cancer research. Its log-ratio also ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed similar overall results, and the above two pathways remained the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data.
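
The subset construction described in the Results can be illustrated with a minimal Python sketch. It orders matched pairs by the tumor/non-tumor log-ratio of a reference gene and cuts that ordering into contiguous, non-overlapping subsets. The abstract does not state how many subsets were used or how the cut points were chosen, so the near-equal-size split, the base-2 logarithm, the function name split_by_log_ratio, and the toy values below are all assumptions made for illustration, not the authors' procedure.

```python
import math


def split_by_log_ratio(tumor, non_tumor, n_subsets):
    """Split matched pairs into non-overlapping subsets by log-ratio.

    tumor, non_tumor: positive expression values of one reference gene,
    one entry per matched pair (hypothetical inputs).
    n_subsets: number of subsets (an assumption; not given in the abstract).
    Returns a list of subsets, each a list of (pair_index, log_ratio),
    ordered from the most negative to the most positive log-ratio.
    """
    if len(tumor) != len(non_tumor):
        raise ValueError("tumor and non_tumor must have the same length")
    if not 1 <= n_subsets <= len(tumor):
        raise ValueError("n_subsets must be between 1 and the number of pairs")

    # Negative log-ratio: more expressed in non-tumor tissue.
    # Positive log-ratio: more expressed in tumor tissue.
    log_ratios = [math.log2(t / n) for t, n in zip(tumor, non_tumor)]

    # Rank the pairs from most negative to most positive log-ratio.
    order = sorted(range(len(log_ratios)), key=lambda i: log_ratios[i])

    # Cut the ranking into contiguous blocks of near-equal size.
    base, extra = divmod(len(order), n_subsets)
    subsets, start = [], 0
    for k in range(n_subsets):
        size = base + (1 if k < extra else 0)
        subsets.append([(i, log_ratios[i]) for i in order[start:start + size]])
        start += size
    return subsets


# Toy example with six made-up pairs (not data from the study).
toy_tumor = [1.0, 8.0, 2.0, 4.0, 0.5, 16.0]
toy_non_tumor = [4.0, 2.0, 2.0, 1.0, 4.0, 2.0]
for subset in split_by_log_ratio(toy_tumor, toy_non_tumor, 3):
    print([pair_index for pair_index, _ in subset])
```

Each resulting subset can then be treated as its own two-sample (tumor versus non-tumor) data set, giving the series of data sets on which a discordance enrichment analysis would operate.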

Conclusions: This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.


Source
DOI: http://dx.doi.org/10.1186/s12864-016-3265-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310286
January 2017


Similar Publications

Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

BMC Genomics 2014 24;15 Suppl 1:S6. Epub 2014 Jan 24.

Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

November 2014

Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer.

BMC Syst Biol 2012 17;6 Suppl 3:S13. Epub 2012 Dec 17.

Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.

Background: Pathway analysis of large-scale omics data assists us with the examination of the cumulative effects of multiple functionally related genes, which are difficult to detect using the traditional single gene/marker analysis. So far, most of the genomic studies have been conducted in a single domain, e.g.

June 2013

Reproducibility enhancement and differential expression of non predefined functional gene sets in human genome.

BMC Genomics 2014 Dec 24;15:1181. Epub 2014 Dec 24.

Instituto de Física, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500, 91501-970 Porto Alegre, RS, Brazil.

Background: Transcriptogram profiling is a method to present and analyze transcription data in a genome-wide scale that reduces noise and facilitates biological interpretation. An ordered gene list is produced, such that the probability that the genes are functionally associated exponentially decays with their distance on the list. This list presents a biological logic, evinced by the selective enrichment of successive intervals with Gene Ontology terms or KEGG pathways.

December 2014

Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management.

BJU Int 2011 Jul 16;108(2 Pt 2):E29-35. Epub 2011 Mar 16.

Department of Urology, University of Rostock, Rostock, Germany.

Objective: To improve the workflow for standardizing the statistical interpretation provides an opportunity for the analysis of gene expression in clear cell renal cell carcinoma (ccRCC). RCC as a solid tumour entity represents a very suitable tumour model for such investigations. Although it is possible to investigate expression profiles by microarray technologies, the main problem is how to adequately interpret the accumulated mass of data derived from microarray technologies.

July 2011