    Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.
    OMICS 2013 Aug 11;17(8):414-22. Epub 2013 Jun 11.
    Biomedical Biotechnology Research Institute, North Carolina Central University, Durham, North Carolina, USA.
    Abstract: Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
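
    The idea of controlling pathway overlap against a user-defined threshold can be illustrated with a minimal sketch. The abstract does not state which overlap measure or merging strategy ReCiPa uses, so everything below is an assumption made for illustration: the overlap coefficient (shared genes divided by the size of the smaller pathway), the greedy merge of the most-overlapping pair, the function names `overlap` and `merge_redundant`, and the toy gene sets.

```python
from itertools import combinations


def overlap(a, b):
    """Overlap coefficient: shared genes divided by the size of the smaller set.

    Assumption: this measure is chosen for illustration only.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways, threshold):
    """Greedily merge the most-overlapping pair of pathways until no pair
    overlaps by more than `threshold`.

    `pathways` maps a pathway name to a set of gene identifiers.
    Returns a new dict; the input is not modified.
    """
    merged = {name: set(genes) for name, genes in pathways.items()}
    while True:
        best_score, best_pair = threshold, None
        # Sorted names make the result deterministic when scores tie.
        for n1, n2 in combinations(sorted(merged), 2):
            score = overlap(merged[n1], merged[n2])
            if score > best_score:
                best_score, best_pair = score, (n1, n2)
        if best_pair is None:
            return merged
        n1, n2 = best_pair
        # Replace the two pathways with their union under a combined name.
        merged[n1 + "+" + n2] = merged.pop(n1) | merged.pop(n2)


# Hypothetical toy data: A and B share 3 of 4 genes; C is unrelated.
pathways = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g2", "g3", "g4", "g5"},
    "C": {"g6", "g7"},
}

controlled = merge_redundant(pathways, threshold=0.5)
print(sorted(controlled))
```

    With a threshold of 0.5 the toy pathways A and B (overlap 0.75) are combined into one gene set while C is kept as-is; raising the threshold to 0.8 leaves all three untouched. A stricter threshold therefore yields fewer, larger gene sets, which is the trade-off a user-defined threshold exposes.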

    Similar Publications

    Integrating multiple 'omics' analysis for microbial biology: application and methodologies.
    Microbiology 2010 Feb 12;156(Pt 2):287-301. Epub 2009 Nov 12.
    Center for Ecogenomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501, USA.
    Recent advances in various 'omics' technologies enable quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, and thus allow determination of their variation between different biological states on a genomic scale. Several popular 'omics' platforms that have been used in microbial systems biology include transcriptomics, which measures mRNA transcript levels; proteomics, which quantifies protein abundance; metabolomics, which determines abundance of small cellular metabolites; interactomics, which resolves the whole set of molecular interactions in cells; and fluxomics, which establishes dynamic changes of molecules within a cell over time. However, no single 'omics' analysis can fully unravel the complexities of fundamental microbial biology.
    IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis.
    BMC Bioinformatics 2012 11;13 Suppl 15:S7. Epub 2012 Sep 11.
    Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA.
    Background: Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, and there is an urgent need for a better way to accurately interpret and distill such large amounts of data. Extensive pathway and network analysis allow for the discovery of highly significant pathways from a set of disease vs. healthy samples in the NGS and GWAS.
    Integrated enrichment analysis and pathway-centered visualization of metabolomics, proteomics, transcriptomics, and genomics data by using the InCroMAP software.
    J Chromatogr B Analyt Technol Biomed Life Sci 2014 Sep 25;966:77-82. Epub 2014 Apr 25.
    Institute for Diabetes Research and Metabolic Diseases of the Helmholtz Centre Munich at the University of Tübingen, Tübingen, Germany; Division of Clinical Chemistry and Pathobiochemistry, Department of Internal Medicine IV, University Hospital Tübingen, Tübingen, Germany; German Center for Diabetes Research (DZD), Germany.
    In systems biology, the combination of multiple types of omics data, such as metabolomics, proteomics, transcriptomics, and genomics, yields more information on a biological process than the analysis of a single type of data. Thus, data from different omics platforms are usually combined in one experimental setup to obtain insight into a biological process or a disease state. In particular, high-accuracy metabolomics data from modern mass spectrometry instruments are increasingly integrated into biological studies.
    Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer.
    BMC Syst Biol 2012 17;6 Suppl 3:S13. Epub 2012 Dec 17.
    Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
    Background: Pathway analysis of large-scale omics data assists us with the examination of the cumulative effects of multiple functionally related genes, which are difficult to detect using the traditional single gene/marker analysis. So far, most of the genomic studies have been conducted in a single domain, e.g.