Publications by authors named "Siva Sivaganesan"

16 Publications

Evaluation of the Laboratory Risk Indicator for Necrotizing Fasciitis (LRINEC) score for detecting necrotizing soft tissue infections in patients with diabetes and lower extremity infection.

Diabetes Res Clin Pract 2021 Jan 21;171:108520. Epub 2020 Oct 21.

Division of Podiatric Surgery, Department of Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA; Podiatry Department, Cincinnati Veteran Affairs Medical Center, Cincinnati, OH, USA.

Aims: The aim of this pilot study was to assess the ability of the Laboratory Risk Indicator for Necrotizing Fasciitis (LRINEC), a scoring system for Necrotizing Soft Tissue Infections, to diagnose Necrotizing Soft Tissue Infections of the lower extremity in patients with diabetes.

Methods: Sixty-nine patients with lower extremity infections were prospectively enrolled. The Laboratory Risk Indicator for Necrotizing Fasciitis was calculated and logistic regression was performed for each laboratory value.

Results: The Laboratory Risk Indicator for Necrotizing Fasciitis was associated with Necrotizing Soft Tissue Infection diagnosis in patients with diabetes (p = 0.01). Sensitivity, specificity, positive predictive value, and negative predictive value were 100%, 69%, 16.6%, and 100% respectively. Elevated C-reactive protein (OR 1.01, p = 0.02, 95% CI [1.002-1.23]) and white blood cell count (OR 1.34, p < 0.01, 95% CI [1.1-1.7]) were associated with Necrotizing Soft Tissue Infection.

Conclusions: The Laboratory Risk Indicator for Necrotizing Fasciitis was useful as a negative predictor of Necrotizing Soft Tissue Infection, while C-reactive protein and white blood cell count may have value as individual predictors. We recommend a high clinical suspicion of Necrotizing Soft Tissue Infections in patients with diabetes, as laboratory evaluation may be non-specific.
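The four accuracy measures quoted in the Results follow from a 2×2 table of score result against final diagnosis. The sketch below only illustrates that arithmetic; the counts are hypothetical and are not the study's data, and the names `Confusion` and `accuracy_measures` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a 2x2 table: test result vs. true diagnosis (hypothetical)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def accuracy_measures(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases flagged
        "specificity": c.tn / (c.tn + c.fp),  # share of non-cases cleared
        "ppv": c.tp / (c.tp + c.fp),          # P(disease | positive result)
        "npv": c.tn / (c.tn + c.fn),          # P(no disease | negative result)
    }


# Hypothetical counts for illustration only (not from the LRINEC study).
table = Confusion(tp=8, fp=10, fn=2, tn=80)
for name, value in accuracy_measures(table).items():
    print(f"{name}: {value:.3f}")
```

Because PPV depends on how common the condition is among those tested, a score can combine a perfect NPV with a low PPV when true cases are rare, which is the pattern the abstract reports.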

DOI: http://dx.doi.org/10.1016/j.diabres.2020.108520
January 2021

Predicting mechanism of action of cellular perturbations with pathway activity signatures.

Bioinformatics 2020 09;36(18):4781-4788

Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH 45267-0056, USA.

Motivation: Misregulation of signaling pathway activity is etiologic for many human diseases, and modulating the activity of signaling pathways is often the preferred therapeutic strategy. Understanding the mechanism of action (MOA) of bioactive chemicals in terms of targeted signaling pathways is the essential first step in evaluating their therapeutic potential. Changes in signaling pathway activity are often not reflected in changes in the expression of pathway genes, which makes MOA inference from transcriptional signatures (TSes) a difficult problem.

Results: We developed a new computational method for implicating pathway targets of bioactive chemicals and other cellular perturbations by integrated analysis of three inputs: pathway network topology, the Library of Integrated Network-based Cellular Signatures (LINCS) TSes of genetic perturbations of pathway genes, and the TS of the perturbation itself. Our methodology accurately predicts signaling pathways targeted by the perturbation in cases where current pathway analysis approaches that use only the TS of the perturbation fail.

Availability And Implementation: Open source R package paslincs is available at https://github.com/uc-bd2k/paslincs.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa590
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7751003
September 2020

The Library of Integrated Network-Based Cellular Signatures NIH Program: System-Level Cataloging of Human Cells Response to Perturbations.

Cell Syst 2018 01 29;6(1):13-24. Epub 2017 Nov 29.

BD2K-LINCS DCIC, Department of Environmental Health, University of Cincinnati, Cincinnati, OH 45220, USA.

The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders. This Perspective describes LINCS technologies, datasets, tools, and approaches to data accessibility and reusability.

DOI: http://dx.doi.org/10.1016/j.cels.2017.11.001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799026
January 2018

Berkson error adjustment and other exposure surrogates in occupational case-control studies, with application to the Canadian INTEROCC study.

J Expo Sci Environ Epidemiol 2018 05 29;28(3):251-258. Epub 2017 Mar 29.

McLaughlin Centre for Population Health Risk Assessment, University of Ottawa, Ottawa, Ontario, Canada.

Many epidemiological studies assessing the relationship between exposure and disease are carried out without data on individual exposures. When this barrier is encountered in occupational studies, subject exposures are often evaluated with a job-exposure matrix (JEM), which assigns to each occupational category the mean exposure measured in a comparable group of workers. One of the objectives of INTEROCC, a seven-country case-control study of occupational exposure and brain cancer risk, was to investigate the relationship between occupational exposure to electromagnetic fields (EMF) in different frequency ranges and brain cancer risk. In this paper, we use the Canadian data from INTEROCC to estimate the odds of developing brain tumours due to occupational exposure to EMF. The first step was to identify the best EMF exposure surrogate for each occupation in the JEM among the arithmetic mean, the geometric mean, and the mean of the log-normal exposure distribution, by comparison with Berkson error adjustments obtained via numerical approximation of the likelihood function. Contrary to previous studies of Berkson errors in JEMs, we found that the geometric mean was the best exposure surrogate. This analysis provided no evidence that cumulative lifetime exposure to extremely low frequency magnetic fields increases brain cancer risk, a finding consistent with other recent epidemiological studies.
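The three candidate surrogates compared in this study can be computed from per-occupation measurements as sketched below. This is a minimal sketch assuming strictly positive, roughly log-normal measurements; the function name and the data are hypothetical, the log-normal mean uses the standard formula exp(mu + sigma^2/2), and the Berkson-error likelihood adjustment the authors compared against is not shown.

```python
import math
import statistics


def exposure_surrogates(measurements: list[float]) -> dict:
    """Summaries of positive exposure measurements for one job category.

    Assumes strictly positive, roughly log-normal measurements (hypothetical).
    """
    logs = [math.log(x) for x in measurements]
    mu = statistics.mean(logs)          # mean of log-exposure
    sigma2 = statistics.variance(logs)  # sample variance of log-exposure (n - 1)
    return {
        "arithmetic_mean": statistics.mean(measurements),
        "geometric_mean": math.exp(mu),
        # Mean of a log-normal distribution with log-scale parameters mu, sigma2.
        "lognormal_mean": math.exp(mu + sigma2 / 2),
    }


# Hypothetical measurements (e.g. microtesla) for one occupation.
print(exposure_surrogates([1.0, 2.0, 4.0]))
```

For right-skewed data the geometric mean falls below the arithmetic mean, while the log-normal mean adds a term driven by the spread on the log scale, so occupations with large within-job variability are summarized quite differently by the three surrogates.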

DOI: http://dx.doi.org/10.1038/jes.2017.2
May 2018

A Bayesian subgroup analysis using collections of ANOVA models.

Biom J 2017 Jul 20;59(4):746-766. Epub 2017 Mar 20.

Department of Mathematics, University of Texas, Austin, TX, 78712, USA.

We develop a Bayesian approach to subgroup analysis using ANOVA models with multiple covariates, extending earlier work. We assume a two-arm clinical trial with a normally distributed response variable. We also assume that the covariates for subgroup finding are categorical and specified a priori, and that parsimonious, easy-to-interpret subgroups are preferable. We represent the subgroups of interest by a collection of models and use a model selection approach to find subgroups with heterogeneous effects. We develop suitable priors for the model space and use an objective Bayesian approach that yields multiplicity-adjusted posterior probabilities for the models. We use a structured algorithm based on the posterior probabilities of the models to determine which subgroup effects to report. Frequentist operating characteristics of the approach are evaluated by simulation. Although our approach is applicable in more general cases, for ease of presentation we focus mainly on the 2 × 2 case of two covariates, each at two levels. The approach is illustrated using a real data example.

DOI: http://dx.doi.org/10.1002/bimj.201600064
July 2017

Subgroup finding via Bayesian additive regression trees.

Stat Med 2017 07 9;36(15):2391-2403. Epub 2017 Mar 9.

Cincinnati Children's Hospital Medical Center, Cincinnati, OH, U.S.A.

We provide a Bayesian decision-theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows flexible modeling of the response variable irrespective of the potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility, where the expectation is taken with respect to the posterior predictive distribution of the response and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using a simulation study and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.

DOI: http://dx.doi.org/10.1002/sim.7276
July 2017

An objective Bayesian analysis of a crossover design via model selection and model averaging.

Stat Med 2016 11 30;35(25):4509-4527. Epub 2016 Jun 30.

Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, U.S.A.

Inference about the treatment effect in a crossover design has received much attention over time, owing to uncertainty about the existence of a carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty, the existence and size of the carryover effect may depend on the presence and size of the treatment effect. We consider estimating and testing hypotheses about the treatment effect in a two-period crossover design, assuming a normally distributed response variable. We use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists, while accounting for uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.

DOI: http://dx.doi.org/10.1002/sim.7015
November 2016

A probabilistic approach for lateralization of seizure onset zone in drug-resistant epilepsy with bilateral cerebral pathology.

Math Biosci 2016 07 29;277:136-40. Epub 2016 Apr 29.

Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, United States; Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45229, United States; Division of Epidemiology and Biostatistics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, United States.

Background: Lateralization of seizure-onset zone (SOZ) during electroencephalography (EEG) monitoring in people with bilateral potentially epileptogenic lesions is important to facilitate clinical decision making for resective surgery.

Methods: We develop two Bayesian approaches for estimating the number of consecutive ipsilateral seizures required to lateralize the SOZ either to a given lower limit of the 95% credible interval (LLI, assuming a continuous prior distribution) or to a given posterior probability (assuming a mixture of discrete and continuous prior probabilities).

Results: With the estimation approach, if both cerebral hemispheres are a priori equally likely to contain the SOZ, then, using Jeffreys prior, a minimum of 9, 18, and 38 consecutive ipsilateral seizures will yield an LLI of 0.81, 0.90, and 0.95, respectively. If one of the hemispheres is a priori more likely to contain the SOZ, then prior beta distributions with α=3, β=2 and α=4, β=3 will require a minimum of 18 and 24 consecutive ipsilateral seizures, respectively, to yield an LLI of 0.80. In contrast, the testing approach approximates the number of consecutive ipsilateral seizures needed to lateralize the SOZ to a desired posterior probability, given an estimate of the prior probability of a lateralized SOZ. For a prior probability of 0.5, using a uniform prior, the mixture model will require 7, 17, and 37 consecutive ipsilateral seizures to lateralize the SOZ with a posterior probability of 0.8, 0.9, and 0.95, respectively.

Conclusion: Although the reasoning presented here rests on probability theory alone, we hope it may aid clinical decision making and stimulate further validation with actual clinical data.
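The estimation approach rests on beta-binomial conjugacy: after n of n consecutive seizures arise in the same hemisphere, a Beta(α, β) prior on the probability p of that hemisphere becomes a Beta(α + n, β) posterior. The sketch below is an illustration under simplifying assumptions, not the authors' computation: it uses a Uniform(0, 1) = Beta(1, 1) prior, whose posterior Beta(n + 1, 1) has CDF x^(n+1) and hence a closed-form quantile, and it assumes an equal-tailed interval. The Jeffreys and informative beta priors in the abstract would need a numerical beta quantile (for example `scipy.stats.beta.ppf`), so the counts printed here are not expected to match the reported ones. Function names are hypothetical.

```python
def lower_limit_uniform_prior(n: int, cred: float = 0.95) -> float:
    """Lower limit of an equal-tailed credible interval for p after n of n
    consecutive seizures arise from the same hemisphere.

    With a Uniform(0, 1) prior the posterior is Beta(n + 1, 1), whose CDF is
    x**(n + 1), so the lower (1 - cred) / 2 quantile has a closed form.
    """
    tail = (1.0 - cred) / 2.0
    return tail ** (1.0 / (n + 1))


def min_consecutive_seizures(target: float, cred: float = 0.95) -> int:
    """Smallest n whose lower credible limit reaches `target` (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 0
    while lower_limit_uniform_prior(n, cred) < target:
        n += 1
    return n


for target in (0.80, 0.90, 0.95):
    print(target, min_consecutive_seizures(target))
```

The search is a simple linear scan because the lower limit increases monotonically with n; with a closed form one could also solve n + 1 >= ln(tail) / ln(target) directly.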

DOI: http://dx.doi.org/10.1016/j.mbs.2016.04.006
July 2016

Genome-wide signatures of transcription factor activity: connecting transcription factors, disease, and small molecules.

PLoS Comput Biol 2013 5;9(9):e1003198. Epub 2013 Sep 5.

Laboratory for Statistical Genomics and Systems Biology, Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America.

Identifying transcription factors (TF) involved in producing a genome-wide transcriptional profile is an essential step in building a mechanistic model that can explain observed gene expression data. We developed a statistical framework for constructing genome-wide signatures of TF activity, and for using such signatures in the analysis of gene expression data produced by complex transcriptional regulatory programs. Our framework integrates ChIP-seq data and appropriately matched gene expression profiles to identify True REGulatory (TREG) TF-gene interactions. It provides a genome-wide quantification of the likelihood of regulatory TF-gene interaction that can be used either to identify regulated genes or as a genome-wide signature of TF activity. To use ChIP-seq data effectively, we introduce a novel statistical model that integrates information from all binding "peaks" within a 2 Mb window around a gene's transcription start site (TSS), and provides gene-level binding scores and probabilities of regulatory interaction. In the second step, we integrate these binding scores and regulatory probabilities with gene expression data to assess the likelihood of TREG TF-gene interactions. We demonstrate the advantages of the TREG framework in identifying genes regulated by two TFs with widely different distributions of functional binding events (ERα and E2f1). We also show that TREG signatures of TF activity vastly improve our ability to detect the involvement of ERα in producing complex disease-related transcriptional profiles. Through a large study of disease-related transcriptional signatures and transcriptional signatures of drug activity, we demonstrate that the increase in statistical power associated with the use of TREG signatures makes the crucial difference in identifying key targets for treatment and drugs to use for treatment. All methods are implemented in an open-source R package, treg. The package also contains all data used in the analysis, including 494 TREG binding profiles based on ENCODE ChIP-seq data. The treg package can be downloaded at http://GenomicsPortals.org.

DOI: http://dx.doi.org/10.1371/journal.pcbi.1003198
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3764016
April 2014

Generalized random set framework for functional enrichment analysis using primary genomics datasets.

Bioinformatics 2011 Jan 22;27(1):70-7. Epub 2010 Oct 22.

Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA.

Motivation: Functional enrichment analysis using primary genomics datasets is an emerging approach to complement established methods for functional enrichment based on predefined lists of functionally related genes. Currently used methods depend on creating lists of 'significant' and 'non-significant' genes based on ad hoc significance cutoffs. This can lead to loss of statistical power and can introduce biases affecting the interpretation of experimental results.

Results: We developed and validated a new statistical framework, generalized random set (GRS) analysis, for comparing the genomic signatures in two datasets without the need for gene categorization. In our tests, GRS produced correct measures of statistical significance, and it showed dramatic improvement in the statistical power over other methods currently used in this setting. We also developed a procedure for identifying genes driving the concordance of the genomics profiles and demonstrated a dramatic improvement in functional coherence of genes identified in such analysis.

Availability: GRS can be downloaded as part of the R package CLEAN from http://ClusterAnalysis.org/. An online implementation is available at http://GenomicsPortals.org/.
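The key idea in the Results is comparing two genomic signatures without first splitting genes into "significant" and "non-significant". A threshold-free way to score concordance is a linear statistic over all genes whose null mean and variance under random relabelling of genes are known exactly. The sketch below shows that idea only; it is not the published GRS statistic (which is available in the CLEAN package), and the function and argument names are hypothetical.

```python
import math


def permutation_z(weights: list[float], scores: list[float]) -> float:
    """z-score of sum(w_i * s_i) under a uniformly random relabelling of genes.

    weights: gene-level signal from the first dataset (hypothetical input)
    scores:  gene-level signal from the second dataset (hypothetical input)
    Under random permutation of `scores`, the statistic has mean
    N * mean(w) * mean(s) and variance SSw * SSs / (N - 1), where SSw and SSs
    are the sums of squared deviations of the weights and the scores.
    """
    n = len(weights)
    if n != len(scores) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 genes")
    w_bar = sum(weights) / n
    s_bar = sum(scores) / n
    observed = sum(w * s for w, s in zip(weights, scores))
    expected = n * w_bar * s_bar
    ss_w = sum((w - w_bar) ** 2 for w in weights)
    ss_s = sum((s - s_bar) ** 2 for s in scores)
    variance = ss_w * ss_s / (n - 1)
    if variance == 0:
        raise ValueError("constant weights or scores give zero null variance")
    return (observed - expected) / math.sqrt(variance)


# Toy example: genes 0 and 3 score high in both hypothetical datasets.
print(permutation_z([1.0, 0.0, 0.0, 1.0], [2.0, 0.0, 0.0, 2.0]))
```

Because every gene contributes in proportion to its signal, no significance cutoff is needed, which avoids the power loss and cutoff-dependent bias described in the Motivation.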

DOI: http://dx.doi.org/10.1093/bioinformatics/btq593
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025713
January 2011

A semi-parametric Bayesian model for unsupervised differential co-expression analysis.

BMC Bioinformatics 2010 May 7;11:234. Epub 2010 May 7.

Laboratory for Statistical Genomics and Systems Biology, Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati OH 45267-0056, USA.

Background: Differential co-expression analysis is an emerging strategy for characterizing disease-related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims to identify genes that are co-expressed in one set of samples but not in the other.

Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure this model uses to group biological samples is based on the clustering structure of genes within each sample, not on traditional measures of gene expression level similarity. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERalpha regulatory network.

Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.

DOI: http://dx.doi.org/10.1186/1471-2105-11-234
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876132
May 2010

Bayesian hierarchical model for transcriptional module discovery by jointly modeling gene expression and ChIP-chip data.

BMC Bioinformatics 2007 Aug 3;8:283. Epub 2007 Aug 3.

Department of Environmental Health, University of Cincinnati, 3223 Eden Ave, ML 56, Cincinnati, Ohio 45267, USA.

Background: Transcriptional modules (TM) consist of groups of co-regulated genes and the transcription factors (TF) regulating their expression. Two high-throughput (HT) experimental technologies, gene expression microarrays and Chromatin Immuno-Precipitation on Chip (ChIP-chip), are capable of producing data informative about expression regulatory mechanisms on a genome scale. The optimal approach to jointly modeling data generated by these two complementary biological assays, with the goal of identifying and characterizing TMs, is an important open problem in computational biomedicine.

Results: We developed and validated a novel probabilistic model and related computational procedure for identifying TMs by jointly modeling gene expression and ChIP-chip binding data. We demonstrate improved functional coherence of the TMs produced by the new method compared with analyzing expression or ChIP-chip data separately or with alternative approaches for joint analysis. We also demonstrate the ability of the new algorithm to identify novel regulatory relationships not revealed by ChIP-chip data alone. The new computational procedure can be used in much the same way as simple hierarchical clustering, without any special transformation of the data prior to the analysis. The R and C source code implementing our algorithm is incorporated in the R package gimmR, which is freely available at http://eh3.uc.edu/gimm.

Conclusion: Our results indicate that, whenever available, ChIP-chip and expression data should be analyzed within a unified probabilistic modeling framework, which will likely result in improved clusters of co-regulated genes and an improved ability to detect meaningful regulatory relationships. Given its good statistical properties and ease of use, the new computational procedure offers a useful new tool for reconstructing transcriptional regulatory networks.

DOI: http://dx.doi.org/10.1186/1471-2105-8-283
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994961
August 2007

Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments.

BMC Bioinformatics 2006 Dec 19;7:538. Epub 2006 Dec 19.

Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA.

Background: The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework.

Results: We present a novel Bayesian moderated t-statistic, which we show to perform favorably in simulations, in two real dual-channel microarray experiments, and in two controlled single-channel experiments. In simulations, the new method achieved greater power while correctly estimating the true proportion of false positives, and in the analysis of two publicly available "spike-in" experiments, the new method performed favorably compared with all tested alternatives. We also applied our method to two experimental datasets and discuss the additional biological insights it revealed compared with the other methods. The R source code implementing our algorithm is freely available at http://eh3.uc.edu/ibmt.

Conclusion: We use a Bayesian hierarchical normal model to define a novel Intensity-Based Moderated T-statistic (IBMT). The method is completely data-dependent using empirical Bayes philosophy to estimate hyperparameters, and thus does not require specification of any free parameters. IBMT has the strength of balancing two important factors in the analysis of microarray data: the degree of independence of variances relative to the degree of identity (i.e. t-tests vs. equal variance assumption), and the relationship between variance and signal intensity. When this variance-intensity relationship is weak or does not exist, IBMT reduces to a previously described moderated t-statistic. Furthermore, our method may be directly applied to any array platform and experimental design. Together, these properties show IBMT to be a valuable option in the analysis of virtually any microarray experiment.
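The Conclusion describes IBMT as an intensity-aware extension of the moderated t-statistic. A minimal sketch of the underlying moderated-t arithmetic follows, assuming the prior variance s0_sq and prior degrees of freedom d0 have already been estimated. IBMT estimates them by empirical Bayes, with the prior variance depending on intensity; that estimation step is not shown, and the function and parameter names are hypothetical.

```python
import math


def moderated_t(effect: float, s2: float, d: int, s0_sq: float, d0: float,
                unscaled_sd: float = 1.0) -> float:
    """Moderated t-statistic for one gene.

    effect:      estimated log-ratio or difference for the gene
    s2, d:       the gene's own residual variance and its degrees of freedom
    s0_sq, d0:   prior variance and prior degrees of freedom; in an IBMT-style
                 analysis s0_sq would be a function of the gene's mean intensity
    unscaled_sd: design-dependent multiplier of the standard error
                 (e.g. 1 / sqrt(n) for a mean of n replicates)
    """
    # Posterior variance: a d0-weighted compromise between prior and gene-wise variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return effect / (math.sqrt(s2_post) * unscaled_sd)


# Hypothetical gene: ordinary t (d0 = 0) vs. moderated toward a prior variance of 1.
print(moderated_t(effect=2.0, s2=4.0, d=5, s0_sq=1.0, d0=0.0))
print(moderated_t(effect=2.0, s2=4.0, d=5, s0_sq=1.0, d0=5.0))
```

Setting d0 = 0 recovers the ordinary t-statistic, while a large d0 pulls every gene's variance toward s0_sq; this is the balance between gene-wise and pooled variances that the Conclusion describes.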

DOI: http://dx.doi.org/10.1186/1471-2105-7-538
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1781470
December 2006

Effect of lot variability on ultraviolet radiation inactivation kinetics of Cryptosporidium parvum oocysts.

Environ Sci Technol 2005 Jun;39(11):4166-71

Office of the Director, Water Supply and Water Resources Division, National Risk Management Research Laboratory, U.S. Environmental Protection Agency, 26 West Martin Luther King Drive, Cincinnati, Ohio 45268, USA.

Numerous studies have demonstrated the efficiency of ultraviolet (UV) radiation for the inactivation of Cryptosporidium parvum oocysts. In these studies, inactivation is measured as the reduction in oocysts. A primary goal is to estimate the UV radiation required to achieve a high degree of inactivation. Different lots of Cryptosporidium parvum oocysts are used in these studies, and the inactivation rate may vary depending on the lot of oocysts used. The goal of this paper is to account for the error in estimating the amount of inactivation after exposure to UV radiation, and for the effect of lot variability in determining the required UV radiation. A Bayesian approach is used to fit the logistic dose-response model and the UV inactivation kinetic model simultaneously. Oocyst lot variability is incorporated using a hierarchical Bayesian model. Posterior distributions obtained by the Markov chain Monte Carlo method are used to obtain estimates and Bayesian credible intervals for the UV radiation required to achieve a given inactivation level of Cryptosporidium parvum oocysts.

DOI: http://dx.doi.org/10.1021/es0489083
June 2005

Statistical assessment of mediational effects for logistic mediational models.

Stat Med 2004 Sep;23(17):2713-28

Center for Epidemiology and Biostatistics, Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA.

The concept of mediation has broad applications in medical and health studies. Although the statistical assessment of a mediational effect under the normality assumption has been well established in linear structural equation models (SEM), it has not been extended to the general case where normality cannot be assumed. In this paper, we propose to extend the definition of mediational effects through causal inference. The new definition is consistent with that in linear SEM and does not rely on the assumption of normality. Here, we focus our attention on the logistic mediation model, in which all variables involved are binary. Three approaches to estimating mediational effects are investigated: the delta method, the bootstrap, and Bayesian modelling via Monte Carlo simulation. Simulation studies are used to examine the behaviour of the three approaches. Measured by 95 per cent confidence interval (CI) coverage rate and root mean square error (RMSE), the Bayesian method using a non-informative prior outperformed both the bootstrap and the delta method, particularly for small sample sizes. Case studies are presented to demonstrate the application of the proposed method to public health research using a nationally representative database. Extensions of the proposed method to other types of mediational models and to multiple mediators are also discussed.
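For orientation, the classical first-order delta-method (Sobel) standard error for a product-of-coefficients mediated effect a·b is sketched below, where a is the exposure-to-mediator coefficient and b the mediator-to-outcome coefficient. This is the linear-SEM baseline the paper generalizes, not the causal-inference definition it proposes for the all-binary logistic model. The function name and all numbers are hypothetical, and the estimates of a and b are assumed to be uncorrelated.

```python
import math


def sobel(a: float, se_a: float, b: float, se_b: float) -> tuple[float, float, float]:
    """Product-of-coefficients mediated effect, its delta-method SE, and z.

    First-order delta method: Var(a*b) ~= b**2 * Var(a) + a**2 * Var(b),
    assuming the estimates of a and b are uncorrelated.
    """
    effect = a * b
    se = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return effect, se, effect / se


effect, se, z = sobel(a=0.5, se_a=0.1, b=0.4, se_b=0.2)
# Wald-type 95% interval from the normal approximation.
print(f"ab = {effect:.3f}, SE = {se:.4f}, "
      f"95% CI = ({effect - 1.96 * se:.3f}, {effect + 1.96 * se:.3f})")
```

The sampling distribution of a·b is skewed in small samples, so a symmetric delta-method interval can undercover; this is one reason resampling and Bayesian alternatives are worth comparing at small sample sizes.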

DOI: http://dx.doi.org/10.1002/sim.1847
September 2004

Bayesian infinite mixture model based clustering of gene expression profiles.

Bioinformatics 2002 Sep;18(9):1194-206

Center for Genome Information, Department of Environmental Health, University of Cincinnati Medical Center, 3223 Eden Av. ML 56, Cincinnati, OH 45267-0056, USA.

Motivation: The biological significance of results obtained through cluster analyses of gene expression data generated in microarray experiments has been demonstrated in many studies. In this article, we focus on the development of a clustering procedure based on the concept of Bayesian model averaging and a precise statistical model of expression data.

Results: We developed a clustering procedure based on the Bayesian infinite mixture model and applied it to clustering gene expression profiles. Clusters of genes with similar expression patterns are identified from the posterior distribution of clusterings defined implicitly by the stochastic data-generation model. The posterior distribution of clusterings is estimated by a Gibbs sampler. We summarized the posterior distribution of clusterings by calculating posterior pairwise probabilities of co-expression and used the complete linkage principle to create clusters. This approach has several advantages over conventional clustering procedures. The analysis allows for the incorporation of a reasonable probabilistic model for generating data. The method does not require specifying the number of clusters, and the resulting optimal clustering is obtained by averaging over models with all possible numbers of clusters. Expression profiles that are not similar to any other profile are automatically detected, the method incorporates experimental replicates, and it can be extended to accommodate missing data. This approach represents a qualitative shift in the model-based cluster analysis of expression data because it allows the uncertainties involved in model selection to be incorporated into the final assessment of confidence in similarities of expression profiles. We also demonstrated the importance of incorporating information on experimental variability into the clustering model.

Availability: The MS Windows(TM) based program implementing the Gibbs sampler and supplemental material is available at http://homepages.uc.edu/~medvedm/BioinformaticsSupplement.htm
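The summarization step in the Results (posterior pairwise co-expression probabilities followed by complete linkage) can be sketched as below, assuming the Gibbs sampler's cluster assignments are already available as one label vector per draw. The draws, the cut-off, and the use of 1 - p as the distance are assumptions for illustration, the function names are hypothetical, and the sampler itself is not shown.

```python
from itertools import combinations


def coclustering_matrix(samples: list[list[int]]) -> list[list[float]]:
    """Fraction of posterior draws in which each pair of genes shares a cluster."""
    n = len(samples[0])
    p = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Only within-draw label equality is used, so label switching is harmless.
        together = sum(labels[i] == labels[j] for labels in samples)
        p[i][j] = p[j][i] = together / len(samples)
    return p


def complete_linkage(p: list[list[float]], cut: float) -> list[list[int]]:
    """Agglomerate genes by complete linkage on distance 1 - p, stopping at `cut`."""
    clusters = [[i] for i in range(len(p))]

    def dist(a: list[int], b: list[int]) -> float:
        # Complete linkage: the farthest pair of members sets the cluster distance.
        return max(1.0 - p[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min((dist(clusters[x], clusters[y]), x, y)
                      for x, y in combinations(range(len(clusters)), 2))
        if d > cut:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)


draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]  # hypothetical sampler output
print(complete_linkage(coclustering_matrix(draws), cut=0.5))  # [[0, 1], [2, 3]]
```

Complete linkage only merges two groups when every cross pair is sufficiently likely to co-cluster, so a gene that rarely clusters with any other gene stays on its own, in line with the automatic detection of dissimilar profiles noted in the Results.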

DOI: http://dx.doi.org/10.1093/bioinformatics/18.9.1194
September 2002