Publications by authors named "Raghu Machiraju"

60 Publications

Autonomous Computing Materials.

ACS Nano 2021 03 26;15(3):3586-3592. Epub 2021 Feb 26.

Ann and H.J. Smead Aerospace Engineering Sciences, University of Colorado Boulder, Boulder, Colorado 80303, United States.

Conventional materials are reaching their limits in computation, sensing, and data storage capabilities, ushered in by the end of Moore's law, myriad sensing applications, and the continuing exponential rise in worldwide data storage demand. Conventional materials are also limited by the controlled environments in which they must operate, their high energy consumption, and their limited capacity to perform simultaneous, integrated sensing, computation, and data storage and retrieval. In contrast, the human brain is capable of multimodal sensing, complex computation, and both short- and long-term data storage simultaneously, with near instantaneous rate of recall, seamless integration, and minimal energy consumption. Motivated by the brain and the need for revolutionary new computing materials, we recently proposed the data-driven materials discovery framework, Autonomous Computing Materials. This framework aims to mimic the brain's capabilities for integrated sensing, computation, and data storage by programming excitonic, phononic, photonic, and dynamic structural nanoscale materials, without attempting to mimic the unknown implementational details of the brain. If realized, such materials would offer transformative opportunities for distributed, multimodal sensing, computation, and data storage in an integrated manner in biological and other nonconventional environments, including interfacing with biological sensors and computers such as the brain itself.

Source
DOI: http://dx.doi.org/10.1021/acsnano.0c09556
March 2021

Self-organizing maps with variable neighborhoods facilitate learning of chromatin accessibility signal shapes associated with regulatory elements.

BMC Bioinformatics 2021 Jan 30;22(1):35. Epub 2021 Jan 30.

Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA.

Background: Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of these states. However, computational methods to assign chromatin state suffer from the following drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose a novel learning technique built upon the Self-Organizing Map (SOM), Self-Organizing Map with Variable Neighborhoods (SOM-VN), to learn a set of representative shapes from a single, genome-wide, chromatin accessibility dataset to associate with a chromatin state assignment in which a particular RE is prevalent. These shapes can then be used to assign chromatin state using our workflow.

Results: We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of A549 and GM12878 cell lines and two each of H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall is increased when learning shapes using SOM-VN as compared to CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks. We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to PEAS.

Conclusions: Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer state across multiple experiments and phenotypic states.
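For readers unfamiliar with the underlying technique, the following is a minimal sketch of a plain one-dimensional Self-Organizing Map trained on short signal windows. It is not the SOM-VN variant described in the abstract: the shrinking Gaussian neighbourhood, the parameter values, the function names (`train_som`, `best_matching_unit`), and the toy "peaked" and "flat" windows are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(units, x):
    """Index of the unit whose weights are closest to x (squared Euclidean)."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in units]
    return dists.index(min(dists))


def train_som(signals, n_units=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D self-organizing map on equal-length signal windows."""
    rng = random.Random(seed)
    # Initialise each unit's weight vector as a copy of a random training signal.
    units = [list(rng.choice(signals)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.1)    # neighbourhood shrinks over time
        for x in rng.sample(signals, len(signals)):  # present signals in random order
            bmu = best_matching_unit(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood on the 1-D grid of unit indices.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return units


# Toy accessibility windows (made-up values): two peaked, two flat.
peaked = [[0.0, 1.0, 5.0, 1.0, 0.0], [0.1, 0.9, 4.8, 1.1, 0.0]]
flat = [[1.0, 1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 1.0, 1.1]]
som = train_som(peaked + flat)
```

Each trained unit is a representative shape. Calling `best_matching_unit(som, window)` on a new window returns the index of the learned shape it most resembles, which is the step a workflow could then link to a chromatin state.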

Source
DOI: http://dx.doi.org/10.1186/s12859-021-03976-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7847148
January 2021

Spatial cell type composition in normal and Alzheimers human brains is revealed using integrated mouse and human single cell RNA sequencing.

Sci Rep 2020 10 22;10(1):18014. Epub 2020 Oct 22.

Medical and Molecular Genetics, Indiana University Purdue University Indianapolis, HITS 5015, 410 W. 10th St., Indianapolis, IN, 46202, USA.

Single-cell RNA sequencing (scRNA-seq) resolves heterogeneous cell populations in tissues and helps to reveal single-cell level function and dynamics. In neuroscience, the rarity of brain tissue is the bottleneck for such studies. Evidence shows that mouse and human share similar cell type gene markers. We hypothesized that the scRNA-seq data of mouse brain tissue can be used to complete human data to infer cell type composition in human samples. Here, we supplement cell type information of human scRNA-seq data with mouse data. The resulting data were used to infer the spatial cellular composition of 3702 human brain samples from the Allen Human Brain Atlas. We then mapped the cell types back to corresponding brain regions. Most cell types were localized to the correct regions. We also compared the mapping results to those derived from neuronal nuclei locations. They were consistent after accounting for changes in neural connectivity between regions. Furthermore, we applied this approach to Alzheimer's brain data and successfully captured cell pattern changes in AD brains. We believe this integrative approach can solve the sample rarity issue in neuroscience.

Source
DOI: http://dx.doi.org/10.1038/s41598-020-74917-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582925
October 2020

Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources.

Metabolites 2020 May 15;10(5). Epub 2020 May 15.

Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA.

As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.

Source
DOI: http://dx.doi.org/10.3390/metabo10050202
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7281435
May 2020

Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge.

BMC Bioinformatics 2019 Dec 20;20(Suppl 24):669. Epub 2019 Dec 20.

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.

Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data is not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking.

Results: We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forests, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches.

Conclusions: In addition to benchmarking aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
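The evaluation measure mentioned in the Results, a median correlation between ground-truth and predicted protein abundances, can be sketched as follows. This is an illustrative computation only, not the challenge's scoring code: the choice of Pearson correlation, the function names, and the three toy proteins with their values are assumptions.

```python
import math
from statistics import median


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def median_correlation(truth, predicted):
    """Median, over proteins, of the truth-versus-prediction correlation."""
    return median(pearson(truth[p], predicted[p]) for p in truth)


# Hypothetical abundances for three proteins across four samples.
truth = {"P1": [1, 2, 3, 4], "P2": [1, 2, 3, 4], "P3": [1, 2, 3, 4]}
predicted = {"P1": [2, 4, 6, 8], "P2": [4, 3, 2, 1], "P3": [1, 3, 2, 4]}
score = median_correlation(truth, predicted)
```

Taking the median rather than the mean keeps a few very well or very poorly predicted proteins from dominating the summary, which matters when, as the abstract notes, the best-predicted proteins differ between approaches.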

Source
DOI: http://dx.doi.org/10.1186/s12859-019-3253-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923881
December 2019

PTR Explorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics.

Pac Symp Biocomput 2020 ;25:475-486

Dept. of Computer Science and Engineering, The Ohio State University, 2015 Neil Ave, Columbus, OH, USA.

Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we ideate an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and corresponding proteins. The proposed workflow utilizes the intuitive technique of scatterplot diagnostics or scagnostics, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. Our study verifies the efficacy of the implemented approach to excavate possible regulatory mechanisms by utilizing comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analyses methodologies such as mixture modeling, and mapping miRNA post-transcriptional effects to explore each mechanism further. This work suggests that the proposed methodology has the potential for discovering and categorizing post-transcriptional regulatory mechanisms, manifesting in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation. (Supplementary Material - https://github.com/arunima2/PTRE_PSB_2020).

March 2021

Visual Exploration of Neural Document Embedding in Information Retrieval: Semantics and Feature Selection.

IEEE Trans Vis Comput Graph 2019 06 15;25(6):2181-2192. Epub 2019 Mar 15.

Neural embeddings are widely used in language modeling and feature generation with superior computational power. Particularly, neural document embedding - converting texts of variable-length to semantic vector representations - has been shown to benefit widespread downstream applications, e.g., information retrieval (IR). However, the black-box nature makes it difficult to understand how the semantics are encoded and employed. We propose visual exploration of neural document embedding to gain insights into the underlying embedding space, and promote the utilization in prevalent IR applications. In this study, we take an IR application-driven view, which is further motivated by biomedical IR in healthcare decision-making, and collaborate with domain experts to design and develop a visual analytics system. This system visualizes neural document embeddings as a configurable document map and enables guidance and reasoning; facilitates exploration of the neural embedding space and identification of salient neural dimensions (semantic features) per task and domain interest; and supports advisable feature selection (semantic analysis) along with instant visual feedback to promote IR performance. We demonstrate the usefulness and effectiveness of this system and present inspiring findings in use cases. This work will help designers/developers of downstream applications gain insights and confidence in neural document embedding, and exploit that to achieve more favorable performance in application domains.

Source
DOI: http://dx.doi.org/10.1109/TVCG.2019.2903946
June 2019

Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility.

Pac Symp Biocomput 2019 ;24:208-219

Computer Science and Engineering, The Ohio State University, 2015 Neil Ave Columbus, OH 43210, USA.

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM) have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, a unit of software that packages up code and all of its dependencies, to be run on the cloud. Despite their incredible value for providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the potential impact of benchmark challenges. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, due to lack of specifics, ambiguity in tools and parameters as well as problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, as the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results). 
WINGS uses a component driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves to be especially critical in bioinformatics workflows where using default or incorrect parameter values is prone to drastically altering results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud based setup, which stores data, dependencies and workflows for easy sharing and utility. It also has the ability to scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417805
August 2019

Imitating Pathologist Based Assessment With Interpretable and Context Based Neural Network Modeling of Histology Images.

Biomed Inform Insights 2018 31;10:1178222618807481. Epub 2018 Oct 31.

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.

Convolutional neural networks (CNNs) have gained steady popularity as a tool to perform automatic classification of whole slide histology images. While CNNs have proven to be powerful classifiers in this context, they fail to explain this classification, as the network engineered features used for modeling and classification are only interpretable by the CNNs themselves. This work aims at enhancing a traditional neural network model to perform histology image modeling, patient classification, and interpretation of the distinctive features identified by the network within the histology whole slide images (WSIs). We synthesize a workflow which (a) intelligently samples the training data by automatically selecting only image areas that display visible disease-relevant tissue state and (b) isolates regions most pertinent to the trained CNN prediction and translates them to observable and qualitative features such as color, intensity, cell and tissue morphology and texture. We use the Cancer Genome Atlas's Breast Invasive Carcinoma (TCGA-BRCA) histology dataset to build a model predicting patient attributes (disease stage and node status) and the tumor proliferation challenge (TUPAC 2016) breast cancer histology image repository to help identify disease-relevant tissue state (mitotic activity). We find that our enhanced CNN based workflow both increased patient attribute predictive accuracy (~2% increase for disease stage and ~10% increase for node status) and experimentally proved that a data-driven CNN histology model predicting breast invasive carcinoma stages is highly sensitive to features such as color, cell size and shape, granularity, and uniformity. This work summarizes the need for understanding the widely trusted models built using deep learning and adds a layer of biological context to a technique that functioned as a classification only approach until now.

Source
DOI: http://dx.doi.org/10.1177/1178222618807481
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236488
October 2018

Integrative cancer patient stratification via subspace merging.

Bioinformatics 2019 05;35(10):1653-1659

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.

Motivation: Technologies that generate high-throughput omics data are flourishing, creating enormous, publicly available repositories of multi-omics data. As many data repositories continue to grow, there is an urgent need for computational methods that can leverage these data to create comprehensive clusters of patients with a given disease.

Results: Our proposed approach creates a patient-to-patient similarity graph for each data type as an intermediate representation of each omics data type and merges the graphs through subspace analysis on a Grassmann manifold. We hypothesize that this approach generates more informative clusters by preserving the complementary information from each level of omics data. We applied our approach to The Cancer Genome Atlas (TCGA) breast cancer dataset and show that by integrating gene expression, microRNA and DNA methylation data, our proposed method can produce clinically useful subtypes of breast cancer. We then investigate the molecular characteristics underlying these subtypes. We discover a highly expressed cluster of genes on chromosome 19p13 that strongly correlates with survival in TCGA breast cancer patients and validate these results in three additional breast cancer datasets. We also compare our approach with previous integrative clustering approaches and obtain comparable or superior results.

Availability And Implementation: https://github.com/michaelsharpnack/GrassmannCluster.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty866
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513164
May 2019

Proteogenomic Analysis of Surgically Resected Lung Adenocarcinoma.

J Thorac Oncol 2018 10 11;13(10):1519-1529. Epub 2018 Jul 11.

Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio.

Introduction: Despite apparently complete surgical resection, approximately half of resected early-stage lung cancer patients relapse and die of their disease. Adjuvant chemotherapy reduces this risk by only 5% to 8%. Thus, there is a need for better identifying who benefits from adjuvant therapy, the drivers of relapse, and novel targets in this setting.

Methods: RNA sequencing and liquid chromatography/liquid chromatography-mass spectrometry proteomics data were generated from 51 surgically resected non-small cell lung tumors with known recurrence status.

Results: We present a rationale and framework for the incorporation of high-content RNA and protein measurements into integrative biomarkers and show the potential of this approach for predicting risk of recurrence in a group of lung adenocarcinomas. In addition, we characterize the relationship between mRNA and protein measurements in lung adenocarcinoma and show that it is outcome specific.

Conclusions: Our results suggest that mRNA and protein data possess independent biological and clinical importance, which can be leveraged to create higher-powered expression biomarkers.

Source
DOI: http://dx.doi.org/10.1016/j.jtho.2018.06.025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7135954
October 2018

Visualizing Article Similarities via Sparsified Article Network and Map Projection for Systematic Reviews.

Stud Health Technol Inform 2017 ;245:422-426

Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.

Systematic Reviews (SRs) of biomedical literature summarize evidence from high-quality studies to inform clinical decisions, but are time and labor intensive due to the large number of article collections. Article similarities established from textual features have been shown to assist in the identification of relevant articles, thus facilitating the article screening process efficiently. In this study, we visualized article similarities to extend its utilization in practical settings for SR researchers, aiming to promote human comprehension of article distributions and hidden patterns. To prompt an effective visualization in an interpretable, intuitive, and scalable way, we implemented a graph-based network visualization with three network sparsification approaches and a distance-based map projection via dimensionality reduction. We evaluated and compared three network sparsification approaches and the visualization types (article network vs. article map). We demonstrated the effectiveness in revealing article distribution and exhibiting clustering patterns of relevant articles with practical meanings for SRs.

June 2018

Building trans-omics evidence: using imaging and 'omics' to characterize cancer profiles.

Pac Symp Biocomput 2018 ;23:377-387

Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA.

Utilization of single modality data to build predictive models in cancer results in a rather narrow view of most patient profiles. Some clinical facets relate strongly to histology image features, e.g. tumor stages, whereas others are associated with genomic and proteomic variations (e.g. cancer subtypes and disease aggression biomarkers). We hypothesize that there are coherent "trans-omics" features that characterize varied clinical cohorts across multiple sources of data leading to more descriptive and robust disease characterization. In this work, for 105 breast cancer patients from the TCGA (The Cancer Genome Atlas), we consider four clinical attributes (AJCC Stage, Tumor Stage, ER-Status and PAM50 mRNA Subtypes), and build predictive models using three different modalities of data (histopathological images, transcriptomics and proteomics). We then identify critical multi-level features that drive successful classification of patients for the various cohorts. To build predictors for each data type, we employ widely used "best practice" techniques including CNN-based (convolutional neural network) classifiers for histopathological images and regression models for proteogenomic data. While, as expected, histology images outperformed molecular features when predicting cancer stages, and transcriptomics held superior discriminatory power for ER-Status and PAM50 subtypes, there exist a few cases where all data modalities exhibited comparable performance. Further, we also identified sets of key genes and proteins whose expression and abundance correlate across each clinical cohort including (i) tumor severity and progression (incl. GABARAP), (ii) ER-status (incl. ESR1) and (iii) disease subtypes (incl. FOXC1). Thus, we quantitatively assess the efficacy of different data types to predict critical breast cancer patient attributes and improve disease characterization.

August 2018

Predictive models for pressure ulcers from intensive care unit electronic health records using Bayesian networks.

BMC Med Inform Decis Mak 2017 Jul 5;17(Suppl 2):65. Epub 2017 Jul 5.

Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA.

Background: We develop predictive models enabling clinicians to better understand and explore patient clinical data along with risk factors for pressure ulcers in intensive care unit patients from electronic health record data. Identifying accurate risk factors of pressure ulcers is essential to determining appropriate prevention strategies; in this work we examine medication, diagnosis, and traditional Braden pressure ulcer assessment scale measurements as patient features. In order to predict pressure ulcer incidence and better understand the structure of related risk factors, we construct Bayesian networks from patient features. Bayesian network nodes (features) and edges (conditional dependencies) are simplified with statistical network techniques. Upon reviewing a network visualization of our model, our clinician collaborators were able to identify strong relationships between risk factors widely recognized as associated with pressure ulcers.

Methods: We present a three-stage framework for predictive analysis of patient clinical data: 1) Developing electronic health record feature extraction functions with assistance of clinicians, 2) simplifying features, and 3) building Bayesian network predictive models. We evaluate all combinations of Bayesian network models from different search algorithms, scoring functions, prior structure initializations, and sets of features.

Results: From the EHRs of 7,717 ICU patients, we construct Bayesian network predictive models from 86 medication, diagnosis, and Braden scale features. Our model not only identifies known and suspected high PU risk factors, but also substantially increases the sensitivity of the prediction, to nearly three times that of logistic regression models, without sacrificing the overall accuracy. We visualize a representative model with which our clinician collaborators identify strong relationships between risk factors widely recognized as associated with pressure ulcers.

Conclusions: Given the strong adverse effect of pressure ulcers on patients and the high cost for treating pressure ulcers, our Bayesian network based model provides a novel framework for significantly improving the sensitivity of the prediction model. Thus, when the model is deployed in a clinical setting, the caregivers can suitably respond to conditions likely associated with pressure ulcer incidence.
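The distinction between sensitivity and overall accuracy drawn in this abstract can be made concrete with a small calculation. The sketch below uses made-up labels for ten hypothetical patients (1 = pressure ulcer, 0 = none) and two hypothetical classifiers; the function name and all values are assumptions for illustration and do not come from the study.

```python
def sensitivity_and_accuracy(y_true, y_pred):
    """Return (sensitivity, accuracy) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly cleared
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, accuracy


# Ten hypothetical patients, three of whom develop a pressure ulcer.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
conservative = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags few patients, misses two cases
sensitive = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # flags every case, with two false alarms

sens_a, acc_a = sensitivity_and_accuracy(y_true, conservative)
sens_b, acc_b = sensitivity_and_accuracy(y_true, sensitive)
```

In this toy case both classifiers have the same accuracy (0.8), while the second has three times the sensitivity (1.0 versus about 0.33). This shows why accuracy alone can hide missed cases when the positive class is rare.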

Source
DOI: http://dx.doi.org/10.1186/s12911-017-0471-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506589
July 2017

Analysis of live cell images: Methods, tools and opportunities.

Methods 2017 02 27;115:65-79. Epub 2017 Feb 27.

Institute of Biomedical Engineering, Department of Engineering Science, Old Road Campus Research Building, University of Oxford, Headington, Oxford OX3 7DQ, United Kingdom; Ludwig Institute for Cancer Research, University of Oxford, Nuffield Department of Medicine, Old Road Campus Research Building, Oxford OX3 7DQ, United Kingdom; Target Discovery Institute, NDM Research Building, University of Oxford, Old Road Campus, Headington OX3 7FZ, United Kingdom.

Advances in optical microscopy, biosensors and cell culturing technologies have transformed live cell imaging. Thanks to these advances, live cell imaging plays an increasingly important role in basic biology research as well as at all stages of drug development. Image analysis methods are needed to extract quantitative information from these vast and complex data sets. The aim of this review is to provide an overview of available image analysis methods for live cell imaging, in particular the required preprocessing, image segmentation, cell tracking and data visualisation methods. The potential opportunities provided by recent advances in machine learning, especially deep learning, and computer vision are discussed. This review includes an overview of the different available software packages and toolkits.

Source
DOI: http://dx.doi.org/10.1016/j.ymeth.2017.02.007
February 2017

annoPeak: a web application to annotate and visualize peaks from ChIP-seq/ChIP-exo-seq.

Bioinformatics 2017 May;33(10):1570-1571

Department of Molecular Virology, Immunology and Medical Genetics College of Medicine, The Ohio State University, Columbus, OH, USA.

Summary: We developed annoPeak, a web application to annotate, visualize and compare predicted protein-binding regions derived from ChIP-seq/ChIP-exo-seq experiments using human and mouse cells. Users can upload peak regions from multiple experiments onto the annoPeak server to annotate them with biological context, identify associated target genes and categorize binding sites with respect to gene structure. Users can also compare multiple binding profiles intuitively with the help of visualization tools and tables provided by annoPeak. In general, annoPeak will help users identify patterns of genome wide transcription factor binding profiles, assess binding profiles in different biological contexts and generate new hypotheses.

Availability And Implementation: The web service is freely accessible through URL: http://ccc-annopeak.osumc.edu/annoPeak . Source code is available at https://github.com/XingTang2014/annoPeak .

Contact: gustavo.leone@osumc.edu or kun.huang@osumc.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx016
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860050
May 2017

Identification of recurrent combinatorial patterns of chromatin modifications at promoters across various tissue types.

BMC Bioinformatics 2016 Dec 23;17(Suppl 17):534. Epub 2016 Dec 23.

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA.

Background: Identification and analysis of recurrent combinatorial patterns of multiple chromatin modifications provide invaluable information for understanding epigenetic regulations. Furthermore, as more data becomes available, it is computationally expensive and unnecessary to study combinatorial patterns of all modifications.

Methods: A novel framework is proposed to investigate recurrent combinatorial patterns of a subset of quantitatively selected chromatin modifications. The framework is based on hierarchical clustering and selects subsets of chromatin modifications that form distinct recurrent patterns at regulatory regions. The identified recurrent combinatorial patterns can be further utilized to discover novel regulatory regions. The data are genome-wide maps of histone acetylations, methylations, and histone variant of human skeletal muscular and B-lymphocyte cells, both derived from the ENCODE project.

Results: A case study conducted at promoter regions is presented: four out of twelve chromatin modifications were selected, eight different promoter states were identified, and the identified patterns of active promoters were further utilized to discover novel promoter regions. Several previously unannotated promoters were discovered, and further investigations confirmed their promoter functions.

Conclusions: This framework is appropriately general and could lead to a better understanding of epigenetic regulations by discovering previously unknown regulatory regions.

Source
DOI: http://dx.doi.org/10.1186/s12859-016-1346-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259941
December 2016

E2f8 mediates tumor suppression in postnatal liver development.

J Clin Invest 2016 08 25;126(8):2955-69. Epub 2016 Jul 25.

E2F-mediated transcriptional repression of cell cycle-dependent gene expression is critical for the control of cellular proliferation, survival, and development. E2F signaling also interacts with transcriptional programs that are downstream of genetic predictors for cancer development, including hepatocellular carcinoma (HCC). Here, we evaluated the function of the atypical repressor genes E2f7 and E2f8 in adult liver physiology. Using several loss-of-function alleles in mice, we determined that combined deletion of E2f7 and E2f8 in hepatocytes leads to HCC. Temporal-specific ablation strategies revealed that E2f8's tumor suppressor role is critical during the first 2 weeks of life, which correspond to a highly proliferative stage of postnatal liver development. Disruption of E2F8's DNA binding activity phenocopied the effects of an E2f8 null allele and led to HCC. Finally, a profile of chromatin occupancy and gene expression in young and tumor-bearing mice identified a set of shared targets for E2F7 and E2F8 whose increased expression during early postnatal liver development is associated with HCC progression in mice. Increased expression of E2F8-specific target genes was also observed in human liver biopsies from HCC patients compared to healthy patients. In summary, these studies suggest that E2F8-mediated transcriptional repression is a critical tumor suppressor mechanism during postnatal liver development.

Source
DOI: http://dx.doi.org/10.1172/JCI85506
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4966321
August 2016

Single-Cell Co-expression Analysis Reveals Distinct Functional Modules, Co-regulation Mechanisms and Clinical Outcomes.

PLoS Comput Biol 2016 Apr 21;12(4):e1004892. Epub 2016 Apr 21.

The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America.

Co-expression analysis has been employed to predict gene function, identify functional modules, and determine tumor subtypes. Previous co-expression analysis was mainly conducted at bulk tissue level. It is unclear whether co-expression analysis at the single-cell level will provide novel insights into transcriptional regulation. Here we developed a computational approach to compare glioblastoma expression profiles at the single-cell level with those obtained from bulk tumors. We found that the co-expressed genes observed in single cells and bulk tumors have little overlap and show distinct characteristics. The co-expressed genes identified in bulk tumors tend to have similar biological functions, and are enriched for intrachromosomal interactions with synchronized promoter activity. In contrast, single-cell co-expressed genes are enriched for known protein-protein interactions, and are regulated through interchromosomal interactions. Moreover, gene members of some protein complexes are co-expressed only at the bulk level, while those of other complexes are co-expressed at both single-cell and bulk levels. Finally, we identified a set of co-expressed genes that can predict the survival of glioblastoma patients. Our study highlights that comparative analyses of single-cell and bulk gene expression profiles enable us to identify functional modules that are regulated at different levels and hold great translational potential.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1004892
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4839722
April 2016

Identify Critical Genes in Development with Consistent H3K4me2 Patterns across Multiple Tissues.

IEEE/ACM Trans Comput Biol Bioinform 2015 Sep-Oct;12(5):1104-11

Histone modification is an important epigenetic event which plays essential roles in cell differentiation and tissue development. Recent studies show that a unique distribution pattern of dimethylation of the lysine 4 residue on histone 3 (H3K4me2) around transcription starting sites (TSS) of genes marks tissue specific genes in human CD4+ T cells and mouse nervous tissue cells. However, the existence of this pattern has not been widely tested and its implication remains unclear. In this paper, we study the H3K4me2 distribution patterns across six different cell lines from five major tissue types (including muscular tissue, nervous tissue, non-blood connective tissue, blood, and epithelial tissue) as well as embryonic stem cells. We define a metric ‘tail length’ to quantitatively describe H3K4me2 distribution patterns around the TSS. While confirming the previous observations, we also identified a group of 217 genes with ubiquitous long-tail H3K4me2 patterns in all the tested tissues and the embryonic stem cells (ESC). Further analyses confirmed that these genes are critical for development, and highly interactive with other tissue specific genes as evinced by protein-protein interaction networks, suggesting their critical regulatory functions. Our results suggest that rich information on gene functions and epigenetic events can be revealed using pattern recognition methods.

Source
DOI: http://dx.doi.org/10.1109/tcbb.2015.2430340
July 2016

GRAPHIE: graph based histology image explorer.

BMC Bioinformatics 2015 13;16 Suppl 11:S10. Epub 2015 Aug 13.

Background: Histology images comprise one of the important sources of knowledge for phenotyping studies in systems biology. However, the annotation and analyses of histological data have remained a manual, subjective and relatively low-throughput process.

Results: We introduce Graph based Histology Image Explorer (GRAPHIE), a visual analytics tool to explore, annotate and discover potential relationships in histology image collections within a biologically relevant context. The design of GRAPHIE is guided by domain experts' requirements and well-known InfoVis mantras. By representing each image with informative features and then visualizing the image collection with a graph, GRAPHIE allows users to effectively explore the image collection. The features were designed to capture localized morphological properties in the given tissue specimen. More importantly, users can perform feature selection in an interactive way to improve the visualization of the image collection and the overall annotation process. Finally, the annotation allows for a better prospective examination of datasets, as demonstrated in the user study. Thus, our design of GRAPHIE allows users to navigate and explore large collections of histology image datasets.

Conclusions: We demonstrated the usefulness of our visual analytics approach through two case studies. Both cases showed efficient annotation and analysis of histology image collections.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-16-S11-S10
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547152
March 2016

Predictive Modeling for Pressure Ulcers from Intensive Care Unit Electronic Health Records.

AMIA Jt Summits Transl Sci Proc 2015 25;2015:82-6. Epub 2015 Mar 25.

Department of Computer Science and Engineering, The Ohio State University.

Our goal in this study is to find risk factors associated with Pressure Ulcers (PUs) and to develop predictive models of PU incidence. We focus on Intensive Care Unit (ICU) patients since patients admitted to ICU have shown higher incidence of PUs. The most common PU incidence assessment tool is the Braden scale, which sums up six subscale features. In an ICU setting, its known drawbacks include omission of important risk factors, use of subscale features not significantly associated with PU incidence, and yielding too many false positives. To improve on this, we extract medication and diagnosis features from patient EHRs. Studying Braden, medication, and diagnosis features and combinations thereof, we evaluate six types of predictive models and find that diagnosis features significantly improve the models' predictive power. The best models combine Braden and diagnosis. Finally, we report the top diagnosis features which, compared to Braden, improve AUC by 10%.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525237
August 2015

Redeployment of Myc and E2f1-3 drives Rb-deficient cell cycles.

Nat Cell Biol 2015 Aug 20;17(8):1036-48. Epub 2015 Jul 20.

[1] Department of Molecular Virology, Immunology and Medical Genetics, College of Medicine, The Ohio State University, Columbus, Ohio 43210, USA [2] Department of Molecular Genetics, College of Biological Sciences, The Ohio State University, Columbus, Ohio 43210, USA [3] Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, USA.

Robust mechanisms to control cell proliferation have evolved to maintain the integrity of organ architecture. Here, we investigated how two critical proliferative pathways, Myc and E2f, are integrated to control cell cycles in normal and Rb-deficient cells using a murine intestinal model. We show that Myc and E2f1-3 have little impact on normal G1-S transitions. Instead, they synergistically control an S-G2 transcriptional program required for normal cell divisions and maintaining crypt-villus integrity. Surprisingly, Rb deficiency results in the Myc-dependent accumulation of E2f3 protein and chromatin repositioning of both Myc and E2f3, leading to the 'super activation' of a G1-S transcriptional program, ectopic S phase entry and rampant cell proliferation. These findings reveal that Rb-deficient cells hijack and redeploy Myc and E2f3 from an S-G2 program essential for normal cell cycles to a G1-S program that re-engages ectopic cell cycles, exposing an unanticipated addiction of Rb-null cells on Myc.

Source
DOI: http://dx.doi.org/10.1038/ncb3210
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526313
August 2015

Actin grips: circular actin-rich cytoskeletal structures that mediate the wrapping of polymeric microfibers by endothelial cells.

Biomaterials 2015 Jun 18;52:395-406. Epub 2015 Mar 18.

Department of Internal Medicine, The Ohio State University, Columbus, OH, 43210, USA.

The interaction of endothelial-lineage cells with three-dimensional substrates has been much less studied than that with flat culture surfaces. We investigated the in vitro attachment of both mature endothelial cells (ECs) and of less differentiated EC colony-forming cells to poly-ε-caprolactone (PCL) fibers with diameters in the 5-20 μm range ('scaffold microfibers', SMFs). We found that notwithstanding the poor intrinsic adhesiveness to PCL, both cell types completely wrapped the SMFs after long-term cultivation, thus attaining a cylindrical morphology. In this system, both EC types grew vigorously for more than a week and became increasingly differentiated, as shown by multiplexed gene expression. Three-dimensional reconstructions from multiphoton confocal microscopy images using custom software showed that the filamentous (F) actin bundles took a conspicuous ring-like organization around the SMFs. Unlike the classical F-actin-containing stress fibers, these rings were not associated with either focal adhesions or intermediate filaments. We also demonstrated that plasma membrane boundaries adjacent to these circular cytoskeletal structures were tightly yet dynamically apposed to the SMFs, for which reason we suggest calling them 'actin grips'. In conclusion, we describe a particular form of F-actin assembly with relevance for cytoskeletal organization in response to biomaterials, for endothelial-specific cell behavior in vitro and in vivo, and for tissue engineering.

Source
DOI: http://dx.doi.org/10.1016/j.biomaterials.2015.02.034
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4418805
June 2015

Examining the Distribution, Modularity, and Community Structure in Article Networks for Systematic Reviews.

AMIA Annu Symp Proc 2015 5;2015:1927-36. Epub 2015 Nov 5.

Department of Biomedical Informatics, The Ohio State University, Columbus, OH.

Systematic reviews (SRs) provide high quality evidence for clinical practice, but the article screening process is time and labor intensive. As SRs aim to identify relevant articles with a specific scope, we propose that a pre-defined article relationship, using similarity metrics, could accelerate this process. In this study, we established the article relationship using MEDLINE element similarities and visualized the article network with the Force Atlas layout. We also analyzed the article networks with graph diameter, closeness centrality, and module classes. The results revealed the distribution of articles and showed that included articles tended to aggregate together in some module classes, providing further evidence of the existence of strong relationships among included articles. This approach can be utilized to facilitate the article selection process through early identification of these dominant module classes. We are optimistic that the use of article network visualization can help improve SR work prioritization.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765615
February 2018

An integrative analysis of regional gene expression profiles in the human brain.

Methods 2015 Feb 15;73:54-70. Epub 2014 Dec 15.

Department of Health Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA.

Studies of the brain's transcriptome have become prominent in recent years, resulting in an accumulation of datasets with somewhat distinct attributes. These datasets, which are often analyzed only in isolation, also are often collected with divergent goals, which are reflected in their sampling properties. While many researchers have been interested in sampling gene expression in one or a few brain areas in a large number of subjects, recent efforts from the Allen Institute for Brain Sciences and others have focused instead on dense neuroanatomical sampling, necessarily limiting the number of individual donor brains studied. The purpose of the present work is to develop methods that draw on the complementary strengths of these two types of datasets for study of the human brain, and to characterize the anatomical specificity of gene expression profiles and gene co-expression networks derived from human brains using different specific technologies. The approach is applied using two publicly accessible datasets: (1) the high anatomical resolution Allen Human Brain Atlas (AHBA, Hawrylycz et al., 2012) and (2) a relatively large sample size, but comparatively coarse neuroanatomical dataset described previously by Gibbs et al. (2010). We found a relatively high degree of correspondence in differentially expressed genes and regional gene expression profiles across the two datasets. Gene co-expression networks defined in individual brain regions were less congruent, but also showed modest anatomical specificity. Using gene modules derived from the Gibbs dataset and from curated gene lists, we demonstrated varying degrees of anatomical specificity based on two classes of methods, one focused on network modularity and the other focused on enrichment of expression levels. 
Two approaches to assessing the statistical significance of a gene set's modularity in a given brain region were studied, which provide complementary information about the anatomical specificity of a gene network of interest. Overall, the present work demonstrates the feasibility of cross-dataset analysis of human brain microarray studies, and offers a new approach to annotating gene lists in a neuroanatomical context.

Source
DOI: http://dx.doi.org/10.1016/j.ymeth.2014.12.010
February 2015

Understanding the sequence requirements of protein families: insights from the BioVis 2013 contests.

BMC Proc 2014 28;8(Suppl 2 Proceedings of the 3rd Annual Symposium on Biologica):S1. Epub 2014 Aug 28.

Nationwide Children's Hospital, 575 Children's Crossroad, 43215, Columbus, OH, USA; The Ohio State University, 100 W. 18th Ave, 43210, Columbus, OH, USA; Contest Chairs.

Introduction: In 2011, the BioVis symposium of the IEEE VisWeek conferences inaugurated a new variety of data analysis contest. Aimed at fostering collaborations between computational scientists and biologists, the BioVis contest provided real data from biological domains with emerging visualization needs, in the hope that novel approaches would result in powerful new tools for the community. In 2011 and 2012 the theme of these contests was expression Quantitative Trait Locus analysis, within and across tissues respectively. In 2013 the topic was updated to protein sequence and mutation visualization.

Methods: The contest was framed in the context of a real protein with numerous mutations that had lost function, and the question posed was "what minimal set of changes would you propose to rescue function, or how could you support a biologist attempting to answer that question?". The data was grounded in actual experimental results in triosephosphate isomerase (TIM) enzymes. Seven teams composed of 36 individuals submitted entries with proposed solutions and approaches to the challenge. Their contributions ranged from careful analysis of the visualization and analytical requirements for the problem, through integration of existing tools for analyzing the context and consequences of protein mutations, to completely new tools addressing the problem.

Results: Judges found valuable and novel contributions in each of the entries, including interesting ways to hierarchicalize the protein into domains of informational interaction, tools for simultaneously understanding both sequential and spatial order, and approaches for conveying some types of inter-residue dependencies. In this manuscript we document the problem presented to the contestants, summarize the biological contributions of their entries, and suggest opportunities that this work has highlighted for further improved tools in the future.

Source
DOI: http://dx.doi.org/10.1186/1753-6561-8-S2-S1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4155613
September 2014

Effectively processing medical term queries on the UMLS Metathesaurus by layered dynamic programming.

BMC Med Genomics 2014 8;7 Suppl 1:S11. Epub 2014 May 8.

Background: Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications.

Methods: To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept.

Results: Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters.

Conclusions: These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.
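
For readers unfamiliar with the baseline named in the Results, the following is a minimal sketch of dynamic-programming term matching, assuming that "LCS" refers to longest common subsequence. It is not the LDPMap algorithm, whose indexing and two-layer design are not described in this abstract. The function names, the similarity normalization, and the toy vocabulary are illustrative assumptions and are not drawn from the paper or from the UMLS.

```python
from typing import Iterable


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the best subsequence of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of dropping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(query: str, term: str) -> float:
    """Hypothetical normalization: 2 * LCS / (len(query) + len(term)), case-insensitive."""
    q, t = query.lower(), term.lower()
    if not q or not t:
        return 0.0
    return 2 * lcs_length(q, t) / (len(q) + len(t))


def best_match(query: str, vocabulary: Iterable[str]) -> str:
    """Return the vocabulary term most similar to a possibly misspelled query."""
    return max(vocabulary, key=lambda term: lcs_similarity(query, term))


# Toy vocabulary for illustration only; these are not actual UMLS records.
terms = ["diabetes mellitus", "diabetes insipidus", "hypertension"]
print(best_match("diabetis mellitus", terms))
```

Scanning every term this way costs time proportional to the product of the query and term lengths for each comparison, which is the kind of cost that an index-based approach would aim to avoid.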

Source
DOI: http://dx.doi.org/10.1186/1755-8794-7-S1-S11
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4101532
March 2015