Publications by authors named "Shaoke Lou"

22 Publications

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients.

BMC Bioinformatics 2020 Oct 15;21(1):457. Epub 2020 Oct 15.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.

Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data.

Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression.

Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.

Source
DOI: http://dx.doi.org/10.1186/s12859-020-03785-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560063
October 2020

An integrative ENCODE resource for cancer genomics.

Nat Commun 2020 Jul 29;11(1):3696. Epub 2020 Jul 29.

Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, USA.

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-14743-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391744
July 2020

RADAR: annotation and prioritization of variants in the post-transcriptional regulome of RNA-binding proteins.

Genome Biol 2020 Jul 30;21(1):151. Epub 2020 Jul 30.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then, it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, which cannot be found by most current prioritization methods, for example, variants affecting splicing.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01979-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391703
July 2020

TopicNet: a framework for measuring transcriptional regulatory network change.

Bioinformatics 2020 Jul;36(Suppl_1):i474-i481

Department of Molecular Biophysics and Biochemistry.

Motivation: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network 'rewiring'), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change.

Results: We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. And we show how activity differences can indicate differential survival in various cancers.

Availability and Implementation: The TopicNet framework and related analyses were implemented in R, and all code is available at https://github.com/gersteinlab/topicnet.

Supplementary Information: Supplementary data are available at Bioinformatics online.
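
The idea of a topic-based rewiring score can be sketched in a few lines. The abstract does not state which distance is used between topic distributions, so the Jensen-Shannon divergence below is an assumption, and the names `jensen_shannon` and `rewiring_score` are hypothetical. The sketch is in Python rather than the R of the published implementation.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def rewiring_score(topics_state_a, topics_state_b):
    """Hypothetical rewiring score for one TF: the divergence between its
    topic weights in two cellular states (0 = unchanged, 1 = fully changed).
    The choice of Jensen-Shannon divergence is an assumption."""
    return jensen_shannon(topics_state_a, topics_state_b)


# Toy topic weights for one TF in two cell states (illustrative values only).
unchanged = rewiring_score([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
rewired = rewiring_score([1.0, 0.0], [0.0, 1.0])
```

With base-2 logarithms the score is bounded between 0 and 1, which makes TFs directly comparable regardless of how many targets each one has.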

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa403
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355251
July 2020

DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring.

BMC Bioinformatics 2020 Jul 2;21(1):281. Epub 2020 Jul 2.

Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA.

Background: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control the gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied.

Results: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion -- by sub-sampling the complete binding profiles to remove spurious edges -- to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes.

Conclusions: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.
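
The final step described above, highlighting hubs in the differential network, can be illustrated with a minimal sketch. It assumes the differential network is already available as a list of altered TF pairs; the function name `differential_hubs` and the placeholder TF names are hypothetical, and the graphical model and stability selection themselves are not reproduced here.

```python
from collections import Counter


def differential_hubs(altered_pairs, top_n=3):
    """Rank TFs by the number of altered co-regulation edges they touch."""
    degree = Counter()
    for tf_a, tf_b in altered_pairs:
        degree[tf_a] += 1
        degree[tf_b] += 1
    # Sort by degree (descending), then by name for a deterministic order.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder edges, not results from the paper.
pairs = [("TF1", "TF2"), ("TF1", "TF3"), ("TF1", "TF4"), ("TF2", "TF3")]
hubs = differential_hubs(pairs, top_n=2)
```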

Source
DOI: http://dx.doi.org/10.1186/s12859-020-03605-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7333332
July 2020

Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients.

Genome Biol 2020 Jun 22;21(1):150. Epub 2020 Jun 22.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.

Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA(Latent Dirichlet allocation)-link connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02033-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7310008
June 2020

Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences.

Cell 2020 Mar 20;180(5):915-927.e16. Epub 2020 Feb 20.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06511, USA.

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2020.01.032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7210002
March 2020

GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner.

PLoS Genet 2019 Aug 30;15(8):e1007860. Epub 2019 Aug 30.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America.

There has been much effort to prioritize genomic variants with respect to their impact on "function". However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).
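
For readers unfamiliar with the AUROC figures quoted above, the metric can be computed as the fraction of (positive, negative) pairs that a model ranks correctly. The sketch below is a generic pairwise implementation with toy labels and scores; it is not taken from the GRAM pipeline, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for expression-modulating variants, 0 otherwise.
    scores: model outputs, higher meaning more likely to modulate expression.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: the second case misranks one of the four pairs.
perfect = auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
partial = auroc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
```

A value of 0.5 corresponds to random ranking, so the reported 0.72 and 0.66 indicate moderate but real predictive signal.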

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1007860
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742416
August 2019

Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions.

Structure 2019 Sep 3;27(9):1469-1481.e3. Epub 2019 Jul 3.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA.

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.

Source
DOI: http://dx.doi.org/10.1016/j.str.2019.06.001
September 2019

Comprehensive functional genomic resource and integrative model for the human brain.

Science 2018 Dec;362(6420)

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

Source
DOI: http://dx.doi.org/10.1126/science.aat8464
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6413328
December 2018

MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions.

PLoS Comput Biol 2017 Jul 24;13(7):e1005647. Epub 2017 Jul 24.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America.

Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.
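
The role of the resolution parameter in a modularity objective can be sketched as follows. This is a generic illustration, not the MrTADFinder algorithm: it substitutes the standard coverage-only (configuration-model) null for the paper's background model, which additionally preserves the distance dependence of contacts. The function name `modularity` and the toy contact map are assumptions.

```python
def modularity(contacts, labels, gamma=1.0):
    """Modularity of a partition of genomic bins into domains.

    contacts: symmetric matrix of contact counts between bins.
    labels:   domain label for each bin.
    gamma:    resolution parameter; larger values favor smaller domains.
    Null model (assumption): expected contacts proportional to the product
    of bin coverages, ignoring genomic distance.
    """
    n = len(contacts)
    coverage = [sum(row) for row in contacts]
    total = sum(coverage)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                expected = coverage[i] * coverage[j] / total
                q += contacts[i][j] - gamma * expected
    return q / total


# Toy 4-bin map with two strongly self-interacting blocks.
toy_map = [
    [0, 5, 1, 0],
    [5, 0, 0, 1],
    [1, 0, 0, 5],
    [0, 1, 5, 0],
]
q_two_domains = modularity(toy_map, [0, 0, 1, 1])
q_one_domain = modularity(toy_map, [0, 0, 0, 0])
```

Sweeping `gamma` and re-optimizing the partition at each value is what yields domains at multiple length scales; the optimization step itself is omitted here.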

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1005647
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5546724
July 2017

The real cost of sequencing: scaling computation to keep pace with data generation.

Genome Biol 2016 Mar 23;17:53. Epub 2016 Mar 23.

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.

As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.

Source
DOI: http://dx.doi.org/10.1186/s13059-016-0917-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4806511
March 2016

Integrative Genomic Analyses Yield Cell-Cycle Regulatory Programs with Prognostic Value.

Mol Cancer Res 2016 Apr 8;14(4):332-43. Epub 2016 Feb 8.

Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire.

Liposarcoma, the second most common form of sarcoma, has been categorized into four molecular subtypes that are associated with differential patient prognosis. However, the transcriptional regulatory programs associated with distinct histologic and molecular subtypes of liposarcoma have not been investigated. This study uses integrative analyses to systematically define the transcriptional regulatory programs associated with liposarcoma. Likewise, computational methods are used to identify regulatory programs associated with different liposarcoma subtypes, as well as programs that are predictive of prognosis. Further analysis of curated gene sets was used to identify prognostic gene signatures. The integration of data from a variety of sources, including gene expression profiles, transcription factor-binding data from ChIP-Seq experiments, curated gene sets, and clinical information of patients, indicated discrete regulatory programs (e.g., controlled by E2F1 and E2F4), with significantly different regulatory activity in one or multiple subtypes of liposarcoma with respect to normal adipose tissue. These programs were also shown to be prognostic: higher E2F4 or E2F1 activity in liposarcoma patients was associated with unfavorable prognosis. A total of 259 gene sets were significantly associated with patient survival in liposarcoma, among which more than 50% are involved in cell cycle and proliferation.

Implications: These integrative analyses provide a general framework that can be applied to investigate the mechanism and predict prognosis of different cancer types.

Source
DOI: http://dx.doi.org/10.1158/1541-7786.MCR-15-0368
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5033644
April 2016

Regulators associated with clinical outcomes revealed by DNA methylation data in breast cancer.

PLoS Comput Biol 2015 May 21;11(5):e1004269. Epub 2015 May 21.

Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America; Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America; Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America.

The regulatory architecture of breast cancer is extraordinarily complex and gene misregulation can occur at many levels, with transcriptional malfunction being a major cause. This dysfunctional process typically involves additional regulatory modulators including DNA methylation. Thus, transcription factor (TF) binding and DNA methylation are two components of a cancer regulatory interactome presumed to display correlated signals. As proof of concept, we performed a systematic motif-based in silico analysis to infer all potential TFs that are involved in breast cancer prognosis through an association with DNA methylation changes. Using breast cancer DNA methylation and clinical data derived from The Cancer Genome Atlas (TCGA), we carried out a systematic inference of TFs whose misregulation underlies different clinical subtypes of breast cancer. Our analysis identified TFs known to be associated with clinical outcomes of p53 and ER (estrogen receptor) subtypes of breast cancer, while also predicting new TFs that may be involved. Furthermore, our results suggest that misregulation in breast cancer can be caused by the binding of alternative factors to the binding sites of TFs whose activity has been ablated. Overall, this study provides a comprehensive analysis that links DNA methylation to TF binding to patient prognosis.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1004269
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4440643
May 2015

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer.

Genome Biol 2014;15(10):480

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.

Identification of noncoding drivers from thousands of somatic alterations in a typical tumor is a difficult and unsolved problem. We report a computational framework, FunSeq2, to annotate and prioritize these mutations. The framework combines an adjustable data context integrating large-scale genomics and cancer resources with a streamlined variant-prioritization pipeline. The pipeline has a weighted scoring system combining: inter- and intra-species conservation; loss- and gain-of-function events for transcription-factor binding; enhancer-gene linkages and network centrality; and per-element recurrence across samples. We further highlight putative drivers with information specific to a particular sample, such as differential expression. FunSeq2 is available from funseq2.gersteinlab.org.
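
A weighted scoring system of the kind described above can be sketched as a sum of per-feature weights. The abstract does not give the weighting formula, so the entropy-based weight below (features that are rare among background polymorphisms receive weights near 1) is an assumption for illustration, as are the function names, feature names, and frequencies.

```python
import math


def feature_weight(background_freq):
    """Assumed weight: 1 minus the binary entropy of the feature's frequency
    among background (presumed benign) variants. Rare features score high."""
    p = background_freq
    if p in (0.0, 1.0):
        return 1.0  # entropy is zero at the extremes (0 * log 0 taken as 0)
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def variant_score(variant_features, background_freqs):
    """Sum the weights of every feature the variant carries."""
    return sum(feature_weight(background_freqs[name]) for name in variant_features)


# Hypothetical background frequencies, not values from FunSeq2.
freqs = {"conserved": 0.0, "tf_motif_break": 0.5, "enhancer_linked": 0.25}
score = variant_score(["conserved", "tf_motif_break"], freqs)
```

A feature seen in half of all background variants carries no information under this scheme and contributes nothing to the score.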

Source
DOI: http://dx.doi.org/10.1186/s13059-014-0480-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203974
October 2014

Whole-genome bisulfite sequencing of multiple individuals reveals complementary roles of promoter and gene body methylation in transcriptional regulation.

Genome Biol 2014 Jul 30;15(7):408. Epub 2014 Jul 30.

Background: DNA methylation is an important type of epigenetic modification involved in gene regulation. Although strong DNA methylation at promoters is widely recognized to be associated with transcriptional repression, many aspects of DNA methylation remain not fully understood, including the quantitative relationships between DNA methylation and expression levels, and the individual roles of promoter and gene body methylation.

Results: Here we present an integrated analysis of whole-genome bisulfite sequencing and RNA sequencing data from human samples and cell lines. We find that while promoter methylation inversely correlates with gene expression as generally observed, the repressive effect is clear only on genes with a very high DNA methylation level. By means of statistical modeling, we find that DNA methylation is indicative of the expression class of a gene in general, but gene body methylation is a better indicator than promoter methylation. These findings are general in that a model constructed from a sample or cell line could accurately fit the unseen data from another. We further find that promoter and gene body methylation have minimal redundancy, and either one is sufficient to signify low expression. Finally, we obtain increased modeling power by integrating histone modification data with the DNA methylation data, showing that neither type of information fully subsumes the other.

Conclusion: Our results suggest that DNA methylation outside promoters also plays critical roles in gene regulation. Future studies on gene regulatory mechanisms and disease-associated differential methylation should pay more attention to DNA methylation at gene bodies and other non-promoter regions.

Source
DOI: http://dx.doi.org/10.1186/s13059-014-0408-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189148
July 2014

Transcriptional profiling of angiogenesis activities of calycosin in zebrafish.

Mol Biosyst 2011 Nov 12;7(11):3112-21. Epub 2011 Sep 12.

Institute of Chinese Medical Sciences, University of Macau, Av. Padre Tomas Pereira, Taipa, Macau, China.

Angiogenesis plays an important role in a wide range of physiological processes and many diseases are associated with the dysregulation of angiogenesis. The commonly used Chinese herbal medicine Radix Astragali (known as Huang qi in Chinese) is a potential candidate for treating this type of disease. Calycosin, a major isoflavonoid in Radix Astragali, was identified in our earlier study and shown to induce angiogenesis in human umbilical vein endothelial cells (HUVEC) in vitro and in zebrafish embryos in vivo. Using zebrafish as a testing model, we investigated the angiogenic effect of calycosin on the subintestinal vessels (SIVs) in zebrafish embryos. Our findings using transcriptional profiling by deep sequencing, and confirmed by quantitative real-time PCR (qPCR), demonstrate that calycosin modulated vascular endothelial growth factor (VEGF), fibroblast growth factor (FGF) and ErbB signaling pathways. The inhibitory effects of calycosin-induced phenotypic responses by several pathway-specific inhibitors (VRI, SU5402, MEK1/2 Inhibitor, Wortmannin and LY294002) further identified the potential involvement of VEGF(R) and FGF(R) signaling pathways in the angiogenic activities of calycosin. We present a comprehensive framework of study using fluorescence microscopy, transcriptomics and qPCR to demonstrate the proangiogenic effects of calycosin in vivo. The data have elucidated the connection between morphological observations and genomic evidence, indicating the potential roles of several key signaling pathways in angiogenesis.

Source
DOI: http://dx.doi.org/10.1039/c1mb05206c
November 2011

Combined in vivo imaging and omics approaches reveal metabolism of icaritin and its glycosides in zebrafish larvae.

Mol Biosyst 2011 Jul 29;7(7):2128-38. Epub 2011 Mar 29.

State Key Laboratory of Quality Research of Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Av. Padre Tomás Pereira S.J., Taipa, Macao, China.

Flavonoids isolated from Herba Epimedii such as icaritin, icariin and epimedin C have been suggested as potential bone anabolic compounds. However, the "specific localized effects" of these flavonoids in bone, in vivo, and the metabolism of these flavonoids in zebrafish larvae have never been demonstrated. In this study, we used multiple methods including in vivo imaging, drug metabolites profiling, transcriptomic and proteomic approaches to determine the mechanisms involved in the distribution and metabolism of the flavonoids in zebrafish larvae by measuring the fluorescence emission, in vivo, of icaritin and its glycoside derivatives. The fluorescence emission mechanism of icaritin in vitro was identified by spectrophotometric analysis, and the fluorescent property of icaritin was used as a probe to visualize the metabolism and distribution of icaritin and its glycoside derivatives in zebrafish larvae. Phase I and phase II metabolism of icaritin and its derivatives were identified in zebrafish by mass spectrometry. The combined transcriptomics and proteomics demonstrate a high degree of conservation of phase I and phase II drug metabolic enzymes between zebrafish larvae and mammals. Icaritin and its glycoside derivatives were demonstrated using combined approaches of in vivo imaging, drug metabolites identification, and transcriptomic and proteomic profiling to illustrate phase I and phase II metabolism of the flavonoids and their distribution in bone of zebrafish larvae. This study provides a new methodological model for use of the zebrafish larvae to examine drug metabolism.

Source
http://dx.doi.org/10.1039/c1mb00001b
July 2011

Microarray analysis of differentially expressed genes in mouse bone marrow tissues after ionizing radiation.

Int J Radiat Biol 2006 Jul;82(7):511-21

Beijing Institute of Radiation Medicine, Beijing, China.

Purpose: To identify differentially expressed genes in mouse bone marrow involved in radiation-induced injury.

Materials and Methods: Microarray analysis was used to identify the differentially expressed genes; other techniques, e.g., polymerase chain reaction (PCR), western blotting and antisense oligonucleotides, were used to validate the results.

Results: DNA microarray analysis demonstrated that the mRNA levels of 34 genes increased and those of 69 genes decreased in mouse bone marrow cells (BMC) from C57BL mice 6 h after a whole-body dose of 6.5 Gy. These differentially expressed genes were involved in a number of processes including DNA replication/repair, proliferation/apoptosis, cell cycle control and RNA processing. In these experiments, a decline in mRNA of Sir2a, the mammalian silent mating type information regulation 2 homolog (SIRT1), accompanied by an increase in tumor protein 53 (P53) acetylation, was observed in irradiated BMC. To determine whether the reduced SIRT1 is related to the higher acetylation status of P53 after irradiation, we designed and synthesized antisense oligonucleotides (AS) targeting human SIRT1 mRNA. Notably, AS transfection increased P53 acetylation and bax-luciferase activity in the human bone marrow stromal cell line HS-5 after radiation. Furthermore, AS transfection stimulated apoptosis in post-irradiation HS-5 cells.

Conclusion: Ionizing radiation (IR) affects the expression of a series of genes including genes involved in G1/S transition and the P53 pathway. Among those, reduction of SIRT1 was seen to be involved in transactivation of P53.

Source
http://dx.doi.org/10.1080/09553000600857389
July 2006

Selection of antisense oligonucleotides based on multiple predicted target mRNA structures.

BMC Bioinformatics 2006 Mar 9;7:122. Epub 2006 Mar 9.

Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China.

Background: Local structures of target mRNAs play a significant role in determining the efficacies of antisense oligonucleotides (ODNs), but some structure-based target site selection methods are limited by uncertainties in RNA secondary structure prediction. If all the predicted structures of a given mRNA within a certain energy limit could be used simultaneously, target site selection would obviously be improved in both reliability and efficiency. In this study, some key problems in ODN target selection on the basis of multiple predicted target mRNA structures are systematically discussed.

Results: Two methods were considered for merging topologically different RNA structures into integrated representations. Several parameters were derived to characterize local target site structures. Statistical analysis on a dataset with 448 ODNs against 28 different mRNAs revealed 9 features quantitatively associated with efficacy. Features of structural consistency seemed to be more highly correlated with efficacy than indices of the proportion of bases in single-stranded or double-stranded regions. The local structures of the target site 5' and 3' termini were also shown to be important in target selection. Neural network efficacy predictors using these features, defined on integrated structures as inputs, performed well in "minus-one-gene" cross-validation experiments.

Conclusion: Topologically different target mRNA structures can be merged into integrated representations and then used in computer-aided ODN design. The results of this paper imply that some features characterizing multiple predicted target site structures can be used to predict ODN efficacy.
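
The "minus-one-gene" cross-validation mentioned in the Results can be read as a leave-one-group-out scheme: all ODNs targeting one mRNA are held out together, and the predictor is trained on ODNs against the remaining mRNAs. The following is only a minimal sketch of that splitting idea, not the authors' implementation; the function name, the record layout and the toy gene and ODN labels are all assumptions made for illustration.

```python
from collections import defaultdict


def minus_one_gene_splits(records):
    """Yield (held_out_gene, train, test) folds.

    Each fold holds out every ODN that targets one gene, so no ODN
    against the test gene is ever seen during training.
    """
    by_gene = defaultdict(list)
    for record in records:
        by_gene[record["gene"]].append(record)

    # Sort gene names so the fold order is deterministic.
    for gene in sorted(by_gene):
        test = by_gene[gene]
        train = [r for r in records if r["gene"] != gene]
        yield gene, train, test


# Hypothetical toy data: five ODNs against three target mRNAs.
records = [
    {"gene": "geneA", "odn": "odn1"},
    {"gene": "geneA", "odn": "odn2"},
    {"gene": "geneB", "odn": "odn3"},
    {"gene": "geneC", "odn": "odn4"},
    {"gene": "geneC", "odn": "odn5"},
]

folds = list(minus_one_gene_splits(records))
for gene, train, test in folds:
    print(gene, len(train), len(test))
```

Grouping by target gene rather than splitting ODNs at random avoids rewarding a predictor for memorising gene-specific patterns, which is presumably why a per-gene hold-out was used for evaluation.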

Source
http://dx.doi.org/10.1186/1471-2105-7-122
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1421440
March 2006

AOBase: a database for antisense oligonucleotides selection and design.

Nucleic Acids Res 2006 Jan;34(Database issue):D664-7

Beijing Institute of Radiation Medicine, Beijing 100850, China.

Antisense oligonucleotide (ODN) technology is one of the important approaches for sequence-specific knockdown of gene expression. ODNs have been used as research tools in the post-genome era, as well as new types of therapeutic agents. Because finding effective target sites within RNA is difficult in antisense ODN design, various experimental methods and computational approaches have been proposed. To improve sharing of tested and published ODNs, valid and invalid ODNs reported in the literature are screened, collected and stored in AOBase. To date, approximately 700 ODNs against 46 target mRNAs are contained in AOBase. Entries can be explored via the TargetSearch and AOSearch web retrieval interfaces. AOBase can be useful not only in ODN selection for gene function exploration, but also in mining rules and developing algorithms for rational ODN design. AOBase is freely accessible via http://www.bioit.org.cn/ao/aobase.

Source
http://dx.doi.org/10.1093/nar/gkj065
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347428
January 2006