Publications by authors named "Chenghai Xue"

14 Publications

SUV39H1 regulates the progression of MLL-AF9-induced acute myeloid leukemia.

Oncogene 2020 12 9;39(50):7239-7252. Epub 2020 Oct 9.

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Hematological Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 300020, Tianjin, China.

Epigenetic regulation plays a crucial role in leukemogenesis and leukemia progression. SUV39H1 is the dominant H3K9 methyltransferase in the hematopoietic system, and its expression declines with aging. However, the role of SUV39H1, acting through the repressive H3K9me3 modification it mediates, in leukemogenesis and leukemia progression remains to be explored. We found that SUV39H1 was down-regulated in a variety of leukemias, including MLL-r AML, compared with normal individuals. Decreased Suv39h1 expression and genomic H3K9me3 occupancy were observed in leukemia stem cells (LSCs) from MLL-r-induced AML mouse models compared with hematopoietic stem/progenitor cells. Suv39h1 overexpression increased leukemia latency and decreased the frequency of LSCs in MLL-r AML mouse models, while Suv39h1 knockdown accelerated disease progression and increased the number of LSCs. Increased Suv39h1 expression led to the inactivation of Hoxb13 and Six1, as well as reversion of Hoxa9/Meis1 downstream target genes, which in turn decelerated leukemia progression. Interestingly, Hoxb13 expression is up-regulated in MLL-AF9-induced AML cells, while knockdown of Hoxb13 in MLL-AF9 leukemic cells significantly prolonged the survival of leukemic mice and reduced LSC frequencies. Our data reveal that SUV39H1 functions as a tumor suppressor in MLL-AF9-induced AML progression. These findings provide a direct link between SUV39H1 and AML development and progression.

Source
DOI: http://dx.doi.org/10.1038/s41388-020-01495-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728597
December 2020

Stain Standardization Capsule for Application-Driven Histopathological Image Normalization.

IEEE J Biomed Health Inform 2021 02 5;25(2):337-347. Epub 2021 Feb 5.

Color consistency is crucial to developing robust deep learning methods for histopathological image analysis. As digital histopathological slides become more widely used, deep learning methods are increasingly likely to be developed on data from multiple medical centers. This makes it a challenging task to normalize the color variance of histopathological images from different centers. In this paper, we propose a novel color standardization module, named the stain standardization capsule (SSC), based on the capsule network and the corresponding dynamic routing algorithm. The proposed module can learn and generate uniform stain separation outputs for histopathological images with varying color appearance, without reference to manually selected template images. The module is lightweight and can be jointly trained with the application-driven CNN model. The proposed method was validated on three histopathology datasets and a cytology dataset, and was compared with state-of-the-art methods. The experimental results demonstrate that the SSC module is effective in improving the performance of histopathological image analysis and achieved the best performance among the compared methods.

Source
DOI: http://dx.doi.org/10.1109/JBHI.2020.2983206
February 2021

Effects of somatic alterations at pathway level are more mechanism-explanatory and clinically applicable to quantity of liver metastases of colorectal cancer.

Cancer Med 2019 08 20;8(10):4732-4742. Epub 2019 Jun 20.

Large-scale Data Analysis Center of Cancer Precision Medicine, Cancer Hospital of Chinese Medical University, Liaoning Provincial Cancer Institute and Hospital, Shenyang, China.

Background: The number of metastatic lesions is an important reference for making an informed treatment decision for patients with colorectal cancer liver metastases. However, the molecular alterations in patients with different numbers of lesions have not been systematically studied.

Methods: We investigated somatic alterations and microsatellite instability (MSI) in liver metastases from patients with single, multiple or diffuse metastatic lesions. A new algorithm, "Pathway Damage Score", was developed to comprehensively assess the functional impact of somatic alterations at the pathway level. Pathogenic pathways of the different metastasis types were identified and their prognostic effects were evaluated. Furthermore, the subnetworks and affected phenotypes of the altered genes in each pathogenic pathway were analyzed.

Results: Somatic alterations, altered genes, and MSI status were distributed sporadically across metastasis types, although MSS patients had more metastatic lesions than MSI patients. Each metastasis group had its own pathogenic pathways, and damage to "Cargo recognition for clathrin-mediated endocytosis" was significantly associated with poor prognosis (P < 0.001). Further pathway subnetwork analysis showed that, in addition to conventional drivers, other genes could also contribute to metastasis formation.

Conclusions: Progression of liver metastasis could be driven by the coefficient of all altered genes belonging to the pathways. Thus, compared to somatic alterations and genes, pathway level analysis is more reasonable for functional interpretations of molecular alterations in clinical samples.

Source
DOI: http://dx.doi.org/10.1002/cam4.2368
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712451
August 2019

The fusion landscape of hepatocellular carcinoma.

Mol Oncol 2019 05 11;13(5):1214-1225. Epub 2019 Apr 11.

Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Most cases of hepatocellular carcinoma (HCC) are already advanced at the time of diagnosis, which limits treatment options. Challenges in early-stage diagnosis may be due to the genetic complexity of HCC. Gene fusion plays a critical role in tumorigenesis and cancer progression in multiple cancers, yet the identities of fusion genes as potential diagnostic markers in HCC have not been investigated. Here, we employed STAR-Fusion and identified 43 recurrent fusion events in our own and four public RNA-seq datasets. We identified 2354 different gene fusions in two hepatitis B virus (HBV)-HCC patients. Validation against the four RNA-seq datasets revealed that only 1.8% (43/2354) were recurrent fusions. Comparison with four fusion databases showed that 19 recurrent fusions had not previously been annotated to diseases and three were annotated as disease-related fusion events. Finally, we validated six of the novel fusion events, including RP11-476K15.1-CTD-2015H3.2, by RT-PCR and Sanger sequencing of 14 pairs of HBV-related HCC samples. In summary, our study provides new insights into gene fusions in HCC and may contribute to the development of anti-HCC therapy.
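
The recurrence figure quoted above (43 of 2354, about 1.8%) can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes that "recurrent" means a fusion from the discovery set is seen again in at least one validation dataset, and the fusion names and function names are hypothetical placeholders, not data or code from the study.

```python
def recurrent_fusions(discovered, validation_sets):
    """Return discovered fusions that reappear in at least one validation set.

    Assumption: a fusion is 'recurrent' if its name occurs in any of the
    validation collections. Names are compared as plain strings.
    """
    seen = set()
    for fusions in validation_sets:
        seen.update(fusions)
    return sorted(f for f in set(discovered) if f in seen)


def recurrence_rate(n_recurrent, n_total):
    """Percentage of discovered fusions that are recurrent."""
    return 100.0 * n_recurrent / n_total


# Hypothetical toy data, for illustration only.
discovered = ["GENE1--GENE2", "GENE3--GENE4", "GENE5--GENE6"]
validation = [{"GENE3--GENE4"}, {"GENE7--GENE8"}, set(), {"GENE3--GENE4"}]
toy_recurrent = recurrent_fusions(discovered, validation)

# Figures reported in the abstract: 43 recurrent out of 2354 fusions.
reported_rate = recurrence_rate(43, 2354)
```

With the counts from the abstract, `reported_rate` rounds to 1.8, matching the stated percentage.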

Source
DOI: http://dx.doi.org/10.1002/1878-0261.12479
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6487730
May 2019

Adaptive color deconvolution for histological WSI normalization.

Comput Methods Programs Biomed 2019 Mar 15;170:107-120. Epub 2019 Jan 15.

Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.

Background And Objective: Color consistency of histological images is significant for developing reliable computer-aided diagnosis (CAD) systems. However, the color appearance of digital histological images varies across different specimen preparations, staining, and scanning situations. This variability affects the diagnosis and decreases the accuracy of CAD approaches. It is important and challenging to develop effective color normalization methods for digital histological images.

Methods: We propose a novel adaptive color deconvolution (ACD) algorithm for stain separation and color normalization of hematoxylin-eosin-stained whole slide images (WSIs). To avoid artifacts and reduce the failure rate of normalization, several kinds of prior knowledge about staining are embedded in the ACD model. To improve the capacity of color normalization for various WSIs, an integrated optimization is designed to simultaneously estimate the parameters of stain separation and color normalization. Solving the ACD model and applying the proposed method involve only pixel-wise operations, which makes the method very efficient and applicable to WSIs.

Results: The proposed method was evaluated on four WSI datasets covering breast, lung and cervix cancers and was compared with six state-of-the-art methods. The proposed method achieved the most consistent color normalization performance according to the quantitative metrics. In a qualitative assessment of 500 WSIs, the failure rate of normalization was 0.4%, and structure and color artifacts were effectively avoided. When applied to CAD methods, the area under the receiver operating characteristic curve for cancer image classification improved from 0.842 to 0.914. The average time for solving the ACD model was 2.97 s.

Conclusions: The proposed ACD model has proven effective for color normalization of hematoxylin-eosin-stained WSIs with varying color appearance. The model is robust and can be applied to WSIs containing different lesions. It can be solved efficiently and is effective in improving the performance of cancer image recognition, which makes it suitable for developing automatic CAD programs and systems based on WSIs.
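
For readers unfamiliar with stain separation, the sketch below shows classical, fixed-matrix color deconvolution for a single pixel. It is not the ACD algorithm described above, which adapts its parameters per slide through an integrated optimization; here the hematoxylin and eosin optical-density vectors are assumed to be the commonly quoted reference values, and all function names are illustrative.

```python
import math

# Assumed reference optical-density vectors (R, G, B) for hematoxylin and
# eosin. These are commonly quoted defaults, not values from the ACD paper.
H_VEC = (0.65, 0.70, 0.29)
E_VEC = (0.07, 0.99, 0.11)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rgb_to_od(rgb, background=255.0):
    """Beer-Lambert law: optical density per channel, OD = -ln(I / I0)."""
    return tuple(-math.log(max(c, 1e-6) / background) for c in rgb)


def separate_stains(rgb):
    """Least-squares fit of OD = h * H_VEC + e * E_VEC for one pixel.

    Solves the 2x2 normal equations directly and returns (h, e).
    """
    od = rgb_to_od(rgb)
    hh, ee, he = _dot(H_VEC, H_VEC), _dot(E_VEC, E_VEC), _dot(H_VEC, E_VEC)
    ho, eo = _dot(H_VEC, od), _dot(E_VEC, od)
    det = hh * ee - he * he
    h = (ho * ee - eo * he) / det
    e = (eo * hh - ho * he) / det
    return h, e


def synthesize_pixel(h, e, background=255.0):
    """Forward model: RGB intensities produced by stain amounts h and e."""
    return tuple(background * math.exp(-(h * hv + e * ev))
                 for hv, ev in zip(H_VEC, E_VEC))


# Round trip: stain amounts -> RGB -> recovered stain amounts.
h_rec, e_rec = separate_stains(synthesize_pixel(0.8, 0.3))
```

Every step operates on one pixel independently, which illustrates why pixel-wise methods of this kind scale to whole slide images.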

Source
DOI: http://dx.doi.org/10.1016/j.cmpb.2019.01.008
March 2019

Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma.

Brief Bioinform 2018 01;19(1):101-108

Alteration of the DNA methylation landscape is a key epigenetic event in cancer. With the accumulation of large-scale genome-wide DNA methylation data from clinical samples, we are able to characterize the patterns of DNA methylation alterations to identify candidate epigenetic markers and drivers. In this survey, we take hepatocellular carcinoma (HCC) as an example to show the basic steps of analyzing DNA methylation patterns in cancer across multiple data sets. We collected three genome-wide DNA methylation data sets with ∼800 clinical samples and the corresponding gene expression data sets. First, by quantitatively analyzing two global methylation alterations, we found that about 90% of tumors acquire either genome-wide DNA hypo-methylation or the CpG island methylator phenotype. Second, probe-level analysis identified 267, 228 and 197 hyper-methylated sites in promoter regions for the three data sets, respectively. These local hyper-methylation patterns are highly consistent: 84 sites (from 61 promoters) are hyper-methylated in all three data sets, including many previously reported genes, such as CDKL2, TBX15 and NKX6-2. Then, these hyper-methylated sites were used as candidate markers to classify tumor and non-tumor samples. Classifiers based on only 10 selected probes achieve high discriminative ability across different data sets. Finally, by integratively analyzing DNA methylation and gene expression data, we identified 222 candidate epigenetic drivers, which are enriched in inflammatory response and multiple metabolic pathways. A set of high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with patients' overall survival. In summary, this study systematically characterized DNA methylation alterations and their impact on gene expression in HCC based on multiple data sets.

Source
DOI: http://dx.doi.org/10.1093/bib/bbw094
January 2018

Comparative analysis of the transcriptome across distant species.

Nature 2014 Aug;512(7515):445-8

Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

Source
DOI: http://dx.doi.org/10.1038/nature13424
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4155737
August 2014

HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism.

Cell Rep 2014 Mar 6;6(6):1139-1152. Epub 2014 Mar 6.

Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA.

The RNA binding proteins Rbfox1/2/3 regulate alternative splicing in the nervous system, and disruption of Rbfox1 has been implicated in autism. However, comprehensive identification of functional Rbfox targets has been challenging. Here, we perform HITS-CLIP for all three Rbfox family members in order to globally map, at a single-nucleotide resolution, their in vivo RNA interaction sites in the mouse brain. We find that the two guanines in the Rbfox binding motif UGCAUG are critical for protein-RNA interactions and crosslinking. Using integrative modeling, these interaction sites, combined with additional datasets, define 1,059 direct Rbfox target alternative splicing events. Over half of the quantifiable targets show dynamic changes during brain development. Of particular interest are 111 events from 48 candidate autism-susceptibility genes, including syndromic autism genes Shank3, Cacna1c, and Tsc2. Alteration of Rbfox targets in some autistic brains is correlated with downregulation of all three Rbfox proteins, supporting the potential clinical relevance of the splicing-regulatory network.

Source
DOI: http://dx.doi.org/10.1016/j.celrep.2014.02.005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3992522
March 2014

Landscape of transcription in human cells.

Nature 2012 Sep;489(7414):101-8

Centre for Genomic Regulation and UPF, Doctor Aiguader 88, Barcelona 08003, Catalonia, Spain.

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

Source
DOI: http://dx.doi.org/10.1038/nature11233
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3684276
September 2012

An encyclopedia of mouse DNA elements (Mouse ENCODE).

Genome Biol 2012 Aug 13;13(8):418. Epub 2012 Aug 13.

To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

Source
DOI: http://dx.doi.org/10.1186/gb-2012-13-8-418
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491367
August 2012

Finding noncoding RNA transcripts from low abundance expressed sequence tags.

Cell Res 2008 Jun;18(6):695-700

Department of Entomology, Nanjing Agricultural University, Nanjing 210095, China.

It has been shown that noncoding RNA (ncRNA) genes are much more numerous than expected. However, it remains difficult to identify ncRNAs with either computational algorithms or biological experiments. Recent reports have suggested that ncRNAs may also appear in expressed sequence tag (EST) databases. Nevertheless, intergenic ESTs have received little attention and are poorly annotated owing to their low abundance. Here, we have developed a computational strategy for discovering ncRNA genes from human ESTs. We first collected ESTs that are located in intergenic regions and lack detailed annotations. The intergenic regions were divided into non-overlapping 50-nt windows, and PhastCons scores obtained from the UCSC database were assigned to these windows. We kept as seeds the conserved windows that had PhastCons scores above 0.8 and at least three supporting ESTs. Each cluster of ESTs corresponding to the seeds was assembled into a long contig. We used two criteria to screen for ncRNA transcripts among these contigs: first, the longest predicted open reading frame had to be shorter than 300 nt; second, a likely Pol-II promoter had to exist within 2,000 nt upstream or downstream of the contig. As a result, 118 novel ncRNA genes were identified from human low-abundance ESTs. Of seven randomly selected candidates, six were transcribed in human 2BS cells, as shown by RT-PCR. Our work shows that EST data are a 'hidden treasure' for detecting novel ncRNA genes.
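
The two numeric filters in this strategy (seed selection and the open-reading-frame length cutoff) can be sketched in Python as follows. The thresholds are taken from the abstract; everything else is an assumption made for illustration: the function names are hypothetical, the ORF scan covers only the forward strand and ATG-initiated frames, and the promoter criterion and contig assembly are not modeled.

```python
MIN_PHASTCONS = 0.8   # conservation threshold for a 50-nt window
MIN_ESTS = 3          # minimum number of supporting ESTs per seed
MAX_ORF_NT = 300      # longest ORF must be shorter than this

STOP_CODONS = {"TAA", "TAG", "TGA"}


def seed_windows(window_scores, est_counts):
    """Indices of windows with score above 0.8 and at least three ESTs."""
    return [i for i, (score, n_est) in enumerate(zip(window_scores, est_counts))
            if score > MIN_PHASTCONS and n_est >= MIN_ESTS]


def longest_orf_nt(seq):
    """Length in nt of the longest ATG-to-stop ORF on the forward strand.

    The stop codon is counted in the length. Reverse-strand ORFs and ORFs
    without a stop codon are ignored in this simplified sketch.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def passes_orf_filter(contig):
    """True when the longest predicted ORF is shorter than 300 nt."""
    return longest_orf_nt(contig) < MAX_ORF_NT


# Toy inputs: per-window mean conservation scores and EST support counts.
seeds = seed_windows([0.9, 0.85, 0.5, 0.95], [3, 2, 10, 5])
toy_orf = longest_orf_nt("CCATGAAATAGGG")
```

In the toy input, only the first and last windows satisfy both conditions, and the toy sequence contains a single 9-nt ORF (ATG AAA TAG).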

Source
DOI: http://dx.doi.org/10.1038/cr.2008.59
June 2008

Functional importance of different patterns of correlation between adjacent cassette exons in human and mouse.

BMC Genomics 2008 Apr 26;9:191. Epub 2008 Apr 26.

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PRoC.

Background: Alternative splicing expands transcriptome diversity and plays an important role in regulation of gene expression. Previous studies focus on the regulation of a single cassette exon, but recent experiments indicate that multiple cassette exons within a gene may interact with each other. This interaction can increase the potential to generate various transcripts and adds an extra layer of complexity to gene regulation. Several cases of exon interaction have been discovered. However, the extent to which the cassette exons coordinate with each other remains unknown.

Results: Based on EST data, we employed a metric of correlation coefficients to describe the interaction between two adjacent cassette exons and then categorized these exon pairs into three different groups by their interaction (correlation) patterns. Sequence analysis demonstrates that strongly-correlated groups are more conserved and contain a higher proportion of pairs with reading frame preservation in a combinatorial manner. Multiple genome comparison further indicates that different groups of correlated pairs have different evolutionary courses: (1) The vast majority of positively-correlated pairs are old, (2) most of the weakly-correlated pairs are relatively young, and (3) negatively-correlated pairs are a mixture of old and young events.

Conclusion: We performed a large-scale analysis of interactions between adjacent cassette exons. Compared with weakly-correlated pairs, the strongly-correlated pairs, including both the positively and negatively correlated ones, show more evidence that they are under delicate splicing control and tend to be functionally important. Additionally, the positively-correlated pairs bear strong resemblance to constitutive exons, which suggests that they may evolve from ancient constitutive exons, while negatively and weakly correlated pairs are more likely to contain newly emerging exons.
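
As an illustration of how a correlation between two adjacent cassette exons might be measured, the sketch below computes the Pearson (phi) coefficient of their inclusion indicators across transcripts and assigns the pair to one of three groups. The abstract does not specify the exact metric or cutoffs, so the phi coefficient, the 0.5 threshold, and the function names are all assumptions.

```python
import math


def inclusion_correlation(pairs):
    """Phi coefficient between inclusion of exon A and exon B.

    pairs: iterable of (a, b), each 0 or 1, where 1 means the exon is
    included in a given transcript (for example, an EST spanning both exons).
    Returns None when either exon shows no variation.
    """
    pairs = list(pairs)
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = len(pairs) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / denom


def classify_pair(r, strong=0.5):
    """Group a pair by correlation; the cutoff is a hypothetical choice."""
    if r is None:
        return "undefined"
    if r >= strong:
        return "positive"
    if r <= -strong:
        return "negative"
    return "weak"


# Toy transcripts: exons always co-included or co-skipped.
together = classify_pair(inclusion_correlation([(1, 1), (0, 0), (1, 1), (0, 0)]))
# Toy transcripts: exons mutually exclusive.
exclusive = classify_pair(inclusion_correlation([(1, 0), (0, 1), (1, 0), (0, 1)]))
```

Exons that are always included or skipped together yield a coefficient of 1 (positively correlated), while mutually exclusive exons yield -1 (negatively correlated).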

Source
DOI: http://dx.doi.org/10.1186/1471-2164-9-191
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432081
April 2008

RScan: fast searching structural similarities for structured RNAs in large databases.

BMC Genomics 2007 Jul 31;8:257. Epub 2007 Jul 31.

MOE Key Laboratory of Bioinformatics and Bioinformatics Div, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

Background: Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that use structural alignment to find conserved secondary structures and common structural motifs in pairwise or multiple sequences. A challenging task is to search quickly for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow for databases of this size.

Results: We propose RScan, an implementation of a fast structural alignment algorithm, for this task. RScan leverages the advantages of both hashing algorithms and local alignment algorithms. In our experiments, searching for a tRNA and an rRNA in the randomized A. pernix genome took on average only 256 seconds and 832 seconds, respectively, with RScan, compared with 3,178 seconds and 8,951 seconds with the existing method RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search human chromosome 21 for structures similar to a microRNA precursor.

Conclusion: These results indicate that RScan is a preferable choice for real-life application of searching structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-8-257
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1949409
July 2007

Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine.

BMC Bioinformatics 2005 Dec 29;6:310. Epub 2005 Dec 29.

Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China.

Background: MicroRNAs (miRNAs) are a group of short (approximately 22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

Results: A set of novel features based on local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs from those of pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without using any comparative genomics information.

Conclusion: The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.
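
One plausible way to encode "local contiguous structure-sequence" features is sketched below: for every position, the pairing state of three adjacent nucleotides (paired or unpaired, from a dot-bracket structure) is combined with the identity of the middle nucleotide, giving 4 x 8 = 32 normalized counts. The abstract does not spell out the encoding, so treat the details and the function name as assumptions; the resulting vector could be passed to any SVM implementation.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# 32 feature names such as "A(((" or "U.(.": middle nucleotide followed by
# the pairing state of the three adjacent positions.
FEATURES = [n + "".join(s) for n in NUCLEOTIDES for s in product("(.", repeat=3)]


def triplet_features(seq, structure):
    """Normalized counts of (middle nucleotide, 3-position pairing) triplets.

    seq: RNA or DNA sequence; structure: dot-bracket string of equal length.
    Opening and closing brackets are both treated as 'paired'.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    paired = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURES, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values())
    return [counts[f] / total if total else 0.0 for f in FEATURES]


# Toy hairpin: a 3-bp stem closing a 3-nt loop.
vector = triplet_features("GGGAAACCC", "(((...)))")
```

Because the features depend only on local windows of a predicted structure, no alignment to other genomes is needed, which is consistent with the ab initio setting described above.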

Source
DOI: http://dx.doi.org/10.1186/1471-2105-6-310
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360673
December 2005