Publications by authors named "Hua-Sheng Chiu"

17 Publications

Cell lines of the same anatomic site and histologic type show large variability in intrinsic radiosensitivity and relative biological effectiveness to protons and carbon ions.

Med Phys 2021 Apr 10. Epub 2021 Apr 10.

The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, USA.

Purpose: To show that intrinsic radiosensitivity varies greatly for protons and carbon (C) ions in addition to photons, and that DNA repair capacity remains important in governing this variability.

Methods: We measured or obtained from the literature clonogenic survival data for a number of human cancer cell lines exposed to photons, protons (9.9 keV/μm), and C-ions (13.3-77.1 keV/μm). We characterized their intrinsic radiosensitivity by the dose for 10% or 50% survival (D10 or D50), and quantified the variability at each radiation quality by the coefficient of variation (COV) in D10 and D50. We also treated cells with DNA repair inhibitors prior to irradiation to assess how DNA repair capacity affects their variability.

Results: We found no statistically significant differences in the COVs of D10 or D50 between any of the radiation qualities investigated. The same was true regardless of whether the cells were treated with DNA repair inhibitors, or whether they were stratified into histologic subsets. Even within histologic subsets, we found remarkable differences in radiosensitivity for high LET C-ions that were often greater than the variations in RBE, with brain cancer cells varying in D10 (D50) up to 100% (131%) for 77.1 keV/μm C-ions, and non-small cell lung cancer and pancreatic cancer cell lines varying up to 55% (76%) and 51% (78%), respectively, for 60.5 keV/μm C-ions. The cell lines with modulated DNA repair capacity had greater variability in intrinsic radiosensitivity across all radiation qualities.

Conclusions: Even for cell lines of the same histologic type, there are remarkable variations in intrinsic radiosensitivity, and these variations do not differ significantly between photon, proton or C-ion radiation. The importance of DNA repair capacity in governing the variability in intrinsic radiosensitivity is not significantly diminished for higher LET radiation.
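
The variability measure named in the Methods above, the coefficient of variation of the dose for 10% survival, can be illustrated with a minimal sketch. The dose values below are invented for illustration and are not data from the paper. The use of the sample (n-1) standard deviation is also an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    Assumption: the sample (n-1) standard deviation is used; the
    abstract does not state which estimator the authors chose.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("COV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical doses for 10% survival (Gy) for three cell lines at each
# radiation quality. These numbers are illustrative only.
d10_by_quality = {
    "photons": [4.0, 6.0, 8.0],
    "protons": [3.8, 5.5, 7.9],
    "C-ions": [1.5, 2.4, 3.1],
}

for quality, doses in d10_by_quality.items():
    print(f"{quality}: COV = {coefficient_of_variation(doses):.3f}")
```

Because the COV is dimensionless, it allows the spread in dose to be compared across radiation qualities whose absolute doses differ.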

DOI: http://dx.doi.org/10.1002/mp.14878
April 2021

Pan-cancer clinical and molecular analysis of racial disparities.

Cancer 2020 Feb 15;126(4):800-807. Epub 2019 Nov 15.

Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas.

Background: Racial disparities in cancer outcomes are increasingly recognized, but comprehensive analyses, including molecular studies, are limited. The objective of the current study was to perform a pan-cancer clinical and epigenetic molecular analysis of outcomes in African American (AA) and European American (EA) patients.

Methods: Cross-platform analyses using cancer databases (the Surveillance, Epidemiology, and End Results program database and the National Cancer Data Base) and a molecular database (The Cancer Genome Ancestry Atlas) were performed to evaluate clinical and epigenetic molecular differences between AA and EA patients based on genetic ancestry.

Results: In the primary pan-cancer survival analysis using the Surveillance, Epidemiology, and End Results database (2,045,839 patients; 87.5% EA and 12.5% AA), AA patients had higher mortality rates for 28 of 42 cancer types analyzed (hazard ratio, >1.0). AAs continued to have higher mortality in 13 cancer types after adjustment for socioeconomic variables using the National Cancer Database (5,150,023 patients; 11.6% AA and 88.4% EA). Then, molecular features of 5,283 tumors were analyzed in patients who had genetic ancestry data available (87.2% EA and 12.8% AA). Genes were identified with altered DNA methylation along with increased microRNA expression levels unique to AA patients that are associated with cancer drug resistance. Increased miRNAs (miR-15a, miR-17, miR-130-3p, miR-181a) were noted in common among AAs with breast, kidney, thyroid, or prostate carcinomas.

Conclusions: The current results identified epigenetic features in AA patients who have cancer that may contribute to higher mortality rates compared with EA patients who have cancer. Therefore, a focus on molecular signatures unique to AAs may identify actionable molecular abnormalities.

DOI: http://dx.doi.org/10.1002/cncr.32598
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6992510
February 2020

Author Correction: Integrative analysis identifies lincRNAs up- and downstream of neuroblastoma driver genes.

Sci Rep 2019 Jul 17;9(1):10536. Epub 2019 Jul 17.

Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium.

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

DOI: http://dx.doi.org/10.1038/s41598-019-46785-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6635357
July 2019

Integrative analysis identifies lincRNAs up- and downstream of neuroblastoma driver genes.

Sci Rep 2019 Apr 5;9(1):5685. Epub 2019 Apr 5.

Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium.

Long intergenic non-coding RNAs (lincRNAs) are emerging as integral components of signaling pathways in various cancer types. In neuroblastoma, only a handful of lincRNAs are known as upstream regulators or downstream effectors of oncogenes. Here, we exploit RNA sequencing data of primary neuroblastoma tumors, neuroblast precursor cells, neuroblastoma cell lines and various cellular perturbation model systems to define the neuroblastoma lincRNome and map lincRNAs up- and downstream of neuroblastoma driver genes MYCN, ALK and PHOX2B. Each of these driver genes controls the expression of a particular subset of lincRNAs, several of which are associated with poor survival and are differentially expressed in neuroblastoma tumors compared to neuroblasts. By integrating RNA sequencing data from both primary tumor tissue and cancer cell lines, we demonstrate that several of these lincRNAs are expressed in stromal cells. Deconvolution of primary tumor gene expression data revealed a strong association between stromal cell composition and driver gene status, resulting in differential expression of these lincRNAs. We also explored lincRNAs that putatively act upstream of neuroblastoma driver genes, either as presumed modulators of driver gene activity, or as modulators of effectors regulating driver gene expression. This analysis revealed strong associations between the neuroblastoma lincRNAs MIAT and MEG3 and MYCN and PHOX2B activity or expression. Together, our results provide a comprehensive catalogue of the neuroblastoma lincRNome, highlighting lincRNAs up- and downstream of key neuroblastoma driver genes. This catalogue forms a solid basis for further functional validation of candidate neuroblastoma lincRNAs.

DOI: http://dx.doi.org/10.1038/s41598-019-42107-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6451017
April 2019

LIN28 Selectively Modulates a Subclass of Let-7 MicroRNAs.

Mol Cell 2018 Jul;71(2):271-283.e5

Department of Systems Biology, Columbia University, New York, NY 10032, USA.

LIN28 is a bipartite RNA-binding protein that post-transcriptionally inhibits the biogenesis of let-7 microRNAs to regulate development and influence disease states. However, the mechanisms of let-7 suppression remain poorly understood because LIN28 recognition depends on coordinated targeting by both the zinc knuckle domain (ZKD), which binds a GGAG-like element in the precursor, and the cold shock domain (CSD), whose binding sites have not been systematically characterized. By leveraging single-nucleotide-resolution mapping of LIN28 binding sites in vivo, we determined that the CSD recognizes a (U)GAU motif. This motif partitions the let-7 microRNAs into two subclasses, precursors with both CSD and ZKD binding sites (CSD+) and precursors with ZKD but no CSD binding sites (CSD-). LIN28 in vivo recognition (and subsequent 3' uridylation and degradation) of CSD+ precursors is more efficient, leading to their stronger suppression in LIN28-activated cells and cancers. Thus, CSD binding sites amplify the regulatory effects of LIN28.

DOI: http://dx.doi.org/10.1016/j.molcel.2018.06.029
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6238216
July 2018

The number of titrated microRNA species dictates ceRNA regulation.

Nucleic Acids Res 2018 May;46(9):4354-4369

Department of Systems Biology, Institute for Cancer Genetics, Center for Computational Biology and Bioinformatics, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, USA.

MicroRNAs (miRNAs) play key roles in cancer, but their propensity to couple their targets as competing endogenous RNAs (ceRNAs) has only recently emerged. Multiple models have studied ceRNA regulation, but these models did not account for the effects of co-regulation by miRNAs with many targets. We modeled ceRNA regulation and simulated its effects using established parameters for miRNA/mRNA interaction kinetics while accounting for co-regulation by multiple miRNAs with many targets. Our simulations suggested that co-regulation by many miRNA species is more likely to produce physiologically relevant context-independent couplings. To test this, we studied the overlap of inferred ceRNA networks from four tumor contexts, our proposed pan-cancer ceRNA interactome (PCI). PCI was composed of interactions between genes that were co-regulated by nearly three times as many miRNAs as other inferred ceRNA interactions. Evidence from expression-profiling datasets suggested that PCI interactions are predictive of gene expression in 12 independent tumor and non-tumor contexts. Biochemical assays confirmed ceRNA couplings for two PCI subnetworks, including oncogenes CCND1, HIF1A and HMGA2, and tumor suppressors PTEN, RB1 and TP53. Our results suggest that PCI is enriched for interactions that are coupled by many miRNA species and are more likely to be context independent.

DOI: http://dx.doi.org/10.1093/nar/gky286
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961349
May 2018

A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers.

Cancer Cell 2018 Apr 2;33(4):690-705.e9. Epub 2018 Apr 2.

Department of Epidemiology and Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

We analyzed molecular data on 2,579 tumors from The Cancer Genome Atlas (TCGA) of four gynecological types plus breast. Our aims were to identify shared and unique molecular features, clinically significant subtypes, and potential therapeutic targets. We found 61 somatic copy-number alterations (SCNAs) and 46 significantly mutated genes (SMGs). Eleven SCNAs and 11 SMGs had not been identified in previous TCGA studies of the individual tumor types. We found functionally significant estrogen receptor-regulated long non-coding RNAs (lncRNAs) and gene/lncRNA interaction networks. Pathway analysis identified subtypes with high leukocyte infiltration, raising potential implications for immunotherapy. Using 16 key molecular features, we identified five prognostic subtypes and developed a decision tree that classified patients into the subtypes based on just six features that are assessable in clinical laboratories.

DOI: http://dx.doi.org/10.1016/j.ccell.2018.03.014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5959730
April 2018

Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context.

Cell Rep 2018 Apr;23(1):297-312.e12

Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA.

Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts.

DOI: http://dx.doi.org/10.1016/j.celrep.2018.03.064
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5906131
April 2018

Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas.

Cell Rep 2018 Apr;23(1):194-212.e6

Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.

This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smoking and/or human papillomavirus (HPV). SCCs harbor 3q, 5p, and other recurrent chromosomal copy-number alterations (CNAs), DNA mutations, and/or aberrant methylation of genes and microRNAs, which are correlated with the expression of multi-gene programs linked to squamous cell stemness, epithelial-to-mesenchymal differentiation, growth, genomic integrity, oxidative damage, death, and inflammation. Low-CNA SCCs tended to be HPV(+) and display hypermethylation with repression of TET1 demethylase and FANCF, previously linked to predisposition to SCC, or harbor mutations affecting CASP8, RAS-MAPK pathways, chromatin modifiers, and immunoregulatory molecules. We uncovered hypomethylation of the alternative promoter that drives expression of the ΔNp63 oncogene and embedded miR944. Co-expression of immune checkpoint, T-regulatory, and myeloid suppressor cell signatures may explain reduced efficacy of immune therapy. These findings support possibilities for molecular classification and therapeutic approaches.

DOI: http://dx.doi.org/10.1016/j.celrep.2018.03.063
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6002769
April 2018

High-throughput validation of ceRNA regulatory networks.

BMC Genomics 2017 May 30;18(1):418. Epub 2017 May 30.

Columbia Department of Systems Biology, Center for Computational Biology and Bioinformatics, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA.

Background: MicroRNAs (miRNAs) play multiple roles in tumor biology. Interestingly, reports from multiple groups suggest that miRNA targets may be coupled through competitive stoichiometric sequestration. Specifically, computational models predicted and experimental assays confirmed that miRNA activity is dependent on miRNA target abundance, and consequently, changes in the abundance of some miRNA targets lead to changes to the regulation and abundance of their other targets. The resulting indirect regulatory influence between miRNA targets resembles competition and has been dubbed competitive endogenous RNA (ceRNA). Recent studies have questioned the physiological relevance of ceRNA interactions, our ability to accurately predict these interactions, and the number of genes that are impacted by ceRNA interactions in specific cellular contexts.

Results: To address these concerns, we reverse engineered ceRNA networks (ceRNETs) in breast and prostate adenocarcinomas using context-specific TCGA profiles, and tested whether ceRNA interactions can predict the effects of RNAi-mediated gene silencing perturbations in PC3 and MCF7 cells. Our results, based on tests of thousands of inferred ceRNA interactions that are predicted to alter hundreds of cancer genes in each of the two tumor contexts, confirmed statistically significant effects for half of the predicted targets.

Conclusions: Our results suggest that the expression of a significant fraction of cancer genes may be regulated by ceRNA interactions in each of the two tumor contexts.

DOI: http://dx.doi.org/10.1186/s12864-017-3790-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450082
May 2017

Cupid: simultaneous reconstruction of microRNA-target and ceRNA networks.

Genome Res 2015 Feb 5;25(2):257-67. Epub 2014 Nov 5.

Texas Children's Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA.

We introduce a method for simultaneous prediction of microRNA-target interactions and their mediated competitive endogenous RNA (ceRNA) interactions. Using high-throughput validation assays in breast cancer cell lines, we show that our integrative approach significantly improves on microRNA-target prediction accuracy as assessed by both mRNA and protein level measurements. Our biochemical assays support nearly 500 microRNA-target interactions with evidence for regulation in breast cancer tumors. Moreover, these assays constitute the most extensive validation platform for computationally inferred networks of microRNA-target interactions in breast cancer tumors, providing a useful benchmark to ascertain future improvements.

DOI: http://dx.doi.org/10.1101/gr.178194.114
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315299
February 2015

An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma.

Cell 2011 Oct;147(2):370-81

Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA.

By analyzing gene expression data in glioblastoma in combination with matched microRNA profiles, we have uncovered a posttranscriptional regulation layer of surprising magnitude, comprising more than 248,000 microRNA (miR)-mediated interactions. These include ∼7,000 genes whose transcripts act as miR "sponges" and 148 genes that act through alternative, nonsponge interactions. Biochemical analyses in cell lines confirmed that this network regulates established drivers of tumor initiation and subtype implementation, including PTEN, PDGFRA, RB1, VEGFA, STAT3, and RUNX1, suggesting that these interactions mediate crosstalk between canonical oncogenic pathways. siRNA silencing of 13 miR-mediated PTEN regulators, whose locus deletions are predictive of PTEN expression variability, was sufficient to downregulate PTEN in a 3'UTR-dependent manner and to increase tumor cell growth rates. Thus, miR-mediated interactions provide a mechanistic, experimentally validated rationale for the loss of PTEN expression in a large number of glioma samples with an intact PTEN locus.

DOI: http://dx.doi.org/10.1016/j.cell.2011.09.041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3214599
October 2011

PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis.

Proteins 2008 Aug;72(2):693-710

Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches for PSL prediction based on protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA) to solve this problem. A protein is considered as a term string composed of gapped-dipeptides, which are defined as any two residues separated by one or more positions. The weighting scheme of gapped-dipeptides is calculated according to a position specific score matrix, which includes sequence evolutionary information. Then, PLSA is applied for feature reduction, and reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. It has been reported that there is a strong correlation between sequence homology and subcellular localization (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To properly evaluate the performance of PSLDoc, a target protein can be classified into low- or high-homology data sets. PSLDoc's overall accuracy of low- and high-homology data sets reaches 86.84% and 98.21%, respectively, and it compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve a high precision at specified levels of recall rates. When the confidence threshold is set at 0.7, PSLDoc achieves a precision of 97.89%, which is considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that the specific feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. In addition, because of the generality of the representation, our method can be extended to eukaryotic proteomes in the future. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc/.
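
The gapped-dipeptide representation described in this abstract can be sketched as follows. This is a simplified illustration, not the PSLDoc implementation: it counts raw occurrences instead of applying the position specific score matrix weighting, and it omits the PLSA and SVM stages. The function name, the `max_gap` parameter, and the reading of "gap" as the number of residues lying between the pair are all assumptions made for the example.

```python
from collections import Counter


def gapped_dipeptides(sequence, max_gap=3):
    """Count gapped dipeptides in a protein sequence.

    A term is a triple (first residue, gap, second residue), where gap is
    the number of residues lying between the two (assumed convention).
    Raw counts stand in for the PSSM-based weights used in the paper.
    """
    counts = Counter()
    for gap in range(1, max_gap + 1):
        step = gap + 1  # index distance between the paired residues
        for i in range(len(sequence) - step):
            counts[(sequence[i], gap, sequence[i + step])] += 1
    return counts


# Hypothetical toy sequence, for illustration only.
terms = gapped_dipeptides("MKVA", max_gap=3)
for (first, gap, second), n in sorted(terms.items()):
    print(f"{first}-{gap}-{second}: {n}")
```

Treating each triple as a "term" is what lets a protein be handled like a document, so that a topic-model style reduction such as PLSA can be applied to the resulting term vector.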

DOI: http://dx.doi.org/10.1002/prot.21944
August 2008

Enhanced membrane protein topology prediction using a hierarchical classification method and a new scoring function.

J Proteome Res 2008 Feb;7(2):487-96

Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan.

The prediction of transmembrane (TM) helix and topology provides important information about the structure and function of a membrane protein. Due to the experimental difficulties in obtaining a high-resolution model, computational methods are highly desirable. In this paper, we present a hierarchical classification method using support vector machines (SVMs) that integrates selected features by capturing the sequence-to-structure relationship and developing a new scoring function based on membrane protein folding. The proposed approach is evaluated on low- and high-resolution data sets with cross-validation, and the topology (sidedness) prediction accuracy reaches as high as 90%. Our method is also found to correctly predict both the location of TM helices and the topology for 69% of the low-resolution benchmark set. We also test our method for discrimination between soluble and membrane proteins and achieve very low overall false positive (0.5%) and false negative rates (0 to approximately 1.2%). Lastly, the analysis of the scoring function suggests that the topogeneses of single-spanning and multispanning TM proteins have different levels of complexity, and the consideration of interloop topogenic interactions for the latter is the key to achieving better predictions. This method can facilitate the annotation of membrane proteomes to extract useful structural and functional information. It is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop.

DOI: http://dx.doi.org/10.1021/pr0702058
February 2008

Protein subcellular localization prediction based on compartment-specific features and structure conservation.

BMC Bioinformatics 2007 Sep 8;8:330. Epub 2007 Sep 8.

Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan.

Background: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins.

Results: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%.

Conclusion: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, the overall accuracy of combining the structural homology approach is further improved, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.

DOI: http://dx.doi.org/10.1186/1471-2105-8-330
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2040162
September 2007

Protein subcellular localization prediction based on compartment-specific biological features.

Comput Syst Bioinformatics Conf 2006:325-30

Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Prediction of subcellular localization of proteins is important for genome annotation, protein function prediction, and drug discovery. We present a prediction method for Gram-negative bacteria that uses ten one-versus-one support vector machine (SVM) classifiers, where compartment-specific biological features are selected as input to each SVM classifier. The final prediction of localization sites is determined by integrating the results from ten binary classifiers using a combination of majority votes and a probabilistic method. The overall accuracy reaches 91.4%, which is 1.6% better than the state-of-the-art system, in a ten-fold cross-validation evaluation on a benchmark data set. We demonstrate that feature selection guided by biological knowledge and insights in one-versus-one SVM classifiers can lead to a significant improvement in the prediction performance. Our model is also used to produce highly accurate prediction of 92.8% overall accuracy for proteins of dual localizations.
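
How ten one-versus-one classifiers arise, and how their outputs might be merged, can be sketched as follows. Five localization sites give 5 choose 2 = 10 pairs. The site abbreviations are assumed labels. The tie-break by summed pairwise probability is a stand-in for the paper's "combination of majority votes and a probabilistic method", whose exact form the abstract does not give. The pairwise probabilities would come from trained SVMs; here they are supplied directly.

```python
from collections import defaultdict
from itertools import combinations

# Assumed labels for five Gram-negative localization sites.
SITES = ["CP", "IM", "PP", "OM", "EC"]


def ovo_predict(pairwise_probs):
    """Combine one-versus-one outputs into a single predicted site.

    pairwise_probs maps each pair (a, b), in the order produced by
    itertools.combinations(SITES, 2), to the probability that a is
    preferred over b. Majority vote decides; ties are broken by the
    summed pairwise probability (an assumed stand-in rule).
    """
    votes = defaultdict(int)
    support = defaultdict(float)
    for a, b in combinations(SITES, 2):
        p = pairwise_probs[(a, b)]
        votes[a if p >= 0.5 else b] += 1
        support[a] += p
        support[b] += 1.0 - p
    return max(SITES, key=lambda s: (votes[s], support[s]))


# Hypothetical classifier outputs in which "OM" wins every pairing.
example = {}
for a, b in combinations(SITES, 2):
    if a == "OM":
        example[(a, b)] = 0.9
    elif b == "OM":
        example[(a, b)] = 0.1
    else:
        example[(a, b)] = 0.5

print(len(example), "pairwise classifiers ->", ovo_predict(example))
```

Because each binary classifier only separates two compartments, each one can be given its own feature set, which is the design point the abstract emphasizes.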

June 2007

Transmembrane helix and topology prediction using hierarchical SVM classifiers and an alternating geometric scoring function.

Comput Syst Bioinformatics Conf 2006:31-42

Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Motivation: A key class of membrane proteins contains one or more transmembrane (TM) helices, traversing the membrane lipid bilayer. Various properties such as the length, arrangement and topology or orientation of TM helices, are closely related to a protein's functions. Although a range of methods have been developed to predict TM helices and their topologies, no single method consistently outperforms the others. In addition, topology prediction has much lower accuracy than helix prediction, and thus requires continuous improvements.

Results: We develop a method based on support vector machines (SVM) in a hierarchical framework to predict TM helices first, followed by their topology. By partitioning the prediction problem into two steps, specific input features can be selected and integrated in each step. We also propose a novel scoring function for topology models based on the membrane protein folding process. When benchmarked against other methods in terms of performance, our approach achieves the highest scores at 86% in helix prediction (Q(2)) and 91% in topology prediction (TOPO) for the high-resolution data set, resulting in an improvement of 6% and 14% in their respective categories over the second best method. Furthermore, we demonstrate the ability of our method to discriminate between membrane and non-membrane proteins, with an accuracy higher than 99%. When tested on a small set of newly solved structures of membrane proteins, our method overcomes some of the difficulties in predicting TM helices by incorporating multiple biological input features.

June 2007