Publications by authors named "Jun-Tao Guo"

44 Publications

A comparative study of protein-ssDNA interactions.

NAR Genom Bioinform 2021 Mar 23;3(1):lqab006. Epub 2021 Feb 23.

Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Single-stranded DNA-binding proteins (SSBs) play crucial roles in DNA replication, recombination and repair, and serve as key players in the maintenance of genomic stability. While a number of SSBs bind single-stranded DNA (ssDNA) non-specifically, the others recognize and bind specific ssDNA sequences. The mechanisms underlying this binding discrepancy, however, are largely unknown. Here, we present a comparative study of protein-ssDNA interactions by annotating specific and non-specific SSBs and comparing structural features such as DNA-binding propensities and secondary structure types of residues in SSB-ssDNA interactions, protein-ssDNA hydrogen bonding and π-π interactions between specific and non-specific SSBs. Our results suggest that protein side chain-DNA base hydrogen bonds are the major contributors to protein-ssDNA binding specificity, while π-π interactions may mainly contribute to binding affinity. We also found the enrichment of aspartate in the specific SSBs, a key feature in specific protein-double-stranded DNA (dsDNA) interactions as reported in our previous study. In addition, no significant differences between specific and non-specific groups with respect to conformational changes upon ssDNA binding were found, suggesting that the flexibility of SSBs plays a lesser role than that of dsDNA-binding proteins in conferring binding specificity.

Source
DOI: http://dx.doi.org/10.1093/nargab/lqab006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902235
March 2021

Comparative assessments of indel annotations in healthy and cancer genomes with next-generation sequencing data.

BMC Med Genomics 2020 Nov 10;13(1):170. Epub 2020 Nov 10.

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.

Background: Insertion and deletion (indel) is one of the major variation types in human genomes. Accurate annotation of indels is of paramount importance in genetic variation analysis and investigation of their roles in human diseases. Previous studies revealed a high number of false positives from existing indel calling methods, which limits downstream analyses of the effects of indels on both healthy and disease genomes. In this study, we evaluated seven commonly used general indel calling programs for germline indels and four somatic indel calling programs through comparative analysis to investigate their common features and differences and to explore ways to improve indel annotation accuracy.

Methods: In our comparative analysis, we adopted a more stringent evaluation approach by considering both the indel positions and the indel types (insertion or deletion sequences) between the samples and the reference set. In addition, we applied an efficient way to use a benchmark for improved performance comparisons for the general indel calling programs.

Results: We found that germline indels in healthy genomes derived by combining several indel calling tools could help remove a large number of false positive indels from individual programs without compromising the number of true positives. The performance comparisons of somatic indel calling programs are more complicated due to the lack of a reliable and comprehensive benchmark. Nevertheless, our results revealed large variations among the programs and among cancer types.

Conclusions: While more accurate indel calling programs are needed, we found that the performance for germline indel annotations can be improved by combining the results from several programs. In addition, well-designed benchmarks for both germline and somatic indels are key in program development and evaluations.
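
The stringent matching and the multi-caller combination described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout `(chrom, pos, ref, alt)`, the function names `evaluate` and `consensus`, and the `min_support` voting threshold are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical indel record: (chromosome, position, ref allele, alt allele).
# Comparing whole tuples means a call counts as correct only when both the
# position and the inserted/deleted sequence agree with the benchmark.

def evaluate(calls, benchmark):
    """Return (true positives, false positives, false negatives) as counts."""
    calls, benchmark = set(calls), set(benchmark)
    return (len(calls & benchmark),
            len(calls - benchmark),
            len(benchmark - calls))

def consensus(call_sets, min_support=2):
    """Keep indels reported by at least `min_support` callers (assumed rule)."""
    counts = Counter(indel for calls in call_sets for indel in set(calls))
    return {indel for indel, n in counts.items() if n >= min_support}

benchmark = {("chr1", 100, "A", "AT"), ("chr1", 250, "GCC", "G")}
caller_a = {("chr1", 100, "A", "AT"), ("chr1", 250, "GCC", "G"),
            ("chr2", 40, "T", "TA")}
# Same position as a benchmark indel but a different inserted base:
caller_b = {("chr1", 100, "A", "AT"), ("chr1", 250, "GCC", "G"),
            ("chr1", 100, "A", "AG")}
caller_c = {("chr1", 100, "A", "AT"), ("chr3", 7, "CT", "C")}

combined = consensus([caller_a, caller_b, caller_c])
print(evaluate(caller_a, benchmark))
print(evaluate(combined, benchmark))
```

In this toy data each caller contributes a different false positive, so requiring agreement between two callers removes them while keeping both benchmark indels.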

Source
DOI: http://dx.doi.org/10.1186/s12920-020-00818-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7653722
November 2020

New insights into protein-DNA binding specificity from hydrogen bond based comparative study.

Nucleic Acids Res 2019 Dec;47(21):11103-11113

Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Knowledge of protein-DNA binding specificity has important implications in understanding DNA metabolism, transcriptional regulation and developing therapeutic drugs. Previous studies demonstrated hydrogen bonds between amino acid side chains and DNA bases play major roles in specific protein-DNA interactions. In this paper, we investigated the roles of individual DNA strands and protein secondary structure types in specific protein-DNA recognition based on side chain-base hydrogen bonds. By comparing the contribution of each DNA strand to the overall binding specificity between DNA-binding proteins with different degrees of binding specificity, we found that highly specific DNA-binding proteins show balanced hydrogen bonding with each of the two DNA strands while multi-specific DNA binding proteins are generally biased towards one strand. Protein-base pair hydrogen bonds, in which both bases of a base pair are involved in forming hydrogen bonds with amino acid side chains, are more prevalent in the highly specific protein-DNA complexes than those in the multi-specific group. Amino acids involved in side chain-base hydrogen bonds favor strand and coil secondary structure types in highly specific DNA-binding proteins while multi-specific DNA-binding proteins prefer helices.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz963
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6868434
December 2019

An SVM-based method for assessment of transcription factor-DNA complex models.

BMC Bioinformatics 2018 Dec 21;19(Suppl 20):506. Epub 2018 Dec 21.

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.

Background: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models.

Results: We present here a Support Vector Machine (SVM)-based approach for quality assessment of the predicted transcription factor (TF)-DNA complex models. Besides a knowledge-based protein-DNA interaction potential DDNA3, we applied several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules to quality assessment of complex models. To address the issue of unbalanced positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved through reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models.

Conclusions: A learning-based SVM scoring model with structural features for specific protein-DNA binding and an atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models.
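
The hard-negative mining loop mentioned in the Results can be sketched roughly as below. To stay dependency-free, the sketch swaps the SVM for a one-dimensional threshold classifier, so it shows only the iterative sampling idea and not the authors' model; `fit_threshold`, `hard_negative_mining`, `n_initial`, and `max_rounds` are hypothetical names and parameters.

```python
import random

def fit_threshold(pos, neg):
    """Stand-in classifier (NOT an SVM): midpoint between the class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def hard_negative_mining(pos, neg_pool, n_initial=3, max_rounds=10, seed=0):
    """Train on all positives plus a growing subset of the negatives."""
    rng = random.Random(seed)
    # Initial training set: every positive and a random sample of negatives.
    train_neg = rng.sample(neg_pool, n_initial)
    threshold = fit_threshold(pos, train_neg)
    for _ in range(max_rounds):
        # Hard negatives: pool members scored as positive that are not
        # yet part of the training set.
        hard = [x for x in neg_pool if x > threshold and x not in train_neg]
        if not hard:
            break
        train_neg.extend(hard)
        threshold = fit_threshold(pos, train_neg)
    return threshold, train_neg

positives = [0.90, 0.85, 0.95]
negative_pool = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70]
threshold, used_negatives = hard_negative_mining(positives, negative_pool)
print(threshold, sorted(used_negatives))
```

The point of the loop is that the negatives added in later rounds are the ones the current model gets wrong, which keeps the training set small while concentrating on the difficult cases.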

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2538-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363
December 2018

Effects of short indels on protein structure and function in human genomes.

Sci Rep 2017 Aug 24;7(1):9313. Epub 2017 Aug 24.

Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.

Insertions and deletions (indels) represent the second most common type of genetic variations in human genomes. Indels can be deleterious and contribute to disease susceptibility as recent genome sequencing projects revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals of 26 populations from the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in olfactory receptor activity while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variations and phenotypic diversity, generally do not affect the core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.

Source
DOI: http://dx.doi.org/10.1038/s41598-017-09287-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5570956
August 2017

An efficient algorithm for improving structure-based prediction of transcription factor binding sites.

BMC Bioinformatics 2017 Jul 17;18(1):342. Epub 2017 Jul 17.

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.

Background: Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale represents an essential step toward our understanding of gene regulation networks. Previously we developed a structure-based method for prediction of transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. While the method performs well, it is not computationally efficient due to the exponential increase in the number of binding sequences to be evaluated for longer binding sites. In this paper, we present an efficient pentamer algorithm by splitting DNA binding sequences into overlapping fragments along with a simplified integrative energy function for transcription factor binding site prediction.

Results: A DNA binding sequence is split into overlapping pentamers (5 base pairs) for calculating transcription factor-pentamer interaction energy. To combine the results from overlapping pentamer scores, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach along with a simplified integrative energy function improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites.

Conclusion: Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding site prediction.
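
The fragment step described in the Results, splitting a binding sequence into overlapping pentamers and combining per-fragment scores, can be sketched as follows. This is an illustration only: `toy_energy` is a placeholder and not the integrative energy function, and reading Kmer-Sum as a plain sum over windows is an assumption.

```python
def overlapping_kmers(seq, k=5):
    """Split a DNA sequence into overlapping k-mers (pentamers by default)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def toy_energy(kmer):
    """Placeholder score, NOT a real TF-DNA energy: minus the G/C count."""
    return -float(sum(base in "GC" for base in kmer))

def kmer_sum(seq, energy=toy_energy, k=5):
    """Sum the per-fragment scores over all overlapping k-mers."""
    return sum(energy(kmer) for kmer in overlapping_kmers(seq, k))

site = "ACGTGCA"
fragments = overlapping_kmers(site)
total = kmer_sum(site)
print(fragments, total)
```

A site of length n has n - 4 pentamer windows, and each window can take only 4^5 = 1024 distinct sequences, whereas enumerating full-length sites requires 4^n sequences. That difference is the source of the efficiency gain for longer binding sites.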

Source
DOI: http://dx.doi.org/10.1186/s12859-017-1755-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5514533
July 2017

Chlorophyll-Catalyzed Visible-Light-Mediated Synthesis of Tetrahydroquinolines from N,N-Dimethylanilines and Maleimides.

J Org Chem 2017 Feb 5;82(4):1888-1894. Epub 2017 Feb 5.

Key Laboratory of Applied Chemistry of Chongqing Municipality, School of Chemistry and Chemical Engineering, Southwest University, Chongqing 400715, PR China.

The natural pigment chlorophyll was used as a green photosensitizer for the first time in visible-light photoredox catalysis for the efficient synthesis of tetrahydroquinolines from N,N-dimethylanilines and maleimides in an air atmosphere. The reaction involves direct cyclization via an sp³ C-H bond functionalization process to afford products in moderate to high yields (61-98%) from a wide range of substrates with a low loading of chlorophyll under mild conditions. This work demonstrates the potential benefits of chlorophyll as a photosensitizer in visible-light catalysis.

Source
DOI: http://dx.doi.org/10.1021/acs.joc.6b03034
February 2017

Structure-based prediction of transcription factor binding specificity using an integrative energy function.

Bioinformatics 2016 Jun;32(12):i306-i313

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Transcription factors (TFs) regulate gene expression through binding to specific target DNA sites. Accurate annotation of transcription factor binding sites (TFBSs) at genome scale represents an essential step toward our understanding of gene regulation networks. In this article, we present a structure-based method for computational prediction of TFBSs using a novel, integrative energy (IE) function. The new energy function combines a multibody (MB) knowledge-based potential and two atomic energy terms (hydrogen bond and π interaction) that might not be accurately captured by the knowledge-based potential owing to the mean force nature and low count problem. We applied the new energy function to the TFBS prediction using a non-redundant dataset that consists of TFs from 12 different families. Our results show that the new IE function improves the prediction accuracy over the knowledge-based, statistical potentials, especially for homeodomain TFs, the second largest TF family in mammals.

Contact: jguo4@uncc.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btw264
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908348
June 2016

Statistical analysis of structural determinants for protein-DNA-binding specificity.

Proteins 2016 Aug 15;84(8):1147-61. Epub 2016 Jun 15.

Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223.

DNA-binding proteins play critical roles in biological processes including gene expression, DNA packaging and DNA repair. They bind to DNA target sequences with different degrees of binding specificity, ranging from highly specific (HS) to nonspecific (NS). Alterations of DNA-binding specificity, due to either genetic variation or somatic mutations, can lead to various diseases. In this study, a comparative analysis of protein-DNA complex structures was carried out to investigate the structural features that contribute to binding specificity. Protein-DNA complexes were grouped into three general classes based on degrees of binding specificity: HS, multispecific (MS), and NS. Our results show a clear trend of structural features among the three classes, including amino acid binding propensities, simple and complex hydrogen bonds, major/minor groove and base contacts, and DNA shape. We found that aspartate is enriched in HS DNA binding proteins and predominately binds to a cytosine through a single hydrogen bond or two consecutive cytosines through bidentate hydrogen bonds. Aromatic residues, histidine and tyrosine, are highly enriched in the HS and MS groups and may contribute to specific binding through different mechanisms. To further investigate the role of protein flexibility in specific protein-DNA recognition, we analyzed the conformational changes between the bound and unbound states of DNA-binding proteins and structural variations. The results indicate that HS and MS DNA-binding domains have larger conformational changes upon DNA-binding and larger degree of flexibility in both bound and unbound states. Proteins 2016; 84:1147-1161. © 2016 Wiley Periodicals, Inc.

Source
DOI: http://dx.doi.org/10.1002/prot.25061
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4945413
August 2016

Investigation of arc repressor DNA-binding specificity by comparative molecular dynamics simulations.

J Biomol Struct Dyn 2015 14;33(10):2083-93. Epub 2015 Jan 14.

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Transcription factors regulate gene expression through binding to specific DNA sequences. How transcription factors achieve high binding specificity is still not well understood. In this paper, we investigated the role of protein flexibility in protein-DNA-binding specificity by comparative molecular dynamics (MD) simulations. Protein flexibility has been considered as a key factor in molecular recognition, which is intrinsically a dynamic process involving fine structural fitting between binding components. In this study, we performed comparative MD simulations on wild-type and F10V mutant P22 Arc repressor in both free and complex conformations. The F10V mutant has lower DNA-binding specificity though both the bound and unbound main-chain structures between the wild-type and F10V mutant Arc are highly similar. We found that the DNA-binding motif of wild-type Arc is structurally more flexible than the F10V mutant in the unbound state, especially for the six DNA base-contacting residues in each dimer. We demonstrated that the flexible side chains of wild-type Arc lead to a higher DNA-binding specificity through forming more hydrogen bonds with DNA bases upon binding. Our simulations also showed a possible conformational selection mechanism for Arc-DNA binding. These results indicate the important roles of protein flexibility and dynamic properties in protein-DNA-binding specificity.

Source
DOI: http://dx.doi.org/10.1080/07391102.2014.997797
May 2016

Semisynthesis and in vitro cytotoxic evaluation of new analogues of 1-O-acetylbritannilactone, a sesquiterpene from Inula britannica.

Eur J Med Chem 2014 Jun 13;80:71-82. Epub 2014 Apr 13.

Shaanxi Engineering Center of Bioresource Chemistry & Sustainable Utilization, College of Science, Northwest A&F University, Yangling 712100, China.

Semisynthetic analogues of the natural product 1-O-acetylbritannilactone (ABL), a sesquiterpene isolated from the medicinal plant Inula britannica, have been prepared and exhibited significant in vitro cytotoxic activities against four cell lines including three human cancer cell lines (HCT116, HEp-2 and HeLa) and one normal hamster cell line (CHO). Structure-activity relationships indicate that esterification of 6-OH (enhanced lipophilicity) and α-methylene-γ-lactone functionalities play important roles in conferring cytotoxicity. Among the tested compounds, compound 14, bearing a lauroyl group (12C) at the 6-OH position, displayed the most potent in vitro cytotoxic activity, with IC50 values between 2.91 and 6.78 μM, comparable to the positive control etoposide (VP-16, IC50 values between 2.13 and 4.79 μM). Moreover, compound 14 triggered remarkable apoptosis at a low concentration, and induced cell cycle arrest in G2/M phase in HCT116 cells. The biological assays conducted with normal cells (CHO) revealed that none of the synthetic compounds is selective against the cancer cell lines tested.

Source
DOI: http://dx.doi.org/10.1016/j.ejmech.2014.04.028
June 2014

Direct activation of human and mouse Oct4 genes using engineered TALE and Cas9 transcription factors.

Nucleic Acids Res 2014 Apr 5;42(7):4375-90. Epub 2014 Feb 5.

Key Laboratory for Regenerative Medicine, Ministry of Education, School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China, SBS Core Laboratory, CUHK Shenzhen Research Institute, Shenzhen, China, Bone Marrow Transplantation Centre, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China, Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA, Advanced Biomedical Computing Center, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR, China and Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China.

The newly developed transcription activator-like effector protein (TALE) and clustered regularly interspaced short palindromic repeats/Cas9 transcription factors (TF) offered a powerful and precise approach for modulating gene expression. In this article, we systematically investigated the potential of these new tools in activating the stringently silenced pluripotency gene Oct4 (Pou5f1) in mouse and human somatic cells. First, with a number of TALEs and sgRNAs targeting various regions in the mouse and human Oct4 promoters, we found that the most efficient TALE-VP64s bound around -120 to -80 bp, while highly effective sgRNAs targeted from -147 to -89-bp upstream of the transcription start sites to induce high activity of luciferase reporters. In addition, we observed significant transcriptional synergy when multiple TFs were applied simultaneously. Although individual TFs exhibited marginal activity to up-regulate endogenous gene expression, optimized combinations of TALE-VP64s could enhance endogenous Oct4 transcription up to 30-fold in mouse NIH3T3 cells and 20-fold in human HEK293T cells. More importantly, the enhancement of OCT4 transcription ultimately generated OCT4 proteins. Furthermore, examination of different epigenetic modifiers showed that histone acetyltransferase p300 could enhance both TALE-VP64 and sgRNA/dCas9-VP64 induced transcription of endogenous OCT4. Taken together, our study suggested that engineered TALE-TF and dCas9-TF are useful tools for modulating gene expression in mammalian cells.

Source
DOI: http://dx.doi.org/10.1093/nar/gku109
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985678
April 2014

Towards comprehensive analysis of protein family quantitative stability-flexibility relationships using homology models.

Methods Mol Biol 2014;1084:239-54

Department of Bioinformatics and Genomics, University of North Carolina, Charlotte, NC, USA.

The Distance Constraint Model (DCM) is a computational modeling scheme that uniquely integrates thermodynamic and mechanical descriptions of protein structure. As such, quantitative stability-flexibility relationships (QSFR) that describe the interrelationships of thermodynamics and mechanics can be quickly computed. Using comparative QSFR analyses, we have previously investigated these relationships across a small number of protein orthologs, ranging from two to a dozen [1, 2]. However, our ultimate goal is to provide a comprehensive analysis of whole protein families, which requires consideration of many more structures. To that end, we have developed homology modeling and assessment protocols so that we can robustly calculate QSFR properties for proteins without experimentally derived structures. The approach, which is presented here, starts from a large ensemble of potential homology models and uses a clustering algorithm to identify the best models, thus paving the way for a comprehensive QSFR analysis across hundreds of proteins in a protein family.

Source
DOI: http://dx.doi.org/10.1007/978-1-62703-658-0_13
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676804
April 2014

Porphyrin and heme metabolism and the porphyrias.

Compr Physiol 2013 Jan;3(1):365-401

Department of Medicine and The Liver-Biliary-Pancreatic Center, Carolinas Medical Center, Charlotte, North Carolina, USA.

Porphyrins and metalloporphyrins are the key pigments of life on earth as we know it, because they include chlorophyll (a magnesium-containing metalloporphyrin) and heme (iron protoporphyrin). In eukaryotes, porphyrins and heme are synthesized by a multistep pathway that involves eight enzymes. The first and rate-controlling step is the formation of delta-aminolevulinic acid (ALA) from glycine plus succinyl CoA, catalyzed by ALA synthase. Intermediate steps occur in the cytoplasm, with formation of the monopyrrole porphobilinogen and the tetrapyrroles hydroxymethylbilane and a series of porphyrinogens, which are serially decarboxylated. Heme is utilized chiefly for the formation of hemoglobin in erythrocytes, myoglobin in muscle cells, cytochromes P-450 and mitochondrial cytochromes, and other hemoproteins in hepatocytes. The rate-controlling step of heme breakdown is catalyzed by heme oxygenase (HMOX), of which there are two isoforms, called HMOX1 and HMOX2. HMOX breaks down heme to form biliverdin, carbon monoxide, and iron. The porphyrias are a group of disorders, mainly inherited, in which there are defects in normal porphyrin and heme synthesis. The cardinal clinical features are cutaneous (due to the skin-damaging effects of excess deposited porphyrins) or neurovisceral attacks of pain, sometimes with weakness, delirium, seizures, and the like (probably due mainly to neurotoxic effects of ALA). The treatment of choice for the acute hepatic porphyrias is intravenous heme therapy, which repletes a critical regulatory heme pool in hepatocytes and leads to downregulation of hepatic ALA synthase, which is a biochemical hallmark of all forms of acute porphyria in relapse.

Source
DOI: http://dx.doi.org/10.1002/cphy.c120006
January 2013

A knowledge-based orientation potential for transcription factor-DNA docking.

Bioinformatics 2013 Feb 5;29(3):322-30. Epub 2012 Dec 5.

Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Motivation: Computational modeling of protein-DNA complexes remains a challenging problem in structural bioinformatics. One of the key factors for a successful protein-DNA docking is a potential function that can accurately discriminate the near-native structures from decoy complexes and at the same time make conformational sampling more efficient. Here, we developed a novel orientation-dependent, knowledge-based, residue-level potential for improving transcription factor (TF)-DNA docking.

Results: We demonstrated the performance of this new potential in TF-DNA binding affinity prediction, discrimination of native protein-DNA complex from decoy structures, and most importantly in rigid TF-DNA docking. The rigid TF-DNA docking with the new orientation potential, on a benchmark of 38 complexes, successfully predicts 42% of the cases with root mean square deviations lower than 1 Å and 55% of the cases with root mean square deviations lower than 3 Å. The results suggest that docking with this new orientation-dependent, coarse-grained statistical potential can achieve high-docking accuracy and can serve as a crucial first step in multi-stage flexible protein-DNA docking.

Availability And Implementation: The new potential is available at http://bioinfozen.uncc.edu/Protein_DNA_orientation_potential.tar.
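
The docking success rates quoted in the Results are fractions of benchmark cases whose predicted complex falls under a root mean square deviation (RMSD) cutoff. A minimal sketch of that bookkeeping follows. It assumes the model and native coordinates are already matched atom by atom and expressed in the same frame (no superposition step), and the per-case RMSD values in the example are made up for illustration, not taken from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

def success_rate(rmsds, cutoff):
    """Fraction of cases with RMSD strictly below the cutoff (in angstroms)."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)

native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # shifted by 1 along z
shift_rmsd = rmsd(model, native)

toy_rmsds = [0.5, 2.0, 4.0, 0.8]             # hypothetical per-case values
print(shift_rmsd, success_rate(toy_rmsds, 1.0), success_rate(toy_rmsds, 3.0))
```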

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts699
February 2013

Contribution of V(H) replacement products to the generation of anti-HIV antibodies.

Clin Immunol 2013 Jan 15;146(1):46-55. Epub 2012 Nov 15.

Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA.

V(H) replacement occurs through RAG-mediated secondary recombination to change unwanted IgH genes and diversify antibody repertoire. The biological significance of V(H) replacement remains to be explored. Here, we show that V(H) replacement products are highly enriched in IgH genes encoding anti-HIV antibodies, including anti-gp41, anti-V3 loop, anti-gp120, CD4i, and PGT antibodies. In particular, 73% of the CD4i antibodies and 100% of the PGT antibodies are encoded by potential VH replacement products. Such frequencies are significantly higher than those in IgH genes derived from HIV infected individuals or autoimmune patients. The identified V(H) replacement products encoding anti-HIV antibodies are highly mutated; the V(H) replacement "footprints" within CD4i antibodies preferentially encode negatively charged amino acids within the IgH CDR3; many IgH encoding PGT antibodies are likely generated from multiple rounds of V(H) replacement. Taken together, these findings uncovered a potentially significant contribution of V(H) replacement products to the generation of anti-HIV antibodies.

Source
DOI: http://dx.doi.org/10.1016/j.clim.2012.11.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649862
January 2013

TFinDit: transcription factor-DNA interaction data depository.

BMC Bioinformatics 2012 Sep 3;13:220. Epub 2012 Sep 3.

Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Background: One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions.

Description: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria.

Conclusions: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-13-220
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3483241
September 2012

High performance transcription factor-DNA docking with GPU computing.

Proteome Sci 2012 Jun 21;10 Suppl 1:S17. Epub 2012 Jun 21.

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332, USA.

Background: Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is computationally very demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality.

Methods: In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems.

Results: The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts in two integral aspects: improvement in computational efficiency and energy function design.

Conclusions: We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
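The sampling strategy named in the Methods paragraph above (Monte Carlo moves accepted by a Metropolis test under a simulated annealing schedule) can be illustrated with a minimal CPU-only sketch. Everything here is an assumption made for illustration: the function names, the geometric cooling schedule, and the toy two-parameter "pose" with a quadratic energy stand in for the paper's potential-based energy function and GPU implementation, which are not described in enough detail here to reproduce.

```python
import math
import random

def simulated_annealing(energy, propose, x0, t_start=1.0, t_end=0.01,
                        steps=5000, seed=0):
    """Metropolis Monte Carlo search with a geometric cooling schedule.

    energy  -- callable returning a score for a pose (lower is better)
    propose -- callable (pose, rng) -> randomly perturbed pose
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        cand = propose(x, rng)
        e_cand = energy(cand)
        delta = e_cand - e
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Hypothetical stand-ins for a docking energy and a pose perturbation.
def toy_energy(pose):
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def toy_propose(pose, rng):
    return (pose[0] + rng.uniform(-0.5, 0.5),
            pose[1] + rng.uniform(-0.5, 0.5))

best_pose, best_energy = simulated_annealing(toy_energy, toy_propose, (5.0, 5.0))
```

In a real docking setting the pose would hold rigid-body translation and rotation parameters, and many independent runs like this one are what a GPU would execute in parallel.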

Source
DOI: http://dx.doi.org/10.1186/1477-5956-10-S1-S17
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380734
June 2012

Benchmarks for flexible and rigid transcription factor-DNA docking.

BMC Struct Biol 2011 Nov 1;11:45. Epub 2011 Nov 1.

Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, North Carolina, USA.

Background: Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms.

Results: We constructed two benchmarks for flexible and rigid TF-DNA docking respectively using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degrees of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking in which the starting structure is a bound TF conformation, only interaction strength is considered.

Conclusions: We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which has important implications for structure-based prediction of transcription factor binding sites and drug design.

Source
DOI: http://dx.doi.org/10.1186/1472-6807-11-45
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262759
November 2011

Structural analysis of heme proteins: implications for design and prediction.

BMC Struct Biol 2011 Mar 3;11:13. Epub 2011 Mar 3.

Cannon Research Center, Carolinas Medical Center, Charlotte, NC 28203, USA.

Background: Heme is an essential molecule and plays vital roles in many biological processes. The structural determination of a large number of heme proteins has made it possible to study the detailed chemical and structural properties of the heme-binding environment. Knowledge of these characteristics can provide valuable guidelines for the design of novel heme proteins and help us predict unknown heme-binding proteins.

Results: In this paper, we constructed a non-redundant dataset of 125 heme-binding protein chains and found that these heme proteins encompass at least 31 different structural folds with all-α class as the dominating scaffold. Heme binding pockets are enriched in aromatic and non-polar amino acids with fewer charged residues. The differences between apo and holo forms of heme proteins in terms of the structure and the binding pockets have been investigated. In most cases the proteins undergo small conformational changes upon heme binding. We also examined the CP (cysteine-proline) heme regulatory motifs and demonstrated that the conserved dipeptide has structural implications in protein-heme interactions.

Conclusions: Our analysis revealed that heme binding pockets show special features and that most of the heme proteins undergo small conformational changes after heme binding, suggesting the apo structures can be used for structure-based heme protein prediction and as scaffolds for future heme protein design.

Source
DOI: http://dx.doi.org/10.1186/1472-6807-11-13
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3059290
March 2011

Systematic analysis of short internal indels and their impact on protein folding.

BMC Struct Biol 2010 Aug 4;10:24. Epub 2010 Aug 4.

Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA.

Background: Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular the ones involving core secondary structures, on the folding of protein structures. The long-term goal of our study is to accurately predict the structures of protein AS isoforms. As a first step towards this goal, we performed a systematic analysis of the structural changes caused by short internal indels by mining highly homologous proteins in the Protein Data Bank (PDB).

Results: We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in a disordered state and the majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2 Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs have high RMSDs because of differences in the relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.

Conclusions: The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guidance for modeling protein AS isoform structures and homologous proteins with indels by placing the indels at the right locations, since the accuracy of sequence alignments dictates model quality in homology modeling.

Source
DOI: http://dx.doi.org/10.1186/1472-6807-10-24
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2924343
August 2010

[Simple sequence repeat variation and small-scale spatial autocorrelation analysis on smooth-shell populations of Oncomelania hupensis in Sichuan province].

Zhonghua Liu Xing Bing Xue Za Zhi 2009 May;30(5):497-501

Department of Epidemiology, School of Public Health, Fudan University, Shanghai 200032, China.

Objective: To analyze the spatial autocorrelation of the small-scale distribution of genetic variation in the Oncomelania hupensis population of Puge county, Sichuan province, using simple sequence repeat (SSR) markers.

Methods: Five pairs of SSR primers were used to amplify the genomic DNA of Oncomelania hupensis, and the alleles with frequencies ranging from 15% to 85% were used to calculate Moran's I spatial autocorrelation coefficients in 14 distance bands based on equal numbers of paired samples.

Results: A total of 274 alleles were scored with the five pairs of SSR primers. The average polymorphic information content of the 274 alleles was 0.965, which indicated a high level of genetic diversity. Thirty-nine alleles showed different patterns of positive spatial autocorrelation of genetic variation, indicating a non-random spatial structure. As the distance band increased, the spatial autocorrelation decreased, based on the average Moran's I value across the 14 distance bands. No allele showed negative spatial autocorrelation in any distance band.

Conclusion: The spatial distribution of SSR genetic variation showed positive spatial autocorrelation in the population of Oncomelania hupensis, and the spatial autocorrelation decreased as the distance band increased.
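For readers unfamiliar with the statistic used in this abstract, Moran's I for one distance band can be sketched as follows. This is a minimal illustration, not the study's procedure: the function name, the binary weights (1 for sample pairs whose separation falls inside the band, 0 otherwise), and the fixed band limits are assumptions, whereas the study built its 14 bands from equal numbers of sample pairs.

```python
import math

def morans_i(coords, values, d_min, d_max):
    """Moran's I with binary weights for pairs separated by d_min <= d < d_max.

    coords -- list of (x, y) sample locations
    values -- one numeric value per sample (e.g. 0/1 presence of an allele)
    Returns None when the band holds no pairs or the values do not vary.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_total = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.dist(coords[i], coords[j])
            if d_min <= dist < d_max:
                # Weight w_ij = 1 inside the band; cross-product of deviations.
                num += dev[i] * dev[j]
                w_total += 1
    if w_total == 0 or denom == 0:
        return None
    return (n / w_total) * (num / denom)

# Toy transect: values rise smoothly, so near neighbours resemble each other.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
i_gradient = morans_i(line, [1, 2, 3, 4, 5], 0.5, 1.5)
# Alternating values: near neighbours differ, giving negative autocorrelation.
i_alternating = morans_i(line[:4], [1, -1, 1, -1], 0.5, 1.5)
```

Positive values indicate that samples within the band tend to be alike, values near zero indicate no spatial structure, and negative values indicate that nearby samples tend to differ.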

May 2009

PDA: an automatic and comprehensive analysis program for protein-DNA complex structures.

BMC Genomics 2009 Jul 7;10 Suppl 1:S13. Epub 2009 Jul 7.

Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223 USA.

Background: Knowledge of protein-DNA interactions at the structural level can provide insights into the mechanisms of protein-DNA recognition and gene regulation. Although over 1400 protein-DNA complex structures have been deposited into the Protein Data Bank (PDB), the structural details of protein-DNA interactions are generally not available. In addition, current approaches to the comparison of protein-DNA complexes are mainly based on protein sequence similarity, while the DNA sequences are not taken into account. With the number of experimentally determined protein-DNA complex structures increasing, there is a need for an automatic program to analyze the protein-DNA complex structures and to provide comprehensive structural information for the benefit of the whole research community.

Results: We developed an automatic and comprehensive protein-DNA complex structure analysis program, PDA (for protein-DNA complex structure analyzer). PDA takes PDB files as inputs and performs structural analysis that includes 1) whole protein-DNA complex structure restoration, especially the reconstruction of double-stranded DNA structures; 2) an efficient new approach for DNA base-pair detection; 3) systematic annotation of protein-DNA interactions; and 4) extraction of DNA subsequences involved in protein-DNA interactions and identification of protein-DNA binding units. Protein-DNA complex structures in the current PDB were processed and analyzed with our PDA program, and the analysis results were stored in a database. A dataset useful for studying protein-DNA interactions involved in gene regulation was generated using both protein and DNA sequences as well as the contact information of the complexes. WebPDA was developed to provide a web interface for using PDA and for data retrieval.

Conclusion: PDA is a computational tool for structural annotations of protein-DNA complexes. It provides a useful resource for investigating protein-DNA interactions. Data from the PDA analysis can also facilitate the classification of protein-DNA complexes and provide insights into rational design of benchmarks. The PDA program is freely available at http://bioinfozen.uncc.edu/webpda.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-10-S1-S13
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2709256
July 2009

[Sequencing on products of Oncomelania hupensis through simple sequence repeat anchored polymerase chain reaction amplification].

Zhonghua Liu Xing Bing Xue Za Zhi 2008 Nov;29(11):1119-22

Department of Epidemiology, School of Public Health, Fudan University, Shanghai 200032, China.

Objective: To analyze the sequence of microsatellite and the flanking sequence from four populations of Oncomelania hupensis.

Methods: We cloned 159 SSR-PCR amplification products of a commonly used primer, (CA)8RY, using O. hupensis genomic DNA as template, and sequenced 82 products.

Results: The sequences obtained were novel O. hupensis genomic sequences but not simple sequence repeats. It was observed that 36 out of 82 clones contained microsatellites between priming sites. The flanking sequences of certain microsatellites were invariant. Both (GA/CT)n and (TTAGGG/CCCTAA)n were found in all four populations of O. hupensis. However, (CAA)n was found only in O. hupensis from Fuqing, Fujian province; (TCTCTG)n was found only in O. hupensis from Guichi, Anhui province; and (GAA/TTC)n, (CAA/TTG)n and (CAT)n were found only in O. hupensis from Puge, Sichuan province.

Conclusion: The results obtained by SSR-PCR should not be interpreted as the amplification of microsatellite loci, and analytical rules similar to those for Random Amplified Polymorphic DNA should be used. SSR-PCR cannot take full advantage of the strengths of microsatellites. It seems better to amplify the microsatellites with primers designed on the basis of the flanking sequences.

November 2008

Improving the performance of protein threading using insertion/deletion frequency arrays.

J Bioinform Comput Biol 2008 Jun;6(3):585-602

Department of Biochemistry and Molecular Biology, The University of Georgia, Athens, GA 30602, USA.

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that the application of this information can lead to an improvement in fold recognition.
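The idea of position-specific indel statistics can be made concrete with a small sketch. The abstract does not give the exact definition of an IFA, so the following is only one plausible reading under stated assumptions: the input is a set of pairwise alignments of the same target against different related sequences, a deletion is counted when a target residue faces a gap, and an insertion is counted once per run of gap characters in the target row. The function name and data layout are hypothetical.

```python
def indel_frequency_arrays(pairwise_alignments):
    """Estimate per-position deletion and insertion frequencies for a target.

    pairwise_alignments -- list of (target_row, other_row) strings of equal
    length using '-' for gaps; every target_row spells the same target.
    Returns (deletion_freq, insertion_freq):
      deletion_freq[k]  -- fraction of alignments where target residue k
                           faces a gap in the other sequence
      insertion_freq[k] -- fraction of alignments with an insertion opening
                           just before target residue k (the final slot is
                           the position after the last residue)
    """
    target_len = len(pairwise_alignments[0][0].replace("-", ""))
    deletions = [0] * target_len
    insertions = [0] * (target_len + 1)
    for target_row, other_row in pairwise_alignments:
        pos = 0                 # index of the next target residue
        in_insertion = False    # True while inside a run of target gaps
        for t, o in zip(target_row, other_row):
            if t == "-":
                if not in_insertion:
                    insertions[pos] += 1  # count each insertion event once
                    in_insertion = True
            else:
                in_insertion = False
                if o == "-":
                    deletions[pos] += 1
                pos += 1
    n = len(pairwise_alignments)
    return [d / n for d in deletions], [i / n for i in insertions]

# Two toy alignments of the target "ACDE".
toy_alignments = [("ACDE", "A-DE"), ("AC-DE", "ACWDE")]
del_freq, ins_freq = indel_frequency_arrays(toy_alignments)
```

Arrays like these could then be used to make gap penalties position-dependent in an alignment or threading score, which is the general use the abstract describes.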

Source
DOI: http://dx.doi.org/10.1142/s0219720008003552
June 2008

Towards modeling of amyloid fibril structures.

Authors:
Jun-tao Guo, Ying Xu

Front Biosci 2008 May 1;13:4039-50. Epub 2008 May 1.

Bioinformatics Research Center and Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Amyloid fibrils are associated with a number of debilitating diseases, including Alzheimer's disease and variant Creutzfeldt-Jakob disease. The elucidation of the structure of amyloid fibrils is an important step toward understanding the mechanism of amyloid formation and developing therapeutic agents for amyloid diseases. Despite great interest and substantial efforts from various research communities, deriving high-resolution structures of amyloid fibrils remains a challenging problem, due to the insolubility and non-crystalline nature of the fibrils. An array of experimental methods, such as electron microscopy, fiber diffraction, hydrogen-deuterium exchange, solid-state NMR, electron paramagnetic resonance spectroscopy and biochemical approaches, have been explored to study the problem and have yielded a considerable amount of information, though still partial, about the fibril conformation. Computational modeling techniques can be used to predict and build structural models of amyloid fibrils, utilizing the available experimental data. Here, we describe a few computational methods for modeling of aggregate and fibril structures with a focus on protein threading-based approaches and discuss the challenging issues ahead.

Source
DOI: http://dx.doi.org/10.2741/2992
May 2008

Structure-based prediction of transcription factor binding sites using a protein-DNA docking approach.

Proteins 2008 Sep;72(4):1114-24

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA.

Accurate identification of transcription factor binding sites is critical to our understanding of transcriptional regulatory networks. To overcome the high rate of false-positive predictions that troubles sequence-based prediction techniques, we have developed a structure-based prediction method that takes into consideration the interactions between the amino acids of a transcription factor and the nucleotides of its DNA binding sequence at the structural level, along with an efficient protein-DNA docking algorithm. The docked structures between a protein and a DNA are evaluated using a knowledge-based energy function, in conjunction with van der Waals energy. Our docking algorithm supports quasi-flexible docking, overcoming a number of limiting issues faced by similar docking algorithms. Our rigid-body docking algorithm is tested on a dataset of 141 nonredundant transcription factor-DNA complex structures. The test results show that 63.1% of the 141 complex structures are reconstructed with accuracies better than 1.0 Å RMSD (root mean square deviation) and 79.4% of the complexes are predicted with accuracies better than 3.0 Å RMSD when using the native DNA structures. Our quasi-flexible docking algorithm, assuming that the DNA structures are not known, is tested on a separate set of 45 transcription factor-DNA complexes, of which 57.8% of the docked complex conformations achieve better than 1.0 Å RMSD while 71.1% of the complexes have RMSDs less than 3.0 Å. We have also applied our method to predict the binding motifs of the ferric uptake regulator in E. coli and showed that most of the experimentally identified sites can be predicted accurately.
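The success rates in this abstract are fractions of test cases whose predicted complex falls under an RMSD cutoff. A minimal sketch of that bookkeeping follows. It assumes the two coordinate sets are already in the same frame and list equivalent atoms in the same order (no optimal superposition is performed), and the function names and toy numbers are illustrative rather than taken from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered sets of
    (x, y, z) coordinates that are already in a common reference frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

def fraction_within(rmsd_values, cutoff):
    """Fraction of cases whose RMSD is strictly below the cutoff."""
    return sum(1 for r in rmsd_values if r < cutoff) / len(rmsd_values)

# Toy example: the second structure is the first shifted by 1 unit along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift_rmsd = rmsd(native, docked)

# Hypothetical per-complex RMSDs (in Å) and the share under a 1.0 Å cutoff.
success_rate = fraction_within([0.5, 2.0, 3.5, 0.9], 1.0)
```

With real structures, the predicted and native complexes would first be placed in a common frame, for example by superimposing the DNA or the protein, before the deviation is measured.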

Source
DOI: http://dx.doi.org/10.1002/prot.22002
September 2008

A historical perspective of template-based protein structure prediction.

Methods Mol Biol 2008 ;413:3-42

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA.

This chapter presents a broad, historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.

Source
DOI: http://dx.doi.org/10.1007/978-1-59745-574-9_1
June 2008

Improvement in protein sequence-structure alignment using insertion/deletion frequency arrays.

Comput Syst Bioinformatics Conf 2007 ;6:335-42

Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, The University of Georgia, Athens, Georgia 30602, USA.

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as Insertion/Deletion (Indel) Frequency Arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity.

December 2007

A generalized threading model using integer programming that allows for secondary structure element deletion.

Genome Inform 2006 ;17(2):248-58

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30622, USA.

Integer programming is a combinatorial optimization method that has been successfully applied to the protein threading problem. We seek to expand the model optimized by this technique to allow for a more accurate description of protein threading. We have developed and implemented an expanded integer programming model that can represent secondary structure element deletion, which was not possible in previous versions of integer programming-based optimization.

June 2007