Publications by authors named "Louxin Zhang"

51 Publications

Stage-specific protein-domain mutational profile of invasive ductal breast cancer.

BMC Med Genomics 2020 10 22;13(Suppl 10):150. Epub 2020 Oct 22.

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076, Singapore.

Background: Understanding the mechanisms underlying the malignant progression of cancer cells is crucial for early diagnosis and therapeutic treatment of cancer. The mutational heterogeneity of breast cancer suggests that about a dozen cancer genes are consistently mutated in patients, together with many other genes that are mutated occasionally.

Methods: Using the whole-exome sequences and clinical information of 468 patients in the TCGA project data portal, we analyzed mutated protein domains and signaling pathway alterations in order to understand how infrequent mutations contribute aggregately to tumor progression in different stages.

Results: Our findings suggest that while the spectrum of mutated domains was diverse, mutations were aggregated in Pkinase, Pkinase Tyr, Y-Phosphatase and Src-homology 2 domains, highlighting the genetic heterogeneity in activating the protein tyrosine kinase signaling pathways in invasive ductal breast cancer.

Conclusions: The study provides new clues to the functional role of infrequent mutations in protein domain regions at different stages of invasive ductal breast cancer, yielding biological insights into its metastasis.

Source
DOI: http://dx.doi.org/10.1186/s12920-020-00777-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7580001
October 2020

Evidence for transmission of COVID-19 prior to symptom onset.

Elife 2020 06 22;9. Epub 2020 Jun 22.

Simon Fraser University, Burnaby, Canada.

We collated contact tracing data from COVID-19 clusters in Singapore and Tianjin, China, and estimated the extent of pre-symptomatic transmission by estimating incubation periods and serial intervals. The mean incubation periods accounting for intermediate cases were 4.91 days (95% CI 4.35, 5.69) and 7.54 days (95% CI 6.76, 8.56) for Singapore and Tianjin, respectively. The mean serial intervals were 4.17 days (95% CI 2.44, 5.89) and 4.31 days (95% CI 2.91, 5.72) for Singapore and Tianjin, respectively. The serial intervals are shorter than the incubation periods, suggesting that pre-symptomatic transmission may occur in a large proportion of transmission events (0.4-0.5 in Singapore and 0.6-0.8 in Tianjin in our analysis with intermediate cases, and more without intermediates). Given the evidence for pre-symptomatic transmission, it is vital that even individuals who appear healthy abide by public health measures to control COVID-19.

Source
DOI: http://dx.doi.org/10.7554/eLife.57149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7386904
June 2020

A survey and systematic assessment of computational methods for drug response prediction.

Brief Bioinform 2021 Jan;22(1):232-246

Drug response prediction arises in both basic and clinical research on personalized therapy, as well as in drug discovery for cancers. With gene expression profiles and other omics data available for over 1000 cancer cell lines and tissues, different machine learning approaches have been applied to drug response prediction. These methods are scattered across the literature, and each has been evaluated on different datasets using only one or two accuracy metrics. We systematically assess 17 representative methods for drug response prediction, developed in the past 5 years, on four large public datasets using nine metrics. This study provides insights and lessons for future research into drug response prediction.

Source
DOI: http://dx.doi.org/10.1093/bib/bbz164
January 2021

ZDOG: zooming in on dominating genes with mutations in cancer pathways.

BMC Bioinformatics 2019 Dec 30;20(1):740. Epub 2019 Dec 30.

Department of Mathematics and Computational Biology Programme, National University of Singapore, Singapore, 119076, Singapore.

Background: Inference of cancer-causing genes and their biological functions is crucial but challenging due to the heterogeneity of somatic mutations. This heterogeneity reveals that only a handful of oncogenes are mutated frequently, while a number of cancer-causing genes are mutated rarely.

Results: We develop a Cytoscape app, named ZDOG, for visualizing the extent to which mutated genes may affect cancer pathways using the dominator tree model. The dominator tree model allows us to conveniently examine the positional importance of a gene in cancer signalling pathways. This tool facilitates the identification of mutated "master" regulators in deregulated signalling pathways, even those with low mutation frequency.

Conclusions: We have presented a model that facilitates examining, through its positional information, the extent to which a mutation in a gene may affect downstream components in a signalling pathway. The model is implemented in a user-friendly, freely available Cytoscape app.

Availability: Together with a user manual, the ZDOG app is freely available at GitHub (https://github.com/rudi2013/ZDOG). It is also available in the Cytoscape app store (http://apps.cytoscape.org/apps/ZDOG) and users can easily install it using the Cytoscape App Manager.
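In a directed pathway graph with a designated source node, a node d dominates a node v if every path from the source to v passes through d; linking each node to its immediate dominator gives the dominator tree. As a hedged sketch (not ZDOG's own implementation, which this listing does not describe), the Python code below computes immediate dominators with the iterative algorithm of Cooper, Harvey and Kennedy, then counts how many nodes each node dominates, a rough proxy for the positional importance discussed above. The toy pathway, its node names and the helper names are hypothetical.

```python
def immediate_dominators(graph, root):
    """Immediate dominators of all nodes reachable from `root`.

    `graph` maps each node to a list of its successors.
    """
    order, visited = [], set()

    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)                      # postorder

    dfs(root)
    rpo = order[::-1]                        # reverse postorder: root first
    index = {u: i for i, u in enumerate(rpo)}
    preds = {u: [] for u in rpo}
    for u in rpo:
        for v in graph.get(u, []):
            preds[v].append(u)

    idom = {root: root}

    def intersect(a, b):
        # Walk both fingers up the current dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for u in rpo[1:]:
            done = [p for p in preds[u] if p in idom]
            new = done[0]
            for p in done[1:]:
                new = intersect(p, new)
            if idom.get(u) != new:
                idom[u] = new
                changed = True
    return idom


def dominated_counts(idom):
    """Number of nodes each node strictly dominates."""
    counts = {u: 0 for u in idom}
    for u in idom:
        v = u
        while idom[v] != v:                  # climb to the root, crediting each dominator
            v = idom[v]
            counts[v] += 1
    return counts


# Hypothetical toy pathway: two routes from R converge on C, which then controls D and E.
pathway = {"R": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"], "D": ["E"]}
idom = immediate_dominators(pathway, "R")
counts = dominated_counts(idom)
print(idom, counts)
```

Here C dominates D and E even though neither A nor B does, which illustrates how a gene can control a downstream region of a pathway regardless of how often it is mutated.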

Source
DOI: http://dx.doi.org/10.1186/s12859-019-3326-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937862
December 2019

Generating normal networks via leaf insertion and nearest neighbor interchange.

Authors:
Louxin Zhang

BMC Bioinformatics 2019 Dec 17;20(Suppl 20):642. Epub 2019 Dec 17.

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076, Singapore.

Background: Galled trees are studied as a recombination model in theoretical population genetics. This class of phylogenetic networks has been generalized to tree-child networks and other network classes by relaxing a structural condition imposed on galled trees. Although these networks are simple, their topological structures have yet to be fully understood.

Results: It is well known that all phylogenetic trees on n taxa can be generated by inserting the n-th taxon into each edge of all the phylogenetic trees on n-1 taxa. We prove that all tree-child (resp. normal) networks with k reticulate nodes on n taxa can be uniquely generated via three operations from all the tree-child (resp. normal) networks with k-1 or k reticulate nodes on n-1 taxa. Applying this result to counting rooted phylogenetic networks, we show that there are exactly [Formula: see text] binary phylogenetic networks with one reticulate node on n taxa.

Conclusions: The work makes two contributions to understanding normal networks. One is a generalization of an enumeration procedure for phylogenetic trees into one for normal networks. The other is a set of simple formulas for counting normal networks and phylogenetic networks that have only one reticulate node.
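The classical tree case of this generation procedure can be sketched in a few lines. Assuming rooted binary trees in which a new leaf may also be attached above the current root (the root edge), a tree on n-1 taxa has 2n-3 insertion points, so repeated insertion yields (2n-3)!! = 1 × 3 × 5 × ... × (2n-3) distinct trees on n taxa. The Python code below uses hypothetical helper names and represents an internal node as a frozenset of its two children, so each tree has one canonical form; it does not implement the paper's three operations for networks.

```python
def insert_leaf(tree, leaf):
    """All trees obtained by attaching `leaf` to each edge of `tree`,
    including the root edge above the current root.

    A leaf is a string; an internal node is a frozenset of its two children.
    """
    results = [frozenset([tree, leaf])]      # attach on the edge entering this subtree
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        results += [frozenset([t, right]) for t in insert_leaf(left, leaf)]
        results += [frozenset([left, t]) for t in insert_leaf(right, leaf)]
    return results


def all_rooted_binary_trees(taxa):
    """Generate every rooted binary tree on `taxa` by successive leaf insertion."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for old in trees for new in insert_leaf(old, leaf)]
    return trees


def double_factorial(k):
    """k!! = k * (k-2) * ... down to 1 (for odd k)."""
    return 1 if k <= 1 else k * double_factorial(k - 2)


for n in range(2, 7):
    trees = all_rooted_binary_trees([f"t{i}" for i in range(1, n + 1)])
    print(n, len(trees), len(set(trees)), double_factorial(2 * n - 3))
```

Comparing `len(trees)` with `len(set(trees))` checks that no tree is generated twice, which is the uniqueness property the abstract extends to tree-child and normal networks.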

Source
DOI: http://dx.doi.org/10.1186/s12859-019-3209-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6915859
December 2019

Compression of Phylogenetic Networks and Algorithm for the Tree Containment Problem.

J Comput Biol 2019 03 9;26(3):285-294. Epub 2019 Jan 9.

Department of Mathematics, National University of Singapore, Singapore 119076, Singapore.

Rooted phylogenetic networks are rooted acyclic digraphs. They are used to model complex evolution in which hybridization, recombination, and other reticulation events play a role. A rigorous definition of network compression is introduced on the basis of recent studies of the relationships among clusters, trees, and rooted phylogenetic networks. The concept reveals new connections between well-studied network classes, including tree-child networks and reticulation-visible networks. It also enables us to define a new class of networks for which the cluster containment problem is solvable in linear time.
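Two of the structural notions used here can be checked directly. The sketch below assumes a network given as a hypothetical dictionary mapping each node to its list of children. It tests whether the digraph is rooted (exactly one node of in-degree 0) and acyclic using Kahn's topological sort, and whether it is tree-child under the standard definition that every non-leaf node has at least one child of in-degree 1. It does not implement the compression operation itself.

```python
from collections import deque

def in_degrees(children):
    """In-degree of every node mentioned in the child-list dictionary."""
    nodes = set(children) | {v for vs in children.values() for v in vs}
    indeg = {u: 0 for u in nodes}
    for vs in children.values():
        for v in vs:
            indeg[v] += 1
    return indeg

def is_rooted_dag(children):
    """True if exactly one node has in-degree 0 and there is no directed cycle."""
    indeg = in_degrees(children)
    roots = [u for u, d in indeg.items() if d == 0]
    if len(roots) != 1:
        return False
    remaining = dict(indeg)
    queue, visited = deque(roots), 0
    while queue:                             # Kahn's topological sort
        u = queue.popleft()
        visited += 1
        for v in children.get(u, []):
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return visited == len(indeg)             # nodes on a cycle are never visited

def is_tree_child(children):
    """True if every non-leaf node has at least one child that is not a reticulation."""
    indeg = in_degrees(children)
    return all(any(indeg[v] == 1 for v in vs) for vs in children.values() if vs)

net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}   # h is a reticulation
bad = {"r": ["a", "b"], "a": ["h1", "h2"], "b": ["h1", "h2"], "h1": ["x"], "h2": ["y"]}
print(is_rooted_dag(net), is_tree_child(net), is_tree_child(bad))
```

In `bad`, both children of `a` are reticulations, so it is a rooted acyclic digraph but not a tree-child network.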

Source
DOI: http://dx.doi.org/10.1089/cmb.2018.0220
March 2019

S-Cluster++: a fast program for solving the cluster containment problem for phylogenetic networks.

Bioinformatics 2018 09;34(17):i680-i686

Department of Mathematics, National University of Singapore, Singapore.

Motivation: Comparative genomic studies indicate that extant genomes are more properly considered to be a fusion product of random mutations over generations (vertical evolution) and genomic material transfers between individuals of different lineages (reticulate transfer). This has motivated biologists to use phylogenetic networks and other general models to study genome evolution. Two fundamental algorithmic problems arising from the verification of phylogenetic networks and from computing the Robinson-Foulds distance in the space of phylogenetic networks are the tree and cluster containment problems. The former asks whether a phylogenetic tree is displayed in a phylogenetic network. The latter asks whether a subset of taxa appears as a cluster in some tree displayed in a phylogenetic network. The cluster containment problem (CCP) is also closely related to testing the infinite site model on a recombination network. Both the tree containment problem and the CCP are NP-complete. Although the CCP was introduced a decade ago, there has been little progress in developing fast algorithms for it on arbitrary phylogenetic networks.

Results: In this work, we present a fast computer program for the CCP. This program is developed on the basis of a linear-time transformation from the small version of the CCP to the SAT problem.

Availability And Implementation: The program package is available for download at http://www.math.nus.edu.sg/~matzlx/ccp.
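To make the cluster containment problem concrete, here is a brute-force sketch; it is not the SAT-based method of S-Cluster++. It relies on the standard correspondence between displayed trees and switchings: keep every tree arc and exactly one incoming arc of each reticulation. A cluster is contained if it equals the set of taxa below some node in some switching. The number of switchings is the product of the reticulation in-degrees (2^k for k binary reticulations), so this only suits small networks. The network encoding and function names are hypothetical.

```python
from itertools import product

def switchings(children):
    """Yield each subgraph keeping all tree arcs and one incoming arc per reticulation."""
    parents = {}
    for u, vs in children.items():
        for v in vs:
            parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    for choice in product(*(parents[r] for r in retics)):
        keep = dict(zip(retics, choice))     # chosen parent of each reticulation
        yield {u: [v for v in vs if keep.get(v, u) == u] for u, vs in children.items()}

def leaf_sets(children, switching, root):
    """Non-empty taxon sets below the nodes of one switching."""
    memo = {}
    def below(u):
        if u not in memo:
            if not children.get(u):          # a taxon (leaf of the network)
                memo[u] = frozenset([u])
            else:                            # dead ends yield the empty set
                memo[u] = frozenset().union(*(below(v) for v in switching[u]))
        return memo[u]
    below(root)
    return {s for s in memo.values() if s}

def cluster_contained(children, cluster):
    """Brute-force CCP: is `cluster` a cluster of some tree displayed in the network?"""
    child_nodes = {v for vs in children.values() for v in vs}
    root = next(u for u in children if u not in child_nodes)
    target = frozenset(cluster)
    return any(target in leaf_sets(children, s, root) for s in switchings(children))

net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}   # h is a reticulation
print(cluster_contained(net, {"x", "y"}), cluster_contained(net, {"y", "z"}),
      cluster_contained(net, {"x", "z"}))
```

Attaching `y` under `a` gives the cluster {x, y}; attaching it under `b` gives {y, z}; no switching produces {x, z}.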

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty594
September 2018

RecPhyloXML: a format for reconciled gene trees.

Bioinformatics 2018 11;34(21):3646-3652

Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, and loss), along with a mapping onto a species tree. Many algorithms and software tools produce or use reconciliations, but they often use different reconciliation formats, which differ in the types of events considered or in whether the species tree is dated. This complicates comparison and communication between different programs.

Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools, such as a reconciled tree visualizer and conversion utilities.

Availability And Implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty389
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6198865
November 2018

Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization.

BMC Cancer 2017 Aug 2;17(1):513. Epub 2017 Aug 2.

Key Lab of Industrial Fermentation Microbiology, Ministry of Education & Tianjin City, College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China.

Background: Human cancer cell lines are used in research to study the biology of cancer and to test cancer treatments. Several large panels of hundreds of human cancer cell lines, characterized with genomic and pharmacological data, have recently become available. The ability to predict drug responses from these pharmacogenomic data can facilitate the development of precision cancer medicines. Although several methods have been developed for drug response prediction, obtaining accurate predictions remains challenging.

Methods: Based on the observation that similar cell lines and similar drugs exhibit similar drug responses, we adopted a similarity-regularized matrix factorization (SRMF) method to predict anticancer drug responses of cell lines using the chemical structures of drugs and baseline gene expression levels in cell lines. Specifically, the chemical structural similarity of drugs and the gene expression profile similarity of cell lines were incorporated into the drug response matrix factorization model as regularization terms.

Results: We first demonstrated the effectiveness of SRMF using a set of simulation data and compared it with two typical similarity-based methods. We then applied it to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets, where SRMF outperformed three state-of-the-art methods. We also applied SRMF to estimate the missing drug response values in the GDSC dataset. Even though SRMF does not specifically model mutation information, it correctly predicted drug-cancer gene associations that are consistent with existing data and identified novel drug-cancer gene associations not found in existing data. SRMF can also aid in drug repositioning. The newly predicted drug responses in the GDSC dataset suggest that non-small cell lung cancer (NSCLC) cell lines are sensitive to the mTOR inhibitor rapamycin, and that expression of AKR1C3 and HINT1 may be adjunct markers of cell line sensitivity to rapamycin.

Conclusions: Our analysis showed that the proposed data integration method is able to improve the accuracy of prediction of anticancer drug responses in cell lines, and can identify consistent and novel drug-cancer gene associations compared to existing data as well as aid in drug repositioning.
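The regularized factorization described in Methods can be sketched as follows. We assume an objective of the form ||W * (Y - U V^T)||^2 + lam_l (||U||^2 + ||V||^2) + lam_d ||S_d - U U^T||^2 + lam_c ||S_c - V V^T||^2, where Y is the drug × cell-line response matrix, W masks observed entries (* is elementwise), and S_d and S_c are drug and cell-line similarity matrices. The paper's exact objective, weights and solver may differ; this sketch uses plain gradient descent, and all parameter values and data in it are hypothetical placeholders.

```python
import numpy as np

def srmf(Y, W, S_drug, S_cell, k=3, lam_l=0.1, lam_d=0.1, lam_c=0.1,
         lr=0.01, n_iter=2000, seed=0):
    """Similarity-regularized factorization Y ~ U @ V.T by gradient descent.

    Y: drugs x cell lines responses; W: 1 where Y is observed, 0 where missing;
    S_drug, S_cell: symmetric similarity matrices. Returns U, V and the loss history.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, k))    # drug latent factors
    V = 0.1 * rng.standard_normal((n, k))    # cell-line latent factors
    losses = []
    for _ in range(n_iter):
        R = W * (Y - U @ V.T)                # residual on observed entries only
        Ed = S_drug - U @ U.T                # drug-similarity residual
        Ec = S_cell - V @ V.T                # cell-line-similarity residual
        losses.append(np.sum(R ** 2) + lam_l * (np.sum(U ** 2) + np.sum(V ** 2))
                      + lam_d * np.sum(Ed ** 2) + lam_c * np.sum(Ec ** 2))
        # Gradients of the loss above (S_drug and S_cell are assumed symmetric).
        grad_U = -2 * R @ V + 2 * lam_l * U - 4 * lam_d * Ed @ U
        grad_V = -2 * R.T @ U + 2 * lam_l * V - 4 * lam_c * Ec @ V
        U = U - lr * grad_U
        V = V - lr * grad_V
    return U, V, losses

# Random placeholder data: 8 drugs, 10 cell lines, about 20% of responses missing.
rng = np.random.default_rng(1)
Y = rng.random((8, 10))
W = (rng.random((8, 10)) < 0.8).astype(float)
drug_features, cell_features = rng.random((8, 4)), rng.random((10, 4))
S_drug = np.exp(-((drug_features[:, None] - drug_features[None]) ** 2).sum(-1))
S_cell = np.exp(-((cell_features[:, None] - cell_features[None]) ** 2).sum(-1))
U, V, losses = srmf(Y, W, S_drug, S_cell)
filled = U @ V.T    # predictions, including the masked (missing) entries
```

Because the similarity terms tie each drug's (or cell line's) latent vector to those of similar drugs (or cell lines), entries hidden by W can still be estimated from neighbours, which is the intuition behind using SRMF to fill in missing GDSC responses.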

Source
DOI: http://dx.doi.org/10.1186/s12885-017-3500-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5541434
August 2017

A program to compute the soft Robinson-Foulds distance between phylogenetic networks.

BMC Genomics 2017 03 14;18(Suppl 2):111. Epub 2017 Mar 14.

Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore.

Background: Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees, and clusters serve as the basis for the reconstruction and comparison of phylogenetic networks. Understanding these relationships raises two problems: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both problems are NP-complete.

Results: A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks.

Conclusions: Two computer programs are developed to facilitate the reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, our simulation data suggest that the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is unlikely to be normal.
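The Soft Robinson-Foulds distance is a network analogue of the classical Robinson-Foulds (RF) distance between trees, whose cluster-based form is easy to state: the number of clusters that appear in exactly one of the two trees. The sketch below computes only this tree version, assuming rooted trees on the same taxa written as nested Python tuples; it is not the network distance computed by the program described above. Some authors divide the count by 2, so the convention used here is an assumption.

```python
def clusters(tree):
    """All clusters (taxon sets below each node) of a rooted tree in nested-tuple form."""
    out = set()
    def walk(t):
        s = frozenset([t]) if isinstance(t, str) else frozenset().union(*map(walk, t))
        out.add(s)
        return s
    walk(tree)
    return out

def rf_distance(t1, t2):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(rf_distance(t1, t2))   # 2: {a,b} is only in t1, {a,c} only in t2
```

Singleton clusters and the full taxon set are shared by any two trees on the same taxa, so they never contribute to the symmetric difference.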

Source
DOI: http://dx.doi.org/10.1186/s12864-017-3500-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374702
March 2017

A program for verification of phylogenetic network models.

Bioinformatics 2016 09;32(17):i503-i510

Department of Mathematics.

Motivation: Genetic material is transferred across species in a non-reproductive manner more frequently than commonly thought, particularly among bacteria. On the one hand, extant genomes are thus more properly considered a fusion product of both reproductive and non-reproductive genetic transfers. This has motivated researchers to adopt phylogenetic networks to study genome evolution. On the other hand, a gene's evolution is usually tree-like and has been studied for over half a century. Accordingly, the relationships between phylogenetic trees and networks are the basis for the reconstruction and verification of phylogenetic networks. One important problem in verifying a network model is determining whether certain existing phylogenetic trees are displayed in a phylogenetic network. This problem is formally called the tree containment problem. It is NP-complete even for binary phylogenetic networks.

Results: We design an exponential-time but efficient method for determining whether a phylogenetic tree is displayed in an arbitrary phylogenetic network. It is based on the so-called reticulation-visible property of phylogenetic networks.

Availability And Implementation: A C program is available for download at http://www.math.nus.edu.sg/~matzlx/tcp_package

Contact: matzlx@nus.edu.sg

Supplementary Information: Supplementary data are available at Bioinformatics online.
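For small networks, tree containment can also be decided by brute force, which is a different technique from the reticulation-visibility-based method described above. The sketch assumes a network given as a hypothetical child-list dictionary and a rooted tree given as nested tuples. It uses the fact that a tree is displayed if and only if, for some switching (all tree arcs plus one incoming arc per reticulation), the non-empty taxon sets below the switching's nodes are exactly the tree's clusters; a rooted phylogenetic tree is determined by its cluster set. Dead-end nodes and degree-2 nodes, which disappear when a displayed tree is cleaned up, contribute either the empty set or a duplicate cluster, so they do not affect the comparison.

```python
from itertools import product

def switchings(children):
    """Yield each subgraph keeping all tree arcs and one incoming arc per reticulation."""
    parents = {}
    for u, vs in children.items():
        for v in vs:
            parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    for choice in product(*(parents[r] for r in retics)):
        keep = dict(zip(retics, choice))     # chosen parent of each reticulation
        yield {u: [v for v in vs if keep.get(v, u) == u] for u, vs in children.items()}

def network_clusters(children, switching, root):
    """Non-empty taxon sets below the nodes of one switching."""
    memo = {}
    def below(u):
        if u not in memo:
            if not children.get(u):          # a taxon (leaf of the network)
                memo[u] = frozenset([u])
            else:                            # dead ends yield the empty set
                memo[u] = frozenset().union(*(below(v) for v in switching[u]))
        return memo[u]
    below(root)
    return {s for s in memo.values() if s}

def tree_clusters(tree):
    """Clusters of a rooted tree written as nested tuples with string leaves."""
    out = set()
    def walk(t):
        s = frozenset([t]) if isinstance(t, str) else frozenset().union(*map(walk, t))
        out.add(s)
        return s
    walk(tree)
    return out

def displays(children, tree):
    """Brute-force tree containment: does some switching have exactly the tree's clusters?"""
    child_nodes = {v for vs in children.values() for v in vs}
    root = next(u for u in children if u not in child_nodes)
    target = tree_clusters(tree)
    return any(network_clusters(children, s, root) == target for s in switchings(children))

net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}   # h is a reticulation
print(displays(net, (("x", "y"), "z")), displays(net, ("x", ("y", "z"))),
      displays(net, (("x", "z"), "y")))
```

The loop over switchings is what makes this exponential in the number of reticulations; practical methods such as the one above aim to avoid enumerating them all.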

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btw467
September 2016

On Tree-Based Phylogenetic Networks.

Authors:
Louxin Zhang

J Comput Biol 2016 07 26;23(7):553-65. Epub 2016 May 26.

Department of Mathematics, National University of Singapore, Singapore.

A large class of phylogenetic networks can be obtained from trees by adding horizontal edges between tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that, for any number of taxa, there exists a universal tree-based network that contains every phylogenetic tree on the same set of taxa as a base tree. This answers two problems recently posed by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.
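The construction in the first sentence can be sketched directly: subdivide one tree edge with a new node u, subdivide another with a new node v, and add the arc u -> v, which turns v into a reticulation. The Python sketch below (hypothetical function and node names) performs one such step on a child-list dictionary and rejects arcs that would create a directed cycle. It illustrates the definition only and is neither the paper's characterization of tree-based networks nor its random network generator.

```python
import copy

def subdivide(children, edge, new_node):
    """Replace arc a->b with a->new_node->b (in place)."""
    a, b = edge
    if b not in children.get(a, []):
        raise ValueError(f"{edge} is not an arc")
    children[a] = [new_node if x == b else x for x in children[a]]
    children[new_node] = [b]

def is_acyclic(children):
    """Kahn's algorithm: all nodes can be removed iff there is no directed cycle."""
    nodes = set(children) | {v for vs in children.values() for v in vs}
    indeg = {u: 0 for u in nodes}
    for vs in children.values():
        for v in vs:
            indeg[v] += 1
    stack = [u for u in nodes if indeg[u] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in children.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return removed == len(nodes)

def add_horizontal_arc(children, edge1, edge2, u, v):
    """Return a copy with edge1 subdivided by u, edge2 subdivided by v, and arc u->v added."""
    nodes = set(children) | {x for xs in children.values() for x in xs}
    if u in nodes or v in nodes or u == v or edge1 == edge2:
        raise ValueError("need two distinct arcs and two new node names")
    net = copy.deepcopy(children)
    subdivide(net, edge1, u)
    subdivide(net, edge2, v)
    net[u].append(v)
    if not is_acyclic(net):
        raise ValueError("the new arc would create a directed cycle")
    return net

tree = {"r": ["p", "z"], "p": ["x", "y"]}
net = add_horizontal_arc(tree, ("p", "x"), ("r", "z"), "u", "v")
print(net)   # v now has two parents, r and u
```

Deleting the added arc u -> v and suppressing u and v recovers the original tree, which is exactly the base tree of the resulting network.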

Source
DOI: http://dx.doi.org/10.1089/cmb.2015.0228
July 2016

Predicting chemotherapeutic drug combinations through gene network profiling.

Sci Rep 2016 Jan 21;6:18658. Epub 2016 Jan 21.

Department of Biochemistry, National University of Singapore, Singapore.

Contemporary chemotherapeutic treatments incorporate the use of several agents in combination. However, selecting the most appropriate drugs for such therapy is not necessarily an easy or straightforward task. Here, we describe a targeted approach that can facilitate the reliable selection of chemotherapeutic drug combinations through the interrogation of drug-resistance gene networks. Our method employed the single-celled eukaryote fission yeast (Schizosaccharomyces pombe) as a model of proliferating cells to delineate a drug resistance gene network using a synthetic lethality workflow. Using the results of a previous unbiased screen, we assessed the genetic overlap of doxorubicin with six other drugs with varied mechanisms of action. Using this fission yeast model, drug-specific ontological sub-classifications were identified through the computation of relative hypersensitivities. We found that human gastric adenocarcinoma cells can be sensitized to doxorubicin by concomitant treatment with cisplatin, an intra-DNA strand crosslinking agent, and suberoylanilide hydroxamic acid, a histone deacetylase inhibitor. Our findings point to the utility of fission yeast as a model, and of the differential targeting of a conserved gene interaction network, when screening for successful chemotherapeutic drug combinations for human cells.

Source
DOI: http://dx.doi.org/10.1038/srep18658
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4726371
January 2016

Fitness profiling links topoisomerase II regulation of centromeric integrity to doxorubicin resistance in fission yeast.

Sci Rep 2015 Feb 11;5:8400. Epub 2015 Feb 11.

[1] Department of Biochemistry, National University of Singapore, Singapore 117597; [2] National University Health System (NUHS), Singapore; [3] Synthetic Biology Research Consortium, National University of Singapore; [4] NUS Graduate School for Integrative Sciences and Engineering.

Doxorubicin, a chemotherapeutic agent, inhibits the religation step of topoisomerase II (Top2). However, the downstream ramifications of this action are unknown. Here we performed epistasis analyses of top2 with 63 genes representing doxorubicin resistance (DXR) genes in fission yeast and revealed a subset that synergistically collaborate with Top2 to confer DXR. Our findings show that the chromatin-regulating RSC and SAGA complexes act with Top2 in a cluster that is functionally distinct from the Ino80 complex. In various DXR mutants, doxorubicin hypersensitivity was unexpectedly suppressed by a concomitant top2 mutation. Several DXR proteins showed centromeric localization, and their disruption resulted in centromeric defects and chromosome missegregation. An additional top2 mutation could restore centromeric chromatin integrity, suggesting a counterbalance between Top2 and these DXR factors in conferring doxorubicin resistance. Overall, this molecular basis for mitotic catastrophe associated with doxorubicin treatment will help to facilitate drug combinatorial usage in doxorubicin-related chemotherapeutic regimens.

Source
DOI: http://dx.doi.org/10.1038/srep08400
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4323662
February 2015

[Effect of smoking on the microRNAs expression in pneumoconiosis patients].

Zhonghua Lao Dong Wei Sheng Zhi Ye Bing Za Zhi 2014 Sep;32(9):686-8


Objective: To investigate the effect of smoking on microRNA (miRNA) expression in pneumoconiosis patients.

Methods: Real-time qPCR was used to measure the expression levels of miR-21, miR-200c, miR-16, miR-204, miR-206, miR-155, let-7g, miR-30b, and miR-192 in 36 non-smoking and 38 smoking patients with pneumoconiosis, and the differences in expression levels between the two groups were evaluated with an independent two-sample t-test.

Results: Serum miR-192 expression differed significantly between non-smoking and smoking pneumoconiosis patients (P < 0.05) and decreased gradually in smoking patients with stage I and stage II pneumoconiosis. Across all pneumoconiosis patients, serum miR-16 had the highest expression level and miR-204 the lowest.

Conclusion: Pneumoconiosis patients show differential expression of miRNAs in serum, and smoking affects miRNA expression in these patients.
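The group comparison in Methods can be illustrated with the pooled-variance (Student's) form of the independent two-sample t-test. The abstract does not say whether this form or Welch's unequal-variance variant was used, so the pooled form is an assumption. The pure-Python sketch below returns the t statistic and its degrees of freedom; the values in the usage line are made-up placeholders, not data from the study.

```python
import math

def pooled_t_test(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)    # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    df = na + nb - 2
    sp2 = ((na - 1) * va + (nb - 1) * vb) / df       # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Placeholder expression values for two hypothetical groups.
t, df = pooled_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(t, df)
```

The p-value follows from the t distribution with `df` degrees of freedom; in practice, `scipy.stats.ttest_ind(a, b)` computes the same pooled statistic together with a two-sided p-value.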

Source
September 2014

Profiling the transcription factor regulatory networks of human cell types.

Nucleic Acids Res 2014 Nov 9;42(20):12380-7. Epub 2014 Oct 9.

Department of Mathematics, National University of Singapore, Singapore 119076, Singapore; National University of Singapore Graduate School for Integrative Sciences and Engineering, Singapore 117456, Singapore.

Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulatory principles in different human cells. In this paper, we investigate the architectures of the 41 regulatory networks and the distributions of housekeeping and cell-specific regulatory interactions. The TF regulatory networks of different human cell types show similar global three-layer (top, core and bottom) hierarchical architectures, which differ markedly from that of the yeast TF regulatory network. However, they have distinguishable local organizations, as suggested by the fact that the wiring patterns of only a few TFs are enough to distinguish cell identities. The TF regulatory network of human embryonic stem cells (hESCs) is dense and enriched with interactions that are not seen in the networks of other cell types. Examination of cell-specific regulatory interactions suggests that they play important roles in hESCs.

Source
DOI: http://dx.doi.org/10.1093/nar/gku923
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227771
November 2014

Are the duplication cost and Robinson-Foulds distance equivalent?

J Comput Biol 2014 Aug 2;21(8):578-90. Epub 2014 Jul 2.

Department of Mathematics, National University of Singapore, Singapore.

In the tree reconciliation approach to species tree inference, a tree that has the minimum reconciliation score for given gene trees is taken as an estimate of the species tree. The scoring models used in existing tree reconciliation methods include the duplication, mutation, and deep coalescence costs. Since all existing inference methods are heuristic, their performance is often evaluated using the Robinson-Foulds (RF) distance between the true species trees and the estimated trees on simulated multi-locus datasets. To better understand these methods, we study the relationships between the duplication cost and the RF distance. We prove that the gap between the duplication cost and the RF distance is unbounded, but that the symmetric duplication cost is logarithmically equivalent to the RF distance. The relationships between other reconciliation costs and the RF distance are also investigated.
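The duplication cost referred to here is usually computed with the least-common-ancestor (LCA) mapping: each gene tree node is mapped to the lowest species tree node whose taxa include all species below it, and a gene node counts as a duplication when it maps to the same species node as one of its children. The Python sketch below implements this standard definition for rooted trees given as nested tuples, with a hypothetical leaf-to-species dictionary; it does not compute the symmetric duplication cost studied in the paper.

```python
def species_clusters(tree):
    """Taxon set below every node of the species tree (nested tuples, string leaves)."""
    out = []
    def walk(t):
        s = frozenset([t]) if isinstance(t, str) else frozenset().union(*map(walk, t))
        out.append(s)
        return s
    walk(tree)
    return out

def duplication_cost(gene_tree, species_tree, species_of):
    """Number of duplications under the LCA mapping.

    species_of maps each gene-tree leaf to a species-tree leaf.
    """
    sp = species_clusters(species_tree)

    def lca(species):
        # Species-tree clusters are nested, so the smallest containing one is the LCA.
        return min((c for c in sp if species <= c), key=len)

    dups = 0
    def walk(g):
        nonlocal dups
        if isinstance(g, str):
            return lca(frozenset([species_of[g]]))
        child_maps = [walk(c) for c in g]
        m = lca(frozenset().union(*child_maps))
        if m in child_maps:                  # same species node as one of its children
            dups += 1
        return m
    walk(gene_tree)
    return dups

species = (("A", "B"), "C")
leaf_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
print(duplication_cost((("a1", "b1"), "c1"), species, leaf_species))          # 0
print(duplication_cost((("a1", "c1"), "b1"), species, leaf_species))          # 1
print(duplication_cost((("a1", "a2"), ("b1", "c1")), species, leaf_species))  # 2
```

The second call shows how a gene tree that merely disagrees with the species tree, with no genuine duplication, still receives a duplication under the LCA mapping.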

Source
DOI: http://dx.doi.org/10.1089/cmb.2014.0021
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4116105
August 2014

Effect of Incomplete Lineage Sorting On Tree-Reconciliation-Based Inference of Gene Duplication.

IEEE/ACM Trans Comput Biol Bioinform 2014 May-Jun;11(3):477-85

In the tree reconciliation approach to inferring the duplication history of a gene family, the gene (family) tree is compared with the corresponding species tree. Incomplete lineage sorting (ILS) gives rise to stochastic variation in the topology of a gene tree and hence is likely to introduce false duplication events when a tree reconciliation method is used. We quantify the effect of ILS on gene duplication inference in a species tree in terms of the expected number of false duplication events inferred from reconciling the species tree with a random gene tree, which occurs with the probability predicted by coalescent theory. We computationally examine the relationship between the effect of ILS on duplication inference in a species tree and its topological parameters. Our findings suggest that ILS may cause non-negligible bias in duplication inference, particularly for an asymmetric species tree. Hence, when gene duplication is inferred via tree reconciliation or any other approach that takes gene tree topology into account, the ILS-induced bias should be carefully assessed.
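The smallest case shows how this expectation is formed. Assume a three-taxon species tree ((A,B),C) whose internal branch has length t in coalescent units, with one gene copy sampled per species. Standard coalescent theory gives the gene tree probabilities below. Reconciling either discordant topology, for example ((A,C),B), with the species tree maps both its cherry and its root to the species tree root, so LCA reconciliation reports one false duplication. This worked case is an illustration under these assumptions, not a result quoted from the paper.

```latex
\begin{aligned}
P\bigl(((A,B),C)\bigr) &= 1 - \tfrac{2}{3}e^{-t}, \qquad
P\bigl(((A,C),B)\bigr) = P\bigl(((B,C),A)\bigr) = \tfrac{1}{3}e^{-t},\\
\mathbb{E}[\text{false duplications}]
  &= 0\cdot\Bigl(1-\tfrac{2}{3}e^{-t}\Bigr) + 1\cdot\tfrac{1}{3}e^{-t} + 1\cdot\tfrac{1}{3}e^{-t}
   = \tfrac{2}{3}e^{-t}.
\end{aligned}
```

The expectation approaches 2/3 as the internal branch shrinks and vanishes as it grows, so short internal branches are where ILS-induced false duplications concentrate.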

Source
DOI: http://dx.doi.org/10.1109/TCBB.2013.2297913
March 2016

Maximum likelihood inference of the evolutionary history of a PPI network from the duplication history of its proteins.

IEEE/ACM Trans Comput Biol Bioinform 2013 Nov-Dec;10(6):1412-21

National University of Singapore, Singapore.

The evolutionary history of protein-protein interaction (PPI) networks provides valuable insight into the molecular mechanisms of network growth. In this paper, we study how to infer the evolutionary history of a PPI network from the duplication relationships among its proteins. We show that the relative quality of a plausible evolutionary history of a PPI network, measured by the so-called loss number, is independent of the network's growth parameters and can be computed efficiently. This finding leads us to propose two fast maximum likelihood algorithms to infer the evolutionary history of a PPI network given the duplication history of its proteins. Simulation studies demonstrated that our approach, which takes advantage of protein duplication information, outperforms NetArch, the first maximum likelihood algorithm for PPI network history reconstruction. Using the proposed method, we studied the topological changes of the yeast, fruit fly, and worm PPI networks.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2013.14
September 2014

Counting motifs in the human interactome.

Nat Commun 2013;4:2241

Department of Statistics and Applied Probability, National University of Singapore NUS, Singapore 117546, Singapore.

Small over-represented motifs in biological networks often form essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Here we develop an accurate method to estimate the occurrences of a motif in the entire network from noisy and incomplete data, and apply it to eukaryotic interactomes and cell-specific transcription factor regulatory networks. The number of triangles in the human interactome is about 194 times that in the Saccharomyces cerevisiae interactome. A strong positive linear correlation exists between the numbers of occurrences of triad and quadriad motifs in human cell-specific transcription factor regulatory networks. Our findings show that the proposed method is general and powerful for counting motifs and can be applied to any network regardless of its topological structure.
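As the simplest instance of motif counting, the sketch below counts triangles exactly and then scales an observed count up to the whole network under a simplifying assumption: that the observed network is the subgraph induced by nodes sampled independently with probability p. Each triangle then survives with probability p³, so observed / p³ is an unbiased estimate of the total. This textbook estimator is chosen for illustration only; the paper's method, which also handles noisy data, is not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def count_triangles(edges):
    """Exact number of triangles in an undirected simple graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Each triangle is seen once per ordered pair of its adjacent vertices: 6 times.
    return sum(len(adj[u] & adj[v]) for u in list(adj) for v in adj[u]) // 6

def estimate_total(observed, p):
    """Scale an observed count, assuming nodes were sampled independently with probability p."""
    return observed / p ** 3

k5 = list(combinations(range(5), 2))     # complete graph on 5 nodes
print(count_triangles(k5), estimate_total(count_triangles(k5), 0.5))   # 10 80.0
```

Larger motifs follow the same pattern: a motif on m nodes survives node sampling with probability p^m, which is why sparse sampling makes counts of larger motifs much noisier.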

Source
DOI: http://dx.doi.org/10.1038/ncomms3241
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3741638
February 2014

Inverted expression profiles of sex-biased genes in response to toxicant perturbations and diseases.

PLoS One 2013 Feb 14;8(2):e56668. Epub 2013 Feb 14.

Department of Biological Sciences, National University of Singapore, Kent Ridge, Singapore.

The influence of sex is widely recognized in various diseases, but its molecular basis is poorly understood, particularly how sex-biased genes (those with sexually dimorphic expression) behave in response to toxico-pathological changes. In this study, zebrafish toxicogenomic data and transcriptomic data from human pathological studies were analysed for the responses of male- and female-biased genes. Our analyses revealed clearly inverted expression profiles of sex-biased genes across a broad range of toxico-pathological conditions: affected males tended to up-regulate female-biased genes and down-regulate male-biased genes, and vice versa in affected females. Intriguingly, the extent of these inverted profiles correlated well with the susceptibility to, or severity of, a given toxico-pathological state, suggesting that the inverted expression profiles of sex-biased genes observed in this study can be used as important indicators for assessing biological disorders.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0056668
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3573008
August 2013

Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

J Bioinform Comput Biol 2013 Feb 16;11(1):1340008. Epub 2013 Jan 16.

Bioinformatics Institute-BII, Agency for Science and Technology (A*STAR), 30 Biopolis Street #07-01, Matrix, 138671 Singapore.

Data mining in protein databases derived from more fundamental protein 3D structure and sequence databases has considerable untapped potential for discovering relationships between sequence motifs, structural motifs and function, as exemplified by the finding of the U-shape (Huf-Zinc) motif, which began as a small student project. Zinc ions are critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, the so-called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues that are mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite occurring in entirely different overall folds. Although this does not allow general prediction of all Zn-binding motifs, an HMM-based web server, Huf-Zinc, is available for predicting these novel motifs, as well as conventional zinc finger motifs, in protein sequences. The Huf-Zinc web server can be freely accessed at http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/.

Source
DOI: http://dx.doi.org/10.1142/S0219720013400088
February 2013

Two combinatorial optimization problems for SNP discovery using base-specific cleavage and mass spectrometry.

BMC Syst Biol 2012 12;6 Suppl 2:S5. Epub 2012 Dec 12.

School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore.

Background: The discovery of single-nucleotide polymorphisms (SNPs) has important implications for a variety of genetic studies of human diseases and biological functions. One valuable approach proposed for SNP discovery is based on base-specific cleavage and mass spectrometry. However, realizing the full potential of this approach remains very challenging.

Results: In this study, we formulate two new combinatorial optimization problems. Both problems aim to reconstruct the sample sequence with the minimum number of SNPs, but they search over different candidate sequence spaces. The first problem, denoted SNP-MSP, restricts its search to sequences whose in silico predicted mass spectra have all their signals contained in the measured mass spectra. In contrast, the second problem, denoted SNP-MSQ, restricts its search to sequences whose in silico predicted mass spectra contain all the signals of the measured mass spectra. We present an exact dynamic programming algorithm for solving the SNP-MSP problem and show that the SNP-MSQ problem is NP-hard by a reduction from a restricted variant of the 3-partition problem.

Conclusions: We believe that an efficient solution to either problem could seamlessly integrate the information from four complementary base-specific cleavage reactions, thereby improving the capability of the underlying biotechnology for sensitive and accurate SNP discovery.
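
The difference between the two candidate spaces can be illustrated with a toy model of base-specific cleavage. The sketch assumes idealised, complete cleavage immediately after every occurrence of the chosen base, and it represents each signal by the fragment's base composition rather than its mass (in a real spectrum, distinct compositions can have near-identical masses). It only tests the SNP-MSP and SNP-MSQ containment conditions for a given candidate in one reaction; it does not implement the paper's dynamic programming algorithm, and all function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Composition = Tuple[int, int, int, int]  # counts of A, C, G, T in a fragment


def cleave(seq: str, base: str) -> List[str]:
    """Idealised complete cleavage after every occurrence of `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def spectrum(seq: str, base: str) -> FrozenSet[Composition]:
    """Predicted signals, with base composition standing in for fragment mass."""
    result = set()
    for fragment in cleave(seq, base):
        counts = Counter(fragment)
        result.add(tuple(counts[b] for b in "ACGT"))
    return frozenset(result)


def fits_msp(candidate: str, measured: FrozenSet[Composition], base: str) -> bool:
    """SNP-MSP condition: every predicted signal appears in the measured spectrum."""
    return spectrum(candidate, base) <= measured


def fits_msq(candidate: str, measured: FrozenSet[Composition], base: str) -> bool:
    """SNP-MSQ condition: every measured signal appears among the predicted ones."""
    return spectrum(candidate, base) >= measured


def snp_count(reference: str, candidate: str) -> int:
    """Number of substitutions between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(reference, candidate))


measured = spectrum("AGCAGT", "G")  # pretend this is the observed G-cleavage spectrum
print(fits_msp("AGCAGT", measured, "G"), fits_msq("AGCAGT", measured, "G"))  # True True
print(fits_msp("ATCAGT", measured, "G"), fits_msq("ATCAGT", measured, "G"))  # False False
print(snp_count("AGCAGT", "ATCAGT"))  # 1
```

In the full problems, `snp_count` against a reference would be minimised over all candidates that satisfy the chosen condition in each of the four cleavage reactions; enumerating candidates naively is exponential, which is why the algorithmic results above matter.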

Source
DOI: http://dx.doi.org/10.1186/1752-0509-6-S2-S5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521188
June 2013

Toxicogenomic analysis suggests chemical-induced sexual dimorphism in the expression of metabolic genes in zebrafish liver.

PLoS One 2012 18;7(12):e51971. Epub 2012 Dec 18.

NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore.

Differential gene expression between the two sexes is widespread throughout the animal kingdom, giving rise to sex-dimorphic gene activities and sex-dependent adaptability to environmental cues, diet, growth and development, as well as susceptibility to diseases. Here, we present a toxicogenomic study of metabolic genes that show sex-dimorphic expression in the zebrafish liver in response to several chemicals. Our analysis revealed that, besides the known genes for xenobiotic metabolism, many functionally diverse metabolic genes, such as ELOVL fatty acid elongase, DNA-directed RNA polymerase, and hydroxysteroid dehydrogenase, also responded sex-dimorphically to chemical treatment. Sex-dimorphic responses were also observed at the pathway level: pathways of xenobiotic metabolism, lipid metabolism, and nucleotide metabolism were enriched with sex-dimorphically expressed genes. We also observed temporal differences in the sex-dimorphic responses, suggesting that both genes and pathways are correlated differently during different periods of chemical perturbation. The ubiquity of sex-dimorphic activities at different biological levels indicates the importance of considering sex as a factor in many areas of biological research, especially in toxicology and pathology.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0051971
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3525581
June 2013

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters.

Genome Biol Evol 2012 27;4(4):586-601. Epub 2012 Mar 27.

Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA.

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate the use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated, accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.

Source
DOI: http://dx.doi.org/10.1093/gbe/evs032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3342878
July 2012

Structural properties of the reconciliation space and their applications in enumerating nearly-optimal reconciliations between a gene tree and a species tree.

BMC Bioinformatics 2011 Oct 5;12 Suppl 9:S7. Epub 2011 Oct 5.

Department of Mathematics, National University of Singapore, Singapore 119076.

Introduction: A gene tree for a gene family is often discordant with the containing species tree because of its complex evolutionary history, during which gene duplication, gene loss and incomplete lineage sorting events may occur. Hence, inferring the containing species tree from a set of gene trees is a great challenge. One common approach to this inference problem is gene tree and species tree reconciliation.

Results: In this paper, we generalize the traditional least common ancestor (LCA) reconciliation to define a reconciliation between a gene tree and a species tree under the tree homomorphism framework. We then study the structural properties of the space of all reconciliations between a gene tree and a species tree in terms of the gene duplication, gene loss and deep coalescence costs. As applications, we show that the LCA reconciliation is the unique reconciliation with the minimum deep coalescence cost, provide a novel characterization of the reconciliations with the optimal duplication cost, and present efficient algorithms for enumerating (nearly-)optimal reconciliations with respect to each cost.

Conclusions: This work provides a new graph-theoretic framework for studying gene tree and species tree reconciliations.
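
The classical LCA reconciliation that the paper generalizes maps each gene-tree node to the lowest common ancestor, in the species tree, of the species below it; a node counts as a duplication when it maps to the same species node as one of its children. The Python sketch below computes that mapping together with the usual duplication and loss costs for binary trees. The nested-tuple tree encoding and the function names are assumptions for illustration; the sketch does not implement the homomorphism framework or the enumeration algorithms described in the abstract.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a (left, right) pair


def index_tree(root: Tree) -> Tuple[Dict[Tree, Tree], Dict[Tree, int]]:
    """Parent and depth of every node of a rooted binary species tree."""
    parent: Dict[Tree, Tree] = {}
    depth: Dict[Tree, int] = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a: Tree, b: Tree, parent: Dict[Tree, Tree], depth: Dict[Tree, int]) -> Tree:
    """Lowest common ancestor by climbing parent pointers."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def lca_reconcile(gene: Tree, species: Tree) -> Tuple[int, int]:
    """Duplication and loss costs of the LCA reconciliation (binary trees;
    every gene leaf must be labeled with a species-tree leaf)."""
    parent, depth = index_tree(species)
    dups = losses = 0

    def walk(g: Tree) -> Tree:
        nonlocal dups, losses
        if isinstance(g, str):
            return g  # a gene leaf maps to the species it was sampled from
        mapped = [walk(child) for child in g]
        m = lca(mapped[0], mapped[1], parent, depth)
        is_dup = m in mapped  # maps to the same species node as a child
        dups += int(is_dup)
        for c in mapped:
            gap = depth[c] - depth[m]  # species edges between child and parent images
            losses += gap if is_dup else gap - 1
        return m

    walk(gene)
    return dups, losses


species = (("A", "B"), "C")
print(lca_reconcile((("A", "C"), "B"), species))  # (1, 3)
print(lca_reconcile((("A", "B"), "C"), species))  # (0, 0)
```

A gene tree that agrees with the species tree needs no duplications or losses, while the discordant tree above is explained by one duplication at the root and three losses.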

Source
DOI: http://dx.doi.org/10.1186/1471-2105-12-S9-S7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283320
October 2011

Existence of inverted profile in chemically responsive molecular pathways in the zebrafish liver.

PLoS One 2011 29;6(11):e27819. Epub 2011 Nov 29.

Department of Biological Sciences, National University of Singapore, Queenstown, Singapore.

How a living organism maintains a healthy equilibrium in response to endless exposure to potentially harmful chemicals is an important question in current biology. By transcriptomic analysis of zebrafish livers treated with various chemicals, we defined hubs as molecular pathways that are frequently perturbed by chemicals and have a high degree of functional connectivity to other pathways. Our network analysis revealed that these hubs were organized into two groups with inverted functionality relative to each other. Intriguingly, the inverted activity profiles of these two groups of hubs were associated only with toxicopathological states and not with physiological changes. Furthermore, these inverted profiles were also present in rat, mouse, and human under certain toxicopathological conditions. Thus, these toxicopathology-associated anti-correlated hub profiles suggest potential uses not only in diagnosis but also in the development of systems-based therapeutics that modulate gene expression chemically to rewire the deregulated activities of hubs back to normal physiology.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0027819
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226580
April 2012

Conversion events in gene clusters.

BMC Evol Biol 2011 Jul 28;11:226. Epub 2011 Jul 28.

Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802 USA.

Background: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments.

Results: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab.

Conclusions: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.

Source
DOI: http://dx.doi.org/10.1186/1471-2148-11-226
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161012
July 2011

From gene trees to species trees II: species tree inference by minimizing deep coalescence events.

Authors:
Louxin Zhang

IEEE/ACM Trans Comput Biol Bioinform 2011 Nov-Dec;8(6):1685-91

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076.

When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include incomplete lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. With incomplete lineage sorting, species tree inference seeks the tree that minimizes the number of extra gene lineages that had to coexist along species lineages; with gene duplication, it seeks the tree that minimizes gene duplications and/or losses. In this paper, we present the following results: 1) The deep coalescence cost is equal to the number of gene losses minus two times the gene duplication cost in the reconciliation of a uniquely leaf-labeled gene tree and a species tree. The deep coalescence cost can be computed in linear time for an arbitrary gene tree and species tree. 2) The deep coalescence cost is never less than the gene duplication cost in the reconciliation of an arbitrary gene tree and a species tree. 3) Species tree inference by minimizing deep coalescence events is NP-hard.
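
Result 1) can be checked numerically on small examples. The Python sketch below assumes binary trees encoded as nested tuples and a gene tree with exactly one gene per species (the uniquely leaf-labeled case, so every species-tree edge carries at least one gene lineage). It computes the LCA mapping, counts duplications and losses, and counts deep coalescences directly as the number of extra gene lineages on each species-tree edge. It is an illustrative brute-force check with hypothetical function names, not the linear-time algorithm from the paper.

```python
from collections import defaultdict
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a (left, right) pair


def index_tree(root: Tree) -> Tuple[Dict[Tree, Tree], Dict[Tree, int]]:
    """Parent and depth of every species-tree node."""
    parent: Dict[Tree, Tree] = {}
    depth: Dict[Tree, int] = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a: Tree, b: Tree, parent: Dict[Tree, Tree], depth: Dict[Tree, int]) -> Tree:
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconciliation_costs(gene: Tree, species: Tree) -> Tuple[int, int, int]:
    """(duplications, losses, deep coalescence) under the LCA reconciliation."""
    parent, depth = index_tree(species)
    lineages: Dict[Tree, int] = defaultdict(int)  # gene lineages on the edge above each species node
    dups = losses = 0

    def walk(g: Tree) -> Tree:
        nonlocal dups, losses
        if isinstance(g, str):
            return g
        mapped = [walk(child) for child in g]
        m = lca(mapped[0], mapped[1], parent, depth)
        is_dup = m in mapped
        dups += int(is_dup)
        for c in mapped:
            gap = depth[c] - depth[m]
            losses += gap if is_dup else gap - 1
            node = c
            while node != m:  # the child's lineage runs through these species edges
                lineages[node] += 1
                node = parent[node]
        return m

    walk(gene)
    deep_coal = sum(lineages[s] - 1 for s in parent)  # extra lineages per species edge
    return dups, losses, deep_coal


examples = [
    ((("A", "C"), "B"), (("A", "B"), "C")),
    ((("A", "D"), ("B", "C")), ((("A", "B"), "C"), "D")),
]
for gene, species in examples:
    d, l, dc = reconciliation_costs(gene, species)
    print(d, l, dc, dc == l - 2 * d)
# 1 3 1 True
# 1 4 2 True
```

Counting lineages edge by edge makes the definition of deep coalescence explicit; the identity then lets the cost be read off from the duplication and loss counts instead.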

Source
DOI: http://dx.doi.org/10.1109/TCBB.2011.83
April 2012

CAGE: Combinatorial Analysis of Gene-cluster Evolution.

J Comput Biol 2010 Sep;17(9):1227-42

Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, PA 16802, USA.

Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to acquire new functions. Current computational methods to extract evolutionary information from sequence data for such clusters are suboptimal, in part because accurate sequence data are often lacking in these genomic regions, making existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters, and evaluate its performance on both simulated data and actual human gene clusters.

Source
DOI: http://dx.doi.org/10.1089/cmb.2010.0094
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3122889
September 2010