Search our Database of Scientific Publications and Authors

I’m looking for a

    320 results match your criteria Algorithms for Molecular Biology [Journal]

    1 OF 7

    Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.
    Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.
    1Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8579 Japan.
    Background: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. Read More

    Fast phylogenetic inference from typing data.
    Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.
    3INESC-ID Lisboa, Rua Alves Redol 9, 1000-029 Lisboa, Portugal.
    Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. Read More

    A safe and complete algorithm for metagenomic assembly.
    Algorithms Mol Biol 2018 7;13. Epub 2018 Feb 7.
    Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.
    Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graphthat together cover all nodes, or edges, of. Read More

    Time-consistent reconciliation maps and forbidden time travel.
    Algorithms Mol Biol 2018 6;13. Epub 2018 Feb 6.
    1Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.
    Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent togene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene treewith a species trees, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Read More

    Gene tree parsimony for incomplete gene trees: addressing true biological loss.
    Algorithms Mol Biol 2018 19;13. Epub 2018 Jan 19.
    2Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, USA.
    Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some(i. Read More

    Phylogeny reconstruction based on the length distribution of-mismatch common substrings.
    Algorithms Mol Biol 2017 11;12:27. Epub 2017 Dec 11.
    Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.
    Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Read More

    Generalized enhanced suffix array construction in external memory.
    Algorithms Mol Biol 2017 7;12:26. Epub 2017 Dec 7.
    Institute of Mathematics and Computer Science, University of São Paulo, Av. Trabalhador São-carlense, 400, São Carlos, 13560-970 Brazil.
    Background: Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory.

    Results: In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction. Read More

    HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.
    Algorithms Mol Biol 2017 29;12:25. Epub 2017 Sep 29.
    School of Computer Science and Technology, Tianjin University, Tianjin, China.
    Background: Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.

    Methods: Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e. Read More

    Algorithms for matching partially labelled sequence graphs.
    Algorithms Mol Biol 2017 25;12:24. Epub 2017 Sep 25.
    Francis Crick Institute, 1 Midland Road, London, NW1 1AT UK.
    Background: In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique then the species name is sufficient to guide this match, however, when there are multiple related sequences (paralogs) in each species then the pairing is more difficult. In bacteria a good guide can be gained from genome co-location as interacting proteins tend to be in a common operon but in eukaryotes this simple principle is not sufficient. Read More

    Biologically feasible gene trees, reconciliation maps and informative triples.
    Algorithms Mol Biol 2017 29;12:23. Epub 2017 Aug 29.
    Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.
    Background: The history of gene families-which are equivalent togene trees-can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as whether inferred event-labeled gene trees are, that is, if there is a possible true history that would explain a given gene tree. In practice, this problem is boiled down to finding a reconciliation map-also known as DTL-scenario-between the event-labeled gene trees and a (possibly unknown) species tree. Read More

    Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements.
    Algorithms Mol Biol 2017 23;12:22. Epub 2017 Aug 23.
    Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.
    Background: Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. Read More

    A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification.
    Algorithms Mol Biol 2017 18;12:21. Epub 2017 Aug 18.
    Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany.
    Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1.

    Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Read More

    ASP-based method for the enumeration of attractors in non-deterministic synchronous and asynchronous multi-valued networks.
    Algorithms Mol Biol 2017 15;12:20. Epub 2017 Aug 15.
    École Centrale de Nantes, LS2N UMR CNRS 6004, 1 rue de la Noë, 44321 Nantes, France.
    Background: This paper addresses the problem of finding attractors in biological regulatory networks. We focus here on non-deterministic synchronous and asynchronous multi-valued networks, modeled using automata networks (AN). AN is a general and well-suited formalism to study complex interactions between different components (genes, proteins,. Read More

    Identification of bifurcation transitions in biological regulatory networks using Answer-Set Programming.
    Algorithms Mol Biol 2017 20;12:19. Epub 2017 Jul 20.
    LRI UMR 8623, Univ. Paris-Sud-CNRS, Université Paris-Saclay, 91405 Orsay, France.
    Background: Numerous cellular differentiation processes can be captured using discrete qualitative models of biological regulatory networks. These models describe the temporal evolution of the state of the network subject to different competing transitions, potentially leading the system to different attractors. This paper focusses on the formal identification of states and transitions that are crucial for preserving or pre-empting the reachability of a given behaviour. Read More

    A graph extension of the positional Burrows-Wheeler transform and its applications.
    Algorithms Mol Biol 2017 11;12:18. Epub 2017 Jul 11.
    Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA.
    We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. Read More

    Isometric gene tree reconciliation revisited.
    Algorithms Mol Biol 2017 13;12:17. Epub 2017 Jun 13.
    Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia.
    Background: Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Read More

    A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.
    Algorithms Mol Biol 2017 11;12:12. Epub 2017 May 11.
    Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland.
    Background: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. Read More

    Algorithms for computing the double cut and join distance on both gene order and intergenic sizes.
    Algorithms Mol Biol 2017 5;12:16. Epub 2017 Jun 5.
    Institut National de Recherche en Informatique et en Automatique (Inria) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot-Saint-Martin, France.
    Background: Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown decisive for the accuracy of inference methods (Biller et al. Read More

    StreAM-[Formula: see text]: algorithms for analyzing coarse grained RNA dynamics based on Markov models of connectivity-graphs.
    Algorithms Mol Biol 2017 30;12:15. Epub 2017 May 30.
    Department of Biology, Department of Computer Science, Department of Physics, TU Darmstadt, Schnittspahnstr. 2, 64283 Darmstadt, Germany.
    Background: In this work, we present a new coarse grained representation of RNA dynamics. It is based on adjacency matrices and their interactions patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition which is mainly modular and assessable by the secondary structure alone. Read More

    The gene family-free median of three.
    Algorithms Mol Biol 2017 26;12:14. Epub 2017 May 26.
    Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6 Canada.
    Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

    Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. Read More

    Complexity and algorithms for copy-number evolution problems.
    Algorithms Mol Biol 2017 16;12:13. Epub 2017 May 16.
    School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
    Background: Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Read More

    Core column prediction for protein multiple sequence alignments.
    Algorithms Mol Biol 2017 19;12:11. Epub 2017 Apr 19.
    Department of Computer Science, The University of Arizona, Tucson, AZ 85721 USA.
    Background: In a computed protein multiple sequence alignment, theof a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. Read More

    Aligning coding sequences with frameshift extension penalties.
    Algorithms Mol Biol 2017 31;12:10. Epub 2017 Mar 31.
    Département d'informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada.
    Background: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. Read More

    Gerbil: a fast and memory-efficient -mer counter with GPU-support.
    Algorithms Mol Biol 2017 31;12. Epub 2017 Mar 31.
    Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saae), Germany.
    Background: A basic task in bioinformatics is the counting of -mers in genome sequences. Existing -mer counting tools are most often optimized for small < 32 and suffer from excessive memory resource consumption or degrading performance for large . However, given the technology trend towards long reads of next-generation sequencers, support for large becomes increasingly important. Read More

    The feasibility of genome-scale biological network inference using Graphics Processing Units.
    Algorithms Mol Biol 2017 20;12. Epub 2017 Mar 20.
    Department of Molecular and Integrative Physiology, University of Michigan, North Campus Research Complex, Ann Arbor, MI USA.
    Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. Read More

    An efficient algorithm for testing the compatibility of phylogenies with nested taxa.
    Algorithms Mol Biol 2017 16;12. Epub 2017 Mar 16.
    Department of Computer Science, Iowa State University, Atanasoff Hall, Ames, IA USA.
    Background: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given collection [Formula: see text] of semi-labeled trees over various subsets of a set of taxa. Read More

    On the computational complexity of the maximum parsimony reconciliation problem in the duplication-loss-coalescence model.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.
    Department of Computer Science, Harvey Mudd College, Claremont, USA.
    Background: Phylogenetic tree reconciliation is a widely-used method for inferring the evolutionary histories of genes and species. In the duplication-loss-coalescence (DLC) model, we seek a reconciliation that explains the incongruence between a gene and species tree using gene duplication, loss, and deep coalescence events. In the maximum parsimony framework, costs are associated with these event types and a reconciliation is sought that minimizes the total cost of the events required to map the gene tree onto the species tree. Read More

    On avoided words, absent words, and their application to biological sequence analysis.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.
    Computational Regulatory Genomics, MRC Clinical Sciences Centre (CSC), Du Cane Road, London, W12 0NN UK.
    Background: The deviation of the observed frequency of a wordfrom its expected frequency in a given sequenceis used to determine whether or not the word is. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. Read More

    Approximating the correction of weighted and unweighted orthology and paralogy relations.
    Algorithms Mol Biol 2017 11;12. Epub 2017 Mar 11.
    Département d'informatique et de recherche opérationnelle, Université de Montréal, Quebec, Canada.
    Background: Given a gene family, the relations between genes (orthology/paralogy), are represented by a relation graph, where edges connect pairs of orthologous genes and "missing" edges represent paralogs. While a gene tree directly induces a relation graph, the converse is not always true. Indeed, a relation graph is not necessarily "satisfiable", i. Read More

    Approximating the DCJ distance of balanced genomes in linear time.
    Algorithms Mol Biol 2017 9;12. Epub 2017 Mar 9.
    Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS Brazil.
    Background: Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two balanced genomes, i. Read More

    Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Feb 14.
    CRIStAL (UMR 9189 Lille University/CNRS)-Inria Lille, Bat M3 ext, Université Lille 1, 59655 Villeneuve d'Ascq, France.
    Background: , also named, have been proven to be more sensitive than contiguous seeds () in nucleic and amino-acid sequences analysis. Initially proposed to detect sequence similarities and to anchor sequence alignments, spaced seeds have more recently been applied in severalrelated methods. Unfortunately, spaced seeds need to be initially designed. Read More

    Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.
    Algorithms Mol Biol 2017 22;12. Epub 2017 Feb 22.
    Inria Grenoble, 655, Avenue de l'Europe, 38334 Montbonnot, France.
    Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Read More

    An efficient algorithm for protein structure comparison using elastic shape analysis.
    Algorithms Mol Biol 2016 29;11:27. Epub 2016 Sep 29.
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India ; Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi, 110012 India.
    Background: Protein structure comparison play important role in in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in literature for comparing protein structures but they have their own limitations in terms of accuracy and complexity with respect to computational time and space. Read More

    Analysis of gene copy number changes in tumor phylogenetics.
    Algorithms Mol Biol 2016 22;11:26. Epub 2016 Sep 22.
    Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208 USA.
    Backgound: Evolution of cancer cells is characterized by large scale and rapid changes in the chromosomal  landscape. The fluorescence in situ hybridization (FISH) technique provides a way to measure the copy numbers of preselected genes in a group of cells and has been found to be a reliable source of data to model the evolution of tumor cells. Chowdhury et al. Read More

    Enumeration of minimal stoichiometric precursor sets in metabolic networks.
    Algorithms Mol Biol 2016 19;11:25. Epub 2016 Sep 19.
    Erable team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, 38330 Montbonnot Saint-Martin, France.
    Background: What an organism needs at least from its environment to produce a set of metabolites, e.g. target(s) of interest and/or biomass, has been called a minimal precursor set. Read More

    Using a constraint-based regression method for relative quantification of somatic mutations in pyrosequencing signals: a case for NRAS analysis.
    Algorithms Mol Biol 2016 15;11:24. Epub 2016 Sep 15.
    Institut de Recherche Expérimentale et Clinique (IREC), Center for Applied Molecular Technologies (CTMA), Université catholique de Louvain, Clos chapelle-aux-champs B1.30.24, 1200 Brussels, Belgium.
    Background: Pyrosequencing Allele Quantification (AQ) is a cost-effective DNA sequencing method that can be used for detecting somatic mutations in formalin-fixed paraffin-embedded (FFPE) samples. The method displays a low turnaround time and a high sensitivity. Pyrosequencing suffers however from two main drawbacks including (i) low specificity and (ii) difficult signal interpretation when multiple mutations are reported in a hotspot genomic region. Read More

    BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge.
    Algorithms Mol Biol 2016 14;11:23. Epub 2016 Sep 14.
    INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.
    Background: Biclustering has been largely used in biological data analysis, enabling the discovery of putative functional modules from omic and network data. Despite the recognized importance of incorporating domain knowledge to guide biclustering and guarantee a focus on relevant and non-trivial biclusters, this possibility has not yet been comprehensively addressed. This results from the fact that the majority of existing algorithms are only able to deliver sub-optimal solutions with restrictive assumptions on the structure, coherency and quality of biclustering solutions, thus preventing the up-front satisfaction of knowledge-driven constraints. Read More

    An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding.
    Algorithms Mol Biol 2016 5;11:22. Epub 2016 Aug 5.
    Department of Computer Science, UC Davis, One Shields Avenue, Davis, CA USA.
    Background: The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known [Formula: see text]-time dynamic programming method. Recently three methodologies-Valiant, Four-Russians, and Sparsification-have been applied to speedup RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequence Z with the endpoints belonging to the optimal folding set and the maximum number base-pairs L. Read More

    A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.
    Algorithms Mol Biol 2016 18;11:20. Epub 2016 Jul 18.
    Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.
    Background: Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i. Read More

    A Bayesian inference method for the analysis of transcriptional regulatory networks in metagenomic data.
    Algorithms Mol Biol 2016 8;11:19. Epub 2016 Jul 8.
    Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250 USA.
    Background: Metagenomics enables the analysis of bacterial population composition and the study of emergent population features, such as shared metabolic pathways. Recently, we have shown that metagenomics datasets can be leveraged to characterize population-wide transcriptional regulatory networks, or meta-regulons, providing insights into how bacterial populations respond collectively to specific triggers. Here we formalize a Bayesian inference framework to analyze the composition of transcriptional regulatory networks in metagenomes by determining the probability of regulation of orthologous gene sequences. Read More

    An effective sequence-alignment-free superpositioning of pairwise or multiple structures with missing data.
    Algorithms Mol Biol 2016 21;11:18. Epub 2016 Jun 21.
    National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190 China.
    Background: Superpositioning is an important problem in structural biology. Determining an optimal superposition requires a one-to-one correspondence between the atoms of two proteins structures. However, in practice, some atoms are missing from their original structures. Read More

    An algorithm to parse segment packing in predicted protein contact maps.
    Algorithms Mol Biol 2016 18;11:17. Epub 2016 Jun 18.
    Francis Crick Institute, 1 Midland Rd, London, NW1 1AT UK.
    Background: The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure and while these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in a three-dimensional structure and are affecting each others selection pressure. To analyse these data, A dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps.

    Results: The non-local nature of the constraints required an iterated approach (using a "frozen approximation") but with good starting definitions, a single pass was usually sufficient. Read More

    Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features.
    Algorithms Mol Biol 2016 1;11:16. Epub 2016 Jun 1.
    Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India.
    Background: Identification of splice sites is essential for annotation of genes. Though existing approaches have achieved an acceptable level of accuracy, still there is a need for further improvement. Besides, most of the approaches are species-specific and hence it is required to develop approaches compatible across species. Read More

    Towards sub-quadratic time and space complexity solutions for the dated tree reconciliation problem.
    Algorithms Mol Biol 2016 21;11:15. Epub 2016 May 21.
    School of Information Technologies, University of Sydney, 1 Cleveland St, Sydney, 2006 NSW Australia ; School of Physical Sciences, University Of Tasmania, Hobart, 7005 Tasmania Australia.
    Background: Recent coevolutionary analysis has considered tree topology as a means to reduce the asymptotic complexity associated with inferring the complex coevolutionary interrelationships that arise between phylogenetic trees. Targeted algorithmic design for specific tree topologies has to date been highly successful, with one recent formulation providing a logarithmic space complexity reduction for the dated tree reconciliation problem.

    Methods: In this work we build on this prior analysis providing a further asymptotic space reduction, by providing a new formulation for the dynamic programming table used by a number of popular coevolutionary analysis techniques. Read More

    BicNET: Flexible module discovery in large-scale biological networks using biclustering.
    Algorithms Mol Biol 2016 20;11:14. Epub 2016 May 20.
    INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal.
    Background: Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Read More

    Models and algorithms for genome rearrangement with positional constraints.
    Algorithms Mol Biol 2016 17;11:13. Epub 2016 May 17.
    McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, H3C2B4 Canada.
    Background: Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criteria alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. Read More

    Circular sequence comparison: algorithms and applications.
    Algorithms Mol Biol 2016 10;11:12. Epub 2016 May 10.
    Department of Informatics, King's College London, London, UK.
    Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics; from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Read More

    An exact algorithm for finding cancer driver somatic genome alterations: the weighted mutually exclusive maximum set cover problem.
    Algorithms Mol Biol 2016 4;11:11. Epub 2016 May 4.
    Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206 USA.
    Background: The mutual exclusivity of somatic genome alterations (SGAs), such as somatic mutations and copy number alterations, is an important observation of tumors and is widely used to search for cancer signaling pathways or SGAs related to tumor development. However, one problem with current methods that use mutual exclusivity is that they are not signal-based; another problem is that they use heuristic algorithms to handle the NP-hard problems, which cannot guarantee to find the optimal solutions of their models.

    Method: In this study, we propose a novel signal-based method that utilizes the intrinsic relationship between SGAs on signaling pathways and expression changes of downstream genes regulated by pathways to identify cancer signaling pathways using the mutually exclusive property. Read More

    1 OF 7