Search our Database of Scientific Publications and Authors

I’m looking for a

    313 results match your criteria Algorithms for Molecular Biology [Journal]

    1 OF 7

    HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.
    Algorithms Mol Biol 2017 29;12:25. Epub 2017 Sep 29.
    School of Computer Science and Technology, Tianjin University, Tianjin, China.
    Background: Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.

    Methods: Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e. Read More

    Algorithms for matching partially labelled sequence graphs.
    Algorithms Mol Biol 2017 25;12:24. Epub 2017 Sep 25.
    Francis Crick Institute, 1 Midland Road, London, NW1 1AT UK.
    Background: In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique then the species name is sufficient to guide this match, however, when there are multiple related sequences (paralogs) in each species then the pairing is more difficult. In bacteria a good guide can be gained from genome co-location as interacting proteins tend to be in a common operon but in eukaryotes this simple principle is not sufficient. Read More

    Biologically feasible gene trees, reconciliation maps and informative triples.
    Algorithms Mol Biol 2017 29;12:23. Epub 2017 Aug 29.
    Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.
    Background: The history of gene families-which are equivalent to event-labeled gene trees-can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as whether inferred event-labeled gene trees are biologically feasible, that is, if there is a possible true history that would explain a given gene tree. In practice, this problem is boiled down to finding a reconciliation map-also known as DTL-scenario-between the event-labeled gene trees and a (possibly unknown) species tree. Read More

    Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements.
    Algorithms Mol Biol 2017 23;12:22. Epub 2017 Aug 23.
    Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.
    Background: Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. Read More

    A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification.
    Algorithms Mol Biol 2017 18;12:21. Epub 2017 Aug 18.
    Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany.
    Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1.

    Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Read More

    ASP-based method for the enumeration of attractors in non-deterministic synchronous and asynchronous multi-valued networks.
    Algorithms Mol Biol 2017 15;12:20. Epub 2017 Aug 15.
    École Centrale de Nantes, LS2N UMR CNRS 6004, 1 rue de la Noë, 44321 Nantes, France.
    Background: This paper addresses the problem of finding attractors in biological regulatory networks. We focus here on non-deterministic synchronous and asynchronous multi-valued networks, modeled using automata networks (AN). AN is a general and well-suited formalism to study complex interactions between different components (genes, proteins,. Read More

    Identification of bifurcation transitions in biological regulatory networks using Answer-Set Programming.
    Algorithms Mol Biol 2017 20;12:19. Epub 2017 Jul 20.
    LRI UMR 8623, Univ. Paris-Sud-CNRS, Université Paris-Saclay, 91405 Orsay, France.
    Background: Numerous cellular differentiation processes can be captured using discrete qualitative models of biological regulatory networks. These models describe the temporal evolution of the state of the network subject to different competing transitions, potentially leading the system to different attractors. This paper focusses on the formal identification of states and transitions that are crucial for preserving or pre-empting the reachability of a given behaviour. Read More

    A graph extension of the positional Burrows-Wheeler transform and its applications.
    Algorithms Mol Biol 2017 11;12:18. Epub 2017 Jul 11.
    Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA.
    We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. Read More

    Isometric gene tree reconciliation revisited.
    Algorithms Mol Biol 2017 13;12:17. Epub 2017 Jun 13.
    Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia.
    Background: Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Read More

    A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.
    Algorithms Mol Biol 2017 11;12:12. Epub 2017 May 11.
    Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland.
    Background: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the idealde novopeptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. Read More

    Algorithms for computing the double cut and join distance on both gene order and intergenic sizes.
    Algorithms Mol Biol 2017 5;12:16. Epub 2017 Jun 5.
    Institut National de Recherche en Informatique et en Automatique (Inria) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot-Saint-Martin, France.
    Background: Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown decisive for the accuracy of inference methods (Biller et al. Read More

    StreAM-[Formula: see text]: algorithms for analyzing coarse grained RNA dynamics based on Markov models of connectivity-graphs.
    Algorithms Mol Biol 2017 30;12:15. Epub 2017 May 30.
    Department of Biology, Department of Computer Science, Department of Physics, TU Darmstadt, Schnittspahnstr. 2, 64283 Darmstadt, Germany.
    Background: In this work, we present a new coarse grained representation of RNA dynamics. It is based on adjacency matrices and their interactions patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition which is mainly modular and assessable by the secondary structure alone. Read More

    The gene family-free median of three.
    Algorithms Mol Biol 2017 26;12:14. Epub 2017 May 26.
    Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6 Canada.
    Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

    Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. Read More

    Complexity and algorithms for copy-number evolution problems.
    Algorithms Mol Biol 2017 16;12:13. Epub 2017 May 16.
    School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
    Background: Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Read More

    Core column prediction for protein multiple sequence alignments.
    Algorithms Mol Biol 2017 19;12:11. Epub 2017 Apr 19.
    Department of Computer Science, The University of Arizona, Tucson, AZ 85721 USA.
    Background: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. Read More

    Aligning coding sequences with frameshift extension penalties.
    Algorithms Mol Biol 2017 31;12:10. Epub 2017 Mar 31.
    Département d'informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada.
    Background: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. Read More

    Gerbil: a fast and memory-efficient k-mer counter with GPU-support.
    Algorithms Mol Biol 2017 31;12. Epub 2017 Mar 31.
    Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saae), Germany.
    Background: A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. Read More

    The feasibility of genome-scale biological network inference using Graphics Processing Units.
    Algorithms Mol Biol 2017 20;12. Epub 2017 Mar 20.
    Department of Molecular and Integrative Physiology, University of Michigan, North Campus Research Complex, Ann Arbor, MI USA.
    Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. Read More

    An efficient algorithm for testing the compatibility of phylogenies with nested taxa.
    Algorithms Mol Biol 2017 16;12. Epub 2017 Mar 16.
    Department of Computer Science, Iowa State University, Atanasoff Hall, Ames, IA USA.
    Background: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given collection [Formula: see text] of semi-labeled trees over various subsets of a set of taxa. Read More

    On the computational complexity of the maximum parsimony reconciliation problem in the duplication-loss-coalescence model.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.
    Department of Computer Science, Harvey Mudd College, Claremont, USA.
    Background: Phylogenetic tree reconciliation is a widely-used method for inferring the evolutionary histories of genes and species. In the duplication-loss-coalescence (DLC) model, we seek a reconciliation that explains the incongruence between a gene and species tree using gene duplication, loss, and deep coalescence events. In the maximum parsimony framework, costs are associated with these event types and a reconciliation is sought that minimizes the total cost of the events required to map the gene tree onto the species tree. Read More

    On avoided words, absent words, and their application to biological sequence analysis.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.
    Computational Regulatory Genomics, MRC Clinical Sciences Centre (CSC), Du Cane Road, London, W12 0NN UK.
    Background: The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. Read More

    Approximating the correction of weighted and unweighted orthology and paralogy relations.
    Algorithms Mol Biol 2017 11;12. Epub 2017 Mar 11.
    Département d'informatique et de recherche opérationnelle, Université de Montréal, Quebec, Canada.
    Background: Given a gene family, the relations between genes (orthology/paralogy), are represented by a relation graph, where edges connect pairs of orthologous genes and "missing" edges represent paralogs. While a gene tree directly induces a relation graph, the converse is not always true. Indeed, a relation graph is not necessarily "satisfiable", i. Read More

    Approximating the DCJ distance of balanced genomes in linear time.
    Algorithms Mol Biol 2017 9;12. Epub 2017 Mar 9.
    Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS Brazil.
    Background: Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two balanced genomes, i. Read More

    Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.
    Algorithms Mol Biol 2017 14;12. Epub 2017 Feb 14.
    CRIStAL (UMR 9189 Lille University/CNRS)-Inria Lille, Bat M3 ext, Université Lille 1, 59655 Villeneuve d'Ascq, France.
    Background: Spaced seeds, also named gapped q-grams, gapped k-mers, spaced q-grams, have been proven to be more sensitive than contiguous seeds (contiguous q-grams, contiguous k-mers) in nucleic and amino-acid sequences analysis. Initially proposed to detect sequence similarities and to anchor sequence alignments, spaced seeds have more recently been applied in several alignment-free related methods. Unfortunately, spaced seeds need to be initially designed. Read More

    Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.
    Algorithms Mol Biol 2017 22;12. Epub 2017 Feb 22.
    Inria Grenoble, 655, Avenue de l'Europe, 38334 Montbonnot, France.
    Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Read More

    An efficient algorithm for protein structure comparison using elastic shape analysis.
    Algorithms Mol Biol 2016 29;11:27. Epub 2016 Sep 29.
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India ; Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi, 110012 India.
    Background: Protein structure comparison play important role in in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in literature for comparing protein structures but they have their own limitations in terms of accuracy and complexity with respect to computational time and space. Read More

    Analysis of gene copy number changes in tumor phylogenetics.
    Algorithms Mol Biol 2016 22;11:26. Epub 2016 Sep 22.
    Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208 USA.
    Backgound: Evolution of cancer cells is characterized by large scale and rapid changes in the chromosomal  landscape. The fluorescence in situ hybridization (FISH) technique provides a way to measure the copy numbers of preselected genes in a group of cells and has been found to be a reliable source of data to model the evolution of tumor cells. Chowdhury et al. Read More

    Enumeration of minimal stoichiometric precursor sets in metabolic networks.
    Algorithms Mol Biol 2016 19;11:25. Epub 2016 Sep 19.
    Erable team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, 38330 Montbonnot Saint-Martin, France.
    Background: What an organism needs at least from its environment to produce a set of metabolites, e.g. target(s) of interest and/or biomass, has been called a minimal precursor set. Read More

    Using a constraint-based regression method for relative quantification of somatic mutations in pyrosequencing signals: a case for NRAS analysis.
    Algorithms Mol Biol 2016 15;11:24. Epub 2016 Sep 15.
    Institut de Recherche Expérimentale et Clinique (IREC), Center for Applied Molecular Technologies (CTMA), Université catholique de Louvain, Clos chapelle-aux-champs B1.30.24, 1200 Brussels, Belgium.
    Background: Pyrosequencing Allele Quantification (AQ) is a cost-effective DNA sequencing method that can be used for detecting somatic mutations in formalin-fixed paraffin-embedded (FFPE) samples. The method displays a low turnaround time and a high sensitivity. Pyrosequencing suffers however from two main drawbacks including (i) low specificity and (ii) difficult signal interpretation when multiple mutations are reported in a hotspot genomic region. Read More

    BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge.
    Algorithms Mol Biol 2016 14;11:23. Epub 2016 Sep 14.
    INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.
    Background: Biclustering has been largely used in biological data analysis, enabling the discovery of putative functional modules from omic and network data. Despite the recognized importance of incorporating domain knowledge to guide biclustering and guarantee a focus on relevant and non-trivial biclusters, this possibility has not yet been comprehensively addressed. This results from the fact that the majority of existing algorithms are only able to deliver sub-optimal solutions with restrictive assumptions on the structure, coherency and quality of biclustering solutions, thus preventing the up-front satisfaction of knowledge-driven constraints. Read More

    An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding.
    Algorithms Mol Biol 2016 5;11:22. Epub 2016 Aug 5.
    Department of Computer Science, UC Davis, One Shields Avenue, Davis, CA USA.
    Background: The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known [Formula: see text]-time dynamic programming method. Recently three methodologies-Valiant, Four-Russians, and Sparsification-have been applied to speedup RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequence Z with the endpoints belonging to the optimal folding set and the maximum number base-pairs L. Read More

    A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.
    Algorithms Mol Biol 2016 18;11:20. Epub 2016 Jul 18.
    Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.
    Background: Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i. Read More

    A Bayesian inference method for the analysis of transcriptional regulatory networks in metagenomic data.
    Algorithms Mol Biol 2016 8;11:19. Epub 2016 Jul 8.
    Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250 USA.
    Background: Metagenomics enables the analysis of bacterial population composition and the study of emergent population features, such as shared metabolic pathways. Recently, we have shown that metagenomics datasets can be leveraged to characterize population-wide transcriptional regulatory networks, or meta-regulons, providing insights into how bacterial populations respond collectively to specific triggers. Here we formalize a Bayesian inference framework to analyze the composition of transcriptional regulatory networks in metagenomes by determining the probability of regulation of orthologous gene sequences. Read More

    An effective sequence-alignment-free superpositioning of pairwise or multiple structures with missing data.
    Algorithms Mol Biol 2016 21;11:18. Epub 2016 Jun 21.
    National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190 China.
    Background: Superpositioning is an important problem in structural biology. Determining an optimal superposition requires a one-to-one correspondence between the atoms of two proteins structures. However, in practice, some atoms are missing from their original structures. Read More

    An algorithm to parse segment packing in predicted protein contact maps.
    Algorithms Mol Biol 2016 18;11:17. Epub 2016 Jun 18.
    Francis Crick Institute, 1 Midland Rd, London, NW1 1AT UK.
    Background: The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure and while these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in a three-dimensional structure and are affecting each others selection pressure. To analyse these data, A dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps.

    Results: The non-local nature of the constraints required an iterated approach (using a "frozen approximation") but with good starting definitions, a single pass was usually sufficient. Read More

    Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features.
    Algorithms Mol Biol 2016 1;11:16. Epub 2016 Jun 1.
    Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India.
    Background: Identification of splice sites is essential for annotation of genes. Though existing approaches have achieved an acceptable level of accuracy, still there is a need for further improvement. Besides, most of the approaches are species-specific and hence it is required to develop approaches compatible across species. Read More

    Towards sub-quadratic time and space complexity solutions for the dated tree reconciliation problem.
    Algorithms Mol Biol 2016 21;11:15. Epub 2016 May 21.
    School of Information Technologies, University of Sydney, 1 Cleveland St, Sydney, 2006 NSW Australia ; School of Physical Sciences, University Of Tasmania, Hobart, 7005 Tasmania Australia.
    Background: Recent coevolutionary analysis has considered tree topology as a means to reduce the asymptotic complexity associated with inferring the complex coevolutionary interrelationships that arise between phylogenetic trees. Targeted algorithmic design for specific tree topologies has to date been highly successful, with one recent formulation providing a logarithmic space complexity reduction for the dated tree reconciliation problem.

    Methods: In this work we build on this prior analysis providing a further asymptotic space reduction, by providing a new formulation for the dynamic programming table used by a number of popular coevolutionary analysis techniques. Read More

    BicNET: Flexible module discovery in large-scale biological networks using biclustering.
    Algorithms Mol Biol 2016 20;11:14. Epub 2016 May 20.
    INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal.
    Background: Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Read More

    Models and algorithms for genome rearrangement with positional constraints.
    Algorithms Mol Biol 2016 17;11:13. Epub 2016 May 17.
    McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, H3C2B4 Canada.
    Background: Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criteria alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. Read More

    Circular sequence comparison: algorithms and applications.
    Algorithms Mol Biol 2016 10;11:12. Epub 2016 May 10.
    Department of Informatics, King's College London, London, UK.
    Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics; from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Read More

    An exact algorithm for finding cancer driver somatic genome alterations: the weighted mutually exclusive maximum set cover problem.
    Algorithms Mol Biol 2016 4;11:11. Epub 2016 May 4.
    Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206 USA.
    Background: The mutual exclusivity of somatic genome alterations (SGAs), such as somatic mutations and copy number alterations, is an important observation of tumors and is widely used to search for cancer signaling pathways or SGAs related to tumor development. However, one problem with current methods that use mutual exclusivity is that they are not signal-based; another problem is that they use heuristic algorithms to handle the NP-hard problems, which cannot guarantee to find the optimal solutions of their models.

    Method: In this study, we propose a novel signal-based method that utilizes the intrinsic relationship between SGAs on signaling pathways and expression changes of downstream genes regulated by pathways to identify cancer signaling pathways using the mutually exclusive property. Read More

    Jabba: hybrid error correction for long sequencing reads.
    Algorithms Mol Biol 2016 3;11:10. Epub 2016 May 3.
    Department of Information Technology, Ghent University - iMinds, Ghent, Belgium ; Bioinformatics Institute Ghent, Ghent, Belgium.
    Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Read More

    Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays.
    Algorithms Mol Biol 2016 23;11. Epub 2016 Apr 23.
    Department of Bioinformatics and Computational Biology, Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080 USA.
    Background: Suffix arrays and their variants are used widely for representing genomes in search applications. Enhanced suffix arrays (ESAs) provide fast search speed, but require large auxiliary data structures for storing longest common prefix and child interval information. We explore techniques for compressing ESAs to accelerate genomic search and reduce memory requirements. Read More

    RNA folding with hard and soft constraints.
    Algorithms Mol Biol 2016 23;11. Epub 2016 Apr 23.
    Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Vienna, Austria ; Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg, Denmark ; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany ; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany ; Fraunhofer Institut for Cell Therapy and Immunology, Perlickstraße 1, 04103 Leipzig, Germany ; Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM87501 USA.
    Background: A large class of RNA secondary structure prediction programs uses an elaborate energy model grounded in extensive thermodynamic measurements and exact dynamic programming algorithms. External experimental evidence can be in principle be incorporated by means of hard constraints that restrict the search space or by means of soft constraints that distort the energy model. In particular recent advances in coupling chemical and enzymatic probing with sequencing techniques but also comparative approaches provide an increasing amount of experimental data to be combined with secondary structure prediction. Read More

    Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.
    Algorithms Mol Biol 2016 23;11. Epub 2016 Apr 23.
    Ingenuity Lab, 11421 Saskatchewan Drive NW, Edmonton, Canada ; National Institute for Nanotechnology, 11421 Saskatchewan Drive NW, Edmonton, Canada ; Department of Chemical and Materials Engineering, University of Alberta, Edmonton, Canada.
    Background: RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Read More

    MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics.
    Algorithms Mol Biol 2016 21;11. Epub 2016 Apr 21.
    Department of Information Engineering, University of Padova, via Gradenigo 6/a, 35131 Padova, Italy.
    Background: Measuring sequence similarity is central for many problems in bioinformatics. In several contexts alignment-free techniques based on exact occurrences of substrings are faster, but also less accurate, than alignment-based approaches. Recently, several studies attempted to bridge the accuracy gap with the introduction of approximate matches in the definition of composition-based similarity measures. Read More

    Bitpacking techniques for indexing genomes: I. Hash tables.
    Algorithms Mol Biol 2016 18;11. Epub 2016 Apr 18.
    Department of Bioinformatics and Computational Biology, Genentech, Inc., 1 DNA Way, 94080, South San Francisco, CA 94080 USA.
    Background: Hash tables constitute a widely used data structure for indexing genomes that provides a list of genomic positions for each possible oligomer of a given size. The offset array in a hash table grows exponentially with the oligomer size and precludes the use of larger oligomers that could facilitate rapid alignment of sequences to a genome.

    Results: We propose to compress the offset array using vectorized bitpacking. Read More

    The link between orthology relations and gene trees: a correction perspective.
    Algorithms Mol Biol 2016 16;11. Epub 2016 Apr 16.
    Département d'informatique et de recherche opérationnelle, Université de Montréal, Montreal, QC Canada.
    Background: While tree-oriented methods for inferring orthology and paralogy relations between genes are based on reconciling a gene tree with a species tree, many tree-free methods are also available (usually based on sequence similarity). Recently, the link between orthology relations and gene trees has been formally considered from the perspective of reconstructing phylogenies from orthology relations. In this paper, we consider this link from a correction point of view. Read More

    1 OF 7