413 results match your criteria Algorithms for Molecular Biology[Journal]

Heuristic shortest hyperpaths in cell signaling hypergraphs.

Algorithms Mol Biol 2022 May 26;17(1):12. Epub 2022 May 26.

Department of Computer Science, The University of Arizona, Tucson, Arizona, 85721, USA.

Background: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. Read More

View Article and Full-Text PDF

Embedding gene trees into phylogenetic networks by conflict resolution algorithms.

Algorithms Mol Biol 2022 May 19;17(1):11. Epub 2022 May 19.

University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097, Warsaw, Poland.

Background: Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent an evolutionary history of a gene family if the evolution is shaped by reticulation events. Read More

View Article and Full-Text PDF

Bi-alignments with affine gaps costs.

Algorithms Mol Biol 2022 May 16;17(1):10. Epub 2022 May 16.

AMIBio, Laboratoire d'Informatique de l'École Polytechnique (LIX), Institute Polytechnique de Paris (IP Paris), Batiment Turing, 1 rue d'Estienne d'Orves, 91120, Palaiseau, France.

Background: Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Assuming congruent evolution, alignments based on sequence and structure similarity can therefore optimize both similarities at the same time in a single alignment. To model incongruent evolution, where sequence and structural features diverge positionally, we recently introduced bi-alignments. Read More

View Article and Full-Text PDF

Efficient privacy-preserving variable-length substring match for genome sequence.

Algorithms Mol Biol 2022 Apr 26;17(1). Epub 2022 Apr 26.

Department of Computer Science and Engineering, Waseda University, Tokyo, Japan.

The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. Read More

View Article and Full-Text PDF

Tree diet: reducing the treewidth to unlock FPT algorithms in RNA bioinformatics.

Algorithms Mol Biol 2022 Apr 2;17(1). Epub 2022 Apr 2.

LIGM, CNRS, Univ Gustave Eiffel, 77454, Marne-la-Vallée, France.

Hard graph problems are ubiquitous in Bioinformatics, inspiring the design of specialized Fixed-Parameter Tractable algorithms, many of which rely on a combination of tree-decomposition and dynamic programming. The time/space complexities of such approaches hinge critically on low values for the treewidth tw of the input graph. In order to extend their scope of applicability, we introduce the TREE-DIET problem, i. Read More

View Article and Full-Text PDF

Adding hydrogen atoms to molecular models via fragment superimposition.

Algorithms Mol Biol 2022 Mar 29;17(1). Epub 2022 Mar 29.

Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany.

Background: Most experimentally determined structures of biomolecules lack annotated hydrogen positions due to their low electron density. However, thorough structure analysis and simulations require knowledge about the positions of hydrogen atoms. Existing methods for their prediction are either limited to a certain range of molecules or only work effectively on small compounds. Read More

View Article and Full-Text PDF

Perplexity: evaluating transcript abundance estimation in the absence of ground truth.

Algorithms Mol Biol 2022 Mar 25;17(1). Epub 2022 Mar 25.

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA.

Background: There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpin gene expression based analysis routinely carried out in the lab. Read More

View Article and Full-Text PDF

Space-efficient representation of genomic k-mer count tables.

Algorithms Mol Biol 2022 Mar 21;17(1). Epub 2022 Mar 21.

LIGM, Université Gustave Eiffel, Marne-la-Vallée, France.

Motivation: k-mer counting is a common task in bioinformatic pipelines, with many dedicated tools available. Many of these tools produce in output k-mer count tables containing both k-mers and counts, easily reaching tens of GB. Furthermore, such tables do not support efficient random-access queries in general. Read More

View Article and Full-Text PDF

Fast characterization of segmental duplication structure in multiple genome assemblies.

Algorithms Mol Biol 2022 Mar 18;17(1). Epub 2022 Mar 18.

Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada.

Motivation: The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Read More

View Article and Full-Text PDF

Parsimonious Clone Tree Integration in cancer.

Algorithms Mol Biol 2022 Mar 14;17(1). Epub 2022 Mar 14.

Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA.

Background: Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor's clonal composition. Read More

View Article and Full-Text PDF

Efficiently sparse listing of classes of optimal cophylogeny reconciliations.

Algorithms Mol Biol 2022 Feb 15;17(1). Epub 2022 Feb 15.

Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, 69622, Villeurbanne, France.

Background: Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist deal with the often huge space of optimal solutions. Read More

View Article and Full-Text PDF
February 2022

A new 1.375-approximation algorithm for sorting by transpositions.

Algorithms Mol Biol 2022 Jan 15;17(1). Epub 2022 Jan 15.

Departmento de Ciência da Computação, Universidade de Brasília, Brasília, Brazil.

Background: SORTING BY TRANSPOSITIONS (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be [Formula: see text]-hard and the best approximation algorithm with a 1.375 ratio was proposed in 2006 by Elias and Hartman (EH algorithm). Read More

View Article and Full-Text PDF
January 2022

An optimized FM-index library for nucleotide and amino acid search.

Algorithms Mol Biol 2021 Dec 31;16(1):25. Epub 2021 Dec 31.

Department of Computer Science, University of Montana, Missoula, MT, USA.

Background: Pattern matching is a key step in a variety of biological sequence analysis pipelines. The FM-index is a compressed data structure for pattern matching, with search run time that is independent of the length of the database text. Implementation of the FM-index is reasonably complicated, so that increased adoption will be aided by the availability of a fast and flexible FM-index library. Read More

View Article and Full-Text PDF
December 2021

An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes.

Algorithms Mol Biol 2021 Dec 29;16(1):24. Epub 2021 Dec 29.

Institute of Computing, University of Campinas, 1251 Albert Einstein Ave., 13083-852, Campinas, Brazil.

Background: In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming a genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements from the genome. Reversal and transposition are two of the most studied genome rearrangement events. Read More

View Article and Full-Text PDF
December 2021

A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set.

Algorithms Mol Biol 2021 Dec 6;16(1):23. Epub 2021 Dec 6.

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.

Background: The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees is an important topic in mathematical phylogenetics. Read More

View Article and Full-Text PDF
December 2021

Testing the agreement of trees with internal labels.

Algorithms Mol Biol 2021 Dec 4;16(1):22. Epub 2021 Dec 4.

Department of Computer Science, Iowa State University, Ames, IA, USA.

Background: A semi-labeled tree is a tree where all leaves as well as, possibly, some internal nodes are labeled with taxa. Semi-labeled trees encompass ordinary phylogenetic trees and taxonomies. Suppose we are given a collection [Formula: see text] of semi-labeled trees, called input trees, over partially overlapping sets of taxa. Read More

View Article and Full-Text PDF
December 2021

Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions.

Algorithms Mol Biol 2021 Oct 13;16(1):21. Epub 2021 Oct 13.

Institute of Computing, University of Campinas, Campinas, Brazil.

The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverts a block of the genome. Read More

View Article and Full-Text PDF
October 2021

DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention.

Algorithms Mol Biol 2021 Aug 23;16(1):20. Epub 2021 Aug 23.

ZBH - Center for Bioinformatics, MIN-Fakultät, Universität Hamburg, Bundesstrasse 43, 20146, Hamburg, Germany.

Background: Repetitive elements contribute a large part of eukaryotic genomes. For example, about 40 to 50% of human, mouse and rat genomes are repetitive. So identifying and classifying repeats is an important step in genome annotation. Read More

View Article and Full-Text PDF

Heuristic algorithms for best match graph editing.

Algorithms Mol Biol 2021 Aug 17;16(1):19. Epub 2021 Aug 17.

Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04109 Leipzig, Leipzig, Germany.

Background: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Read More

View Article and Full-Text PDF

A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming.

Algorithms Mol Biol 2021 Aug 14;16(1):18. Epub 2021 Aug 14.

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, 611-0011, Japan.

Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR, where in the first phase an artificial neural network (ANN) is used to construct a prediction function. Read More

View Article and Full-Text PDF

INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis.

Algorithms Mol Biol 2021 Aug 10;16(1):17. Epub 2021 Aug 10.

Department of Infectious Disease Epidemiology, Imperial College, London, UK.

Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. Read More

View Article and Full-Text PDF

Approximate search for known gene clusters in new genomes using PQ-trees.

Algorithms Mol Biol 2021 Jul 9;16(1):16. Epub 2021 Jul 9.

Department of Computer Science, Ben Gurion University of the Negev, Be'er Sheva, Israel.

Gene clusters are groups of genes that are co-locally conserved across various genomes, not necessarily in the same order. Their discovery and analysis is valuable in tasks such as gene annotation and prediction of gene interactions, and in the study of genome organization and evolution. The discovery of conserved gene clusters in a given set of genomes is a well studied problem, but with the rapid sequencing of prokaryotic genomes a new problem is inspired. Read More

View Article and Full-Text PDF

Shape decomposition algorithms for laser capture microdissection.

Algorithms Mol Biol 2021 Jul 8;16(1):15. Epub 2021 Jul 8.

Bioinformatics Group, Faculty of Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany.

Background: In the context of biomarker discovery and molecular characterization of diseases, laser capture microdissection is a highly effective approach to extract disease-specific regions from complex, heterogeneous tissue samples. For the extraction to be successful, these regions have to satisfy certain constraints in size and shape and thus have to be decomposed into feasible fragments.

Results: We model this problem of constrained shape decomposition as the computation of optimal feasible decompositions of simple polygons. Read More

View Article and Full-Text PDF

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors.

Algorithms Mol Biol 2021 Jul 6;16(1):14. Epub 2021 Jul 6.

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.

Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor's evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. Read More

View Article and Full-Text PDF

Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution.

Algorithms Mol Biol 2021 Jul 1;16(1):13. Epub 2021 Jul 1.

Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA.

Background: Directed evolution (DE) is a technique for protein engineering that involves iterative rounds of mutagenesis and screening to search for sequences that optimize a given property, such as binding affinity to a specified target. Unfortunately, the underlying optimization problem is under-determined, and so mutations introduced to improve the specified property may come at the expense of unmeasured, but nevertheless important properties (ex. solubility, thermostability, etc). Read More

View Article and Full-Text PDF

Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation.

Algorithms Mol Biol 2021 Jun 28;16(1):12. Epub 2021 Jun 28.

Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, USA.

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a "supertree method". Read More

View Article and Full-Text PDF

Using the longest run subsequence problem within homology-based scaffolding.

Algorithms Mol Biol 2021 Jun 28;16(1):11. Epub 2021 Jun 28.

Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. Read More

View Article and Full-Text PDF

Disk compression of k-mer sets.

Algorithms Mol Biol 2021 Jun 21;16(1):10. Epub 2021 Jun 21.

Penn State University, State College, PA, USA.

K-mer based methods have become prevalent in many areas of bioinformatics. In applications such as database search, they often work with large multi-terabyte-sized datasets. Storing such large datasets is a detriment to tool developers, tool users, and reproducibility efforts. Read More

View Article and Full-Text PDF

The Bourque distances for mutation trees of cancers.

Algorithms Mol Biol 2021 Jun 10;16(1). Epub 2021 Jun 10.

Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore, Singapore.

Background: Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson-Foulds distance are of limited use for the comparison of mutation trees. Read More

View Article and Full-Text PDF

LazyB: fast and cheap genome assembly.

Algorithms Mol Biol 2021 Jun 1;16(1). Epub 2021 Jun 1.

Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia.

Background: Advances in genome sequencing over the last years have lead to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and and more accurate methods is particular with regard to the highly complex and often very large genomes of animals and plants. Read More

View Article and Full-Text PDF