343 results match your criteria Algorithms for Molecular Biology [Journal]


An average-case sublinear forward algorithm for the haploid Li and Stephens model.

Algorithms Mol Biol 2019 2;14:11. Epub 2019 Apr 2.

2NYU School of Medicine, 550 First Ave, New York, NY 10016 USA.

Background: Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-019-
Publisher Site
http://dx.doi.org/10.1186/s13015-019-0144-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446408PMC
April 2019
1 Read

Differentially mutated subnetworks discovery.

Algorithms Mol Biol 2019 30;14:10. Epub 2019 Mar 30.

3Department of Information Engineering, University of Padova, Padova, Italy.

Problem: We study the problem of identifying differentially mutated subnetworks of a large gene-gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard.

Algorithm: We propose a novel and efficient algorithm, called DAMOKLE, to identify differentially mutated subnetworks given genome-wide mutation data for two sets of cancer samples. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-019-
Publisher Site
http://dx.doi.org/10.1186/s13015-019-0146-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6441493PMC
March 2019
1 Read

Repairing Boolean logical models from time-series data using Answer Set Programming.

Algorithms Mol Biol 2019 25;14. Epub 2019 Mar 25.

INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal.

Background: Boolean models of biological signalling-regulatory networks are increasingly used to formally describe and understand complex biological processes. These models may become inconsistent as new data become available and need to be repaired. In the past, the focus has been shed on the inference of (classes of) models given an interaction network and time-series data sets. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0145-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6434824PMC

Kermit: linkage map guided long read assembly.

Algorithms Mol Biol 2019 20;14. Epub 2019 Mar 20.

1Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.

Background : With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0143-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425630PMC
March 2019
1 Read

Reconciling multiple genes trees via segmental duplications and losses.

Algorithms Mol Biol 2019 20;14. Epub 2019 Mar 20.

3ISEM, CNRS, IRD, EPHE, Universit de Montpellier, Montpellier, France.

Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0139-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425616PMC

External memory BWT and LCP computation for sequence collections with applications.

Algorithms Mol Biol 2019 8;14. Epub 2019 Mar 8.

4Institute of Computing, University of Campinas, Av. Albert Einstein, 1251, 13083-852 Campinas, Brazil.

Background: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using in the best possible way the available RAM. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0140-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408864PMC
March 2019
1 Read

Connectivity problems on heterogeneous graphs.

Algorithms Mol Biol 2019 8;14. Epub 2019 Mar 8.

2Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA USA.

Background: Network connectivity problems are abundant in computational biology research, where graphs are used to represent a range of phenomena: from physical interactions between molecules to more abstract relationships such as gene co-expression. One common challenge in studying biological networks is the need to extract meaningful, small subgraphs out of large databases of potential interactions. A useful abstraction for this task turned out to be the Steiner Network problems: given a reference "database" graph, find a parsimonious subgraph that satisfies a given set of connectivity demands. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0141-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408827PMC
March 2019
2 Reads

Semi-nonparametric modeling of topological domain formation from epigenetic data.

Algorithms Mol Biol 2019 5;14. Epub 2019 Mar 5.

2Computational Biology Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213 USA.

Background: Hi-C experiments capturing the 3D genome architecture have led to the discovery of topologically-associated domains (TADs) that form an important part of the 3D genome organization and appear to play a role in gene regulation and other functions. Several histone modifications have been independently associated with TAD formation, but their combinatorial effects on domain formation remain poorly understood at a global scale.

Results: We propose a convex semi-nonparametric approach called based on Bernstein polynomials to explore the joint effects of histone markers on TAD formation as well as predict TADs solely from the histone data. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0142-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6399866PMC

Automated partial atomic charge assignment for drug-like molecules: a fast knapsack approach.

Algorithms Mol Biol 2019 5;14. Epub 2019 Feb 5.

1Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstr. 1, 40225 Düsseldorf, Germany.

A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0138-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364451PMC
February 2019

Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.

Algorithms Mol Biol 2019 6;14. Epub 2019 Feb 6.

3Department of Computer Science, University of Illinois Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL 61801 USA.

Background: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was published in SODA 2001. The main empirical advantage of over other AFC methods is its use of neighbor joining () to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0136-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364484PMC
February 2019

SNPs detection by eBWT positional clustering.

Algorithms Mol Biol 2019 6;14. Epub 2019 Feb 6.

1Dipartimento di Informatica, University of Pisa, Pisa, Italy.

Background: Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data.

Results: We develop the theory that (i) describes how the extended Burrows-Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the eBWT. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0137-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364478PMC
February 2019

Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

Algorithms Mol Biol 2018 8;13:17. Epub 2018 Dec 8.

1Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8000 Aarhus C, Denmark.

Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0135-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286601PMC
December 2018
1 Read

Superbubbles revisited.

Algorithms Mol Biol 2018 1;13:16. Epub 2018 Dec 1.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0134-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648PMC
December 2018
1 Read

Coordinate systems for supergenomes.

Algorithms Mol Biol 2018 24;13:15. Epub 2018 Sep 24.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize-omics data simultaneously for multiple species are sorely lacking. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0133-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955PMC
September 2018
2 Reads

Improved de novo peptide sequencing using LC retention time information.

Algorithms Mol Biol 2018 29;13:14. Epub 2018 Aug 29.

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zürich, Switzerland.

Background: Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0132-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114869PMC
August 2018
17 Reads

Sorting signed circular permutations by super short operations.

Algorithms Mol Biol 2018 26;13:13. Epub 2018 Jul 26.

1Institute of Computing, University of Campinas, Campinas, Brazil.

Background: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or , that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form , and in that case we can consider without loss of generality that one of them is the identity permutation , and that we just need to the other (i. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0131-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6060566PMC
July 2018
16 Reads

Split-inducing indels in phylogenomic analysis.

Algorithms Mol Biol 2018 16;13:12. Epub 2018 Jul 16.

2Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.

Background: Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results: Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels () that define an approximate bipartition of the taxon set. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0130-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047143PMC
July 2018
25 Reads

Locus-aware decomposition of gene trees with respect to polytomous species trees.

Algorithms Mol Biol 2018 4;13:11. Epub 2018 Jun 4.

1Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, Poland.

Background: Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0128-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985597PMC
June 2018
6 Reads

A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.

Algorithms Mol Biol 2018 1;13:10. Epub 2018 Jun 1.

2College of Computer Science and Information Technology, Guangxi Normal University, Guilin, 541004 China.

Background: Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics, botany and evolution. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0129-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984336PMC
June 2018
4 Reads

Finding local genome rearrangements.

Algorithms Mol Biol 2018 4;13. Epub 2018 May 4.

1CNRS, LIRMM, Université Montpellier, 161 rue Ada, 34392 Montpellier, France.

Background: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely due to some biological criteria. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0127-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5934872PMC
May 2018
5 Reads

FSH: fast spaced seed hashing exploiting adjacent hashes.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.

Background: Patterns with wildcards in specified positions, namely , are increasingly used instead of -mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require to compute the hashing of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of -mers can be rapidly computed by exploiting the large overlap between consecutive -mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0125-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863468PMC
March 2018
5 Reads

Outlier detection in BLAST hits.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

1Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 20742 USA.

Background: An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0126-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863388PMC
March 2018
5 Reads
1.460 Impact Factor

OCTAL: Optimal Completion of gene trees in polynomial time.

Algorithms Mol Biol 2018 15;13. Epub 2018 Mar 15.

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801 USA.

Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable.

Results: We introduce the , a general optimization problem that involves completing an unrooted binary tree (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0124-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5853121PMC
March 2018
5 Reads

Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

1Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8579 Japan.

Background: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0123-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815186PMC
February 2018
13 Reads

Fast phylogenetic inference from typing data.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

3INESC-ID Lisboa, Rua Alves Redol 9, 1000-029 Lisboa, Portugal.

Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0119-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815242PMC
February 2018
7 Reads

A safe and complete algorithm for metagenomic assembly.

Algorithms Mol Biol 2018 7;13. Epub 2018 Feb 7.

Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph that together cover all nodes, or edges, of . Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0122-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802251PMC
February 2018
4 Reads

Time-consistent reconciliation maps and forbidden time travel.

Algorithms Mol Biol 2018 6;13. Epub 2018 Feb 6.

1Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.

Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree with a species trees , relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0121-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5800358PMC
February 2018
6 Reads

Gene tree parsimony for incomplete gene trees: addressing true biological loss.

Algorithms Mol Biol 2018 19;13. Epub 2018 Jan 19.

2Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, USA.

Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0120-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774205PMC
January 2018
4 Reads

Phylogeny reconstruction based on the length distribution of -mismatch common substrings.

Algorithms Mol Biol 2017 11;12:27. Epub 2017 Dec 11.

Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.

Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0118-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5724348PMC
December 2017
22 Reads

Generalized enhanced suffix array construction in external memory.

Algorithms Mol Biol 2017 7;12:26. Epub 2017 Dec 7.

Institute of Mathematics and Computer Science, University of São Paulo, Av. Trabalhador São-carlense, 400, São Carlos, 13560-970 Brazil.

Background: Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory.

Results: In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0117-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719966PMC
December 2017
10 Reads

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Algorithms Mol Biol 2017 29;12:25. Epub 2017 Sep 29.

School of Computer Science and Technology, Tianjin University, Tianjin, China.

Background: Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.

Methods: Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0116-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622559PMC
September 2017
9 Reads

Algorithms for matching partially labelled sequence graphs.

Authors:
William R Taylor

Algorithms Mol Biol 2017 25;12:24. Epub 2017 Sep 25.

Francis Crick Institute, 1 Midland Road, London, NW1 1AT UK.

Background: In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique then the species name is sufficient to guide this match, however, when there are multiple related sequences (paralogs) in each species then the pairing is more difficult. In bacteria a good guide can be gained from genome co-location as interacting proteins tend to be in a common operon but in eukaryotes this simple principle is not sufficient. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0115-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5613400PMC
September 2017
6 Reads

Biologically feasible gene trees, reconciliation maps and informative triples.

Authors:
Marc Hellmuth

Algorithms Mol Biol 2017 29;12:23. Epub 2017 Aug 29.

Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.

Background: The history of gene families-which are equivalent to gene trees-can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as whether inferred event-labeled gene trees are , that is, if there is a possible true history that would explain a given gene tree. In practice, this problem is boiled down to finding a reconciliation map-also known as DTL-scenario-between the event-labeled gene trees and a (possibly unknown) species tree. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0114-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576477PMC
August 2017
7 Reads

Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements.

Algorithms Mol Biol 2017 23;12:22. Epub 2017 Aug 23.

Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.

Background: Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0113-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5569537PMC
August 2017
5 Reads

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification.

Algorithms Mol Biol 2017 18;12:21. Epub 2017 Aug 18.

Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany.

Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1.

Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0112-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563068PMC
August 2017
6 Reads

ASP-based method for the enumeration of attractors in non-deterministic synchronous and asynchronous multi-valued networks.

Algorithms Mol Biol 2017 15;12:20. Epub 2017 Aug 15.

École Centrale de Nantes, LS2N UMR CNRS 6004, 1 rue de la Noë, 44321 Nantes, France.

Background: This paper addresses the problem of finding attractors in biological regulatory networks. We focus here on non-deterministic synchronous and asynchronous multi-valued networks, modeled using automata networks (AN). AN is a general and well-suited formalism to study complex interactions between different components (genes, proteins,. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0111-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557630PMC
August 2017
7 Reads

Identification of bifurcation transitions in biological regulatory networks using Answer-Set Programming.

Algorithms Mol Biol 2017 20;12:19. Epub 2017 Jul 20.

LRI UMR 8623, Univ. Paris-Sud-CNRS, Université Paris-Saclay, 91405 Orsay, France.

Background: Numerous cellular differentiation processes can be captured using discrete qualitative models of biological regulatory networks. These models describe the temporal evolution of the state of the network subject to different competing transitions, potentially leading the system to different attractors. This paper focusses on the formal identification of states and transitions that are crucial for preserving or pre-empting the reachability of a given behaviour. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0110-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5520421PMC
July 2017
7 Reads

A graph extension of the positional Burrows-Wheeler transform and its applications.

Algorithms Mol Biol 2017 11;12:18. Epub 2017 Jul 11.

Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA.

We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0109-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026PMC
July 2017
9 Reads

Isometric gene tree reconciliation revisited.

Algorithms Mol Biol 2017 13;12:17. Epub 2017 Jun 13.

Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia.

Background: Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0108-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470333PMC
June 2017
7 Reads

A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.

Algorithms Mol Biol 2017 11;12:12. Epub 2017 May 11.

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland.

Background: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the : Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0104-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464308PMC
May 2017
12 Reads

Algorithms for computing the double cut and join distance on both gene order and intergenic sizes.

Algorithms Mol Biol 2017 5;12:16. Epub 2017 Jun 5.

Institut National de Recherche en Informatique et en Automatique (Inria) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot-Saint-Martin, France.

Background: Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown decisive for the accuracy of inference methods (Biller et al. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0107-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5460591PMC
June 2017
13 Reads

StreAM-[Formula: see text]: algorithms for analyzing coarse grained RNA dynamics based on Markov models of connectivity-graphs.

Algorithms Mol Biol 2017 30;12:15. Epub 2017 May 30.

Department of Biology, Department of Computer Science, Department of Physics, TU Darmstadt, Schnittspahnstr. 2, 64283 Darmstadt, Germany.

Background: In this work, we present a new coarse grained representation of RNA dynamics. It is based on adjacency matrices and their interactions patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition which is mainly modular and assessable by the secondary structure alone. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0105-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450175PMC
May 2017
18 Reads

The gene family-free median of three.

Algorithms Mol Biol 2017 26;12:14. Epub 2017 May 26.

Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6 Canada.

Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0106-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446766PMC
May 2017
6 Reads

Complexity and algorithms for copy-number evolution problems.

Algorithms Mol Biol 2017 16;12:13. Epub 2017 May 16.

School of Computer Science, Tel Aviv University, Tel Aviv, Israel.

Background: Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0103-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5433102PMC
May 2017
29 Reads

Core column prediction for protein multiple sequence alignments.

Algorithms Mol Biol 2017 19;12:11. Epub 2017 Apr 19.

Department of Computer Science, The University of Arizona, Tucson, AZ 85721 USA.

Background: In a computed protein multiple sequence alignment, the of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0102-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397798PMC
April 2017
7 Reads

Aligning coding sequences with frameshift extension penalties.

Algorithms Mol Biol 2017 31;12:10. Epub 2017 Mar 31.

Département d'informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada.

Background: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0101-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374649PMC
March 2017
6 Reads

Gerbil: a fast and memory-efficient -mer counter with GPU-support.

Algorithms Mol Biol 2017 31;12. Epub 2017 Mar 31.

Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saae), Germany.

Background: A basic task in bioinformatics is the counting of -mers in genome sequences. Existing -mer counting tools are most often optimized for small  < 32 and suffer from excessive memory resource consumption or degrading performance for large . However, given the technology trend towards long reads of next-generation sequencers, support for large  becomes increasingly important. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0097-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374613PMC
March 2017
5 Reads

The feasibility of genome-scale biological network inference using Graphics Processing Units.

Algorithms Mol Biol 2017 20;12. Epub 2017 Mar 20.

Department of Molecular and Integrative Physiology, University of Michigan, North Campus Research Complex, Ann Arbor, MI USA.

Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0100-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360040PMC
March 2017
6 Reads

An efficient algorithm for testing the compatibility of phylogenies with nested taxa.

Algorithms Mol Biol 2017 16;12. Epub 2017 Mar 16.

Department of Computer Science, Iowa State University, Atanasoff Hall, Ames, IA USA.

Background: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given collection [Formula: see text] of semi-labeled trees over various subsets of a set of taxa. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0099-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5356459PMC
March 2017
6 Reads

On the computational complexity of the maximum parsimony reconciliation problem in the duplication-loss-coalescence model.

Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.

Department of Computer Science, Harvey Mudd College, Claremont, USA.

Background: Phylogenetic tree reconciliation is a widely-used method for inferring the evolutionary histories of genes and species. In the duplication-loss-coalescence (DLC) model, we seek a reconciliation that explains the incongruence between a gene and species tree using gene duplication, loss, and deep coalescence events. In the maximum parsimony framework, costs are associated with these event types and a reconciliation is sought that minimizes the total cost of the events required to map the gene tree onto the species tree. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0098-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5349084PMC
March 2017
10 Reads