332 results match your criteria Algorithms for Molecular Biology [Journal]


Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

Algorithms Mol Biol 2018 8;13:17. Epub 2018 Dec 8.

1Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8000 Aarhus C, Denmark.

Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0135-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286601PMC
December 2018
1 Read

Superbubbles revisited.

Algorithms Mol Biol 2018 1;13:16. Epub 2018 Dec 1.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0134-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648PMC
December 2018
1 Read

Coordinate systems for supergenomes.

Algorithms Mol Biol 2018 24;13:15. Epub 2018 Sep 24.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize-omics data simultaneously for multiple species are sorely lacking. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0133-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955PMC
September 2018
2 Reads

Improved de novo peptide sequencing using LC retention time information.

Algorithms Mol Biol 2018 29;13:14. Epub 2018 Aug 29.

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zürich, Switzerland.

Background: Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0132-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114869PMC
August 2018
16 Reads

Sorting signed circular permutations by super short operations.

Algorithms Mol Biol 2018 26;13:13. Epub 2018 Jul 26.

1Institute of Computing, University of Campinas, Campinas, Brazil.

Background: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or , that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form , and in that case we can consider without loss of generality that one of them is the identity permutation , and that we just need to the other (i. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0131-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6060566PMC
July 2018
7 Reads

Split-inducing indels in phylogenomic analysis.

Algorithms Mol Biol 2018 16;13:12. Epub 2018 Jul 16.

2Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.

Background: Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results: Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels () that define an approximate bipartition of the taxon set. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0130-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047143PMC
July 2018
18 Reads

Locus-aware decomposition of gene trees with respect to polytomous species trees.

Algorithms Mol Biol 2018 4;13:11. Epub 2018 Jun 4.

1Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, Poland.

Background: Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0128-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985597PMC
June 2018
6 Reads

A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.

Algorithms Mol Biol 2018 1;13:10. Epub 2018 Jun 1.

2College of Computer Science and Information Technology, Guangxi Normal University, Guilin, 541004 China.

Background: Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics, botany and evolution. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0129-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984336PMC
June 2018
4 Reads

Finding local genome rearrangements.

Algorithms Mol Biol 2018 4;13. Epub 2018 May 4.

1CNRS, LIRMM, Université Montpellier, 161 rue Ada, 34392 Montpellier, France.

Background: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely due to some biological criteria. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0127-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5934872PMC
May 2018
5 Reads

FSH: fast spaced seed hashing exploiting adjacent hashes.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.

Background: Patterns with wildcards in specified positions, namely , are increasingly used instead of -mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require to compute the hashing of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of -mers can be rapidly computed by exploiting the large overlap between consecutive -mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0125-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863468PMC
March 2018
5 Reads

Outlier detection in BLAST hits.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

1Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 20742 USA.

Background: An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0126-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863388PMC
March 2018
4 Reads
1.460 Impact Factor

OCTAL: Optimal Completion of gene trees in polynomial time.

Algorithms Mol Biol 2018 15;13. Epub 2018 Mar 15.

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801 USA.

Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable.

Results: We introduce the , a general optimization problem that involves completing an unrooted binary tree (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0124-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5853121PMC
March 2018
5 Reads

Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

1Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8579 Japan.

Background: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0123-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815186PMC
February 2018
9 Reads

Fast phylogenetic inference from typing data.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

3INESC-ID Lisboa, Rua Alves Redol 9, 1000-029 Lisboa, Portugal.

Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0119-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815242PMC
February 2018
7 Reads

A safe and complete algorithm for metagenomic assembly.

Algorithms Mol Biol 2018 7;13. Epub 2018 Feb 7.

Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph that together cover all nodes, or edges, of . Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0122-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802251PMC
February 2018
4 Reads

Time-consistent reconciliation maps and forbidden time travel.

Algorithms Mol Biol 2018 6;13. Epub 2018 Feb 6.

1Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.

Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree with a species trees , relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0121-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5800358PMC
February 2018
6 Reads

Gene tree parsimony for incomplete gene trees: addressing true biological loss.

Algorithms Mol Biol 2018 19;13. Epub 2018 Jan 19.

2Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, USA.

Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0120-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774205PMC
January 2018
4 Reads

Phylogeny reconstruction based on the length distribution of -mismatch common substrings.

Algorithms Mol Biol 2017 11;12:27. Epub 2017 Dec 11.

Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.

Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0118-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5724348PMC
December 2017
22 Reads

Generalized enhanced suffix array construction in external memory.

Algorithms Mol Biol 2017 7;12:26. Epub 2017 Dec 7.

Institute of Mathematics and Computer Science, University of São Paulo, Av. Trabalhador São-carlense, 400, São Carlos, 13560-970 Brazil.

Background: Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory.

Results: In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0117-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719966PMC
December 2017
10 Reads

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Algorithms Mol Biol 2017 29;12:25. Epub 2017 Sep 29.

School of Computer Science and Technology, Tianjin University, Tianjin, China.

Background: Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.

Methods: Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0116-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622559PMC
September 2017
9 Reads

Algorithms for matching partially labelled sequence graphs.

Authors:
William R Taylor

Algorithms Mol Biol 2017 25;12:24. Epub 2017 Sep 25.

Francis Crick Institute, 1 Midland Road, London, NW1 1AT UK.

Background: In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique then the species name is sufficient to guide this match, however, when there are multiple related sequences (paralogs) in each species then the pairing is more difficult. In bacteria a good guide can be gained from genome co-location as interacting proteins tend to be in a common operon but in eukaryotes this simple principle is not sufficient. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0115-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5613400PMC
September 2017
6 Reads

Biologically feasible gene trees, reconciliation maps and informative triples.

Authors:
Marc Hellmuth

Algorithms Mol Biol 2017 29;12:23. Epub 2017 Aug 29.

Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany.

Background: The history of gene families-which are equivalent to gene trees-can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as whether inferred event-labeled gene trees are , that is, if there is a possible true history that would explain a given gene tree. In practice, this problem is boiled down to finding a reconciliation map-also known as DTL-scenario-between the event-labeled gene trees and a (possibly unknown) species tree. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0114-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576477PMC
August 2017
7 Reads

Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements.

Algorithms Mol Biol 2017 23;12:22. Epub 2017 Aug 23.

Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.

Background: Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0113-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5569537PMC
August 2017
5 Reads

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification.

Algorithms Mol Biol 2017 18;12:21. Epub 2017 Aug 18.

Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany.

Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1.

Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0112-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563068PMC
August 2017
6 Reads

ASP-based method for the enumeration of attractors in non-deterministic synchronous and asynchronous multi-valued networks.

Algorithms Mol Biol 2017 15;12:20. Epub 2017 Aug 15.

École Centrale de Nantes, LS2N UMR CNRS 6004, 1 rue de la Noë, 44321 Nantes, France.

Background: This paper addresses the problem of finding attractors in biological regulatory networks. We focus here on non-deterministic synchronous and asynchronous multi-valued networks, modeled using automata networks (AN). AN is a general and well-suited formalism to study complex interactions between different components (genes, proteins,. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0111-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557630PMC
August 2017
7 Reads

Identification of bifurcation transitions in biological regulatory networks using Answer-Set Programming.

Algorithms Mol Biol 2017 20;12:19. Epub 2017 Jul 20.

LRI UMR 8623, Univ. Paris-Sud-CNRS, Université Paris-Saclay, 91405 Orsay, France.

Background: Numerous cellular differentiation processes can be captured using discrete qualitative models of biological regulatory networks. These models describe the temporal evolution of the state of the network subject to different competing transitions, potentially leading the system to different attractors. This paper focusses on the formal identification of states and transitions that are crucial for preserving or pre-empting the reachability of a given behaviour. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0110-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5520421PMC
July 2017
7 Reads

A graph extension of the positional Burrows-Wheeler transform and its applications.

Algorithms Mol Biol 2017 11;12:18. Epub 2017 Jul 11.

Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA.

We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0109-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026PMC
July 2017
9 Reads

Isometric gene tree reconciliation revisited.

Algorithms Mol Biol 2017 13;12:17. Epub 2017 Jun 13.

Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia.

Background: Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0108-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470333PMC
June 2017
7 Reads

A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.

Algorithms Mol Biol 2017 11;12:12. Epub 2017 May 11.

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland.

Background: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the : Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0104-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464308PMC
May 2017
8 Reads

Algorithms for computing the double cut and join distance on both gene order and intergenic sizes.

Algorithms Mol Biol 2017 5;12:16. Epub 2017 Jun 5.

Institut National de Recherche en Informatique et en Automatique (Inria) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot-Saint-Martin, France.

Background: Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown decisive for the accuracy of inference methods (Biller et al. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0107-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5460591PMC
June 2017
13 Reads

StreAM-[Formula: see text]: algorithms for analyzing coarse grained RNA dynamics based on Markov models of connectivity-graphs.

Algorithms Mol Biol 2017 30;12:15. Epub 2017 May 30.

Department of Biology, Department of Computer Science, Department of Physics, TU Darmstadt, Schnittspahnstr. 2, 64283 Darmstadt, Germany.

Background: In this work, we present a new coarse grained representation of RNA dynamics. It is based on adjacency matrices and their interactions patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition which is mainly modular and assessable by the secondary structure alone. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0105-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450175PMC
May 2017
17 Reads

The gene family-free median of three.

Algorithms Mol Biol 2017 26;12:14. Epub 2017 May 26.

Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6 Canada.

Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0106-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446766PMC
May 2017
6 Reads

Complexity and algorithms for copy-number evolution problems.

Algorithms Mol Biol 2017 16;12:13. Epub 2017 May 16.

School of Computer Science, Tel Aviv University, Tel Aviv, Israel.

Background: Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0103-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5433102PMC
May 2017
23 Reads

Core column prediction for protein multiple sequence alignments.

Algorithms Mol Biol 2017 19;12:11. Epub 2017 Apr 19.

Department of Computer Science, The University of Arizona, Tucson, AZ 85721 USA.

Background: In a computed protein multiple sequence alignment, the of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0102-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397798PMC
April 2017
7 Reads

Aligning coding sequences with frameshift extension penalties.

Algorithms Mol Biol 2017 31;12:10. Epub 2017 Mar 31.

Département d'informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, QC J1K2R1 Canada.

Background: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0101-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374649PMC
March 2017
6 Reads

Gerbil: a fast and memory-efficient -mer counter with GPU-support.

Algorithms Mol Biol 2017 31;12. Epub 2017 Mar 31.

Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saae), Germany.

Background: A basic task in bioinformatics is the counting of -mers in genome sequences. Existing -mer counting tools are most often optimized for small  < 32 and suffer from excessive memory resource consumption or degrading performance for large . However, given the technology trend towards long reads of next-generation sequencers, support for large  becomes increasingly important. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0097-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374613PMC
March 2017
5 Reads

The feasibility of genome-scale biological network inference using Graphics Processing Units.

Algorithms Mol Biol 2017 20;12. Epub 2017 Mar 20.

Department of Molecular and Integrative Physiology, University of Michigan, North Campus Research Complex, Ann Arbor, MI USA.

Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0100-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360040PMC
March 2017
6 Reads

An efficient algorithm for testing the compatibility of phylogenies with nested taxa.

Algorithms Mol Biol 2017 16;12. Epub 2017 Mar 16.

Department of Computer Science, Iowa State University, Atanasoff Hall, Ames, IA USA.

Background: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given collection [Formula: see text] of semi-labeled trees over various subsets of a set of taxa. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0099-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5356459PMC
March 2017
6 Reads

On the computational complexity of the maximum parsimony reconciliation problem in the duplication-loss-coalescence model.

Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.

Department of Computer Science, Harvey Mudd College, Claremont, USA.

Background: Phylogenetic tree reconciliation is a widely-used method for inferring the evolutionary histories of genes and species. In the duplication-loss-coalescence (DLC) model, we seek a reconciliation that explains the incongruence between a gene and species tree using gene duplication, loss, and deep coalescence events. In the maximum parsimony framework, costs are associated with these event types and a reconciliation is sought that minimizes the total cost of the events required to map the gene tree onto the species tree. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0098-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5349084PMC
March 2017
10 Reads

On avoided words, absent words, and their application to biological sequence analysis.

Algorithms Mol Biol 2017 14;12. Epub 2017 Mar 14.

Computational Regulatory Genomics, MRC Clinical Sciences Centre (CSC), Du Cane Road, London, W12 0NN UK.

Background: The deviation of the observed frequency of a word from its expected frequency in a given sequence is used to determine whether or not the word is . This concept is particularly useful in DNA linguistic analysis. The value of the deviation of , denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. Read More

View Article

Download full-text PDF

Source
http://almob.biomedcentral.com/articles/10.1186/s13015-017-0
Publisher Site
http://dx.doi.org/10.1186/s13015-017-0094-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5348888PMC
March 2017
29 Reads

Approximating the correction of weighted and unweighted orthology and paralogy relations.

Algorithms Mol Biol 2017 11;12. Epub 2017 Mar 11.

Département d'informatique et de recherche opérationnelle, Université de Montréal, Quebec, Canada.

Background: Given a gene family, the relations between genes (orthology/paralogy), are represented by a relation graph, where edges connect pairs of orthologous genes and "missing" edges represent paralogs. While a gene tree directly induces a relation graph, the converse is not always true. Indeed, a relation graph is not necessarily "satisfiable", i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0096-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5346272PMC
March 2017
7 Reads

Approximating the DCJ distance of balanced genomes in linear time.

Algorithms Mol Biol 2017 9;12. Epub 2017 Mar 9.

Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS Brazil.

Background: Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two balanced genomes, i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0095-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345319PMC
March 2017
8 Reads

Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.

Authors:
Laurent Noé

Algorithms Mol Biol 2017 14;12. Epub 2017 Feb 14.

CRIStAL (UMR 9189 Lille University/CNRS)-Inria Lille, Bat M3 ext, Université Lille 1, 59655 Villeneuve d'Ascq, France.

Background: , also named , have been proven to be more sensitive than contiguous seeds () in nucleic and amino-acid sequences analysis. Initially proposed to detect sequence similarities and to anchor sequence alignments, spaced seeds have more recently been applied in several related methods. Unfortunately, spaced seeds need to be initially designed. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0092-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310094PMC
February 2017
6 Reads

Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

Algorithms Mol Biol 2017 22;12. Epub 2017 Feb 22.

Inria Grenoble, 655, Avenue de l'Europe, 38334 Montbonnot, France.

Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0091-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5322684PMC
February 2017
15 Reads

Erratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Algorithms Mol Biol 2016 28;11:28. Epub 2016 Nov 28.

Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.

[This corrects the article DOI: 10.1186/s13015-016-0083-7.]. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-016-0090-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5126863PMC
November 2016
20 Reads

An efficient algorithm for protein structure comparison using elastic shape analysis.

Algorithms Mol Biol 2016 29;11:27. Epub 2016 Sep 29.

ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India ; Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi, 110012 India.

Background: Protein structure comparison play important role in in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in literature for comparing protein structures but they have their own limitations in terms of accuracy and complexity with respect to computational time and space. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-016-0089-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041553PMC
September 2016
6 Reads

Analysis of gene copy number changes in tumor phylogenetics.

Algorithms Mol Biol 2016 22;11:26. Epub 2016 Sep 22.

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208 USA.

Backgound: Evolution of cancer cells is characterized by large scale and rapid changes in the chromosomal  landscape. The fluorescence in situ hybridization (FISH) technique provides a way to measure the copy numbers of preselected genes in a group of cells and has been found to be a reliable source of data to model the evolution of tumor cells. Chowdhury et al. Read More

View Article

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034472PMC
http://dx.doi.org/10.1186/s13015-016-0088-2DOI Listing
September 2016
13 Reads

Enumeration of minimal stoichiometric precursor sets in metabolic networks.

Algorithms Mol Biol 2016 19;11:25. Epub 2016 Sep 19.

Erable team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, 38330 Montbonnot Saint-Martin, France.

Background: What an organism needs at least from its environment to produce a set of metabolites, e.g. target(s) of interest and/or biomass, has been called a minimal precursor set. Read More

View Article

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029113PMC
http://dx.doi.org/10.1186/s13015-016-0087-3DOI Listing
September 2016
24 Reads

Using a constraint-based regression method for relative quantification of somatic mutations in pyrosequencing signals: a case for NRAS analysis.

Algorithms Mol Biol 2016 15;11:24. Epub 2016 Sep 15.

Institut de Recherche Expérimentale et Clinique (IREC), Center for Applied Molecular Technologies (CTMA), Université catholique de Louvain, Clos chapelle-aux-champs B1.30.24, 1200 Brussels, Belgium.

Background: Pyrosequencing Allele Quantification (AQ) is a cost-effective DNA sequencing method that can be used for detecting somatic mutations in formalin-fixed paraffin-embedded (FFPE) samples. The method displays a low turnaround time and a high sensitivity. Pyrosequencing suffers however from two main drawbacks including (i) low specificity and (ii) difficult signal interpretation when multiple mutations are reported in a hotspot genomic region. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-016-0086-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5024468PMC
September 2016
15 Reads

BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge.

Algorithms Mol Biol 2016 14;11:23. Epub 2016 Sep 14.

INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.

Background: Biclustering has been largely used in biological data analysis, enabling the discovery of putative functional modules from omic and network data. Despite the recognized importance of incorporating domain knowledge to guide biclustering and guarantee a focus on relevant and non-trivial biclusters, this possibility has not yet been comprehensively addressed. This results from the fact that the majority of existing algorithms are only able to deliver sub-optimal solutions with restrictive assumptions on the structure, coherency and quality of biclustering solutions, thus preventing the up-front satisfaction of knowledge-driven constraints. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-016-0085-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5024481PMC
September 2016
5 Reads