368 results match your criteria Algorithms For Molecular Biology[Journal]


Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials.

Algorithms Mol Biol 2020 25;15:11. Epub 2020 May 25.

Department of Computer Science and Engineering, Lehigh University, 113 Research Drive, Bethlehem, PA USA.

Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Because subtle features are central for explaining how proteins achieve specificity, algorithmic efficiency and geometric precision are central to algorithmic design. To address these concerns, this paper presents pClay, the first algorithm to perform parallel and arbitrarily precise comparisons of molecular surfaces and electrostatic isopotentials as geometric solids. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00168-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7247173PMC

Context-aware seeds for read mapping.

Algorithms Mol Biol 2020 23;15:10. Epub 2020 May 23.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213 USA.

Motivation: Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting non-overlapping seeds in each read in order to find all valid mappings under an edit distance threshold of . As grows, this seeding scheme forces mappers to use more and shorter seeds, which increases the seed hits (seed frequencies) and therefore reduces the efficiency of mappers.

Results: We propose a novel seeding framework, context-aware seeds (CAS). Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00172-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7245042PMC

Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.

Algorithms Mol Biol 2020 15;15. Epub 2020 May 15.

Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, 15213 Pittsburgh, PA USA.

Background: Transcriptomic structural variants (TSVs)-large-scale transcriptome sequence change due to structural variation - are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00170-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227063PMC

The distance and median problems in the single-cut-or-join model with single-gene duplications.

Algorithms Mol Biol 2020 4;15. Epub 2020 May 4.

1Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6 Canada.

Background: In the field of genome rearrangement algorithms, models accounting for gene duplication lead often to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00169-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197181PMC

Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences.

Algorithms Mol Biol 2020 16;15. Epub 2020 Apr 16.

1Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA.

Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00167-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164268PMC
April 2020
1.463 Impact Factor

Linear-time algorithms for phylogenetic tree completion under Robinson-Foulds distance.

Authors:
Mukul S Bansal

Algorithms Mol Biol 2020 13;15. Epub 2020 Apr 13.

1Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Storrs, USA.

Background: We consider two fundamental computational problems that arise when comparing phylogenetic trees, rooted or unrooted, with non-identical leaf sets. The first problem arises when comparing two trees where the leaf set of one tree is a proper subset of the other. The second problem arises when the two trees to be compared have only partially overlapping leaf sets. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00166-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7155338PMC

From pairs of most similar sequences to phylogenetic best matches.

Algorithms Mol Biol 2020 9;15. Epub 2020 Apr 9.

CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México.

Background: Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionary most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00165-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147060PMC

Alignment- and reference-free phylogenomics with colored de Bruijn graphs.

Authors:
Roland Wittler

Algorithms Mol Biol 2020 7;15. Epub 2020 Apr 7.

1Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany.

Background: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies-a challenging task if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time.

Results: We present a new whole-genome based approach to infer phylogenies that is alignment- and reference-free. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-00164-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137503PMC

GrpClassifierEC: a novel classification approach based on the ensemble clustering space.

Algorithms Mol Biol 2020 13;15. Epub 2020 Feb 13.

2The Department of Community Information Systems, Zefat Academic College, 13206 Zefat, Israel.

Background: Advances in molecular biology have resulted in big and complicated data sets, therefore a clustering approach that able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflects the actual similarity between the different objects. As a result, in this research we use clustering-based space that convert the geometric space of the molecular to a categorical space based on clustering results. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-0162-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017541PMC
February 2020

Finding all maximal perfect haplotype blocks in linear time.

Algorithms Mol Biol 2020 10;15. Epub 2020 Feb 10.

4Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.

Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-020-0163-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7008532PMC
February 2020

Non-parametric correction of estimated gene trees using TRACTION.

Algorithms Mol Biol 2020 4;15. Epub 2020 Jan 4.

Department of Computer Science, University of Illinois at Urbana-Champaign, Goodwin Ave, Urbana, IL USA.

Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0161-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6942343PMC
January 2020

Kohdista: an efficient method to index and query possible Rmap alignments.

Algorithms Mol Biol 2019 12;14:25. Epub 2019 Dec 12.

3Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA.

Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the position of occurrence of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled using an overlap-layout-consensus approach using raw optical map data, which are referred to as Rmaps. Due to the high error-rate of Rmap data, finding the overlap between Rmaps remains challenging. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0160-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907254PMC
December 2019

TMRS: an algorithm for computing the time to the most recent substitution event from a multiple alignment column.

Algorithms Mol Biol 2019 18;14:23. Epub 2019 Nov 18.

1Department of Computational Biology and Medical Sciences, GSFS, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba Japan.

Background : As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. However, the computational technologies for inferring biologically important evolutionary events are not sufficiently developed.

Results : We present algorithms to estimate the evolutionary time ( ) to the most recent substitution event from a multiple alignment column by using a probabilistic model of sequence evolution. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0158-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6859643PMC
November 2019

A cubic algorithm for the generalized rank median of three genomes.

Algorithms Mol Biol 2019 26;14:16. Epub 2019 Jul 26.

2Institute of Computing, University of Campinas, Campinas, Brazil.

Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time , where , with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0150-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6867026PMC

NANUQ: a method for inferring species networks from gene trees under the coalescent model.

Algorithms Mol Biol 2019 6;14:24. Epub 2019 Dec 6.

Department of of Mathematics and Statistics, University of Alaska Fairbanks, 1792 Ambler Lane, Fairbanks, AK 99775 USA.

Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justification for it. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0159-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6896299PMC
December 2019

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.

Algorithms Mol Biol 2019 15;14:22. Epub 2019 Nov 15.

MIAT, Université de Toulouse, INRA, Castanet-Tolosan, France.

Background: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of to for each chromosome. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0157-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857244PMC
November 2019

Super short operations on both gene order and intergenic sizes.

Algorithms Mol Biol 2019 5;14:21. Epub 2019 Nov 5.

1Institute of Computing, University of Campinas, Campinas, Brazil.

Background: The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called , that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist in shaping biological scenarios into mathematical models. For instance, allowing different genome rearrangements operations at the same time, adding constraints to these rearrangements (e. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0156-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833170PMC
November 2019

Bayesian localization of CNV candidates in WGS data within minutes.

Algorithms Mol Biol 2019 23;14:20. Epub 2019 Sep 23.

1Department of Computer Science and Engineering, University of Gothenburg | Chalmers, Rännvägen 6, 412 58 Gothenburg, Sweden.

Background: Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0154-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6757390PMC
September 2019

Implications of non-uniqueness in phylogenetic deconvolution of bulk DNA samples of tumors.

Algorithms Mol Biol 2019 3;14:19. Epub 2019 Sep 3.

1Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA.

Background: Tumors exhibit extensive intra-tumor heterogeneity, the presence of groups of cellular populations with distinct sets of somatic mutations. This heterogeneity is the result of an evolutionary process, described by a phylogenetic tree. In addition to enabling clinicians to devise patient-specific treatment plans, phylogenetic trees of tumors enable researchers to decipher the mechanisms of tumorigenesis and metastasis. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0155-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6719395PMC
September 2019

A branching process for homology distribution-based inference of polyploidy, speciation and loss.

Algorithms Mol Biol 2019 1;14:18. Epub 2019 Aug 1.

Department of Mathematics and Statistics, University of Ottawa, 150 Louis Pasteur Pvt, Ottawa, K1N 6N5 Canada.

Background: The statistical distribution of the similarity or difference between pairs of paralogous genes, created by whole genome doubling, or between pairs of orthologous genes in two related species is an important source of information about genomic evolution, especially in plants.

Methods: We derive the mixture of distributions of sequence similarity for duplicate gene pairs generated by repeated episodes of whole gene doubling. This involves integrating sequence divergence and gene pair loss through fractionation, using a branching process and a mutational model. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0153-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6676528PMC

A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression.

Algorithms Mol Biol 2019 27;14:17. Epub 2019 Jul 27.

1Department of Computer Science, Indiana University, Bloomington, IN USA.

We introduce a new dissimilarity measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0152-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6661107PMC
July 2019
1.463 Impact Factor

A general framework for genome rearrangement with biological constraints.

Algorithms Mol Biol 2019 19;14:15. Epub 2019 Jul 19.

1CNRS, LIRMM, Université Montpellier, 161 Rue Ada, 34392 Montpellier, France.

This paper generalizes previous studies on genome rearrangement under biological constraints, using double cut and join (DCJ). We propose a model for weighted DCJ, along with a family of optimization problems called -MCPS (Minimum Cost Parsimonious Scenario), that are based on labeled graphs. We show how to compute solutions to general instances of -MCPS, given an algorithm to compute -MCPS on a circular genome with exactly one occurrence of each gene. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0149-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642580PMC
July 2019
5 Reads

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.

Algorithms Mol Biol 2019 19;14:14. Epub 2019 Jul 19.

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801 USA.

Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

Results: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0151-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642500PMC

Prefix-free parsing for building big BWTs.

Algorithms Mol Biol 2019 24;14:13. Epub 2019 May 24.

5Johns Hopkins University, Baltimore, MD USA.

High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive-a characteristic that can be exploited to ease the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0148-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6534911PMC
May 2019
6 Reads

Linear time minimum segmentation enables scalable founder reconstruction.

Algorithms Mol Biol 2019 17;14:12. Epub 2019 May 17.

1Department of Computer Science, University of Helsinki, Pietari Kalmin katu 5, 00014 Helsinki, Finland.

Background:  We study a preprocessing routine relevant in pan-genomic analyses: consider a set of aligned haplotype sequences of complete human chromosomes. Due to the enormous size of such data, one would like to represent this input set with a few sequences that retain as well as possible the contiguities of the original sequences. Such a smaller set gives a scalable way to exploit pan-genomic information in further analyses (e. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-019-
Publisher Site
http://dx.doi.org/10.1186/s13015-019-0147-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6525415PMC
May 2019
8 Reads

An average-case sublinear forward algorithm for the haploid Li and Stephens model.

Algorithms Mol Biol 2019 2;14:11. Epub 2019 Apr 2.

2NYU School of Medicine, 550 First Ave, New York, NY 10016 USA.

Background: Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-019-
Publisher Site
http://dx.doi.org/10.1186/s13015-019-0144-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446408PMC
April 2019
16 Reads

Differentially mutated subnetworks discovery.

Algorithms Mol Biol 2019 30;14:10. Epub 2019 Mar 30.

3Department of Information Engineering, University of Padova, Padova, Italy.

Problem: We study the problem of identifying differentially mutated subnetworks of a large gene-gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard.

Algorithm: We propose a novel and efficient algorithm, called DAMOKLE, to identify differentially mutated subnetworks given genome-wide mutation data for two sets of cancer samples. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-019-
Publisher Site
http://dx.doi.org/10.1186/s13015-019-0146-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6441493PMC
March 2019
9 Reads

Repairing Boolean logical models from time-series data using Answer Set Programming.

Algorithms Mol Biol 2019 25;14. Epub 2019 Mar 25.

INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal.

Background: Boolean models of biological signalling-regulatory networks are increasingly used to formally describe and understand complex biological processes. These models may become inconsistent as new data become available and need to be repaired. In the past, the focus has been shed on the inference of (classes of) models given an interaction network and time-series data sets. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0145-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6434824PMC
March 2019
2 Reads

Kermit: linkage map guided long read assembly.

Algorithms Mol Biol 2019 20;14. Epub 2019 Mar 20.

1Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.

Background : With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0143-xDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425630PMC
March 2019
15 Reads

Reconciling multiple genes trees via segmental duplications and losses.

Algorithms Mol Biol 2019 20;14. Epub 2019 Mar 20.

3ISEM, CNRS, IRD, EPHE, Universit de Montpellier, Montpellier, France.

Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0139-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425616PMC
March 2019
3 Reads

External memory BWT and LCP computation for sequence collections with applications.

Algorithms Mol Biol 2019 8;14. Epub 2019 Mar 8.

4Institute of Computing, University of Campinas, Av. Albert Einstein, 1251, 13083-852 Campinas, Brazil.

Background: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using in the best possible way the available RAM. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0140-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408864PMC
March 2019
3 Reads

Connectivity problems on heterogeneous graphs.

Algorithms Mol Biol 2019 8;14. Epub 2019 Mar 8.

2Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA USA.

Background: Network connectivity problems are abundant in computational biology research, where graphs are used to represent a range of phenomena: from physical interactions between molecules to more abstract relationships such as gene co-expression. One common challenge in studying biological networks is the need to extract meaningful, small subgraphs out of large databases of potential interactions. A useful abstraction for this task turned out to be the Steiner Network problems: given a reference "database" graph, find a parsimonious subgraph that satisfies a given set of connectivity demands. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0141-zDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408827PMC
March 2019
19 Reads

Semi-nonparametric modeling of topological domain formation from epigenetic data.

Algorithms Mol Biol 2019 5;14. Epub 2019 Mar 5.

2Computational Biology Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213 USA.

Background: Hi-C experiments capturing the 3D genome architecture have led to the discovery of topologically-associated domains (TADs) that form an important part of the 3D genome organization and appear to play a role in gene regulation and other functions. Several histone modifications have been independently associated with TAD formation, but their combinatorial effects on domain formation remain poorly understood at a global scale.

Results: We propose a convex semi-nonparametric approach called based on Bernstein polynomials to explore the joint effects of histone markers on TAD formation as well as predict TADs solely from the histone data. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0142-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6399866PMC
March 2019
2 Reads

Automated partial atomic charge assignment for drug-like molecules: a fast knapsack approach.

Algorithms Mol Biol 2019 5;14. Epub 2019 Feb 5.

1Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstr. 1, 40225 Düsseldorf, Germany.

A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0138-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364451PMC
February 2019
4 Reads

Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.

Algorithms Mol Biol 2019 6;14. Epub 2019 Feb 6.

3Department of Computer Science, University of Illinois Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL 61801 USA.

Background: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was published in SODA 2001. The main empirical advantage of over other AFC methods is its use of neighbor joining () to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0136-9DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364484PMC
February 2019
2 Reads

SNPs detection by eBWT positional clustering.

Algorithms Mol Biol 2019 6;14. Epub 2019 Feb 6.

1Dipartimento di Informatica, University of Pisa, Pisa, Italy.

Background: Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data.

Results: We develop the theory that (i) describes how the extended Burrows-Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the eBWT. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0137-8DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364478PMC
February 2019
3 Reads

Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

Algorithms Mol Biol 2018 8;13:17. Epub 2018 Dec 8.

1Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8000 Aarhus C, Denmark.

Background: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0135-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286601PMC
December 2018
5 Reads

Superbubbles revisited.

Algorithms Mol Biol 2018 1;13:16. Epub 2018 Dec 1.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0134-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648PMC
December 2018
4 Reads

Coordinate systems for supergenomes.

Algorithms Mol Biol 2018 24;13:15. Epub 2018 Sep 24.

1Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany.

Background: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize-omics data simultaneously for multiple species are sorely lacking. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0133-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955PMC
September 2018
7 Reads

Improved de novo peptide sequencing using LC retention time information.

Algorithms Mol Biol 2018 29;13:14. Epub 2018 Aug 29.

Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zürich, Switzerland.

Background: Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0132-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114869PMC
August 2018
24 Reads

Sorting signed circular permutations by super short operations.

Algorithms Mol Biol 2018 26;13:13. Epub 2018 Jul 26.

1Institute of Computing, University of Campinas, Campinas, Brazil.

Background: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or , that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form , and in that case we can consider without loss of generality that one of them is the identity permutation , and that we just need to the other (i. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0131-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6060566PMC
July 2018
37 Reads

Split-inducing indels in phylogenomic analysis.

Algorithms Mol Biol 2018 16;13:12. Epub 2018 Jul 16.

2Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.

Background: Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results: Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels () that define an approximate bipartition of the taxon set. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0130-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047143PMC
July 2018
42 Reads

Locus-aware decomposition of gene trees with respect to polytomous species trees.

Algorithms Mol Biol 2018 4;13:11. Epub 2018 Jun 4.

1Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, Poland.

Background: Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0128-1DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5985597PMC
June 2018
9 Reads

A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.

Algorithms Mol Biol 2018 1;13:10. Epub 2018 Jun 1.

2College of Computer Science and Information Technology, Guangxi Normal University, Guilin, 541004 China.

Background: Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics, botany and evolution. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0129-0DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984336PMC
June 2018
6 Reads

Finding local genome rearrangements.

Algorithms Mol Biol 2018 4;13. Epub 2018 May 4.

1CNRS, LIRMM, Université Montpellier, 161 rue Ada, 34392 Montpellier, France.

Background: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely due to some biological criteria. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0127-2DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5934872PMC
May 2018
7 Reads

FSH: fast spaced seed hashing exploiting adjacent hashes.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.

Background: Patterns with wildcards in specified positions, namely , are increasingly used instead of -mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require to compute the hashing of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of -mers can be rapidly computed by exploiting the large overlap between consecutive -mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0125-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863468PMC
March 2018
9 Reads

Outlier detection in BLAST hits.

Algorithms Mol Biol 2018 22;13. Epub 2018 Mar 22.

1Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 20742 USA.

Background: An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0126-3DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863388PMC
March 2018
8 Reads
1.460 Impact Factor

OCTAL: Optimal Completion of gene trees in polynomial time.

Algorithms Mol Biol 2018 15;13. Epub 2018 Mar 15.

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801 USA.

Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable.

Results: We introduce the , a general optimization problem that involves completing an unrooted binary tree (i. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-018-0124-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5853121PMC
March 2018
8 Reads

Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

1Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8579 Japan.

Background: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. Read More

View Article

Download full-text PDF

Source
https://almob.biomedcentral.com/articles/10.1186/s13015-018-
Publisher Site
http://dx.doi.org/10.1186/s13015-018-0123-6DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815186PMC
February 2018
30 Reads

Fast phylogenetic inference from typing data.

Algorithms Mol Biol 2018 15;13. Epub 2018 Feb 15.

3INESC-ID Lisboa, Rua Alves Redol 9, 1000-029 Lisboa, Portugal.

Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. Read More

View Article

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-017-0119-7DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5815242PMC
February 2018
9 Reads