Publications by authors named "João Meidanis"

13 Publications

  • Page 1 of 1

A cubic algorithm for the generalized rank median of three genomes.

Algorithms Mol Biol 2019 26;14:16. Epub 2019 Jul 26.

2Institute of Computing, University of Campinas, Campinas, Brazil.

Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time , where , with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix.

Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of -stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes.

Conclusions: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s13015-019-0150-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6867026PMC
July 2019

Models for Similarity Distributions of Syntenic Homologs and Applications to Phylogenomics.

IEEE/ACM Trans Comput Biol Bioinform 2018 Jul 31. Epub 2018 Jul 31.

We outline an integrated approach to speciation and whole genome duplication (WGD) to resolve the occurrence of these events in phylogenetic analysis. We propose a more principled way of estimating the parameters of gene divergence and fractionation than the standard mixture of normals analysis. We formulate an algorithm for resolving data on local peaks in the distributions of duplicate gene similarities for a number of related genomes. We illustrate with a comprehensive analysis of WGD-origin duplicate gene data from the family Brassicaceae.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TCBB.2018.2849377DOI Listing
July 2018

On the rank-distance median of 3 permutations.

BMC Bioinformatics 2018 05 8;19(Suppl 6):142. Epub 2018 May 8.

University of Campinas, Campinas, Brazil.

Background: Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper. In their paper, the authors provide an [Formula: see text] algorithm for determining three candidate medians, prove the tight approximation ratio [Formula: see text], and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.

Results: In this paper, we extend their results and provide the following: Three invariants characterizing the problem of finding the median of 3 matrices A sufficient condition for uniqueness of medians that can be checked in O(n) A faster, [Formula: see text] algorithm for determining the median under this condition A new heuristic algorithm for this problem based on compressed sensing A [Formula: see text] algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.

Conclusions: Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s12859-018-2131-4DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998913PMC
May 2018

Median Approximations for Genomes Modeled as Matrices.

Bull Math Biol 2016 04 12;78(4):786-814. Epub 2016 Apr 12.

Institute of Computing, University of Campinas, Campinas, Brazil.

The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a [Formula: see text]-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1007/s11538-016-0162-4DOI Listing
April 2016

Extending the algebraic formalism for genome rearrangements to include linear chromosomes.

IEEE/ACM Trans Comput Biol Bioinform 2013 Jul-Aug;10(4):819-31

University of Campinas, Campinas and Scylla Bioinformatics, Brazil.

Algebraic rearrangement theory, as introduced by Meidanis and Dias, focuses on representing the order in which genes appear in chromosomes, and applies to circular chromosomes only. By shifting our attention to genome adjacencies, we introduce the adjacency algebraic theory, extending the original algebraic theory to linear chromosomes in a very natural way, also allowing the original algebraic distance formula to be used to the general multichromosomal case, with both linear and circular chromosomes. The resulting distance, which we call algebraic distance here, is very similar to, but not quite the same as, double-cut-and-join distance. We present linear time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion on the relationship between the chromosomal and adjacency representation is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TCBB.2012.161DOI Listing
August 2014

Rearrangement-based phylogeny using the Single-Cut-or-Join operation.

IEEE/ACM Trans Comput Biol Bioinform 2013 Jan-Feb;10(1):122-34

Institute of Computing, University of Campinas, Brazil.

Recently, the Single-Cut-or-Join (SCJ) operation was proposed as a basis for a new rearrangement distance between multichromosomal genomes, leading to very fast algorithms, both in theory and in practice. However, it was not clear how well this new distance fares when it comes to using it to solve relevant problems, such as the reconstruction of evolutionary history. In this paper, we advance current knowledge, by testing SCJ's ability regarding evolutionary reconstruction in two aspects: 1) How well does SCJ reconstruct evolutionary topologies? and 2) How well does SCJ reconstruct ancestral genomes? In the process of answering these questions, we implemented SCJ-based methods, and made them available to the community. We ran experiments using as many as 200 genomes, with as many as 3,000 genes. For the first question, we found out that SCJ can recover typically between 60 percent and more than 95 percent of the topology, as measured through the Robinson-Foulds distance (a.k.a. split distance) between trees. In other words, 60 percent to more than 95 percent of the original splits are also present in the reconstructed tree. For the second question, given a topology, SCJ's ability to reconstruct ancestral genomes depends on how far from the leaves the ancestral is. For nodes close to the leaves, about 85 percent of the gene adjacencies can be recovered. This percentage decreases as we move up the tree, but, even at the root, about 50 percent of the adjacencies are recovered, for as many as 64 leaves. Our findings corroborate the fact that SCJ leads to very conservative genome reconstructions, yielding very few false-positive gene adjacencies in the ancestrals, at the expense of a relatively larger amount of false negatives. In addition, experiments with real data from the Campanulaceae and Protostomes groups show that SCJ reconstructs topologies of quality comparable to the accepted trees of the species involved. As far as time is concerned, the methods we implemented can find a topology for 64 genomes with 2,000 genes each in about 10.7 minutes, and reconstruct the ancestral genomes in a 64-leaf tree in about 3 seconds, both on a typical desktop computer. It should be noted that our code is written in Java and we made no significant effort to optimize it.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TCBB.2012.168DOI Listing
January 2014

Genome sequence and assembly of Bos indicus.

J Hered 2012 May-Jun;103(3):342-8. Epub 2012 Feb 7.

Department of Molecular Biology, Genoa Biotecnologia SA, São Paulo, SP 04004-030, Brazil.

Cattle are divided into 2 groups referred to as taurine and indicine, both of which have been under strong artificial selection due to their importance for human nutrition. A side effect of this domestication includes a loss of genetic diversity within each specialized breed. Recently, the first taurine genome was sequenced and assembled, allowing for a better understanding of this ruminant species. However, genetic information from indicine breeds has been limited. Here, we present the first genome sequence of an indicine breed (Nellore) generated with 52X coverage by SOLiD sequencing platform. As expected, both genomes share high similarity at the nucleotide level for all autosomes and the X chromosome. Regarding the Y chromosome, the homology was considerably lower, most likely due to uncompleted assembly of the taurine Y chromosome. We were also able to cover 97% of the annotated taurine protein-coding genes.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1093/jhered/esr153DOI Listing
August 2012

SCJ: a breakpoint-like distance that simplifies several rearrangement problems.

IEEE/ACM Trans Comput Biol Bioinform 2011 Sep-Oct;8(5):1318-29

Institute of Computing, University of Campinas, Brazil.

The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1109/TCBB.2011.34DOI Listing
January 2012

A fully resolved consensus between fully resolved phylogenetic trees.

Genet Mol Res 2006 Mar 31;5(1):269-83. Epub 2006 Mar 31.

Instituto de Computação, Unicamp, Campinas, SP, Brazil.

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations.
View Article and Find Full Text PDF

Download full-text PDF

Source
March 2006

Analysis and functional annotation of an expressed sequence tag collection for tropical crop sugarcane.

Genome Res 2003 Dec 12;13(12):2725-35. Epub 2003 Nov 12.

Centro de Biologia Molecular e Engenharia Genética, Instituto da Computação, Universidade Estadual de Campinas, 13083-970 Campinas-SP, Brazil.

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1101/gr.1532103DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC403815PMC
December 2003

A lower bound on the reversal and transposition diameter.

J Comput Biol 2002 ;9(5):743-5

Universidade de Campinas, Instituto de Computação, 13083-971 Campinas, São Paulo, Brazil.

One possible model to study genome evolution is to represent genomes as permutations of genes and compute distances based on the minimum number of certain operations (rearrangements) needed to transform one permutation into another. Under this model, the shorter the distance, the closer the genomes are. Two operations that have been extensively studied are the reversal and the transposition. A reversal is an operation that reverses the order of the genes on a certain portion of the permutation. A transposition is an operation that "cuts" a certain portion of the permutation and "pastes" it elsewhere in the same permutation. In this note, we show that the reversal and transposition distance of the signed permutation pi(n) = (-1 -2.-(n - 1)-n) with respect to the identity is left floor n/2 right floor + 2 for all n>or=3. We conjecture that this value is the diameter of the permutation group under these operations.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1089/106652702761034163DOI Listing
May 2003

Whole-genome analysis of transporters in the plant pathogen Xylella fastidiosa.

Microbiol Mol Biol Rev 2002 Jun;66(2):272-99

Instituto de Computação, Universidade de Campinas, Campinas, São Paulo 13083-970, Brazil.

The transport systems of the first completely sequenced genome of a plant parasite, Xylella fastidiosa, were analyzed. In all, 209 proteins were classified here as constitutive members of transport families; thus, we have identified 69 new transporters in addition to the 140 previously annotated. The analysis lead to several hints on potential ways of controlling the disease it causes on citrus trees. An ADP:ATP translocator, previously found in intracellular parasites only, was found in X. fastidiosa. A P-type ATPase is missing-among the 24 completely sequenced eubacteria to date, only three (including X. fastidiosa) do not have a P-type ATPase, and they are all parasites transmitted by insect vectors. An incomplete phosphotransferase system (PTS) was found, without the permease subunits-we conjecture either that they are among the hypothetical proteins or that the PTS plays a solely metabolic regulatory role. We propose that the Ttg2 ABC system might be an import system eventually involved in glutamate import rather than a toluene exporter, as previously annotated. X. fastidiosa exhibits fewer proteins with > or =4 alpha-helical transmembrane spanners than any other completely sequenced prokaryote to date. X. fastidiosa has only 2.7% of all open reading frames identifiable as major transporters, which puts it as the eubacterium having the lowest percentage of open reading frames involved in transport, closer to two archaea, Methanococcus jannaschii (2.4%) and Methanobacterium thermoautotrophicum (2.4%).
View Article and Find Full Text PDF

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC120790PMC
http://dx.doi.org/10.1128/mmbr.66.2.272-299.2002DOI Listing
June 2002