**13** Publications

- Page
**1**of**1**

Br J Haematol 2020 05 18;189(4):e150-e154. Epub 2020 Mar 18.

Laboratório de Biologia Molecular, Centro Infantil Boldrini, Campinas, São Paulo, Brazil.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1111/bjh.16571 | DOI Listing |

May 2020

Algorithms Mol Biol 2019 26;14:16. Epub 2019 Jul 26.

2Institute of Computing, University of Campinas, Campinas, Brazil.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1186/s13015-019-0150-y | DOI Listing |

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6867026 | PMC |

July 2019

IEEE/ACM Trans Comput Biol Bioinform 2018 Jul 31. Epub 2018 Jul 31.

We outline an integrated approach to speciation and whole genome duplication (WGD) to resolve the occurrence of these events in phylogenetic analysis. We propose a more principled way of estimating the parameters of gene divergence and fractionation than the standard mixture of normals analysis. We formulate an algorithm for resolving data on local peaks in the distributions of duplicate gene similarities for a number of related genomes. We illustrate with a comprehensive analysis of WGD-origin duplicate gene data from the family Brassicaceae.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1109/TCBB.2018.2849377 | DOI Listing |

July 2018

BMC Bioinformatics 2018 05 8;19(Suppl 6):142. Epub 2018 May 8.

University of Campinas, Campinas, Brazil.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1186/s12859-018-2131-4 | DOI Listing |

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998913 | PMC |

May 2018

Bull Math Biol 2016 04 12;78(4):786-814. Epub 2016 Apr 12.

Institute of Computing, University of Campinas, Campinas, Brazil.

The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a [Formula: see text]-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1007/s11538-016-0162-4 | DOI Listing |

April 2016

IEEE/ACM Trans Comput Biol Bioinform 2013 Jul-Aug;10(4):819-31

University of Campinas, Campinas and Scylla Bioinformatics, Brazil.

Algebraic rearrangement theory, as introduced by Meidanis and Dias, focuses on representing the order in which genes appear in chromosomes, and applies to circular chromosomes only. By shifting our attention to genome adjacencies, we introduce the adjacency algebraic theory, extending the original algebraic theory to linear chromosomes in a very natural way, also allowing the original algebraic distance formula to be used to the general multichromosomal case, with both linear and circular chromosomes. The resulting distance, which we call algebraic distance here, is very similar to, but not quite the same as, double-cut-and-join distance. We present linear time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion on the relationship between the chromosomal and adjacency representation is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1109/TCBB.2012.161 | DOI Listing |

August 2014

IEEE/ACM Trans Comput Biol Bioinform 2013 Jan-Feb;10(1):122-34

Institute of Computing, University of Campinas, Brazil.

Recently, the Single-Cut-or-Join (SCJ) operation was proposed as a basis for a new rearrangement distance between multichromosomal genomes, leading to very fast algorithms, both in theory and in practice. However, it was not clear how well this new distance fares when it comes to using it to solve relevant problems, such as the reconstruction of evolutionary history. In this paper, we advance current knowledge, by testing SCJ's ability regarding evolutionary reconstruction in two aspects: 1) How well does SCJ reconstruct evolutionary topologies? and 2) How well does SCJ reconstruct ancestral genomes? In the process of answering these questions, we implemented SCJ-based methods, and made them available to the community. We ran experiments using as many as 200 genomes, with as many as 3,000 genes. For the first question, we found out that SCJ can recover typically between 60 percent and more than 95 percent of the topology, as measured through the Robinson-Foulds distance (a.k.a. split distance) between trees. In other words, 60 percent to more than 95 percent of the original splits are also present in the reconstructed tree. For the second question, given a topology, SCJ's ability to reconstruct ancestral genomes depends on how far from the leaves the ancestral is. For nodes close to the leaves, about 85 percent of the gene adjacencies can be recovered. This percentage decreases as we move up the tree, but, even at the root, about 50 percent of the adjacencies are recovered, for as many as 64 leaves. Our findings corroborate the fact that SCJ leads to very conservative genome reconstructions, yielding very few false-positive gene adjacencies in the ancestrals, at the expense of a relatively larger amount of false negatives. In addition, experiments with real data from the Campanulaceae and Protostomes groups show that SCJ reconstructs topologies of quality comparable to the accepted trees of the species involved. As far as time is concerned, the methods we implemented can find a topology for 64 genomes with 2,000 genes each in about 10.7 minutes, and reconstruct the ancestral genomes in a 64-leaf tree in about 3 seconds, both on a typical desktop computer. It should be noted that our code is written in Java and we made no significant effort to optimize it.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1109/TCBB.2012.168 | DOI Listing |

January 2014

J Hered 2012 May-Jun;103(3):342-8. Epub 2012 Feb 7.

Department of Molecular Biology, Genoa Biotecnologia SA, São Paulo, SP 04004-030, Brazil.

Cattle are divided into 2 groups referred to as taurine and indicine, both of which have been under strong artificial selection due to their importance for human nutrition. A side effect of this domestication includes a loss of genetic diversity within each specialized breed. Recently, the first taurine genome was sequenced and assembled, allowing for a better understanding of this ruminant species. However, genetic information from indicine breeds has been limited. Here, we present the first genome sequence of an indicine breed (Nellore) generated with 52X coverage by SOLiD sequencing platform. As expected, both genomes share high similarity at the nucleotide level for all autosomes and the X chromosome. Regarding the Y chromosome, the homology was considerably lower, most likely due to uncompleted assembly of the taurine Y chromosome. We were also able to cover 97% of the annotated taurine protein-coding genes.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1093/jhered/esr153 | DOI Listing |

August 2012

IEEE/ACM Trans Comput Biol Bioinform 2011 Sep-Oct;8(5):1318-29

Institute of Computing, University of Campinas, Brazil.

The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1109/TCBB.2011.34 | DOI Listing |

January 2012

Genet Mol Res 2006 Mar 31;5(1):269-83. Epub 2006 Mar 31.

Instituto de Computação, Unicamp, Campinas, SP, Brazil.

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations.

## Download full-text PDF |
Source |
---|

March 2006

Genome Res 2003 Dec 12;13(12):2725-35. Epub 2003 Nov 12.

Centro de Biologia Molecular e Engenharia Genética, Instituto da Computação, Universidade Estadual de Campinas, 13083-970 Campinas-SP, Brazil.

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1101/gr.1532103 | DOI Listing |

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC403815 | PMC |

December 2003

J Comput Biol 2002 ;9(5):743-5

Universidade de Campinas, Instituto de Computação, 13083-971 Campinas, São Paulo, Brazil.

One possible model to study genome evolution is to represent genomes as permutations of genes and compute distances based on the minimum number of certain operations (rearrangements) needed to transform one permutation into another. Under this model, the shorter the distance, the closer the genomes are. Two operations that have been extensively studied are the reversal and the transposition. A reversal is an operation that reverses the order of the genes on a certain portion of the permutation. A transposition is an operation that "cuts" a certain portion of the permutation and "pastes" it elsewhere in the same permutation. In this note, we show that the reversal and transposition distance of the signed permutation pi(n) = (-1 -2.-(n - 1)-n) with respect to the identity is left floor n/2 right floor + 2 for all n>or=3. We conjecture that this value is the diameter of the permutation group under these operations.

## Download full-text PDF |
Source |
---|---|

http://dx.doi.org/10.1089/106652702761034163 | DOI Listing |

May 2003

Microbiol Mol Biol Rev 2002 Jun;66(2):272-99

Instituto de Computação, Universidade de Campinas, Campinas, São Paulo 13083-970, Brazil.

The transport systems of the first completely sequenced genome of a plant parasite, Xylella fastidiosa, were analyzed. In all, 209 proteins were classified here as constitutive members of transport families; thus, we have identified 69 new transporters in addition to the 140 previously annotated. The analysis lead to several hints on potential ways of controlling the disease it causes on citrus trees. An ADP:ATP translocator, previously found in intracellular parasites only, was found in X. fastidiosa. A P-type ATPase is missing-among the 24 completely sequenced eubacteria to date, only three (including X. fastidiosa) do not have a P-type ATPase, and they are all parasites transmitted by insect vectors. An incomplete phosphotransferase system (PTS) was found, without the permease subunits-we conjecture either that they are among the hypothetical proteins or that the PTS plays a solely metabolic regulatory role. We propose that the Ttg2 ABC system might be an import system eventually involved in glutamate import rather than a toluene exporter, as previously annotated. X. fastidiosa exhibits fewer proteins with > or =4 alpha-helical transmembrane spanners than any other completely sequenced prokaryote to date. X. fastidiosa has only 2.7% of all open reading frames identifiable as major transporters, which puts it as the eubacterium having the lowest percentage of open reading frames involved in transport, closer to two archaea, Methanococcus jannaschii (2.4%) and Methanobacterium thermoautotrophicum (2.4%).

## Download full-text PDF |
Source |
---|---|

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC120790 | PMC |

http://dx.doi.org/10.1128/mmbr.66.2.272-299.2002 | DOI Listing |

June 2002