Publications by authors named "Arian Smit"

36 Publications

The Dfam community resource of transposable element families, sequence models, and genome annotations.

Mob DNA 2021 Jan 12;12(1). Epub 2021 Jan 12.

Institute for Systems Biology, Seattle, WA, 98109, USA.

Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0-3.3 releases of Dfam ( https://dfam.org ) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam's new features and provides insight into the long term challenges ahead for improving de novo generated transposable element datasets.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1186/s13100-020-00230-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805219
January 2021

RepeatModeler2 for automated genomic discovery of transposable element families.

Proc Natl Acad Sci U S A 2020 04 16;117(17):9451-9457. Epub 2020 Apr 16.

Institute for Systems Biology, Seattle, WA 98109

The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, http://www.repeatmasker.org/RepeatModeler/).
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1073/pnas.1921046117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196820
April 2020

Discovery of a new repeat family in the Callithrix jacchus genome.

Genome Res 2016 05 25;26(5):649-59. Epub 2016 Feb 25.

Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA.

We identified a novel repeat family, termed Platy-1, in the Callithrix jacchus (common marmoset) genome that arose around the time of the divergence of platyrrhines and catarrhines and established itself as a repeat family in New World monkeys (NWMs). A full-length Platy-1 element is ∼100 bp in length, making it the shortest known short interspersed element (SINE) in primates, and harbors features characteristic of non-LTR retrotransposons. We identified 2268 full-length Platy-1 elements across 62 subfamilies in the common marmoset genome. Our subfamily reconstruction and phylogenetic analyses support Platy-1 propagation throughout the evolution of NWMs in the lineage leading to C. jacchus. Platy-1 appears to have reached its amplification peak in the common ancestor of current day marmosets and has since moderately declined. However, identification of more than 200 Platy-1 elements identical to their respective consensus sequence, and the presence of polymorphic elements within common marmoset populations, suggests ongoing retrotransposition activity. Platy-1, a SINE, appears to have originated from an Alu element, and hence is likely derived from 7SL RNA. Our analyses illustrate the birth of a new repeat family and its propagation dynamics in the lineage leading to the common marmoset over the last 40 million years.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1101/gr.199075.115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4864456
May 2016

The Dfam database of repetitive DNA families.

Nucleic Acids Res 2016 Jan 26;44(D1):D81-9. Epub 2015 Nov 26.

University of Montana, Missoula, MT 59812, USA

Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1093/nar/gkv1272
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702899
January 2016

A call for benchmarking transposable element annotation methods.

Mob DNA 2015 Aug 4;6:13. Epub 2015 Aug 4.

School of Computer Science, McGill University, McConnell Engineering Bldg., Rm. 318, 3480 Rue University, Montréal, Québec H3A 0E9 Canada; McGill Centre for Bioinformatics, McGill University, Montréal, Québec Canada.

DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks-that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
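The benchmarking proposal above is about measuring annotation accuracy. As a minimal sketch of one common way to do that, the Python below scores a predicted TE annotation against a reference annotation base by base, reporting sensitivity (the fraction of reference bases recovered) and precision (the fraction of predicted bases that fall in the reference). The function names, the half-open interval convention and the example coordinates are hypothetical; the paper calls for community benchmarks and does not prescribe this metric, and a real benchmark would also need to handle family labels, strands and multiple chromosomes.

```python
"""Nucleotide-level accuracy of a TE annotation against a reference (sketch)."""


def covered_positions(intervals):
    """Return the set of positions covered by half-open [start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted, reference):
    """Compare two interval lists base by base.

    Returns (sensitivity, precision): the fraction of reference bases that were
    predicted, and the fraction of predicted bases that lie in the reference.
    """
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    precision = true_pos / len(pred) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    reference = [(100, 200), (500, 600)]  # hypothetical curated TE copies
    predicted = [(120, 220), (500, 550)]  # hypothetical tool output
    sens, prec = nucleotide_accuracy(predicted, reference)
    print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Storing every covered position in a set keeps the logic easy to check but uses memory proportional to the annotated length; a genome-scale version would merge sorted intervals instead.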
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1186/s13100-015-0044-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4524446
August 2015

Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

Science 2014 Dec 11;346(6215):1254449. Epub 2014 Dec 11.

Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA. Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA.

To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1126/science.1254449
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4386873
December 2014

Multiple lineages of ancient CR1 retroposons shaped the early genome evolution of amniotes.

Genome Biol Evol 2014 Dec 11;7(1):205-17. Epub 2014 Dec 11.

Institute of Experimental Pathology (ZMBE), University of Münster, Germany.

Chicken repeat 1 (CR1) retroposons are long interspersed elements (LINEs) that are ubiquitous within amniote genomes and constitute the most abundant family of transposed elements in birds, crocodilians, turtles, and snakes. They are also present in mammalian genomes, where they reside as numerous relics of ancient retroposition events. Yet, despite their relevance for understanding amniote genome evolution, the diversity and evolution of CR1 elements has never been studied on an amniote-wide level. We reconstruct the temporal and quantitative activity of CR1 subfamilies via presence/absence analyses across crocodilian phylogeny and comparative analyses of 12 crocodilian genomes, revealing relative genomic stasis of retroposition during genome evolution of extant Crocodylia. Our large-scale phylogenetic analysis of amniote CR1 subfamilies suggests the presence of at least seven ancient CR1 lineages in the amniote ancestor; and amniote-wide analyses of CR1 successions and quantities reveal differential retention (presence of ancient relics or recent activity) of these CR1 lineages across amniote genome evolution. Interestingly, birds and lepidosaurs retained the fewest ancient CR1 lineages among amniotes and also exhibit smaller genome sizes. Our study is the first to analyze CR1 evolution in a genome-wide and amniote-wide context and the data strongly suggest that the ancestral amniote genome contained myriad CR1 elements from multiple ancient lineages, and remnants of these are still detectable in the relatively stable genomes of crocodilians and turtles. Early mammalian genome evolution was thus characterized by a drastic shift from CR1 prevalence to dominance and hyperactivity of L2 LINEs in monotremes and L1 LINEs in therians.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1093/gbe/evu256
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4316615
December 2014

Evolution and gene capture in ancient endogenous retroviruses - insights from the crocodilian genomes.

Retrovirology 2014 Dec 12;11:71. Epub 2014 Dec 12.

Faculty of Veterinary Science, University of Sydney, Sydney, NSW, 2006, Australia.

Background: Crocodilians are thought to be hosts to a diverse and divergent complement of endogenous retroviruses (ERVs) but a comprehensive investigation is yet to be performed. The recent sequencing of three crocodilian genomes provides an opportunity for a more detailed and accurate representation of the ERV diversity that is present in these species. Here we investigate the diversity, distribution and evolution of ERVs from the genomes of three key crocodilian species, and outline the key processes driving crocodilian ERV proliferation and evolution.

Results: ERVs and ERV-related sequences make up less than 2% of crocodilian genomes. We recovered and described 45 ERV groups within the three crocodilian genomes, many of which are species specific. We have also revealed a new class of ERV, ERV4, which appears to be common to crocodilians and turtles, and currently has no characterised exogenous counterpart. For the first time, we formally describe the characteristics of this ERV class and its classification relative to other recognised ERV and retroviral classes. This class shares some sequence similarity and sequence characteristics with ERV3, although it is phylogenetically distinct from the other ERV classes. We have also identified two instances of gene capture by crocodilian ERVs, one of which, the capture of a host KIT-ligand mRNA, has occurred without the loss of an ERV domain.

Conclusions: This study indicates that crocodilian ERVs comprise a wide variety of lineages, many of which appear to reflect ancient infections. In particular, ERV4 appears to have a limited host range, with current data suggesting that it is confined to crocodilians and some lineages of turtles. Also of interest are two ERV groups that demonstrate evidence of host gene capture. This study provides a framework to facilitate further studies into non-mammalian vertebrates and highlights the need for further studies into such species.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1186/s12977-014-0071-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4299795
December 2014

Growth signals employ CGGBP1 to suppress transcription of Alu-SINEs.

Cell Cycle 2016 06;15(12):1558-71

Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

CGGBP1 (CGG triplet repeat-binding protein 1) regulates cell proliferation, stress response, cytokinesis, telomeric integrity and transcription. It could affect these processes by modulating target gene expression under different conditions. Identification of CGGBP1-target genes and their regulation could reveal how a transcription regulator affects such diverse cellular processes. Here we describe the mechanisms of differential gene expression regulation by CGGBP1 in quiescent or growing cells. By studying global gene expression patterns and genome-wide DNA-binding patterns of CGGBP1, we show that a possible mechanism through which it affects the expression of RNA Pol II-transcribed genes in trans depends on Alu RNA. We also show that it regulates Alu transcription in cis by binding to Alu promoter. Our results also indicate that potential phosphorylation of CGGBP1 upon growth stimulation facilitates its nuclear retention, Alu-binding and dislodging of RNA Pol III therefrom. These findings provide insights into how Alu transcription is regulated in response to growth signals.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.4161/15384101.2014.967094
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934077
June 2016

The UCSC Genome Browser database: 2015 update.

Nucleic Acids Res 2015 Jan 26;43(Database issue):D670-81. Epub 2014 Nov 26.

Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.

Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1093/nar/gku1177
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383971
January 2015

Gibbon genome and the fast karyotype evolution of small apes.

Nature 2014 Sep;513(7517):195-201

Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1038/nature13679
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4249732
September 2014

Realistic artificial DNA sequences as negative controls for computational genomics.

Nucleic Acids Res 2014 Jul 6;42(12):e99. Epub 2014 May 6.

Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA

A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
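One way to picture sequences "modeled after real intergenic sequences in terms of composition" is a k-th order Markov chain: each new base is drawn conditioned on the k bases before it, with probabilities counted from real sequence. The Python below is a minimal sketch of that idea only, assuming k >= 1; it is not garlic's implementation, and it models neither regional G+C variation nor the interspersed repeat content that the abstract also describes. The function names and the training string are hypothetical.

```python
"""Generate an artificial DNA sequence from a k-th order Markov model (sketch)."""
import random
from collections import Counter, defaultdict


def train_markov(seq, k):
    """Count which base follows each k-mer context in `seq` (assumes k >= 1)."""
    model = defaultdict(Counter)
    for i in range(len(seq) - k):
        model[seq[i:i + k]][seq[i + k]] += 1
    return model


def generate(model, k, length, rng):
    """Sample `length` bases, each conditioned on the previous k bases."""
    contexts = list(model)
    out = list(rng.choice(contexts))  # seed with a context seen in training
    while len(out) < length:
        counts = model.get("".join(out[-k:]))  # .get does not insert missing keys
        if not counts:
            # Dead-end context (only seen at the end of training): reseed.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*counts.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


if __name__ == "__main__":
    training = "ATGCGCGATATATTTAGCGCGCAATTGCA" * 10  # hypothetical stand-in for intergenic DNA
    model = train_markov(training, k=3)
    print(generate(model, 3, 60, random.Random(0)))
```

Unlike a plain shuffle, which only rearranges the bases of one existing sequence, a fitted model like this can be sampled repeatedly to produce new sequences of any length.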
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1093/nar/gku356
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4081056
July 2014

Multiscale representation of genomic signals.

Nat Methods 2014 Jun 13;11(6):689-94. Epub 2014 Apr 13.

Institute for Systems Biology, Seattle, Washington, USA.

Genomic information is encoded on a wide range of distance scales, ranging from tens of bases to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as G+C content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations. By integrating the information across all scales, we demonstrated improved prediction of gene expression from polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observed that gene expression differences in colorectal cancer are related to methylation patterns that extend beyond the single-gene scale. Our software is available at https://github.com/tknijnen/msr/.
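The idea of reading one genomic signal at several distance scales can be illustrated with G+C content. The Python below is a much simpler stand-in for the multiscale framework, not the published MSR method: it computes the G+C fraction of non-overlapping windows at several window sizes, using a prefix-sum array so each window costs constant time. The function names and the toy sequence are hypothetical.

```python
"""G+C content of a sequence summarized at several window sizes (sketch)."""


def gc_prefix(seq):
    """prefix[i] = number of G or C bases in seq[:i] (case-insensitive)."""
    prefix = [0]
    for base in seq.upper():
        prefix.append(prefix[-1] + (base in "GC"))
    return prefix


def multiscale_gc(seq, scales):
    """Return {scale: [G+C fraction of each non-overlapping window of that size]}."""
    prefix = gc_prefix(seq)
    result = {}
    for w in scales:
        result[w] = [
            (prefix[s + w] - prefix[s]) / w
            for s in range(0, len(seq) - w + 1, w)
        ]
    return result


if __name__ == "__main__":
    toy = "GGGGAAAA" * 4  # hypothetical 32 bp sequence
    for scale, values in multiscale_gc(toy, [4, 8, 32]).items():
        print(scale, values)
```

In the usage example, the alternating G and A blocks give a signal that swings between 1.0 and 0.0 at 4 bp but is a flat 0.5 at 8 bp and above, a toy case of a pattern that is visible at one scale and absent at another.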
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1038/nmeth.2924
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4040162
June 2014

Dfam: a database of repetitive DNA based on profile hidden Markov models.

Nucleic Acids Res 2013 Jan 30;41(Database issue):D70-82. Epub 2012 Nov 30.

HHMI Janelia Farm Research Campus, Ashburn, VA 20147, USA.

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1093/nar/gks1265
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531169
January 2013

Orangutan Alu quiescence reveals possible source element: support for ancient backseat drivers.

Mob DNA 2012 Apr 30;3. Epub 2012 Apr 30.

Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA.

Background: Sequence analysis of the orangutan genome revealed that recent proliferative activity of Alu elements has been uncharacteristically quiescent in the Pongo (orangutan) lineage, compared with all previously studied primate genomes. With relatively few young polymorphic insertions, the genomic landscape of the orangutan seemed like the ideal place to search for a driver, or source element, of Alu retrotransposition.

Results: Here we report the identification of a nearly pristine insertion possessing all the known putative hallmarks of a retrotranspositionally competent Alu element. It is located in an intronic sequence of the DGKB gene on chromosome 7 and is highly conserved in Hominidae (the great apes), but absent from Hylobatidae (gibbon and siamang). We provide evidence for the evolution of a lineage-specific subfamily of this shared Alu insertion in orangutans and possibly the lineage leading to humans. In the orangutan genome, this insertion contains three orangutan-specific diagnostic mutations which are characteristic of the youngest polymorphic Alu subfamily, AluYe5b5_Pongo. In the Homininae lineage (human, chimpanzee and gorilla), this insertion has acquired three different mutations which are also found in a single human-specific Alu insertion.

Conclusions: This seemingly stealth-like amplification, ongoing at a very low rate over millions of years of evolution, suggests that this shared insertion may represent an ancient backseat driver of Alu element expansion.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1186/1759-8753-3-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3357318
April 2012

Chromosomal haplotypes by genetic phasing of human families.

Am J Hum Genet 2011 Sep;89(3):382-97

Institute for Systems Biology, Seattle, WA 98109, USA.

Assignment of alleles to haplotypes for nearly all the variants on all chromosomes can be performed by genetic analysis of a nuclear family with three or more children. Whole-genome sequence data enable deterministic phasing of nearly all sequenced alleles by permitting assignment of recombinations to precise chromosomal positions and specific meioses. We demonstrate this process of genetic phasing on two families each with four children. We generate haplotypes for all of the children and their parents; these haplotypes span all genotyped positions, including rare variants. Misassignments of phase between variants (switch errors) are nearly absent. Our algorithm can also produce multimegabase haplotypes for nuclear families with just two children and can handle families with missing individuals. We implement our algorithm in a suite of software scripts (Haploscribe). Haplotypes and family genome sequences will become increasingly important for personalized medicine and for fundamental biology.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2011.07.023
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3169815
September 2011

Comparative and demographic analysis of orang-utan genomes.

Authors: Devin P Locke, LaDeana W Hillier, Wesley C Warren, Kim C Worley, Lynne V Nazareth, Donna M Muzny, Shiaw-Pyng Yang, Zhengyuan Wang, Asif T Chinwalla, Pat Minx, Makedonka Mitreva, Lisa Cook, Kim D Delehaunty, Catrina Fronick, Heather Schmidt, Lucinda A Fulton, Robert S Fulton, Joanne O Nelson, Vincent Magrini, Craig Pohl, Tina A Graves, Chris Markovic, Andy Cree, Huyen H Dinh, Jennifer Hume, Christie L Kovar, Gerald R Fowler, Gerton Lunter, Stephen Meader, Andreas Heger, Chris P Ponting, Tomas Marques-Bonet, Can Alkan, Lin Chen, Ze Cheng, Jeffrey M Kidd, Evan E Eichler, Simon White, Stephen Searle, Albert J Vilella, Yuan Chen, Paul Flicek, Jian Ma, Brian Raney, Bernard Suh, Richard Burhans, Javier Herrero, David Haussler, Rui Faria, Olga Fernando, Fleur Darré, Domènec Farré, Elodie Gazave, Meritxell Oliva, Arcadi Navarro, Roberta Roberto, Oronzo Capozzi, Nicoletta Archidiacono, Giuliano Della Valle, Stefania Purgato, Mariano Rocchi, Miriam K Konkel, Jerilyn A Walker, Brygg Ullmer, Mark A Batzer, Arian F A Smit, Robert Hubley, Claudio Casola, Daniel R Schrider, Matthew W Hahn, Victor Quesada, Xose S Puente, Gonzalo R Ordoñez, Carlos López-Otín, Tomas Vinar, Brona Brejova, Aakrosh Ratan, Robert S Harris, Webb Miller, Carolin Kosiol, Heather A Lawson, Vikas Taliwal, André L Martins, Adam Siepel, Arindam Roychoudhury, Xin Ma, Jeremiah Degenhardt, Carlos D Bustamante, Ryan N Gutenkunst, Thomas Mailund, Julien Y Dutheil, Asger Hobolth, Mikkel H Schierup, Oliver A Ryder, Yuko Yoshinaga, Pieter J de Jong, George M Weinstock, Jeffrey Rogers, Elaine R Mardis, Richard A Gibbs, Richard K Wilson

Nature 2011 Jan;469(7331):529-33

The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA.

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Nₑ) expanded exponentially relative to the ancestral Nₑ after the split, while Bornean Nₑ declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
View Article and Find Full Text PDF

Download full-text PDF

Source
DOI: http://dx.doi.org/10.1038/nature09687
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3060778
January 2011

The genome of a songbird.

Nature 2010 Apr;464(7289):757-62

The Genome Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.

The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.

Source
DOI: http://dx.doi.org/10.1038/nature08819
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3187626
April 2010

Analysis of genetic inheritance in a family quartet by whole-genome sequencing.

Science 2010 Apr;328(5978):636-9. Epub 2010 Mar 10.

Institute for Systems Biology, Seattle, WA 98103, USA.

We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in >99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 × 10⁻⁸ per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.

Source
DOI: http://dx.doi.org/10.1126/science.1186802
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037280
April 2010

DupMasker: a tool for annotating primate segmental duplications.

Genome Res 2008 Aug;18(8):1362-8. Epub 2008 May 23.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.

Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.

Source
DOI: http://dx.doi.org/10.1101/gr.078477.108
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2493431
August 2008

Genome analysis of the platypus reveals unique signatures of evolution.

Authors:
Wesley C Warren LaDeana W Hillier Jennifer A Marshall Graves Ewan Birney Chris P Ponting Frank Grützner Katherine Belov Webb Miller Laura Clarke Asif T Chinwalla Shiaw-Pyng Yang Andreas Heger Devin P Locke Pat Miethke Paul D Waters Frédéric Veyrunes Lucinda Fulton Bob Fulton Tina Graves John Wallis Xose S Puente Carlos López-Otín Gonzalo R Ordóñez Evan E Eichler Lin Chen Ze Cheng Janine E Deakin Amber Alsop Katherine Thompson Patrick Kirby Anthony T Papenfuss Matthew J Wakefield Tsviya Olender Doron Lancet Gavin A Huttley Arian F A Smit Andrew Pask Peter Temple-Smith Mark A Batzer Jerilyn A Walker Miriam K Konkel Robert S Harris Camilla M Whittington Emily S W Wong Neil J Gemmell Emmanuel Buschiazzo Iris M Vargas Jentzsch Angelika Merkel Juergen Schmitz Anja Zemann Gennady Churakov Jan Ole Kriegs Juergen Brosius Elizabeth P Murchison Ravi Sachidanandam Carly Smith Gregory J Hannon Enkhjargal Tsend-Ayush Daniel McMillan Rosalind Attenborough Willem Rens Malcolm Ferguson-Smith Christophe M Lefèvre Julie A Sharp Kevin R Nicholas David A Ray Michael Kube Richard Reinhardt Thomas H Pringle James Taylor Russell C Jones Brett Nixon Jean-Louis Dacheux Hitoshi Niwa Yoko Sekita Xiaoqiu Huang Alexander Stark Pouya Kheradpour Manolis Kellis Paul Flicek Yuan Chen Caleb Webber Ross Hardison Joanne Nelson Kym Hallsworth-Pepin Kim Delehaunty Chris Markovic Pat Minx Yucheng Feng Colin Kremitzki Makedonka Mitreva Jarret Glasscock Todd Wylie Patricia Wohldmann Prathapan Thiru Michael N Nhan Craig S Pohl Scott M Smith Shunfeng Hou Mikhail Nefedov Pieter J de Jong Marilyn B Renfree Elaine R Mardis Richard K Wilson

Nature 2008 May;453(7192):175-83

Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

Source
DOI: http://dx.doi.org/10.1038/nature06936
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2803040
May 2008

Mobile DNA in Old World monkeys: a glimpse through the rhesus macaque genome.

Science 2007 Apr;316(5822):238-40

Department of Biological Sciences, Biological Computation and Visualization Center, Center for Bio-Modular Multi-Scale Systems, Louisiana State University, Baton Rouge, LA 70803, USA.

The completion of the draft sequence of the rhesus macaque genome allowed us to study the genomic composition and evolution of transposable elements in this representative of the Old World monkey lineage, a group of diverse primates closely related to humans. The L1 family of long interspersed elements appears to have evolved as a single lineage, and Alu elements have evolved into four currently active lineages. We also found evidence of elevated horizontal transmissions of retroviruses and the absence of DNA transposon activity in the Old World monkey lineage. In addition, approximately 100 precursors of composite SVA (short interspersed element, variable number of tandem repeat, and Alu) elements were identified, with the majority being shared by the common ancestor of humans and rhesus macaques. Mobile elements compose roughly 50% of primate genomes, and our findings illustrate their diversity and strong influence on genome evolution between closely related species.

Source
DOI: http://dx.doi.org/10.1126/science.1139462
April 2007

Evolutionary and biomedical insights from the rhesus macaque genome.

Authors:
Richard A Gibbs Jeffrey Rogers Michael G Katze Roger Bumgarner George M Weinstock Elaine R Mardis Karin A Remington Robert L Strausberg J Craig Venter Richard K Wilson Mark A Batzer Carlos D Bustamante Evan E Eichler Matthew W Hahn Ross C Hardison Kateryna D Makova Webb Miller Aleksandar Milosavljevic Robert E Palermo Adam Siepel James M Sikela Tony Attaway Stephanie Bell Kelly E Bernard Christian J Buhay Mimi N Chandrabose Marvin Dao Clay Davis Kimberly D Delehaunty Yan Ding Huyen H Dinh Shannon Dugan-Rocha Lucinda A Fulton Ramatu Ayiesha Gabisi Toni T Garner Jennifer Godfrey Alicia C Hawes Judith Hernandez Sandra Hines Michael Holder Jennifer Hume Shalini N Jhangiani Vandita Joshi Ziad Mohid Khan Ewen F Kirkness Andrew Cree R Gerald Fowler Sandra Lee Lora R Lewis Zhangwan Li Yih-Shin Liu Stephanie M Moore Donna Muzny Lynne V Nazareth Dinh Ngoc Ngo Geoffrey O Okwuonu Grace Pai David Parker Heidie A Paul Cynthia Pfannkoch Craig S Pohl Yu-Hui Rogers San Juana Ruiz Aniko Sabo Jireh Santibanez Brian W Schneider Scott M Smith Erica Sodergren Amanda F Svatek Teresa R Utterback Selina Vattathil Wesley Warren Courtney Sherell White Asif T Chinwalla Yucheng Feng Aaron L Halpern Ladeana W Hillier Xiaoqiu Huang Pat Minx Joanne O Nelson Kymberlie H Pepin Xiang Qin Granger G Sutton Eli Venter Brian P Walenz John W Wallis Kim C Worley Shiaw-Pyng Yang Steven M Jones Marco A Marra Mariano Rocchi Jacqueline E Schein Robert Baertsch Laura Clarke Miklós Csürös Jarret Glasscock R Alan Harris Paul Havlak Andrew R Jackson Huaiyang Jiang Yue Liu David N Messina Yufeng Shen Henry Xing-Zhi Song Todd Wylie Lan Zhang Ewan Birney Kyudong Han Miriam K Konkel Jungnam Lee Arian F A Smit Brygg Ullmer Hui Wang Jinchuan Xing Richard Burhans Ze Cheng John E Karro Jian Ma Brian Raney Xinwei She Michael J Cox Jeffery P Demuth Laura J Dumas Sang-Gook Han Janet Hopkins Anis Karimpour-Fard Young H Kim Jonathan R Pollack Tomas Vinar Charles Addo-Quaye Jeremiah Degenhardt Alexandra Denby Melissa J Hubisz 
Amit Indap Carolin Kosiol Bruce T Lahn Heather A Lawson Alison Marklein Rasmus Nielsen Eric J Vallender Andrew G Clark Betsy Ferguson Ryan D Hernandez Kashif Hirani Hildegard Kehrer-Sawatzki Jessica Kolb Shobha Patil Ling-Ling Pu Yanru Ren David Glenn Smith David A Wheeler Ian Schenck Edward V Ball Rui Chen David N Cooper Belinda Giardine Fan Hsu W James Kent Arthur Lesk David L Nelson William E O'brien Kay Prüfer Peter D Stenson James C Wallace Hui Ke Xiao-Ming Liu Peng Wang Andy Peng Xiang Fan Yang Galt P Barber David Haussler Donna Karolchik Andy D Kern Robert M Kuhn Kayla E Smith Ann S Zwieg

Science 2007 Apr;316(5822):222-34

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

Source
DOI: http://dx.doi.org/10.1126/science.1139247
April 2007

Functional noncoding sequences derived from SINEs in the mammalian genome.

Genome Res 2006 Jul;16(7):864-74. Epub 2006 May 22.

Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan.

Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

Source
DOI: http://dx.doi.org/10.1101/gr.5255506
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1484453
July 2006

A third approach to gene prediction suggests thousands of additional human transcribed regions.

PLoS Comput Biol 2006 Mar 17;2(3):e18. Epub 2006 Mar 17.

Institute for Systems Biology, Seattle, Washington, USA.

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly well suited to detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.0020018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1391917
March 2006

Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates.

Genome Res 2006 Jan;16(1):78-87. Epub 2005 Dec 12.

Department of Biology, Queens College, the City University of New York, Flushing, New York 11367, USA.

We investigated the evolution of the families of LINE-1 (L1) retrotransposons that have amplified in the human lineage since the origin of primates. We identified two phases in the evolution of L1. From approximately 70 million years ago (Mya) until approximately 40 Mya, three distinct L1 lineages were simultaneously active in the genome of ancestral primates. In contrast, during the last 40 million years (Myr), i.e., during the evolution of anthropoid primates, a single lineage of families has evolved and amplified. We found that novel (i.e., unrelated) regulatory regions (5'UTR) have been frequently recruited during the evolution of L1, whereas the two open-reading frames (ORF1 and ORF2) have remained relatively conserved. We found that L1 families coexisted and formed independently evolving L1 lineages only when they had different 5'UTRs. We propose that L1 families with different 5'UTRs can coexist because they do not rely on the same host-encoded factors for their transcription and therefore do not compete with each other. The most prolific L1 families (families L1PA8 to L1PA3) amplified between 40 and 12 Mya. This period of high activity corresponds to an episode of adaptive evolution in a segment of ORF1. The correlation between the high activity of L1 families and adaptive evolution could result from the coevolution of L1 and a host-encoded repressor of L1 activity.

Source
DOI: http://dx.doi.org/10.1101/gr.4001406
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1356131
January 2006

A novel abundant family of retroposed elements (DAS-SINEs) in the nine-banded armadillo (Dasypus novemcinctus).

Mol Biol Evol 2005 Apr;22(4):886-93. Epub 2004 Dec 22.

Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany.

About half of the mammalian genome is composed of retroposons. Long interspersed elements (LINEs) and short interspersed elements (SINEs) are the most abundant repetitive elements and account for about 21% and 13% of the human genome, respectively. SINEs have been detected in all major mammalian lineages, except for the South American order Xenarthra, also termed Edentata (armadillos, anteaters, and sloths). Investigating this order, we discovered a novel high-copy-number family of tRNA-derived SINEs in the nine-banded armadillo Dasypus novemcinctus, a species that successfully crossed the Central American land bridge to North America in the Pliocene. We developed a specific computer algorithm and used it to detect and extract 687 specific SINEs from databases. These elements, termed DAS-SINEs, were further divided into six distinct subfamilies. We extracted tRNA(Ala)-derived monomers, two types of dimers, and three subfamilies of chimeric fusion products of a tRNA(Ala) domain and an approximately 180-nt sequence of thus far unidentified origin. Comparisons of secondary structures of the DAS-SINEs' tRNA domains suggest selective pressure to maintain a tRNA-like D-arm structure in the respective founder RNAs, as shown by compensatory mutations. By analysis of subfamily-specific genetic variability, comparison of the proportion of direct repeats, and analysis of self-integrations as well as key events of dimerization and deletions or insertions, we were able to delineate the evolutionary history of the DAS-SINE subfamilies.

Source
DOI: http://dx.doi.org/10.1093/molbev/msi071
April 2005

Aligning multiple genomic sequences with the threaded blockset aligner.

Genome Res 2004 Apr;14(4):708-15

Howard Hughes Medical Institute, University of California at Santa Cruz, Santa Cruz, California 95064, USA.

We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.

Source
DOI: http://dx.doi.org/10.1101/gr.1933104
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC383317
April 2004

Patterns of insertions and their covariation with substitutions in the rat, mouse, and human genomes.

Genome Res 2004 Apr;14(4):517-27

Department of Biochemistry and Molecular Biology, Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.

The rates at which human genomic DNA changes by neutral substitution and insertion of certain families of transposable elements covary in large, megabase-sized segments. We used the rat, mouse, and human genomic DNA sequences to examine these processes in more detail in comparisons over both shorter (rat-mouse) and longer (rodent-primate) times, and demonstrated the generality of the covariation. Different families of transposable elements show distinctive insertion preferences and patterns of variation with substitution rates. SINEs are more abundant in GC-rich DNA, but the regional GC preference for insertion (monitored in young SINEs) differs between rodents and humans. In contrast, insertions in the rodent genomes are predominantly LINEs, which prefer to insert into AT-rich DNA in all three mammals. The insertion frequency of repeats other than SINEs correlates strongly and positively with the frequency of substitutions in all species. However, correlations with SINEs show the opposite effects. The correlations are explained only in part by the GC content, indicating that other factors also contribute to the inherent tendency of DNA segments to change over evolutionary time.

Source
DOI: http://dx.doi.org/10.1101/gr.1984404
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC383295
April 2004

Genome sequence of the Brown Norway rat yields insights into mammalian evolution.

Authors:
Richard A Gibbs George M Weinstock Michael L Metzker Donna M Muzny Erica J Sodergren Steven Scherer Graham Scott David Steffen Kim C Worley Paula E Burch Geoffrey Okwuonu Sandra Hines Lora Lewis Christine DeRamo Oliver Delgado Shannon Dugan-Rocha George Miner Margaret Morgan Alicia Hawes Rachel Gill Celera Robert A Holt Mark D Adams Peter G Amanatides Holly Baden-Tillson Mary Barnstead Soo Chin Cheryl A Evans Steve Ferriera Carl Fosler Anna Glodek Zhiping Gu Don Jennings Cheryl L Kraft Trixie Nguyen Cynthia M Pfannkoch Cynthia Sitter Granger G Sutton J Craig Venter Trevor Woodage Douglas Smith Hong-Mei Lee Erik Gustafson Patrick Cahill Arnold Kana Lynn Doucette-Stamm Keith Weinstock Kim Fechtel Robert B Weiss Diane M Dunn Eric D Green Robert W Blakesley Gerard G Bouffard Pieter J De Jong Kazutoyo Osoegawa Baoli Zhu Marco Marra Jacqueline Schein Ian Bosdet Chris Fjell Steven Jones Martin Krzywinski Carrie Mathewson Asim Siddiqui Natasja Wye John McPherson Shaying Zhao Claire M Fraser Jyoti Shetty Sofiya Shatsman Keita Geer Yixin Chen Sofyia Abramzon William C Nierman Paul H Havlak Rui Chen K James Durbin Amy Egan Yanru Ren Xing-Zhi Song Bingshan Li Yue Liu Xiang Qin Simon Cawley Kim C Worley A J Cooney Lisa M D'Souza Kirt Martin Jia Qian Wu Manuel L Gonzalez-Garay Andrew R Jackson Kenneth J Kalafus Michael P McLeod Aleksandar Milosavljevic Davinder Virk Andrei Volkov David A Wheeler Zhengdong Zhang Jeffrey A Bailey Evan E Eichler Eray Tuzun Ewan Birney Emmanuel Mongin Abel Ureta-Vidal Cara Woodwark Evgeny Zdobnov Peer Bork Mikita Suyama David Torrents Marina Alexandersson Barbara J Trask Janet M Young Hui Huang Huajun Wang Heming Xing Sue Daniels Darryl Gietzen Jeanette Schmidt Kristian Stevens Ursula Vitt Jim Wingrove Francisco Camara M Mar Albà Josep F Abril Roderic Guigo Arian Smit Inna Dubchak Edward M Rubin Olivier Couronne Alexander Poliakov Norbert Hübner Detlev Ganten Claudia Goesele Oliver Hummel Thomas Kreitler Young-Ae Lee Jan Monti Herbert Schulz Heike Zimdahl Heinz Himmelbauer Hans Lehrach Howard J Jacob Susan Bromberg Jo Gullings-Handley Michael I Jensen-Seaman Anne E Kwitek Jozef Lazar Dean Pasko Peter J Tonellato Simon Twigger Chris P Ponting Jose M Duarte Stephen Rice Leo Goodstadt Scott A Beatson Richard D Emes Eitan E Winter Caleb Webber Petra Brandt Gerald Nyakatura Margaret Adetobi Francesca Chiaromonte Laura Elnitski Pallavi Eswara Ross C Hardison Minmei Hou Diana Kolbe Kateryna Makova Webb Miller Anton Nekrutenko Cathy Riemer Scott Schwartz James Taylor Shan Yang Yi Zhang Klaus Lindpaintner T Dan Andrews Mario Caccamo Michele Clamp Laura Clarke Valerie Curwen Richard Durbin Eduardo Eyras Stephen M Searle Gregory M Cooper Serafim Batzoglou Michael Brudno Arend Sidow Eric A Stone J Craig Venter Bret A Payseur Guillaume Bourque Carlos López-Otín Xose S Puente Kushal Chakrabarti Sourav Chatterji Colin Dewey Lior Pachter Nicolas Bray Von Bing Yap Anat Caspi Glenn Tesler Pavel A Pevzner David Haussler Krishna M Roskin Robert Baertsch Hiram Clawson Terrence S Furey Angie S Hinrichs Donna Karolchik William J Kent Kate R Rosenbloom Heather Trumbower Matt Weirauch David N Cooper Peter D Stenson Bin Ma Michael Brent Manimozhiyan Arumugam David Shteynberg Richard R Copley Martin S Taylor Harold Riethman Uma Mudunuri Jane Peterson Mark Guyer Adam Felsenfeld Susan Old Stephen Mockrin Francis Collins

Nature 2004 Apr;428(6982):493-521

Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, Texas 77030, USA. http://www.hgsc.bcm.tmc.edu

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.

Source
DOI: http://dx.doi.org/10.1038/nature02426
April 2004