Publications by authors named "Jan P Buchmann"

22 Publications

Collecting and managing taxonomic data with NCBI-taxonomist.

Bioinformatics 2020 Dec 16. Epub 2020 Dec 16.

Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia.

Summary: We present NCBI-taxonomist - a command-line tool written in Python that collects and manages taxonomic data from the National Center for Biotechnology Information (NCBI). NCBI-taxonomist does not depend on a pre-downloaded taxonomic database but can store data locally. NCBI-taxonomist has six commands to map, collect, extract, resolve, import and group taxonomic data that can be linked together to create powerful analytical pipelines. Because many life science databases use the same taxonomic information, the data managed by NCBI-taxonomist is not limited to NCBI and can be used to find data linked to taxonomic information present in other scientific databases.

Availability And Implementation: NCBI-taxonomist is implemented in Python 3 (≥3.8) and available at https://gitlab.com/janpb/ncbi-taxonomist and via PyPI (https://pypi.org/project/ncbi-taxonomist/), as a Docker container (https://gitlab.com/janpb/ncbi-taxonomist/container_registry/) and as a Singularity (v3.5.3) image (https://cloud.sylabs.io/library/jpb/ncbi-taxonomist). NCBI-taxonomist is licensed under the GPLv3.

Supplementary Information: https://ncbi-taxonomist.readthedocs.io/en/latest/.
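
As a rough illustration of what resolving and grouping taxonomic data involves, the following Python sketch walks parent links in a small lookup table. It does not call NCBI-taxonomist or reproduce its commands; the `PARENT` table and the functions `resolve_lineage` and `group_by_ancestor` are hypothetical, and the table is deliberately simplified (intermediate nodes are skipped, so the parent links are not the real NCBI ones).

```python
# Toy stand-in for locally stored taxonomy: taxid -> (name, parent taxid).
# Assumption: simplified on purpose; intermediate nodes are left out, so
# these parent links do not mirror the NCBI taxonomy database.
PARENT = {
    9606: ("Homo sapiens", 9605),
    9605: ("Homo", 9604),
    9604: ("Hominidae", 9443),
    9443: ("Primates", 40674),
    40674: ("Mammalia", None),
}


def resolve_lineage(taxid, table=PARENT):
    """Follow parent links from taxid to the root; return names, root first."""
    lineage = []
    seen = set()
    while taxid is not None:
        if taxid in seen:
            # Guard against malformed tables with circular parent links.
            raise ValueError(f"cycle detected at taxid {taxid}")
        seen.add(taxid)
        name, parent = table[taxid]
        lineage.append(name)
        taxid = parent
    return lineage[::-1]


def group_by_ancestor(taxids, ancestor_name, table=PARENT):
    """Return the taxids whose lineage contains the named ancestor."""
    return [t for t in taxids if ancestor_name in resolve_lineage(t, table)]


if __name__ == "__main__":
    print(" > ".join(resolve_lineage(9606)))
    print(group_by_ancestor([9606, 40674], "Primates"))
```

Chaining small functions like these mirrors the idea of linking commands into a pipeline: the output of one step (a lineage) is the input of the next (a grouping).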

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa1027
December 2020

NCBI's Virus Discovery Codeathon: Building "FIVE" - The Federated Index of Viral Experiments API Index.

Viruses 2020 12 10;12(12). Epub 2020 Dec 10.

National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA.

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives for accessing or querying information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.

Source
DOI: http://dx.doi.org/10.3390/v12121424
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7764237
December 2020

CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data.

Genome Biol 2020 04 28;21(1):103. Epub 2020 Apr 28.

Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia.

There is an increasing demand for accurate and fast metagenome classifiers that can identify not only bacteria but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02014-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189439
April 2020

Correction for Batovska et al., "Coding-Complete Genome Sequence of Yada Yada Virus, a Novel Alphavirus Detected in Australian Mosquitoes".

Microbiol Resour Announc 2020 Mar 5;9(10). Epub 2020 Mar 5.

Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, Victoria, Australia.

Source
DOI: http://dx.doi.org/10.1128/MRA.00103-20
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7171210
March 2020

Coding-Complete Genome Sequence of Yada Yada Virus, a Novel Alphavirus Detected in Australian Mosquitoes.

Microbiol Resour Announc 2020 Jan 9;9(2). Epub 2020 Jan 9.

Agriculture Victoria Research, AgriBio Centre for AgriBioscience, Bundoora, Victoria, Australia.

Here, we report the detection of a novel alphavirus in Australian mosquitoes, provisionally named Yada Yada virus (YYV). Phylogenetic analysis indicated that YYV belongs to the mosquito-specific alphavirus complex. The assembled genome is 11,612 nucleotides in length and encodes two open reading frames.

Source
DOI: http://dx.doi.org/10.1128/MRA.01476-19
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6952676
January 2020

NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements.

Genes (Basel) 2019 09 16;10(9). Epub 2019 Sep 16.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.

A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams, comprising over 40 participants from six countries, assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds, mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
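
The length-filtering step mentioned in the abstract (keeping contigs of at least 1 kb) can be sketched in a few lines of Python. This is only an illustration under assumptions: the hackathon's actual scripts are not described here, and the names `read_fasta` and `filter_contigs` as well as the example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=1000):
    """Keep only contigs that are at least min_length nucleotides long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    # Hypothetical input: one 1200-nt contig and one 400-nt contig.
    example = [">contig_1", "ACGT" * 300, ">contig_2", "ACGT" * 100]
    for header, seq in filter_contigs(read_fasta(example)):
        print(header, len(seq))
```

Because `read_fasta` is a generator, the same functions could be applied to an open file handle without loading all contigs into memory at once.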

Source
DOI: http://dx.doi.org/10.3390/genes10090714
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6771016
September 2019

Entrezpy: a Python library to dynamically interact with the NCBI Entrez databases.

Bioinformatics 2019 11;35(21):4511-4514

Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.

Summary: Entrezpy is a Python library that automates the querying and downloading of data from the Entrez databases at the National Center for Biotechnology Information by interacting with E-Utilities. Entrezpy implements complex queries by automatically creating E-Utility parameters from the results obtained, which can then be used directly in subsequent queries. Entrezpy also allows the user to cache and retrieve results locally, implements interactions with all Entrez databases as part of an analysis pipeline and adjusts parameters within an ongoing query or using prior results. Entrezpy's modular design makes it easy to extend and adjust existing E-Utility functions.

Availability And Implementation: Entrezpy is implemented in Python 3 (≥3.6) and depends only on the Python Standard Library. It is available via PyPI (https://pypi.org/project/entrezpy/) and at https://gitlab.com/ncbipy/entrezpy.git. Entrezpy is licensed under the LGPLv3; its documentation is available at http://entrezpy.readthedocs.io/.
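
The idea of deriving the parameters of a follow-up query from an earlier result can be sketched with the Python Standard Library alone. The example below does not use Entrezpy and makes no claim about its API; it only builds E-Utilities URLs without sending any request. The function names are hypothetical, the sample response is made up, and its shape (lowercase `webenv` and `querykey` keys under `esearchresult`) is an assumption about what `esearch` returns with `retmode=json`.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an esearch URL that stores its result on the history server."""
    params = {"db": db, "term": term, "retmax": retmax,
              "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_params_from_search(db, search_result, rettype="fasta", retmode="text"):
    """Derive efetch parameters from a parsed esearch JSON response."""
    result = search_result["esearchresult"]
    return {
        "db": db,
        "query_key": result["querykey"],  # refers back to the stored search
        "WebEnv": result["webenv"],       # history-server session identifier
        "retmax": len(result["idlist"]),
        "rettype": rettype,
        "retmode": retmode,
    }


def efetch_url(params):
    """Build an efetch URL from a parameter dictionary."""
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Made-up response in the assumed esearch JSON shape; no request is sent.
    sample = {"esearchresult": {"count": "2", "idlist": ["111", "222"],
                                "querykey": "1", "webenv": "EXAMPLE_WEBENV"}}
    print(esearch_url("nucleotide", "Alphavirus[Organism]"))
    print(efetch_url(efetch_params_from_search("nucleotide", sample)))
```

Reusing `WebEnv` and `query_key` avoids sending long identifier lists back to the server, which is one reason chained queries are convenient to automate.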

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz385
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821292
November 2019

A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences.

Mol Biol Evol 2018 10;35(10):2572-2581

Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia.

Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.
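
A generic permutation test in the spirit of the approach described above can be sketched as follows. This is not the authors' implementation: the abstract does not specify the randomization scheme or the significance threshold, so shuffling individual nucleotides, defining an open reading frame as a stop-free run of codons, and the names `longest_orf` and `orf_length_pvalue` are all assumptions made for illustration.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq, frame=0):
    """Length in codons of the longest stop-free run in one forward frame.

    Assumption: an ORF is any run of codons without a stop codon; a start
    codon is not required in this simplified sketch.
    """
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def orf_length_pvalue(seq, frame, n_shuffles=1000, seed=0):
    """Estimate how often shuffled sequences match the observed longest ORF.

    Returns (observed_length, p_value). Shuffling nucleotides keeps the base
    composition but destroys codon order, giving a null distribution of ORF
    lengths expected by chance.
    """
    rng = random.Random(seed)
    observed = longest_orf(seq, frame)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if longest_orf("".join(bases), frame) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical toy sequence, far shorter than a real virus genome.
    toy = "ATG" + "GCA" * 40 + "TAA" + "TAGTGATAA" * 5
    print(orf_length_pvalue(toy, frame=0, n_shuffles=200))
```

A small p-value for a frame that overlaps a known gene would flag that ORF as a candidate; any real analysis would also need a correction for testing many frames and genomes.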

Source
DOI: http://dx.doi.org/10.1093/molbev/msy155
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188560
October 2018

The Biological Object Notation (BON): a structured file format for biological data.

Sci Rep 2018 06 25;8(1):9644. Epub 2018 Jun 25.

Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, Sydney, New South Wales, 2006, Australia.

The large size and high complexity of biological data can represent a major methodological challenge for the analysis and exchange of data sets between computers and applications. There has also been a substantial increase in the amount of metadata associated with biological data sets, which is being increasingly incorporated into existing data formats. Despite the existence of structured formats based on XML, biological data sets are mainly formatted using unstructured file formats, and the incorporation of metadata results in increasingly complex parsing routines such that they become more error prone. To overcome these problems, we present the "biological object notation" (BON) format, a new way to exchange and parse nearly all biological data sets more efficiently and with less error than other currently available formats. Based on JavaScript Object Notation (JSON), BON simplifies parsing by clearly separating the biological data from its metadata and reduces complexity compared to XML based formats. The ability to selectively compress data up to 87% compared to other file formats and the reduced complexity results in improved transfer times and less error prone applications.
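
The separation of data from metadata, and the idea of compressing only the data part, can be illustrated with a small JSON-based sketch. The record layout below is hypothetical and is not the BON specification; the key names, the `pack` and `unpack` functions and the use of zlib with Base64 encoding are assumptions chosen only to make the principle concrete.

```python
import base64
import json
import zlib


def pack(metadata, sequence, compress=True):
    """Serialize a record as JSON, optionally compressing only the sequence."""
    if compress:
        raw = zlib.compress(sequence.encode("ascii"))
        payload = base64.b64encode(raw).decode("ascii")
    else:
        payload = sequence
    record = {
        "metadata": metadata,  # stays human-readable either way
        "data": {"compressed": compress, "sequence": payload},
    }
    return json.dumps(record, sort_keys=True)


def unpack(text):
    """Parse a record and return (metadata, sequence)."""
    record = json.loads(text)
    data = record["data"]
    sequence = data["sequence"]
    if data["compressed"]:
        sequence = zlib.decompress(base64.b64decode(sequence)).decode("ascii")
    return record["metadata"], sequence


if __name__ == "__main__":
    # Hypothetical record; identifier and organism are placeholders.
    meta = {"id": "example_1", "organism": "example virus", "molecule": "RNA"}
    text = pack(meta, "ACGU" * 250)
    print(len(text), unpack(text)[0]["id"])
```

Because the metadata stays uncompressed, a parser can inspect or filter records without decoding the sequence payload at all.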

Source
DOI: http://dx.doi.org/10.1038/s41598-018-28016-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6018389
June 2018

Virological Sampling of Inaccessible Wildlife with Drones.

Viruses 2018 06 2;10(6). Epub 2018 Jun 2.

Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.

There is growing interest in characterizing the viromes of diverse mammalian species, particularly in the context of disease emergence. However, little is known about virome diversity in aquatic mammals, in part due to difficulties in sampling. We characterized the virome of the exhaled breath (or blow) of the Eastern Australian humpback whale (Megaptera novaeangliae). To achieve an unbiased survey of virome diversity, a meta-transcriptomic analysis was performed on 19 pooled whale blow samples collected via a purpose-built Unmanned Aerial Vehicle (UAV, or drone) approximately 3 km off the coast of Sydney, Australia during the 2017 winter annual northward migration from Antarctica to northern Australia. To our knowledge, this is the first time that UAVs have been used to sample viruses. Despite the relatively small number of animals surveyed in this initial study, we identified six novel virus species from five viral families. This work demonstrates the potential of UAVs in studies of virus disease, diversity, and evolution.

Source
DOI: http://dx.doi.org/10.3390/v10060300
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6024715
June 2018

17th Century Variola Virus Reveals the Recent History of Smallpox.

Curr Biol 2016 12 8;26(24):3407-3412. Epub 2016 Dec 8.

McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L8, Canada; Department of Biology, McMaster University, Hamilton, ON L8S 4L8, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4L8, Canada; Humans and the Microbiome Program, Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada.

Smallpox holds a unique position in the history of medicine. It was the first disease for which a vaccine was developed and remains the only human disease eradicated by vaccination. Although there have been claims of smallpox in Egypt, India, and China dating back millennia [1-4], the timescale of emergence of the causative agent, variola virus (VARV), and how it evolved in the context of increasingly widespread immunization, have proven controversial [4-9]. In particular, some molecular-clock-based studies have suggested that key events in VARV evolution only occurred during the last two centuries [4-6] and hence in apparent conflict with anecdotal historical reports, although it is difficult to distinguish smallpox from other pustular rashes by description alone. To address these issues, we captured, sequenced, and reconstructed a draft genome of an ancient strain of VARV, sampled from a Lithuanian child mummy dating between 1643 and 1665 and close to the time of several documented European epidemics [1, 2, 10]. When compared to vaccinia virus, this archival strain contained the same pattern of gene degradation as 20th century VARVs, indicating that such loss of gene function had occurred before ca. 1650. Strikingly, the mummy sequence fell basal to all currently sequenced strains of VARV on phylogenetic trees. Molecular-clock analyses revealed a strong clock-like structure and that the timescale of smallpox evolution is more recent than often supposed, with the diversification of major viral lineages only occurring within the 18th and 19th centuries, concomitant with the development of modern vaccination.

Source
DOI: http://dx.doi.org/10.1016/j.cub.2016.10.061
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5196022
December 2016

Fire blight disease reactome: RNA-seq transcriptional profile of apple host plant defense responses to Erwinia amylovora pathogen infection.

Sci Rep 2016 Feb 17;6:21600. Epub 2016 Feb 17.

Agroscope Changins-Wädenswil ACW, Plant Protection Division, Wädenswil, Switzerland.

The molecular basis of resistance and susceptibility of host plants to fire blight, a major disease threat to pome fruit production globally, is largely unknown. RNA-sequencing data from challenged and mock-inoculated flowers were analyzed to assess the susceptible response of apple to the fire blight pathogen Erwinia amylovora. In the presence of the pathogen, 1,080 transcripts were differentially expressed at 48 h post inoculation. These included putative disease resistance, stress, pathogen-related, general metabolic, and phytohormone-related genes. Reads mapped to regions of the apple genome where no genes were assigned were used to identify potential novel genes and open reading frames. To identify transcripts specifically expressed in response to E. amylovora, RT-PCRs were conducted and compared to the expression patterns of the fire blight biocontrol agent Pantoea vagans strain C9-1, another apple pathogen Pseudomonas syringae pv. papulans, and mock-inoculated apple flowers. This led to the identification of a peroxidase superfamily gene that was expressed at a lower level in response to E. amylovora, suggesting a potential role in the susceptibility response. Overall, this study provides the first transcriptional profile by RNA-seq of the host plant during fire blight disease and insights into the response of susceptible apple plants to E. amylovora.

Source
DOI: http://dx.doi.org/10.1038/srep21600
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4756370
February 2016

Cell Walls and the Convergent Evolution of the Viral Envelope.

Microbiol Mol Biol Rev 2015 Dec;79(4):403-18

Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences, and Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia.

Why some viruses are enveloped while others lack an outer lipid bilayer is a major question in viral evolution but one that has received relatively little attention. The viral envelope serves several functions, including protecting the RNA or DNA molecule(s), evading recognition by the immune system, and facilitating virus entry. Despite these commonalities, viral envelopes come in a wide variety of shapes and configurations. The evolution of the viral envelope is made more puzzling by the fact that nonenveloped viruses are able to infect a diverse range of hosts across the tree of life. We reviewed the entry, transmission, and exit pathways of all (101) viral families on the 2013 International Committee on Taxonomy of Viruses (ICTV) list. By doing this, we revealed a strong association between the lack of a viral envelope and the presence of a cell wall in the hosts these viruses infect. We were able to propose a new hypothesis for the existence of enveloped and nonenveloped viruses, in which the latter represent an adaptation to cells surrounded by a cell wall, while the former are an adaptation to animal cells where cell walls are absent. In particular, cell walls inhibit viral entry and exit, as well as viral transport within an organism, all of which are critical waypoints for successful infection and spread. Finally, we discuss how this new model for the origin of the viral envelope impacts our overall understanding of virus evolution.

Source
DOI: http://dx.doi.org/10.1128/MMBR.00017-15
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4651029
December 2015

Distinct lineages of Ebola virus in Guinea during the 2014 West African epidemic.

Nature 2015 Aug 24;524(7563):102-4. Epub 2015 Jun 24.

Institut Pasteur de Dakar, Arbovirus and Viral Hemorrhagic Fever Unit, BP 220, Dakar, Senegal.

An epidemic of Ebola virus disease of unprecedented scale has been ongoing for more than a year in West Africa. As of 29 April 2015, there have been 26,277 reported total cases (of which 14,895 have been laboratory confirmed) resulting in 10,899 deaths. The source of the outbreak was traced to the prefecture of Guéckédou in the forested region of southeastern Guinea. The virus later spread to the capital, Conakry, and to the neighbouring countries of Sierra Leone, Liberia, Nigeria, Senegal and Mali. In March 2014, when the first cases were detected in Conakry, the Institut Pasteur of Dakar, Senegal, deployed a mobile laboratory in Donka hospital to provide diagnostic services to the greater Conakry urban area and other regions of Guinea. Through this process we sampled 85 Ebola viruses (EBOV) from patients infected from July to November 2014, and report their full genome sequences here. Phylogenetic analysis reveals the sustained transmission of three distinct viral lineages co-circulating in Guinea, including the urban setting of Conakry and its surroundings. One lineage is unique to Guinea and closely related to the earliest sampled viruses of the epidemic. A second lineage contains viruses probably reintroduced from neighbouring Sierra Leone on multiple occasions, while a third lineage later spread from Guinea to Mali. Each lineage is defined by multiple mutations, including non-synonymous changes in the virion protein 35 (VP35), glycoprotein (GP) and RNA-dependent RNA polymerase (L) proteins. The viral GP is characterized by a glycosylation site modification and mutations in the mucin-like domain that could modify the outer shape of the virion. These data illustrate the ongoing ability of EBOV to develop lineage-specific and potentially phenotypically important variation.

Source
DOI: http://dx.doi.org/10.1038/nature14612
August 2015

Analysis of CACTA transposases reveals intron loss as major factor influencing their exon/intron structure in monocotyledonous and eudicotyledonous hosts.

Mob DNA 2014 Sep 1;5:24. Epub 2014 Sep 1.

Institute of Biotechnology, Viikki Biocenter, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland ; Biotechnology and Food Research, MTT Agrifood Research Finland, Myllytie 1, FIN-31600 Jokioinen, Finland.

Background: CACTA elements are DNA transposons and are found in numerous organisms. Despite their low activity, several thousand copies can be identified in many genomes. CACTA elements transpose using a 'cut-and-paste' mechanism, which is facilitated by a DDE transposase. DDE transposases from CACTA elements contain, despite their conserved function, different exon numbers among various CACTA families. While earlier studies analyzed the ancestral history of the DDE transposases, no studies have examined exon loss and gain with a view of mechanisms that could drive the changes.

Results: We analyzed 64 transposases from different CACTA families among monocotyledonous and eudicotyledonous host species. The annotation of the exon/intron boundaries showed a range from one to six exons. A robust multiple sequence alignment of the 64 transposases based on their protein sequences was created and used for phylogenetic analysis, which revealed eight different clades. We observed that the exon numbers in CACTA transposases are not specific for a host genome. We found that ancient CACTA lineages diverged before the divergence of monocotyledons and eudicotyledons. Most exon/intron boundaries were found in three distinct regions among all the transposases, grouping 63 conserved intron/exon boundaries.

Conclusions: We propose a model for the ancestral CACTA transposase gene, which consists of four exons, that predates the divergence of the monocotyledons and eudicotyledons. Based on this model, we propose pathways of intron loss or gain to explain the observed variation in exon numbers. While intron loss appears to have prevailed, a putative case of intron gain was nevertheless observed.

Source
DOI: http://dx.doi.org/10.1186/1759-8753-5-24
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158355
September 2014

The Tvv1 retrotransposon family is conserved between plant genomes separated by over 100 million years.

Theor Appl Genet 2014 May 4;127(5):1223-35. Epub 2014 Mar 4.

MTT/BI Plant Genomics Lab, Institute of Biotechnology, University of Helsinki, P.O. Box 65, Biocenter 3, Viikinkaari 1, 00014, Helsinki, Finland.

Key Message: Combining several different approaches, we have examined the structure, variability, and distribution of Tvv1 retrotransposons. Tvv1 is an unusual example of a low-copy retrotransposon metapopulation dispersed unevenly among very distant species and is promising for the development of molecular markers. Retrotransposons are ubiquitous throughout the genomes of the vascular plants, but individual retrotransposon families tend to be confined to the level of plant genus or at most family. This restricts the general applicability of a family as molecular markers. Here, we characterize a new plant retrotransposon named Tvv1_Sdem, a member of the Copia superfamily of LTR retrotransposons, from the genome of the wild potato Solanum demissum. Comparative analyses based on structure and sequence showed a high level of similarity of Tvv1_Sdem with Tvv1-VB, a retrotransposon previously described in the grapevine genome Vitis vinifera. Extending the analysis to other species by in silico and in vitro approaches revealed the presence of Tvv1 family members in potato, tomato, and poplar genomes, and led to the identification of full-length copies of Tvv1 in these species. We were also able to identify polymorphism in UTL sequences between Tvv1_Sdem copies from wild and cultivated potatoes that are useful as molecular markers. Combining different approaches, our results suggest that the Tvv1 family of retrotransposons has a monophyletic origin and has been maintained in both the rosids and the asterids, the major clades of dicotyledonous plants, since their divergence about 100 MYA. To our knowledge, Tvv1 represents an unusual plant retrotransposon metapopulation comprising highly similar members disjointedly dispersed among very distant species. The twin features of Tvv1 presence in evolutionarily distant genomes and the diversity of its UTL region in each species make it useful as a source of robust molecular markers for diversity studies and breeding.

Source
DOI: http://dx.doi.org/10.1007/s00122-014-2293-z
May 2014

The wheat powdery mildew genome shows the unique evolution of an obligate biotroph.

Nat Genet 2013 Sep 14;45(9):1092-6. Epub 2013 Jul 14.

Institute of Plant Biology, University of Zurich, Zurich, Switzerland.

Wheat powdery mildew, Blumeria graminis forma specialis tritici, is a devastating fungal pathogen with a poorly understood evolutionary history. Here we report the draft genome sequence of wheat powdery mildew, the resequencing of three additional isolates from different geographic regions and comparative analyses with the barley powdery mildew genome. Our comparative genomic analyses identified 602 candidate effector genes, with many showing evidence of positive selection. We characterize patterns of genetic diversity and suggest that mildew genomes are mosaics of ancient haplogroups that existed before wheat domestication. The patterns of diversity in modern isolates suggest that there was no pronounced loss of genetic diversity upon formation of the new host bread wheat 10,000 years ago. We conclude that the ready adaptation of B. graminis f.sp. tritici to the new host species was based on a diverse haplotype pool that provided great genetic potential for pathogen variation.

Source
DOI: http://dx.doi.org/10.1038/ng.2704
September 2013

Genotype-specific SNP map based on whole chromosome 3B sequence information from wheat cultivars Arina and Forno.

Plant Biotechnol J 2013 Jan 10;11(1):23-32. Epub 2012 Oct 10.

Institute of Plant Biology, University of Zurich, Zurich, Switzerland.

Agronomically important traits are frequently controlled by rare, genotype-specific alleles. Such genes can only be mapped in a population derived from the donor genotype. This requires the development of a specific genetic map, which is difficult in wheat because of the low level of polymorphism among elite cultivars. The absence of sufficient polymorphism, the complexity of the hexaploid wheat genome as well as the lack of complete sequence information make the construction of genetic maps with a high density of reproducible and polymorphic markers challenging. We developed a genotype-specific genetic map of chromosome 3B from winter wheat cultivars Arina and Forno. Chromosome 3B was isolated from the two cultivars and then sequenced to 10-fold coverage. This resulted in a single-nucleotide polymorphism (SNP) database of the complete chromosome. Based on proposed synteny with the Brachypodium model genome and gene annotation, sequences close to coding regions were used for the development of 70 SNP-based markers. They were mapped on an Arina × Forno Recombinant Inbred Lines population and found to be spread over the complete chromosome 3B. While overall synteny was well maintained, numerous exceptions and inversions of syntenic gene order were identified. Additionally, we found that the majority of recombination events occurred in distal parts of chromosome 3B, particularly in hot-spot regions. Compared with the earlier map based on SSR and RFLP markers, the number of markers increased fourfold. The approach presented here allows fast development of genotype-specific polymorphic markers that can be used for mapping and marker-assisted selection.

Source
DOI: http://dx.doi.org/10.1111/pbi.12003
January 2013

Inter-species sequence comparison of Brachypodium reveals how transposon activity corrodes genome colinearity.

Plant J 2012 Aug 18;71(4):550-63. Epub 2012 Jun 18.

Institute of Plant Biology, University of Zurich, Zollikerstraße 107, 8008 Zurich, Switzerland.

Intergenic sequences evolve rapidly in plant genomes through a process known as genomic turnover. To investigate the influence of DNA transposons on genomic turnover, we compared 1 Mbp of orthologous genomic sequences from Brachypodium distachyon and Brachypodium sylvaticum. We found that B. distachyon and B. sylvaticum diverged approximately 1.7-2.0 million years ago. Of a total of 219 genes identified on the analyzed sequences, 211 were colinear. However, only 24 transposable elements of a total of 451 were orthologous (i.e. inserted in the common ancestor). We characterized in detail 59 insertions and 60 excisions of DNA transposons in one or other species, which altered 17% of the intergenic space. The DNA transposon excision sites showed complex and highly diagnostic sequence motifs for double-strand break (DSB) repair. DNA transposon excisions can lead to extensive deletions of hundreds of base pairs of flanking sequence if the DSB is repaired by 'single-strand annealing', or insertions of up to several hundred base pairs of 'filler DNA' if the DSB is repaired by 'synthesis-dependent strand annealing'. In some cases, DSBs were repaired by a combination of both methods. We present a model for the evolution of intergenic sequences in which repair of DSBs upon DNA transposon excision is a major factor in the rapid turnover and erosion of intergenic sequences.

DOI: http://dx.doi.org/10.1111/j.1365-313X.2012.05007.x
August 2012

Comparative sequence analysis of wheat and barley powdery mildew fungi reveals gene colinearity, dates divergence and indicates host-pathogen co-evolution.

Fungal Genet Biol 2011 Mar 16;48(3):327-34. Epub 2010 Oct 16.

Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, Zurich, Switzerland.

The two fungal pathogens Blumeria graminis f. sp. tritici (B.g. tritici) and hordei (B.g. hordei) cause powdery mildew specifically in wheat and barley, respectively. They have the same life cycle, but their growth is restricted to the respective host. Here, we compared the sequences of two loci in both cereal mildews to determine their divergence time and their relationship with the evolution of their hosts. We sequenced a total of 273.3 kb derived from B.g. tritici BAC sequences and compared them with the orthologous regions in the B.g. hordei genome. Protein-coding genes were colinear and well conserved. In contrast, the intergenic regions showed very low conservation, mostly due to different integration patterns of transposable elements. To estimate the divergence time of B.g. tritici and B.g. hordei, we used conserved intergenic sequences including orthologous transposable elements. This revealed that B.g. tritici and B.g. hordei diverged about 10 million years ago (MYA), two million years after wheat and barley (12 MYA). These data suggest that B.g. tritici and B.g. hordei have co-evolved with their hosts during most of their evolutionary history after host divergence, possibly after a short phase of host expansion when the same pathogen could still grow on the two diverged hosts.

DOI: http://dx.doi.org/10.1016/j.fgb.2010.10.003
March 2011

Patching gaps in plant genomes results in gene movement and erosion of colinearity.

Genome Res 2010 Sep 7;20(9):1229-37. Epub 2010 Jun 7.

Institute of Plant Biology, University of Zurich, CH-8008 Zurich, Switzerland.

Colinearity of genes in plant genomes generally decreases with increasing evolutionary distance while the actual number of genes remains more or less constant. To characterize the molecular mechanisms of this "gene movement," we identified non-colinear genes by three-way comparison of the genomes of Brachypodium, rice, and sorghum. We found that genomic fragments of up to 50 kb containing the non-colinear genes are duplicated to acceptor sites elsewhere in the genome. Apparent movement of genes may usually be the result of subsequent deletions of genes in the donor region. Often, the duplicated fragments are precisely bordered by transposable elements (TEs) at the acceptor site. Highly diagnostic sequence motifs at these borders strongly suggest that these gene movements were the result of double-strand break (DSB) repair through synthesis-dependent strand annealing. In these cases, a copy of the foreign DNA fragment is used as filler DNA to repair the DSB linked with the transposition of TEs. Interestingly, most TEs we found associated with gene movement have a very low copy number in the genome and for several we did not find autonomous copies. This suggests that some of these elements spontaneously arose from unspecific interaction with TE proteins that are encoded by autonomous elements. Additionally, we found evidence that gene movements can also be caused when DSBs are repaired after template slippage or unequal crossing-over events. The observed frequency of gene movements can explain the erosion of gene colinearity between plant genomes during evolution.

DOI: http://dx.doi.org/10.1101/gr.107284.110
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2928501
September 2010

Fine mapping and syntenic integration of the semi-dwarfing gene sdw3 of barley.

Funct Integr Genomics 2010 Nov 13;10(4):509-21. Epub 2010 May 13.

Leibniz Institute of Plant Genetics and Crop Plant Research, Corrensstrasse 3, Gatersleben 06466, Germany.

The barley mutant allele sdw3 confers a gibberellin-insensitive, semi-dwarf phenotype with potential for breeding of new semi-dwarfed barley cultivars. Towards map-based cloning, sdw3 was delimited by high-resolution genetic mapping to a 0.04 cM interval in a "cold spot" of recombination of the proximal region of the short arm of barley chromosome 2H. Extensive synteny between the barley Sdw3 locus (Hvu_sdw3) and the orthologous regions (Osa_sdw3, Sbi_sdw3, Bsy_sdw3) of three other grass species (Oryza sativa, Sorghum bicolor, Brachypodium sylvaticum) allowed for efficient synteny-based marker saturation in the target interval. Comparative sequence analysis revealed colinearity for 23 out of the 38, 35, and 29 genes identified in Brachypodium, rice, and Sorghum, respectively. Markers co-segregating with Hvu_sdw3 were generated from two of these genes. Initial attempts at chromosome walking in barley were performed with seven orthologous gene probes that delimited physical distances of 223, 123, and 127 kb in Brachypodium, rice, and Sorghum, respectively. Six non-overlapping small bacterial artificial chromosome (BAC) clone contigs (cumulative length of 670 kb) were obtained, which indicated a considerably larger physical size of Hvu_sdw3. Low-pass sequencing of selected BAC clones from these barley contigs revealed a substantially lower gene frequency per physical distance and the presence of additional non-colinear genes. Four candidate genes for sdw3 were identified within barley BAC sequences that either co-segregated with the gene sdw3 or were located adjacent to these co-segregating genes. Identification of genic sequences in the sdw3 context provides tools for marker-assisted selection. Eventual identification of the actual gene will contribute new information for a basic understanding of the mechanisms underlying growth regulation in barley.

DOI: http://dx.doi.org/10.1007/s10142-010-0173-4
November 2010