Publications by authors named "Jonathan Crabtree"

50 Publications

HMPDACC: a Human Microbiome Project Multi-omic data resource.

Nucleic Acids Res 2021 01;49(D1):D734-D742

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA.

The Human Microbiome Project (HMP) explored microbial communities of the human body in both healthy and disease states. Two phases of the HMP (HMP and iHMP) together generated >48TB of data (public and controlled access) from multiple, varied omics studies of both the microbiome and associated hosts. The Human Microbiome Project Data Coordination Center (HMPDACC) was established to provide a portal to access data and resources produced by the HMP. The HMPDACC provides a unified data repository, multi-faceted search functionality, analysis pipelines and standardized protocols to facilitate community use of HMP data. Recent efforts have been put toward making HMP data more findable, accessible, interoperable and reusable. HMPDACC resources are freely available at www.hmpdacc.org.
DOI: http://dx.doi.org/10.1093/nar/gkaa996
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778886
January 2021

Capture-based enrichment of Theileria parva DNA enables full genome assembly of first buffalo-derived strain and reveals exceptional intra-specific genetic diversity.

PLoS Negl Trop Dis 2020 10 29;14(10):e0008781. Epub 2020 Oct 29.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America.

Theileria parva is an economically important, intracellular, tick-transmitted parasite of cattle. A live vaccine against the parasite is effective against challenge from cattle-transmissible T. parva but not against genotypes originating from the African Cape buffalo, a major wildlife reservoir, prompting the need to characterize genome-wide variation within and between cattle- and buffalo-associated T. parva populations. Here, we describe a capture-based target enrichment approach that enables, for the first time, de novo assembly of nearly complete T. parva genomes derived from infected host cell lines. This approach has exceptionally high specificity and sensitivity and is successful for both cattle- and buffalo-derived T. parva parasites. De novo genome assemblies generated for cattle genotypes differ from the reference by ~54K single nucleotide polymorphisms (SNPs) throughout the 8.31 Mb genome, an average of 6.5 SNPs/kb. We report the first buffalo-derived T. parva genome, which is ~20 kb larger than the genome from the reference, cattle-derived, Muguga strain, and contains 25 new potential genes. The average non-synonymous nucleotide diversity (πN) per gene, between buffalo-derived T. parva and the Muguga strain, was 1.3%. This remarkably high level of genetic divergence is supported by an average Wright's fixation index (FST), genome-wide, of 0.44, reflecting a degree of genetic differentiation between cattle- and buffalo-derived T. parva parasites more commonly seen between, rather than within, species. These findings present clear implications for vaccine development, further demonstrated by the ability to assemble nearly all known antigens in the buffalo-derived strain, which will be critical in design of next generation vaccines. The DNA capture approach used provides a clear advantage in specificity over alternative T. parva DNA enrichment methods used previously, such as those that utilize schizont purification, is less labor intensive, and enables in-depth comparative genomics in this apicomplexan parasite.
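The SNP density quoted in this abstract follows directly from the two figures it reports. A minimal sketch of that arithmetic, using the rounded values from the abstract (the function and variable names are illustrative, not from the paper):

```python
# Rounded figures quoted in the abstract above.
snp_count = 54_000          # ~54K SNPs relative to the reference
genome_size_bp = 8_310_000  # 8.31 Mb genome


def snps_per_kb(snps: int, genome_bp: int) -> float:
    """Average number of SNPs per 1,000 bp of genome."""
    return snps / (genome_bp / 1_000)


density = snps_per_kb(snp_count, genome_size_bp)
print(f"{density:.1f} SNPs/kb")  # rounds to the 6.5 SNPs/kb stated in the abstract
```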
DOI: http://dx.doi.org/10.1371/journal.pntd.0008781
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654785
October 2020

Interactive exploratory data analysis of Integrative Human Microbiome Project data using Metaviz.

F1000Res 2020 12;9:601. Epub 2020 Jun 12.

Department of Computer Science, University of Maryland, College Park, College Park, Maryland, 20742, USA.

The rich data produced by the second phase of the Human Microbiome Project (iHMP) offer a unique opportunity to test hypotheses that interactions between microbial communities and a human host might impact an individual's health or disease status. In this work we describe infrastructure that integrates Metaviz, an interactive microbiome data analysis and visualization tool, with the iHMP Data Coordination Center web portal and the R/Bioconductor package. We describe integrative statistical and visual analyses of two datasets from iHMP using Metaviz along with the R/Bioconductor package for statistical analysis of differential abundance. These use cases demonstrate the utility of a combined approach to access and analyze data from this resource.
DOI: http://dx.doi.org/10.12688/f1000research.24345.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7366035
June 2020

The TRUST Principles for digital repositories.

Sci Data 2020 05 14;7(1):144. Epub 2020 May 14.

RCSB, Protein Data Bank, Rutgers, The State University of New Jersey, Institute for Quantitative Biomedicine at Rutgers, New Jersey, USA.

DOI: http://dx.doi.org/10.1038/s41597-020-0486-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7224370
May 2020

A comprehensive non-redundant gene catalog reveals extensive within-community intraspecies diversity in the human vagina.

Nat Commun 2020 02 26;11(1):940. Epub 2020 Feb 26.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.

Analysis of metagenomic and metatranscriptomic data is complicated and typically requires extensive computational resources. Leveraging a curated reference database of genes encoded by members of the target microbiome can make these analyses more tractable. In this study, we assemble a comprehensive human vaginal non-redundant gene catalog (VIRGO) that includes 0.95 million non-redundant genes. The gene catalog is functionally and taxonomically annotated. We also construct a set of vaginal orthologous groups (VOG) from VIRGO. The gene-centric design of VIRGO and VOG provides an easily accessible tool to comprehensively characterize the structure and function of vaginal metagenome and metatranscriptome datasets. To highlight the utility of VIRGO, we analyze 1,507 additional vaginal metagenomes, and identify a high degree of intraspecies diversity within and across vaginal microbiota. VIRGO offers a convenient reference database and toolkit that will facilitate a more in-depth understanding of the role of vaginal microorganisms in women's health and reproductive outcomes.
DOI: http://dx.doi.org/10.1038/s41467-020-14677-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7044274
February 2020

Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential.

Genome Med 2020 01 8;12(1). Epub 2020 Jan 8.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.

Background: Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have been shown to provide significant protection against controlled human malaria infection (CHMI) in clinical trials. Initial CHMI studies showed significantly higher durable protection against homologous than heterologous strains, suggesting the presence of strain-specific vaccine-induced protection. However, interpretation of these results and understanding of their relevance to vaccine efficacy have been hampered by the lack of knowledge on genetic differences between vaccine and CHMI strains, and how these strains are related to parasites in malaria endemic regions.

Methods: Whole genome sequencing using long-read (Pacific Biosciences) and short-read (Illumina) sequencing platforms was conducted to generate de novo genome assemblies for the vaccine strain, NF54, and for strains used in heterologous CHMI (7G8 from Brazil, NF166.C8 from Guinea, and NF135.C10 from Cambodia). The assemblies were used to characterize sequences in each strain relative to the reference 3D7 (a clone of NF54) genome. Strains were compared to each other and to a collection of clinical isolates (sequenced as part of this study or from public repositories) from South America, sub-Saharan Africa, and Southeast Asia.

Results: While few variants were detected between 3D7 and NF54, we identified tens of thousands of variants between NF54 and the three heterologous strains. These variants include SNPs, indels, and small structural variants that fall in regulatory and immunologically important regions, including transcription factors (such as PfAP2-L and PfAP2-G) and pre-erythrocytic antigens that may be key for sporozoite vaccine-induced protection. Additionally, these variants directly contributed to diversity in immunologically important regions of the genomes as detected through in silico CD8 T cell epitope predictions. Of all heterologous strains, NF135.C10 had the highest number of unique predicted epitope sequences when compared to NF54. Comparison to global clinical isolates revealed that these four strains are representative of their geographic origin despite long-term culture adaptation; of note, NF135.C10 is from an admixed population, and not part of recently formed subpopulations resistant to artemisinin-based therapies present in the Greater Mekong Sub-region.

Conclusions: These results will assist in the interpretation of vaccine efficacy of whole-organism vaccines against homologous and heterologous CHMI.
DOI: http://dx.doi.org/10.1186/s13073-019-0708-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6950926
January 2020

TwinBLAST: When Two Is Better than One.

Microbiol Resour Announc 2019 Aug 29;8(35). Epub 2019 Aug 29.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA.

Analysis of sequence read pairs can be essential for characterizing structural variation, including junction-spanning pairs of reads (JSPRs) suggesting recent lateral/horizontal gene transfer. TwinBLAST can be used to facilitate this analysis of JSPRs by enabling the visualization and curation of two BLAST reports side by side in a single interface.
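The idea of a junction-spanning read pair can be illustrated with a short sketch. This is not TwinBLAST code: the `MateHit` record, its `source` labels, and the rule "the two mates have best hits in different source genomes" are simplifying assumptions made here for illustration only.

```python
from typing import NamedTuple, Optional


class MateHit(NamedTuple):
    """Best BLAST hit for one mate of a read pair (hypothetical record)."""
    read_id: str
    source: Optional[str]  # e.g. "host" or "bacterium"; None when there is no hit


def is_junction_spanning(mate1: MateHit, mate2: MateHit) -> bool:
    """Flag a pair whose mates both have hits, but in different source genomes."""
    if mate1.source is None or mate2.source is None:
        return False
    return mate1.source != mate2.source


pairs = [
    (MateHit("pair1/1", "host"), MateHit("pair1/2", "bacterium")),
    (MateHit("pair2/1", "host"), MateHit("pair2/2", "host")),
    (MateHit("pair3/1", "bacterium"), MateHit("pair3/2", None)),
]
candidates = [m1.read_id.split("/")[0] for m1, m2 in pairs if is_junction_spanning(m1, m2)]
print(candidates)
```

Pairs flagged this way are only candidates; the abstract describes side-by-side inspection of the two BLAST reports as the curation step.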
DOI: http://dx.doi.org/10.1128/MRA.00842-19
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6715876
August 2019

Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium.

Nat Biotechnol 2017 Nov 2;35(11):1077-1086. Epub 2017 Oct 2.

Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

In order for human microbiome studies to translate into actionable outcomes for health, meta-analysis of reproducible data from population-scale cohorts is needed. Achieving sufficient reproducibility in microbiome research has proven challenging. We report a baseline investigation of variability in taxonomic profiling for the Microbiome Quality Control (MBQC) project baseline study (MBQC-base). Blinded specimen sets from human stool, chemostats, and artificial microbial communities were sequenced by 15 laboratories and analyzed using nine bioinformatics protocols. Variability depended most on biospecimen type and origin, followed by DNA extraction, sample handling environment, and bioinformatics. Analysis of artificial community specimens revealed differences in extraction efficiency and bioinformatic classification. These results may guide researchers in experimental design choices for gut microbiome studies.
DOI: http://dx.doi.org/10.1038/nbt.3981
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5839636
November 2017

Strains, functions and dynamics in the expanded Human Microbiome Project.

Nature 2017 10 20;550(7674):61-66. Epub 2017 Sep 20.

Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA.

The characterization of baseline microbial and functional diversity in the human microbiome has enabled studies of microbiome-related disease, diversity, biogeography, and molecular function. The National Institutes of Health Human Microbiome Project has provided one of the broadest such characterizations so far. Here we introduce a second wave of data from the study, comprising 1,631 new metagenomes (2,355 total) targeting diverse body sites with multiple time points in 265 individuals. We applied updated profiling and assembly methods to provide new characterizations of microbiome personalization. Strain identification revealed subspecies clades specific to body sites; it also quantified species with phylogenetic diversity under-represented in isolate genomes. Body-wide functional profiling classified pathways into universal, human-enriched, and body site-enriched subsets. Finally, temporal analysis decomposed microbial variation into rapidly variable, moderately variable, and stable subsets. This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics.
DOI: http://dx.doi.org/10.1038/nature23889
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5831082
October 2017

CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

BMC Genomics 2017 04 27;18(1):332. Epub 2017 Apr 27.

Institute for Genome Sciences, Baltimore, MD, USA.

Background: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics.

Results: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2.

Conclusions: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
DOI: http://dx.doi.org/10.1186/s12864-017-3717-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408420
April 2017

Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data.

Microbiome 2017 Jan 25;5(1). Epub 2017 Jan 25.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Background: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate if publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome.

Results: Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify some read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, there was a correlation between the presence of Ralstonia spp. and two specific plates of acute myeloid leukemia (AML) samples. In the end, associations remained between Pseudomonas-like and Acinetobacter-like read pairs in AML, and Pseudomonas-like read pairs in stomach adenocarcinoma (STAD), that could not be explained through batch effects or systematic contamination as seen in other samples.

Conclusions: This approach suggests that it is possible to identify bacteria that may be present in human tumor samples from public genome sequencing data that can be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected.
DOI: http://dx.doi.org/10.1186/s40168-016-0224-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5264480
January 2017

Genome-wide diversity and gene expression profiling of Babesia microti isolates identify polymorphic genes that mediate host-pathogen interactions.

Sci Rep 2016 10 18;6:35284. Epub 2016 Oct 18.

Department of Internal Medicine, Section of Infectious Diseases, Yale School of Medicine, 15 York St., New Haven, CT 06520, USA.

Babesia microti, a tick-transmitted, intraerythrocytic protozoan parasite circulating mainly among small mammals, is the primary cause of human babesiosis. While most cases are transmitted by Ixodes ticks, the disease may also be transmitted through blood transfusion and perinatally. A comprehensive analysis of genome composition, genetic diversity, and gene expression profiling of seven B. microti isolates revealed that genetic variation in isolates from the Northeast United States is almost exclusively associated with genes encoding the surface proteome and secretome of the parasite. Furthermore, we found that polymorphism is restricted to a small number of genes, which are highly expressed during infection. In order to identify pathogen-encoded factors involved in host-parasite interactions, we screened a proteome array comprised of 174 B. microti proteins, including several predicted members of the parasite secretome. Using this immuno-proteomic approach we identified several novel antigens that trigger strong host immune responses during the onset of infection. The genomic and immunological data presented herein provide the first insights into the determinants of B. microti interaction with its mammalian hosts and their relevance for understanding the selective pressures acting on parasite evolution.
DOI: http://dx.doi.org/10.1038/srep35284
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5082761
October 2016

An integrated genomic and transcriptomic survey of mucormycosis-causing fungi.

Nat Commun 2016 07 22;7:12218. Epub 2016 Jul 22.

Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA.

Mucormycosis is a life-threatening infection caused by Mucorales fungi. Here we sequence 30 fungal genomes, and perform transcriptomics with three representative Rhizopus and Mucor strains and with human airway epithelial cells during fungal invasion, to reveal key host and fungal determinants contributing to pathogenesis. Analysis of the host transcriptional response to Mucorales reveals platelet-derived growth factor receptor B (PDGFRB) signaling as part of a core response to divergent pathogenic fungi; inhibition of PDGFRB reduces Mucorales-induced damage to host cells. The unique presence of CotH invasins in all invasive Mucorales, and the correlation between CotH gene copy number and clinical prevalence, are consistent with an important role for these proteins in mucormycosis pathogenesis. Our work provides insight into the evolution of this medically and economically important group of fungi, and identifies several molecular pathways that might be exploited as potential therapeutic targets.
DOI: http://dx.doi.org/10.1038/ncomms12218
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4961843
July 2016

Functional dynamics of the gut microbiome in elderly people during probiotic consumption.

mBio 2015 Apr 14;6(2). Epub 2015 Apr 14.

A mechanistic understanding of the purported health benefits conferred by consumption of probiotic bacteria has been limited by our knowledge of the resident gut microbiota and its interaction with the host. Here, we detail the impact of a single-organism probiotic, Lactobacillus rhamnosus GG ATCC 53103 (LGG), on the structure and functional dynamics (gene expression) of the gut microbiota in a study of 12 healthy individuals, 65 to 80 years old. The analysis revealed that while the overall community composition was stable as assessed by 16S rRNA profiling, the transcriptional response of the gut microbiota was modulated by probiotic treatment. Comparison of transcriptional profiles based on taxonomic composition yielded three distinct transcriptome groups that displayed considerable differences in functional dynamics. The transcriptional profile of LGG in vivo was remarkably concordant across study subjects despite the considerable interindividual nature of the gut microbiota. However, we identified genes involved in flagellar motility, chemotaxis, and adhesion from Bifidobacterium and the dominant butyrate producers Roseburia and Eubacterium whose expression was increased during probiotic consumption, suggesting that LGG may promote interactions between key constituents of the microbiota and the host epithelium. These results provide evidence for the discrete functional effects imparted by a specific single-organism probiotic and challenge the prevailing notion that probiotics substantially modify the resident microbiota within nondiseased individuals in an appreciable fashion.

Importance: Probiotic bacteria have been used for over a century to promote digestive health. Many individuals report that probiotics alleviate a number of digestive issues, yet little evidence links how probiotic microbes influence human health. Here, we show how the resident microbes that inhabit the healthy human gut respond to a probiotic. The well-studied probiotic Lactobacillus rhamnosus GG ATCC 53103 (LGG) was administered in a clinical trial, and a suite of measurements of the resident microbes were taken to evaluate potential changes over the course of probiotic consumption. We found that LGG transiently enriches for functions to potentially promote anti-inflammatory pathways in the resident microbes.
DOI: http://dx.doi.org/10.1128/mBio.00231-15
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4453556
April 2015

Genomic epidemiology of the Haitian cholera outbreak: a single introduction followed by rapid, extensive, and continued spread characterized the onset of the epidemic.

mBio 2014 Nov 4;5(6):e01721. Epub 2014 Nov 4.

For centuries, cholera has been one of the most feared diseases. The causative agent Vibrio cholerae is a waterborne Gram-negative enteric pathogen eliciting a severe watery diarrheal disease. In October 2010, the seventh pandemic reached Haiti, a country that had not experienced cholera for more than a century. By using whole-genome sequence typing and mapping strategies of 116 serotype O1 strains from global sources, including 44 Haitian genomes, we present a detailed reconstructed evolutionary history of the seventh pandemic with a focus on the Haitian outbreak. We catalogued subtle genomic alterations at the nucleotide level in the genome core and architectural rearrangements from whole-genome map comparisons. Isolates closely related to the Haitian isolates caused several recent outbreaks in southern Asia. This study provides evidence for a single-source introduction of cholera from Nepal into Haiti followed by rapid, extensive, and continued clonal expansion. The phylogeographic patterns in both southern Asia and Haiti argue for the rapid dissemination of V. cholerae across the landscape necessitating real-time surveillance efforts to complement the whole-genome epidemiological analysis. As eradication efforts move forward, phylogeographic knowledge will be important for identifying persistent sources and monitoring success at regional levels. The results of molecular and epidemiological analyses of this outbreak suggest that an indigenous Haitian source of V. cholerae is unlikely and that an indigenous source has not contributed to the genomic evolution of this clade.

Importance: In this genomic epidemiology study, we have applied high-resolution whole-genome-based sequence typing methodologies on a comprehensive set of genome sequences that have become available in the aftermath of the Haitian cholera epidemic. These sequence resources enabled us to reassess the degree of genomic heterogeneity within the Vibrio cholerae O1 serotype and to refine boundaries and evolutionary relationships. The established phylogenomic framework showed how outbreak isolates fit into the global phylogeographic patterns compared to a comprehensive globally and temporally diverse strain collection and provides strong molecular evidence that points to a nonindigenous source of the 2010 Haitian cholera outbreak and refines epidemiological standards used in outbreak investigations for outbreak inclusion/exclusion following the concept of genomic epidemiology. The generated phylogenomic data have major public health relevance in translating sequence-based information to assist in future diagnostic, epidemiological, surveillance, and forensic studies of cholera.
DOI: http://dx.doi.org/10.1128/mBio.01721-14
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222100
November 2014

Circleator: flexible circular visualization of genome-associated data with BioPerl and SVG.

Bioinformatics 2014 Nov 29;30(21):3125-7. Epub 2014 Jul 29.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201; Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201; i3 Institute, University of Technology, Sydney, PO Box 123 Broadway NSW 2007, Australia; Department of Microbial Pathogenesis, University of Maryland Dental School, Baltimore, MD 21201; Center for Health-Related Informatics and Bioimaging, University of Maryland, College Park, MD 20740; and Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201, USA.

Summary: Circleator is a Perl application that generates circular figures of genome-associated data. It leverages BioPerl to support standard annotation and sequence file formats and produces publication-quality SVG output. It is designed to be both flexible and easy to use. It includes a library of circular track types and predefined configuration files for common use cases, including: (i) visualizing gene annotation and DNA sequence data from a GenBank flat file, (ii) displaying patterns of gene conservation in related microbial strains, (iii) showing single nucleotide polymorphisms (SNPs) and indels relative to a reference genome and gene set and (iv) viewing RNA-Seq plots.

Availability And Implementation: Circleator is freely available under the Artistic License 2.0 from http://jonathancrabtree.github.io/Circleator/ and is integrated with the CloVR cloud-based sequence analysis Virtual Machine (VM), which can be downloaded from http://clovr.org or run on Amazon EC2.
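The core geometry behind a circular genome figure can be sketched in a few lines. The example below is written in Python rather than Perl and does not use Circleator or BioPerl; the function names, the clockwise-from-12-o'clock convention, and the example coordinates are all assumptions chosen for illustration.

```python
import math


def position_to_xy(pos_bp, genome_len_bp, radius, cx, cy):
    """Map a genomic position to a point on a circle.

    Position 0 sits at 12 o'clock and positions increase clockwise.
    The SVG y axis points down, hence the subtraction for y.
    """
    angle = 2 * math.pi * pos_bp / genome_len_bp
    return cx + radius * math.sin(angle), cy - radius * math.cos(angle)


def feature_arc_path(start_bp, end_bp, genome_len_bp, radius, cx, cy):
    """Build SVG path data for a clockwise arc covering start_bp..end_bp."""
    x1, y1 = position_to_xy(start_bp, genome_len_bp, radius, cx, cy)
    x2, y2 = position_to_xy(end_bp, genome_len_bp, radius, cx, cy)
    # The large-arc flag is needed when the feature covers more than half the circle.
    large_arc = 1 if (end_bp - start_bp) / genome_len_bp > 0.5 else 0
    # A sweep flag of 1 draws the arc clockwise in SVG screen coordinates.
    return f"M {x1:.2f} {y1:.2f} A {radius} {radius} 0 {large_arc} 1 {x2:.2f} {y2:.2f}"


# Hypothetical feature on the first quarter of a 1,000 bp circular genome.
path = feature_arc_path(0, 250, 1000, 100, 200, 200)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="400">'
    f'<path d="{path}" fill="none" stroke="black" stroke-width="8"/>'
    "</svg>"
)
print(svg)
```

Stacking several such arcs at different radii gives concentric tracks, which is the general layout a circular genome plot uses.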
DOI: http://dx.doi.org/10.1093/bioinformatics/btu505
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201160
November 2014

Draft Genome Sequences of Human Pathogenic Fungus Geomyces pannorum Sensu Lato and Bat White Nose Syndrome Pathogen Geomyces (Pseudogymnoascus) destructans.

Genome Announc 2013 Dec 19;1(6). Epub 2013 Dec 19.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA.

We report the draft genome sequences of Geomyces pannorum sensu lato and Geomyces (Pseudogymnoascus) destructans. G. pannorum has a larger proteome than G. destructans, containing more proteins with ascribed enzymatic functions. This dichotomy in the genomes of related psychrophilic fungi is a valuable target for defining their distinct saprobic and pathogenic attributes.
DOI: http://dx.doi.org/10.1128/genomeA.01045-13
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3868853
December 2013

Using Sybil for interactive comparative genomics of microbes on the web.

Bioinformatics 2012 Jan 24;28(2):160-6. Epub 2011 Nov 24.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Motivation: Analysis of multiple genomes requires sophisticated tools that provide search, visualization, interactivity and data export. Comparative genomics datasets tend to be large and complex, making development of these tools difficult. In addition to scalability, comparative genomics tools must also provide user-friendly interfaces such that the research scientist can explore complex data with minimal technical expertise.

Results: We describe a new version of the Sybil software package and its application to the important human pathogen Streptococcus pneumoniae. This new software provides a feature-rich set of comparative genomics tools for inspection of multiple genome structures, mining of orthologous gene families and identification of potential vaccine candidates.

Availability: The S.pneumoniae resource is online at http://strepneumo-sybil.igs.umaryland.edu. The software, database and website are available for download as a portable virtual machine and from http://sourceforge.net/projects/sybil.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btr652
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259440
January 2012

The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources.

Nucleic Acids Res 2012 Jan 12;40(Database issue):D653-9. Epub 2011 Nov 12.

Department of Genetics, Stanford University Medical School, Stanford, CA 94305-5120, USA.

The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.

Source
DOI: http://dx.doi.org/10.1093/nar/gkr875
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245136
January 2012

New resources for functional analysis of omics data for the genus Aspergillus.

BMC Genomics 2011 Oct 5;12:486. Epub 2011 Oct 5.

Institute of Biology Leiden, Leiden University, Sylviusweg 72, 2333 BE Leiden, The Netherlands.

Background: Detailed and comprehensive genome annotation can be considered a prerequisite for effective analysis and interpretation of omics data. As such, Gene Ontology (GO) annotation has become a well accepted framework for functional annotation. The genus Aspergillus comprises fungal species that are important model organisms, plant and human pathogens as well as industrial workhorses. However, GO annotation based on both computational predictions and extended manual curation has so far only been available for one of its species, namely A. nidulans.

Results: Based on protein homology, we mapped 97% of the 3,498 GO annotated A. nidulans genes to at least one of seven other Aspergillus species: A. niger, A. fumigatus, A. flavus, A. clavatus, A. terreus, A. oryzae and Neosartorya fischeri. GO annotation files compatible with diverse publicly available tools have been generated and deposited online. To further improve their accessibility, we developed a web application for GO enrichment analysis named FetGOat and integrated GO annotations for all Aspergillus species with public genome sequences. Both the annotation files and the web application FetGOat are accessible via the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). To demonstrate the value of those new resources for functional analysis of omics data for the genus Aspergillus, we performed two case studies analyzing microarray data recently published for A. nidulans, A. niger and A. oryzae.

Conclusions: We mapped A. nidulans GO annotation to seven other Aspergilli. By depositing the newly mapped GO annotation online as well as integrating it into the web tool FetGOat, we provide new, valuable and easily accessible resources for omics data analysis and interpretation for the genus Aspergillus. Furthermore, we have given a general example of how a well annotated genome can help improving GO annotation of related species to subsequently facilitate the interpretation of omics data.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-12-486
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3217955
October 2011

Draft genome sequence of the oilseed species Ricinus communis.

Nat Biotechnol 2010 Sep 22;28(9):951-6. Epub 2010 Aug 22.

J. Craig Venter Institute, Rockville, Maryland, USA.

Castor bean (Ricinus communis) is an oilseed crop that belongs to the spurge (Euphorbiaceae) family, which comprises approximately 6,300 species that include cassava (Manihot esculenta), rubber tree (Hevea brasiliensis) and physic nut (Jatropha curcas). It is primarily of economic interest as a source of castor oil, used for the production of high-quality lubricants because of its high proportion of the unusual fatty acid ricinoleic acid. However, castor bean genomics is also relevant to biosecurity as the seeds contain high levels of ricin, a highly toxic, ribosome-inactivating protein. Here we report the draft genome sequence of castor bean (4.6-fold coverage), the first for a member of the Euphorbiaceae. Whereas most of the key genes involved in oil synthesis and turnover are single copy, the number of members of the ricin gene family is larger than previously thought. Comparative genomics analysis suggests the presence of an ancient hexaploidization event that is conserved across the dicotyledonous lineage.

Source
DOI: http://dx.doi.org/10.1038/nbt.1674
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945230
September 2010

A catalog of reference genomes from the human microbiome.

Science 2010 May;328(5981):994-9

The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.

Source
DOI: http://dx.doi.org/10.1126/science.1183605
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2940224
May 2010

Ergatis: a web interface and scalable software system for bioinformatics workflows.

Bioinformatics 2010 Jun 22;26(12):1488-92. Epub 2010 Apr 22.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users.

Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects.

Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btq167
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881353
June 2010

Continuing evolution of Burkholderia mallei through genome reduction and large-scale rearrangements.

Genome Biol Evol 2010 Jan 22;2:102-16. Epub 2010 Jan 22.

J. Craig Venter Institute, Rockville, Maryland, USA.

Burkholderia mallei (Bm), the causative agent of the predominately equine disease glanders, is a genetically uniform species that is very closely related to the much more diverse species Burkholderia pseudomallei (Bp), an opportunistic human pathogen and the primary cause of melioidosis. To gain insight into the relative lack of genetic diversity within Bm, we performed whole-genome comparative analysis of seven Bm strains and contrasted these with eight Bp strains. The Bm core genome (shared by all seven strains) is smaller in size than that of Bp, but the inverse is true for the variable gene sets that are distributed across strains. Interestingly, the biological roles of the Bm variable gene sets are much more homogeneous than those of Bp. The Bm variable genes are found mostly in contiguous regions flanked by insertion sequence (IS) elements, which appear to mediate excision and subsequent elimination of groups of genes that are under reduced selection in the mammalian host. The analysis suggests that the Bm genome continues to evolve through random IS-mediated recombination events, and differences in gene content may contribute to differences in virulence observed among Bm strains. The results are consistent with the view that Bm recently evolved from a single strain of Bp upon introduction into an animal host followed by expansion of IS elements, prophage elimination, and genome rearrangements and reduction mediated by homologous recombination across IS elements.

Source
DOI: http://dx.doi.org/10.1093/gbe/evq003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2839346
January 2010

The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

Nucleic Acids Res 2010 Jan 22;38(Database issue):D420-7. Epub 2009 Sep 22.

Department of Genetics, Stanford University Medical School, Stanford, CA 94305-5120, USA.

The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

Source
DOI: http://dx.doi.org/10.1093/nar/gkp751
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808984
January 2010

Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps.

Genome Biol 2008 Dec 30;9(12):R183. Epub 2008 Dec 30.

J Craig Venter Institute, Rockville, MD 20850, USA.

Background: Polydnaviruses, double-stranded DNA viruses with segmented genomes, have evolved as obligate endosymbionts of parasitoid wasps. Virus particles are replication deficient and produced by female wasps from proviral sequences integrated into the wasp genome. These particles are co-injected with eggs into caterpillar hosts, where viral gene expression facilitates parasitoid survival and, thereby, survival of proviral DNA. Here we characterize and compare the encapsidated viral genome sequences of bracoviruses in the family Polydnaviridae associated with Glyptapanteles gypsy moth parasitoids, along with near complete proviral sequences from which both viral genomes are derived.

Results: The encapsidated Glyptapanteles indiensis and Glyptapanteles flavicoxis bracoviral genomes, each composed of 29 different size segments, total approximately 517 and 594 kbp, respectively. They are generated from a minimum of seven distinct loci in the wasp genome. Annotation of these sequences revealed numerous novel features for polydnaviruses, including insect-like sugar transporter genes and transposable elements. Evolutionary analyses suggest that positive selection is widespread among bracoviral genes.

Conclusions: The structure and organization of G. indiensis and G. flavicoxis bracovirus proviral segments as multiple loci containing one to many viral segments, flanked and separated by wasp gene-encoding DNA, is confirmed. Rapid evolution of bracovirus genes supports the hypothesis of bracovirus genes in an 'arms race' between bracovirus and caterpillar. Phylogenetic analyses of the bracoviral genes encoding sugar transporters provides the first robust evidence of a wasp origin for some polydnavirus genes. We hypothesize transposable elements, such as those described here, could facilitate transfer of genes between proviral segments and host DNA.

Source
DOI: http://dx.doi.org/10.1186/gb-2008-9-12-r183
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646287
March 2009

IDEA: Interactive Display for Evolutionary Analyses.

BMC Bioinformatics 2008 Dec 8;9:524. Epub 2008 Dec 8.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA.

Background: The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency.

Results: We have developed IDEA (Interactive Display for Evolutionary Analyses), an intuitive graphical input and output interface which interacts with PHYLIP for phylogeny reconstruction and with codeml and baseml for molecular evolution analyses. IDEA's graphical input and visualization interfaces eliminate the need to edit and parse text input and output files, reducing the likelihood of errors and improving processing time. Further, its interactive output display gives the user immediate access to results. Finally, IDEA can process data in parallel on a local machine or computing grid, allowing genome-wide analyses to be completed quickly.

Conclusion: IDEA provides a graphical user interface that allows the user to follow a codeml or baseml analysis from parameter input through to the exploration of results. Novel options streamline the analysis process, and post-analysis visualization of phylogenies, evolutionary rates and selective constraint along protein sequences simplifies the interpretation of results. The integration of these functions into a single tool eliminates the need for lengthy data handling and parsing, significantly expediting access to global patterns in the data.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-9-524
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655098
December 2008

The HuRef Browser: a web resource for individual human genomics.

Nucleic Acids Res 2009 Jan 26;37(Database issue):D1018-24. Epub 2008 Nov 26.

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.

The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at http://huref.jcvi.org.

Source
DOI: http://dx.doi.org/10.1093/nar/gkn939
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686481
January 2009

Comparative genomics of the neglected human malaria parasite Plasmodium vivax.

Nature 2008 Oct;455(7214):757-63

The Institute for Genomic Research/J. Craig Venter Institute, 9704 Medical Research Drive, Rockville, Maryland 20850, USA.

The human malaria parasite Plasmodium vivax is responsible for 25-40% of the approximately 515 million annual cases of malaria worldwide. Although seldom fatal, the parasite elicits severe and incapacitating clinical symptoms and often causes relapses months after a primary infection has cleared. Despite its importance as a major human pathogen, P. vivax is little studied because it cannot be propagated continuously in the laboratory except in non-human primates. We sequenced the genome of P. vivax to shed light on its distinctive biological features, and as a means to drive development of new drugs and vaccines. Here we describe the synteny and isochore structure of P. vivax chromosomes, and show that the parasite resembles other malaria parasites in gene content and metabolic potential, but possesses novel gene families and potential alternative invasion pathways not recognized previously. Completion of the P. vivax genome provides the scientific community with a valuable resource that can be used to advance investigation into this neglected species.

Source
DOI: http://dx.doi.org/10.1038/nature07327
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2651158
October 2008