Publications by authors named "Doreen Ware"

125 Publications

De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes.

Science 2021 08;373(6555):655-662

USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA.

We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.

Source
DOI: http://dx.doi.org/10.1126/science.abg5289
August 2021

Ranked Choice Voting for Representative Transcripts with TRaCE.

Bioinformatics 2021 Jul 23. Epub 2021 Jul 23.

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11768, USA.

Summary: Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an 'election' in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Based on the set of expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.

Availability And Implementation: Transcript ranking code can be found on GitHub at https://github.com/warelab/TRaCE.

Supplementary Information: Additional data are available in the GitHub repository.
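
The election described in the summary can be illustrated with a minimal ranked-choice (instant-runoff) sketch. This is not the TRaCE code: the function names, the toy ballots, and the use of protein length as the only tie-breaker are assumptions made for illustration, and the real tool also weighs criteria such as InterPro domain coverage.

```python
from collections import Counter

def instant_runoff(ballots, tiebreak):
    """Pick one winner by instant-runoff voting.

    ballots:  list of rankings (best first), one per RNA-seq sample.
    tiebreak: hypothetical dict of candidate -> score (higher is better),
              standing in for criteria such as protein length.
    """
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        # Each ballot counts for its highest-ranked surviving candidate.
        tally = Counter({c: 0 for c in candidates})
        for ballot in ballots:
            for c in ballot:
                if c in candidates:
                    tally[c] += 1
                    break
        total = sum(tally.values())
        leader = max(candidates, key=lambda c: (tally[c], tiebreak.get(c, 0), c))
        # A strict majority (or a single survivor) ends the election.
        if len(candidates) == 1 or tally[leader] * 2 > total:
            return leader
        # Otherwise eliminate the weakest candidate and recount.
        loser = min(candidates, key=lambda c: (tally[c], tiebreak.get(c, 0), c))
        candidates.remove(loser)

def rank_transcripts(ballots, tiebreak):
    """Order all transcripts by repeatedly electing a winner and removing it."""
    remaining = [list(b) for b in ballots]
    order = []
    while any(remaining):
        winner = instant_runoff([b for b in remaining if b], tiebreak)
        order.append(winner)
        remaining = [[c for c in b if c != winner] for b in remaining]
    return order

# Toy data: five samples each rank three transcripts of one gene.
ballots = [
    ["t1", "t2", "t3"],
    ["t1", "t3", "t2"],
    ["t2", "t1", "t3"],
    ["t3", "t2", "t1"],
    ["t2", "t3", "t1"],
]
protein_length = {"t1": 300, "t2": 280, "t3": 150}
print(rank_transcripts(ballots, protein_length))  # ['t2', 't1', 't3']
```

In this toy case no transcript has a majority of first choices, so t3 is eliminated and its vote transfers to t2, which then wins; the first element of the returned list plays the role of the canonical transcript.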

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab542
July 2021

Gene disruption by structural mutations drives selection in US rice breeding over the last century.

PLoS Genet 2021 03 18;17(3):e1009389. Epub 2021 Mar 18.

USDA-ARS, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi, United States of America.

The genetic basis of general plant vigor is of major interest to food producers, yet the trait is recalcitrant to genetic mapping because of the number of loci involved, their small effects, and linkage. Observations of heterosis in many crops suggest that recessive, malfunctioning versions of genes are a major cause of poor performance, yet we have little information on the mutational spectrum underlying these disruptions. To address this question, we generated a long-read assembly of a tropical japonica rice (Oryza sativa) variety, Carolina Gold, which allowed us to identify structural mutations (>50 bp) and orient them with respect to their ancestral state using the outgroup, Oryza glaberrima. Supporting prior work, we find substantial genome expansion in the sativa branch. While transposable elements (TEs) account for the largest share of size variation, the majority of events are not directly TE-mediated. Tandem duplications are the most common source of insertions and are highly enriched among 50-200 bp mutations. To explore the relative impact of various mutational classes on crop fitness, we then track these structural events over the last century of US rice improvement using 101 resequenced varieties. Within this material, a pattern of temporary hybridization between medium and long-grain varieties was followed by recent divergence. During this long-term selection, structural mutations that impact gene exons have been removed at a greater rate than intronic indels and single-nucleotide mutations. These results support the use of ab initio estimates of mutational burden, based on structural data, as an orthogonal predictor in genomic selection.

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1009389
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7971508
March 2021

Single-cell RNA sequencing of developing maize ears facilitates functional analysis and trait candidate gene discovery.

Dev Cell 2021 02 4;56(4):557-568.e6. Epub 2021 Jan 4.

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.

Crop productivity depends on activity of meristems that produce optimized plant architectures, including that of the maize ear. A comprehensive understanding of development requires insight into the full diversity of cell types and developmental domains and the gene networks required to specify them. Until now, these were identified primarily by morphology and insights from classical genetics, which are limited by genetic redundancy and pleiotropy. Here, we investigated the transcriptional profiles of 12,525 single cells from developing maize ears. The resulting developmental atlas provides a single-cell RNA sequencing (scRNA-seq) map of an inflorescence. We validated our results by mRNA in situ hybridization and by fluorescence-activated cell sorting (FACS) RNA-seq, and we show how these data may facilitate genetic studies by predicting genetic redundancy, integrating transcriptional networks, and identifying candidate genes associated with crop yield traits.

Source
DOI: http://dx.doi.org/10.1016/j.devcel.2020.12.015
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904613
February 2021

Amino Acid and Carbohydrate Metabolism Are Coordinated to Maintain Energetic Balance during Drought in Sugarcane.

Int J Mol Sci 2020 Nov 30;21(23). Epub 2020 Nov 30.

Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP 05508-000, Brazil.

The ability to expand crop plantations without irrigation is a major goal to increase agriculture sustainability. To achieve this end, we need to understand the mechanisms that govern plant growth responses under drought conditions. In this study, we combined physiological, transcriptomic, and genomic data to provide a comprehensive picture of drought and recovery responses in the leaves and roots of sugarcane. Transcriptomic profiling using oligoarrays and RNA-seq identified 2898 (out of 21,902) and 46,062 (out of 373,869) transcripts as differentially expressed, respectively. Co-expression analysis revealed modules enriched in photosynthesis, small molecule metabolism, alpha-amino acid metabolism, trehalose biosynthesis, serine family amino acid metabolism, and carbohydrate transport. Together, our findings reveal that carbohydrate metabolism is coordinated with the degradation of amino acids to provide carbon skeletons to the tricarboxylic acid cycle. This coordination may help to maintain energetic balance during drought stress adaptation, facilitating recovery after the stress is alleviated. Our results shed light on candidate regulatory elements and pave the way to biotechnology strategies towards the development of drought-tolerant sugarcane plants.

Source
DOI: http://dx.doi.org/10.3390/ijms21239124
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7729667
November 2020

Highly accurate long-read HiFi sequencing data for five complex genomes.

Sci Data 2020 11 17;7(1):399. Epub 2020 Nov 17.

Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA.

The PacBio HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

Source
DOI: http://dx.doi.org/10.1038/s41597-020-00743-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673114
November 2020

Gramene 2021: harnessing the power of comparative genomics and pathways for plant research.

Nucleic Acids Res 2021 01;49(D1):D1452-D1463

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.

Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes, with over 3.9 million genes in 122,947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.

Source
DOI: http://dx.doi.org/10.1093/nar/gkaa979
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779000
January 2021

Identification of the First Nuclear Male Sterility Gene (Male-sterile 9) in Sorghum.

Plant Genome 2019 11;12(3):1-12

Plant Stress and Germplasm Development Unit, USDA-ARS, Lubbock, TX, 79415.

Core Ideas: The male-sterile 9 (ms9) is a novel nuclear male-sterile mutant in sorghum. The Ms9 gene encodes a PHD-finger transcription factor critical for pollen development. The identification of the Ms9 gene provides a strategy to control male sterility in sorghum. Nuclear male sterility (NMS) is important for understanding microspore development and could facilitate the development of new strategies to control male sterility. Several NMS lines and mutants have been reported in sorghum [Sorghum bicolor (L.) Moench] previously. However, no male-sterile gene has been identified, hampering the utility of NMS in sorghum breeding. In this study, we characterized a new NMS mutant, male sterile 9 (ms9), which is distinct from all other reported NMS loci. The ms9 mutant is stable under a variety of environmental conditions. Homozygous ms9 plants produced normal ovaries but small pale-colored anthers that contained no pollen grains. Microscopic analyses revealed abnormal microspore development of ms9 at the midmicrospore stage, causing degeneration of microspores inside the anther lobes and male sterility of ms9 plants. Using MutMap, we identified the Ms9 gene as a plant homeotic domain (PHD)-finger transcription factor similar to Ms1 in Arabidopsis thaliana (L.) Heynh. and Ptc1 in rice (Oryza sativa L.). Ms9 is the first NMS gene identified in sorghum. Thus, the Ms9 gene and ms9 mutant provide new genetic tools for studying pollen development and controlling male sterility in sorghum.

Source
DOI: http://dx.doi.org/10.3835/plantgenome2019.03.0020
November 2019

BSAseq: an interactive and integrated web-based workflow for identification of causal mutations in bulked F2 populations.

Bioinformatics 2021 04;37(3):382-387

USDA-ARS Cropping Systems Research Laboratory, Lubbock, TX 79415, USA.

Summary: With the advance of next-generation sequencing technologies and reductions in the costs of these techniques, bulked segregant analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish true causal mutations from background mutations. In this study, we developed the BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the linked region (the region linked to the causal mutation) and an interactive Shiny web application for visualizing the results. We deeply sequenced a sorghum male-sterile parental line (ms8) to capture the majority of background mutations in our bulked F2 data. We applied the workflow to 11 bulked sorghum F2 populations and 1 rice F2 population and identified the true causal mutation in each population. The workflow is intuitive and straightforward, facilitating its adoption by users without bioinformatics analysis skills. We anticipate that the BSAseq workflow will be broadly applicable to the identification of causal mutations for many phenotypes of interest.

Availability And Implementation: BSAseq is freely available on https://www.sciapps.org/page/bsa.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa709
April 2021

Gapless assembly of maize chromosomes using long-read technologies.

Genome Biol 2020 05 20;21(1):121. Epub 2020 May 20.

Department of Genetics, University of Georgia, Athens, GA, 30602, USA.

Creating gapless telomere-to-telomere assemblies of complex genomes is one of the ultimate challenges in genomics. We use two independent assemblies and an optical map-based merging pipeline to produce a maize genome (B73-Ab10) composed of 63 contigs and a contig N50 of 162 Mb. This genome includes gapless assemblies of chromosome 3 (236 Mb) and chromosome 9 (162 Mb), and 53 Mb of the Ab10 meiotic drive haplotype. The data also reveal the internal structure of seven centromeres and five heterochromatic knobs, showing that the major tandem repeat arrays (CentC, knob180, and TR-1) are discontinuous and frequently interspersed with retroelements.
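
The contig N50 quoted above is a standard contiguity statistic: the length L such that contigs of length at least L together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative assumptions and are not taken from the B73-Ab10 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches half of the assembly size; the length of the
    contig that crosses that threshold is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy contig lengths in Mb (hypothetical values).
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

Here the assembly totals 290 Mb, and the two longest contigs (80 + 70 = 150 Mb) already cover more than half of it, so the N50 is 70 Mb.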

Source
DOI: http://dx.doi.org/10.1186/s13059-020-02029-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238635
May 2020

Effect of sequence depth and length in long-read assembly of the maize inbred NC358.

Nat Commun 2020 05 8;11(1):2288. Epub 2020 May 8.

Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA.

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-16037-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7211024
May 2020

Management, Analyses, and Distribution of the MaizeCODE Data on the Cloud.

Front Plant Sci 2020 03 31;11:289. Epub 2020 Mar 31.

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States.

MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.

Source
DOI: http://dx.doi.org/10.3389/fpls.2020.00289
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7136414
March 2020

Variant phasing and haplotypic expression from long-read sequencing in maize.

Commun Biol 2020 02 18;3(1):78. Epub 2020 Feb 18.

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.

Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm and root tissues, and identified allele-specific, gene-level and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.

Source
DOI: http://dx.doi.org/10.1038/s42003-020-0805-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7028979
February 2020

Haplotyping the Vitis collinear core genome with rhAmpSeq improves marker transferability in a diverse genus.

Nat Commun 2020 01 21;11(1):413. Epub 2020 Jan 21.

USDA-ARS, Grape Genetics Research Unit, Geneva, NY, 14456, USA.

Transferable DNA markers are essential for breeding and genetics. Grapevine (Vitis) breeders utilize disease resistance alleles from congeneric species ~20 million years divergent, but existing Vitis marker platforms have cross-species transfer rates as low as 2%. Here, we apply a marker strategy targeting the inferred Vitis core genome. Incorporating seven linked-read de novo assemblies and three existing assemblies, the Vitis collinear core genome is estimated to converge at 39.8 Mb (8.67% of the genome). Adding shotgun genome sequences from 40 accessions enables identification of conserved core PCR primer binding sites flanking polymorphic haplotypes with high information content. From these target regions, we develop 2,000 rhAmpSeq markers as a PCR multiplex and validate the panel in four biparental populations spanning the diversity of the Vitis genus, showing transferability increases to 91.9%. This marker development strategy should be widely applicable for genetic studies in many taxa, particularly those ~20 million years divergent.

Source
DOI: http://dx.doi.org/10.1038/s41467-019-14280-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6972940
January 2020

Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline.

Genome Biol 2019 12 16;20(1):275. Epub 2019 Dec 16.

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA.

Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.

Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
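
The performance metrics named in the Results can be computed from a confusion matrix using their standard definitions. The sketch below assumes the four counts (true and false positives and negatives) are already available, for example as numbers of base pairs compared against a curated annotation; the function name and the toy counts are illustrative assumptions, not values from the benchmark.

```python
def annotation_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are hypothetical counts (e.g. base pairs) of true positives,
    false positives, true negatives, and false negatives.
    """
    precision = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),            # recall of true TE sequence
        "specificity": tn / (tn + fp),            # non-TE sequence left unannotated
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "fdr": fp / (tp + fp),                    # equals 1 - precision
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of precision and sensitivity
    }

# Toy counts (hypothetical): 100 true TE units, 900 non-TE units.
metrics = annotation_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting FDR alongside specificity is useful when the negative class is large: in the toy counts specificity is close to 0.98 even though one in five annotated units is a false positive.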

Source
DOI: http://dx.doi.org/10.1186/s13059-019-1905-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913007
December 2019

A high-resolution gene expression atlas links dedicated meristem genes to key architectural traits.

Genome Res 2019 12 19;29(12):1962-1973. Epub 2019 Nov 19.

Center for Plant Molecular Biology, University of Tuebingen, 72076 Tuebingen, Germany.

The shoot apical meristem (SAM) orchestrates the balance between stem cell proliferation and organ initiation essential for postembryonic shoot growth. Meristems show a striking diversity in shape and size. How this morphological diversity relates to variation in plant architecture and the molecular circuitries driving it are unclear. By generating a high-resolution gene expression atlas of the vegetative maize shoot apex, we show here that distinct sets of genes govern the regulation and identity of stem cells in maize versus Arabidopsis. Cell identities in the maize SAM reflect the combinatorial activity of transcription factors (TFs) that drive the preferential, differential expression of individual members within gene families functioning in a plethora of cellular processes. Subfunctionalization thus emerges as a fundamental feature underlying cell identity. Moreover, we show that adult plant characters are, to a significant degree, regulated by gene circuitries acting in the SAM, with natural variation modulating agronomically important architectural traits enriched specifically near dynamically expressed SAM genes and the TFs that regulate them. Besides unique mechanisms of maize stem cell regulation, our atlas thus identifies key new targets for crop improvement.

Source
DOI: http://dx.doi.org/10.1101/gr.250878.119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886502
December 2019

Plant Reactome: a knowledgebase and resource for comparative pathway analysis.

Nucleic Acids Res 2020 01;48(D1):D1093-D1103

Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA.

Plant Reactome (https://plantreactome.gramene.org) is an open-source, comparative plant pathway knowledgebase of the Gramene project. It uses Oryza sativa (rice) as a reference species for manual curation of pathways and extends pathway knowledge to another 82 plant species via gene-orthology projection using the Reactome data model and framework. It currently hosts 298 reference pathways, including metabolic and transport pathways, transcriptional networks, hormone signaling pathways, and plant developmental processes. In addition to browsing plant pathways, users can upload and analyze their omics data, such as the gene-expression data, and overlay curated or experimental gene-gene interaction data to extend pathway knowledge. The curation team actively engages researchers and students on gene and pathway curation by offering workshops and online tutorials. The Plant Reactome supports, implements and collaborates with the wider community to make data and tools related to genes, genomes, and pathways Findable, Accessible, Interoperable and Re-usable (FAIR).

Source
DOI: http://dx.doi.org/10.1093/nar/gkz996
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145600
January 2020

Sorghum MSD3 Encodes an ω-3 Fatty Acid Desaturase that Increases Grain Number by Reducing Jasmonic Acid Levels.

Int J Mol Sci 2019 Oct 28;20(21). Epub 2019 Oct 28.

Plant Stress and Germplasm Development Unit, Cropping Systems Research Laboratory, U.S. Department of Agriculture-Agricultural Research Service, Lubbock, TX 79415, USA.

Grain number per panicle is an important component of grain yield in sorghum (Sorghum bicolor (L.)) and other cereal crops. Previously, we reported that mutations in multi-seeded 1 (MSD1) and MSD2 genes result in a two-fold increase in grain number per panicle due to the restoration of the fertility of the pedicellate spikelets, which invariably abort in natural sorghum accessions. Here, we report the identification of another gene, MSD3, which is also involved in the regulation of grain numbers in sorghum. Four bulked F2 populations from crosses between BTx623 and each of the independent mutants p6, p14, p21, and p24 were sequenced to 20× coverage of the whole genome on a HiSeq 2000 system. Bioinformatic analyses of the sequence data showed that one gene, Sorbi_3001G407600, harbored homozygous mutations in all four populations. This gene encodes a plastidial ω-3 fatty acid desaturase that catalyzes the conversion of linoleic acid (18:2) to linolenic acid (18:3), a substrate for jasmonic acid (JA) biosynthesis. The mutants had reduced levels of linolenic acid in both leaves and developing panicles that in turn decreased the levels of JA. Furthermore, the panicle phenotype was reversed by treatment with methyl-JA (MeJA). Our characterization of MSD1, MSD2, and now MSD3 demonstrates that JA-regulated processes are critical to the msd phenotype. The identification of the MSD3 gene reveals a new target that could be manipulated to increase grain number per panicle in sorghum, and potentially other cereal crops, through the genomic editing of functional orthologs.

Source
DOI: http://dx.doi.org/10.3390/ijms20215359
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6862555
October 2019

Double triage to identify poorly annotated genes in maize: The missing link in community curation.

PLoS One 2019 10 28;14(10):e0224086. Epub 2019 Oct 28.

DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.

The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100%, and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0224086
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6816542
March 2020

Ensembl Genomes 2020: enabling non-vertebrate genomic research.

Nucleic Acids Res 2020 01;48(D1):D689-D695

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.

Source
http://dx.doi.org/10.1093/nar/gkz890 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943047 (PMC)
January 2020

Fertility of Pedicellate Spikelets in Sorghum Is Controlled by a Jasmonic Acid Regulatory Module.

Int J Mol Sci 2019 Oct 8;20(19). Epub 2019 Oct 8.

Plant Stress and Germplasm Development Unit, Cropping Systems Research Laboratory, U.S. Department of Agriculture-Agricultural Research Service, Lubbock, TX 79415, USA.

As in other cereal crops, the panicles of sorghum (Sorghum bicolor (L.) Moench) comprise two types of floral spikelets (grass flowers). Only sessile spikelets (SSs) are capable of producing viable grains, whereas pedicellate spikelets (PSs) cease development after initiation and eventually abort. Consequently, grain number per panicle (GNP) is lower than the total number of flowers produced per panicle. The mechanism underlying this differential fertility is not well understood. To investigate this issue, we isolated a series of ethyl methane sulfonate (EMS)-induced multiseeded (msd) mutants that result in full spikelet fertility, effectively doubling GNP. Previously, we showed that MSD1 is a TCP (Teosinte branched/Cycloidea/PCF) transcription factor that regulates jasmonic acid (JA) biosynthesis, and ultimately floral sex organ development. Here, we show that MSD2 encodes a lipoxygenase (LOX) that catalyzes the first committed step of JA biosynthesis. Further, we demonstrate that MSD1 binds to the promoters of MSD2 and other JA pathway genes. Together, these results show that a JA-induced module regulates sorghum panicle development and spikelet fertility. The findings advance our understanding of inflorescence development and could lead to new strategies for increasing GNP and grain yield in sorghum and other cereal crops.

Source
http://dx.doi.org/10.3390/ijms20194951 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6801740 (PMC)
October 2019

Reviving the Transcriptome Studies: An Insight Into the Emergence of Single-Molecule Transcriptome Sequencing.

Front Genet 2019 26;10:384. Epub 2019 Apr 26.

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States.

Advances in transcriptomics have provided an exceptional opportunity to study functional implications of the genetic variability. Technologies such as RNA-Seq have emerged as state-of-the-art techniques for transcriptome analysis that take advantage of high-throughput next-generation sequencing. However, similar to their predecessors, these approaches continue to impose major challenges on full-length transcript structure identification, primarily due to inherent limitations of read length. With the development of single-molecule sequencing (SMS) from PacBio, a growing number of studies on the transcriptome of different organisms have been reported. SMS has emerged as advantageous for comprehensive genome annotation including identification of novel genes/isoforms, long non-coding RNAs and fusion transcripts. This approach can be used across a broad spectrum of species to better interpret the coding information of the genome, and facilitate the biological function study. We provide an overview of SMS platform and its diverse applications in various biological studies, and our perspective on the challenges associated with the transcriptome studies.

Source
http://dx.doi.org/10.3389/fgene.2019.00384 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6498185 (PMC)
April 2019

The Dominant and Poorly Penetrant Phenotypes of Maize Are Caused by DNA Methylation Changes at a Linked Transposon.

Plant Cell 2018 12 18;30(12):3006-3023. Epub 2018 Dec 18.

Department of Plant Science, Pennsylvania State University, University Park, Pennsylvania 16802

The maize () mutant () has been implicated in the epigenetic modifications of (), which regulates the production of the flavonoid pigments phlobaphenes. Here, we show that the gene maps to a genetically recalcitrant region near the centromere of chromosome 10. Transcriptome analysis of mutant and wild-type plants identified a candidate gene in the mapping region using a comparative sequence-based approach. The candidate gene, GRMZM2G053177, is overexpressed by >45-fold in multiple tissues of , explaining the dominance of and its phenotypes. In the mutant stock, GRMZM2G053177 has a unique transcript originating within a CACTA transposon inserted in its first intron, and it is missing the first four codons of the wild-type transcript. GRMZM2G053177 expression is regulated by the DNA methylation status of the CACTA transposon, explaining the incomplete penetrance and poor expressivity of Transgenic overexpression lines of GRMZM2G053177 () phenocopy the -induced pigmentation in coleoptiles, tassels, leaf sheaths, husks, pericarps, and cob glumes. Transcriptome analysis of versus wild-type tissues revealed changes in several pathways related to abiotic and biotic stress. Thus, this study addresses the enigma of identity in maize, which had gone unsolved for more than 50 years.

Source
http://dx.doi.org/10.1105/tpc.18.00546 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6354275 (PMC)
December 2018

Transcriptional regulation of nitrogen-associated metabolism and growth.

Nature 2018 11 24;563(7730):259-264. Epub 2018 Oct 24.

Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA.

Nitrogen is an essential macronutrient for plant growth and basic metabolic processes. The application of nitrogen-containing fertilizer increases yield, which has been a substantial factor in the green revolution. Ecologically, however, excessive application of fertilizer has disastrous effects such as eutrophication. A better understanding of how plants regulate nitrogen metabolism is critical to increase plant yield and reduce fertilizer overuse. Here we present a transcriptional regulatory network and twenty-one transcription factors that regulate the architecture of root and shoot systems in response to changes in nitrogen availability. Genetic perturbation of a subset of these transcription factors revealed coordinate transcriptional regulation of enzymes involved in nitrogen metabolism. Transcriptional regulators in the network are transcriptionally modified by feedback via genetic perturbation of nitrogen metabolism. The network, genes and gene-regulatory modules identified here will prove critical to increasing agricultural productivity.

Source
http://dx.doi.org/10.1038/s41586-018-0656-3 (DOI)
November 2018

Publisher Correction: Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

Nat Genet 2018 11;50(11):1618

Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan.

This article was not made open access when initially published online, which was corrected before print publication. In addition, ORCID links were missing for 12 authors and have been added to the HTML and PDF versions of the article.

Source
http://dx.doi.org/10.1038/s41588-018-0261-2 (DOI)
November 2018

AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture.

Database (Oxford) 2018 01 1;2018. Epub 2018 Jan 1.

Boyce Thompson Institute, Ithaca, NY, USA.

The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.

Source
http://dx.doi.org/10.1093/database/bay088 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146126 (PMC)
January 2018

Improved RNA-seq Workflows Using CyVerse Cyberinfrastructure.

Curr Protoc Bioinformatics 2018 09 31;63(1):e53. Epub 2018 Aug 31.

Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.

RNA-seq is a vital method for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols use sequencing reads of length 50 to 150 nucleotides for alignment to the reference genome and assembly of transcripts. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use. Tuxedo, a popular alignment-based protocol for RNA-seq analysis, has been updated with HISAT2, StringTie, StringTie-merge, and Ballgown, and the updated protocol outperforms its predecessor. Similarly, new pseudo-alignment-based protocols like Kallisto and Sleuth reduce runtime and improve performance. However, these tools are challenging for researchers lacking command-line experience. Here, we describe two new RNA-seq analysis protocols, in which all tools are deployed on CyVerse Cyberinfrastructure with user-friendly graphical user interfaces, and validate their performance using plant RNA-seq data. © 2018 by John Wiley & Sons, Inc.

Source
http://dx.doi.org/10.1002/cpbi.53 (DOI)
September 2018
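
The RNA-seq abstract above describes the updated alignment-based protocol (HISAT2, StringTie, StringTie-merge, Ballgown) as deployed behind graphical interfaces on CyVerse. For readers who want a sense of the underlying command-line steps, the Python sketch below only assembles the commands and does not run them. It is an assumption-laden outline rather than the authors' protocol: the helper names, file names, and directory layout are hypothetical, and only commonly documented HISAT2, samtools, and StringTie options are used.

```python
import shlex
from typing import List


def per_sample_commands(sample: str, reads_1: str, reads_2: str,
                        index: str, annotation: str,
                        threads: int = 4) -> List[List[str]]:
    """Align paired-end reads, sort the alignments, assemble transcripts."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # --dta tailors alignments for downstream transcript assemblers.
        ["hisat2", "--dta", "-p", str(threads), "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["stringtie", bam, "-G", annotation, "-p", str(threads),
         "-o", f"{sample}.gtf"],
    ]


def merge_command(gtf_list_file: str, annotation: str,
                  merged: str = "merged.gtf") -> List[str]:
    """Merge per-sample assemblies into one transcript set."""
    return ["stringtie", "--merge", "-G", annotation,
            "-o", merged, gtf_list_file]


def ballgown_command(sample: str, merged: str = "merged.gtf") -> List[str]:
    """Re-estimate abundances (-e) and write Ballgown input tables (-B)."""
    return ["stringtie", "-e", "-B", "-G", merged,
            "-o", f"ballgown/{sample}/{sample}.gtf",
            f"{sample}.sorted.bam"]


if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    steps = per_sample_commands("leaf_rep1", "leaf_rep1_R1.fq.gz",
                                "leaf_rep1_R2.fq.gz", "genome_index",
                                "genes.gtf")
    steps.append(merge_command("assemblies.txt", "genes.gtf"))
    steps.append(ballgown_command("leaf_rep1"))
    for cmd in steps:
        print(shlex.join(cmd))
```

Building the commands as lists rather than shell strings keeps file names with spaces safe if the lists are later passed to `subprocess.run`.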

The maize W22 genome provides a foundation for functional genomics and transposon biology.

Nat Genet 2018 09 30;50(9):1282-1288. Epub 2018 Jul 30.

Horticultural Sciences Department, University of Florida, Gainesville, FL, USA.

The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.

Source
http://dx.doi.org/10.1038/s41588-018-0158-0 (DOI)
September 2018

Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes.

Nat Genet 2018 09 30;50(9):1289-1295. Epub 2018 Jul 30.

State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, China.

Maize is an important crop with a high level of genome diversity and heterosis. The genome sequence of a typical female line, B73, was previously released. Here, we report a de novo genome assembly of a corresponding male representative line, Mo17. More than 96.4% of the 2,183 Mb assembled genome can be accounted for by 362 scaffolds in ten pseudochromosomes with 38,620 annotated protein-coding genes. Comparative analysis revealed large gene-order and gene structural variations: approximately 10% of the annotated genes were mutually nonsyntenic, and more than 20% of the predicted genes had either large-effect mutations or large structural variations, which might cause considerable protein divergence between the two inbred lines. Our study provides a high-quality reference-genome sequence of an important maize germplasm, and the intraspecific gene order and gene structural variations identified should have implications for heterosis and genome evolution.

Source
http://dx.doi.org/10.1038/s41588-018-0182-0 (DOI)
September 2018
-->