Publications by authors named "Mario Caccamo"

56 Publications

Reap the crop wild relatives for breeding future crops.

Trends Biotechnol 2021 Oct 8. Epub 2021 Oct 8.

Centre of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India; State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Murdoch University, Murdoch, WA 6150, Australia.

Crop wild relatives (CWRs) have provided breeders with several 'game-changing' traits or genes that have boosted crop resilience and global agricultural production. Advances in breeding and genomics have accelerated the identification of valuable CWRs for use in crop improvement. The enhanced genetic diversity of breeding pools carrying optimum combinations of favorable alleles for targeted crop-growing regions is crucial to sustain genetic gain. In parallel, growing sequence information on wild genomes, in combination with precise gene-editing tools, provides a fast-track route to transform CWRs into ideal future crops. Data-informed germplasm collection and management strategies, together with adequate policy support, will be equally important to improve access to CWRs and their sustainable use to meet food and nutrition security targets.

Source
http://dx.doi.org/10.1016/j.tibtech.2021.08.009
October 2021

Eragrostis curvula, a Model Species for Diplosporous Apomixis.

Plants (Basel) 2021 Aug 31;10(9). Epub 2021 Aug 31.

Centro de Recursos Naturales Renovables de la Zona Semiárida (CERZOS-CCT-CONICET Bahía Blanca), Camino de la Carrindanga km 7, Bahía Blanca 8000, Argentina.

Eragrostis curvula (Schrad.) Nees is a grass with a particular type of apomictic embryo sac development called the Eragrostis type. Apomixis is a type of asexual reproduction that produces seeds without fertilization; the resulting progeny is genetically identical to the mother plant, which offers, among other advantages, the potential to fix hybrid vigour over more than one generation. The absence of meiosis and the occurrence of only two rounds of mitosis instead of three during embryo sac development make this model unique and suitable to be transferred to economically important crops. Throughout this review, we highlight the advances in the knowledge of apomixis in E. curvula using different techniques such as cytoembryology, DNA methylation analyses, small-RNA-seq, RNA-seq, genome assembly, and genotyping by sequencing. The bulk of the evidence indicates that apomixis is inherited as a single Mendelian factor and is regulated by genetic and epigenetic mechanisms controlled by a complex network. With all this information, we propose a model of the mechanisms involved in diplosporous apomixis in this grass. All the genetic and epigenetic resources generated in E. curvula to study the reproductive mode have changed its status from an orphan to a well-characterised species.

Source
http://dx.doi.org/10.3390/plants10091818
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8472828
August 2021

Resequencing of 672 Native Rice Accessions to Explore Genetic Diversity and Trait Associations in Vietnam.

Rice (N Y) 2021 Jun 10;14(1):52. Epub 2021 Jun 10.

Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK.

Background: Vietnam possesses a vast diversity of rice landraces due to its geographical situation, latitudinal range, and a variety of ecosystems. This genetic diversity constitutes a highly valuable resource at a time when the highest rice production areas in the low-lying Mekong and Red River Deltas are enduring increasing threats from climate changes, particularly in rainfall and temperature patterns.

Results: We analysed 672 Vietnamese rice genomes, 616 newly sequenced, that encompass the range of rice varieties grown in the diverse ecosystems found throughout Vietnam. We described four Japonica and five Indica subpopulations within Vietnam likely adapted to the region of origin. We compared the population structure and genetic diversity of these Vietnamese rice genomes to the 3000 genomes of Asian cultivated rice. The named Indica-5 (I5) subpopulation was expanded in Vietnam and contained lowland Indica accessions, which had very low shared ancestry with accessions from any other subpopulation and were previously overlooked as admixtures. We scored phenotypic measurements for nineteen traits and identified 453 unique genotype-phenotype significant associations comprising twenty-one QTLs (quantitative trait loci). The strongest associations were observed for grain size traits, while weaker associations were observed for a range of characteristics, including panicle length, heading date and leaf width.

Conclusions: We showed how the rice diversity within Vietnam relates to the wider Asian rice diversity by using a number of approaches to provide a clear picture of the novel diversity present within Vietnam, mainly around the Indica-5 subpopulation. Our results highlight differences in genome composition and trait associations among traditional Vietnamese rice accessions, which are likely the product of adaptation to multiple environmental conditions and regional preferences in a very diverse country. Our results also highlight traits and their associated genomic regions that are a potential source of novel loci and alleles to breed a new generation of low-input, sustainable and climate-resilient rice.

Source
http://dx.doi.org/10.1186/s12284-021-00481-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192651
June 2021

Differential Methylation Patterns in Apomictic vs. Sexual Genotypes of the Diplosporous Grass Eragrostis curvula.

Plants (Basel) 2021 May 10;10(5). Epub 2021 May 10.

Centro de Recursos Naturales Renovables de la Zona Semiárida (CERZOS-CCT-CONICET Bahía Blanca), Camino de la Carrindanga km 7, 8000 Bahía Blanca, Argentina.

DNA methylation is an epigenetic mechanism by which a methyl group is added to a cytosine or an adenine. When located in a gene or regulatory sequence, it may repress or de-repress genes, depending on the context and species. Eragrostis curvula is an apomictic grass in which facultative genotypes increase the frequency of sexual pistils, a change triggered by epigenetic mechanisms. The aim of the present study was to look for correlations between the reproductive mode and specific methylated genes or genomic regions. To do so, plants with contrasting reproductive modes were investigated through MCSeEd (Methylation Context Sensitive Enzyme ddRad), which showed higher levels of DNA methylation in apomictic genotypes. Moreover, an increased proportion of differentially methylated positions was observed over the regulatory regions, suggesting a possible role for methylation in the regulation of gene expression. Interestingly, the methylation pathway was also found to be self-regulated, since two of the main genes ( and ) involved in de-methylation were found to be differentially methylated between genotypes with different reproductive behavior. Moreover, this work allowed us to detect several genes regulated by methylation that were previously found to be differentially expressed in comparisons between apomictic and sexual genotypes, linking DNA methylation to differences in reproductive mode.

Source
http://dx.doi.org/10.3390/plants10050946
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150776
May 2021

Building a successful international research community through data sharing: The case of the Wheat Information System (WheatIS).

F1000Res 2020 5;9:536. Epub 2020 Jun 5.

Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France.

The International Wheat Information System (WheatIS) Expert Working Group (EWG) was initiated in 2012 under the Wheat Initiative with a broad range of contributing organizations. The mission of the WheatIS EWG was to create an informational infrastructure, establish data standards, and build a single portal that allows search, retrieval, and display of globally distributed wheat data sets that are indexed in standard data formats at servers around the world. The web portal at WheatIS.org was released publicly in 2015, and by 2020 it had expanded to 8 geographically distributed nodes and around 20 organizations under its umbrella. In this paper, we present our experience, the challenges we faced, and the answers we found in establishing an international research community to build an informational infrastructure. Our hope is that our experience with building WheatIS.org will guide current and future research communities in overcoming institutional and international challenges to create global tools and resources that help their respective scientific communities.

Source
http://dx.doi.org/10.12688/f1000research.23525.1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953914
April 2021

Diversity analysis of 80,000 wheat accessions reveals consequences and opportunities of selection footprints.

Nat Commun 2020 09 11;11(1):4572. Epub 2020 Sep 11.

Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45 El Batán, Texcoco, C.P., 56237, Mexico.

Undomesticated wild species, crop wild relatives, and landraces represent sources of variation for wheat improvement to address challenges from climate change and the growing human population. Here, we study 56,342 domesticated hexaploid, 18,946 domesticated tetraploid and 3,903 crop wild relatives in a massive-scale genotyping and diversity analysis. Using DArTseq technology, we identify more than 300,000 high-quality SNPs and SilicoDArT markers and align them to three reference maps: the IWGSC RefSeq v1.0 genome assembly, the durum wheat genome assembly (cv. Svevo), and the DArT genetic map. On average, 72% of the markers are uniquely placed on these maps and 50% are linked to genes. The analysis reveals landraces with unexplored diversity and genetic footprints defined by regions under selection. This provides fertile ground to develop wheat varieties of the future by exploring specific gene or chromosome regions and identifying germplasm conserving allelic diversity missing in current breeding programs.

Source
http://dx.doi.org/10.1038/s41467-020-18404-w
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7486412
September 2020

Primula vulgaris (primrose) genome assembly, annotation and gene expression, with comparative genomics on the heterostyly supergene.

Sci Rep 2018 12 18;8(1):17942. Epub 2018 Dec 18.

School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, United Kingdom.

Primula vulgaris (primrose) exhibits heterostyly: plants produce self-incompatible pin- or thrum-form flowers, with anthers and stigma at reciprocal heights. Darwin concluded that this arrangement promotes insect-mediated cross-pollination; later studies revealed control by a cluster of genes, or supergene, known as the S (Style length) locus. The P. vulgaris S locus is absent from pin plants and hemizygous in thrum plants (thrum-specific); mutation of S locus genes produces self-fertile homostyle flowers with anthers and stigma at equal heights. Here, we present a 411 Mb P. vulgaris genome assembly of a homozygous inbred long homostyle, representing ~87% of the genome. We annotate over 24,000 P. vulgaris genes, and reveal more genes up-regulated in thrum than pin flowers. We show reduced genomic read coverage across the S locus in other Primula species, including P. veris, where we define the conserved structure and expression of the S locus genes in thrum. Further analysis reveals the S locus has elevated repeat content (64%) compared to the wider genome (37%). Our studies suggest conservation of S locus genetic architecture in Primula, and provide a platform for identification and evolutionary analysis of the S locus and downstream targets that regulate heterostyly in diverse heterostylous species.

Source
http://dx.doi.org/10.1038/s41598-018-36304-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6299000
December 2018

Yerba mate (Ilex paraguariensis, A. St.-Hil.) de novo transcriptome assembly based on tissue specific genomic expression profiles.

BMC Genomics 2018 Dec 7;19(1):891. Epub 2018 Dec 7.

Grupo de Investigación en Genética Aplicada (GIGA), Facultad de Ciencias Exactas Químicas y Naturales, Instituto de Biología Subtropical (IBS UNaM-CONICET), Universidad Nacional de Misiones, Jujuy 1745, CP3300, Posadas, Misiones, Argentina.

Background: The most common infusion in southern Latin American countries is prepared with dried leaves of Ilex paraguariensis A. St.-Hil., an aboriginal ancestral beverage known for its high polyphenol concentration and currently consumed in > 90% of homes in Argentina, Paraguay and Uruguay. The economy of entire provinces relies heavily on the production, collection and manufacture of Ilex paraguariensis, the plant species with the fifth-highest antioxidant activity. Polyphenols are associated with relevant health benefits, including strong antioxidant properties. Despite its regional relevance and potential biotechnological applications, little is known about the functional genomics and genetics underlying phenotypic variation of relevant traits. By generating tissue-specific transcriptomic profiles, we aimed to comprehensively annotate genes in the Ilex paraguariensis phenylpropanoid pathway and to evaluate differential expression profiles.

Results: In this study we generated a reliable transcriptome assembly based on a collection of 15 RNA-Seq libraries from different tissues of Ilex paraguariensis. A total of 554 million RNA-Seq reads were assembled into 193,897 transcripts, of which 24,612 annotated full-length transcripts had a complete ORF. We assessed the quality, completeness and accuracy of the transcriptome assembly using BUSCO and TransRate; consistency was also evaluated by experimentally validating 11 predicted genes by PCR and sequencing. Functional annotation against the KEGG Pathway database identified 1395 unigenes involved in the biosynthesis of secondary metabolites; 531 annotated transcripts corresponded to the phenylpropanoid pathway. The top 30 differentially expressed genes among tissues included genes involved in photosynthesis and stress response. These significant differences were then validated by qRT-PCR.

Conclusions: Our study is the first to provide data from whole genome gene expression profiles in different Ilex paraguariensis tissues, experimentally validating in-silico predicted genes key to the phenylpropanoid (antioxidant) pathway. Our results provide essential genomic data of potential use in breeding programs for polyphenol content. Further studies are necessary to assess if the observed expression variation in the phenylpropanoid pathway annotated genes is related to variations in leaves' polyphenol content at the population scale. These results set the current reference for Ilex paraguariensis genomic studies and provide a substantial contribution to research and biotechnological applications of phenylpropanoid secondary metabolites.

Source
http://dx.doi.org/10.1186/s12864-018-5240-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286616
December 2018

Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.

Sci Data 2017 04 27;4:170044. Epub 2017 Apr 27.

College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China.

Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. 'Morex' was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX).

Source
http://dx.doi.org/10.1038/sdata.2017.44
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407242
April 2017

A chromosome conformation capture ordered sequence of the barley genome.

Nature 2017 04;544(7651):427-433

European Molecular Biology Laboratory - The European Bioinformatics Institute, Hinxton CB10 1SD, UK.

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

Source
http://dx.doi.org/10.1038/nature22043
April 2017

Finding a partner in the ocean: molecular and evolutionary bases of the response to sexual cues in a planktonic diatom.

New Phytol 2017 Jul 21;215(1):140-156. Epub 2017 Apr 21.

Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Villa Comunale 1, Naples, 80121, Italy.

Microalgae play a major role as primary producers in aquatic ecosystems. Cell signalling regulates their interactions with the environment and other organisms, yet this process in phytoplankton is poorly defined. Using the marine planktonic diatom Pseudo-nitzschia multistriata, we investigated the cell response to cues released during sexual reproduction, an event that demands strong regulatory mechanisms and impacts on population dynamics. We sequenced the genome of P. multistriata and performed phylogenomic and transcriptomic analyses, which allowed the definition of gene gains and losses, horizontal gene transfers, conservation and evolutionary rate of sex-related genes. We also identified a small number of conserved noncoding elements. Sexual reproduction impacted on cell cycle progression and induced an asymmetric response of the opposite mating types. G protein-coupled receptors and cyclic guanosine monophosphate (cGMP) are implicated in the response to sexual cues, which overall entails a modulation of cell cycle, meiosis-related and nutrient transporter genes, suggesting a fine control of nutrient uptake even under nutrient-replete conditions. The controllable life cycle and the genome sequence of P. multistriata allow the reconstruction of changes occurring in diatoms in a key phase of their life cycle, providing hints on the evolution and putative function of their genes and empowering studies on sexual reproduction.

Source
http://dx.doi.org/10.1111/nph.14557
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485032
July 2017

Whole-Genome Characteristics and Polymorphic Analysis of Vietnamese Rice Landraces as a Comprehensive Information Resource for Marker-Assisted Selection.

Int J Genomics 2017 7;2017:9272363. Epub 2017 Feb 7.

Department of Genetic Engineering, Agricultural Genetics Institute, Vietnam Academy of Agricultural Sciences, Km2 Pham Van Dong, Tuliem, Hanoi, Vietnam.

Next generation sequencing technologies have provided numerous opportunities for application in the study of whole plant genomes. In this study, we present the sequencing and bioinformatic analyses of five typical rice landraces, including three indica and two japonica with potential blast resistance. A total of 688.4 million 100 bp paired-end reads yielded approximately 30-fold coverage for comparison with the Nipponbare reference genome. Among them, a small number of reads were mapped to both chromosomes and organellar genomes. Over two million eight hundred thousand single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) have been identified in the indica and japonica lines, which potentially have significant impacts on multiple transcripts of genes. SNP deserts, contiguous SNP-low regions, were found on chromosomes 1, 4, and 5 of all the rice genomes examined. Based on the distribution of SNPs per 100 kilobase pairs, the phylogenetic relationships among the landraces have been constructed. This is the first step towards revealing several salient features of rice genomes in Vietnam and providing significant information resources to further marker-assisted selection (MAS) in rice breeding programs.

Source
http://dx.doi.org/10.1155/2017/9272363
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5318636
February 2017

Genome sequence and genetic diversity of European ash trees.

Nature 2017 01 26;541(7636):212-216. Epub 2016 Dec 26.

School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK.

Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low-heterozygosity Fraxinus excelsior tree from Gloucestershire, UK, annotating 38,852 protein-coding genes of which 25% appear ash specific when compared with the genomes of ten other plant species. Analyses of paralogous genes suggest a whole-genome duplication shared with olive (Olea europaea, Oleaceae). We also re-sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British populations suggest that reduced susceptibility to ash dieback may be more widespread in Great Britain than in Denmark. We also present evidence that susceptibility of trees to H. fraxineus is associated with their iridoid glycoside levels. This rapid, integrated, multidisciplinary research response to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic.

Source
http://dx.doi.org/10.1038/nature20786
January 2017

Genetic architecture and evolution of the S locus supergene in Primula vulgaris.

Nat Plants 2016 12 2;2(12):16188. Epub 2016 Dec 2.

School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK.

Darwin's studies on heterostyly in Primula described two floral morphs, pin and thrum, with reciprocal anther and stigma heights that promote insect-mediated cross-pollination. This key innovation evolved independently in several angiosperm families. Subsequent studies on heterostyly in Primula contributed to the foundation of modern genetic theory and the neo-Darwinian synthesis. The established genetic model for Primula heterostyly involves a diallelic S locus comprising several genes, with rare recombination events that result in self-fertile homostyle flowers with anthers and stigma at the same height. Here we reveal the S locus supergene as a tightly linked cluster of thrum-specific genes that are absent in pins. We show that thrums are hemizygous not heterozygous for the S locus, which suggests that homostyles do not arise by recombination between S locus haplotypes as previously proposed. Duplication of a floral homeotic gene 51.7 million years (Myr) ago, followed by its neofunctionalization, created the current S locus assemblage which led to floral heteromorphy in Primula. Our findings provide new insights into the structure, function and evolution of this archetypal supergene.

Source
http://dx.doi.org/10.1038/nplants.2016.188
December 2016

transPLANT Resources for Triticeae Genomic Data.

Plant Genome 2016 03;9(1)

The genome sequences of many important Triticeae species, including bread wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), remained uncharacterized for a long time because of their high repeat content, large sizes, and polyploidy. As a result of improvements in sequencing technologies and novel analysis strategies, several of these have recently been deciphered. These efforts have generated new insights into Triticeae biology and genome organization and have important implications for downstream usage by breeders, experimental biologists, and comparative genomicists. transPLANT () is an EU-funded project aimed at constructing hardware, software, and data infrastructure for genome-scale research in the life sciences. Since the Triticeae data are intrinsically complex, heterogeneous, and distributed, the transPLANT consortium has undertaken efforts to develop common data formats and tools that enable the exchange and integration of data from distributed resources. Here we present an overview of the individual Triticeae genome resources hosted by transPLANT partners, introduce the objectives of transPLANT, and outline common developments and interfaces supporting integrated data access.

Source
http://dx.doi.org/10.3835/plantgenome2015.06.0038
March 2016

CerealsDB 3.0: expansion of resources and data integration.

BMC Bioinformatics 2016 Jun 24;17:256. Epub 2016 Jun 24.

School of Biological Sciences, University of Bristol, Bristol, BS8 1UG, UK.

Background: The increase in human populations around the world has put pressure on resources, and as a consequence food security has become an important challenge for the 21st century. Wheat (Triticum aestivum) is one of the most important crops in human and livestock diets, and the development of wheat varieties that produce higher yields, combined with increased resistance to pests and resilience to changes in climate, has meant that wheat breeding has become an important focus of scientific research. In an attempt to facilitate these improvements in wheat, plant breeders have employed molecular tools to help them identify genes for important agronomic traits that can be bred into new varieties. Modern molecular techniques have ensured that the rapid and inexpensive characterisation of SNP markers and their validation with modern genotyping methods has produced a valuable resource that can be used in marker assisted selection. CerealsDB was created as a means of quickly disseminating this information to breeders and researchers around the globe.

Description: CerealsDB version 3.0 is an online resource that contains a wide range of genomic datasets for wheat that will assist plant breeders and scientists to select the most appropriate markers for use in marker assisted selection. CerealsDB includes a database which currently contains in excess of a million putative varietal SNPs, of which several hundreds of thousands have been experimentally validated. In addition, CerealsDB also contains new data on functional SNPs predicted to have a major effect on protein function and we have constructed a web service to encourage data integration and high-throughput programmatic access.

Conclusion: CerealsDB is an open access website that hosts information on SNPs that are considered useful for both plant breeders and research scientists. The recent inclusion of web services designed to federate genomic data resources allows the information on CerealsDB to be more fully integrated with the WheatIS network and other biological databases.

Source
http://dx.doi.org/10.1186/s12859-016-1139-x
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4919907
June 2016

gEVAL - a web-based browser for evaluating genome assemblies.

Bioinformatics 2016 08 7;32(16):2508-10. Epub 2016 Apr 7.

Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

Motivation: For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly.

Availability And Implementation: Web Browser: http://geval.sanger.ac.uk, Plugin: http://wchow.github.io/wtsi-geval-plugin

Contact: [email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btw159
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978925
August 2016

Red clover (Trifolium pratense L.) draft genome provides a platform for trait improvement.

Sci Rep 2015 Nov 30;5:17394. Epub 2015 Nov 30.

Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK.

Red clover (Trifolium pratense L.) is a globally significant forage legume in pastoral livestock farming systems. It is an attractive component of grassland farming because of its high yield and protein content, nutritional value and ability to fix atmospheric nitrogen. Enhancing its role further in sustainable agriculture requires genetic improvement of persistency, disease resistance, and tolerance to grazing. To help address these challenges, we have assembled a chromosome-scale reference genome for red clover. We observed large blocks of conserved synteny with Medicago truncatula and estimated that the two species diverged ~23 million years ago. Among the 40,868 annotated genes, we identified gene clusters involved in biochemical pathways of importance for forage quality and livestock nutrition. Genotyping by sequencing of a synthetic population of 86 genotypes shows that the number of markers required for genomics-based breeding approaches is tractable, making red clover a suitable candidate for association studies and genomic selection.

Source
DOI: http://dx.doi.org/10.1038/srep17394
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4663792
November 2015

A synteny-based draft genome sequence of the forage grass Lolium perenne.

Plant J 2015 Nov;84(4):816-26

Department of Molecular Biology, Genetics, Aarhus University, Forsøgsvej 1, Slagelse, 4200, Denmark.

Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae sub-family of the grass family (Poaceae). Transcriptome data was used to identify 28,455 gene models, and we utilized macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promote out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed to very high levels in mature pollen grains, and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands.

Source
DOI: http://dx.doi.org/10.1111/tpj.13037
November 2015

NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles.

Bioinformatics 2016 Jan 17;32(1):142-4. Epub 2015 Sep 17.

The Genome Analysis Centre (TGAC), Norwich NR4 7UH, UK.

Motivation: The Oxford Nanopore MinION sequencer, currently in pre-release testing through the MinION Access Programme (MAP), promises long reads in real time from an inexpensive, compact USB device. Tools have been released to extract FASTA/Q from the MinION base calling output and to provide basic yield statistics. However, no single tool yet exists to provide comprehensive alignment-based quality control and error profile analysis, something that is extremely important given the speed with which the platform is evolving.

Results: NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK is multi-reference, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported and it is easily extensible to include others.

Availability And Implementation: NanoOK is open-source software, implemented in Java with supporting R scripts. It has been tested on Linux and Mac OS X and can be downloaded from https://github.com/TGAC/NanoOK. A VirtualBox VM containing all dependencies and the DH10B read set used in this article is available from http://opendata.tgac.ac.uk/nanook/. A Docker image is also available from Docker Hub; see the program documentation at https://documentation.tgac.ac.uk/display/NANOOK.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btv540
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681994
January 2016

Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data.

PLoS One 2015 22;10(6):e0129059. Epub 2015 Jun 22.

Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB30ES, United Kingdom.

The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.
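
The pre-assembly k-mer filtering idea can be illustrated with a minimal sketch. It assumes a simplified, membership-based rule: a read is discarded when more than a chosen fraction of its k-mers also occur in the host sequence. The function names, the value of k and the threshold are hypothetical choices for illustration; the abstract does not specify the tool or parameters actually used.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def host_fraction(read, host_kmers, k):
    """Fraction of the read's distinct k-mers that also occur in the host."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        # Read shorter than k: nothing to compare, treat as non-host.
        return 0.0
    return len(read_kmers & host_kmers) / len(read_kmers)


def subtract_host(reads, host_seq, k=5, max_host_fraction=0.5):
    """Keep only reads whose host k-mer fraction is at or below the threshold.

    k and max_host_fraction are illustrative assumptions, not published values.
    """
    host = kmers(host_seq, k)
    return [r for r in reads if host_fraction(r, host, k) <= max_host_fraction]


if __name__ == "__main__":
    host = "ACGTACGTACGTACGT"
    reads = ["ACGTACGTAC", "TTTTTGGGGGCCCCC"]
    # The first read is built entirely from host k-mers and is removed.
    print(subtract_host(reads, host))
```

Reads that survive this step would then go forward to assembly, so fewer host-derived contigs need to be screened afterwards.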

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0129059
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4476701
March 2016

Characterization of repetitive DNA landscape in wheat homeologous group 4 chromosomes.

BMC Genomics 2015 May 12;16:375. Epub 2015 May 12.

CERZOS (CCT - CONICET Bahía Blanca) and Universidad Nacional del Sur, Bahía Blanca, Argentina.

Background: The number and complexity of repetitive elements vary between species, being in general most represented in those with larger genomes. Combining the flow-sorted chromosome arms approach to genome analysis with second generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome, enabling comparisons among them. Additionally, different sequencing approaches may provide different depths of insight into repeatome content and structure. In this work we analyze and characterize the repetitive sequences of Triticum aestivum cv. Chinese Spring homeologous group 4 chromosome arms, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by subscripts 454 and I, respectively. Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS454 and 4DL454 scaffolds and 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI contigs, downloaded from the International Wheat Genome Sequencing Consortium (IWGSC).

Results: Repetitive sequence content varied from 55% to 63% for all chromosome arm assemblies except for 4DLI, in which the repeat content was 38%. Transposable elements, small RNA, satellites, simple repeats and low complexity sequences were analyzed. SSR frequency was one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except for 4DLI, where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons followed by DNA transposons were the most highly represented sequence repeats, mainly composed of Gypsy and CACTA/En-Spm superfamilies, respectively. This whole chromosome sequence analysis allowed identification of three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescent in situ hybridization (FISH) and one of them, the Carmen retrotransposon, was found to be specific to centromeric regions of all wheat chromosomes.

Conclusion: The presented work is the first deep report of wheat repetitive sequences analyzed at the chromosome arm level, revealing the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.

Source
DOI: http://dx.doi.org/10.1186/s12864-015-1579-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4440537
May 2015

New insights into the wheat chromosome 4D structure and virtual gene order, revealed by survey pyrosequencing.

Plant Sci 2015 Apr 18;233:200-212. Epub 2014 Dec 18.

CERZOS (Centro de Recursos Naturales Renovables de la Zona Semiárida), (CCT-CONICET-Bahía Blanca) and Universidad Nacional del Sur, Bahía Blanca, Buenos Aires, Argentina.

Survey sequencing of the bread wheat (Triticum aestivum L.) genome (AABBDD) has been approached through different strategies delivering important information. However, the current wheat sequence knowledge is not complete. The aim of our study is to provide a different and complementary set of data for chromosome 4D. A survey sequence was obtained by pyrosequencing of flow-sorted 4DS (7.2×) and 4DL (4.1×) arms. Single-end (SE) and long mate pair (LMP) reads were assembled into contigs (223Mb) and scaffolds (65Mb) that were aligned to the Aegilops tauschii draft genome (DD), anchoring 34Mb to chromosome 4. Scaffold annotation rendered 822 gene models. A virtual gene order comprising 1973 wheat orthologous gene loci and 381 wheat gene models was built. This order was largely consistent with the scaffold order determined based on a published high density map from the Ae. tauschii chromosome 4, using bin-mapped 4D ESTs as a common reference. The virtual order showed a higher collinearity with homeologous 4B compared to 4A. Additionally, a virtual map was constructed and ∼5700 genes (∼2200 on 4DS and ∼3500 on 4DL) were predicted. The sequence and virtual order obtained here using the 454 platform were compared with the Illumina one used by the IWGSC, giving complementary information.

Source
DOI: http://dx.doi.org/10.1016/j.plantsci.2014.12.004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4352925
April 2015

PolyMarker: A fast polyploid primer design pipeline.

Bioinformatics 2015 Jun 2;31(12):2038-9. Epub 2015 Feb 2.

The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich NR4 7UH, UK, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK and National Institute of Agricultural Botany (NIAB), Cambridge CB3 0LE, UK.

The design of genetic markers is of particular relevance in crop breeding programs. Despite many economically important crops being polyploid organisms, the current primer design tools are tailored for diploid species. Bread wheat, for instance, is a hexaploid comprising three related genomes, and the performance of genetic markers is diminished if the primers are not genome specific. PolyMarker is a pipeline that generates SNP markers by selecting candidate primers for a specified genome using local alignments and standard primer design tools to test the viability of the primers. A command line tool and a web interface are available to the community.

Availability And Implementation: PolyMarker is available as a Ruby BioGem: bio-polyploid-tools. Web interface: http://polymarker.tgac.ac.uk.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btv069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765872
June 2015

RNA-Seq bulked segregant analysis enables the identification of high-resolution genetic markers for breeding in hexaploid wheat.

Plant Biotechnol J 2015 Jun 8;13(5):613-24. Epub 2014 Nov 8.

John Innes Centre, Norwich, UK.

The identification of genetic markers linked to genes of agronomic importance is a major aim of crop research and breeding programmes. Here, we identify markers for Yr15, a major disease resistance gene for wheat yellow rust, using a segregating F2 population. After phenotyping, we implemented RNA sequencing (RNA-Seq) of bulked pools to identify single-nucleotide polymorphisms (SNP) associated with Yr15. Over 27 000 genes with SNPs were identified between the parents, and then classified based on the results from the sequenced bulks. We calculated the bulk frequency ratio (BFR) of SNPs between resistant and susceptible bulks, selecting those showing sixfold enrichment/depletion in the corresponding bulks (BFR > 6). Using additional filtering criteria, we reduced the number of genes with a putative SNP to 175. The 35 SNPs with the highest BFR values were converted into genome-specific KASP assays using an automated bioinformatics pipeline (PolyMarker) which circumvents the limitations associated with the polyploid wheat genome. Twenty-eight assays were polymorphic of which 22 (63%) mapped in the same linkage group as Yr15. Using these markers, we mapped Yr15 to a 0.77-cM interval. The three most closely linked SNPs were tested across varieties and breeding lines representing UK elite germplasm. Two flanking markers were diagnostic in over 99% of lines tested, thus providing a reliable haplotype for marker-assisted selection in these breeding programmes. Our results demonstrate that the proposed methodology can be applied in polyploid F2 populations to generate high-resolution genetic maps across target intervals.
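
The BFR selection step can be sketched as follows. This is a minimal illustration that assumes BFR is the ratio of an allele's frequency in one bulk to its frequency in the other, with the frequency taken as allele-supporting reads over total reads at the SNP. The function names and the small floor used to avoid division by zero are hypothetical; only the sixfold threshold (BFR > 6) comes from the abstract.

```python
def allele_frequency(allele_reads, total_reads):
    """Frequency of the allele among reads covering the SNP in one bulk."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return allele_reads / total_reads


def bulk_frequency_ratio(freq_a, freq_b, floor=1e-6):
    """Ratio of allele frequencies between two bulks.

    The floor is an assumption that keeps the ratio finite when the
    allele is absent from one bulk.
    """
    return max(freq_a, floor) / max(freq_b, floor)


def passes_bfr_filter(resistant, susceptible, threshold=6.0):
    """True if the SNP shows more than `threshold`-fold enrichment or depletion.

    `resistant` and `susceptible` are (allele_reads, total_reads) tuples.
    """
    res_freq = allele_frequency(*resistant)
    sus_freq = allele_frequency(*susceptible)
    enrichment = bulk_frequency_ratio(res_freq, sus_freq)
    depletion = bulk_frequency_ratio(sus_freq, res_freq)
    return max(enrichment, depletion) > threshold


if __name__ == "__main__":
    # 90% in the resistant bulk vs 10% in the susceptible bulk: about ninefold.
    print(passes_bfr_filter((90, 100), (10, 100)))
    # 50% vs 25%: only twofold, so the SNP is not selected.
    print(passes_bfr_filter((50, 100), (25, 100)))
```

Checking the ratio in both directions reflects the "enrichment/depletion" wording: a SNP is kept whether the allele is concentrated in the resistant bulk or in the susceptible one.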

Source
DOI: http://dx.doi.org/10.1111/pbi.12281
June 2015

NextClip: an analysis and read preparation tool for Nextera Long Mate Pair libraries.

Bioinformatics 2014 Feb 2;30(4):566-8. Epub 2013 Dec 2.

The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich NR4 7UH, UK.

Summary: Illumina's recently released Nextera Long Mate Pair (LMP) kit enables production of jumping libraries of up to 12 kb. The LMP libraries are an invaluable resource for carrying out complex assemblies and other downstream bioinformatics analyses such as the characterization of structural variants. However, LMP libraries are intrinsically noisy and to maximize their value, post-sequencing data analysis is required. Standardizing laboratory protocols and the selection of sequenced reads for downstream analysis are non-trivial tasks. NextClip is a tool for analyzing reads from LMP libraries, generating a comprehensive quality report and extracting good quality trimmed and deduplicated reads.

Availability And Implementation: Source code, user guide and example data are available from https://github.com/richardmleggett/nextclip/.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt702
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3928519
February 2014

The zebrafish reference genome sequence and its relationship to the human genome.

Nature 2013 Apr 17;496(7446):498-503. Epub 2013 Apr 17.

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

Source
DOI: http://dx.doi.org/10.1038/nature12111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3703927
April 2013

Crowdsourcing genomic analyses of ash and ash dieback - power to the people.

Gigascience 2013 Feb 12;2(1). Epub 2013 Feb 12.

The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH, UK.

Ash dieback is a devastating fungal disease of ash trees that has swept across Europe and recently reached the UK. This emergent pathogen has received little study in the past and its effect threatens to overwhelm the ash population. In response to this we have produced some initial genomics datasets and taken the unusual step of releasing them to the scientific community for analysis without first performing our own. In this manner we hope to 'crowdsource' analyses and bring the expertise of the community to bear on this problem as quickly as possible. Our data has been released through our website at oadb.tsl.ac.uk and a public GitHub repository.

Source
DOI: http://dx.doi.org/10.1186/2047-217X-2-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626535
February 2013