Publications by authors named "Pradeep Ruperao"

18 Publications

Sorghum Pan-Genome Explores the Functional Utility for Genomic-Assisted Breeding to Accelerate the Genetic Gain.

Front Plant Sci 2021 1;12:666342. Epub 2021 Jun 1.

International Crops Research Institute for the Semi-Arid Tropics, Patancheru, India.

Sorghum (Sorghum bicolor L.) is a staple food crop in the arid and rainfed production ecologies. Sorghum plays a critical role in resilient farming and is projected as a smart crop to overcome the food and nutritional insecurity in the developing world. The development and characterisation of the sorghum pan-genome will provide insight into genome diversity and functionality, supporting sorghum improvement. We built a sorghum pan-genome using reference genomes as well as 354 genetically diverse sorghum accessions belonging to different races. We explored the structural and functional characteristics of the pan-genome and explain its utility in supporting genetic gain. The newly developed pan-genome has a total of 35,719 genes, a core genome of 16,821 genes and an average of 32,795 genes in each cultivar. The variable genes are enriched with environment responsive genes and classify the sorghum accessions according to their race. We show that 53% of genes display presence-absence variation, and some of these variable genes are predicted to be functionally associated with drought adaptation traits. Using more than two million SNPs from the pan-genome, association analysis identified 398 SNPs significantly associated with important agronomic traits, of which 92 were in genes. Drought gene expression analysis identified 1,788 genes that are functionally linked to different conditions, of which 79 were absent from the reference genome assembly. This study provides comprehensive genomic diversity resources in sorghum which can be used in genome assisted crop improvement.

Source
http://dx.doi.org/10.3389/fpls.2021.666342
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204017
June 2021
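The core/variable split reported in the sorghum pan-genome abstract can be illustrated with a toy presence-absence table. This is a minimal sketch, not the authors' pipeline: the gene and accession names are hypothetical, and "core" is assumed here to mean present in every accession. The last line only re-derives the reported 53% from the gene counts given in the abstract.

```python
from typing import Dict, Set, Tuple


def classify_genes(
    presence: Dict[str, Set[str]], accessions: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in all accessions) and variable (absent from at least one)."""
    core, variable = set(), set()
    for gene, carriers in presence.items():
        if accessions <= carriers:  # every accession carries the gene
            core.add(gene)
        else:
            variable.add(gene)
    return core, variable


# Hypothetical toy data: which accessions carry each gene.
accessions = {"acc1", "acc2", "acc3"}
presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc3"},
    "geneC": {"acc2"},
}
core, variable = classify_genes(presence, accessions)
pav_fraction = len(variable) / len(presence)

# Counts taken from the abstract: 35,719 genes in total, 16,821 of them core.
reported_variable_fraction = (35719 - 16821) / 35719
```

With the abstract's counts, the variable fraction rounds to 0.53, which matches the stated 53% of genes showing presence-absence variation.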

Trait associations in the pangenome of pigeon pea (Cajanus cajan).

Plant Biotechnol J 2020 09 12;18(9):1946-1954. Epub 2020 Mar 12.

International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India.

Pigeon pea (Cajanus cajan) is an important orphan crop mainly grown by smallholder farmers in India and Africa. Here, we present the first pigeon pea pangenome based on 89 accessions mainly from India and the Philippines, showing that there is significant genetic diversity in Philippine individuals that is not present in Indian individuals. Annotation of variable genes suggests that they are associated with self-fertilization and response to disease. We identified 225 SNPs associated with nine agronomically important traits over three locations and two different time points, with SNPs associated with genes for transcription factors and kinases. These results will lead the way to an improved pigeon pea breeding programme.

Source
http://dx.doi.org/10.1111/pbi.13354
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7415775
September 2020

Investigating Drought Tolerance in Chickpea Using Genome-Wide Association Mapping and Genomic Selection Based on Whole-Genome Resequencing Data.

Front Plant Sci 2018 19;9:190. Epub 2018 Feb 19.

School of Agriculture, Food and Wine, The University of Adelaide, Adelaide, SA, Australia.

Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs). We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS) model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3), p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS) model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implications for implementing GS in plant breeding programs.

Source
http://dx.doi.org/10.3389/fpls.2018.00190
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5825913
February 2018

Genome Analysis Identified Novel Candidate Genes for Ascochyta Blight Resistance in Chickpea Using Whole Genome Re-sequencing Data.

Front Plant Sci 2017 17;8:359. Epub 2017 Mar 17.

School of Agriculture, Food and Wine, University of Adelaide, Adelaide, SA, Australia; South Australian Research and Development Institute, Urrbrae, SA, Australia.

Ascochyta blight (AB) is a fungal disease that can significantly reduce chickpea production in Australia and other regions of the world. In this study, 69 chickpea genotypes were sequenced using whole genome re-sequencing (WGRS) methods. They included 48 Australian varieties differing in their resistance ranking to AB, 16 advanced breeding lines from the Australian chickpea breeding program, four landraces, and one accession representing a wild chickpea species. More than 800,000 single nucleotide polymorphisms (SNPs) were identified. Population structure analysis revealed relatively narrow genetic diversity amongst recently released Australian varieties and two groups of varieties separated by the level of AB resistance. Several regions of the chickpea genome were under positive selection based on Tajima's test. Both Fst genome-scan and genome-wide association studies (GWAS) identified a 100 kb region (AB4.1) on chromosome 4 that was significantly associated with AB resistance. The AB4.1 region co-located to a large QTL interval of 7 Mb to 30 Mb identified previously in three different mapping populations which were genotyped at relatively low density with SSR or SNP markers. The AB4.1 region was validated by GWAS in an additional collection of 132 advanced breeding lines from the Australian chickpea breeding program, genotyped with approximately 144,000 SNPs. The reduced level of nucleotide diversity and long extent of linkage disequilibrium also suggested the AB4.1 region may have gone through selective sweeps probably caused by selection of the AB resistance trait in breeding. In total, 12 predicted genes were located in the AB4.1 QTL region, including those annotated as: NBS-LRR receptor-like kinase, wall-associated kinase, zinc finger protein, and serine/threonine protein kinases. One significant SNP located in the conserved catalytic domain of an NBS-LRR receptor-like kinase led to an amino acid substitution. Transcriptional analysis using qPCR showed that some predicted genes were significantly induced in resistant lines after inoculation compared to non-inoculated plants. This study demonstrates the power of combining WGRS data with relatively simple traits to rapidly develop "functional markers" for marker-assisted selection and genomic selection.

Source
http://dx.doi.org/10.3389/fpls.2017.00359
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5355423
March 2017

An efficient approach to BAC based assembly of complex genomes.

Plant Methods 2016 20;12. Epub 2016 Jan 20.

School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia; School of Plant Biology, University of Western Australia, Perth, WA 6009, Australia.

Background: There has been exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success.

Results: We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes.

Conclusions: We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

Source
http://dx.doi.org/10.1186/s13007-016-0107-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719536
January 2016

TAG Sequence Identification of Genomic Regions Using TAGdb.

Authors:
Pradeep Ruperao

Methods Mol Biol 2016 ;1374:233-40

School of Agriculture and Food Sciences, University of Queensland, Hartley Teakle Building 83, St. Lucia, QLD, 4072, Australia.

Second-generation sequencing (SGS) technology has enabled the sequencing of genomes and identification of genes. However, large complex plant genomes remain particularly difficult for de novo assembly. Access to the vast quantity of raw sequence data may facilitate discoveries; however, the volume of these data makes access difficult. This chapter discusses the Web-based tool TAGdb, which enables researchers to identify paired read second-generation DNA sequence data that share identity with a submitted query sequence. The identified reads can be used for PCR amplification of genomic regions to identify genes and promoters without the need for genome assembly.

Source
http://dx.doi.org/10.1007/978-1-4939-3167-5_12
May 2016

Prioritization of candidate genes in "QTL-hotspot" region for drought tolerance in chickpea (Cicer arietinum L.).

Sci Rep 2015 Oct 19;5:15296. Epub 2015 Oct 19.

International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Center of Excellence in Genomics (CEG), Hyderabad, 502324, India.

A combination of two approaches, namely QTL analysis and gene enrichment analysis, was used to identify candidate genes in the "QTL-hotspot" region for drought tolerance present on the Ca4 pseudomolecule in chickpea. In the first approach, a high-density bin map was developed using 53,223 single nucleotide polymorphisms (SNPs) identified in the recombinant inbred line (RIL) population of the ICC 4958 (drought tolerant) and ICC 1882 (drought sensitive) cross. QTL analysis using recombination bins as markers, along with the phenotyping data for 17 drought tolerance related traits obtained over 1-5 seasons and 1-5 locations, split the "QTL-hotspot" region into two subregions, namely "QTL-hotspot_a" (15 genes) and "QTL-hotspot_b" (11 genes). In the second approach, gene enrichment analysis using significant marker trait associations based on SNPs from the Ca4 pseudomolecule with the above mentioned phenotyping data, and the candidate genes from the refined "QTL-hotspot" region, showed enrichment for 23 genes. Twelve genes were common to both approaches. Functional validation using quantitative real-time PCR (qRT-PCR) indicated four promising candidate genes having functional implications on the effect of "QTL-hotspot" for drought tolerance in chickpea.

Source
http://dx.doi.org/10.1038/srep15296
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609953
October 2015

CicArVarDB: SNP and InDel database for advancing genetics research and breeding applications in chickpea.

Database (Oxford) 2015 19;2015. Epub 2015 Aug 19.

Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India; School of Plant Biology, The University of Western Australia, Perth, Western Australia, Australia 6009.

Molecular markers are valuable tools for breeders to help accelerate crop improvement. High throughput sequencing technologies facilitate the discovery of large-scale variations such as single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). Sequencing of the chickpea genome along with re-sequencing of several chickpea lines has enabled the discovery of 4.4 million variations including SNPs and InDels. Here we report a repository of 1.9 million variations (SNPs and InDels) anchored on eight pseudomolecules in a custom database, referred to as CicArVarDB, that can be accessed at http://cicarvardb.icrisat.org/. It includes an easy interface for users to select variations around specific regions associated with quantitative trait loci, with embedded webBLAST search and JBrowse visualisation. We hope that this database will be immensely useful to the chickpea research community both for advancing genetics research and for breeding applications in crop improvement. Database URL: http://cicarvardb.icrisat.org.

Source
http://dx.doi.org/10.1093/database/bav078
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4541373
May 2016

High-resolution skim genotyping by sequencing reveals the distribution of crossovers and gene conversions in Cicer arietinum and Brassica napus.

Theor Appl Genet 2015 Jun 10;128(6):1039-47. Epub 2015 Mar 10.

School of Agriculture and Food Sciences, University of Queensland, Brisbane, 4072, Australia.

Key Message: We characterise the distribution of crossover and non-crossover recombination in Brassica napus and Cicer arietinum using a low-coverage genotyping by sequencing pipeline, SkimGBS. The growth of next-generation DNA sequencing technologies has led to a rapid increase in sequence-based genotyping for applications including diversity assessment, genome structure validation and gene-trait association. We have established a skim-based genotyping by sequencing method for crop plants and applied this approach to genotype segregating populations of Brassica napus and Cicer arietinum. Comparison of progeny genotypes with those of the parental individuals allowed the identification of crossover and non-crossover (gene conversion) events. Our results identify the positions of recombination events with high resolution, permitting the mapping and frequency assessment of recombination in segregating populations.

Source
http://dx.doi.org/10.1007/s00122-015-2488-y
June 2015
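The comparison of progeny genotypes with parental genotypes described in the SkimGBS abstract can be sketched as follows. This is an illustrative simplification under stated assumptions, not the published pipeline: each progeny is assumed to be a string of parental allele calls ('A' or 'B') ordered along one chromosome, and the `max_conversion_len` cut-off separating short gene-conversion tracts from crossover blocks is hypothetical.

```python
from itertools import groupby


def classify_recombination(calls: str, max_conversion_len: int = 2):
    """Count putative crossovers and gene conversions in ordered parental allele calls.

    A short internal run of one parent's alleles is treated as a gene conversion;
    the remaining switches between parents are counted as crossovers.
    """
    # Run-length encode the calls, e.g. "AAB" -> [("A", 2), ("B", 1)].
    runs = [(allele, sum(1 for _ in group)) for allele, group in groupby(calls)]
    conversions = 0
    kept = []
    for i, (allele, length) in enumerate(runs):
        internal = 0 < i < len(runs) - 1
        if internal and length <= max_conversion_len:
            conversions += 1  # short tract flanked by the other parent
        else:
            kept.append(allele)
    crossovers = sum(1 for a, b in zip(kept, kept[1:]) if a != b)
    return crossovers, conversions


# Hypothetical progeny: one single-marker tract of B, then a switch to B.
progeny = "AAAAABAAAAABBBBBB"
crossovers, conversions = classify_recombination(progeny)
```

For this toy progeny the sketch reports one crossover and one gene conversion. Real data would also need handling of missing calls and genotyping errors, which is omitted here.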

Scanning the effects of ethyl methanesulfonate on the whole genome of Lotus japonicus using second-generation sequencing analysis.

G3 (Bethesda) 2015 Feb 6;5(4):559-67. Epub 2015 Feb 6.

Centre of Integrative Legume Research, School of Agriculture and Food Science, The University of Queensland, St Lucia, Brisbane, QLD 4072, Australia; School of Plant Biology, University of Western Australia, Crawley, WA 6009, Australia.

Genetic structure can be altered by chemical mutagenesis, which is a common method applied in molecular biology and genetics. Second-generation sequencing provides a platform to reveal base alterations occurring in the whole genome due to mutagenesis. A model legume, Lotus japonicus ecotype Miyakojima, was chemically mutated with alkylating ethyl methanesulfonate (EMS) for the scanning of DNA lesions throughout the genome. Using second-generation sequencing, two individually mutated third-generation progeny (M3, named AM and AS) were sequenced and analyzed to identify single nucleotide polymorphisms and reveal the effects of EMS on nucleotide sequences in these mutant genomes. Single-nucleotide polymorphisms were found once every 208 kb (AS) and 202 kb (AM), with a bias toward G/C-to-A/T changes at a low percentage. Most mutations were intergenic. The mutation spectrum of the genomes was comparable in their individual chromosomes; however, each mutated genome has unique alterations, which are useful to identify causal mutations for their phenotypic changes. The data obtained demonstrate that whole genomic sequencing is applicable as a high-throughput tool to investigate genomic changes due to mutagenesis. The identification of these single-point mutations will facilitate the identification of phenotypically causative mutations in EMS-mutated germplasm.

Source
http://dx.doi.org/10.1534/g3.114.014571
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4390572
February 2015
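The G/C-to-A/T bias mentioned in the EMS abstract is the kind of summary that comes from tallying SNPs by substitution class. The sketch below shows one way to do that tally; it is an assumption-laden illustration, not the authors' analysis, and the SNP list is invented toy data.

```python
from collections import Counter

# The two substitutions that make up the G/C-to-A/T class.
GC_TO_AT = {("G", "A"), ("C", "T")}


def mutation_spectrum(snps):
    """Count SNPs by class; snps is an iterable of (reference_base, alternate_base) pairs."""
    counts = Counter()
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        key = "G/C-to-A/T" if (ref, alt) in GC_TO_AT else "other"
        counts[key] += 1
    return counts


# Hypothetical SNP calls from a mutant line compared with the wild type.
snps = [("G", "A"), ("C", "T"), ("A", "G"), ("g", "a"), ("T", "C")]
spectrum = mutation_spectrum(snps)
gc_to_at_share = spectrum["G/C-to-A/T"] / sum(spectrum.values())
```

In this toy input three of the five SNPs fall in the G/C-to-A/T class, giving a share of 0.6.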

Bioinformatics: identification of markers from next-generation sequence data.

Methods Mol Biol 2015 ;1245:29-47

School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia.

With the advent of sequencing technology, next-generation sequencing (NGS) technology has dramatically revolutionized plant genomics. NGS technology combined with new software tools enables the discovery, validation, and assessment of genetic markers on a large scale. Among different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been a choice for large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification. Similarly, SNPs are the most abundant genetic variations, with higher frequencies throughout the genome of plant species. This chapter discusses various tools available for genome assembly and focuses broadly on SSR and SNP marker discovery.

Source
http://dx.doi.org/10.1007/978-1-4939-1966-6_3
June 2015

Identification and characterization of more than 4 million intervarietal SNPs across the group 7 chromosomes of bread wheat.

Plant Biotechnol J 2015 Jan 22;13(1):97-104. Epub 2014 Aug 22.

School of Agriculture and Food Sciences, University of Queensland, Brisbane, Qld, Australia; Australian Centre for Plant Functional Genomics, University of Queensland, Brisbane, Qld, Australia.

Although wheat is a major international crop, our understanding of the wheat genome is relatively poor due to its large size and complexity. To gain a greater understanding of wheat genome diversity, we have identified single nucleotide polymorphisms between 16 Australian bread wheat varieties. Whole-genome shotgun Illumina paired read sequence data were mapped to the draft assemblies of chromosomes 7A, 7B and 7D to identify more than 4 million intervarietal SNPs. SNP density varied between the three genomes, with much greater density observed on the A and B genomes than the D genome. This variation may be a result of substantial gene flow from the tetraploid Triticum turgidum, which possesses A and B genomes, during early co-cultivation of tetraploid and hexaploid wheat. In addition, we examined SNP density variation along the chromosome syntenic builds and identified genes in low-density regions which may have been selected during domestication and breeding. This study highlights the impact of evolution and breeding on the bread wheat genome and provides a substantial resource for trait association and crop improvement. All SNP data are publicly available on a generic genome browser GBrowse at www.wheatgenome.info.

Source
http://dx.doi.org/10.1111/pbi.12240
January 2015
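Examining SNP density along a chromosome, as the wheat abstract describes, usually comes down to counting SNPs in windows and flagging the sparse ones. The following is a minimal sketch assuming fixed-size, non-overlapping windows and 0-based positions; the window size, chromosome length and threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter


def snp_density(positions, window_size):
    """Return {window_start: SNP count} for fixed-size, non-overlapping windows."""
    counts = Counter((pos // window_size) * window_size for pos in positions)
    return dict(counts)


def low_density_windows(positions, window_size, chrom_length, max_snps):
    """List the start coordinates of windows holding at most max_snps SNPs."""
    density = snp_density(positions, window_size)
    return [
        start
        for start in range(0, chrom_length, window_size)
        if density.get(start, 0) <= max_snps
    ]


# Hypothetical SNP positions on a 1,000 bp toy chromosome.
positions = [5, 120, 130, 180, 410, 950]
density = snp_density(positions, window_size=100)
low = low_density_windows(positions, window_size=100, chrom_length=1000, max_snps=0)
```

Genes overlapping the windows returned in `low` would be the candidates to inspect for signs of selection, in the spirit of the low-density regions discussed in the abstract.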

Genome-wide delineation of natural variation for pod shatter resistance in Brassica napus.

PLoS One 2014 9;9(7):e101673. Epub 2014 Jul 9.

Graham Centre for Agricultural Innovation (an alliance between NSW Department of Primary Industries and Charles Sturt University), Wagga Wagga Agricultural Institute, Wagga Wagga, NSW, Australia.

Resistance to pod shattering (shatter resistance) is a target trait for global rapeseed (canola, Brassica napus L.) improvement programs to minimise grain loss in the mature standing crop, and during windrowing and mechanical harvest. We describe the genetic basis of natural variation for shatter resistance in B. napus and show that several quantitative trait loci (QTL) control this trait. To identify loci underlying shatter resistance, we used a novel genotyping-by-sequencing approach, DArT-Seq. QTL analysis detected a total of 12 significant QTL on chromosomes A03, A07, A09, C03, C04, C06, and C08, which jointly account for approximately 57% of the genotypic variation in shatter resistance. Through Genome-Wide Association Studies, we show that a large number of loci, including those that are involved in shattering in Arabidopsis, account for variation in shatter resistance in diverse B. napus germplasm. Our results indicate that genetic diversity for shatter resistance genes in B. napus is limited; many of the genes that might control this trait were not included during the natural creation of this species, or were not retained during the domestication and selection process. We speculate that valuable diversity for this trait was lost during the natural creation of B. napus. To improve shatter resistance, breeders will need to target the introduction of useful alleles, especially from genotypes of other related species of Brassica, such as those that we have identified.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101673
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4090071
March 2015

An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

PLoS One 2014 8;9(7):e101754. Epub 2014 Jul 8.

Centre of Excellence in Genomics, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India.

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who wish to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101754
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086967
March 2015

The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes.

Nat Commun 2014 May 23;5:3930. Epub 2014 May 23.

Beijing Genome Institute-Shenzhen, Shenzhen 518083, China.

Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.

Source
http://dx.doi.org/10.1038/ncomms4930
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4279128
May 2014

A chromosomal genomics approach to assess and validate the desi and kabuli draft chickpea genome assemblies.

Plant Biotechnol J 2014 Aug 5;12(6):778-86. Epub 2014 Apr 5.

University of Queensland, St. Lucia, Queensland, Australia; Australian Centre for Plant Functional Genomics, University of Queensland, St. Lucia, Queensland, Australia; International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Andhra Pradesh, India.

With the expansion of next-generation sequencing technology and advanced bioinformatics, there has been a rapid growth of genome sequencing projects. However, while this technology enables the rapid and cost-effective assembly of draft genomes, the quality of these assemblies usually falls short of gold standard genome assemblies produced using the more traditional BAC by BAC and Sanger sequencing approaches. Assembly validation is often performed by the physical anchoring of genetically mapped markers, but this is prone to errors and the resolution is usually low, especially towards centromeric regions where recombination is limited. New approaches are required to validate reference genome assemblies. The ability to isolate individual chromosomes combined with next-generation sequencing permits the validation of genome assemblies at the chromosome level. We demonstrate this approach by the assessment of the recently published chickpea kabuli and desi genomes. While previous genetic analysis suggests that these genomes should be very similar, a comparison of their chromosome sizes and published assemblies highlights significant differences. Our chromosomal genomics analysis highlights short defined regions that appear to have been misassembled in the kabuli genome and identifies large-scale misassembly in the draft desi genome. The integration of chromosomal genomics tools within genome sequencing projects has the potential to significantly improve the construction and validation of genome assemblies. The approach could be applied both for new genome assemblies as well as published assemblies, and complements currently applied genome assembly strategies.

Source
http://dx.doi.org/10.1111/pbi.12182
August 2014

Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.

Am J Bot 2012 Feb 1;99(2):186-92. Epub 2012 Feb 1.

Centre of Excellence in Genomics, International Crops Research Institute for the Semi-Arid Tropics, Patancheru 502324, Andhra Pradesh, India.

Premise of the Study: Next-generation sequencing (NGS) technologies are frequently used for resequencing and mining of single nucleotide polymorphisms (SNPs) by comparison to a reference genome. In crop species such as chickpea (Cicer arietinum) that lack a reference genome sequence, NGS-based SNP discovery is a challenge. Therefore, instead of probability-based statistical approaches for consensus calling and comparison with a reference sequence, a coverage-based consensus calling (CbCC) approach was applied and two genotypes were compared for SNP identification.

Methods: A CbCC approach is used in this study with four commonly used short read alignment tools (Maq, Bowtie, Novoalign, and SOAP2) and 15.7 and 22.1 million Illumina reads for chickpea genotypes ICC4958 and ICC1882, together with the chickpea transcriptome assembly (CaTA).

Key Results: A nonredundant set of 4543 SNPs was identified between two chickpea genotypes. Experimental validation of 224 randomly selected SNPs showed superiority of Maq among individual tools, as 50.0% of SNPs predicted by Maq were true SNPs. For combinations of two tools, greatest accuracy (55.7%) was reported for Maq and Bowtie, with a combination of Bowtie, Maq, and Novoalign identifying 61.5% true SNPs. SNP prediction accuracy generally increased with increasing read depth.

Conclusions: This study provides a benchmark comparison of tools as well as read depths for four commonly used tools for NGS SNP discovery in a crop species without a reference genome sequence. In addition, a large number of SNPs have been identified in chickpea that would be useful for molecular breeding.

Source
http://dx.doi.org/10.3732/ajb.1100419
February 2012
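The idea of calling a consensus from coverage and then comparing two genotypes, as outlined in the CbCC abstract, can be sketched as below. The abstract does not give the actual decision rules, so everything concrete here is an assumption: the `min_depth` and `min_fraction` thresholds are hypothetical, and the per-position read bases are invented toy data keyed by position on a shared assembly.

```python
from collections import Counter


def consensus_base(bases, min_depth=3, min_fraction=0.8):
    """Return the consensus base at one position, or None if coverage is too low or too mixed."""
    if len(bases) < min_depth:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None


def call_snps(pileup_a, pileup_b, **thresholds):
    """Report positions where both genotypes have a consensus call and the calls differ."""
    snps = {}
    for pos in sorted(pileup_a.keys() & pileup_b.keys()):
        a = consensus_base(pileup_a[pos], **thresholds)
        b = consensus_base(pileup_b[pos], **thresholds)
        if a is not None and b is not None and a != b:
            snps[pos] = (a, b)
    return snps


# Hypothetical aligned read bases per position for the two genotypes.
pileup_icc4958 = {10: "AAAA", 20: "CCCC", 30: "GG", 40: "TTTA"}
pileup_icc1882 = {10: "AAAA", 20: "TTTT", 30: "AAAA", 40: "CCCC"}
snps = call_snps(pileup_icc4958, pileup_icc1882)
```

Only position 20 is reported: position 10 is identical in both genotypes, position 30 has too little coverage in one genotype, and position 40 is too mixed to yield a consensus under these assumed thresholds.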

Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP.

Biology (Basel) 2012 Aug 27;1(2):370-82. Epub 2012 Aug 27.

Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia.

Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. The advances in second generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline for the robust identification of SNPs in large and complex genomes using Illumina second generation DNA sequence data and demonstrated this by the discovery of SNPs in the hexaploid wheat genome. We have developed a SNP discovery pipeline called SGSautoSNP (Second-Generation Sequencing AutoSNP) and applied this to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests that more than 93% of the SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping of diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high resolution SNP discovery in other large and complex genomes.

Source
http://dx.doi.org/10.3390/biology1020370
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009776
August 2012