Publications by authors named "Lakshmi K Matukumalli"

25 Publications

High density LD-based structural variations analysis in cattle genome.

PLoS One 2014 Jul 22;9(7):e103046. Epub 2014 Jul 22.

Laboratory of Bioinformatics and Biofotonics, Engineering Institute, Autonomous University of Baja California, Baja California, Mexico.

Genomic structural variations represent an important source of genetic variation in mammalian genomes and are therefore commonly related to phenotypic expression. In this work, ~770,000 single nucleotide polymorphism (SNP) genotypes from 506 animals of 19 cattle breeds were analyzed. A simple LD-based structural variation was defined, and a genome-wide analysis was performed. After applying quality control filters, we calculated the short-range (≤ 100 Kb) linkage disequilibrium (r²) for each breed and each chromosome. We sorted SNP pairs by distance and obtained a set of LD means (called the expected means) using bins of 5 Kb. Across the 19 breeds we identified 15,246 segments of at least 1 Kb, each consisting of at least 3 adjacent SNPs such that, for each SNP, the r² values with its neighbors within 100 Kb to its right were all larger than, or all smaller than, the corresponding expected means, and their P-values were significant after Benjamini-Hochberg multiple testing correction. In addition, to restrict the analysis to homogeneously covered regions, we considered only SNPs having at least 15 SNP neighbors within 100 Kb. We defined such segments as structural variations. By grouping all variations across all animals in the sample we defined 9,146 regions involving a total of 53,137 SNPs and representing 6.40% (160.98 Mb) of the bovine genome. The identified structural variations covered 3,109 genes. Clustering analysis showed that breed relatedness follows the geographic region in which the breeds are evolving. In summary, we present an analysis of structural variations based on deviation from the expected short-range LD between SNPs in the bovine genome. With an intuitive and simple definition based only on SNP data, it was possible to discern the closeness of breeds grouped by the geographic region in which they are evolving.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0103046 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4106904 (PMC)
November 2015
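
The LD steps named in the abstract above (pairwise r², expected means in 5 Kb bins, Benjamini-Hochberg correction) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded 0/1/2, the squared genotype correlation is used as a stand-in for r², and the 0.05 FDR level is an assumption.

```python
from collections import defaultdict


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: r2 is undefined, treated as 0 in this sketch
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def expected_means(pairs, bin_size=5_000, max_dist=100_000):
    """Mean r2 per distance bin; pairs is an iterable of (distance_bp, r2)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        if 0 < dist <= max_dist:
            b = (dist - 1) // bin_size  # bin 0 holds distances 1..5000 bp
            sums[b] += r2
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected
```

A segment in the abstract's sense would then be a run of adjacent SNPs whose r² values all fall on the same side of the binned expected mean and whose P-values survive the correction; how those P-values were computed is not stated in the abstract.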

Copy number variation of individual cattle genomes using next-generation sequencing.

Genome Res 2012 Apr 2;22(4):778-90. Epub 2012 Feb 2.

USDA-ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, Maryland 20705, USA.

Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (~52%, χ² test; P-value < 0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.

Source
http://dx.doi.org/10.1101/gr.133967.111 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3317159 (PMC)
April 2012
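
The read depth idea behind the abstract above can be illustrated with a generic depth-ratio calculation. This is only a sketch under stated assumptions: the abstract does not describe the actual pipeline, the function name is hypothetical, and the baseline is assumed to be the median depth over windows believed to be diploid.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Scale per-window read depth by the median depth of assumed-diploid windows."""
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control windows must have positive median depth")
    # A window at baseline depth gets `ploidy` copies; double depth gets twice that.
    return [ploidy * depth / baseline for depth in window_depths]
```

For example, with control windows at a median depth of 30 reads, a window at 60 reads is estimated at four copies and a window at 15 reads at one copy. Real pipelines typically also correct for GC content and mappability, which this sketch omits.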

Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle.

Funct Integr Genomics 2012 Mar 18;12(1):81-92. Epub 2011 Sep 18.

Bovine Functional Genomics Laboratory, ANRI, USDA-ARS, Building 200, Room 124B, BARC-East, Beltsville, MD 20705, USA.

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to gastrointestinal nematodes. In this study, we performed a large-scale analysis of CNVs using SNP genotyping data from 472 animals of the same population. We detected 811 candidate CNV regions, which represent 141.8 Mb (~4.7%) of the genome. To investigate the functional impacts of CNVs, we created two groups of 100 individual animals with extremely low or high estimated breeding values of eggs per gram of feces and referred to these groups as parasite resistant (PR) or parasite susceptible (PS), respectively. We identified 297 (~51 Mb) and 282 (~48 Mb) CNV regions from PR and PS groups, respectively. Approximately 60% of the CNV regions were specific to the PS group or PR group of animals. Selected PR- or PS-specific CNVs were further experimentally validated by quantitative PCR. A total of 297 PR CNV regions overlapped with 437 Ensembl genes enriched in immunity and defense, such as the WC1 gene, which is uniquely expressed on gamma/delta T cells in cattle. Network analyses indicated that the PR-specific genes were predominantly involved in gastrointestinal disease, immunological disease, inflammatory response, cell-to-cell signaling and interaction, lymphoid tissue development, and cell death. By contrast, the 282 PS CNV regions contained 473 Ensembl genes which are overrepresented in environmental interactions. Network analyses indicated that the PS-specific genes were particularly enriched for inflammatory response, immune cell trafficking, metabolic disease, cell cycle, and cellular organization and movement.

Source
http://dx.doi.org/10.1007/s10142-011-0252-1 (DOI)
March 2012

Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows.

BMC Genomics 2011 Aug 11;12:408. Epub 2011 Aug 11.

Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA.

Background: Genome-wide association analysis is a powerful tool for annotating phenotypic effects on the genome and knowledge of genes and chromosomal regions associated with dairy phenotypes is useful for genome and gene-based selection. Here, we report results of a genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows.

Results: Genome-wide association analysis identified a number of candidate genes and chromosome regions associated with 31 dairy traits in contemporary U.S. Holstein cows. Highly significant genes and chromosome regions include: BTA13's GNAS region for milk, fat and protein yields; BTA7's INSR region and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate, somatic cell score and productive life; BTA2's LRP1B for somatic cell score; BTA14's DGAT1-NIBP region for fat percentage; BTA1's FKBP2 for protein yields and percentage, BTA26's MGMT and BTA6's PDGFRA for protein percentage; BTA18's 53.9-58.7 Mb region for service-sire and daughter calving ease and service-sire stillbirth; BTA18's PGLYRP1-IGFL1 region for a large number of traits; BTA18's LOC787057 for service-sire stillbirth and daughter calving ease; BTA15's CD82, BTA23's DST and the MOCS1-LRFN2 region for daughter stillbirth; and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate. For body conformation traits, BTA11, BTAX, BTA10, BTA5, and BTA26 had the largest concentrations of SNP effects, and PHKA2 of BTAX and REN of BTA16 had the most significant effects for body size traits. For body shape traits, BTAX, BTA19 and BTA3 were most significant. Udder traits were affected by BTA16, BTA22, BTAX, BTA2, BTA10, BTA11, BTA20, BTA22 and BTA25, teat traits were affected by BTA6, BTA7, BTA9, BTA16, BTA11, BTA26 and BTA17, and feet/legs traits were affected by BTA11, BTA13, BTA18, BTA20, and BTA26.

Conclusions: Genome-wide association analysis identified a number of genes and chromosome regions associated with 31 production, health, reproduction and body conformation traits in contemporary Holstein cows. The results provide useful information for annotating phenotypic effects on the dairy genome and for building consensus of dairy QTL effects.

Source
http://dx.doi.org/10.1186/1471-2164-12-408 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176260 (PMC)
August 2011

Genomic characteristics of cattle copy number variations.

BMC Genomics 2011 Feb 23;12:127. Epub 2011 Feb 23.

Bovine Functional Genomics Laboratory, ANRI, USDA-ARS, Beltsville, Maryland 20705, USA.

Background: Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits.

Results: We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms.

Conclusions: We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.

Source
http://dx.doi.org/10.1186/1471-2164-12-127 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3053260 (PMC)
February 2011
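
CNV regions such as the 682 reported in the abstract above are commonly built by merging overlapping per-animal CNV calls, and the genome share is then the merged length divided by the genome length. The sketch below shows that bookkeeping only; the function names and the half-open coordinate convention are assumptions, and the abstract does not say how the authors merged calls.

```python
def merge_cnv_regions(calls):
    """Merge overlapping or touching calls given as (chrom, start, end), end exclusive."""
    merged = []
    for chrom, start, end in sorted(calls):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_fraction(regions, genome_length):
    """Fraction of the genome covered by non-overlapping regions."""
    return sum(end - start for _, start, end in regions) / genome_length
```

Sorting first means each call only has to be compared with the most recently merged region, so the merge runs in a single pass after the sort.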

An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation.

Genome Biol 2010 Oct 20;11(10):R102. Epub 2010 Oct 20.

USDA-ARS US Meat Animal Research Center, State Spur 18 D, Clay Center, NE 68901, USA.

Background: A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines.

Results: The Bovine Gene Atlas was generated from 7.2 million unique digital gene expression tag sequences (300.2 million total raw tag sequences), from which 1.59 million unique tag sequences were identified that mapped to the draft bovine genome accounting for 85% of the total raw tag abundance. Filtering these tags yielded 87,764 unique tag sequences that unambiguously mapped to 16,517 annotated protein-coding loci in the draft genome accounting for 45% of the total raw tag abundance. Clustering of tissues based on tag abundance profiles generally confirmed ontology classification based on anatomy. There were 5,429 constitutively expressed loci and 3,445 constitutively expressed unique tag sequences mapping outside annotated gene boundaries that represent a resource for enhancing current gene models. Physical measures such as inferred transcript length or antisense tag abundance identified tissues with atypical transcriptional tag profiles. We report for the first time the tissue-specific variation in the proportion of mitochondrial transcriptional tag abundance.

Conclusions: The Bovine Gene Atlas is the deepest and broadest transcriptome survey of any livestock genome to date. Commonalities and variation in sense and antisense transcript tag profiles identified in different tissues facilitate the examination of the relationship between gene expression, tissue, and gene function.

Source
http://dx.doi.org/10.1186/gb-2010-11-10-r102 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218658 (PMC)
June 2011

Analysis of copy number variations among diverse cattle breeds.

Genome Res 2010 May 8;20(5):693-703. Epub 2010 Mar 8.

USDA-ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, Maryland 20705, USA.

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. Here, we describe the first systematic and genome-wide analysis of copy number variations (CNVs) in modern domesticated cattle using array comparative genomic hybridization (array CGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH). The array CGH panel included 90 animals from 11 Bos taurus, three Bos indicus, and three composite breeds for beef, dairy, or dual purpose. We identified over 200 candidate CNV regions (CNVRs) in total and 177 within known chromosomes, which harbor or are adjacent to gains or losses. These 177 high-confidence CNVRs cover 28.1 megabases or approximately 1.07% of the genome. Over 50% of the CNVRs (89/177) were found in multiple animals or breeds and analysis revealed breed-specific frequency differences and reflected aspects of the known ancestry of these cattle breeds. Selected CNVs were further validated by independent methods using qPCR and FISH. Approximately 67% of the CNVRs (119/177) completely or partially span cattle genes and 61% of the CNVRs (108/177) directly overlap with segmental duplications. The CNVRs span about 400 annotated cattle genes that are significantly enriched for specific biological functions, such as immunity, lactation, reproduction, and rumination. Multiple gene families, including ULBP, have gone through ruminant lineage-specific gene amplification. We detected and confirmed marked differences in their CNV frequencies across diverse breeds, indicating that some cattle CNVs are likely to arise independently in breeds and contribute to breed differences. Our results provide a valuable resource beyond microsatellites and single nucleotide polymorphisms to explore the full dimension of genetic variability for future cattle genomic research.

Source
http://dx.doi.org/10.1101/gr.105403.110 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860171 (PMC)
May 2010

High-resolution haplotype block structure in the cattle genome.

BMC Genet 2009 Apr 24;10:19. Epub 2009 Apr 24.

Department of Bioinformatics and Computational Biology, George Mason University, VA, USA.

Background: The Bovine HapMap Consortium has generated assay panels to genotype ~30,000 single nucleotide polymorphisms (SNPs) from 501 animals sampled from 19 worldwide taurine and indicine breeds, plus two outgroup species (Anoa and Water Buffalo). Within the larger set of SNPs we targeted 101 high density regions spanning up to 7.6 Mb with an average density of approximately one SNP per 4 kb, and characterized the linkage disequilibrium (LD) and haplotype block structure within individual breeds and groups of breeds in relation to their geographic origin and use.

Results: From the 101 targeted high-density regions on bovine chromosomes 6, 14, and 25, between 57 and 95% of the SNPs were informative in the individual breeds. The regions of high LD extend up to ~100 kb and the size of haplotype blocks ranges between 30 bases and 75 kb (10.3 kb average). On the scale from 1-100 kb the extent of LD and haplotype block structure in cattle has high similarity to humans. The estimation of effective population sizes over the previous 10,000 generations conforms to two main events in cattle history: the initiation of cattle domestication (~12,000 years ago), and the intensification of population isolation and current population bottleneck that breeds have experienced worldwide within the last ~700 years. Haplotype block density correlation, block boundary discordances, and haplotype sharing analyses were consistent in revealing unexpected similarities between some beef and dairy breeds, making them non-differentiable. Clustering techniques permitted grouping of breeds into different clades given their similarities and dissimilarities in genetic structure.

Conclusion: This work presents the first high-resolution analysis of haplotype block structure in worldwide cattle samples. Several novel results were obtained. First, cattle and human share a high similarity in LD and haplotype block structure on the scale of 1-100 kb. Second, unexpected similarities in haplotype block structure between dairy and beef breeds make them non-differentiable. Finally, our findings suggest that ~30,000 uniformly distributed SNPs would be necessary to construct a complete genome LD map in Bos taurus breeds, and ~580,000 SNPs would be necessary to characterize the haplotype block structure across the complete cattle genome.

Source
http://dx.doi.org/10.1186/1471-2156-10-19 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684545 (PMC)
April 2009

Development and characterization of a high density SNP genotyping assay for cattle.

PLoS One 2009 Apr 24;4(4):e5350. Epub 2009 Apr 24.

Department of Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia, United States of America.

The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in human has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0005350 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2669730 (PMC)
July 2009
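
The two summary statistics quoted in the abstract above, the inter-marker interval distribution (median and largest gap) and the minor allele frequency (MAF), are simple to compute from marker positions and genotypes. The sketch below assumes positions on a single chromosome in base pairs and genotypes coded as counts of one allele (0/1/2, with None for a missing call); the function names are hypothetical and this is not the spacing algorithm the paper describes.

```python
from statistics import median


def interval_stats(positions):
    """Return (median interval, maximum gap) between adjacent markers on one chromosome."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return median(gaps), max(gaps)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as allele counts 0/1/2; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable calls for this SNP
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)
```

A SNP would count as polymorphic within a breed when its MAF in that breed is above zero.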

Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds.

Science 2009 Apr;324(5926):528-32

The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.

Source
http://dx.doi.org/10.1126/science.1167936 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735092 (PMC)
April 2009

The genome sequence of taurine cattle: a window to ruminant biology and evolution.

Science 2009 Apr;324(5926):522-8

To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

Source
http://dx.doi.org/10.1126/science.1169588 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943200 (PMC)
April 2009

Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter.

Genomics Proteomics Bioinformatics 2008 Dec;6(3-4):129-43

Bovine Functional Genomics Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705, USA.

A systematic phylogenetic footprinting approach was performed to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. Therefore, we performed an empirical test to establish the optimal threshold for each model. We gauged our computational predictions by comparing with previously known TFBSs in the PCK1 gene promoter of the cytosolic isoform of phosphoenolpyruvate carboxykinase, and achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to the prediction of putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.

Source
http://dx.doi.org/10.1016/S1672-0229(09)60001-2 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054123 (PMC)
December 2008
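
When binding site scores are not Gaussian, as the abstract above reports, one common alternative is to set each model's threshold from the empirical distribution of scores on background sequence. The sketch below illustrates that general idea only; the abstract does not describe the authors' empirical test, and the function name and the 5% default rate are assumptions.

```python
def empirical_threshold(background_scores, false_positive_rate=0.05):
    """Pick a cutoff so that at most the given share of background scores lies strictly above it."""
    ranked = sorted(background_scores, reverse=True)
    k = int(len(ranked) * false_positive_rate)  # number of background hits tolerated
    # ranked[k] is the (k+1)-th highest score, so at most k scores exceed it.
    return ranked[min(k, len(ranked) - 1)]
```

A candidate site would then be called when its score is strictly greater than the returned threshold for that model, with no distributional assumption needed.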

MicroRNA transcriptome profiles during swine skeletal muscle development.

BMC Genomics 2009 Feb 10;10:77. Epub 2009 Feb 10.

USDA/ARS Meat Animal Research Center, Clay Center, NE, USA.

Background: MicroRNA (miR) are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts. To evaluate the role of miR in skeletal muscle of swine, global microRNA abundance was measured at specific developmental stages including proliferating satellite cells, three stages of fetal growth, day-old neonate, and the adult.

Results: Twelve potential novel miR were detected that did not match previously reported sequences. In addition, a number of miR previously reported to be expressed in mammalian muscle were detected, having a variety of abundance patterns through muscle development. Muscle-specific miR-206 was nearly absent in proliferating satellite cells in culture, but was the highest abundant miR at other time points evaluated. In addition, miR-1 was moderately abundant throughout developmental stages with highest abundance in the adult. In contrast, miR-133 was moderately abundant in adult muscle and either not detectable or lowly abundant throughout fetal and neonate development. Changes in abundance of ubiquitously expressed miR were also observed. MiR-432 abundance was highest at the earliest stage of fetal development tested (60 day-old fetus) and decreased throughout development to the adult. Conversely, miR-24 and miR-27 exhibited greatest abundance in proliferating satellite cells and the adult, while abundance of miR-368, miR-376, and miR-423-5p was greatest in the neonate.

Conclusion: These data present a complete set of transcriptome profiles to evaluate miR abundance at specific stages of skeletal muscle growth in swine. Identification of these miR provides an initial group of miR that may play a vital role in muscle development and growth.

Source
http://dx.doi.org/10.1186/1471-2164-10-77 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646747 (PMC)
February 2009

An assessment of population structure in eight breeds of cattle using a whole genome SNP panel.

BMC Genet 2008 May 20;9:37. Epub 2008 May 20.

Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada.

Background: Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. Previously, these studies used a low density of microsatellite markers; however, with the large number of single nucleotide polymorphism markers now available, it is possible to perform genome-wide population genetic analyses in cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among eight cattle breeds sampled from Bos indicus and Bos taurus.

Results: Two thousand six hundred and forty one single nucleotide polymorphisms (SNPs) spanning all of the bovine autosomal genome were genotyped in Angus, Brahman, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black, Limousin and Nelore cattle. Population structure was examined using the linkage model in the program STRUCTURE and Fst estimates were used to construct a neighbor-joining tree to represent the phylogenetic relationship among these breeds.

Conclusion: The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. The greatest level of genetic differentiation was detected between the Bos taurus and Bos indicus breeds. When the Bos indicus breeds were excluded from the analysis, genetic differences among beef versus dairy and European versus Asian breeds were detected among the Bos taurus breeds. Exploration of the number of SNP loci required to differentiate between breeds showed that, with 100 SNP loci, individuals could be correctly clustered into breeds only 50% of the time; thus, a large number of SNP markers are required to replace the 30 microsatellite markers that are currently commonly used in genetic diversity studies.

Source
http://dx.doi.org/10.1186/1471-2156-9-37 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2408608 (PMC)
May 2008
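
The Fst estimates mentioned in the abstract above measure how much of the total heterozygosity is explained by differences between breeds. The sketch below uses the textbook single-SNP form Fst = (H_T - H_S) / H_T for a biallelic marker; the abstract does not say which estimator was used, so the formula choice, the function name, and the optional sample-size weighting are assumptions.

```python
def fst(allele_freqs, sizes=None):
    """Wright's Fst for one biallelic SNP from per-population allele frequencies."""
    k = len(allele_freqs)
    if sizes is None:
        weights = [1 / k] * k  # equal weight per population
    else:
        total = sum(sizes)
        weights = [s / total for s in sizes]
    p_bar = sum(w * p for w, p in zip(weights, allele_freqs))
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity in the pooled population
    if h_t == 0:
        return 0.0  # SNP fixed everywhere: no differentiation to measure
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, allele_freqs))
    return (h_t - h_s) / h_t
```

Two populations fixed for opposite alleles give Fst = 1, and identical frequencies give Fst = 0. Pairwise values averaged over many SNPs form the distance matrix that a neighbor-joining tree is built from.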

SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries.

Nat Methods 2008 Mar 24;5(3):247-52. Epub 2008 Feb 24.

Bovine Functional Genomics Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Avenue, Beltsville, Maryland 20705, USA.

High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, with a genotypic and sequence allele frequency correlation of r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.

Source
http://dx.doi.org/10.1038/nmeth.1185 (DOI)
March 2008
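
The comparison reported in the abstract above, allele frequencies predicted from sequence reads versus frequencies from genotypes, comes down to a read-count ratio per SNP and a Pearson correlation across SNPs. The sketch below shows both in their simplest form; the function names are hypothetical and the authors' actual frequency model is not given in the abstract.

```python
from math import sqrt


def allele_frequency_from_reads(ref_reads, alt_reads):
    """Estimate the alternate-allele frequency as its share of reads at the SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / total


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Read-count estimates are noisy at low coverage, which is one reason a correlation well below 1 can still accompany a high validation rate.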

High-throughput genotyping with the GoldenGate assay in the complex genome of soybean.

Theor Appl Genet 2008 May 16;116(7):945-52. Epub 2008 Feb 16.

Soybean Genomics and Improvement Laboratory, US Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA.

Large numbers of single nucleotide polymorphism (SNP) markers are now available for a number of crop species. However, the high-throughput methods for multiplexing SNP assays are untested in complex genomes, such as soybean, that have a high proportion of paralogous genes. The Illumina GoldenGate assay is capable of multiplexing from 96 to 1,536 SNPs in a single reaction over a 3-day period. We tested the GoldenGate assay in soybean to determine the success rate of converting verified SNPs into working assays. A custom 384-SNP GoldenGate assay was designed using SNPs that had been discovered through the resequencing of five diverse accessions that are the parents of three recombinant inbred line (RIL) mapping populations. The 384 SNPs that were selected for this custom assay were predicted to segregate in one or more of the RIL mapping populations. Allelic data were successfully generated for 89% of the SNP loci (342 of the 384) when it was used in the three RIL mapping populations, indicating that the complex nature of the soybean genome had little impact on conversion of the discovered SNPs into usable assays. In addition, 80% of the 342 mapped SNPs had a minor allele frequency >10% when this assay was used on a diverse sample of Asian landrace germplasm accessions. The high success rate of the GoldenGate assay makes this a useful technique for quickly creating high density genetic maps in species where SNP markers are rapidly becoming available.

Source
http://dx.doi.org/10.1007/s00122-008-0726-2
May 2008

Whole genome linkage disequilibrium maps in cattle.

BMC Genet 2007 Oct 25;8:74. Epub 2007 Oct 25.

Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada.

Background: Bovine whole genome linkage disequilibrium maps were constructed for eight breeds of cattle. These data provide fundamental information concerning bovine genome organization, which will allow the design of studies to associate genetic variation with economically important traits, and also provide background information concerning the extent of long range linkage disequilibrium in cattle.

Results: Linkage disequilibrium was assessed using r2 among all pairs of syntenic markers within eight breeds of cattle from the Bos taurus and Bos indicus subspecies. Bos taurus breeds included Angus, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black and Limousin while Bos indicus breeds included Brahman and Nelore. Approximately 2670 markers spanning the entire bovine autosomal genome were used to estimate pairwise r2 values. We found that the extent of linkage disequilibrium is no more than 0.5 Mb in these eight breeds of cattle.

Conclusion: Linkage disequilibrium in cattle has previously been reported to extend several tens of centimorgans. Our results, based on a much larger sample of marker loci and across eight breeds of cattle, indicate that in cattle linkage disequilibrium persists over much more limited distances. Our findings suggest that 30,000-50,000 loci will be needed to conduct whole genome association studies in cattle.
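
As an illustration of the r2 statistic named in the Results above, the following minimal Python sketch computes r2 for one pair of biallelic markers from phased haplotype counts. The function name, the argument order, and the assumption that haplotype phase is known are hypothetical choices made for this example; the abstract does not say how the authors estimated haplotype frequencies.

```python
import math


def ld_r2(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Return r^2 between two biallelic loci from phased haplotype counts.

    Locus 1 has alleles A/a and locus 2 has alleles B/b (illustrative
    labels). r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where
    D = p(AB) - pA * pB.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2

    d = p_AB - p_A * p_B  # coefficient of linkage disequilibrium
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        # One of the loci is monomorphic, so r^2 is undefined.
        return math.nan
    return d * d / denom


# Complete association between the two loci.
complete = ld_r2(50, 0, 0, 50)
# Alleles at the two loci are statistically independent.
independent = ld_r2(25, 25, 25, 25)
print(complete, independent)
```

A pairwise value of this kind, computed for all syntenic marker pairs and summarized by inter-marker distance, is one common way to describe how far linkage disequilibrium extends.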

Source
http://dx.doi.org/10.1186/1471-2156-8-74
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2174945
October 2007

A physical map of the bovine genome.

Genome Biol 2007 ;8(8):R165

USDA, ARS, US Meat Animal Research Center, Clay Center, NE 68933, USA.

Background: Cattle are important agriculturally and relevant as a model organism. Previously described genetic and radiation hybrid (RH) maps of the bovine genome have been used to identify genomic regions and genes affecting specific traits. Application of these maps to identify influential genetic polymorphisms will be enhanced by integration with each other and with bacterial artificial chromosome (BAC) libraries. The BAC libraries and clone maps are essential for the hybrid clone-by-clone/whole-genome shotgun sequencing approach taken by the bovine genome sequencing project.

Results: A bovine BAC map was constructed with HindIII restriction digest fragments of 290,797 BAC clones from animals of three different breeds. Comparative mapping of 422,522 BAC end sequences assisted with BAC map ordering and assembly. Genotypes and pedigree from two genetic maps and marker scores from three whole-genome RH panels were consolidated on a 17,254-marker composite map. Sequence similarity allowed integrating the BAC and composite maps with the bovine draft assembly (Btau3.1), establishing a comprehensive resource describing the bovine genome. Agreement between the marker and BAC maps and the draft assembly is high, although discrepancies exist. The composite and BAC maps are more similar than either is to the draft assembly.

Conclusion: Further refinement of the maps and greater integration into the genome assembly process may contribute to a high quality assembly. The maps provide resources to associate phenotypic variation with underlying genomic variation, and are crucial resources for understanding the biology underpinning this important ruminant species so closely associated with humans.

Source
http://dx.doi.org/10.1186/gb-2007-8-8-r165
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374996
February 2008

A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis.

Genetics 2007 May 4;176(1):685-96. Epub 2007 Mar 4.

Soybean Genomics and Improvement Laboratory, USDA, ARS, Beltsville, Maryland 20705, USA.

The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5-10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.

Source
http://dx.doi.org/10.1534/genetics.107.070821
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1893076
May 2007

Discovery and profiling of bovine microRNAs from immune-related and embryonic tissues.

Physiol Genomics 2007 Mar 14;29(1):35-43. Epub 2006 Nov 14.

United States Department of Agriculture, Agricultural Research Center, Beltsville Area Research Center, Beltsville, Maryland 20705, USA.

MicroRNAs are small, approximately 22-nucleotide-long noncoding RNAs capable of controlling gene expression by inhibiting translation. Alignment of human microRNA stem-loop sequences (mir) against a recent draft sequence assembly of the bovine genome resulted in identification of 334 predicted bovine mir. We sequenced five tissue-specific cDNA libraries derived from the small RNA fractions of bovine embryo, thymus, small intestine, and lymph node to validate these predictions and identify new mir. This strategy combined with comparative sequence analysis identified 129 sequences that corresponded to mature microRNAs (miR). A total of 107 sequences aligned to known human mir, and 100 of these matched expressed miR. The other seven sequences represented novel miR expressed from the complementary strand of previously characterized human mir. The 22 sequences without matches displayed characteristic mir secondary structures when folded in silico, and 10 of these retained sequence conservation with other vertebrate species. Expression analysis based on sequence identity counts revealed that some miR were preferentially expressed in certain tissues, while bta-miR-26a and bta-miR-103 were prevalent in all tissues examined. These results support the premise that species differences in regulation of gene expression by miR occur primarily at the level of expression and processing.

Source
http://dx.doi.org/10.1152/physiolgenomics.00081.2006
March 2007

SNP-PHAGE--High throughput SNP discovery pipeline.

BMC Bioinformatics 2006 Oct 23;7:468. Epub 2006 Oct 23.

US Department of Agriculture, ARS, Beltsville Agricultural Research Center, Bovine Functional Genomics Laboratory, Beltsville, MD 20705, USA.

Background: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs or microsatellite) markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable.

Results: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on a UNIX/Linux platform, is written in Perl, and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at http://bfgl.anri.barc.usda.gov/ML/snp-phage/.

Conclusion: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.

Source
http://dx.doi.org/10.1186/1471-2105-7-468
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1626092
October 2006

Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences.

BMC Genomics 2006 Jun 7;7:140. Epub 2006 Jun 7.

USDA, ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville Agricultural Research Center (BARC)-East, 10300 Baltimore Avenue, Beltsville, MD 20705, USA.

Background: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages.

Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32-0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0-2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9-2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%.

Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

Source
http://dx.doi.org/10.1186/1471-2164-7-140
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525190
June 2006

Application of machine learning in SNP discovery.

BMC Bioinformatics 2006 Jan 6;7:4. Epub 2006 Jan 6.

Beltsville Agricultural Research Center, Bovine Functional Genomics Laboratory, US Department of Agriculture, ARS, Beltsville, MD 20705, USA.

Background: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures.

Results: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of test data compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes.

Conclusion: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study indicate that a trained ML classifier can significantly reduce human intervention and in this case achieved a 5-10 fold enhanced productivity. The optimized feature set and ML framework can also be applied to all polymorphism discovery software. ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.
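
The positive predictive value figures in the Results above can be cross-checked with a little arithmetic. The Python sketch below assumes the standard definition PPV = TP / (TP + FP) and back-calculates the true-positive counts implied by the reported false-positive counts and PPVs. The function name is hypothetical, and the derived counts are rough estimates from rounded percentages, not figures reported by the authors.

```python
def implied_true_positives(false_positives: int, ppv: float) -> float:
    """Solve PPV = TP / (TP + FP) for TP, given FP and PPV."""
    if not 0.0 <= ppv < 1.0:
        raise ValueError("ppv must be in [0, 1)")
    return false_positives * ppv / (1.0 - ppv)


# Values quoted in the abstract (percentages are rounded there).
ml_tp = implied_true_positives(249, 0.848)            # ML classifier
polybayes_tp = implied_true_positives(16955, 0.078)   # PolyBayes, all predictions

# Fold improvement in PPV relative to the two PolyBayes baselines.
fold_vs_all = 0.848 / 0.078            # versus all PolyBayes predictions
fold_vs_high_confidence = 0.848 / 0.167  # versus 100% posterior probability calls

print(round(ml_tp), round(polybayes_tp))
print(round(fold_vs_all, 1), round(fold_vs_high_confidence, 1))
```

Under these assumptions both methods imply a true-positive count on the order of 1,400, in line with the statement that they produced a similar number of true positives, and the two ratios bracket the reported 5- to 10-fold increase.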

Source
http://dx.doi.org/10.1186/1471-2105-7-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1955739
January 2006

Expressed sequence tag analysis of Eimeria-stimulated intestinal intraepithelial lymphocytes in chickens.

Mol Biotechnol 2005 Jun;30(2):143-50

Animal Parasitic Diseases Laboratory, Animal & Natural Resources Institute, US Dept. of Agriculture, Beltsville, MD 20705, USA.

Intraepithelial lymphocytes (IELs) play a critical role in protective immune response to intestinal pathogens such as Eimeria, the etiologic agent of avian coccidiosis. A list of genes expressed by intestinal IELs of Eimeria-infected chickens was compiled using the expressed sequence tag (EST) strategy. The 14,409 ESTs consisted of 1851 clusters and 7595 singletons, which revealed 9446 unique genes in the data set. Comparison of the sequence data with chicken DNA sequences in GenBank identified 125 novel clones. This EST library will provide a valuable resource for profiling global gene expression in normal and pathogen-infected chickens and identifying additional unique immune-related genes.

Source
http://dx.doi.org/10.1385/MB:30:2:143
June 2005

EST-PAGE--managing and analyzing EST data.

Bioinformatics 2004 Jan;20(2):286-8

US Department of Agriculture, ARS, Beltsville Agricultural Research Center, Bovine Functional Genomics Laboratory, Beltsville, MD 20705, USA.

Summary: EST-PAGE provides a bioinformatics solution for expressed sequence tag (EST) data entry, database management, GenBank submission, process control and data retrieval from a unified web interface that can be easily customized and adapted by groups working on diverse EST sequencing projects.

Availability: The system and source code are available upon request from the authors.

Supplementary Information: http://EST-PAGE.binf.gmu.edu

Source
http://dx.doi.org/10.1093/bioinformatics/btg411
January 2004