Publications by authors named "Rohan L Fernando"

55 Publications

Genomics of response to porcine reproductive and respiratory syndrome virus in purebred and crossbred sows: antibody response and performance following natural infection vs. vaccination.

J Anim Sci 2021 May;99(5)

Department of Animal Science, Iowa State University, Ames, IA 50011, USA.

Antibody responses to porcine reproductive and respiratory syndrome virus (PRRSV), measured as sample-to-positive (S/P) ratio, following a PRRSV outbreak (S/P_Outbreak) in a purebred nucleus and following PRRSV vaccination (S/P_Vx) in commercial crossbred herds have been proposed as genetic indicator traits for improved reproductive performance in PRRSV-infected purebred and PRRSV-vaccinated crossbred sows, respectively. In this study, we investigated the genetic relationships of S/P_Outbreak and S/P_Vx with performance at the commercial (vaccinated crossbred sows) and nucleus level (non-infected and PRRSV-infected purebred sows), respectively, and tested the effect of previously identified SNP for these indicator traits. Antibody response was measured on 541 Landrace sows ~54 d after the start of a PRRSV outbreak, and on 906 F1 (Landrace × Large White) gilts ~50 d after vaccination with a commercial PRRSV vaccine. Reproductive performance was recorded for 711 and 428 Landrace sows before and during the PRRSV outbreak, respectively, and for 811 vaccinated F1 animals. The estimate of the genetic correlation (r_g) of S/P_Outbreak with S/P_Vx was 0.72 ± 0.18. The estimates of r_g of S/P_Outbreak with reproductive performance in vaccinated crossbred sows were low to moderate, ranging from 0.05 ± 0.23 to 0.30 ± 0.20. The estimate of r_g of S/P_Vx with reproductive performance in non-infected purebred sows was moderate and favorable with number born alive (0.50 ± 0.23) but low (0 ± 0.23 to -0.11 ± 0.23) with piglet mortality traits. The estimates of r_g of S/P_Vx were moderate and negative (-0.38 ± 0.21) with number of mummies in PRRSV-infected purebred sows and low with other traits (-0.30 ± 0.18 to 0.05 ± 0.18). Several significant associations (P0 > 0.90) of previously reported SNP for S/P ratio (ASGA0032063 and H3GA0020505) were identified for S/P ratio and performance in non-infected purebred and PRRSV-exposed purebred and crossbred sows. Genomic regions harboring the major histocompatibility complex class II region significantly contributed to the genetic correlation of antibody response to PRRSV with most of the traits analyzed. These results indicate that selection for antibody response in purebred sows following a PRRSV outbreak in the nucleus and for antibody response to PRRSV vaccination measured in commercial crossbred sows is expected to increase litter size in purebred and commercial sows.

Source
DOI: http://dx.doi.org/10.1093/jas/skab097
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118356
May 2021

Cross-validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy.

J Anim Breed Genet 2021 Mar 17. Epub 2021 Mar 17.

Department of Animal Science, Iowa State University, Ames, IA, USA.

Empirical estimates of the accuracy of estimates of breeding values (EBV) can be obtained by cross-validation. Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation. Efficient strategies for LOOCV of predictions of phenotypes have been developed for a simple model with an overall mean and random marker or animal genetic effects. The objective here was to develop and evaluate an efficient LOOCV method for prediction of breeding values and other random effects under a general mixed linear model with multiple random effects. Conventional LOOCV of EBV requires inverting an (n-1)×(n-1) covariance matrix for each of n (= number of observations) data sets. Our efficient LOOCV obtains the required inverses from the inverse of the covariance matrix for all n observations. The efficient method can be applied to complex models with multiple fixed and random effects, but requires fixed effects to be treated as random, with large variances. An alternative is to precorrect observations using estimates of fixed effects obtained from the complete data, but this can lead to biases. The efficient LOOCV method was compared to conventional LOOCV of predictions of breeding values in terms of computational demands and accuracy. For a data set with 3,205 observations and a model with multiple random and fixed effects, the efficient LOOCV method was 962 times faster than the conventional LOOCV with precorrection for fixed effects based on each training data set but resulted in identical EBV. A computationally efficient LOOCV for prediction of breeding values for single- and multiple-trait mixed models with multiple fixed and random effects was successfully developed. The method enables cross-validation of predictions of breeding values and of any linear combination of random and/or fixed effects, along with leave-one-out precorrection of validation phenotypes.
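
The central idea, getting every leave-one-out result from the inverse of the covariance matrix for all n observations, can be illustrated with a much simpler case than the one treated in the article. The sketch below is an assumption-laden toy, not the authors' method: it predicts phenotypes rather than breeding values, assumes y ~ N(0, V) with V known and no fixed effects, and uses hypothetical function names. It relies on the standard Gaussian identity E[y_i | y_-i] = y_i - (V^-1 y)_i / (V^-1)_ii and compares it with the conventional approach of one (n-1)×(n-1) solve per observation.

```python
import numpy as np

def loo_predictions(V, y):
    """Leave-one-out predictions of each y[i] from all other observations.

    Assumes y ~ N(0, V) with V known; only the inverse of the full V is used.
    """
    P = np.linalg.inv(V)              # inverse of the full n x n covariance
    return y - (P @ y) / np.diag(P)   # E[y_i | y_-i] for every i at once

def loo_predictions_naive(V, y):
    """Reference version: one (n-1) x (n-1) solve per left-out observation."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        V_kk = V[np.ix_(keep, keep)]
        out[i] = V[i, keep] @ np.linalg.solve(V_kk, y[keep])
    return out

# Hypothetical data: a random positive-definite covariance and random phenotypes.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 30))
V = A @ A.T + 30 * np.eye(30)
y = rng.normal(size=30)

fast = loo_predictions(V, y)
slow = loo_predictions_naive(V, y)
```

Both functions return the same predictions; the first needs a single n×n inverse, whereas the second repeats an (n-1)×(n-1) solve n times.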

Source
DOI: http://dx.doi.org/10.1111/jbg.12545
March 2021

Genetic Analysis of Antibody Response to Porcine Reproductive and Respiratory Syndrome Vaccination as an Indicator Trait for Reproductive Performance in Commercial Sows.

Front Genet 2020 11;11:1011. Epub 2020 Sep 11.

Department of Animal Science, Iowa State University, Ames, IA, United States.

We proposed to investigate the genomic basis of antibody response to porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) vaccination and its relationship to reproductive performance in non-PRRSV-infected commercial sows. Nine hundred and six F1 replacement gilts (139 ± 17 days old) from two commercial farms were vaccinated with a commercial modified live PRRSV vaccine. Blood samples were collected about 52 days after vaccination to measure antibody response to PRRSV as sample-to-positive (S/P) ratio and for single-nucleotide polymorphism (SNP) genotyping. Reproductive performance was recorded for up to 807 sows for number born alive (NBA), number of piglets weaned, number born mummified (MUM), number of stillborn (NSB), and number of pre-weaning mortality (PWM) at parities (P) 1-3 and per sow per year (PSY). Fertility traits such as farrowing rate and age at first service were also analyzed. BayesC0 was used to estimate heritability and genetic correlations of S/P ratio with reproductive performance. Genome-wide association study (GWAS) and genomic prediction were performed using BayesB. The heritability estimate of S/P ratio was 0.34 ± 0.05. High genetic correlations (r_g) of S/P ratio with farrowing performance were identified for NBA P1 (0.61), PWM P2 (-0.70), NSB P3 (-0.83), MUM P3 (-0.84), and NSB PSY (-0.90), indicating that genetic selection for increased S/P ratio would result in improved performance of these traits. A quantitative trait locus was identified on chromosome 7 (∼25 Mb), at the major histocompatibility complex (MHC) region, explaining ∼30% of the genetic variance for S/P ratio, mainly by SNPs ASGA0032113, H3GA0020505, and M1GA0009777. This same region was identified in the bivariate GWAS of S/P ratio and reproductive traits, with SNP H3GA0020505 explaining up to 10% (for NBA P1) of the genetic variance of reproductive performance. The heterozygote genotype at H3GA0020505 was associated with greater S/P ratio and NBA P1 (P = 0.06), and lower MUM P3 and NSB P3 (P = 0.07). Genomic prediction accuracy for S/P ratio was high when using all SNPs (0.67) and when using only those in the MHC region (0.59) and moderate to low when using all SNPs excluding those in the MHC region (0.39). These results suggest that there is great potential to use antibody response to PRRSV vaccination as an indicator trait to improve reproductive performance in commercial pigs.

Source
DOI: http://dx.doi.org/10.3389/fgene.2020.01011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516203
September 2020

Corrigendum: Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection, or Minor Allele Frequency Filtering.

Front Genet 2020 11;11:732. Epub 2020 Aug 11.

Department of Animal Science, University of California, Davis, Davis, CA, United States.

[This corrects the article DOI: 10.3389/fgene.2020.00362.].

Source
DOI: http://dx.doi.org/10.3389/fgene.2020.00732
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7432120
August 2020

Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection, or Minor Allele Frequency Filtering.

Front Genet 2020 21;11:362. Epub 2020 Apr 21.

Department of Animal Science, University of California, Davis, Davis, CA, United States.

Linkage disequilibrium (LD), often expressed in terms of the squared correlation (r²) between allelic values at two loci, is an important concept in many branches of genetics and genomics. Genetic drift and recombination have opposite effects on LD, and thus LD will keep changing until the effects of these two forces are counterbalanced. Several approximations have been used to determine the expected value of r² at equilibrium in the presence or absence of mutation. In this paper, we propose a probability-based approach to compute the exact distribution of allele frequencies at two loci in a finite population at any generation t conditional on the distribution at generation t - 1. As r² is a function of this distribution of allele frequencies, this approach can be used to examine the distribution of r² over generations as it approaches equilibrium. The exact distribution of LD from our method is used to describe, quantify, and compare LD at different equilibria, including equilibrium in the absence or presence of mutation, selection, and filtering by minor allele frequency. We also propose a deterministic formula for expected LD in the presence of mutation at equilibrium based on the exact distribution of LD.
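
For readers unfamiliar with the measure, the squared correlation between allelic values at two biallelic loci can be computed directly from the four haplotype frequencies. The sketch below shows only this textbook definition, not the exact-distribution method proposed in the article; the function name `ld_r2` and the argument names are hypothetical.

```python
def ld_r2(p_AB, p_Ab, p_aB, p_ab):
    """Squared correlation between allelic values at two biallelic loci.

    Arguments are the frequencies of the four haplotypes and must sum to 1.
    Both loci are assumed to be segregating (allele frequencies not 0 or 1).
    """
    p_A = p_AB + p_Ab                  # frequency of allele A at locus 1
    p_B = p_AB + p_aB                  # frequency of allele B at locus 2
    D = p_AB - p_A * p_B               # disequilibrium coefficient
    return D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))

complete_ld = ld_r2(0.5, 0.0, 0.0, 0.5)      # only AB and ab haplotypes exist
no_ld = ld_r2(0.25, 0.25, 0.25, 0.25)        # haplotypes at product frequencies
```

With only AB and ab haplotypes present the loci are perfectly correlated, and with all four haplotypes at the product of their allele frequencies the correlation is zero.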

Source
DOI: http://dx.doi.org/10.3389/fgene.2020.00362
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212447
April 2020

A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits.

Genetics 2020 02 26;214(2):305-331. Epub 2019 Dec 26.

Department of Animal Science, Iowa State University, Ames, Iowa 50011.

A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the T × 1 vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when T = 1. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO. MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.

Source
DOI: http://dx.doi.org/10.1534/genetics.119.302934
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017027
February 2020

Inferring trait-specific similarity among individuals from molecular markers and phenotypes with Bayesian regression.

Theor Popul Biol 2020 04 9;132:47-59. Epub 2019 Dec 9.

Department of Plant Sciences, Technical University of Munich, TUM School of Life Sciences, Germany.

Modeling covariance structure based on genetic similarity between pairs of relatives plays an important role in evolutionary, quantitative and statistical genetics. Historically, genetic similarity between individuals has been quantified from pedigrees via the probability that randomly chosen homologous alleles between individuals are identical by descent (IBD). At present, however, many genetic analyses rely on molecular markers, with realized measures of genomic similarity replacing IBD-based expected similarities. Animal and plant breeders, for example, now employ marker-based genomic relationship matrices between individuals in prediction models and in estimation of genome-based heritability coefficients. Phenotypes convey information about genetic similarity as well. For instance, if phenotypic values are at least partially the result of the action of quantitative trait loci, one would expect the former to inform about the latter, as in genome-wide association studies. Statistically, a non-trivial conditional distribution of unknown genetic similarities, given phenotypes, is to be expected. A Bayesian formalism is presented here that applies to whole-genome regression methods where some genetic similarity matrix, e.g., a genomic relationship matrix, can be defined. Our Bayesian approach, based on phenotypes and markers, converts prior (markers only) expected similarity into trait-specific posterior similarity. A simulation illustrates situations under which effective Bayesian learning from phenotypes occurs. Pinus and wheat data sets were used to demonstrate applicability of the concept in practice. The methodology applies to a wide class of Bayesian linear regression models, it extends to the multiple-trait domain, and can also be used to develop phenotype-guided similarity kernels in prediction problems.

Source
DOI: http://dx.doi.org/10.1016/j.tpb.2019.11.008
April 2020

Marker discovery and associations with β-carotene content in Indian dairy cattle and buffalo breeds.

J Dairy Sci 2019 Nov 30;102(11):10039-10055. Epub 2019 Aug 30.

Department of Animal Science, Iowa State University, 2255 Kildee Hall, 806 Stange Road, Ames 50011.

Vitamin A is essential for human health, but current intake levels in many developing countries such as India are too low due to malnutrition. According to the World Health Organization, an estimated 250 million preschool children are vitamin A deficient globally. This number excludes pregnant women and nursing mothers, who are particularly vulnerable. Efforts to improve access to vitamin A are key because supplementation can reduce mortality rates in young children in developing countries by around 23%. Three key genes, BCMO1, BCO2, and SCARB1, have been shown to be associated with the amount of β-carotene (BC) in milk. Whole-genome sequencing reads from the coordinates of these 3 genes in 202 non-Indian cattle (141 Bos taurus, 61 Bos indicus) and 35 non-Indian buffalo (Bubalus bubalis) animals from several breeds were collected from data repositories. The number of SNP detected in the coding regions of these 3 genes ranged from 16 to 26 in the 3 species, with 5 overlapping SNP between B. taurus and B. indicus. All these SNP together with 2 SNP in the upstream part of the gene but already present in dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/) were used to build a custom Sequenom array. Blood for DNA and milk samples for BC were obtained from 2,291 Indian cows of 5 different breeds (Gir, Holstein cross, Jersey cross, Tharparkar, and Sahiwal) and 2,242 Indian buffaloes (Jafarabadi, Murrah, Pandharpuri, and Surti breeds). The DNA was extracted and genotyped with the Sequenom array. For each individual breed and the combined breeds, SNP with an association that had a P-value <0.3 in the first round of linear analysis were included in a second step of regression analyses to determine allele substitution effects to increase the content of BC in milk. Additionally, an F-test for all SNP within a gene was performed with the objective of determining if overall the gene had a significant effect on the content of BC in milk. The analyses were repeated using a Bayesian approach to compare and validate the previous frequentist results. Multiple significant SNP were found using both methodologies with allele substitution effects ranging from 6.21 (3.13) to 9.10 (5.43) µg of BC per 100 mL of milk. Total gene effects exceeded the mean BC value for all breeds with both analysis approaches. The custom panel designed for genes related to BC production demonstrated applicability in genotyping of cattle and buffalo in India and may be used for cattle or buffalo from other developing countries. Moreover, the recommendation of selection for significant specific alleles of some gene markers provides a route to effectively increase the BC content in milk in the Indian cattle and buffalo populations.

Source
DOI: http://dx.doi.org/10.3168/jds.2019-16361
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7753891
November 2019

Identification of recombination hotspots and quantitative trait loci for recombination rate in layer chickens.

J Anim Sci Biotechnol 2019 26;10:20. Epub 2019 Feb 26.

Department of Animal Science, Iowa State University, Ames, IA 50010 USA.

Background: The frequency of recombination events varies across the genome and between individuals, which may be related to some genomic features. The objective of this study was to assess the frequency of recombination events and to identify QTL (quantitative trait loci) for recombination rate in two purebred layer chicken lines.

Methods: A total of 1200 white-egg layers (WL) were genotyped with 580 K SNPs (single nucleotide polymorphisms) and 5108 brown-egg layers (BL) were genotyped with 42 K SNPs. Recombination events were identified within half-sib families and both the number of recombination events and the recombination rate were calculated within each 0.5 Mb window of the genome. The 10% of windows with the highest recombination rate on each chromosome were considered to be recombination hotspots. A BayesB model was used separately for each line to identify genomic regions associated with the genome-wide number of recombination events per meiosis. Regions that explained more than 0.8% of genetic variance of recombination rate were considered to harbor QTL.

Results: Heritability of recombination rate was estimated at 0.17 in WL and 0.16 in BL. On average, 11.3 and 23.2 recombination events were detected per individual across the genome in 1301 and 9292 meioses in the WL and BL, respectively. The estimated recombination rates differed significantly between the lines, which could be due to differences in inbreeding levels and haplotype structures. Dams had about 5% to 20% higher recombination rates per meiosis than sires in both lines. Recombination rate per 0.5 Mb window had a strong negative correlation with chromosome size and a strong positive correlation with GC content and with CpG island density across the genome in both lines. Different QTL for recombination rate were identified in the two lines. There were 190 and 199 non-overlapping recombination hotspots detected in WL and BL, respectively, 28 of which were common to both lines.

Conclusions: Differences in the recombination rates, hotspot locations, and QTL regions associated with genome-wide recombination were observed between lines, indicating that the detected recombination events are breed-specific and that the control of recombination events is a complex polygenic trait.
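
The hotspot rule stated in the Methods (the 10% of windows with the highest recombination rate on each chromosome) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input is a list of per-window rates for one chromosome, and the choices to round the number of hotspots and to keep at least one window per chromosome are mine, since the abstract does not say how fractional counts or ties were handled.

```python
def hotspot_windows(rates, top_fraction=0.10):
    """Return the sorted indices of the highest-rate windows on one chromosome.

    rates: recombination rate of each 0.5 Mb window, in genomic order.
    top_fraction: share of windows to flag (0.10 follows the abstract).
    """
    # Assumption: round the count and always flag at least one window.
    n_top = max(1, int(round(len(rates) * top_fraction)))
    # Rank window indices from highest to lowest rate (ties keep genomic order).
    ranked = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return sorted(ranked[:n_top])

# Hypothetical chromosome with 20 windows, two of them clearly elevated.
rates = [1.0] * 20
rates[3] = 5.0
rates[17] = 4.0
hot = hotspot_windows(rates)
```

Applying the function chromosome by chromosome, rather than to the pooled genome, matches the per-chromosome definition in the abstract.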

Source
DOI: http://dx.doi.org/10.1186/s40104-019-0332-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6390344
February 2019

A certain invariance property of BLUE in a whole-genome regression context.

J Anim Breed Genet 2019 Mar 7;136(2):113-117. Epub 2019 Jan 7.

AL Rae Centre of Genetics and Breeding, Massey University, Palmerston North, New Zealand.

A curious result from mixed linear models applied to genome-wide association studies was expanded. In particular, a model in which one or more markers are considered as fixed but are allowed to contribute to the covariance structure by treating such markers as random as well was examined. The best linear unbiased estimator of marker effects is invariant with respect to whether those markers are employed in constructing a genomic relationship matrix or are ignored, provided marker effects are uncorrelated with those not being tested. Also, the implications of regarding some marker effects as fixed when, in fact, these possess a non-trivial covariance structure with those declared as random were examined.
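
The invariance can be checked numerically in a small simulated example. The sketch below is not taken from the article: the data, variance components, and function name are hypothetical, marker effects are assumed independent (so the proviso about uncorrelated effects holds), and the genomic relationship matrix is formed as a simple cross-product of uncentered genotype codes. It computes the generalized least squares estimate of a tested marker's effect twice, once with that marker left out of the relationship matrix and once with it included.

```python
import numpy as np

def gls(X, V, y):
    """Generalized least squares (BLUE) of fixed effects for y ~ (Xb, V)."""
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)

rng = np.random.default_rng(1)
n, m = 50, 200
Z = rng.integers(0, 3, size=(n, m)).astype(float)   # other markers, coded 0/1/2
x = rng.integers(0, 3, size=n).astype(float)        # the tested marker
y = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                # intercept + tested marker, both fixed

sigma_g, sigma_e = 0.6, 0.4                         # hypothetical variance components
G_without = Z @ Z.T / m                             # tested marker ignored in G
G_with = (Z @ Z.T + np.outer(x, x)) / m             # tested marker also used in G

b_without = gls(X, sigma_g * G_without + sigma_e * np.eye(n), y)
b_with = gls(X, sigma_g * G_with + sigma_e * np.eye(n), y)
```

The two estimates agree to numerical precision because adding a multiple of x x' to the covariance matrix only rescales V^-1 x along directions already spanned by the fixed-effect design matrix.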

Source
DOI: http://dx.doi.org/10.1111/jbg.12378
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6850311
March 2019

Empirical Comparisons of Different Statistical Models To Identify and Validate Kernel Row Number-Associated Variants from Structured Multi-parent Mapping Populations of Maize.

G3 (Bethesda) 2018 11 6;8(11):3567-3575. Epub 2018 Nov 6.

Department of Agronomy

Advances in next generation sequencing technologies and statistical approaches enable genome-wide dissection of phenotypic traits via genome-wide association studies (GWAS). Although multiple statistical approaches for conducting GWAS are available, the power and cross-validation rates of many approaches have been mostly tested using simulated data. Empirical comparisons of single variant (SV) and multi-variant (MV) GWAS approaches have not been conducted to test if a single approach or a combination of SV and MV is effective, through identification and cross-validation of trait-associated loci. In this study, kernel row number (KRN) data were collected from a set of 6,230 entries derived from the Nested Association Mapping (NAM) population and related populations. Three different types of GWAS analyses were performed: 1) single-variant (SV), 2) stepwise regression (STR) and 3) a Bayesian-based multi-variant (BMV) model. Using SV, STR, and BMV models, 257, 300, and 442 KRN-associated variants (KAVs) were identified in the initial GWAS analyses. Of these, 231 KAVs were subjected to genetic validation using three unrelated populations that were not included in the initial GWAS. Genetic validation results suggest that the three GWAS approaches are complementary. Interestingly, KAVs in low recombination regions were more likely to exhibit associations in independent populations than KAVs in recombinationally active regions, probably as a consequence of linkage disequilibrium. The KAVs identified in this study have the potential to enhance our understanding of the genetic basis of ear development.

Source
DOI: http://dx.doi.org/10.1534/g3.118.200636
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222574
November 2018

Genome-wide mapping of quantitative trait loci in admixed populations using mixed linear model and Bayesian multiple regression analysis.

Genet Sel Evol 2018 06 19;50(1):32. Epub 2018 Jun 19.

Department of Animal Science, Iowa State University, Ames, IA, 50010, USA.

Background: Population stratification and cryptic relationships have been the main sources of excessive false-positives and false-negatives in population-based association studies. Many methods have been developed to model these confounding factors and minimize their impact on the results of genome-wide association studies. In most of these methods, a two-stage approach is applied where: (1) methods are used to determine if there is a population structure in the sample dataset and (2) the effects of population structure are corrected either by modeling it or by running a separate analysis within each sub-population. The objective of this study was to evaluate the impact of population structure on the accuracy and power of genome-wide association studies using a Bayesian multiple regression method.

Methods: We conducted a genome-wide association study in a stochastically simulated admixed population. The genome was composed of six chromosomes, each with 1000 markers. Fifteen segregating quantitative trait loci contributed to the genetic variation of a quantitative trait with heritability of 0.30. The impact of genetic relationships and breed composition (BC) on three analysis methods was evaluated: single marker simple regression (SMR), single marker mixed linear model (MLM) and Bayesian multiple-regression analysis (BMR). Each method was fitted with and without BC. Accuracy, power, false-positive rate and the positive predictive value of each method were calculated and used for comparison.

Results: SMR and BMR, both without BC, were ranked as the worst and the best performing approaches, respectively. Our results showed that, while explicit modeling of genetic relationships and BC is essential for models SMR and MLM, BMR can disregard them and yet result in a higher power without compromising its false-positive rate.

Conclusions: This study showed that the Bayesian multiple-regression analysis is robust to population structure and to relationships among study subjects and performs better than a single marker mixed linear model approach.

Source
DOI: http://dx.doi.org/10.1186/s12711-018-0402-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006859
June 2018

The Accuracy and Bias of Single-Step Genomic Prediction for Populations Under Selection.

G3 (Bethesda) 2017 08 7;7(8):2685-2694. Epub 2017 Aug 7.

Department of Animal Science, Iowa State University, Ames, Iowa 50011

In single-step analyses, missing genotypes are explicitly or implicitly imputed, and this requires centering the observed genotypes using the means of the unselected founders. If genotypes are only available for selected individuals, centering on the unselected founder mean is not straightforward. Here, computer simulation is used to study an alternative analysis that does not require centering genotypes but fits the mean μ_g of unselected individuals as a fixed effect. Starting with observed diplotypes from 721 cattle, a five-generation population was simulated with sire selection to produce 40,000 individuals with phenotypes, of which the 1000 sires had genotypes. The next generation of 8000 genotyped individuals was used for validation. Evaluations were undertaken with (J) or without (N) μ_g when marker covariates were not centered; and with (JC) or without (C) μ_g when all observed and imputed marker covariates were centered. Centering did not influence accuracy of genomic prediction, but fitting μ_g did. Accuracies were improved when the panel comprised only quantitative trait loci (QTL); models JC and J had accuracies of 99.4%, whereas models C and N had accuracies of 90.2%. When only markers were in the panel, the 4 models had accuracies of 80.4%. In panels that included QTL, fitting μ_g in the model improved accuracy, but had little impact when the panel contained only markers. In populations undergoing selection, fitting μ_g in the model is recommended to avoid bias and reduction in prediction accuracy due to selection.

Source
DOI: http://dx.doi.org/10.1534/g3.117.043596
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555473
August 2017

Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.

J Anim Sci Biotechnol 2017 2;8:38. Epub 2017 May 2.

Department of Animal Science, Iowa State University, Ames, 50011 Iowa USA.

Background: A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model.

Methods: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.

Results: Efficient leave-one-out cross validation strategies were 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations.

Conclusions: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
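
Why a single analysis can replace n refits is easiest to see for ridge regression on markers, which has the same form as the random-regression BLUP described in the Background when the variance ratio is known. The sketch below is a generic illustration and not necessarily the strategy derived in the article: the function names are hypothetical, the shrinkage parameter `lam` is assumed known, and no intercept or other fixed effects are fitted. It uses the standard result that the leave-one-out prediction error equals e_i / (1 - h_ii), where e_i is the residual from the full fit and h_ii is the i-th diagonal element of the hat matrix.

```python
import numpy as np

def loo_errors_ridge(X, y, lam):
    """Leave-one-out prediction errors obtained from a single ridge fit."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # hat matrix
    resid = y - H @ y                                         # full-data residuals
    return resid / (1.0 - np.diag(H))

def loo_errors_naive(X, y, lam):
    """Reference version: refit the model n times, once per left-out record."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xk, yk = X[keep], y[keep]
        beta = np.linalg.solve(Xk.T @ Xk + lam * np.eye(p), Xk.T @ yk)
        out[i] = y[i] - X[i] @ beta
    return out

# Hypothetical data: 40 records, 15 markers coded 0/1/2.
rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(40, 15)).astype(float)
y = rng.normal(size=40)
fast = loo_errors_ridge(X, y, lam=2.0)
slow = loo_errors_naive(X, y, lam=2.0)
```

The efficient version costs one solve plus a diagonal extraction, which is consistent with the abstract's remark that the advantage over the naive approach grows with the number of observations.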

Source
DOI: http://dx.doi.org/10.1186/s40104-017-0164-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414316
May 2017

Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals.

Genet Sel Evol 2016 12 8;48(1):96. Epub 2016 Dec 8.

Department of Animal Science, Iowa State University, Ames, IA, 50011, USA.

Background: Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci.

Theory: We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped.

Application: We demonstrate the practicality on both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. To complete a single-trait analysis on a desk-top computer with four graphics cards required about 3 h using the hybrid model to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed making inferences from posterior means, variances and covariances. The MCMC sampling required one quarter of the effort when the hybrid model was used compared to the published MEM.

Conclusions: We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort were demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

Source
http://dx.doi.org/10.1186/s12711-016-0273-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5144523
December 2016

An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular.

Genet Sel Evol 2016 10 27;48(1):80. Epub 2016 Oct 27.

Department of Animal Science, Iowa State University, Ames, IA, 50011, USA.

Background: The mixed linear model employed for genomic best linear unbiased prediction (GBLUP) includes the breeding value for each animal as a random effect that has a mean of zero and a covariance matrix proportional to the genomic relationship matrix (G), where the inverse of G is required to set up the usual mixed model equations (MME). When only some animals have genomic information, genomic predictions can be obtained by an extension known as single-step GBLUP, where the covariance matrix of breeding values is constructed by combining the pedigree-based additive relationship matrix with G. The inverse of the combined relationship matrix can be obtained efficiently, provided G can be inverted. In some livestock species, however, the number [Formula: see text] of animals with genomic information exceeds the number of marker covariates used to compute G, and this results in a singular G. For such a case, an efficient and exact method to obtain GBLUP and single-step GBLUP is presented here.

Results: Exact methods are already available to obtain GBLUP when G is singular, but these require working with large dense matrices. Another approach is to modify G to make it nonsingular by adding a small value to all its diagonals or regressing it towards the pedigree-based relationship matrix. This, however, results in the inverse of G being dense and difficult to compute as [Formula: see text] grows. The approach presented here recognizes that the number r of linearly independent genomic breeding values cannot exceed the number of marker covariates, and the mixed linear model used here for genomic prediction only fits these r linearly independent breeding values as random effects.

Conclusions: The exact method presented here was compared to Apy-GBLUP and to Apy single-step GBLUP, both of which are approximate methods that use a modified G that has a sparse inverse which can be computed efficiently. In a small numerical example, predictions from the exact approach and Apy were almost identical, but the MME from Apy had a condition number about 1000 times larger than that from the exact approach, indicating ill-conditioning of the MME from Apy. The practical application of exact SSGBLUP is not more difficult than implementation of Apy.
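Why the genomic relationship matrix becomes singular when animals outnumber markers can be seen in a tiny example. The sketch below assumes a simple uncentered form, Z Z' divided by the number of markers, which is an illustrative choice and not necessarily the scaling used in the paper. The genotypes and helper names are hypothetical.

```python
def gram(z, scale):
    """Return Z Z' / scale for a genotype matrix Z given as a list of rows."""
    n = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Three animals genotyped at only two markers (allele counts 0/1/2).
Z = [[0, 1],
     [1, 2],
     [2, 0]]

# Relationship matrix is 3 x 3, but its rank cannot exceed the 2 markers.
G = gram(Z, scale=2)
det_G = det3(G)  # zero, so G has no inverse

# One workaround named in the abstract: add a small value to every diagonal.
# The value 0.01 is an arbitrary choice for this illustration.
G_reg = [[G[i][j] + (0.01 if i == j else 0.0) for j in range(3)]
         for i in range(3)]
det_G_reg = det3(G_reg)  # positive, so G_reg can be inverted
```

With three animals and two markers, only two linearly independent breeding values exist, which corresponds to the number r described in the Results above.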

Source
http://dx.doi.org/10.1186/s12711-016-0260-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5082134
October 2016

An Upper Bound for Accuracy of Prediction Using GBLUP.

PLoS One 2016 16;11(8):e0161054. Epub 2016 Aug 16.

Department of Animal Science, Iowa State University, 50011 Ames, Iowa, United States of America.

This study aims at characterizing the asymptotic behavior of genomic prediction R2 as the size of the reference population increases for common or rare QTL alleles through simulations. Haplotypes derived from whole-genome sequence of 85 Caucasian individuals from the 1000 Genomes Project were used to simulate random mating in a population of 10,000 individuals for at least 100 generations to create the LD structure in humans for a large number of individuals. To reduce computational demands, only SNPs within a 0.1M region of each of the first 5 chromosomes were used in simulations, and therefore, the total genome length simulated was 0.5M. When the genome length is 30M, to get the same genomic prediction R2 as with a 0.5M genome would require a reference population 60-fold larger. Three scenarios were considered varying in minor allele frequency distributions of markers and QTL, for h2 = 0.8 resembling height in humans. Total number of markers was 4,200 and QTL were 70 for each scenario. In this study, we considered the prediction accuracy in terms of an estimability problem, and thereby provided an upper bound for reliability of prediction, and thus, for prediction R2. Genomic prediction methods GBLUP, BayesB and BayesC were compared. Our results imply that for human height variable selection methods BayesB and BayesC applied to a 30M genome have no advantage over GBLUP when the size of reference population was small (<6,000 individuals), but are superior as more individuals are included in the reference population. All methods become asymptotically equivalent in terms of prediction R2, which approaches genomic heritability when the size of the reference population reaches 480,000 individuals.
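The 60-fold figure follows from the ratio of genome lengths (30M / 0.5M). A minimal sketch of that arithmetic is given below, assuming, as the abstract states, that the required reference size grows in proportion to genome length. The simulated sizes of 100 and 8,000 are back-calculated here from the reported 6,000 and 480,000 and are not stated in the abstract; the function name is hypothetical.

```python
def scaled_reference_size(n_simulated, simulated_morgans=0.5, target_morgans=30.0):
    """Reference size needed for the same R2 on a longer genome,
    assuming the requirement is proportional to genome length."""
    return n_simulated * (target_morgans / simulated_morgans)


fold = 30.0 / 0.5                         # scaling factor between genome lengths
n_small = scaled_reference_size(100)      # assumed simulated size behind "<6,000"
n_large = scaled_reference_size(8000)     # assumed simulated size behind "480,000"
```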

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0161054
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4986954
August 2017

Effects of number of training generations on genomic prediction for various traits in a layer chicken population.

Genet Sel Evol 2016 Mar 19;48:22. Epub 2016 Mar 19.

Department of Animal Science, Iowa State University, Ames, IA, 50010, USA.

Background: Genomic estimated breeding values (GEBV) based on single nucleotide polymorphism (SNP) genotypes are widely used in animal improvement programs. It is typically assumed that the larger the number of animals is in the training set, the higher is the prediction accuracy of GEBV. The aim of this study was to quantify genomic prediction accuracy depending on the number of ancestral generations included in the training set, and to determine the optimal number of training generations for different traits in an elite layer breeding line.

Methods: Phenotypic records for 16 traits on 17,793 birds were used. All parents and some selection candidates from nine non-overlapping generations were genotyped for 23,098 segregating SNPs. An animal model with pedigree relationships (PBLUP) and the BayesB genomic prediction model were applied to predict EBV or GEBV at each validation generation (progeny of the most recent training generation) based on varying numbers of immediately preceding ancestral generations. Prediction accuracy of EBV or GEBV was assessed as the correlation between EBV and phenotypes adjusted for fixed effects, divided by the square root of trait heritability. The optimal number of training generations that resulted in the greatest prediction accuracy of GEBV was determined for each trait. The relationship between optimal number of training generations and heritability was investigated.

Results: On average, accuracies were higher with the BayesB model than with PBLUP. Prediction accuracies of GEBV increased as the number of closely-related ancestral generations included in the training set increased, but reached an asymptote or slightly decreased when distant ancestral generations were used in the training set. The optimal number of training generations was 4 or more for high heritability traits but less than that for low heritability traits. For less heritable traits, limiting the training datasets to individuals closely related to the validation population resulted in the best predictions.

Conclusions: The effect of adding distant ancestral generations in the training set on prediction accuracy differed between traits and the optimal number of necessary training generations is associated with the heritability of traits.
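The accuracy measure described in the Methods, the correlation between predictions and adjusted phenotypes divided by the square root of heritability, can be sketched as follows. The data values, the heritability of 0.36, and the function names are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    abar, bbar = mean(a), mean(b)
    sab = sum((x - abar) * (y - bbar) for x, y in zip(a, b))
    saa = sum((x - abar) ** 2 for x in a)
    sbb = sum((y - bbar) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def prediction_accuracy(predicted, adjusted_phenotypes, h2):
    """Correlation with adjusted phenotypes, divided by sqrt(heritability)."""
    return pearson(predicted, adjusted_phenotypes) / sqrt(h2)


# Toy validation generation (hypothetical numbers).
gebv = [0.2, -0.1, 0.4, 0.0, -0.3]
adj_phen = [1.0, -0.5, 0.8, 0.3, -1.2]
r = pearson(gebv, adj_phen)
acc = prediction_accuracy(gebv, adj_phen, h2=0.36)
```

Dividing by the square root of heritability rescales the correlation with noisy phenotypes towards a correlation with the underlying breeding value, so for a fixed correlation the reported accuracy is larger for a less heritable trait.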

Source
http://dx.doi.org/10.1186/s12711-016-0198-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4799631
March 2016

Accuracy of prediction of simulated polygenic phenotypes and their underlying quantitative trait loci genotypes using real or imputed whole-genome markers in cattle.

Genet Sel Evol 2015 Dec 23;47:99. Epub 2015 Dec 23.

Department of Animal Science, Iowa State University, Ames, 50011, USA.

Background: More accurate genomic predictions are expected when the effects of QTL (quantitative trait loci) are predicted from markers in close physical proximity to the QTL. The objective of this study was to quantify to what extent whole-genome methods using 50 K or imputed 770 K SNPs (single nucleotide polymorphisms) could predict single or multiple QTL genotypes based on SNPs in close proximity to those QTL.

Methods: Phenotypes with a heritability of 1 were simulated for 2677 Hereford animals genotyped with the BovineSNP50 BeadChip. Genotypes for the high-density 770 K SNP panel were imputed using Beagle software. Various Bayesian regression methods were used to predict single QTL or a trait influenced by 42 such QTL. We quantified to what extent these predictions were based on SNPs in close proximity to the QTL by comparing whole-genome predictions to local predictions based on estimates of the effects of variable numbers of SNPs i.e. ±1, ±2, ±5, ±10, ±50 or ±100 that flanked the QTL.

Results: Prediction accuracies based on local SNPs using whole-genome training for single QTL with the 50 K SNP panel and BayesC0 ranged from 0.49 (±1 SNP) to 0.75 (±100 SNPs). The minimum number of local SNPs for an accurate prediction is ±10 SNPs. Prediction accuracies that were based on local SNPs only were higher than those based on whole-genome SNPs for both 50 K and 770 K SNP panels. For the 770 K SNP panel, prediction accuracies were higher than 0.70 and varied little i.e. between 0.73 (±1 SNP) and 0.77 (±5 SNPs). For the summed 42 QTL, prediction accuracies were generally higher than for single QTL regardless of the number of local SNPs. For QTL with low minor allele frequency (MAF) compared to QTL with high MAF, prediction accuracies increased as the number of SNPs around the QTL increased.

Conclusions: These results suggest that with both 50 K and imputed 770 K SNP genotypes the level of linkage disequilibrium is sufficient to predict single and multiple QTL. However, prediction accuracies are eroded through spuriously estimated effects of SNPs that are distant from the QTL. Prediction accuracies were higher with the 770 K than with the 50 K SNP panel.
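The comparison of whole-genome and local predictions described in the Methods can be sketched as a sum of estimated SNP effects restricted to a window around the QTL. The sketch assumes SNPs are ordered by map position and that "±k" means k SNPs on each side of the QTL position. Both the indexing convention and the toy numbers are assumptions for illustration, not details taken from the paper.

```python
def local_prediction(genotypes, effects, qtl_pos, k):
    """Predict each animal from the k SNPs on either side of the QTL.

    genotypes: list of rows, one per animal, allele counts per SNP.
    effects:   estimated SNP effects from whole-genome training.
    qtl_pos:   index at which the QTL would sit between ordered SNPs,
               so SNPs qtl_pos-k .. qtl_pos+k-1 flank it (assumed convention).
    """
    left = max(0, qtl_pos - k)
    right = min(len(effects), qtl_pos + k)
    return [sum(row[j] * effects[j] for j in range(left, right))
            for row in genotypes]


def whole_genome_prediction(genotypes, effects):
    """Predict each animal from all SNP effects."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotypes]


# Two animals, four ordered SNPs, QTL located between SNP 1 and SNP 2.
genotypes = [[0, 1, 2, 1],
             [2, 0, 1, 1]]
effects = [0.5, -0.2, 0.1, 0.3]

local_1 = local_prediction(genotypes, effects, qtl_pos=2, k=1)
local_2 = local_prediction(genotypes, effects, qtl_pos=2, k=2)
whole = whole_genome_prediction(genotypes, effects)
```

Widening the window until it covers every SNP reproduces the whole-genome prediction, so the difference between the two isolates the contribution of SNPs that are distant from the QTL.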

Source
http://dx.doi.org/10.1186/s12711-015-0179-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4689055
December 2015

A fast and efficient Gibbs sampler for BayesB in whole-genome analyses.

Genet Sel Evol 2015 Oct 14;47:80. Epub 2015 Oct 14.

Department of Animal Science, Iowa State University, Ames, 50011, IA, USA.

Background: In whole-genome analyses, the number p of marker covariates is often much larger than the number n of observations. Bayesian multiple regression models are widely used in genomic selection to address this problem of p >> n. The primary difference between these models is the prior assumed for the effects of the covariates. Usually in the BayesB method, a Metropolis-Hastings (MH) algorithm is used to jointly sample the marker effect and the locus-specific variance, which may make BayesB computationally intensive. In this paper, we show how the Gibbs sampler without the MH algorithm can be used for the BayesB method.

Results: We consider three different versions of the Gibbs sampler to sample the marker effect and locus-specific variance for each locus. Among the Gibbs samplers that were considered, the most efficient sampler is about 2.1 times as efficient as the MH algorithm proposed by Meuwissen et al. and 1.7 times as efficient as that proposed by Habier et al.

Conclusions: The three Gibbs samplers presented here were twice as efficient as Metropolis-Hastings samplers and gave virtually the same results.

Source
http://dx.doi.org/10.1186/s12711-015-0157-x
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606519
October 2015

Response and inbreeding from a genomic selection experiment in layer chickens.

Genet Sel Evol 2015 Jul 7;47:59. Epub 2015 Jul 7.

Department of Animal Science, Iowa State University, Ames, IA, 50011-3150, USA.

Background: Genomic selection (GS) using estimated breeding values (GS-EBV) based on dense marker data is a promising approach for genetic improvement. A simulation study was undertaken to illustrate the opportunities offered by GS for designing breeding programs. It consisted of a selection program for a sex-limited trait in layer chickens, which was developed by deterministic predictions under different scenarios. Later, one of the possible schemes was implemented in a real population of layer chicken.

Methods: In the simulation, the aim was to double the response to selection per year by reducing the generation interval by 50 %, while maintaining the same rate of inbreeding per year. We found that GS with retraining could achieve the set objectives while requiring 75 % fewer reared birds and 82 % fewer phenotyped birds per year. A multi-trait GS scenario was subsequently implemented in a real population of brown egg laying hens. The population was split into two sub-lines, one was submitted to conventional phenotypic selection, and one was selected based on genomic prediction. At the end of the 3-year experiment, the two sub-lines were compared for multiple performance traits that are relevant for commercial egg production.

Results: Birds that were selected based on genomic prediction outperformed those that were submitted to conventional selection for most of the 16 traits that were included in the index used for selection. However, although the two programs were designed to achieve the same rate of inbreeding per year, the realized inbreeding per year assessed from pedigree was higher in the genomic selected line than in the conventionally selected line.

Conclusions: The results demonstrate that GS is a promising alternative to conventional breeding for genetic improvement of layer chickens.

Source
http://dx.doi.org/10.1186/s12711-015-0133-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4492088
July 2015

A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses.

Genet Sel Evol 2014 Sep 22;46:50. Epub 2014 Sep 22.

Department of Animal Science, Iowa State University, 50011 Ames, Iowa, USA.

Background: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase.

Methods: A strategy is presented for Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100 to 200-fold increase in the number of animals and an associated 100 to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores.

Discussion: In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesC π. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so they can be used for routine applications.

Source
http://dx.doi.org/10.1186/1297-9686-46-50
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262255
September 2014

Reduction in accuracy of genomic prediction for ordered categorical data compared to continuous observations.

Genet Sel Evol 2014 Jun 9;46:37. Epub 2014 Jun 9.

Department of Animal Science, Iowa State University, Ames IA 50011, USA.

Background: Accuracy of genomic prediction depends on number of records in the training population, heritability, effective population size, genetic architecture, and relatedness of training and validation populations. Many traits have ordered categories including reproductive performance and susceptibility or resistance to disease. Categorical scores are often recorded because they are easier to obtain than continuous observations. Bayesian linear regression has been extended to the threshold model for genomic prediction. The objective of this study was to quantify reductions in accuracy for ordinal categorical traits relative to continuous traits.

Methods: Efficiency of genomic prediction was evaluated for heritabilities of 0.10, 0.25 or 0.50. Phenotypes were simulated for 2250 purebred animals using 50 QTL selected from actual 50k SNP (single nucleotide polymorphism) genotypes giving a proportion of causal to total loci of 0.0001. A Bayes C π threshold model simultaneously fitted all 50k markers except those that represented QTL. Estimated SNP effects were utilized to predict genomic breeding values in purebred (n = 239) or multibreed (n = 924) validation populations. Correlations between true and predicted genomic merit in validation populations were used to assess predictive ability.

Results: Accuracies of genomic estimated breeding values ranged from 0.12 to 0.66 for purebred and from 0.04 to 0.53 for multibreed validation populations based on Bayes C π linear model analysis of the simulated underlying variable. Accuracies for ordinal categorical scores analyzed by the Bayes C π threshold model were 20% to 50% lower and ranged from 0.04 to 0.55 for purebred and from 0.01 to 0.44 for multibreed validation populations. Analysis of ordinal categorical scores using a linear model resulted in further reductions in accuracy.

Conclusions: Threshold traits result in markedly lower accuracy than a linear model on the underlying variable. To achieve an accuracy equal to or greater than that for continuous phenotypes with a training population of 1000 animals, a 2.25-fold increase in training population size was required for categorical scores fitted with the threshold model. The threshold model resulted in higher accuracies than the linear model and its advantage was greatest when training populations were smallest.

Source
http://dx.doi.org/10.1186/1297-9686-46-37
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094927
June 2014

Genome-wide association mapping including phenotypes from relatives without genotypes in a single-step (ssGWAS) for 6-week body weight in broiler chickens.

Front Genet 2014 20;5:134. Epub 2014 May 20.

Department of Animal Science, Purdue University West Lafayette, IN, USA.

The purpose of this study was to compare results obtained from various methodologies for genome-wide association studies, when applied to real data, in terms of number and commonality of regions identified and their genetic variance explained, computational speed, and possible pitfalls in interpretations of results. Methodologies include: two iteratively reweighted single-step genomic BLUP procedures (ssGWAS1 and ssGWAS2), a single-marker model (CGWAS), and BayesB. The ssGWAS methods utilize genomic breeding values (GEBVs) based on combined pedigree, genomic and phenotypic information, while CGWAS and BayesB only utilize phenotypes from genotyped animals or pseudo-phenotypes. In this study, ssGWAS was performed by converting GEBVs to SNP marker effects. Unequal variances for markers were incorporated for calculating weights into a new genomic relationship matrix. SNP weights were refined iteratively. The data were body weights at 6 weeks on 274,776 broiler chickens, of which 4553 were genotyped using a 60 k SNP chip. Comparison of genomic regions was based on genetic variances explained by local SNP regions (20 SNPs). After 3 iterations, the noise was greatly reduced for ssGWAS1 and results were similar to those of CGWAS, with 4 out of the top 10 regions in common. In contrast, for BayesB, the plot was dominated by a single region explaining 23.1% of the genetic variance. This same region was found by ssGWAS1 with the same rank, but the amount of genetic variation attributed to the region was only 3%. These findings emphasize the need for caution when comparing and interpreting results from various methods, and highlight that detected associations, and strength of association, strongly depend on methodologies and details of implementations. BayesB appears to overly shrink regions to zero, while overestimating the amount of genetic variation attributed to the remaining SNP effects. The real world is most likely a compromise between methods and remains to be determined.

Source
http://dx.doi.org/10.3389/fgene.2014.00134
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4033036
June 2014

Implementing a QTL detection study (GWAS) using genomic prediction methodology.

Methods Mol Biol 2013 ;1019:275-98

Department of Animal Science, Iowa State University, Ames, IA, USA.

Genomic prediction exploits historical genotypic and phenotypic data to predict performance on selection candidates based only on their genotypes. It achieves this by a process known as training that derives the values of all the chromosome fragments that can be characterized by regressing the historical phenotypes on some or all of the genotyped loci. A genome-wide association study (GWAS) involves a genome-wide search for chromosome fragments with significant association with phenotype. One Bayesian approach to GWAS makes inferences using samples from the posterior distribution of genotypic effects obtained in the training phase of genomic prediction. Here we describe how to do this from commonly used Bayesian methods for genomic prediction, and we comment on how to interpret the results.

Source
http://dx.doi.org/10.1007/978-1-62703-447-0_11
December 2013

Bayesian methods applied to GWAS.

Methods Mol Biol 2013 ;1019:237-74

Department of Animal Science, Iowa State University, Ames, IA, USA.

Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, the Bayes theorem is used to combine prior beliefs of marker effects, which are expressed in terms of prior distributions, with information from data for inference. Often, the analyses are too complex for closed-form solutions and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses can be used for GWAS. In most GWAS, false positives are controlled by limiting the genome-wise error rate, which is the probability of one or more false-positive results, to a small value. As the number of tests in GWAS is very large, this results in very low power. Here we show how in Bayesian GWAS false positives can be controlled by limiting the proportion of false-positive results among all positives to some small value. The advantage of this approach is that the power of detecting associations is not inversely related to the number of markers.
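One common way to limit the proportion of false positives in a Bayesian analysis is to declare an association only when its posterior probability exceeds a threshold. The sketch below assumes that a posterior probability of association is already available for each genomic window, for example as the fraction of MCMC samples in which the window has an effect. The probabilities, the 0.95 threshold, and the function names are hypothetical, and the chapter's exact procedure is not given in the abstract.

```python
def declare_associations(post_probs, threshold):
    """Indices of windows whose posterior probability of association
    is at least the threshold."""
    return [i for i, p in enumerate(post_probs) if p >= threshold]


def expected_pfp(post_probs, declared):
    """Expected proportion of false positives among the declared windows:
    the average posterior probability that a declared window is null."""
    if not declared:
        return 0.0
    return sum(1.0 - post_probs[i] for i in declared) / len(declared)


# Posterior probabilities of association for six windows (hypothetical).
post_probs = [0.99, 0.97, 0.60, 0.10, 0.95, 0.02]

declared = declare_associations(post_probs, threshold=0.95)
pfp = expected_pfp(post_probs, declared)
```

Because every declared window has a posterior probability of at least 0.95, the expected proportion of false positives among them cannot exceed 0.05, however many windows are examined.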

Source
http://dx.doi.org/10.1007/978-1-62703-447-0_10
December 2013

Genomic BLUP decoded: a look into the black box of genomic prediction.

Genetics 2013 Jul 2;194(3):597-607. Epub 2013 May 2.

Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, Iowa 50011, USA.

Genomic best linear unbiased prediction (BLUP) is a statistical method that uses relationships between individuals calculated from single-nucleotide polymorphisms (SNPs) to capture relationships at quantitative trait loci (QTL). We show that genomic BLUP exploits not only linkage disequilibrium (LD) and additive-genetic relationships, but also cosegregation to capture relationships at QTL. Simulations were used to study the contributions of those types of information to accuracy of genomic estimated breeding values (GEBVs), their persistence over generations without retraining, and their effect on the correlation of GEBVs within families. We show that accuracy of GEBVs based on additive-genetic relationships can decline with increasing training data size and speculate that modeling polygenic effects via pedigree relationships jointly with genomic breeding values using Bayesian methods may prevent that decline. Cosegregation information from half sibs contributes little to accuracy of GEBVs in current dairy cattle breeding schemes but from full sibs it contributes considerably to accuracy within family in corn breeding. Cosegregation information also declines with increasing training data size, and its persistence over generations is lower than that of LD, suggesting the need to model LD and cosegregation explicitly. The correlation between GEBVs within families depends largely on additive-genetic relationship information, which is determined by the effective number of SNPs and training data size. As genomic BLUP cannot capture short-range LD information well, we recommend Bayesian methods with t-distributed priors.

Source
http://dx.doi.org/10.1534/genetics.113.152207
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3697966
July 2013

Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action.

Genet Sel Evol 2013 Apr 26;45:11. Epub 2013 Apr 26.

Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, IA, USA.

Background: Genomic selection is an appealing method to select purebreds for crossbred performance. In the case of crossbred records, single nucleotide polymorphism (SNP) effects can be estimated using an additive model or a breed-specific allele model. In most studies, additive gene action is assumed. However, dominance is the likely genetic basis of heterosis. Advantages of incorporating dominance in genomic selection were investigated in a two-way crossbreeding program for a trait with different magnitudes of dominance. Training was carried out only once in the simulation.

Results: When the dominance variance and heterosis were large and overdominance was present, a dominance model including both additive and dominance SNP effects gave substantially greater cumulative response to selection than the additive model. Extra response was the result of an increase in heterosis but at a cost of reduced purebred performance. When the dominance variance and heterosis were realistic but with overdominance, the advantage of the dominance model decreased but was still significant. When overdominance was absent, the dominance model was slightly favored over the additive model, but the difference in response between the models increased as the number of quantitative trait loci increased. This reveals the importance of exploiting dominance even in the absence of overdominance. When there was no dominance, response to selection for the dominance model was as high as for the additive model, indicating robustness of the dominance model. The breed-specific allele model was inferior to the dominance model in all cases and to the additive model except when the dominance variance and heterosis were large and with overdominance. However, the advantage of the dominance model over the breed-specific allele model may decrease as differences in linkage disequilibrium between the breeds increase. Retraining is expected to reduce the advantage of the dominance model over the alternatives, because in general, the advantage becomes important only after five or six generations post-training.

Conclusion: Under dominance and without retraining, genomic selection based on the dominance model is superior to the additive model and the breed-specific allele model to maximize crossbred performance through purebred selection.

Source
http://dx.doi.org/10.1186/1297-9686-45-11
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673865
April 2013

Genome-wide association study of infectious bovine keratoconjunctivitis in Angus cattle.

BMC Genet 2013 Mar 26;14:23. Epub 2013 Mar 26.

Department of Animal Science, Iowa State University, Ames, IA 50011 USA.

Background: Infectious Bovine Keratoconjunctivitis (IBK) in beef cattle, commonly known as pinkeye, is a bacterial disease caused by Moraxella bovis. IBK is characterized by excessive tearing and ulceration of the cornea. Perforation of the cornea may also occur in severe cases. IBK is considered the most important ocular disease in cattle production, due to the decreased growth performance of infected individuals and its subsequent economic effects. IBK is an economically important, lowly heritable categorical disease trait. Mass selection of unaffected animals has not been successful at reducing disease incidence. Genome-wide studies can determine chromosomal regions associated with IBK susceptibility. The objective of the study was to detect single-nucleotide polymorphism (SNP) markers in linkage disequilibrium (LD) with genetic variants associated with IBK in American Angus cattle.

Results: The proportion of phenotypic variance explained by markers was 0.06 in the whole genome analysis of IBK incidence classified as two, three or nine categories. Whole-genome analysis using any categorisation of IBK scores (two, three or nine categories) showed that locations on chromosomes 2, 12, 13 and 21 were associated with IBK disease. The genomic locations on chromosomes 13 and 21 overlap with QTLs associated with Bovine spongiform encephalopathy, clinical mastitis or somatic cell count.

Conclusions: Results of these genome-wide analyses indicated that if the underlying genetic factors confer not only IBK susceptibility but also IBK severity, treating IBK phenotypes as a two-category trait can cause information loss in the genome-wide analysis. These results help our overall understanding of the genetics of IBK and have the potential to provide information for future use in breeding schemes.

Source
http://dx.doi.org/10.1186/1471-2156-14-23 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673868 (PMC)
March 2013

The effect of using genealogy-based haplotypes for genomic prediction.

Genet Sel Evol 2013 Mar 6;45. Epub 2013 Mar 6.

Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele DK-8830, Denmark.

Background: Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information.

Methods: A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method.

Results: About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers.

Conclusions: Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

Source
http://dx.doi.org/10.1186/1297-9686-45-5 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3655921 (PMC)
March 2013
-->