Publications by authors named "Ana I Vazquez"

41 Publications

The Shared Genetic Basis of Hyperuricemia, Gout, and Kidney Function.

Semin Nephrol 2020 Nov;40(6):586-599

Department of Biochemistry, University of Otago, Dunedin, New Zealand; Division of Clinical Immunology and Rheumatology, University of Alabama Birmingham, Birmingham, AL.

Increased urate levels and gout correlate with chronic kidney disease with consensus that the primary driver of this relationship is reduced kidney function. However, a comparison of results of genome-wide association studies in serum urate levels and kidney function indicates a more complex situation. Approximately 20% of loci are shared, comprising those in which the urate-raising allele associates with reduced kidney function, the vice versa situation, and those in which the signals/alleles are different. Although there is very little known regarding the molecular basis of the shared genetic relationship, it is clear that there is no major role for urate transporters and associated transportasome machinery. Some loci, however, do provide clues. The ATXN2 locus, with a shared signal, is one of only a small number of master regulators of expression by chromatin interaction, regulating expression of genes relevant for cholesterol and blood pressure. This suggests a role for systemic metabolic alteration. At HNF4A there is genetic heterogeneity with different genetic variants conferring risk to hyperuricemia and chronic kidney disease, suggesting different pathways. Interestingly, the shared loci congregate in the olfactory receptor pathway. The genome-wide association studies have generated a range of experimentally testable hypotheses that should provide insights into the shared pathogenesis of hyperuricemia/gout and chronic kidney disease.

Source
http://dx.doi.org/10.1016/j.semnephrol.2020.12.002
November 2020

Genetic correlations between traits associated with hyperuricemia, gout, and comorbidities.

Eur J Hum Genet 2021 Feb 26. Epub 2021 Feb 26.

Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA.

Hypertension, obesity, chronic kidney disease and type 2 diabetes are comorbidities that have very high prevalence among persons with hyperuricemia (serum urate > 6.8 mg/dL) and gout. Here we use multivariate genetic models to test the hypothesis that the co-association of traits representing hyperuricemia and its comorbidities is genetically based. Using Bayesian whole-genome regression models, we estimated the genetic marker-based variance and the covariance between serum urate, serum creatinine, systolic blood pressure (SBP), blood glucose and body mass index (BMI) from two independent family-based studies: The Framingham Heart Study-FHS and the Hypertension Genetic Epidemiology Network study-HyperGEN. The main genetic findings that replicated in both FHS and HyperGEN, were (1) creatinine was genetically correlated only with urate and (2) BMI was genetically correlated with urate, SBP, and glucose. The environmental covariance among the traits was generally highest for trait pairs involving BMI. The genetic overlap of traits representing the comorbidities of hyperuricemia and gout appears to cluster in two separate axes of genetic covariance. Because creatinine is genetically correlated with urate but not with metabolic traits, this suggests there is one genetic module of shared loci associated with hyperuricemia and chronic kidney disease. Another module of shared loci may account for the association of hyperuricemia and metabolic syndrome. This study provides a clear quantitative genetic basis for the clustering of comorbidities with hyperuricemia.
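
The genetic correlations summarized above are derived from estimated genetic variances and covariances. The following is a minimal sketch of that last step only, assuming the variance components are already available; the function name and the numeric values are hypothetical and are not estimates from FHS or HyperGEN.

```python
import math


def genetic_correlation(var_a: float, var_b: float, cov_ab: float) -> float:
    """Correlation implied by two genetic variances and their covariance."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    return cov_ab / math.sqrt(var_a * var_b)


# Hypothetical components for illustration only (not study estimates).
r_g = genetic_correlation(var_a=0.40, var_b=0.25, cov_ab=0.10)
print(round(r_g, 3))
```

In a Bayesian analysis this ratio would typically be computed for every posterior sample of the components, so that the correlation inherits a full posterior distribution rather than a single point value.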

Source
http://dx.doi.org/10.1038/s41431-021-00830-z
February 2021

ANOVA-HD: Analysis of variance when both input and output layers are high-dimensional.

PLoS One 2020 14;15(12):e0243251. Epub 2020 Dec 14.

Epidemiology & Biostatistics, Michigan State University, East Lansing, MI, United States of America.

Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns. We propose and evaluate two methods to determine the proportion of variance of an output data set that can be explained by an input data set when both data panels are high dimensional. Our approach uses random-effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. Using simulations, we show that the MC-ANOVA method gave nearly unbiased estimates. Estimates produced by Eigen-ANOVA were also nearly unbiased, except when the shared variance was very high (e.g., >0.9). We demonstrate the potential insight that can be obtained from the use of MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken (Gallus gallus) genomes and to the assessment of inter-dependencies between gene expression, methylation, and copy-number-variants in data from breast cancer tumors from humans (Homo sapiens). Our analyses reveal that in chicken breeding populations ~50,000 evenly-spaced SNPs are enough to fully capture the span of whole-genome-sequencing genomes. In the study of multi-omic breast cancer data, we found that the span of copy-number-variants can be fully explained using either methylation or gene expression data and that roughly 74% of the variance in gene expression can be predicted from methylation data.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243251
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735570
January 2021

DNA Methylation and Gene Expression with Clinical Covariates Explain Variation in Aggressiveness and Survival of Pancreatic Cancer Patients.

Cancer Invest 2020 Sep 16;38(8-9):502-506. Epub 2020 Sep 16.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA.

Pancreatic cancer (PC) is associated with a high mortality rate. We explored the interindividual variation in cancer outcomes attributable to DNA methylation, gene expression, and clinical factors among PC patients. We aimed to determine whether we could differentiate subjects with greater nodal involvement, higher cancer staging, and subsequent survival. We modeled every response variable as a function of a linear predictor involving the effects of clinical variables, methylation, and gene expression in a Bayesian framework. Our results highlight the overall importance of widespread alterations in methylation and gene expression patterns associated with survival, nodal metastasis, and staging.

Source
http://dx.doi.org/10.1080/07357907.2020.1812079
September 2020

Xylem systems genetics analysis reveals a key regulator of lignin biosynthesis in .

Genome Res 2020 08 19;30(8):1131-1143. Epub 2020 Aug 19.

School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA.

Despite the growing resources and tools for high-throughput characterization and analysis of genomic information, the discovery of the genetic elements that regulate complex traits remains a challenge. Systems genetics is an emerging field that aims to understand the flow of biological information that underlies complex traits from genotype to phenotype. In this study, we used a systems genetics approach to identify and evaluate regulators of the lignin biosynthesis pathway in by combining genome, transcriptome, and phenotype data from a population of 268 unrelated individuals of The discovery of lignin regulators began with the quantitative genetic analysis of the xylem transcriptome and resulted in the detection of 6706 and 4628 significant local- and distant-eQTL associations, respectively. Among the locally regulated genes, we identified the R2R3-MYB transcription factor () as a putative -regulator of the majority of genes in the lignin biosynthesis pathway. The expression of in a diverse population positively correlated with lignin content. Furthermore, overexpression of in transgenic poplar resulted in increased lignin content, as well as altered expression of genes in the lignin biosynthesis pathway. Altogether, our findings indicate that is involved in the control of a transcriptional coexpression network of lignin biosynthesis genes during secondary cell wall formation in .

Source
http://dx.doi.org/10.1101/gr.261438.120
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7462072
August 2020

Multi-omic signatures identify pan-cancer classes of tumors beyond tissue of origin.

Sci Rep 2020 05 20;10(1):8341. Epub 2020 May 20.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA.

Despite recent advances in treatment, cancer continues to be one of the most lethal human maladies. One of the challenges of cancer treatment is the diversity among similar tumors that exhibit different clinical outcomes. Most of this variability comes from widespread molecular alterations that can be summarized by omic integration. Here, we have identified eight novel tumor groups (C1-8) via omic integration, characterized by unique cancer signatures and clinical characteristics. C3 had the best clinical outcomes, while C2 and C5 had the poorest. C1, C7, and C8 were upregulated for cellular and mitochondrial translation, and had relatively low proliferation. C6 and C4 were also downregulated for cellular and mitochondrial translation, and had high proliferation rates. C4 was represented by copy losses on chromosome 6, and had the highest number of metastatic samples. C8 was characterized by copy losses on chromosome 11, and also had the lowest lymphocytic infiltration rate. C6 had the lowest natural killer infiltration rate and was represented by copy gains of genes in chromosome 11. C7 was represented by copy gains on chromosome 6, and had the highest upregulation in mitochondrial translation. We believe that, since molecularly alike tumors could respond similarly to treatment, our results could inform therapeutic action.

Source
http://dx.doi.org/10.1038/s41598-020-65119-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7239905
May 2020

Deciphering Sex-Specific Genetic Architectures Using Local Bayesian Regressions.

Genetics 2020 05 20;215(1):231-241. Epub 2020 Mar 20.

Departments of Epidemiology and Biostatistics and Statistics and Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, 48824

Many complex human traits exhibit differences between sexes. While numerous factors likely contribute to this phenomenon, growing evidence from genome-wide studies suggests a partial explanation: that males and females from the same population possess differing genetic architectures. Despite this, mapping gene-by-sex (G×S) interactions remains a challenge, likely because the magnitude of such an interaction is typically exceedingly small; traditional genome-wide association techniques may be underpowered to detect such events, due partly to the burden of multiple test correction. Here, we developed a local Bayesian regression (LBR) method to estimate sex-specific SNP marker effects after fully accounting for local linkage-disequilibrium (LD) patterns. This enabled us to infer sex-specific effects and G×S interactions either at the single SNP level, or by aggregating the effects of multiple SNPs to make inferences at the level of small LD-based regions. Using simulations in which there was imperfect LD between SNPs and causal variants, we showed that aggregating sex-specific marker effects with LBR provides improved power and resolution to detect G×S interactions over traditional single-SNP-based tests. When using LBR to analyze traits from the UK Biobank, we detected a relatively large G×S interaction impacting bone mineral density within , and replicated many previously detected large-magnitude G×S interactions impacting waist-to-hip ratio. We also discovered many new G×S interactions impacting such traits as height and body mass index (BMI) within regions of the genome where both male- and female-specific effects explain a small proportion of phenotypic variance (R < 1 × 10), but are enriched in known expression quantitative trait loci.

Source
http://dx.doi.org/10.1534/genetics.120.303120
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7198271
May 2020

Cysteine catabolism and the serine biosynthesis pathway support pyruvate production during pyruvate kinase knockdown in pancreatic cancer cells.

Cancer Metab 2019 30;7:13. Epub 2019 Dec 30.

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.

Background: Pancreatic ductal adenocarcinoma (PDAC) is an aggressive cancer with limited treatment options. Pyruvate kinase, especially the M2 isoform (PKM2), is highly expressed in PDAC cells, but its role in pancreatic cancer remains controversial. To investigate the role of pyruvate kinase in pancreatic cancer, we knocked down PKM2 individually as well as both PKM1 and PKM2 concurrently (PKM1/2) in cell lines derived from a pancreatic mouse model.

Methods: We used liquid chromatography tandem mass spectrometry (LC-MS/MS) to determine metabolic profiles of wildtype and PKM1/2 knockdown PDAC cells. We further used stable isotope-labeled metabolic precursors and LC-MS/MS to determine metabolic pathways upregulated in PKM1/2 knockdown cells. We then targeted metabolic pathways upregulated in PKM1/2 knockdown cells using CRISPR/Cas9 gene editing technology.

Results: PDAC cells are able to proliferate and continue to produce pyruvate despite PKM1/2 knockdown. The serine biosynthesis pathway partially contributed to pyruvate production during PKM1/2 knockdown: knockout of phosphoglycerate dehydrogenase in this pathway decreased pyruvate production from glucose. In addition, cysteine catabolism generated ~ 20% of intracellular pyruvate in PDAC cells. Other potential sources of pyruvate include the sialic acid pathway and catabolism of glutamine, serine, tryptophan, and threonine. However, these sources did not provide significant levels of pyruvate in PKM1/2 knockdown cells.

Conclusion: PKM1/2 knockdown does not impact the proliferation of pancreatic cancer cells. The serine biosynthesis pathway supports conversion of glucose to pyruvate during pyruvate kinase knockdown. However, direct conversion of serine to pyruvate was not observed during PKM1/2 knockdown. Investigating several alternative sources of pyruvate identified cysteine catabolism for pyruvate production during PKM1/2 knockdown. Surprisingly, we find that a large percentage of intracellular pyruvate comes from cysteine. Our results highlight the ability of PDAC cells to adaptively rewire their metabolic pathways during knockdown of a key metabolic enzyme.

Source
http://dx.doi.org/10.1186/s40170-019-0205-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937848
December 2019

Gain of function in somatic TP53 mutations is associated with immune-rich breast tumors and changes in tumor-associated macrophages.

Mol Genet Genomic Med 2019 12 22;7(12):e1001. Epub 2019 Oct 22.

Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA.

Background: Somatic mutations in TP53 are present in 20%-30% of all breast tumors. Although there are numerous population-based analyses of TP53, none have examined the relationship between somatic mutations in TP53 and tumor-invasive immune cells.

Methods: Clinical and genetic data from 601 women drawn from The Cancer Genome Atlas (TCGA) were used to test the association between somatic TP53 mutation and immune-rich or immune-poor tumor status; determined using the CIBERSORT-based gene expression signature of 22 immune cell types. Our validation dataset, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), used a pathologist-determined measure of lymphocyte infiltration.

Results: Within TP53-mutated samples, a mutation at codon p.R175H was shown to be present at higher frequency in immune-rich tumors. In validation analysis, any somatic mutation in TP53 was associated with immune-rich status, and the mutation at p.R175H had a significant association with tumor-invasive lymphocytes. TCGA-only analysis of invasive immune cell type identified an increase in M0 macrophages associated with p.R175H.

Conclusions: These findings suggest that TP53 somatic mutations, particularly at codon p.R175H, are enriched in tumors with infiltrating immune cells. Our results confirm recent research showing inflammation-related gain of function in specific TP53 mutations.

Source
http://dx.doi.org/10.1002/mgg3.1001
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6900370
December 2019

Modeling Heterogeneity in the Genetic Architecture of Ethnically Diverse Groups Using Random Effect Interaction Models.

Genetics 2019 04 22;211(4):1395-1407. Epub 2019 Feb 22.

Department of Molecular Epidemiology, Helmholtz Zentrum München, Germany 85764.

In humans, most genome-wide association studies have been conducted using data from Caucasians and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or perhaps more fundamentally to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations and how can this be quantified? We consider studying effect heterogeneity using Bayesian random effect interaction models. The proposed methodology can be applied using shrinkage and variable selection methods, and produces useful information about effect heterogeneity in the form of whole-genome summaries (e.g., the proportions of variance of a complex trait explained by a set of SNPs and the average correlation of effects) as well as SNP-specific attributes. Using simulations, we show that the proposed methodology yields (nearly) unbiased estimates when the sample size is not too small relative to the number of SNPs used. Subsequently, we used the methodology for the analyses of four complex human traits (standing height, high-density lipoprotein, low-density lipoprotein, and serum urate levels) in European-Americans (EAs) and African-Americans (AAs). The estimated correlations of effects between the two subpopulations were well below unity for all the traits, ranging from 0.73 to 0.50. The extent of effect heterogeneity varied between traits and SNP sets. Height showed fewer differences in SNP effects between AAs and EAs whereas HDL, a trait highly influenced by lifestyle, exhibited a greater extent of effect heterogeneity. For all the traits, we observed substantial variability in effect heterogeneity across SNPs, suggesting that effect heterogeneity varies between regions of the genome.
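
One way to read the "correlation of effects" reported above is through a simple interaction parameterization, sketched here as an assumption and not as the authors' exact model: each SNP effect in a group is the sum of a shared main effect and an independent group-specific deviation. Under that assumption the covariance of effects between groups equals the main-effect variance. All names and values below are hypothetical.

```python
import math


def effect_correlation(var_main: float, var_dev_1: float, var_dev_2: float) -> float:
    """Correlation of SNP effects between two groups.

    Assumes effect_in_group_k = main + deviation_k, with the main effect and
    the two deviations mutually independent (an illustrative assumption).
    """
    if var_main < 0 or var_dev_1 < 0 or var_dev_2 < 0:
        raise ValueError("variances must be non-negative")
    total_1 = var_main + var_dev_1  # effect variance in group 1
    total_2 = var_main + var_dev_2  # effect variance in group 2
    if total_1 == 0 or total_2 == 0:
        raise ValueError("each group needs a positive total effect variance")
    return var_main / math.sqrt(total_1 * total_2)


# Hypothetical variance components for illustration only.
rho = effect_correlation(var_main=0.6, var_dev_1=0.4, var_dev_2=0.4)
print(round(rho, 2))
```

With no group-specific deviations the correlation is 1, and it shrinks toward 0 as the deviations dominate, which matches the interpretation of values below unity as evidence of effect heterogeneity.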

Source
http://dx.doi.org/10.1534/genetics.119.301909
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456318
April 2019

The influence of 15-week exercise training on dietary patterns among young adults.

Int J Obes (Lond) 2019 09 18;43(9):1681-1690. Epub 2019 Jan 18.

Department of Nutritional Sciences, The University of Texas at Austin, Austin, TX, USA.

Background/objectives: Little is currently known about how exercise may influence dietary patterns and/or food preferences. The present study aimed to examine the effect of a 15-week exercise training program on overall dietary patterns among young adults.

Subjects/methods: This study consisted of 2680 young adults drawn from the Training Intervention and Genetics of Exercise Response (TIGER) study. Subjects underwent 15 weeks of aerobic exercise training, and exercise duration, intensity, and dose were recorded for each session using computerized heart rate monitors. In total, 4355 dietary observations with 102 food items were collected using a self-administered food frequency questionnaire before and after exercise training (n = 2476 at baseline; n = 1859 at 15 weeks). Dietary patterns were identified using a Bayesian sparse latent factor model. Changes in dietary pattern preferences were evaluated based on the pre/post-training differences in dietary pattern scores, accounting for the effects of gender, race/ethnicity, and BMI.

Results: Within each of the seven dietary patterns identified, most dietary pattern scores were decreased following exercise training, consistent with increased voluntary regulation of food intake. A longer duration of exercise was associated with decreased preferences for the western (β: -0.0793; 95% credible interval: -0.1568, -0.0017) and snacking (β: -0.1280; 95% credible interval: -0.1877, -0.0637) patterns, while a higher intensity of exercise was linked to an increased preference for the prudent pattern (β: 0.0623; 95% credible interval: 0.0159, 0.1111). Consequently, a higher dose of exercise was related to a decreased preference for the snacking pattern (β: -0.0023; 95% credible interval: -0.0042, -0.0004) and an increased preference for the prudent pattern (β: 0.0029; 95% credible interval: 0.0009, 0.0048).

Conclusions: The 15-week exercise training appeared to motivate young adults to pursue healthier dietary preferences and to regulate their food intake.

Source
http://dx.doi.org/10.1038/s41366-018-0299-3
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6639161
September 2019

Integrated landscape of copy number variation and RNA expression associated with nodal metastasis in invasive ductal breast carcinoma.

Oncotarget 2018 Dec 7;9(96):36836-36848. Epub 2018 Dec 7.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA.

Background: Lymph node metastasis (NM) in breast cancer is a clinical predictor of patient outcomes, but how its genetic underpinnings contribute to aggressive phenotypes is unclear. Our objective was to create the first landscape analysis of CNV-associated NM in ductal breast cancer. To assess the role of copy number variations (CNVs) in NM, we compared CNVs and/or associated mRNA expression in primary tumors of patients with NM to those without metastasis.

Results: We found CNV loss in chromosomes 1, 3, 9, 18, and 19 and gains in chromosomes 5, 8, 12, 14, 16-17, and 20 that were associated with NM and replicated in both databases. In primary tumors, per-gene CNVs associated with NM were ten times more frequent than mRNA expression; however, there were few CNV-driven changes in mRNA expression that differed by nodal status. Overlapping regions of CNV changes and mRNA expression were evident for the gene. In 8q12, 11q13-14, 20q1, and 17q14-24 regions, there were gene-specific gains in CNV-driven mRNA expression associated with NM.

Methods: Data on CNV and mRNA expression from the TCGA and the METABRIC consortium of breast ductal carcinoma were utilized to identify CNV-based features associated with NM. Within each dataset, associations were compared across omic platforms to identify CNV-driven variations in gene expression. Only replications across both datasets were considered as determinants of NM.

Conclusions: Gains in , , , and genes and their expression may aid in early diagnosis of metastatic breast carcinoma and have potential as therapeutic targets.

Source
http://dx.doi.org/10.18632/oncotarget.26386
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6305147
December 2018

Advanced Dietary Patterns Analysis Using Sparse Latent Factor Models in Young Adults.

J Nutr 2018 12;148(12):1984-1992

Departments of Nutritional Sciences.

Background: Principal components analysis (PCA) has been the most widely used method for deriving dietary patterns to date. However, PCA requires arbitrary ad hoc decisions for selecting food variables in interpreting dietary patterns and does not easily accommodate covariates. Sparse latent factor models can be utilized to address these issues.

Objective: The objective of this study was to compare Bayesian sparse latent factor models with PCA for identifying dietary patterns among young adults.

Methods: Habitual food intake was estimated in 2730 sedentary young adults from the Training Interventions and Genetics of Exercise Response (TIGER) Study [aged 18-35 y; body mass index (BMI; in kg/m2): 26.5 ± 6.1] who exercised <30 min/wk during the previous 30 d without restricting caloric intake before study enrollment. A food-frequency questionnaire was used to generate the frequency intakes of 102 food items. Sparse latent factor modeling was applied to the standardized food intakes to derive dietary patterns, incorporating additional covariates (sex, race/ethnicity, and BMI). The identified dietary patterns via sparse latent factor modeling were compared with the PCA derived dietary patterns.

Results: Seven dietary patterns were identified in both PCA and sparse latent factor analysis. In contrast to PCA, the sparse latent factor analysis allowed the covariate information to be jointly accounted for in the estimation of dietary patterns in the model and offered probabilistic criteria to determine the foods relevant to each dietary pattern. The derived patterns from both methods generally described common dietary behaviors. Dietary patterns 1-4 had similar food subsets using both statistical approaches, but PCA had smaller sets of foods with more cross-loading elements between the 2 factors. Overall, the sparse latent factor analysis produced more interpretable dietary patterns, with fewer of the food items excluded from all patterns.

Conclusion: Sparse latent factor models can be useful in future studies of dietary patterns by reducing the intrinsic arbitrariness involving the choice of food variables in interpreting dietary patterns and incorporating covariates in the assessment of dietary patterns.

Source
http://dx.doi.org/10.1093/jn/nxy188
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280002
December 2018

Whole-Genome Multi-omic Study of Survival in Patients with Glioblastoma Multiforme.

G3 (Bethesda) 2018 11 6;8(11):3627-3636. Epub 2018 Nov 6.

Department of Epidemiology and Biostatistics

Glioblastoma multiforme (GBM) has been recognized as the most lethal type of malignant brain tumor. Despite efforts of the medical and research community, patients' survival remains extremely low. Multi-omic profiles (including DNA sequence, methylation and gene expression) provide rich information about the tumor. These profiles are likely to reveal processes that may be predictive of patient survival. However, the integration of multi-omic profiles, which are high dimensional and heterogeneous in nature, poses great challenges. The goal of this work was to develop models for prediction of survival of GBM patients that can integrate clinical information and multi-omic profiles, using multi-layered Bayesian regressions. We apply the methodology to data from GBM patients from The Cancer Genome Atlas (TCGA, n = 501) to evaluate whether integrating multi-omic profiles (SNP-genotypes, methylation, copy number variants and gene expression) with clinical information (demographics as well as treatments) leads to an improved ability to predict patient survival. The proposed Bayesian models were used to estimate the proportion of variance explained by clinical covariates and omics and to evaluate prediction accuracy in cross validation (using the area under the Receiver Operating Characteristic curve, AUC). Among clinical and demographic covariates, age (AUC = 0.664) and the use of temozolomide (AUC = 0.606) were the most predictive of survival. Among omics, methylation (AUC = 0.623) and gene expression (AUC = 0.593) were more predictive than either SNP (AUC = 0.539) or CNV (AUC = 0.547). While there was a clear association between age and methylation, the integration of age, the use of temozolomide, and either gene expression or methylation led to a substantial increase in AUC in cross-validation (AUC = 0.718). Finally, among the genes whose methylation was higher in aging brains, we observed a higher enrichment of these genes being also differentially methylated in cancer.
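
The AUC values quoted above measure how well predicted risk ranks patients. A minimal sketch of the statistic itself, using its pairwise-ranking definition, is given here; the function name, scores, and labels are hypothetical and unrelated to the TCGA data, and the cross-validation step of the study is not reproduced.

```python
def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = event).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 0, 1, 1, 0, 0]
example_auc = auc(scores, labels)
print(round(example_auc, 3))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values (0.539 to 0.718) should be read.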

Source
http://dx.doi.org/10.1534/g3.118.200391
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222579
November 2018

Accurate Genomic Prediction of Human Height.

Genetics 2018 10 27;210(2):477-497. Epub 2018 Aug 27.

Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan 48824

We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9% of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
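
The two height figures quoted above (a correlation of about 0.65 and about 40% of variance explained) are consistent if variance explained is taken as the squared correlation between predicted and observed values. That reading is an assumption made here for illustration. The helper below and its toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Squaring the correlation reported in the abstract.
r_reported = 0.65
variance_explained = r_reported ** 2
print(round(variance_explained, 2))

# Toy check of the helper on perfectly linear data (hypothetical values).
r_toy = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Squaring 0.65 gives roughly 0.42, in line with the approximate 40% stated for height.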

Source
http://dx.doi.org/10.1534/genetics.118.301267
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6216598
October 2018

Changes in milk characteristics and fatty acid profile during the estrous cycle in dairy cows.

J Dairy Sci 2018 Oct 25;101(10):9135-9153. Epub 2018 Jul 25.

Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell'Università 16, 35020, Legnaro PD, Italy.

The relationship of the estrous cycle to milk composition and milk physical properties was assessed on Holstein (n = 10,696), Brown Swiss (n = 20,501), Simmental (n = 17,837), and Alpine Grey (n = 8,595) cows reared in northeastern Italy. The first insemination after calving for each cow was chosen to be the day of estrus and insemination. Test days surrounding the insemination date (from 10 d before to 10 d after the day of the estrus) were selected and categorized in phases relative to estrus as diestrus high-progesterone, proestrus, estrus, metestrus, and diestrus increasing-progesterone phases. Milk components and physical properties were predicted on the basis of Fourier-transform infrared spectra of milk samples and were analyzed using a linear mixed model, which included the random effects of herd, the fixed classification effects of year-month, parity number, breed, estrous cycle phase, day nested within the estrous cycle phase, conception, partial regressions on linear and quadratic effects of days in milk nested within parity number, as well as the interactions between conception outcome with estrous cycle phase and breed with estrous cycle phase. Milk composition, particularly fat, protein, and lactose, showed clear differences among the estrous cycle phases. Fat increased by 0.14% from diestrus high-progesterone to estrous phase, whereas protein concomitantly decreased by 0.03%. Lactose appeared to remain relatively constant over diestrus high-progesterone, rising 1 d before the day of estrus followed by a gradual reduction over the subsequent phases. Specific fatty acids were also affected across the estrous cycle phases: C14:0 and C16:0 decreased (-0.34 and -0.48%) from proestrus to estrus with a concomitant increase in C18:0 and C18:1 cis-9 (0.40 and 0.73%). 
More general categories of fatty acids showed a similar behavior; that is, unsaturated fatty acids, monounsaturated fatty acids, polyunsaturated fatty acids, trans fatty acids, and long-chain fatty acids increased, whereas the saturated fatty acids, medium-chain fatty acids, and short-chain fatty acids decreased during the estrous phase. Finally, urea, somatic cell score, freezing point, pH, and homogenization index were also affected indicating variation associated with the hormonal and behavioral changes of cows in standing estrus. Hence, the variation in milk profiles of cows showing estrus should potentially be taken into account for precision dairy farming management.

Source
http://dx.doi.org/10.3168/jds.2018-14480 (DOI)
October 2018

Untangling the complex relationships between incident gout risk, serum urate, and its comorbidities.

Arthritis Res Ther 2018 05 3;20(1):90. Epub 2018 May 3.

Department of Epidemiology and Biostatistics, Michigan State University, 220 Trowbridge Rd, East Lansing, MI 48824, USA.

Background: Many gout comorbidities (e.g., hypertension) are correlated with serum urate. In this investigation, we identified risk factors (e.g., systolic blood pressure [SBP]) that (1) are associated with incident gout, (2) have effects on gout risk that cannot be fully explained by correlated differences in serum urate, and (3) may modulate the relationship between gout and serum urate.

Methods: Using data from the Atherosclerosis Risk in Communities (ARIC) study, we estimated the unadjusted associations between gout and risk factors by calculating ORs and using chi-square tests. The adjusted associations were analyzed using logistic regression by sequentially adding (1) one risk factor at a time or (2) all risk factors, to a baseline model that includes serum urate only. Stepwise selection was used to select main effects. Two-way interactions of variables from the main effects model were also analyzed.

Results: Average gout incidence was 2.7 per 1000 people per year. Serum urate was highly associated with incident gout, with odds ratios of 3.16 [95% CI 2.11, 4.76] and 25.9 [95% CI 17.2, 38.4] for moderately high (6-8 mg/dl) and high serum urate (> 8 mg/dl), relative to normal serum urate (< 6 mg/dl), respectively. Ethnicity and SBP were independently and additively associated with gout after accounting for serum urate levels. No significant interactions were found between serum urate and ethnicity or SBP.

Conclusions: Ethnicity and hypertension are predictive of gout risk, and the associations cannot be fully explained by serum urate. For serum urate levels near the crystallization threshold (6-8 mg/dl) African Americans and people with hypertension are at two to three times greater risk for developing gout. The gout risk for this group appears to increase before the onset of severe hyperuricemia.
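
The unadjusted associations in the Methods rest on odds ratios computed from 2×2 tables. The following is a minimal sketch of that calculation, not the authors' code. The counts are hypothetical (they are not ARIC data), and the Woolf log-scale confidence interval is an assumption, since the abstract does not say how the intervals were obtained.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-scale) confidence interval.

    All four counts must be positive; z=1.96 gives an approximate 95% CI.
    """
    estimate = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases
    )
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases
        + 1 / unexposed_cases + 1 / unexposed_noncases
    )
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only:
# 30 gout cases among 1000 exposed people, 10 among 1000 unexposed people.
or_estimate, ci_lower, ci_upper = odds_ratio(30, 970, 10, 990)
```

An interval that excludes 1, as in this toy table, is what a significant unadjusted association looks like. The adjusted associations in the abstract come from logistic regression and are not reproduced here.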

Source
http://dx.doi.org/10.1186/s13075-018-1558-3 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5932762 (PMC)
May 2018

Diagnosing pregnancy status using infrared spectra and milk composition in dairy cows.

J Dairy Sci 2018 Mar 28;101(3):2496-2505. Epub 2017 Dec 28.

Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro PD, Italy.

Data on Holstein (16,890), Brown Swiss (31,441), Simmental (25,845), and Alpine Grey (12,535) cows reared in northeastern Italy were used to assess the ability of milk components (fat, protein, casein, and lactose) and Fourier transform infrared (FTIR) spectral data to diagnose pregnancy. Pregnancy status was defined as whether a pregnancy was confirmed by a subsequent calving and no other subsequent inseminations within 90 d of the breeding of specific interest. Milk samples were analyzed for components and FTIR full-spectrum data using a MilkoScan FT+ 6000 (Foss Electric, Hillerød, Denmark). The spectrum covered 1,060 wavenumbers (wn) from 5,010 to 925 cm⁻¹. Pregnancy status was predicted using generalized linear models with fat, protein, lactose, casein, and individual FTIR spectral bands or wavelengths as predictors. We also fitted a generalized linear model as a simultaneous function of all wavelengths (1,060 wn) with a Bayesian variable selection model using the BGLR R-package (https://r-forge.r-project.org/projects/bglr/). Prediction accuracy was determined using the area under a receiver operating characteristic curve based on a 10-fold cross-validation (CV-AUC) assessment based on sensitivities and specificities of phenotypic predictions. Overall, the best prediction accuracies were obtained for the model that included the complete FTIR spectral data. We observed similar patterns across breeds with small differences in prediction accuracy. The highest CV-AUC value was obtained for Alpine Grey cows (CV-AUC = 0.645), whereas Brown Swiss and Simmental cows had similar performance (CV-AUC = 0.630 and 0.628, respectively), followed by Holsteins (CV-AUC = 0.607).
For single-wavelength analyses, important peaks were detected at wn 2,973 to 2,872 cm⁻¹ where Fat-B (C-H stretch) is usually filtered, wn 1,773 cm⁻¹ where Fat-A (C=O stretch) is filtered, wn 1,546 cm⁻¹ where protein is filtered, wn 1,468 cm⁻¹ associated with urea and fat, wn 1,399 and 1,245 cm⁻¹ associated with acetone, and wn 1,025 to 1,013 cm⁻¹ where lactose is filtered. In conclusion, this research provides new insight into alternative strategies for pregnancy screening of dairy cows.
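
The CV-AUC figures above summarize how well out-of-fold predictions rank pregnant cows above non-pregnant ones. As a rough sketch of the metric only (the study fitted its models with the BGLR R package, which is not reproduced here), the code below computes a rank-based AUC and averages it over folds. The function names, the toy data, and the choice to average per-fold AUCs rather than pool predictions are all assumptions.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def cv_auc(labels, scores, folds):
    """Mean of per-fold AUCs, where scores are out-of-fold predictions."""
    per_fold = []
    for fold in sorted(set(folds)):
        idx = [i for i, f in enumerate(folds) if f == fold]
        per_fold.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
    return sum(per_fold) / len(per_fold)


# Toy data, for illustration only (two folds instead of ten).
toy_labels = [1, 0, 1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1, 0.4, 0.4]
toy_folds = [0, 0, 0, 0, 1, 1, 1, 1]
toy_cv_auc = cv_auc(toy_labels, toy_scores, toy_folds)
```

An AUC of 0.5 corresponds to random ranking, which puts the reported values of about 0.61 to 0.65 in context: informative, but modest.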

Source
http://dx.doi.org/10.3168/jds.2017-13647 (DOI)
March 2018

Will Big Data Close the Missing Heritability Gap?

Genetics 2017 11 11;207(3):1135-1145. Epub 2017 Sep 11.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824

Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (i.e., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.

Source
http://dx.doi.org/10.1534/genetics.117.300271 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5676235 (PMC)
November 2017

Obesity, adipokines, and C-peptide are associated with distinct plasma phospholipid profiles in adult males, an untargeted lipidomic approach.

Sci Rep 2017 07 24;7(1):6335. Epub 2017 Jul 24.

Department of Food Science and Human Nutrition, Michigan State University, 469 Wilson Road, East Lansing, MI 48824, USA.

Obesity is associated with dysregulated lipid metabolism and adipokine secretion. Our group has previously reported obesity and adipokines are associated with % total fatty acid (FA) differences in plasma phospholipids. The objective of our current study was to identify in which complex lipid species (i.e., phosphatidylcholine, sphingolipids, etc) these FA differences occur. Plasma lipidomic profiling (n = 126, >95% Caucasian, 48-65 years) was performed using chromatographic separation and high resolution tandem mass spectrometry. The responses used in the statistical analyses were body mass index (BMI), waist circumference (WC), serum adipokines, cytokines, and a glycemic marker. High-dimensional statistical analyses were performed, all models were adjusted for age and smoking, and p-values were adjusted for false discovery. In Bayesian models, the lipidomic profiles (over 1,700 lipids) accounted for >60% of the inter-individual variation of BMI, WC, and leptin in our population. Across statistical analyses, we report 51 individual plasma lipids were significantly associated with obesity. Obesity was inversely associated with lysophospholipids and ether-linked phosphatidylcholines. In addition, we identify several unreported lipids associated with obesity that are not present in lipid databases. Taken together, these results provide new insights into the underlying biology associated with obesity and reveal new potential pathways for therapeutic targeting.

Source
http://dx.doi.org/10.1038/s41598-017-05785-0 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5524758 (PMC)
July 2017

Prediction of years of life after diagnosis of breast cancer using omics and omic-by-treatment interactions.

Eur J Hum Genet 2017 05 8;25(5):538-544. Epub 2017 Mar 8.

QuantGen Group, Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA.

Breast cancer (BC) is the second most common type of cancer and a major cause of death for women. Commonly, BC patients are assigned to risk groups based on the combination of prognostic and prediction factors (eg, patient age, tumor size, tumor grade, hormone receptor status, etc). Although this approach is able to identify risk groups with different prognosis, patients are highly heterogeneous in their response to treatments. To improve the prediction of BC patients, we extended clinical models (including prognostic and prediction factors with whole-omic data) to integrate omics profiles for gene expression and copy number variants (CNVs). We describe a modeling framework that is able to incorporate clinical risk factors, high-dimensional omics profiles, and interactions between omics and non-omic factors (eg, treatment). We used the proposed modeling framework and data from METABRIC (Molecular Taxonomy of Breast Cancer Consortium) to assess the impact on the accuracy of BC patient survival predictions when omics and omic-by-treatment interactions are being considered. Our analysis shows that omics and omic-by-treatment interactions explain a sizable fraction of the variance on survival time that is not explained by commonly used clinical covariates. The sizable interaction effects observed, together with the increase in prediction accuracy, suggest that whole-omic profiles could be used to improve prognosis prediction among BC patients.

Source
http://dx.doi.org/10.1038/ejhg.2017.12 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437894 (PMC)
May 2017

Genome-wide association study reveals putative regulators of bioenergy traits in Populus deltoides.

New Phytol 2017 Jan 6;213(2):799-811. Epub 2016 Sep 6.

School of Forest Resources and Conservation, University of Florida, PO Box 110410, Gainesville, FL, 32611, USA.

Genome-wide association studies (GWAS) have been used extensively to dissect the genetic regulation of complex traits in plants. These studies have focused largely on the analysis of common genetic variants despite the abundance of rare polymorphisms in several species, and their potential role in trait variation. Here, we conducted the first GWAS in Populus deltoides, a genetically diverse keystone forest species in North America and an important short rotation woody crop for the bioenergy industry. We searched for associations between eight growth and wood composition traits, and common and low-frequency single-nucleotide polymorphisms detected by targeted resequencing of 18 153 genes in a population of 391 unrelated individuals. To increase power to detect associations with low-frequency variants, multiple-marker association tests were used in combination with single-marker association tests. Significant associations were discovered for all phenotypes and are indicative that low-frequency polymorphisms contribute to phenotypic variance of several bioenergy traits. Our results suggest that both common and low-frequency variants need to be considered for a comprehensive understanding of the genetic regulation of complex traits, particularly in species that carry large numbers of rare polymorphisms. These polymorphisms may be critical for the development of specialized plant feedstocks for bioenergy.

Source
http://dx.doi.org/10.1111/nph.14154 (DOI)
January 2017

Increased Proportion of Variance Explained and Prediction Accuracy of Survival of Breast Cancer Patients with Use of Whole-Genome Multiomic Profiles.

Genetics 2016 07 29;203(3):1425-38. Epub 2016 Apr 29.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824; Statistics Department, Michigan State University, East Lansing, Michigan 48824.

Whole-genome multiomic profiles hold valuable information for the analysis and prediction of disease risk and progression. However, integrating high-dimensional multilayer omic data into risk-assessment models is statistically and computationally challenging. We describe a statistical framework, the Bayesian generalized additive model (BGAM), and present software for integrating multilayer high-dimensional inputs into risk-assessment models. We used BGAM and data from The Cancer Genome Atlas for the analysis and prediction of survival after diagnosis of breast cancer. We developed a sequence of studies to (1) compare predictions based on single omics with those based on clinical covariates commonly used for the assessment of breast cancer patients (COV), (2) evaluate the benefits of combining COV and omics, (3) compare models based on (a) COV and gene expression profiles from oncogenes with (b) COV and whole-genome gene expression (WGGE) profiles, and (4) evaluate the impacts of combining multiple omics and their interactions. We report that (1) WGGE profiles and whole-genome methylation (METH) profiles offer more predictive power than any of the COV commonly used in clinical practice (e.g., subtype and stage), (2) adding WGGE or METH profiles to COV increases prediction accuracy, (3) the predictive power of WGGE profiles is considerably higher than that based on expression from large-effect oncogenes, and (4) the gain in prediction accuracy when combining multiple omics is consistent. Our results show the feasibility of omic integration and highlight the importance of WGGE and METH profiles in breast cancer, achieving gains of up to 7 points area under the curve (AUC) over the COV in some cases.

Source
http://dx.doi.org/10.1534/genetics.115.185181 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937492 (PMC)
July 2016

Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions.

J Agric Biol Environ Stat 2015;20(4):467-490. Epub 2015 Nov 9.

Colegio de Postgraduados, Km. 36.5, Carretera Mexico, Montecillo, 56230 Texcoco, Estado de México Mexico.

Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regressions (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.

Electronic Supplementary Material: Supplementary materials for this article are available at 10.1007/s13253-015-0222-5.
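
The decomposition of marker effects into main and interaction components can be written compactly. The notation below is assumed for illustration and is not taken from the abstract; the correlation formula holds only under the stated assumption that the main effects and the group-specific deviations are mutually independent.

```latex
% Assumed notation (illustrative, not from the abstract):
%   y_{ik}   phenotype of individual i in sub-population k
%   x_{ijk}  genotype of that individual at marker j (j = 1, ..., p)
%   b_{0j}   main effect of marker j, shared by all sub-populations
%   b_{kj}   deviation of the marker-j effect specific to sub-population k
\[
  y_{ik} = \mu_k + \sum_{j=1}^{p} x_{ijk}\,\beta_{kj} + \varepsilon_{ik},
  \qquad
  \beta_{kj} = b_{0j} + b_{kj}
\]
% If b_{0j}, b_{1j}, b_{2j} are independent with variances
% \sigma_0^2, \sigma_1^2, \sigma_2^2, the effects in groups 1 and 2
% share only the main-effect component, so
\[
  \operatorname{Cor}(\beta_{1j}, \beta_{2j})
  = \frac{\sigma_0^2}
         {\sqrt{(\sigma_0^2 + \sigma_1^2)\,(\sigma_0^2 + \sigma_2^2)}}
\]
```

Under this reading, a larger main-effect variance relative to the deviation variances implies more similar marker effects across sub-populations, which is one way to interpret the trait-specific correlation estimates mentioned in the abstract.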

Source
http://dx.doi.org/10.1007/s13253-015-0222-5 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666286 (PMC)
November 2015

Serum urate gene associations with incident gout, measured in the Framingham Heart Study, are modified by renal disease and not by body mass index.

Rheumatol Int 2016 Feb 1;36(2):263-70. Epub 2015 Oct 1.

Division of Clinical Immunology and Rheumatology, Department of Medicine, University of Alabama at Birmingham, Faculty Office Tower 805B, 510 20th Street S, Birmingham, AL, 35294, USA.

We hypothesized that serum urate-associated SNPs, individually or collectively, interact with BMI and renal disease to contribute to risk of incident gout. We measured the incidence of gout and associated comorbidities using the original and offspring cohorts of the Framingham Heart Study. We used direct and imputed genotypes for eight validated serum urate loci. We fit binomial regression models of gout incidence as a function of the covariates, age, type 2 diabetes, sex, and all main and interaction effects of the eight serum urate SNPs with BMI and renal disease. Models were also fit with a genetic risk score for serum urate levels which corresponds to the sum of risk alleles at the eight SNPs. Model covariates, age (P = 5.95E-06), sex (P = 2.46E-39), diabetes (P = 2.34E-07), BMI (P = 1.14E-11) and the SNPs, rs1967017 (P = 9.54E-03), rs13129697 (P = 4.34E-07), rs2199936 (P = 7.28E-03) and rs675209 (P = 4.84E-02) were all associated with incident gout. No BMI by SNP or BMI by serum urate genetic risk score interactions were statistically significant, but renal disease by rs1106766 was statistically significant (P = 6.12E-03). We demonstrated that minor alleles of rs1106766 (intergenic, INHBC) were negatively associated with the risk of incident gout in subjects without renal disease, but not for individuals with renal disease. These analyses demonstrate that a significant component of the risk of gout may involve complex interplay between genes and environment.
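
The genetic risk score described above is the sum of risk alleles across the eight serum urate SNPs. A minimal sketch of such an unweighted score follows. The SNP keys are hypothetical placeholders rather than the study's loci, and accepting fractional dosages for imputed genotypes is an assumption about how imputed calls would be handled.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted genetic risk score: total risk alleles across SNPs.

    `risk_allele_counts` maps a SNP identifier to the number of risk alleles
    carried (0, 1, or 2), or to a dosage in [0, 2] for an imputed genotype.
    """
    for snp, count in risk_allele_counts.items():
        if not 0 <= count <= 2:
            raise ValueError(f"{snp}: allele count {count} is outside [0, 2]")
    return sum(risk_allele_counts.values())


# Hypothetical genotypes for one person at eight placeholder SNPs.
example_counts = {
    "snp1": 2, "snp2": 1, "snp3": 0, "snp4": 1,
    "snp5": 2, "snp6": 0, "snp7": 1, "snp8": 1,
}
example_score = genetic_risk_score(example_counts)
```

With eight biallelic SNPs the score ranges from 0 to 16. It can then enter a regression model as a single covariate, alongside its interactions with BMI and renal disease.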

Source
http://dx.doi.org/10.1007/s00296-015-3364-4 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4724568 (PMC)
February 2016

Assessment of whole-genome regression for type II diabetes.

PLoS One 2015 17;10(4):e0123818. Epub 2015 Apr 17.

Colegio de Postgraduados, Montecillo, Edo. de Mexico, Mexico.

Lifestyle and genetic factors play a large role in the development of Type 2 Diabetes (T2D). Despite the important role of genetic factors, genetic information is not incorporated into the clinical assessment of T2D risk. We assessed and compared Whole Genome Regression methods to predict the T2D status of 5,245 subjects from the Framingham Heart Study. For evaluating each method we constructed the following set of regression models: A clinical baseline model (CBM) which included non-genetic covariates only. CBM was extended by adding the first two marker-derived principal components and 65 SNPs identified by a recent GWAS consortium for T2D (M-65SNPs). Subsequently, it was further extended by adding 249,798 genome-wide SNPs from a high-density array. The Bayesian models used to incorporate genome-wide marker information as predictors were: Bayes A, Bayes Cπ, Bayesian LASSO (BL), and the Genomic Best Linear Unbiased Prediction (G-BLUP). Results included estimates of the genetic variance and heritability, genetic scores for T2D, and predictive ability evaluated in a 10-fold cross-validation. The predictive AUC estimates for CBM and M-65SNPs were: 0.668 and 0.684, respectively. We found evidence of contribution of genetic effects in T2D, as reflected in the genomic heritability estimates (0.492±0.066). The highest predictive AUC among the genome-wide marker Bayesian models was 0.681 for the Bayesian LASSO. Overall, the improvement in predictive ability was moderate and did not differ greatly among models that included genetic information. Approximately 58% of the total number of genetic variants was found to contribute to the overall genetic variation, indicating a complex genetic architecture for T2D. 
Our results suggest that the Bayes Cπ and the G-BLUP models with a large set of genome-wide markers could be used for predicting risk to T2D, as an alternative to using high-density arrays when selected markers from large consortiums for a given complex trait or disease are unavailable.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123818 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4401705 (PMC)
January 2016

Integrated genomic and BMI analysis for type 2 diabetes risk assessment.

Front Genet 2015 17;6:75. Epub 2015 Mar 17.

Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA.

Type 2 Diabetes (T2D) is a chronic disease arising from the development of insulin absence or resistance within the body, and a complex interplay of environmental and genetic factors. The incidence of T2D has increased throughout the last few decades, together with the occurrence of the obesity epidemic. The consideration of variants identified by Genome Wide Association Studies (GWAS) into risk assessment models for T2D could aid in the identification of at-risk patients who could benefit from preventive medicine. In this study, we build several risk assessment models, evaluated with two different classification approaches (Logistic Regression and Neural Networks), to measure the effect of including genetic information in the prediction of T2D. We used data from the Original and the Offspring cohorts of the Framingham Heart Study, which provides phenotypic and genetic information for 5245 subjects (4306 controls and 939 cases). Models were built by using several covariates: gender, exposure time, cohort, body mass index (BMI), and 65 SNPs associated with T2D. We fitted Logistic Regressions and Bayesian Regularized Neural Networks and then assessed their predictive ability by using a ten-fold cross validation. We found that the inclusion of genetic information into the risk assessment models increased the predictive ability by 2%, when compared to the baseline model. Furthermore, the models that included BMI at the onset of diabetes as a possible effector, gave an improvement of 6% in the area under the curve derived from the ROC analysis. The highest AUC achieved (0.75) belonged to the model that included BMI, and a genetic score based on the 65 established T2D-associated SNPs. Finally, the inclusion of SNPs and BMI raised predictive ability in all models as expected; however, results from the AUC in Neural Networks and Logistic Regression did not differ significantly in their prediction accuracy.

Source
http://dx.doi.org/10.3389/fgene.2015.00075 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4362394 (PMC)
April 2015

Estimating proportions of explained variance: a comparison of whole genome subsets.

BMC Proc 2014 17;8(Suppl 1 Genetic Analysis Workshop 18):S102. Epub 2014 Jun 17.

Department of Biostatistics, University of Alabama at Birmingham, 1665 University Blvd, Birmingham, AL 35205, USA.

Following the publication of the ENCODE project results, there has been increasing interest in investigating different areas of the chromosome and evaluating the relative contribution of each area to expressed phenotypes. This study aims to evaluate the contribution of variants, classified by minor allele frequency and gene annotation, to the observed interindividual differences. In this study, we fitted Bayesian linear regression models to data from Genetic Analysis Workshop 18 (n = 395) to estimate the variance of standardized and log-transformed systolic blood pressure that can be explained by subsets of genetic markers. Rare and very rare variants explained an overall higher proportion of the variance, as did markers located within a gene rather than flanking regions. The proportion of variance explained by rare and very rare variants decreased when we controlled for the number of markers, suggesting that the number of contributing rare alleles plays an important role in the genetic architecture of chronic disease traits. Our findings lend support to the "common disease, rare variant" hypothesis for systolic blood pressure and highlight allele frequency and functional annotation of a polymorphism as potentially crucial considerations in whole genome study designs.

Source
http://dx.doi.org/10.1186/1753-6561-8-S1-S102 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4143698 (PMC)
December 2014

Even modest prediction accuracy of genomic models can have large clinical utility.

Front Genet 2014 28;5:417. Epub 2014 Nov 28.

School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA.

Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R² = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality.

Source
http://dx.doi.org/10.3389/fgene.2014.00417 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4246888 (PMC)
December 2014

Pharmacogenetic effects of 'candidate gene complexes' on stroke in the GenHAT study.

Pharmacogenet Genomics 2014 Nov;24(11):556-63

Departments of (a) Biostatistics and (b) Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama; (c) Department of Biostatistics and (d) Department of Epidemiology, Human Genetics & Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, Texas; (e) Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA; (f) Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Denmark.

Objective: The aim of this study was to investigate whether there is a genotype-by-treatment interaction in patients experiencing stroke and treated with one of three antihypertensive drugs, that is chlorthalidone, amlodipine, or lisinopril.

Participants And Methods: A population of 436 African Americans and 539 whites who had experienced stroke in the GenHAT study were genotyped for 768 single nucleotide polymorphisms (SNPs) in 280 candidate genes. To detect a genotype-by-treatment interaction, we used the Pearson's χ² test to assess whether the genotype frequencies differed at the single SNP level for the three drug treatment groups. From these single SNP analyses, we derived a summary statistic for the degree of association at the gene and gene complex levels. This was done by grouping SNPs using information on gene locations and defining gene complexes on the basis of protein-protein interactions. To assess the statistical significance of the observed test statistic, we derived an empirical P-value by simulating data under the null hypothesis.

Results: We found that, in patients who have experienced stroke, there is a significant genetic difference between hypertension drug treatment groups. In African Americans, SNP rs12143842 showed a significant association (P<0.001) with drug treatment. At the gene level, HNRNPA1P4 and NOS1AP in African Americans and PRICKLE1 and NINJ2 in non-Hispanic whites were significantly associated (P<0.01) with drug treatment, whereas none of the gene complexes tested showed significance.

Conclusion: On the basis of the genetic differences between drug treatment groups, we conclude that there may be an interaction between certain genotypes and antihypertensive treatment in stroke patients. This needs to be replicated in other studies.
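
The single-SNP step in the Methods compares genotype frequencies across the three treatment groups with a Pearson chi-square statistic, and significance is judged empirically. The sketch below illustrates that idea for one SNP. It is not the authors' procedure: shuffling treatment labels is an assumed way of simulating the null hypothesis, and the gene-level and gene-complex summary statistics are not reproduced.

```python
import random
from collections import Counter


def chi_square(genotypes, treatments):
    """Pearson chi-square statistic for a genotype-by-treatment table."""
    n = len(genotypes)
    row_totals = Counter(genotypes)
    col_totals = Counter(treatments)
    cells = Counter(zip(genotypes, treatments))
    stat = 0.0
    for g, row_n in row_totals.items():
        for t, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (cells[(g, t)] - expected) ** 2 / expected
    return stat


def empirical_p_value(genotypes, treatments, n_perm=1000, seed=0):
    """Share of label shuffles with a statistic at least as large as observed.

    Shuffling treatment labels breaks any genotype-treatment link, which is
    an assumed stand-in for the null simulation mentioned in the abstract.
    """
    rng = random.Random(seed)
    observed = chi_square(genotypes, treatments)
    shuffled = list(treatments)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `empirical_p_value(genotypes, treatments)` on per-patient lists returns a value in (0, 1]. Small values indicate that the observed genotype distribution across the drug groups is unlikely under random treatment labels.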

Source
http://dx.doi.org/10.1097/FPC.0000000000000088 (DOI)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189974 (PMC)
November 2014