Publications by authors named "Ina Hoeschele"

35 Publications

Transcriptome prediction performance across machine learning models and diverse ancestries.

HGG Adv 2021 Apr 5;2(2). Epub 2021 Jan 5.

Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA.

Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.

Source
DOI: http://dx.doi.org/10.1016/j.xhgg.2020.100019
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8087249
April 2021

Association between sleep disordered breathing and epigenetic age acceleration: Evidence from the Multi-Ethnic Study of Atherosclerosis.

EBioMedicine 2019 Dec 21;50:387-394. Epub 2019 Nov 21.

Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Avenue, BL252, Boston, MA 02115, United States.

Background: Sleep disordered breathing (SDB) is a common disorder that results in oxidative stress and inflammation and is associated with multiple age-related health outcomes. Epigenetic age acceleration is a DNA methylation (DNAm)-based marker of fast biological aging. We examined the associations of SDB traits with epigenetic age acceleration.

Methods: A sample of 622 participants from the Multi-Ethnic Study of Atherosclerosis (MESA) had blood DNAm measured and underwent Type 2 in-home polysomnography that assessed apnea-hypopnea index (AHI), percentage of sleep time with oxygen saturation lower than 90% (Per90), and arousal index. DNAm data provided measures of DNAm-Age acceleration and DNAm-PhenoAge acceleration. The association of each SDB trait with age acceleration was estimated using linear regression, controlling for covariates. In secondary analyses, we studied the associations of SDB traits with epigenetic age acceleration 2-10 years after sleep study in 530 individuals from the Framingham Heart Study (FHS).

Findings: In MESA, AHI was associated with greater DNAm-PhenoAge acceleration (β = 0.03; 95% CI [0.001, 0.06]). Arousal index was associated with greater DNAm-Age acceleration (β = 0.04; 95% CI [0.01, 0.07]). Both associations were stronger in women than men. In the secondary FHS analyses, Per90 was associated with greater DNAm-Age acceleration and this association was stronger in men.

Interpretation: More severe SDB was associated with epigenetic age acceleration in both cohorts. Future work should prospectively study short- and long-term effects of SDB, and whether treatment reduces epigenetic age acceleration among those individuals with SDB.

Funding: National Institutes of Health.

Source
DOI: http://dx.doi.org/10.1016/j.ebiom.2019.11.020
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921369
December 2019

Adrenocortical Challenge Response and Genomic Analyses in Scottish Terriers With Increased Alkaline Phosphate Activity.

Front Vet Sci 2018 9;5:231. Epub 2018 Oct 9.

Department of Biomedical Sciences and Pathobiology, Virginia Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA, United States.

Scottish terriers (ST) frequently have increased serum alkaline phosphatase (ALP) of the steroid isoform. Many of these also have high serum concentrations of adrenal sex steroids. The study's objective was to determine the cause of increased sex steroids in ST with increased ALP. Adrenal gland suppression and stimulation were compared by low dose dexamethasone (LDDS), human chorionic gonadotropin (HCG) and adrenocorticotropic hormone (ACTH) response tests. Resting plasma pituitary hormones were measured. Steroidogenesis-related mRNA expression was evaluated in six ST with increased ALP, eight dogs of other breeds with pituitary-dependent hyperadrenocorticism (HAC), and seven normal dogs. The genome-wide association of single nucleotide polymorphisms (SNP) with ALP activity was evaluated in 168 ST. ALP (reference interval 8-70 U/L) was high in all ST (1,054 U/L) and HAC (985 U/L) dogs. All HAC dogs and 2/8 ST had increased cortisol post-ACTH administration. All ST and 2/7 Normal dogs had increased sex steroids post-ACTH. ST and Normal dogs had similar post-challenge adrenal steroid profiles following LDDS and HCG. Surprisingly, mRNA of hydroxysteroid 17-beta dehydrogenase 2 (HSD17B2) was lower in ST and Normal dogs than HAC. HSD17B2 facilitates metabolism of sex steroids. A SNP region was identified on chromosome 5 in proximity to HSD17B2 that correlated with increased serum ALP. ST in this study with increased ALP had a normal pituitary-adrenal axis in relationship to glucocorticoids and luteinizing hormone. We speculate the identified SNP and HSD17B2 gene may have a role in the pathogenesis of elevated sex steroids and ALP in ST.

Source
DOI: http://dx.doi.org/10.3389/fvets.2018.00231
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189480
October 2018

Cross-species transcriptional analysis reveals conserved and host-specific neoplastic processes in mammalian glioma.

Sci Rep 2018 01 19;8(1):1180. Epub 2018 Jan 19.

Department of Neurosurgery, University of Maryland School of Medicine, Baltimore, Maryland, USA.

Glioma is a unique neoplastic disease that develops exclusively in the central nervous system (CNS) and rarely metastasizes to other tissues. This feature strongly implicates the tumor-host CNS microenvironment in gliomagenesis and tumor progression. We investigated the differences and similarities in glioma biology as conveyed by transcriptomic patterns across four mammalian hosts: rats, mice, dogs, and humans. Given the inherent intra-tumoral molecular heterogeneity of human glioma, we focused this study on tumors with upregulation of the platelet-derived growth factor signaling axis, a common and early alteration in human gliomagenesis. The results reveal core neoplastic alterations in mammalian glioma, as well as unique contributions of the tumor host to neoplastic processes. Notable differences were observed in gene expression patterns as well as related biological pathways and cell populations known to mediate key elements of glioma biology, including angiogenesis, immune evasion, and brain invasion. These data provide new insights regarding mammalian models of human glioma, and how these insights and models relate to our current understanding of the human disease.

Source
DOI: http://dx.doi.org/10.1038/s41598-018-19451-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5775420
January 2018

Cell Cycle Model System for Advancing Cancer Biomarker Research.

Sci Rep 2017 12 21;7(1):17989. Epub 2017 Dec 21.

Department of Biological Sciences, Virginia Tech. 1981 Kraft Drive, Blacksburg, VA, 24061, USA.

Progress in understanding the complexity of a devastating disease such as cancer has underscored the need for developing comprehensive panels of molecular markers for early disease detection and precision medicine applications. The present study was conducted to assess whether a cohesive biological context can be assigned to protein markers derived from public data mining, and whether mass spectrometry can be utilized to screen for the co-expression of functionally related biomarkers to be recommended for further exploration in clinical context. Cell cycle arrest/release experiments of MCF7/SKBR3 breast cancer and MCF10 non-tumorigenic cells were used as a surrogate to support the production of proteins relevant to aberrant cell proliferation. Information downloaded from the scientific public domain was queried with bioinformatics tools to generate an initial list of 1038 cancer-associated proteins. Mass spectrometric analysis of cell extracts identified 352 proteins that could be matched to the public list. Differential expression, enrichment, and protein-protein interaction analysis of the proteomic data revealed several functionally-related clusters of relevance to cancer. The results demonstrate that public data derived from independent experiments can be used to inform biological research and support the development of molecular assays for probing the characteristics of a disease.

Source
DOI: http://dx.doi.org/10.1038/s41598-017-17845-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5740075
December 2017

Blood monocyte transcriptome and epigenome analyses reveal loci associated with human atherosclerosis.

Nat Commun 2017 08 30;8(1):393. Epub 2017 Aug 30.

University of Wisconsin School of Medicine and Public Health, Madison, WI, 53792, USA.

Little is known regarding the epigenetic basis of atherosclerosis. Here we present the CD14+ blood monocyte transcriptome and epigenome signatures associated with human atherosclerosis. The transcriptome signature includes transcription coactivator, ARID5B, which is known to form a chromatin derepressor complex with a histone H3K9Me2-specific demethylase and promote adipogenesis and smooth muscle development. ARID5B CpG (cg25953130) methylation is inversely associated with both ARID5B expression and atherosclerosis, consistent with this CpG residing in an ARID5B enhancer region, based on chromatin capture and histone marks data. Mediation analysis supports assumptions that ARID5B expression mediates effects of cg25953130 methylation and several cardiovascular disease risk factors on atherosclerotic burden. In lipopolysaccharide-stimulated human THP1 monocytes, ARID5B knockdown reduced expression of genes involved in atherosclerosis-related inflammatory and lipid metabolism pathways, and inhibited cell migration and phagocytosis. These data suggest that ARID5B expression, possibly regulated by an epigenetically controlled enhancer, promotes atherosclerosis by dysregulating immunometabolism towards a chronic inflammatory phenotype. The molecular mechanisms mediating the impact of environmental factors in atherosclerosis are unclear. Here, the authors examine CD14+ blood monocyte's transcriptome and epigenome signatures to find differential methylation and expression of ARID5B to be associated with human atherosclerosis.

Source
DOI: http://dx.doi.org/10.1038/s41467-017-00517-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5577184
August 2017

Secondhand Tobacco Smoke Exposure Associations With DNA Methylation of the Aryl Hydrocarbon Receptor Repressor.

Nicotine Tob Res 2017 Apr;19(4):442-451

Department of Epidemiology and Prevention, Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC.

Introduction: Cigarette smoking is inversely associated with DNA methylation of the aryl hydrocarbon receptor repressor (AHRR; cg05575921). However, the association between secondhand tobacco smoke (SHS) exposure and AHRR methylation is unknown.

Methods: DNA methylation of AHRR cg05575921 in CD14+ monocyte samples, from 495 never-smokers and 411 former smokers (having quit smoking ≥15 years) from the Multi-Ethnic Study of Atherosclerosis (MESA), was cross-sectionally compared with concomitantly ascertained self-reported SHS exposure, urine cotinine concentrations, and estimates of air pollutants at participants' homes. Linear regression was used to test for associations, and covariates included age, sex, race, education, study site, and previous smoking exposure (smoking status, time since quitting, and pack-years).

Results: Recent indoor SHS exposure (hours per week) was inversely associated with cg05575921 methylation (β ± SE = -0.009 ± 0.003, p = .007). The inverse effect direction was consistent (but did not reach significance) in the majority of stratified analyses (by smoking status, sex, and race). Categorical analysis revealed high levels of recent SHS exposure (≥10 hours per week) inversely associated with cg05575921 methylation (β ± SE = -0.28 ± 0.09, p = .003), which remained significant (p < .05) in the majority of stratified analyses. cg05575921 methylation did not significantly (p < .05) associate with low to moderate levels of recent SHS exposure (1-9 hours per week), urine cotinine concentrations, years spent living with people smoking, years spent indoors (not at home) with people smoking, or estimated levels of air pollutants.

Conclusions: High levels of recent indoor SHS exposure may be inversely associated with DNA methylation of AHRR in human monocytes.

Implications: DNA methylation is a biochemical alteration that can occur in response to cigarette smoking; however, little is known about the effect of SHS on human DNA methylation. In the present study, we evaluated the association between SHS exposure and DNA methylation in human monocytes, at a site (AHRR cg05575921) known to have methylation inversely associated with current and former cigarette smoking compared to never smoking. Results from this study suggest high levels of recent SHS exposure inversely associate with DNA methylation of AHRR cg05575921 in monocytes from nonsmokers, albeit with weaker effects than active cigarette smoking.

Source
DOI: http://dx.doi.org/10.1093/ntr/ntw219
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6075517
April 2017

DNA Methylation of the Aryl Hydrocarbon Receptor Repressor Associations With Cigarette Smoking and Subclinical Atherosclerosis.

Circ Cardiovasc Genet 2015 Oct 25;8(5):707-16. Epub 2015 Aug 25.

Background: Tobacco smoke contains numerous agonists of the aryl hydrocarbon receptor (AhR) pathway, and activation of the AhR pathway was shown to promote atherosclerosis in mice. Intriguingly, cigarette smoking is most strongly and robustly associated with DNA modifications to an AhR pathway gene, the AhR repressor (AHRR). We hypothesized that altered AHRR methylation in monocytes, a cell type sensitive to cigarette smoking and involved in atherogenesis, may be a part of the biological link between cigarette smoking and atherosclerosis.

Methods And Results: DNA methylation profiles of AHRR in monocytes (542 CpG sites ± 150 kb of AHRR, using Illumina 450K array) were integrated with smoking habits and ultrasound-measured carotid plaque scores from 1256 participants of the Multi-Ethnic Study of Atherosclerosis (MESA). Methylation of cg05575921 significantly associated (P=6.1 × 10^-134) with smoking status (current versus never). Novel associations between cg05575921 methylation and carotid plaque scores (P=3.1 × 10^-10) were identified, which remained significant in current and former smokers even after adjusting for self-reported smoking habits, urinary cotinine, and well-known cardiovascular disease risk factors. This association replicated in an independent cohort using hepatic DNA (n=141). Functionally, cg05575921 was located in a predicted gene expression regulatory element (enhancer) and had methylation correlated with AHRR mRNA profiles (P=1.4 × 10^-17) obtained from RNA sequencing conducted on a subset (n=373) of the samples.

Conclusions: These findings suggest that AHRR methylation may be functionally related to AHRR expression in monocytes and represents a potential biomarker of subclinical atherosclerosis in smokers.

Source
DOI: http://dx.doi.org/10.1161/CIRCGENETICS.115.001097
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618776
October 2015

Alterations of a Cellular Cholesterol Metabolism Network Are a Molecular Feature of Obesity-Related Type 2 Diabetes and Cardiovascular Disease.

Diabetes 2015 Oct 7;64(10):3464-74. Epub 2015 Jul 7.

Department of Epidemiology & Prevention, Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC

Obesity is linked to type 2 diabetes (T2D) and cardiovascular diseases; however, the underlying molecular mechanisms remain unclear. We aimed to identify obesity-associated molecular features that may contribute to obesity-related diseases. Using circulating monocytes from 1,264 Multi-Ethnic Study of Atherosclerosis (MESA) participants, we quantified the transcriptome and epigenome. We discovered that alterations in a network of coexpressed cholesterol metabolism genes are a signature feature of obesity and inflammatory stress. This network included 11 BMI-associated genes related to sterol uptake (↑LDLR, ↓MYLIP), synthesis (↑SCD, FADS1, HMGCS1, FDFT1, SQLE, CYP51A1, SC4MOL), and efflux (↓ABCA1, ABCG1), producing a molecular profile expected to increase intracellular cholesterol. Importantly, these alterations were associated with T2D and coronary artery calcium (CAC), independent from cardiometabolic factors, including serum lipid profiles. This network mediated the associations between obesity and T2D/CAC. Several genes in the network harbored C-phosphorus-G dinucleotides (e.g., ABCG1/cg06500161), which overlapped Encyclopedia of DNA Elements (ENCODE)-annotated regulatory regions and had methylation profiles that mediated the associations between BMI/inflammation and expression of their cognate genes. Taken together with several lines of previous experimental evidence, these data suggest that alterations of the cholesterol metabolism gene network represent a molecular link between obesity/inflammation and T2D/CAC.

Source
DOI: http://dx.doi.org/10.2337/db14-1314
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4587646
October 2015

Transcriptomic profiles of aging in purified human immune cells.

BMC Genomics 2015 Apr 22;16:333. Epub 2015 Apr 22.

Department of Epidemiology and Prevention, Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, 27157, USA.

Background: Transcriptomic studies hold great potential towards understanding the human aging process. Previous transcriptomic studies have identified many genes with age-associated expression levels; however, small sample sizes and mixed cell types often make these results difficult to interpret.

Results: Using transcriptomic profiles in CD14+ monocytes from 1,264 participants of the Multi-Ethnic Study of Atherosclerosis (aged 55-94 years), we identified 2,704 genes differentially expressed with chronological age (false discovery rate, FDR ≤ 0.001). We further identified six networks of co-expressed genes that included prominent genes from three pathways: protein synthesis (particularly mitochondrial ribosomal genes), oxidative phosphorylation, and autophagy, with expression patterns suggesting these pathways decline with age. Expression of several chromatin remodeler and transcriptional modifier genes strongly correlated with expression of oxidative phosphorylation and ribosomal protein synthesis genes. 17% of genes with age-associated expression harbored CpG sites whose degree of methylation significantly mediated the relationship between age and gene expression (p < 0.05). Lastly, 15 genes with age-associated expression were also associated (FDR ≤ 0.01) with pulse pressure independent of chronological age. Comparing transcriptomic profiles of CD14+ monocytes to CD4+ T cells from a subset (n = 423) of the population, we identified 30 age-associated (FDR < 0.01) genes in common, while larger sets of differentially expressed genes were unique to either T cells (188 genes) or monocytes (383 genes). At the pathway level, a decline in ribosomal protein synthesis machinery gene expression with age was detectable in both cell types.

Conclusions: An overall decline in expression of ribosomal protein synthesis genes with age was detected in CD14+ monocytes and CD4+ T cells, demonstrating that some patterns of aging are likely shared between different cell types. Our findings also support cell-specific effects of age on gene expression, illustrating the importance of using purified cell samples for future transcriptomic studies. Longitudinal work is required to establish the relationship between identified age-associated genes/pathways and aging-related diseases.

Source
DOI: http://dx.doi.org/10.1186/s12864-015-1522-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4417516
April 2015

Age-related variations in the methylome associated with gene expression in human monocytes and T cells.

Nat Commun 2014 Nov 18;5:5366. Epub 2014 Nov 18.

Departments of Epidemiology & Prevention, and Biostatistics, Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina 27157, USA.

Age-related variations in DNA methylation have been reported; however, the functional relevance of these differentially methylated sites (age-dMS) is unclear. Here we report potentially functional age-dMS, defined as age- and cis-gene expression-associated methylation sites (age-eMS), identified by integrating genome-wide CpG methylation and gene expression profiles collected ex vivo from circulating T cells (227 CD4+ samples) and monocytes (1,264 CD14+ samples, age range: 55-94 years). None of the age-eMS detected in 227 T-cell samples are detectable in 1,264 monocyte samples, in contrast to the majority of age-dMS detected in T cells that replicated in monocytes. Age-eMS tend to be hypomethylated with older age, located in predicted enhancers and preferentially linked to expression of antigen processing and presentation genes. These results identify and characterize potentially functional age-related methylation in human T cells and monocytes, and provide novel insights into the role age-dMS may have in the aging process.

Source
DOI: http://dx.doi.org/10.1038/ncomms6366
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4280798
November 2014

Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits.

Genetics 2015 Jan 28;199(1):205-22. Epub 2014 Oct 28.

Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, Virginia 24061; Department of Statistics, Virginia Tech University, Blacksburg, Virginia 24061

The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA, using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization by adapting penalties developed for covariates measured on graphs can improve power but also generate more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method.

Source
DOI: http://dx.doi.org/10.1534/genetics.114.167817
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286685
January 2015
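
The abstract above compares penalized regression against single-marker analysis with Benjamini-Hochberg FDR control (SMA-BH) and recommends an elastic net with a Lasso mixing weight near 0.5. As a rough illustration only, the Python sketch below shows the standard Benjamini-Hochberg step-up rule and one common way to write the elastic net penalty. The function names, the default FDR level q = 0.05, and the glmnet-style parameterization of the penalty are assumptions made for this example; the abstract does not specify the authors' implementation, and their analytic FDR criterion for penalty selection is not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of tests rejected by the BH step-up rule at level q.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the k
    smallest p-values (hypothetical helper, not the authors' code).
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k_max])


def elastic_net_penalty(beta, lam, alpha=0.5):
    """Elastic net penalty in an assumed glmnet-style parameterization:

    lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2),
    where alpha is the mixing weight on the Lasso (L1) term.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


if __name__ == "__main__":
    # Toy single-marker p-values for ten SNPs (made-up numbers).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
    print(elastic_net_penalty([1.0, -2.0], lam=2.0, alpha=0.5))
```

Because the rule is step-up, a p-value that fails its own rank threshold can still be rejected when a larger-ranked p-value passes; setting alpha to 1 or 0 in the penalty gives the pure Lasso or pure ridge term, respectively.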

Methylomics of gene expression in human monocytes.

Hum Mol Genet 2013 Dec 29;22(24):5065-74. Epub 2013 Jul 29.

Wake Forest School of Medicine, Winston-Salem, NC, USA.

DNA methylation is one of several epigenetic mechanisms that contribute to the regulation of gene expression; however, the extent to which methylation of CpG dinucleotides correlates with gene expression at the genome-wide level is still largely unknown. Using purified primary monocytes from subjects in a large community-based cohort (n = 1264), we characterized methylation (>485 000 CpG sites) and mRNA expression (>48K transcripts) and carried out genome-wide association analyses of 8370 expression phenotypes. We identified 11 203 potential cis-acting CpG loci whose degree of methylation was associated with gene expression (eMS) at a false discovery rate threshold of 0.001. Most of the associations were consistent in effect size and direction of effect across sex and three ethnicities. Contrary to expectation, these eMS were not predominately enriched in promoter regions, or CpG islands, but rather in the 3' UTR, gene bodies, CpG shores or 'offshore' sites, and both positive and negative correlations between methylation and expression were observed across all locations. eMS were enriched for regions predicted to be regulatory by ENCODE (Encyclopedia of DNA Elements) data in multiple cell types, particularly enhancers. One of the strongest association signals detected (P < 2.2 × 10^-308) was a methylation probe (cg17005068) in the promoter/enhancer region of the glutathione S-transferase theta 1 gene (GSTT1, encoding the detoxification enzyme) with GSTT1 mRNA expression. Our study provides a detailed description of the epigenetic architecture in human monocytes and its relationship to gene expression. These data may help prioritize interrogation of biologically relevant methylation loci and provide new insights into the epigenetic basis of human health and diseases.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddt356
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836482
December 2013

Simulating systems genetics data with SysGenSIM.

Bioinformatics 2011 Sep 6;27(17):2459-62. Epub 2011 Jul 6.

CRS4 Bioinformatica, 09010 Pula (CA), Italy.

Summary: SysGenSIM is a software package to simulate Systems Genetics (SG) experiments in model organisms, for the purpose of evaluating and comparing statistical and computational methods and their implementations for analyses of SG data [e.g. methods for expression quantitative trait loci (eQTL) mapping and network inference]. SysGenSIM allows the user to select a variety of network topologies, genetic and kinetic parameters to simulate SG data (genotyping, gene expression and phenotyping) with large gene networks with thousands of nodes. The software is encoded in MATLAB, and a user-friendly graphical user interface is provided.

Availability: The open-source software code and user manual can be downloaded at: http://sysgensim.sourceforge.net/

Contact: [email protected]

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btr407
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3157927
September 2011

Sporadic breast cancer patients' germline DNA exhibit an AT-rich microsatellite signature.

Genes Chromosomes Cancer 2011 Apr 14;50(4):275-83. Epub 2011 Jan 14.

Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061-0477, USA.

Using a custom CGH-like oligonucleotide array to measure the global microsatellite content in the genomes of 72 cancer, cancer-free, and high risk patient and cell line samples (56 germline DNA and 16 in tumor or tumor cell line DNA) we found a unique, reproducible, and statistically significant pattern of 18 motif-specific microsatellite families (out of 962 possible 1-6 mer repeats) in breast cancer patient germline and tumor DNA, but not in germline DNA of cancer-free volunteer controls or in breast cancer patients with BRCA1/2 mutations. These high-similarity A/T rich repetitive motifs were also more pronounced in the germlines and tumors of colon cancer tumor patients (3/6 samples) and microsatellite unstable colon cancer cell lines; however, germline DNA of sporadic breast cancer patients exhibited the largest global content shift for those motifs with extreme AT/GC ratios. These results indicate that global microsatellite variability is complex, suggest the existence of a previously unknown genomic destabilization mechanism in breast cancer patients' germline DNA, and warrant further testing of such microsatellite variability as a predictor of future breast cancer development.

Source
DOI: http://dx.doi.org/10.1002/gcc.20853
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107400
April 2011

Nonparametric Bayesian variable selection with applications to multiple quantitative trait loci mapping with epistasis and gene-environment interaction.

Genetics 2010 Sep 15;186(1):385-94. Epub 2010 Jun 15.

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.

The joint action of multiple genes is an important source of variation for complex traits and human diseases. However, mapping genes with epistatic effects and gene-environment interactions is a difficult problem because of relatively small sample sizes and very large parameter spaces for quantitative trait locus models that include such interactions. Here we present a nonparametric Bayesian method to map multiple quantitative trait loci (QTL) by considering epistatic and gene-environment interactions. The proposed method is not restricted to pairwise interactions among genes, as is typically done in parametric QTL analysis. Rather than modeling each main and interaction term explicitly, our nonparametric Bayesian method measures the importance of each QTL, irrespective of whether it is mostly due to a main effect or due to some interaction effect(s), via an unspecified function of the genotypes at all candidate QTL. A Gaussian process prior is assigned to this unknown function. In addition to the candidate QTL, nongenetic factors and covariates, such as age, gender, and environmental conditions, can also be included in the unspecified function. The importance of each genetic factor (QTL) and each nongenetic factor/covariate included in the function is estimated by a single hyperparameter, which enters the covariance function and captures any main or interaction effect associated with a given factor/covariate. An initial evaluation of the performance of the proposed method is obtained via analysis of simulated and real data.

Source
DOI: http://dx.doi.org/10.1534/genetics.109.113688
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2940302
September 2010

Genome scan for loci regulating HDL cholesterol levels in Finnish extended pedigrees with early coronary heart disease.

Eur J Hum Genet 2010 May 25;18(5):604-13. Epub 2009 Nov 25.

Institute of Clinical Medicine, Department of Internal Medicine and Biocenter Oulu, University of Oulu and Clinical Research Center, Oulu University Hospital, Oulu, Finland.

Coronary heart disease (CHD) is the leading cause of mortality in Western societies. Its risk is inversely correlated with plasma high-density lipoprotein cholesterol (HDL-C) levels, and approximately 50% of the variability in these levels is genetically determined. In this study, the aim was to carry out a whole-genome scan for the loci regulating plasma HDL-C levels in 35 well-defined Finnish extended pedigrees (375 members genotyped) with probands having low HDL-C levels and premature CHD. The additive genetic heritability of HDL-C was 43%. A variance component analysis revealed four suggestive quantitative trait loci (QTLs) for HDL-C levels, with the highest LOD score, 3.1, at the chromosomal locus 4p12. Other suggestive LOD scores were 2.1 at 2q33, 2.1 at 6p24 and 2.0 at 17q25. Three suggestive loci for the qualitative low HDL-C trait were found, with a nonparametric multipoint score of 2.6 at the chromosomal locus 10p15.3, 2.5 at 22q11 and 2.1 at 6p12. After correction for statin use, the strongest evidence of linkage was shown on chromosomes 4p12, 6p24, 6p12, 15q22 and 22q11. To search for the underlying gene on chromosome 6, we analyzed two functional and positional candidate genes (peroxisome proliferator-activated receptor-delta (PPARD), and retinoid X receptor beta, (RXRB)), but found no significant evidence of association. In conclusion, we identified seven chromosomal regions for HDL-C regulation exceeding the level for suggestive evidence of linkage.

Source
DOI: http://dx.doi.org/10.1038/ejhg.2009.202
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987327
May 2010

Gaussian process based Bayesian semiparametric quantitative trait loci interval mapping.

Biometrics 2010 Mar 12;66(1):222-32. Epub 2009 May 12.

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.

In linkage analysis, it is often necessary to include covariates such as age or weight to increase power or avoid spurious false positive findings. However, if a covariate term in the model is specified incorrectly (e.g., a quadratic term misspecified as a linear term), then the inclusion of the covariate may adversely affect power and accuracy of the identification of quantitative trait loci (QTL). Furthermore, some covariates may interact with each other in a complicated fashion. We implement semiparametric models for single and multiple QTL mapping. Both mapping methods include an unspecified function of any covariate found or suspected to have a more complex than linear but unknown relationship with the response variable. They also allow for interactions among different covariates. This analysis is performed in a Bayesian inference framework using Markov chain Monte Carlo. The advantages of our methods are demonstrated via extensive simulations and real data analysis.

Source
DOI: http://dx.doi.org/10.1111/j.1541-0420.2009.01268.x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875332
March 2010

Differential protein expression analysis using stable isotope labeling and PQD linear ion trap MS technology.

J Am Soc Mass Spectrom 2009 Jul 4;20(7):1287-302. Epub 2009 Mar 4.

Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA.

An isotope tags for relative and absolute quantitation (iTRAQ)-based reversed-phase liquid chromatography (RPLC)-tandem mass spectrometry (MS/MS) method was developed for differential protein expression profiling in complex cellular extracts. The estrogen positive MCF-7 cell line, cultured in the presence of 17β-estradiol (E2) and tamoxifen (Tam), was used as a model system. MS analysis was performed with a linear trap quadrupole (LTQ) instrument operated by using pulsed Q dissociation (PQD) detection. Optimization experiments were conducted to maximize the iTRAQ labeling efficiency and the number of quantified proteins. MS data filtering criteria were chosen to result in a false positive identification rate of <4%. The reproducibility of protein identifications was approximately 60%-67% between duplicate and approximately 50% among triplicate LC-MS/MS runs. The run-to-run reproducibility, in terms of relative standard deviations (RSD) of global mean iTRAQ ratios, was better than 10%. The quantitation accuracy improved with the number of peptides used for protein identification. From a total of 530 identified proteins (P < 0.001) in the E2/Tam treated MCF-7 cells, a list of 255 proteins (quantified by at least two peptides) was generated for differential expression analysis. A method was developed for the selection, normalization, and statistical evaluation of such datasets. An approximately 2-fold change in protein expression levels was necessary for a protein to be selected as a biomarker candidate. According to this data processing strategy, approximately 16 proteins involved in biological processes such as apoptosis, RNA processing/metabolism, DNA replication/transcription/repair, cell proliferation and metastasis, were found to be up- or down-regulated.

Source
DOI: http://dx.doi.org/10.1016/j.jasms.2009.02.029
July 2009

Haplotyping methods for pedigrees.

Hum Hered 2009 27;67(4):248-66. Epub 2009 Jan 27.

Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Ala., USA.

Haplotypes provide valuable information in the study of diseases, complex traits, population histories, and evolutionary genetics. With the dramatic increase in the number of available single nucleotide polymorphism (SNP) markers, haplotype inference (haplotyping) using observed genotype data has become an important component of genetic studies in general and of statistical gene mapping in particular. Existing haplotyping methods include (1) population-based methods, (2) methods for pooled DNA samples, and (3) methods for family and pedigree data. The methods and computer programs for population data and pooled DNA samples were reviewed recently in the literature. As several authors noted, family and pedigree datasets are abundant and have unique advantages. In the past twenty years, many haplotyping methods for family and pedigree data have been developed. Therefore, in this contribution we review haplotyping methods and the corresponding computer programs suitable for family and pedigree data and discuss their applications and limitations. We explore the connections among these methods, and describe the challenges that remain to be addressed.

Source
DOI: http://dx.doi.org/10.1159/000194978
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2692835
May 2009

Infection and genotype remodel the entire soybean transcriptome.

BMC Genomics 2009 Jan 26;10:49. Epub 2009 Jan 26.

Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA.

Background: High throughput methods, such as high density oligonucleotide microarray measurements of mRNA levels, are popular and critical to genome scale analysis and systems biology. However, understanding the results of these analyses, and in particular the very wide range of levels of transcriptional changes observed, is still a significant challenge. Many researchers still use an arbitrary cutoff such as two-fold in order to identify changes that may be biologically significant. We have used a very large-scale microarray experiment involving 72 biological replicates to analyze the response of soybean plants to infection by the pathogen Phytophthora sojae and to analyze transcriptional modulation as a result of genotypic variation.

Results: With the unprecedented level of statistical sensitivity provided by the high degree of replication, we show unambiguously that almost the entire plant genome (97 to 99% of all detectable genes) undergoes transcriptional modulation in response to infection and genetic variation. The majority of the transcriptional differences are less than two-fold in magnitude. We show that low amplitude modulation of gene expression (less than two-fold changes) is highly statistically significant and consistent across biological replicates, even for modulations of less than 20%. Our results are consistent through two different normalization methods and two different statistical analysis procedures.

Conclusion: Our findings demonstrate that the entire plant genome undergoes transcriptional modulation in response to infection and genetic variation. The pervasive low-magnitude remodeling of the transcriptome may be an integral component of physiological adaptation in soybean, and in all eukaryotes.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-10-49
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2662884
January 2009

Gene network inference via structural equation modeling in genetical genomics experiments.

Genetics 2008 Mar 3;178(3):1763-76. Epub 2008 Feb 3.

Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061-0477, USA.

Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.

Source
DOI: http://dx.doi.org/10.1534/genetics.107.080069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2278111
March 2008

A rapid conditional enumeration haplotyping method in pedigrees.

Genet Sel Evol 2008 Jan-Feb;40(1):25-36. Epub 2007 Dec 21.

Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.

Haplotyping in pedigrees provides valuable information for genetic studies (e.g., linkage analysis and association studies). In our previous work, we proposed a conditional enumeration haplotyping method to identify a set of haplotype configurations with the highest likelihoods for a large pedigree with a large number of linked loci. That method sets a threshold on the conditional probabilities of the possible ordered genotypes at every unordered individual-marker, deleting ordered genotypes with low conditional probabilities and thereby eliminating haplotype configurations with low likelihoods. In this article, we present a rapid haplotyping algorithm that modifies our previous method by setting an additional threshold on the ratio of the conditional probability of a haplotype configuration to the largest conditional probability among all haplotype configurations, in order to eliminate configurations with relatively low conditional probabilities. The new algorithm is much more efficient than our previous method and the widely used software SimWalk2.
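The additional ratio threshold can be pictured with a minimal sketch. It assumes the conditional probabilities of the candidate haplotype configurations have already been computed. The function name, configuration labels, and threshold value are hypothetical, and this is not the authors' implementation.

```python
def prune_configurations(config_probs, ratio_threshold):
    """Keep configurations whose conditional probability is at least
    ratio_threshold times the largest conditional probability.

    config_probs maps a configuration label to its conditional probability.
    """
    best = max(config_probs.values())
    return {
        config: prob
        for config, prob in config_probs.items()
        if prob / best >= ratio_threshold
    }


# Illustrative probabilities for three candidate configurations.
candidates = {"config_1": 0.60, "config_2": 0.30, "config_3": 0.01}

# config_3 has ratio 0.01 / 0.60, well below 0.1, so it is dropped.
print(prune_configurations(candidates, ratio_threshold=0.1))
```

Because the cut is relative to the best configuration rather than absolute, it adapts to pedigrees where all conditional probabilities are small.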

Source
DOI: http://dx.doi.org/10.1186/1297-9686-40-1-25
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2674916
March 2008

Bayesian estimation of genetic parameters for multivariate threshold and continuous phenotypes and molecular genetic data in simulated horse populations using Gibbs sampling.

BMC Genet 2007 May 9;8:19. Epub 2007 May 9.

Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover (Foundation), Hannover, Germany.

Background: Requirements for successful implementation of multivariate animal threshold models including phenotypic and genotypic information are not known yet. Here simulated horse data were used to investigate the properties of multivariate estimators of genetic parameters for categorical, continuous and molecular genetic data in the context of important radiological health traits using mixed linear-threshold animal models via Gibbs sampling. The simulated pedigree comprised 7 generations and 40000 animals per generation. Additive genetic values, residuals and fixed effects for one continuous trait and liabilities of four binary traits were simulated, resembling situations encountered in the Warmblood horse. Quantitative trait locus (QTL) effects and genetic marker information were simulated for one of the liabilities. Different scenarios with respect to recombination rate between genetic markers and QTL and polymorphism information content of genetic markers were studied. For each scenario ten replicates were sampled from the simulated population, and within each replicate six different datasets differing in number and distribution of animals with trait records and availability of genetic marker information were generated. (Co)Variance components were estimated using a Bayesian mixed linear-threshold animal model via Gibbs sampling. Residual variances were fixed to zero and a proper prior was used for the genetic covariance matrix.

Results: Effective sample sizes (ESS) and biases of genetic parameters differed significantly between datasets. Bias of heritability estimates was -6% to +6% for the continuous trait, -6% to +10% for the binary traits of moderate heritability, and -21% to +25% for the binary traits of low heritability. Additive genetic correlations were mostly underestimated between the continuous trait and binary traits of low heritability, under- or overestimated between the continuous trait and binary traits of moderate heritability, and overestimated between two binary traits. Use of trait information on two subsequent generations of animals increased ESS and reduced bias of parameter estimates more than mere increase of the number of informative animals from one generation. Consideration of genotype information as a fixed effect in the model resulted in overestimation of polygenic heritability of the QTL trait, but increased accuracy of estimated additive genetic correlations of the QTL trait.

Conclusion: Combined use of phenotype and genotype information on parents and offspring will help to identify agonistic and antagonistic genetic correlations between traits of interest, facilitating design of effective multiple trait selection schemes.

Source
DOI: http://dx.doi.org/10.1186/1471-2156-8-19
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876470
May 2007

Influence of priors in Bayesian estimation of genetic parameters for multivariate threshold models using Gibbs sampling.

Genet Sel Evol 2007 Mar-Apr;39(2):123-37. Epub 2007 Feb 17.

Institute for Animal Breeding and Genetics, University of Veterinary Medicine, Hannover Foundation, Buenteweg 17p, D-30559 Hannover, Germany.

Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)Variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in > 50% of flat prior analyses, with indications of potential or near posterior impropriety between about round 10,000 and 100,000. Terminations due to non-positive definite genetic covariance matrix occurred in flat prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.

Source
DOI: http://dx.doi.org/10.1186/1297-9686-39-2-123
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682833
May 2007

Nucleoplasmin facilitates reprogramming and in vivo development of bovine nuclear transfer embryos.

Mol Reprod Dev 2006 Aug;73(8):977-86

Infigen, Inc., Madison, Wisconsin 53718, USA.

Successful cloning by somatic cell nuclear transfer (NT) involves an oocyte-driven transition in gene expression from an inherited somatic pattern, to an embryonic form, during early development. This reprogramming of gene expression is thought to require the remodeling of somatic chromatin and as such, faulty and/or incomplete chromatin remodeling may contribute to the aberrant gene expression and abnormal development observed in NT embryos. We used a novel approach to supplement the oocyte with chromatin remodeling factors and determined the impact of these molecules on gene expression and development of bovine NT embryos. Nucleoplasmin (NPL) or polyglutamic acid (PGA) was injected into bovine oocytes at different concentrations, either before (pre-NT) or after (post-NT) NT. Pre-implantation embryos were then transferred to bovine recipients to assess in vivo development. Microinjection of remodeling factors resulted in apparent differences in the rate of blastocyst development and in pregnancy initiation rates in both NPL- and PGA-injected embryos, and these differences were dependent on factor concentration and/or the time of injection. Post-NT NPL-injected embryos that produced the highest rate of pregnancy also demonstrated differentially expressed genes relative to pre-NT NPL embryos and control NT embryos, both of which had lower pregnancy rates. Over 200 genes were upregulated following post-NT NPL injection. Several of these genes were previously shown to be downregulated in NT embryos when compared to bovine IVF embryos. These data suggest that addition of chromatin remodeling factors to the oocyte may improve development of NT embryos by facilitating reprogramming of the somatic nucleus.

Source
DOI: http://dx.doi.org/10.1002/mrd.20493
August 2006

Approximating identity-by-descent matrices using multiple haplotype configurations on pedigrees.

Genetics 2005 Sep 18;171(1):365-76. Epub 2005 Jun 18.

Virginia Bioinformatics Institute and Department of Statistics, Virginia Tech, Blacksburg, Virginia 24061, USA.

Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.

Source
DOI: http://dx.doi.org/10.1534/genetics.104.040337
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1456528
September 2005

Finite mixture model analysis of microarray expression data on samples of uncertain biological type with application to reproductive efficiency.

Vet Immunol Immunopathol 2005 May;105(3-4):187-96

Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061-0477, USA.

Common goals of microarray experiments are the detection of genes that are differentially expressed between several biological types and the construction of classifiers that predict biological type of samples. Here we consider a situation where there is no training data. There is considerable interest in comparing expression profiles associated with successful pregnancies (SP) and unsuccessful pregnancies (UP) in model and farm animals. Successful pregnancy rate is known to be much higher in embryos generated by in vitro fertilization (IVF) than in nuclear transfer (NT) embryos, and higher under induced ovulation for large follicles (LF) than for small follicles (SF). The tasks of identifying genes differentially expressed between SP and UP, and predicting SP for future samples are not well accomplished by comparing IVF and NT, or LF and SF. A suitable method is finite mixture model analysis (FMMA), which models each observed class (IVF and NT, or LF and SF) as a mixture of two distributions, one for SP and one for UP, with different known or unknown proportions (here known to be 0.50 SP for IVF and 0.02 SP for NT). The means of the two distributions differ for the differentially expressed genes, which we identify via a likelihood ratio test. We confirm by simulation that FMMA strongly outperforms hierarchical clustering and linear discriminant analysis using the known class labels (NT, IVF). We apply FMMA to a real data set on IVF and NT embryos, and compute their posterior probabilities of SP, which confirm our prior knowledge of the SP proportions for IVF and NT.
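The mixture idea can be sketched for a single gene as follows. This sketch assumes normal component distributions with a shared standard deviation and takes all parameter values as given rather than estimating them. Those choices and all function names are assumptions for illustration, and only the SP proportions (0.50 for IVF, 0.02 for NT) come from the abstract.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(values, prop_sp, mean_sp, mean_up, sd):
    """Log-likelihood of one observed class (e.g. IVF) modeled as a mixture of
    an SP component and a UP component with known SP proportion prop_sp."""
    return sum(
        math.log(
            prop_sp * normal_pdf(x, mean_sp, sd)
            + (1.0 - prop_sp) * normal_pdf(x, mean_up, sd)
        )
        for x in values
    )


def posterior_sp(x, prop_sp, mean_sp, mean_up, sd):
    """Posterior probability that a sample with expression x is SP."""
    sp_part = prop_sp * normal_pdf(x, mean_sp, sd)
    up_part = (1.0 - prop_sp) * normal_pdf(x, mean_up, sd)
    return sp_part / (sp_part + up_part)


def lr_statistic(ivf, nt, mean_sp, mean_up, common_mean, sd):
    """Likelihood ratio statistic comparing distinct SP/UP means against a
    single common mean, with SP proportions 0.50 (IVF) and 0.02 (NT)."""
    alternative = (
        mixture_loglik(ivf, 0.50, mean_sp, mean_up, sd)
        + mixture_loglik(nt, 0.02, mean_sp, mean_up, sd)
    )
    null = (
        mixture_loglik(ivf, 0.50, common_mean, common_mean, sd)
        + mixture_loglik(nt, 0.02, common_mean, common_mean, sd)
    )
    return 2.0 * (alternative - null)
```

In a full analysis the means and standard deviation would be estimated under each hypothesis before the statistic is formed. For a gene that is not differentially expressed, `posterior_sp` simply returns the prior proportion.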

Source
DOI: http://dx.doi.org/10.1016/j.vetimm.2005.02.008
May 2005

Genetical genomics analysis of a yeast segregant population for transcription network inference.

Genetics 2005 Jun 21;170(2):533-42. Epub 2005 Mar 21.

Virginia Bioinformatics Institute and Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061-0477, USA.

Genetic analysis of gene expression in a segregating population, which is expression profiled and genotyped at DNA markers throughout the genome, can reveal regulatory networks of polymorphic genes. We propose an analysis strategy with several steps: (1) genome-wide QTL analysis of all expression profiles to identify eQTL confidence regions, followed by fine mapping of identified eQTL; (2) identification of regulatory candidate genes in each eQTL region; (3) correlation analysis of the expression profiles of the candidates in any eQTL region with the gene affected by the eQTL to reduce the number of candidates; (4) drawing directional links from retained regulatory candidate genes to genes affected by the eQTL and joining links to form networks; and (5) statistical validation and refinement of the inferred network structure. Here, we apply an initial implementation of this strategy to a segregating yeast population. In 65, 7, and 28% of the identified eQTL regions, a single candidate regulatory gene, no gene, or more than one gene was retained in step 3, respectively. Overall, 768 putative regulatory links were retained, 331 of which are the strongest candidate links, as they were retained in the expression correlation analysis and were located within or near an eQTL subregion identified by a multimarker analysis separating multiple linked QTL. One or several biological processes were statistically significantly overrepresented in independent network structures or in highly interconnected subnetworks. Most of the transcription factors found in the inferred network had a putative regulatory link to only one other gene or exhibited cis-regulation.
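Step 3 of the strategy, narrowing the regulatory candidates in an eQTL region by expression correlation, might look like the sketch below. The abstract does not specify the correlation measure or cutoff, so Pearson correlation, the `min_abs_r` threshold, and the gene names are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def retain_candidates(target_expr, candidate_expr, min_abs_r):
    """Return candidate genes whose expression correlates with the target
    gene's expression at least as strongly as min_abs_r (in absolute value)."""
    return [
        gene
        for gene, expr in candidate_expr.items()
        if abs(pearson(expr, target_expr)) >= min_abs_r
    ]


# Expression of the gene affected by the eQTL across four segregants,
# and of three hypothetical candidate regulators located in the eQTL region.
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "geneA": [2.0, 4.0, 6.0, 8.0],    # strongly positively correlated
    "geneB": [1.0, -1.0, 1.0, -1.0],  # weakly correlated
    "geneC": [4.0, 3.0, 2.0, 1.0],    # strongly negatively correlated
}

print(retain_candidates(target, candidates, min_abs_r=0.8))
```

Using the absolute value keeps both putative activators and repressors. Each retained gene would then receive a directional link to the target in step 4.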

Source
DOI: http://dx.doi.org/10.1534/genetics.105.041103
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1450429
June 2005

A note on joint versus gene-specific mixed model analysis of microarray gene expression data.

Biostatistics 2005 Apr;6(2):183-6

Virginia Bioinformatics Institute and Department of Statistics, Virginia Tech, Blacksburg, VA 24061-0477, USA.

Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact when the mixed model equations can be partitioned into unrelated components: one for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous); and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.

Source
DOI: http://dx.doi.org/10.1093/biostatistics/kxi001
April 2005
-->