Publications by authors named "Daniel Taliun"

27 Publications

LocusZoom.js: Interactive and embeddable visualization of genetic association study results.

Bioinformatics 2021 Mar 17. Epub 2021 Mar 17.

Department of Biostatistics and the Center for Statistical Genetics, University of Michigan, Ann Arbor, MI.

LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparing results to the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets.

Availability: LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages for all versions are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btab186
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479674
March 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03205-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Loss-of-function genomic variants highlight potential therapeutic targets for cardiovascular disease.

Nat Commun 2020 12 18;11(1):6417. Epub 2020 Dec 18.

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics and Los Angeles Biomedical Research Institute, Harbor-UCLA, Torrance, CA, USA.

Pharmaceutical drugs targeting dyslipidemia and cardiovascular disease (CVD) may increase the risk of fatty liver disease and other metabolic disorders. To identify potential novel CVD drug targets without these adverse effects, we perform genome-wide analyses of participants in the HUNT Study in Norway (n = 69,479) to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function. We identify 76 (11 previously unreported) presumed causal protein-altering variants associated with one or more CVD- or liver-related blood traits. Nine of the variants are predicted to result in loss-of-function of the protein. This includes ZNF529:p.K405X, which is associated with decreased low-density-lipoprotein (LDL) cholesterol (P = 1.3 × 10) without being associated with liver enzymes or non-fasting blood glucose. Silencing of ZNF529 in human hepatoma cells results in upregulation of LDL receptor and increased LDL uptake in the cells. This suggests that inhibition of ZNF529 or its gene product should be prioritized as a novel candidate drug target for treating dyslipidemia and associated CVD.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-20086-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7749177
December 2020

Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks.

Am J Hum Genet 2020 11 28;107(5):815-836. Epub 2020 Sep 28.

Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Precision Health Data Science, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109, USA; Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA. Electronic address:

To facilitate scientific collaboration on polygenic risk score (PRS) research, we created an extensive PRS online repository for 35 common cancer traits integrating freely available genome-wide association studies (GWASs) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.
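The pruning/p value thresholding (P+T) construction mentioned above can be illustrated with a minimal Python sketch. It assumes summary statistics already reduced to one effect size and p-value per variant, plus a caller-supplied pairwise r² lookup; the `Variant` type, the `r2` callback and the default r² cutoff of 0.1 are hypothetical choices for illustration, not the Cancer PRSweb pipeline or its parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Variant:
    vid: str      # hypothetical variant identifier
    beta: float   # per-allele effect size from GWAS summary statistics
    pval: float   # association p-value


def prune_and_threshold(
    variants: List[Variant],
    r2: Callable[[str, str], float],
    p_threshold: float,
    r2_threshold: float = 0.1,  # illustrative cutoff, not a published setting
) -> List[Variant]:
    """Greedy LD pruning after p-value thresholding.

    Variants with pval < p_threshold are visited from most to least
    significant; each is kept only if its r^2 with every variant already
    kept is below r2_threshold.
    """
    candidates = sorted(
        (v for v in variants if v.pval < p_threshold), key=lambda v: v.pval
    )
    kept: List[Variant] = []
    for v in candidates:
        if all(r2(v.vid, k.vid) < r2_threshold for k in kept):
            kept.append(v)
    return kept


def polygenic_score(weights: List[Variant], dosages: Dict[str, float]) -> float:
    """Weighted allele count: sum of beta * dosage; a missing dosage counts as 0."""
    return sum(v.beta * dosages.get(v.vid, 0.0) for v in weights)


# Usage: "b" is in strong LD with the more significant "a" and is pruned;
# "d" fails the p-value threshold.
ld = {frozenset(("a", "b")): 0.8}
variants = [
    Variant("a", 0.2, 1e-9),
    Variant("b", 0.1, 1e-6),
    Variant("c", -0.3, 1e-4),
    Variant("d", 0.5, 0.2),
]
kept = prune_and_threshold(variants, lambda x, y: ld.get(frozenset((x, y)), 0.0), 1e-3)
print([v.vid for v in kept])  # ['a', 'c']
score = polygenic_score(kept, {"a": 2.0, "b": 2.0, "c": 1.0})
```

Visiting variants in order of significance means the lead variant of each LD cluster survives. Practical clumping tools usually also restrict the r² check to a physical distance window, which this sketch omits.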

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2020.08.025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7675001
November 2020

De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population.

Proc Natl Acad Sci U S A 2020 02 21;117(5):2560-2569. Epub 2020 Jan 21.

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201;

De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.

Source
DOI: http://dx.doi.org/10.1073/pnas.1902766117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007577
February 2020

Genome-wide analyses identify a role for SLC17A4 and AADAT in thyroid hormone regulation.

Nat Commun 2018 10 26;9(1):4455. Epub 2018 Oct 26.

Translational Gerontology Branch, National Institute on Aging, Baltimore, MD, USA.

Thyroid dysfunction is an important public health problem, which affects 10% of the general population and increases the risk of cardiovascular morbidity and mortality. Many aspects of thyroid hormone regulation have only partly been elucidated, including its transport, metabolism, and genetic determinants. Here we report a large meta-analysis of genome-wide association studies for thyroid function and dysfunction, testing 8 million genetic variants in up to 72,167 individuals. One-hundred-and-nine independent genetic variants are associated with these traits. A genetic risk score, calculated to assess their combined effects on clinical end points, shows significant associations with increased risk of both overt (Graves' disease) and subclinical thyroid disease, as well as clinical complications. By functional follow-up on selected signals, we identify a novel thyroid hormone transporter (SLC17A4) and a metabolizing enzyme (AADAT). Together, these results provide new knowledge about thyroid hormone physiology and disease, opening new possibilities for therapeutic targets.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-06356-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6203810
October 2018

Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps.

Nat Genet 2018 11 8;50(11):1505-1513. Epub 2018 Oct 8.

Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore.

We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0241-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287706
November 2018

emeraLD: rapid linkage disequilibrium estimation with massive datasets.

Bioinformatics 2019 01;35(1):164-166

Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.

Summary: Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies. Large genetic datasets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD up to 2 orders of magnitude faster than current tools.

Availability And Implementation: emeraLD is implemented in C++, and is open source under GPLv3. Source code and documentation are freely available at http://github.com/statgen/emeraLD.

Supplementary Information: Supplementary data are available at Bioinformatics online.
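The sparsity idea in the emeraLD summary can be illustrated for the simplest case. The Python sketch below is not emeraLD's C++ implementation: it assumes phased, biallelic haplotypes coded 0/1 and stores each variant only as the set of haplotypes carrying the alternate allele, so computing r² between two rare variants touches only their few carriers. The function names are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence


def carriers(haplotypes: Sequence[int]) -> FrozenSet[int]:
    """Sparse encoding: indices of haplotypes carrying the alternate allele (coded 1)."""
    return frozenset(i for i, allele in enumerate(haplotypes) if allele == 1)


def ld_r2(a: FrozenSet[int], b: FrozenSet[int], n_haplotypes: int) -> Optional[float]:
    """Pairwise r^2 between two biallelic variants from their carrier sets.

    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    The joint frequency p_AB is counted by scanning the smaller carrier set,
    so the cost scales with the number of alternate alleles, not n_haplotypes.
    Returns None if either variant is monomorphic (r^2 undefined).
    """
    p_a = len(a) / n_haplotypes
    p_b = len(b) / n_haplotypes
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    p_ab = sum(1 for i in small if i in large) / n_haplotypes
    d = p_ab - p_a * p_b
    return d * d / denom


v1 = carriers([1, 1, 0, 0])
v2 = carriers([0, 0, 1, 1])
v3 = carriers([1, 0, 1, 0])
print(ld_r2(v1, v2, 4))  # 1.0 (perfect negative LD still gives r^2 = 1)
print(ld_r2(v1, v3, 4))  # 0.0
```

Dense carrier sets for common variants remove most of the benefit; the summary attributes emeraLD's speed to combining sparsity with haplotype structure, which this sketch does not model.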

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty547
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6298049
January 2019

Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies.

G3 (Bethesda) 2018 10 3;8(10):3255-3267. Epub 2018 Oct 3.

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029

The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance by mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than by pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy, and demonstrates population-specific biases in pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts with single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
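The evaluation metric described above, mean imputed r² at untyped sites, can be sketched as follows. This assumes the leave-one-out imputation has already produced a dosage for every masked site and sample; the sketch only scores those dosages against true genotypes (0, 1 or 2 alternate alleles) as a squared Pearson correlation per site, averaged over sites. The function names are hypothetical, and the authors' tag SNP selection framework is not reproduced.

```python
from typing import List, Optional, Sequence


def squared_correlation(truth: Sequence[float], dosage: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation between true genotypes and imputed dosages at one site."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return None  # undefined for a monomorphic site or a constant imputation
    return cov * cov / (var_t * var_d)


def mean_imputed_r2(
    truth_by_site: List[Sequence[float]], dosage_by_site: List[Sequence[float]]
) -> float:
    """Average per-site r^2 over untyped sites, skipping sites where it is undefined."""
    values = [
        r2
        for r2 in (squared_correlation(t, d) for t, d in zip(truth_by_site, dosage_by_site))
        if r2 is not None
    ]
    return sum(values) / len(values) if values else float("nan")


truth = [[0, 1, 2], [0, 0, 2, 2], [0, 2, 0, 2]]
dosage = [[0.0, 1.0, 2.0], [0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
print(mean_imputed_r2(truth, dosage))  # 0.5: sites score 1.0 and 0.0, the third is skipped
```

Skipping undefined sites is one possible convention; a study could instead count them as zero, which would lower the mean.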

Source
DOI: http://dx.doi.org/10.1534/g3.118.200502
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169386
October 2018

Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.

Nat Genet 2018 04 9;50(4):559-571. Epub 2018 Apr 9.

Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.

We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0084-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5898373
April 2018

1000 Genomes-based meta-analysis identifies 10 novel loci for kidney function.

Sci Rep 2017 04 28;7:45040. Epub 2017 Apr 28.

Department of Nephrology, University Hospital Regensburg, Regensburg, Germany.

HapMap imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from The 1000 Genomes project, promise to identify novel loci that have been missed by previous efforts. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 European ancestry participants using 1000 Genomes imputed data. We identified 10 novel loci with p-value < 5 × 10 previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant (FDR < 0.05) genes and 127 significantly (FDR < 0.05) enriched gene sets, which were missed by our previous analyses. Among those, the 10 identified novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels, until whole-genome sequencing becomes feasible in large samples.

Source
DOI: http://dx.doi.org/10.1038/srep45040
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408227
April 2017

LASER server: ancestry tracing with genotypes or sequence reads.

Bioinformatics 2017 Jul;33(13):2056-2058

Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore.

Summary: To enable direct comparison of ancestry background in different studies, we developed LASER to estimate individual ancestry by placing either sequenced or genotyped samples in a common ancestry space, regardless of the sequencing strategy or genotyping array used to characterize each sample. Here we describe the LASER server to facilitate application of the method to a wide range of genetic studies. The server provides genetic ancestry estimation for different geographic regions and user-friendly interactive visualization of the results.

Availability And Implementation: The LASER server is freely accessible at http://laser.sph.umich.edu/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btx075
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870850
July 2017

Fast Sampling-Based Whole-Genome Haplotype Block Recognition.

IEEE/ACM Trans Comput Biol Bioinform 2016 Mar-Apr;13(2):315-25

Scaling linkage disequilibrium (LD) based haplotype block recognition to the entire human genome has always been a challenge. The best-known algorithm has quadratic runtime complexity and, even when sophisticated search space pruning is applied, still requires several days of computations. Here, we propose a novel sampling-based algorithm, called S-MIG++, where the main idea is to estimate the area that most likely contains all haplotype blocks by sampling a very small number of SNP pairs. A subsequent refinement step computes the exact blocks by considering only the SNP pairs within the estimated area. This approach significantly reduces the number of computed LD statistics, making the recognition of haplotype blocks very fast. We show theoretically and empirically that the area containing all haplotype blocks can be estimated with a very high degree of certainty. Through experiments on the 243,080 SNPs on chromosome 20 from the 1000 Genomes Project, we compared our previous algorithm MIG++ with the new S-MIG++ and observed a runtime reduction from 2.8 weeks to 34.8 hours. In a parallelized version of the S-MIG++ algorithm using 32 parallel processes, the runtime was further reduced to 5.1 hours.

Source
DOI: http://dx.doi.org/10.1109/TCBB.2015.2456897
January 2017

Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function.

Nat Commun 2016 Jan 21;7:10023. Epub 2016 Jan 21.

Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, PO Box 30001, Groningen 9700 RB, The Netherlands.

Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, 19 associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.

Source
DOI: http://dx.doi.org/10.1038/ncomms10023
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4735748
January 2016

FamAgg: an R package to evaluate familial aggregation of traits in large pedigrees.

Bioinformatics 2016 05 22;32(10):1583-5. Epub 2016 Jan 22.

Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC) (Affiliated to the University of Lübeck, Lübeck, Germany), Bolzano 39100, Italy and.

Familial aggregation analysis is the first fundamental step to perform when assessing the extent of genetic background of a disease. However, there is a lack of software to analyze the familial clustering of complex phenotypes in very large pedigrees. Such pedigrees can be utilized to calculate measures that express trait aggregation on both the family and individual level, providing valuable directions in choosing families for detailed follow-up studies. We developed FamAgg, an open source R package that contains both established and novel methods to investigate familial aggregation of traits in large pedigrees. We demonstrate its use and interpretation by analyzing a publicly available cancer dataset with more than 20 000 participants distributed across approximately 400 families.

Availability And Implementation: The FamAgg package is freely available at the Bioconductor repository, http://www.bioconductor.org/packages/FamAgg

Supplementary Information: Supplementary data are available at Bioinformatics online.
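Many familial aggregation measures are built on pairwise kinship coefficients. The Python sketch below is not FamAgg code (FamAgg is an R/Bioconductor package): it computes kinship with the standard recursive definition and then the mean pairwise kinship among a set of affected individuals, a simple statistic similar in spirit to a genealogical index of familiality. The pedigree format, a mapping from each individual to a (father, mother) pair with None for unknown parents, and all function names are assumptions of this sketch.

```python
from functools import lru_cache
from itertools import combinations
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical pedigree format: individual -> (father, mother); None = unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def make_kinship(pedigree: Pedigree) -> Callable[[Optional[str], Optional[str]], float]:
    """Return a memoized kinship function phi(a, b) for the given pedigree."""

    @lru_cache(maxsize=None)
    def depth(ind: Optional[str]) -> int:
        # Founders have depth 0; a child is one generation below its deepest parent.
        if ind is None:
            return -1
        father, mother = pedigree[ind]
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def kinship(a: Optional[str], b: Optional[str]) -> float:
        if a is None or b is None:
            return 0.0  # an unknown parent is treated as an unrelated founder
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        if depth(a) < depth(b):
            a, b = b, a  # recurse through the deeper individual, who cannot be b's ancestor
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship


def mean_kinship(kinship: Callable[[str, str], float], individuals: Iterable[str]) -> float:
    """Average kinship over all distinct pairs, e.g. all affected pedigree members."""
    pairs = list(combinations(individuals, 2))
    if not pairs:
        return float("nan")
    return sum(kinship(a, b) for a, b in pairs) / len(pairs)


ped: Pedigree = {
    "P1": (None, None), "P2": (None, None),  # unrelated founders
    "C1": ("P1", "P2"), "C2": ("P1", "P2"),  # full siblings
    "I": ("C1", "C2"),                       # child of a full-sibling mating (inbred)
}
phi = make_kinship(ped)
print(phi("C1", "C2"), phi("I", "I"))  # 0.25 0.625
print(mean_kinship(phi, ["P1", "C1", "C2"]))  # 0.25
```

The recursion relies on Python's call stack, so very deep pedigrees with thousands of generations would need an iterative version; real aggregation tests also compare such statistics against a null distribution, for example by permuting affection status.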

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btw019
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866523
May 2016

Genome-wide Association Studies Identify Genetic Loci Associated With Albuminuria in Diabetes.

Diabetes 2016 Mar 2;65(3):803-17. Epub 2015 Dec 2.

Department of Medicine, University of Maryland School of Medicine, Baltimore, MD.

Elevated concentrations of albumin in the urine, albuminuria, are a hallmark of diabetic kidney disease and are associated with an increased risk for end-stage renal disease and cardiovascular events. To gain insight into the pathophysiological mechanisms underlying albuminuria, we conducted meta-analyses of genome-wide association studies and independent replication in up to 5,825 individuals of European ancestry with diabetes and up to 46,061 without diabetes, followed by functional studies. Known associations of variants in CUBN, encoding cubilin, with the urinary albumin-to-creatinine ratio (UACR) were confirmed in the overall sample (P = 2.4 × 10(-10)). Gene-by-diabetes interactions were detected and confirmed for variants in HS6ST1 and near RAB38/CTSC. Single nucleotide polymorphisms at these loci demonstrated a genetic effect on UACR in individuals with but not without diabetes. The change in the average UACR per minor allele was 21% for HS6ST1 (P = 6.3 × 10(-7)) and 13% for RAB38/CTSC (P = 5.8 × 10(-7)). Experiments using streptozotocin-induced diabetic Rab38 knockout and control rats showed higher urinary albumin concentrations and reduced amounts of megalin and cubilin at the proximal tubule cell surface in Rab38 knockout versus control rats. Relative expression of RAB38 was higher in tubuli of patients with diabetic kidney disease compared with control subjects. The loci identified here confirm known pathways and highlight novel pathways influencing albuminuria.

Source
DOI: http://dx.doi.org/10.2337/db15-1313
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4764151
March 2016

Meta-analysis of genome-wide association studies identifies two loci associated with circulating osteoprotegerin levels.

Hum Mol Genet 2014 Dec 30;23(24):6684-93. Epub 2014 Jul 30.

Department of Medicine,

Osteoprotegerin (OPG) is involved in bone homeostasis and tumor cell survival. Circulating OPG levels are also important biomarkers of various clinical traits, such as cancers and atherosclerosis. OPG levels were measured in serum or in plasma. In a meta-analysis of genome-wide association studies in up to 10 336 individuals of European and Asian origin, we discovered that variants >100 kb upstream of the TNFRSF11B gene encoding OPG and another new locus on chromosome 17q11.2 were significantly associated with OPG variation. We also identified a suggestive locus on chromosome 14q21.2 associated with the trait. Moreover, we estimated that over half of the heritability of OPG levels could be explained by all variants examined in our study. Our findings provide further insight into the genetic regulation of circulating OPG levels.

Source
DOI: http://dx.doi.org/10.1093/hmg/ddu386
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240210
December 2014

Efficient haplotype block recognition of very long and dense genetic sequences.

BMC Bioinformatics 2014 Jan 14;15:10. Epub 2014 Jan 14.

Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), Bozen-Bolzano, Italy.

Background: New sequencing technologies make it possible to scan very long and dense genetic sequences, yielding datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in identifying haplotypes that carry rare risk alleles. However, large-scale explorations of the linkage disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity.

Results: We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, omits >80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. By replacing the standard likelihood-based D' variance estimator with an approximated estimator, the runtime was further improved. While producing a coarser partition, the approximate method made it possible to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base pairs in both HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies.

Conclusions: MIG++ enables LD-based haplotype block recognition on genetic sequences of any length and density. In the next-generation sequencing era, this can help identify haplotypes that carry rare variants of interest. The low computational requirements open the possibility of including the haplotype block structure in genome-wide association scans, downstream analyses, and visual interfaces for online genome browsers.
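For orientation, the block definition that MIG++ accelerates can be written as a brute-force sketch. It assumes each SNP pair has already been labelled from its D' confidence interval as strong LD, strong evidence of recombination, or non-informative (that estimation step is not shown), and it uses the commonly cited rule that a region is a block when its outermost SNPs are in strong LD and at least 95% of its informative pairs are in strong LD. The `classify` callback, the greedy longest-first selection and all function names are assumptions of this sketch; none of the memory or pruning optimizations described in the abstract are included.

```python
from typing import Callable, List, Tuple

STRONG_LD, RECOMB, NONINFORMATIVE = "strong_ld", "recomb", "noninformative"


def candidate_blocks(
    n_snps: int, classify: Callable[[int, int], str], min_fraction: float = 0.95
) -> List[Tuple[int, int]]:
    """Enumerate regions [i, j] meeting a Gabriel-style block criterion.

    classify(k, l) with k < l returns STRONG_LD, RECOMB or NONINFORMATIVE.
    Counts for region [i, j] are updated from [i, j - 1] by adding the
    pairs (k, j) for i <= k < j.
    """
    blocks: List[Tuple[int, int]] = []
    for i in range(n_snps):
        strong = recomb = 0
        for j in range(i + 1, n_snps):
            outer_strong = False
            for k in range(i, j):
                label = classify(k, j)
                if k == i:
                    outer_strong = label == STRONG_LD  # the outermost pair (i, j)
                if label == STRONG_LD:
                    strong += 1
                elif label == RECOMB:
                    recomb += 1
            if outer_strong and strong >= min_fraction * (strong + recomb):
                blocks.append((i, j))
    return blocks


def select_blocks(candidates: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Greedily keep the longest candidates that do not overlap already chosen blocks."""
    chosen: List[Tuple[int, int]] = []
    for i, j in sorted(candidates, key=lambda b: (b[0] - b[1], b[0])):
        if all(j < start or i > end for start, end in chosen):
            chosen.append((i, j))
    return sorted(chosen)


# Toy labels: SNPs 0-2 form one LD group and SNPs 3-4 another.
def toy_classify(k: int, l: int) -> str:
    return STRONG_LD if (k < 3) == (l < 3) else RECOMB


print(select_blocks(candidate_blocks(5, toy_classify)))  # [(0, 2), (3, 4)]
```

Because it recounts pairs for every start SNP, this naive version makes O(n³) calls to `classify`, which shows why whole-genome block recognition called for the optimizations reported for MIG++.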

Source
DOI: http://dx.doi.org/10.1186/1471-2105-15-10
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898000
January 2014

Common variants in Mendelian kidney disease genes and their association with renal function.

J Am Soc Nephrol 2013 Dec 12;24(12):2105-17. Epub 2013 Sep 12.

Division of Nephrology, University of Maryland School of Medicine, Baltimore, Maryland;

Many common genetic variants identified by genome-wide association studies for complex traits map to genes previously linked to rare inherited Mendelian disorders. A systematic analysis of common single-nucleotide polymorphisms (SNPs) in genes responsible for Mendelian diseases with kidney phenotypes has not been performed. We thus developed a comprehensive database of genes for Mendelian kidney conditions and evaluated the association between common genetic variants within these genes and kidney function in the general population. Using the Online Mendelian Inheritance in Man database, we identified 731 unique disease entries related to specific renal search terms and confirmed a kidney phenotype in 218 of these entries, corresponding to mutations in 258 genes. We interrogated common SNPs (minor allele frequency >5%) within these genes for association with the estimated GFR in 74,354 European-ancestry participants from the CKDGen Consortium. However, the top four candidate SNPs (rs6433115 at LRP2, rs1050700 at TSC1, rs249942 at PALB2, and rs9827843 at ROBO2) did not achieve significance in a stage 2 meta-analysis performed in 56,246 additional independent individuals, indicating that these common SNPs are not associated with estimated GFR. The effect of less common or rare variants in these genes on kidney function in the general population and disease-specific cohorts requires further research.
DOI: http://dx.doi.org/10.1681/ASN.2012100983
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839542
December 2013

Importance of different types of prior knowledge in selecting genome-wide findings for follow-up.

Genet Epidemiol 2013 Feb;37(2):205-13

Center for Biomedicine, European Academy Bozen/Bolzano (EURAC), Bolzano, Italy.

Biological plausibility and other prior information could help select genome-wide association (GWA) findings for further follow-up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts' opinions and empirical evidence to estimate the relative importance of 15 types of information at the single-nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two-round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta-analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta-analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research.
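The empirical comparison described above contrasts how often a characteristic occurs among established SNPs with how often it occurs in a random SNP set. A minimal sketch of that ratio follows, assuming each SNP is represented by a set of annotation labels; the labels and function name are hypothetical, and the paper may use a more elaborate estimator than a plain ratio of frequencies.

```python
import math
from typing import Iterable, Set


def relative_frequency(established: Iterable[Set[str]],
                       random_snps: Iterable[Set[str]],
                       feature: str) -> float:
    """Frequency of a characteristic among established SNPs divided by its
    frequency among randomly selected SNPs."""
    est = list(established)
    rnd = list(random_snps)
    if not est or not rnd:
        raise ValueError("both SNP sets must be non-empty")
    f_est = sum(feature in labels for labels in est) / len(est)
    f_rnd = sum(feature in labels for labels in rnd) / len(rnd)
    if f_rnd == 0:
        # Ratio is unbounded (or undefined if the feature is absent everywhere).
        return math.inf if f_est > 0 else math.nan
    return f_est / f_rnd
```

A ratio above 1 means the characteristic is over-represented among established associations; for instance, a label carried by half of the established SNPs but only a quarter of the random SNPs gives a ratio of 2.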
DOI: http://dx.doi.org/10.1002/gepi.21705
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3725558
February 2013

SNP prioritization using a Bayesian probability of association.

Genet Epidemiol 2013 Feb 26;37(2):214-21. Epub 2012 Dec 26.

Department of Health Sciences, University of Leicester, Leicester, United Kingdom.

Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome-wide association study, prioritization is usually based on the P-values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome-wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers' subjective prior opinions. The calculation is fast enough that it can be applied to millions of SNPs, and although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P-value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta-analysis of kidney function genome-wide association studies and demonstrate that SNP selection performs better using the probability of association compared with P-values alone.
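One standard way to turn a GWAS effect estimate into a probability of association is Wakefield's approximate Bayes factor combined with a prior probability. The sketch below assumes that formulation, which may differ from the paper's exact calculation: `prior_prob` stands in for whatever prior the annotation data, empirical weights, and subjective opinions jointly produce, and `prior_sd` (the assumed spread of true effects under association) is a hypothetical user choice.

```python
import math


def log_bf_association(beta: float, se: float, prior_sd: float) -> float:
    """Log approximate Bayes factor for association (H1) versus none (H0).

    Assumes beta_hat ~ N(beta, se^2); under H0 beta = 0, under H1
    beta ~ N(0, prior_sd^2) (Wakefield-style approximation).
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + w)) + z2 * w / (2.0 * (v + w))


def prob_association(beta: float, se: float, prior_sd: float, prior_prob: float) -> float:
    """Posterior probability that the SNP is associated, given a prior probability."""
    log_bf = log_bf_association(beta, se, prior_sd)
    prior_odds = prior_prob / (1.0 - prior_prob)
    # Posterior odds = prior odds * Bayes factor, handled in log space to avoid overflow.
    log_post_odds = math.log(prior_odds) + log_bf
    return 1.0 / (1.0 + math.exp(-log_post_odds))
```

Under this sketch, two SNPs with the same z-statistic (and therefore the same P-value) can receive different probabilities when their standard errors differ, which is one way to see how such a probability reflects study power while a P-value does not. Raising `prior_prob` for a SNP with favorable annotation makes the effect of external information explicit and reproducible.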
DOI: http://dx.doi.org/10.1002/gepi.21704
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3725584
February 2013

Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function.

Hum Mol Genet 2012 Dec 8;21(24):5329-43. Epub 2012 Sep 8.

Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA.

In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10⁻⁹) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication, with overall P-values ranging from 4.5 × 10⁻⁴ to 2.2 × 10⁻⁷. Neither the novel PLEKHA1 nor the FBXL20 association, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
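The abstract requires "association thresholds consistent with multiple testing" for replication without naming the correction. The sketch below assumes a Bonferroni correction over the candidates taken forward, alongside the conventional genome-wide threshold of 5 × 10⁻⁸; the SNP identifiers in the usage note are hypothetical.

```python
from typing import Dict, List

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def bonferroni_replicated(replication_p: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return candidate SNPs whose replication P-value is below alpha divided
    by the number of candidates taken forward (Bonferroni correction)."""
    if not replication_p:
        return []
    threshold = alpha / len(replication_p)
    return sorted(snp for snp, p in replication_p.items() if p < threshold)
```

With three candidates the per-SNP replication threshold is 0.05 / 3 ≈ 0.0167, so `bonferroni_replicated({"snp_a": 0.001, "snp_b": 0.02, "snp_c": 0.5})` keeps only `snp_a`. The FBXL20 result (P = 5.6 × 10⁻⁹) falls below the conventional genome-wide threshold.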
DOI: http://dx.doi.org/10.1093/hmg/dds369
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3607468
December 2012

Genome-wide association and functional follow-up reveals new loci for kidney function.

PLoS Genet 2012 Mar 29;8(3):e1002584. Epub 2012 Mar 29.

Institute of Genetic Medicine, European Academy of Bozen/Bolzano (EURAC) and Affiliated Institute of the University of Lübeck, Bolzano, Italy.

Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD.
DOI: http://dx.doi.org/10.1371/journal.pgen.1002584
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315455
September 2012

GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data.

Bioinformatics 2012 Feb 8;28(3):444-5. Epub 2011 Dec 8.

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Summary: The GWAtoolbox is an R package that standardizes and accelerates the handling of data from genome-wide association studies (GWAS), particularly in the context of large-scale GWAS meta-analyses. A key feature of GWAtoolbox is its ability to perform quality control (QC) of any number of files in a matter of minutes. The implemented workflow has been structured to check three particular data quality aspects: (i) data formatting, (ii) quality of the GWAS results and (iii) data consistency across studies. Output consists of an extensive list of quality statistics and plots which allow inspection of individual files and between-study comparison to identify systematic bias.

Availability: http://www.eurac.edu/GWAtoolbox

Supplementary Information: Supplementary data are available at Bioinformatics online.
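The three QC aspects listed in the summary can be illustrated with simple checks. GWAtoolbox itself is an R package, and this Python sketch does not reproduce its API or outputs: the required column names and the inflation limit of 1.1 are hypothetical, and using the genomic control λ as the result-quality statistic is an assumption about what such a workflow might report.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical column names; real meta-analysis input formats vary by consortium.
REQUIRED_COLUMNS = ("MARKER", "EFFECT", "STDERR", "PVALUE")
CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def missing_columns(header: Sequence[str]) -> List[str]:
    """(i) Formatting: required columns absent from a file header (case-insensitive)."""
    present = {name.strip().upper() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


def genomic_inflation(betas: Sequence[float], ses: Sequence[float]) -> float:
    """(ii) Result quality: genomic control lambda from Wald chi-square statistics."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def inflated_studies(lambdas: Dict[str, float], limit: float = 1.1) -> List[str]:
    """(iii) Cross-study comparison: studies whose lambda exceeds a chosen limit."""
    return sorted(study for study, lam in lambdas.items() if lam > limit)
```

A λ near 1 suggests test statistics consistent with the null for most markers; a study whose λ stands out from the others is a candidate for systematic bias such as population stratification or an analysis error.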
DOI: http://dx.doi.org/10.1093/bioinformatics/btr679
February 2012

Linkage and association analysis of hyperthyrotropinaemia in an Alpine population reveal two novel loci on chromosomes 3q28-29 and 6q26-27.

J Med Genet 2011 Aug 20;48(8):549-56. Epub 2011 Jun 20.

Institute of Genetic Medicine, European Academy of Bolzano/Bozen, Viale Druso/Drususallee 1, Bolzano/Bozen, Italy.

Background: Thyroid hormones have important roles in growth, development and control of metabolism, and their dysregulation can lead to disease.

Objectives: To identify genes contributing to hyperthyrotropinaemia.

Design, Setting, Participants: Linkage and association analyses using 1258 individuals from three Alpine villages.

Outcome Measures: The study applied two different upper limits of the reference range (URR) for serum thyroid stimulating hormone (TSH) values (TSH ≥4.6 mU/l and TSH >3.0 mU/l), along with normal or low fT4 (free thyroxine) values or thyroid medical treatment to define two groups of individuals for analysis: one hyperthyrotropinaemic or high-TSH (H-TSH) (TSH ≥4.6 mU/l) group; and a larger group (TSH >3.0 mU/l) called hyperthyrotropinaemic and upper reference range TSH (H+URR-TSH).

Results: Non-parametric genome-wide linkage analysis was performed on pedigrees generated from the two groups. Linkage analysis in the H+URR-TSH group revealed a significant peak on chromosome 3q28-q29 (LOD 3.34) and a suggestive linkage peak on chromosome 6q26-27 (LOD 2.66). Analysis in the smaller hyperthyrotropinaemic (H-TSH) group supported linkage to chromosome 6q26-27. Single SNP and gene based SNP association analyses under the linkage peaks identified the PDE10A and DACT2 genes as candidates at the chromosome 6 locus.

Conclusions: PDE10A or DACT2 were identified as candidate genes contributing to hyperthyrotropinaemia (and possibly hypothyroidism) in this sample. Studies in additional populations support association of variants at this locus with TSH values, especially in the PDE10A gene. Genetic linkage in families with hyperthyrotropinaemia suggests the presence of functional variants that contribute to pathological disruption of the hypothalamus-pituitary-thyroid axis.
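The Outcome Measures paragraph defines two nested analysis groups from TSH thresholds, fT4 status, and treatment. The abstract does not spell out exactly how treatment combines with the fT4 condition, so the rule below is an assumed reading: treated subjects qualify for both groups regardless of measured TSH, and everyone in H-TSH also belongs to the larger H+URR-TSH group. The `Subject` type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

H_TSH_MIN = 4.6    # mU/l; H-TSH uses TSH >= 4.6
URR_TSH_MIN = 3.0  # mU/l; H+URR-TSH uses TSH > 3.0


@dataclass(frozen=True)
class Subject:
    tsh: float                  # serum TSH in mU/l
    ft4_normal_or_low: bool     # fT4 within or below the reference range
    on_thyroid_treatment: bool  # receiving thyroid medication


def analysis_groups(s: Subject) -> FrozenSet[str]:
    """Assign a subject to the H-TSH and/or H+URR-TSH groups (assumed reading)."""
    # Assumption: treated subjects qualify regardless of their measured TSH.
    if s.on_thyroid_treatment or (s.ft4_normal_or_low and s.tsh >= H_TSH_MIN):
        return frozenset({"H-TSH", "H+URR-TSH"})  # H-TSH is nested in the larger group
    if s.ft4_normal_or_low and s.tsh > URR_TSH_MIN:
        return frozenset({"H+URR-TSH"})
    return frozenset()
```

Note the boundary conventions taken from the abstract: a TSH of exactly 4.6 mU/l counts toward H-TSH (≥), whereas a TSH of exactly 3.0 mU/l does not qualify for H+URR-TSH (>).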
DOI: http://dx.doi.org/10.1136/jmg.2010.088583
August 2011
-->