Publications by authors named "Piotr Dworzyński"

11 Publications

Nationwide prediction of type 2 diabetes comorbidities.

Sci Rep 2020 Feb 4;10(1):1776. Epub 2020 Feb 4.

The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Identification of individuals at risk of developing disease comorbidities represents an important task in tackling the growing personal and societal burdens associated with chronic diseases. We employed machine learning techniques to investigate to what extent data from longitudinal, nationwide Danish health registers can be used to predict individuals at high risk of developing type 2 diabetes (T2D) comorbidities. Leveraging logistic regression, random forest and gradient boosting models and register data spanning hospitalizations, drug prescriptions and contacts with primary care contractors from >200,000 individuals newly diagnosed with T2D, we predicted five-year risk of heart failure (HF), myocardial infarction (MI), stroke (ST), cardiovascular disease (CVD) and chronic kidney disease (CKD). For HF, MI, CVD, and CKD, register-based models outperformed a reference model leveraging canonical individual characteristics by achieving area under the receiver operating characteristic curve improvements of 0.06, 0.03, 0.04, and 0.07, respectively. The top 1,000 patients predicted to be at highest risk exhibited observed incidence ratios exceeding 4.99, 3.52, 1.97, and 4.71, respectively. In summary, prediction of T2D comorbidities utilizing Danish registers led to consistent albeit modest performance improvements over reference models, suggesting that register data could be leveraged to systematically identify individuals at risk of developing disease comorbidities.

DOI: http://dx.doi.org/10.1038/s41598-020-58601-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7000818
February 2020

A catalog of the mouse gut metagenome.

Nat Biotechnol 2015 Oct;33(10):1103-8. Epub 2015 Sep 28.

Institut National de la Recherche Agronomique (Microbiologie de l'Alimentation au Service de la Santé), Jouy en Josas, France.

We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.

DOI: http://dx.doi.org/10.1038/nbt.3353
October 2015

Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios.

Nat Commun 2015 Jan 19;6:5969. Epub 2015 Jan 19.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 208, DK-2800 Kgs Lyngby, Denmark.

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27 × 10⁻⁸ and 1.5 × 10⁻⁹ per nucleotide per generation for SNVs and indels, respectively.

DOI: http://dx.doi.org/10.1038/ncomms6969
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309431
January 2015

Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes.

Nat Biotechnol 2014 Aug;32(8):822-8. Epub 2014 Jul 6.

[1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark.

Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.

DOI: http://dx.doi.org/10.1038/nbt.2939
August 2014

An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

Genome Biol 2014 Mar 25;15(3):R53. Epub 2014 Mar 25.

Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.

Results: A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity in the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.

Conclusions: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.

DOI: http://dx.doi.org/10.1186/gb-2014-15-3-r53
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4073084
March 2014

MetaRanker 2.0: a web server for prioritization of genetic variation data.

Nucleic Acids Res 2013 Jul;41(Web Server issue):W104-8. Epub 2013 May 22.

Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.

MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein-protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0.

DOI: http://dx.doi.org/10.1093/nar/gkt387
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692047
July 2013

Mutations in FGF17, IL17RD, DUSP6, SPRY4, and FLRT3 are identified in individuals with congenital hypogonadotropic hypogonadism.

Am J Hum Genet 2013 May;92(5):725-43

Faculty of Biology and Medicine, University of Lausanne in collaboration with Service of Endocrinology, Diabetology, and Metabolism, Centre Hospitalier Universitaire Vaudois, Rue du Bugnon 7, Lausanne CH-1005, Switzerland.

Congenital hypogonadotropic hypogonadism (CHH) and its anosmia-associated form (Kallmann syndrome [KS]) are genetically heterogeneous. Among the >15 genes implicated in these conditions, mutations in FGF8 and FGFR1 account for ~12% of cases; notably, KAL1 and HS6ST1 are also involved in FGFR1 signaling and can be mutated in CHH. We therefore hypothesized that mutations in genes encoding a broader range of modulators of the FGFR1 pathway might contribute to the genetics of CHH as causal or modifier mutations. Thus, we aimed to (1) investigate whether CHH individuals harbor mutations in members of the so-called "FGF8 synexpression" group and (2) validate the ability of a bioinformatics algorithm on the basis of protein-protein interactome data (interactome-based affiliation scoring [IBAS]) to identify high-quality candidate genes. On the basis of sequence homology, expression, and structural and functional data, seven genes were selected and sequenced in 386 unrelated CHH individuals and 155 controls. Except for FGF18 and SPRY2, all other genes were found to be mutated in CHH individuals: FGF17 (n = 3 individuals), IL17RD (n = 8), DUSP6 (n = 5), SPRY4 (n = 14), and FLRT3 (n = 3). Independently, IBAS predicted FGF17 and IL17RD as the two top candidates in the entire proteome on the basis of a statistical test of their protein-protein interaction patterns to proteins known to be altered in CHH. Most of the FGF17 and IL17RD mutations altered protein function in vitro. IL17RD mutations were found only in KS individuals and were strongly linked to hearing loss (6/8 individuals). Mutations in genes encoding components of the FGF pathway are associated with complex modes of CHH inheritance and act primarily as contributors to an oligogenic genetic architecture underlying CHH.

DOI: http://dx.doi.org/10.1016/j.ajhg.2013.04.008
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3644636
May 2013

Crystal structures of putative phosphoglycerate kinases from B. anthracis and C. jejuni.

J Struct Funct Genomics 2012 Mar;13(1):15-26. Epub 2012 Mar 10.

Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA.

Phosphoglycerate kinase (PGK) is indispensable during glycolysis for anaerobic glucose degradation and energy generation. Here we present a comprehensive structural analysis of two putative PGKs from Bacillus anthracis str. Sterne and Campylobacter jejuni in the context of their structural homologs. They are the first PGKs from pathogenic bacteria reported in the Protein Data Bank. The crystal structure of PGK from Bacillus anthracis str. Sterne (BaPGK) has been determined at 1.68 Å resolution, while the structure of PGK from Campylobacter jejuni (CjPGK) has been determined at 2.14 Å resolution. The proteins' monomers are composed of two domains, each containing a Rossmann fold, hinged together by a helix which can be used to adjust the relative position of the two domains. It is also shown that the apo-forms of both BaPGK and CjPGK adopt open conformations compared with the substrate- and ATP-bound forms of PGK from other species.

DOI: http://dx.doi.org/10.1007/s10969-012-9131-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4485498
March 2012

Protein interaction-based genome-wide analysis of incident coronary heart disease.

Circ Cardiovasc Genet 2011 Oct;4(5):549-56. Epub 2011 Aug 31.

Department of Nutrition, Harvard School of Public Health, Boston, MA 02115, USA.

Background: Network-based approaches may leverage genome-wide association (GWA) analysis by testing for the aggregate association across several pathway members. We aimed to examine if networks of genes that represent experimentally determined protein-protein interactions (PPIs) are enriched in genes associated with risk of coronary heart disease (CHD).

Methods and Results: Genome-wide association analyses of approximately 700,000 single-nucleotide polymorphisms in 899 incident CHD cases and 1823 age- and sex-matched controls within the Nurses' Health and the Health Professionals Follow-up Studies were used to assign genewise P values. A large database of PPIs was used to assemble 8351 unbiased protein complexes and corresponding gene sets. Superimposed genewise P values were used to rank gene sets based on their enrichment in genes associated with CHD. After correcting for the number of complexes tested, 1 gene set was overrepresented in CHD-associated genes (P=0.002). Centered on the β1-adrenergic receptor gene (ADRB1), this complex included 18 protein interaction partners that have not been identified as candidate loci for CHD. Of the 19 genes in the top complex, 5 are involved in abnormal cardiovascular system physiological features based on knockout mice (4-fold enrichment; Fisher exact test, P=0.006). Ingenuity pathway analysis revealed that canonical pathways, especially related to blood pressure regulation, were significantly enriched in the genes from the top complex.

Conclusions: The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis.

DOI: http://dx.doi.org/10.1161/CIRCGENETICS.111.960393
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3197770
October 2011

Meta-analysis of heterogeneous data sources for genome-scale identification of risk genes in complex phenotypes.

Genet Epidemiol 2011 Jul;35(5):318-32. Epub 2011 Apr 11.

Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.

Meta-analyses of large-scale association studies typically proceed solely within one data type and do not exploit the potential complementarities in other sources of molecular evidence. Here, we present an approach to combine heterogeneous data from genome-wide association (GWA) studies, protein-protein interaction screens, disease similarity, linkage studies, and gene expression experiments into a multi-layered evidence network, which is used to prioritize the entire protein-coding part of the genome and identify a shortlist of candidate genes. We report results specifically on bipolar disorder, a genetically complex disease where GWA studies have been only moderately successful. We validate one such candidate, YWHAH, experimentally by genotyping five variants in 640 patients and 1,377 controls. We found a significant allelic association for the rs1049583 polymorphism in YWHAH (adjusted P = 5.6e-3) with an odds ratio of 1.28 [1.12-1.48], which replicates a previous case-control study. In addition, we demonstrate our approach's general applicability by use of type 2 diabetes data sets. The method presented augments moderately powered GWA data and represents a validated, flexible, and publicly available framework for identifying risk genes in highly polygenic diseases. The method is made available as a web service at www.cbs.dtu.dk/services/metaranker.

DOI: http://dx.doi.org/10.1002/gepi.20580
July 2011