Publications by authors named "Jose M G Izarzugaza"

37 Publications

Semen quality and waiting time to pregnancy explored using association mining.

Andrology 2021 03 14;9(2):577-587. Epub 2020 Nov 14.

University Department of Growth and Reproduction, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark.

Background: Assessment of semen quality is a key pillar in the evaluation of men from infertile couples. Usually, semen parameters are interpreted individually because the interactions between parameters are difficult to account for.

Objectives: To determine how combinations of classical semen parameters and female partner age were associated with waiting time to pregnancy (TTP).

Materials And Methods: Semen results of 500 fertile men, information of TTP, and partner age were used for regressions and to detect breaking points. For a modified Association Rule Mining algorithm, semen parameters were categorized as High, Medium, and Low.

Results: Men ≤32.1 years and women ≤32.9 years had shorter TTP than older men and women. Decreasing TTP was associated with increasing level of individual semen parameters up to threshold values: sperm concentration 46 mill/mL, total sperm count 179 mill, progressive motility 63%, and normal morphology 11.5%. Using association mining, approximately 100 combinations of semen parameters and partner age were associated with TTP. TTP ≤ 1 month often co-occurred with high percentages of progressive motility (≥62%) and morphologically normal spermatozoa (≥10.5%). Furthermore, TTP ≤ 1 month did not tend to appear with lower percentages of these two semen parameters or high partner age (≥32 years). However, high percentages of motile or normal spermatozoa could not compensate for sperm concentration ≤42 mill/mL or total sperm count ≤158 mill. The prolonging effect of high partner age could not be compensated for by the man's semen quality.

Discussion And Conclusion: Using association mining, we observed that TTP was best predicted when combinations of semen parameters were accounted for. Sperm counts, motility, and morphology were all important, and no single semen parameter was inferior. Additionally, female age above 32 years had a negative impact on TTP that could not be compensated for by high semen parameters of the man.
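As an illustration of the association mining idea described in this abstract, the sketch below computes the standard support, confidence, and lift of a single rule over categorized records. It is a minimal sketch only: the abstract mentions a modified Association Rule Mining algorithm whose details are not given here, so the plain textbook metrics are used instead, and the function name `rule_metrics`, the item labels, and the four toy records are hypothetical, not data from the study.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    records: list of frozensets of categorical items (one set per couple).
    antecedent, consequent: sets of items.
    """
    n = len(records)
    n_a = sum(antecedent <= r for r in records)                 # records containing the antecedent
    n_c = sum(consequent <= r for r in records)                 # records containing the consequent
    n_ac = sum((antecedent | consequent) <= r for r in records)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy records; categories mimic the High/Medium/Low binning.
records = [
    frozenset({"motility=High", "morphology=High", "TTP<=1"}),
    frozenset({"motility=High", "morphology=High", "TTP<=1"}),
    frozenset({"motility=High", "morphology=Low", "TTP>1"}),
    frozenset({"motility=Low", "morphology=Low", "TTP>1"}),
]

metrics = rule_metrics(
    records,
    antecedent={"motility=High", "morphology=High"},
    consequent={"TTP<=1"},
)
print(metrics)
```

A lift above 1 means the consequent appears more often together with the antecedent than would be expected if the two were independent, which is the sense in which a combination of parameters "co-occurs" with a short TTP.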

Source
DOI: http://dx.doi.org/10.1111/andr.12924
March 2021

Systems genetics analysis identifies calcium-signaling defects as novel cause of congenital heart disease.

Genome Med 2020 08 28;12(1):76. Epub 2020 Aug 28.

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3A, DK-2200, Copenhagen, Denmark.

Background: Congenital heart disease (CHD) occurs in almost 1% of newborn children and is considered a multifactorial disorder. CHD may segregate in families due to significant contribution of genetic factors in the disease etiology. The aim of the study was to identify pathophysiological mechanisms in families segregating CHD.

Methods: We used whole exome sequencing to identify rare genetic variants in ninety consenting participants from 32 Danish families with recurrent CHD. We applied a systems biology approach to identify developmental mechanisms influenced by accumulation of rare variants. We used an independent cohort of 714 CHD cases and 4922 controls for replication and performed functional investigations using zebrafish as in vivo model.

Results: We identified 1785 genes, in which rare alleles were shared between affected individuals within a family. These genes were enriched for known cardiac developmental genes, and 218 of these genes were mutated in more than one family. Our analysis revealed a functional cluster, enriched for proteins with a known participation in calcium signaling. Replication in an independent cohort confirmed increased mutation burden of calcium-signaling genes in CHD patients. Functional investigation of zebrafish orthologues of ITPR1, PLCB2, and ADCY2 verified a role in cardiac development and suggests a combinatorial effect of inactivation of these genes.

Conclusions: The study identifies abnormal calcium signaling as a novel pathophysiological mechanism in human CHD and confirms the complex genetic architecture underlying CHD.

Source
DOI: http://dx.doi.org/10.1186/s13073-020-00772-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7453558
August 2020

In the rat pancreas, somatostatin tonically inhibits glucagon secretion and is required for glucose-induced inhibition of glucagon secretion.

Acta Physiol (Oxf) 2020 07 25;229(3):e13464. Epub 2020 Mar 25.

Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Aim: It is debated whether the inhibition of glucagon secretion by glucose results from direct effects of glucose on the α-cell (intrinsic regulation) or by paracrine effects exerted by beta- or delta-cell products.

Methods: To study this in a more physiological model than isolated islets, we perfused isolated rat pancreases and measured glucagon, insulin and somatostatin secretion in response to graded increases in perfusate glucose concentration (from 3.5 to 4, 5, 6, 7, 8, 10, 12 mmol/L) as well as glucagon responses to blockage/activation of insulin/GABA/somatostatin signalling with or without addition of glucose.

Results: Glucagon secretion was reduced by about 50% (compared to baseline secretion at 3.5 mmol/L) within minutes after increasing glucose from 4 to 5 mmol/L (P < .01, n = 13). Insulin secretion was increased minimally, but significantly, compared to baseline (3.5 mmol/L) at 4 mmol/L, whereas somatostatin secretion was not significantly increased from baseline until 7 mmol/L. Thereafter, secretion of both increased gradually up to 12 mmol/L glucose. Neither recombinant insulin (1 µmol/L), GABA (300 µmol/L) nor the insulin-receptor antagonist S961 (at 1 µmol/L) affected basal (3.5 mmol/L) or glucose-induced (5.0 mmol/L) attenuation of glucagon secretion (n = 7-8). Somatostatin-14 attenuated glucagon secretion by ~95%, and blockage of somatostatin-receptor (SSTR)-2 or combined blockage of SSTR-2, -3 and -5 by specific antagonists increased glucagon output (at 3.5 mmol/L glucose) and prevented glucose-induced (from 3.5 to 5.0 mmol/L) suppression of secretion.

Conclusion: Somatostatin is a powerful and tonic inhibitor of glucagon secretion from the rat pancreas and is required for glucose to inhibit glucagon secretion.

Source
DOI: http://dx.doi.org/10.1111/apha.13464
July 2020

Pathway and network analysis of more than 2500 whole cancer genomes.

Nat Commun 2020 02 5;11(1):729. Epub 2020 Feb 5.

Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.

The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project. This work was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, were altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-14367-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7002574
February 2020

Von Frey testing revisited: Provision of an online algorithm for improved accuracy of 50% thresholds.

Eur J Pain 2020 04 23;24(4):783-790. Epub 2020 Jan 23.

Danish Headache Center, Glostrup Research Institute, Rigshospitalet Glostrup, Denmark.

Background: In the pain field, it is essential to quantify nociceptive responses. The response to the application of von Frey filaments to the skin measures tactile sensitivity and is a surrogate marker of allodynia in states of peripheral and/or central sensitization. The method is widely used across species within the pain field. However, uncertainties appear to exist regarding the appropriate method for analysing obtained data. Therefore, there is a need for refinement of the calculations for transformation of raw data to quantifiable data.

Methods: Here, we briefly review the fundamentals behind von Frey testing using the standard up-down method and the associated statistics and show how different parameters of the statistical equation influence the calculated 50% threshold results. We discuss how to obtain the most accurate estimations in a given experimental setting.

Results: To enhance accuracy and reproducibility across laboratories, we present an easy to use algorithm that calculates 50% thresholds based on the exact filaments and their interval using math beyond the traditional methods. This tool is available to the everyday user of von Frey filaments and allows the insertion of all imaginable ranges of filaments and is thus applicable to data derived in any species.

Conclusion: We advocate for the use of this algorithm to minimize inaccuracies and to improve internal and external reproducibility.

Significance: The von Frey testing procedure is standard for assessing peripheral and central sensitization but is associated with inaccuracies and lack of transparency in the associated math. Here, we describe these problems and present a novel statistical algorithm that calculates the exact thresholds using math beyond the traditional methods. The online platform is transparent, free of charge and easy to use also for the everyday user of von Frey filaments. Application of this resource will ultimately reduce errors due to methodological misinterpretations and increase reproducibility across laboratories.
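For orientation, the traditional up-down calculation that this abstract refers to can be sketched as follows. This is an assumption-laden sketch, not the authors' online algorithm (which the abstract says goes beyond the traditional method): it assumes the commonly cited form 50% threshold (g) = 10^(Xf + k·δ) / 10,000, where Xf is the log-unit value of the final filament, k is a tabulated value for the response pattern (supplied by the caller here, since the lookup table is not reproduced), and δ is the mean log-unit spacing between filaments. The function names and the example filament values are hypothetical.

```python
import math


def mean_log_interval(log_values):
    """Mean spacing (in log units) between consecutive filaments in the set used."""
    ordered = sorted(log_values)
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(steps) / len(steps)


def fifty_percent_threshold(final_log_value, k, delta):
    """Traditional up-down estimate of the 50% threshold in grams.

    final_log_value: log-unit value of the last filament applied (Xf).
    k: tabulated constant for the observed response pattern (caller-supplied).
    delta: mean log-unit interval between the filaments actually used.
    """
    return 10 ** (final_log_value + k * delta) / 10_000


# Hypothetical filament set, expressed in log units.
filaments = [3.61, 3.84, 4.08, 4.31]
delta = mean_log_interval(filaments)
print(fifty_percent_threshold(4.08, k=0.5, delta=delta))
```

Computing `delta` from the filaments actually used, rather than hard-coding a single constant, reflects the abstract's point that the result depends on the exact filaments and their interval.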

Source
DOI: http://dx.doi.org/10.1002/ejp.1528
April 2020

Identification of hyper-rewired genomic stress non-oncogene addiction genes across 15 cancer types.

NPJ Syst Biol Appl 2019 7;5:27. Epub 2019 Aug 7.

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DK-2200 Copenhagen, Denmark.

Non-oncogene addiction (NOA) genes are essential for supporting the stress-burdened phenotype of tumours and thus vital for their survival. Although NOA genes are acknowledged to be potential drug targets, there has been no large-scale attempt to identify and characterise them as a group across cancer types. Here we provide the first method for the identification of conditional NOA genes and their rewired neighbours using a systems approach. Using copy number data and expression profiles from The Cancer Genome Atlas (TCGA) we performed comparative analyses between high and low genomic stress tumours for 15 cancer types. We identified 101 condition-specific differential coexpression modules, mapped to a high-confidence human interactome, comprising 133 candidate NOA rewiring hub genes. We observe that most modules lose coexpression in the high-stress state and that activated stress modules and hubs take part in homoeostasis maintenance processes such as chromosome segregation, oxidoreductase activity, mitotic checkpoint (PLK1 signalling), DNA replication initiation and synaptic signalling. We furthermore show that candidate NOA rewiring hubs are unique for each cancer type, but that their respective rewired neighbour genes are largely shared across cancer types.

Source
DOI: http://dx.doi.org/10.1038/s41540-019-0104-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6685999
April 2020

High-Throughput Sequencing-Based Investigation of Viruses in Human Cancers by Multienrichment Approach.

J Infect Dis 2019 09;220(8):1312-1324

Department of Surgery, Herlev and Gentofte Hospital, University of Copenhagen, Denmark.

Background: Viruses and other infectious agents cause more than 15% of human cancer cases. High-throughput sequencing-based studies of virus-cancer associations have mainly focused on cancer transcriptome data.

Methods: In this study, we applied a diverse selection of presequencing enrichment methods targeting all major viral groups, to characterize the viruses present in 197 samples from 18 sample types of cancerous origin. Using high-throughput sequencing, we generated 710 datasets constituting 57 billion sequencing reads.

Results: Detailed in silico investigation of the viral content, including exclusion of viral artefacts, from de novo assembled contigs and individual sequencing reads yielded a map of the viruses detected. Our data reveal a virome dominated by papillomaviruses, anelloviruses, herpesviruses, and parvoviruses. More than half of the included samples contained 1 or more viruses; however, no link between specific viruses and cancer types was found.

Conclusions: Our study sheds light on viral presence in cancers and provides highly relevant virome data for future reference.

Source
DOI: http://dx.doi.org/10.1093/infdis/jiz318
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6743825
September 2019

Conflicting associations between dietary patterns and changes of anthropometric traits across subgroups of middle-aged women and men.

Clin Nutr 2020 01 14;39(1):265-275. Epub 2019 Feb 14.

Center for Biological Sequence Analysis, Technical University of Denmark, 2800 Lyngby, Denmark; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark.

Background: Individuals respond differently to dietary intake leading to different associations between diet and traits. Most studies have investigated large cohorts without subgrouping them.

Objective: The purpose was to identify non-uniform associations between diets and anthropometric traits that appeared to be in conflict with one another across subgroups.

Design: We used a cohort comprising 43,790 women and men, the Danish Diet, Cancer and Health study, which includes a baseline examination at age 50-64 years and a follow-up about 5 years later. The baseline examination involved anthropometrics, body fat percentage, a food frequency questionnaire and information on lifestyle. From the questionnaire data we computed association rules between the intake of food groups and changes in waist circumference and body weight. Using association rule mining on subgroups and gender-specific cohorts, we identified non-uniform associations. The two gender-specific cohorts were stratified into subgroups using a non-linear, self-organizing map based method.

Results: We found 22 and 7 cases of conflicting rules in 8 participant subgroups for different anthropometric traits in women and men, respectively. For example, in a subgroup of women moderate waist loss was associated with a dietary pattern characterized by low intake in both cabbages and wine, in conflict with the association trends of both dietary factors in the female cohort. The finding of more conflicting rules in women suggests that inter-individual differences in response to dietary intake are stronger in women than in men.

Conclusions: This combined stratification and association discovery approach revealed epidemiological relationships between dietary factors and changes in anthropometric traits in subgroups that take food group interactions into account. Conflicting rules add a layer of complexity that should be integrated into the study of these relationships, for example in relation to genotypes.

Source
DOI: http://dx.doi.org/10.1016/j.clnu.2019.02.003
January 2020

A generic deep convolutional neural network framework for prediction of receptor-ligand interactions-NetPhosPan: application to kinase phosphorylation prediction.

Bioinformatics 2019 04;35(7):1098-1107

Laboratorio de Bioinformatica, Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, San Martin, B, HMP Buenos Aires, Argentina.

Motivation: Understanding the specificity of protein receptor-ligand interactions is pivotal for our comprehension of biological mechanisms and systems. Receptor protein families often have a certain level of sequence diversity that converges into fewer conserved protein structures, allowing the exertion of well-defined functions. T and B cell receptors of the immune system and protein kinases that control the dynamic behaviour and decision processes in eukaryotic cells by catalysing phosphorylation represent prime examples. Driven by the large sequence diversity, the receptors within such protein families are often found to share specificities although divergent at the sequence level. This observation has led to the notion that prediction models of such systems are most effectively handled in a receptor-specific manner.

Results: We show that this approach in many cases is suboptimal, and describe an alternative improved framework for generating models with pan-receptor-predictive power for receptor protein families. The framework is based on deep artificial neural networks and integrates information from individual receptors into a single pan-receptor model, leveraging information across multiple receptor-specific datasets and allowing predictions of the receptor specificity for all members of a given protein family, including those described by limited or no ligand data. The approach was applied to the protein kinase superfamily, leading to the method NetPhosPan. The method was extensively validated and benchmarked against state-of-the-art prediction methods and was found to have unprecedented performance, in particular for kinase domains characterized by limited or no experimental data.

Availability And Implementation: The method is freely available to non-commercial users and can be downloaded at http://www.cbs.dtu.dk/services/NetPhospan-1.0.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty715
April 2019

Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios.

BMC Bioinformatics 2018 06 25;19(1):239. Epub 2018 Jun 25.

Center for Biological Sequence Analysis, Department of Bio and Health Informatics, Technical University of Denmark, DK-2800, Lyngby, Denmark.

Background: The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advances in next-generation sequencing (NGS) methodologies, have increased the need for precise computational typing methods.

Results: We tested two widespread HLA typing methods using high quality full genome sequencing data from 150 individuals in 50 family trios from the Genome Denmark project. First, we computed descendant accuracies assessing the agreement in the inheritance of alleles from parents to offspring. Second, we compared the locus-specific homozygosity rates as well as the allele frequencies; and we compared those to the observed values in related populations. We provide guidelines for testing the accuracy of HLA typing methods by comparing family information, which is independent of the availability of curated alleles.

Conclusions: Although current computational methods for HLA typing generally provide satisfactory results, our benchmark - using data with ultra-high sequencing depth - demonstrates the incompleteness of current reference databases, and highlights the importance of providing genomic databases addressing current sequencing standards, a problem yet to be resolved before benefiting fully from personalised medicine approaches in which HLA phenotyping is essential.
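The trio-based check described in this abstract can be illustrated with a small sketch. It is one plausible reading only: the abstract does not define how descendant accuracy was computed, so the version below simply counts the fraction of child genotypes at a locus that can be explained by one allele from each parent. The function names and the allele calls in the example are hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be split as one from each parent.

    child, mother, father: 2-tuples of allele names typed at a single locus.
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def descendant_accuracy(trios):
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    consistent = sum(is_mendelian_consistent(*trio) for trio in trios)
    return consistent / len(trios)


# Hypothetical typing results at one locus for two trios.
trios = [
    (("A*01:01", "A*02:01"), ("A*01:01", "A*03:01"), ("A*02:01", "A*24:02")),
    (("A*01:01", "A*11:01"), ("A*01:01", "A*03:01"), ("A*02:01", "A*24:02")),
]
print(descendant_accuracy(trios))
```

Because the check relies only on agreement within a family, it does not need a curated truth set of alleles, which matches the abstract's remark that family information is independent of the availability of curated alleles.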

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2239-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019707
June 2018

Retinoic Acid Signaling in Thymic Epithelial Cells Regulates Thymopoiesis.

J Immunol 2018 07 30;201(2):524-532. Epub 2018 May 30.

Immunology Section, Department of Experimental Medical Science, Lund University, 22184 Lund, Sweden.

Despite the essential role of thymic epithelial cells (TEC) in T cell development, the signals regulating TEC differentiation and homeostasis remain incompletely understood. In this study, we show a key in vivo role for the vitamin A metabolite, retinoic acid (RA), in TEC homeostasis. In the absence of RA signaling in TEC, cortical TEC (cTEC) and CD80MHC class II medullary TEC displayed subset-specific alterations in gene expression, which in cTEC included genes involved in epithelial proliferation, development, and differentiation. Mice whose TEC were unable to respond to RA showed increased cTEC proliferation, an accumulation of stem cell Ag-1 cTEC, and, in early life, a decrease in medullary TEC numbers. These alterations resulted in reduced thymic cellularity in early life, a reduction in CD4 single-positive and CD8 single-positive numbers in both young and adult mice, and enhanced peripheral CD8 T cell survival upon TCR stimulation. Collectively, our results identify RA as a regulator of TEC homeostasis that is essential for TEC function and normal thymopoiesis.

Source
DOI: http://dx.doi.org/10.4049/jimmunol.1800418
July 2018

Analysis of a gene panel for targeted sequencing of colorectal cancer samples.

Oncotarget 2018 Feb 10;9(10):9043-9060. Epub 2018 Jan 10.

Oncology Department, Vejle Hospital, Vejle 7100, Denmark.

Colorectal cancer (CRC) is a leading cause of death worldwide. Surgical intervention is a successful treatment for stage I patients, whereas other more advanced cases may require adjuvant chemotherapy. The selection of effective adjuvant treatments remains, however, challenging. Accurate patient stratification is necessary for the identification of the subset of patients likely responding to treatment, while sparing others from pernicious treatment. Targeted sequencing approaches may help in this regard, enabling rapid genetic investigation, and at the same time easily applicable in routine diagnosis. We propose a set of guidelines for the identification, including variant calling and filtering, of somatic mutations driving tumorigenesis in the absence of matched healthy tissue. We also discuss the inclusion criteria for the generation of our gene panel. Furthermore, we evaluate the prognostic impact of individual genes, using Cox regression models in the context of overall survival and disease-free survival. These analyses confirmed the role of commonly used biomarkers, and shed light on controversial genes such as . Applying those guidelines, we created a novel gene panel to investigate the onset and progression of CRC in 273 patients. Our comprehensive biomarker set includes 266 genes that may play a role in the progression through the different stages of the disease. Tracing the developmental state of the tumour, and its resistances, is instrumental in patient stratification and reliable decision making in precision clinical practice.

Source
DOI: http://dx.doi.org/10.18632/oncotarget.24138
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5823670
February 2018

Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Nature 2017 08 26;548(7665):87-91. Epub 2017 Jul 26.

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden.

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.

Source
DOI: http://dx.doi.org/10.1038/nature23264
August 2017

Cutavirus in Cutaneous Malignant Melanoma.

Emerg Infect Dis 2017 02;23(2):363-365

A novel human protoparvovirus related to human bufavirus and preliminarily named cutavirus has been discovered. We detected cutavirus in a sample of cutaneous malignant melanoma by using viral enrichment and high-throughput sequencing. The role of cutaviruses in cutaneous cancers remains to be investigated.

Source
DOI: http://dx.doi.org/10.3201/eid2302.161564
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324802
February 2017

How compelling are the data for Epstein-Barr virus being a trigger for systemic lupus and other autoimmune diseases?

Curr Opin Rheumatol 2016 Jul;28(4):398-404

Department of Autoimmunology and Biomarkers, Statens Serum Institut, Copenhagen; Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.

Purpose Of Review: Systemic lupus erythematosus (SLE) is caused by a combination of genetic and acquired immunodeficiencies and environmental factors including infections. An association with Epstein-Barr virus (EBV) has been established by numerous studies over the past decades. Here, we review recent experimental studies on EBV, and present our integrated theory of SLE development.

Recent Findings: SLE patients have dysfunctional control of EBV infection resulting in frequent reactivations and disease progression. These comprise impaired functions of EBV-specific T-cells with an inverse correlation to disease activity and elevated serum levels of antibodies against lytic cycle EBV antigens. The presence of EBV proteins in renal tissue from SLE patients with nephritis suggests direct involvement of EBV in SLE development. As expected for patients with immunodeficiencies, studies reveal that SLE patients show dysfunctional responses to other viruses as well. An association with EBV infection has also been demonstrated for other autoimmune diseases, including Sjögren's syndrome, rheumatoid arthritis, and multiple sclerosis.

Summary: Collectively, the interplay between an impaired immune system and the cumulative effects of EBV and other viruses results in frequent reactivation of EBV and enhanced cell death, causing development of SLE and concomitant autoreactivities.

Source
DOI: http://dx.doi.org/10.1097/BOR.0000000000000289
July 2016

Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers.

Viruses 2016 Feb 19;8(2). Epub 2016 Feb 19.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified.

Source
DOI: http://dx.doi.org/10.3390/v8020053
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776208
February 2016

Propionibacterium acnes: Disease-Causing Agent or Common Contaminant? Detection in Diverse Patient Samples by Next-Generation Sequencing.

J Clin Microbiol 2016 Apr 27;54(4):980-7. Epub 2016 Jan 27.

Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.

Propionibacterium acnes is the most abundant bacterium on human skin, particularly in sebaceous areas. P. acnes is suggested to be an opportunistic pathogen involved in the development of diverse medical conditions but is also a proven contaminant of human clinical samples and surgical wounds. Its significance as a pathogen is consequently a matter of debate. In the present study, we investigated the presence of P. acnes DNA in 250 next-generation sequencing data sets generated from 180 samples of 20 different sample types, mostly of cancerous origin. The samples were subjected to either microbial enrichment, involving nuclease treatment to reduce the amount of host nucleic acids, or shotgun sequencing. We detected high proportions of P. acnes DNA in enriched samples, particularly skin tissue-derived and other tissue samples, with the levels being higher in enriched samples than in shotgun-sequenced samples. P. acnes reads were detected in most samples analyzed, though the proportions in most shotgun-sequenced samples were low. Our results show that P. acnes can be detected in practically all sample types when molecular methods, such as next-generation sequencing, are employed. The possibility of contamination from the patient or other sources, including laboratory reagents or environment, should therefore always be considered carefully when P. acnes is detected in clinical samples. We advocate that detection of P. acnes always be accompanied by experiments validating the association between this bacterium and any clinical condition.

Source
http://dx.doi.org/10.1128/JCM.02723-15 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809928 (PMC)
April 2016

wKinMut-2: Identification and Interpretation of Pathogenic Variants in Human Protein Kinases.

Hum Mutat 2016 Jan 20;37(1):36-42. Epub 2015 Oct 20.

Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kongens Lyngby 2800, Denmark.

Most genomic alterations are tolerated while only a minor fraction disrupts molecular function sufficiently to drive disease. Protein kinases play a central biological function and the functional consequences of their variants are abundantly characterized. However, this heterogeneous information is often scattered across different sources, which makes the integrative analysis complex and laborious. wKinMut-2 constitutes a solution to facilitate the interpretation of the consequences of human protein kinase variation. Nine methods predict their pathogenicity, including a kinase-specific random forest approach. To understand the biological mechanisms causative of human diseases and cancer, information from pertinent reference knowledge bases and the literature is automatically mined, digested, and homogenized. Variants are visualized in their structural contexts and residues affecting catalytic and drug binding are identified. Known protein-protein interactions are reported. Altogether, this information is intended to assist the generation of new working hypotheses to be corroborated with subsequent experimental work. The wKinMut-2 system, along with a user manual and examples, is freely accessible at http://kinmut2.bioinfo.cnio.es; the code for local installations can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.

Source
http://dx.doi.org/10.1002/humu.22914 (DOI Listing)
January 2016

Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

Sci Rep 2015 Aug 19;5:13201. Epub 2015 Aug 19.

Centre for GeoGenetics, Natural History Museum, University of Copenhagen, Østervoldgade 5-7, 1350 Copenhagen K, Denmark.

Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

Source
http://dx.doi.org/10.1038/srep13201 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4541070 (PMC)
August 2015

Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios.

Nat Commun 2015 Jan 19;6:5969. Epub 2015 Jan 19.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 208, DK-2800 Kgs Lyngby, Denmark.

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e-8 and 1.5e-9 per nucleotide per generation for SNVs and indels, respectively.

Source
http://dx.doi.org/10.1038/ncomms6969 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309431 (PMC)
January 2015

wKinMut: an integrated tool for the analysis and interpretation of mutations in human protein kinases.

BMC Bioinformatics 2013 Nov 29;14:345. Epub 2013 Nov 29.

Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernandez Almagro, 3, E-28029 Madrid, Spain.

Background: Protein kinases are involved in relevant physiological functions and a broad number of mutations in this superfamily have been reported in the literature to affect protein function and stability. Unfortunately, the exploration of the consequences on the phenotypes of each individual mutation remains a considerable challenge.

Results: The wKinMut web-server offers direct prediction of the potential pathogenicity of the mutations from a number of methods, including our recently developed prediction method based on the combination of information from a range of diverse sources, including physicochemical properties and functional annotations from FireDB and Swissprot and kinase-specific characteristics such as the membership to specific kinase groups, the annotation with disease-associated GO terms or the occurrence of the mutation in PFAM domains, and the relevance of the residues in determining kinase subfamily specificity from S3Det. This predictor yields interesting results that compare favourably with other methods in the field when applied to protein kinases. Together with the predictions, wKinMut offers a number of integrated services for the analysis of mutations. These include: the classification of the kinase, information about associations of the kinase with other proteins extracted from iHop, the mapping of the mutations onto PDB structures, pathogenicity records from a number of databases and the classification of mutations in large-scale cancer studies. Importantly, wKinMut is connected with the SNP2L system that extracts mentions of mutations directly from the literature, and therefore increases the possibilities of finding interesting functional information associated with the studied mutations.

Conclusions: wKinMut facilitates the exploration of the information available about individual mutations by integrating prediction approaches with the automatic extraction of information from the literature (text mining) and several state-of-the-art databases. wKinMut has been used during the last year for the analysis of the consequences of mutations in the context of a number of cancer genome projects, including the recent analysis of Chronic Lymphocytic Leukemia cases, and is publicly available at http://wkinmut.bioinfo.cnio.es.

Source
http://dx.doi.org/10.1186/1471-2105-14-345 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879071 (PMC)
November 2013

Tumor mutation burden forecasts outcome in ovarian cancer with BRCA1 or BRCA2 mutations.

PLoS One 2013 12;8(11):e80023. Epub 2013 Nov 12.

Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America; Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.

Background: Increased number of single nucleotide substitutions is seen in breast and ovarian cancer genomes carrying disease-associated mutations in BRCA1 or BRCA2. The significance of these genome-wide mutations is unknown. We hypothesize genome-wide mutation burden mirrors deficiencies in DNA repair and is associated with treatment outcome in ovarian cancer.

Methods And Results: The total number of synonymous and non-synonymous exome mutations (Nmut), and the presence of germline or somatic mutation in BRCA1 or BRCA2 (mBRCA) were extracted from whole-exome sequences of high-grade serous ovarian cancers from The Cancer Genome Atlas (TCGA). Cox regression and Kaplan-Meier methods were used to correlate Nmut with chemotherapy response and outcome. Higher Nmut correlated with a better response to chemotherapy after surgery. In patients with mBRCA-associated cancer, low Nmut was associated with shorter progression-free survival (PFS) and overall survival (OS), independent of other prognostic factors in multivariate analysis. Patients with mBRCA-associated cancers and a high Nmut had remarkably favorable PFS and OS. The association with survival was similar in cancers with either BRCA1 or BRCA2 mutations. In cancers with wild-type BRCA, tumor Nmut was associated with treatment response in patients with no residual disease after surgery.

Conclusions: Tumor Nmut was associated with treatment response and with both PFS and OS in patients with high-grade serous ovarian cancer carrying BRCA1 or BRCA2 mutations. In the TCGA cohort, low Nmut predicted resistance to chemotherapy and shorter PFS and OS, while high Nmut forecast a remarkably favorable outcome in mBRCA-associated ovarian cancer. Our observations suggest that the total mutation burden coupled with BRCA1 or BRCA2 mutations in ovarian cancer is a genomic marker of prognosis and predictor of treatment response. This marker may reflect the degree of deficiency in BRCA-mediated pathways, or the extent of compensation for the deficiency by alternative mechanisms.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080023 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3827141 (PMC)
July 2014

Prediction of disease causing non-synonymous SNPs by the Artificial Neural Network Predictor NetDiseaseSNP.

PLoS One 2013 25;8(7):e68370. Epub 2013 Jul 25.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.

We have developed a sequence conservation-based artificial neural network predictor called NetDiseaseSNP which classifies nsSNPs as disease-causing or neutral. Our method uses the excellent alignment generation algorithm of SIFT to identify related sequences and a combination of 31 features assessing sequence conservation and the predicted surface accessibility to produce a single score which can be used to rank nsSNPs based on their potential to cause disease. NetDiseaseSNP successfully classifies disease-causing and neutral mutations. In addition, we show that NetDiseaseSNP discriminates cancer driver and passenger mutations satisfactorily. Our method outperforms other state-of-the-art methods on several disease/neutral datasets as well as on cancer driver/passenger mutation datasets and can thus be used to pinpoint and prioritize plausible disease candidates among nsSNPs for further investigation. NetDiseaseSNP is publicly available as an online tool as well as a web service: http://www.cbs.dtu.dk/services/NetDiseaseSNP.

Source
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0068370 (PLOS)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3723835 (PMC)
March 2014

Interpretation of the consequences of mutations in protein kinases: combined use of bioinformatics and text mining.

Front Physiol 2012 22;3:323. Epub 2012 Aug 22.

Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre Madrid, Spain.

Protein kinases play a crucial role in a plethora of significant physiological functions and a number of mutations in this superfamily have been reported in the literature to disrupt protein structure and/or function. Computational and experimental research aims to discover the mechanistic connection between mutations in protein kinases and disease with the final aim of predicting the consequences of mutations on protein function and the subsequent phenotypic alterations. In this article, we will review the possibilities and limitations of current computational methods for the prediction of the pathogenicity of mutations in the protein kinase superfamily. In particular, we will focus on the problem of benchmarking the predictions with independent gold standard datasets. We will propose a pipeline for the curation of mutations automatically extracted from the literature. Since many of these mutations are not included in the databases that are commonly used to train the computational methods to predict the pathogenicity of protein kinase mutations, we propose using them to build a valuable gold standard dataset for benchmarking a number of these predictors. Finally, we will discuss how text mining approaches constitute a powerful tool for the interpretation of the consequences of mutations in the context of disease genome analysis with particular focus on cancer.

Source
http://dx.doi.org/10.3389/fphys.2012.00323 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3449330 (PMC)
October 2012

Prioritization of pathogenic mutations in the protein kinase superfamily.

BMC Genomics 2012 Jun 18;13 Suppl 4:S3. Epub 2012 Jun 18.

Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.

Background: Most of the many mutations described in human protein kinases are tolerated without significant disruption of the corresponding structures or molecular functions, while some of them have been associated with a variety of human diseases, including cancer. In the last decade, a plethora of computational methods to predict the effect of missense single-nucleotide variants (SNVs) have been developed. Still, current high-throughput sequencing efforts and the concomitant need for massive interpretation of protein sequence variants will demand more efficient and/or accurate computational methods in the forthcoming years.

Results: We present KinMut, a support vector machine (SVM) approach, to identify pathogenic mutations in the protein kinase superfamily. KinMut relies on a combination of sequence-derived features that describe mutations at different levels: (1) Gene level: membership to a specific group in Kinbase and the annotation with GO terms; (2) Domain level: annotated PFAM domains; and (3) Residue level: physicochemical features of amino acids, specificity determining positions, and functional annotations from SwissProt and FireDB. The system has been trained with the set of 3492 human kinase mutations in UniProt for which experimental validation of their pathogenic or neutral character exists. In addition, we discuss the relative importance of these independent properties and their combination for the development of a kinase-specific predictor. Finally, we compare KinMut with other state-of-the-art prediction methods.

Conclusions: Family-specific features appear among the most discriminative information sources, which allow us to produce accurate results in a reliable and very simple way with minimal supervision. Our study aims to broaden the knowledge on the mechanisms by which mutations in the human kinome contribute to disease, with a particular focus on cancer. The classifier as well as further documentation is available at http://kinmut.bioinfo.cnio.es/.

Source
http://dx.doi.org/10.1186/1471-2164-13-S4-S3 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303724 (PMC)
June 2012

Characterization of pathogenic germline mutations in human protein kinases.

BMC Bioinformatics 2011 5;12 Suppl 4:S1. Epub 2011 Jul 5.

Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernandez Almagro 3, E28029 Madrid, Spain.

Background: Protein Kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease we compared pathogenic mutations to neutral mutations as an extension to our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Secondly, mutations were characterized according to their potential structural effects and finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility and functional sites.

Results: Pathogenic Protein Kinase mutations perturb essential aspects of protein function, including disruption of substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations in Protein Kinases display a tendency to avoid structurally relevant positions, which represents a significant difference with respect to the average distribution of pathogenic mutations in other protein families.

Conclusions: Disease-associated mutations display sound differences with respect to neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class, and the distribution of pathogenic mutations within the consensus structure of the Protein Kinase domain is substantially different from that of non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and their use as a proxy for the study of the involvement of somatic mutations in cancer development.

Source
http://dx.doi.org/10.1186/1471-2105-12-S4-S1 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194193 (PMC)
December 2011

An integrated approach to the interpretation of single amino acid polymorphisms within the framework of CATH and Gene3D.

BMC Bioinformatics 2009 Aug 27;10 Suppl 8:S5. Epub 2009 Aug 27.

Institute of Structural and Molecular Biology, University College London, UK.

Background: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. In order better to understand the mechanisms behind known mutant phenotypes, and predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized.

Results: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space.

Conclusion: The server is publicly available at http://3DSim.bioinfo.cnio.es/. In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.

Source
http://dx.doi.org/10.1186/1471-2105-10-S8-S5 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745587 (PMC)
August 2009

Extraction of human kinase mutations from literature, databases and genotyping studies.

BMC Bioinformatics 2009 Aug 27;10 Suppl 8:S1. Epub 2009 Aug 27.

Spanish National Cancer Research Centre, Madrid, Spain.

Background: There is considerable interest in characterizing the biological role of specific protein residue substitutions through mutagenesis experiments. Additionally, recent efforts related to the detection of disease-associated SNPs motivated both the manual annotation and the automatic extraction of naturally occurring sequence variations from the literature, especially for protein families that play a significant role in signaling processes, such as kinases. Systematic integration and comparison of kinase mutation information from multiple sources, covering literature, manual annotation databases and large-scale experiments, can result in a more comprehensive view of functional, structural and disease-associated aspects of protein sequence variants. Previously published mutation extraction approaches did not sufficiently distinguish between two fundamentally different variation origin categories, namely naturally occurring mutations and induced mutations generated through in vitro experiments.

Results: We present a literature mining pipeline for the automatic extraction and disambiguation of single-point mutation mentions from both abstracts and full text articles, followed by a sequence validation check to link mutations to their corresponding kinase protein sequences. Each mutation is scored according to whether it corresponds to an induced mutation or a natural sequence variant. We were able to provide direct literature links for a considerable fraction of previously annotated kinase mutations, thus enabling more efficient interpretation of their biological characterization and experimental context. In order to test the capabilities of the presented pipeline, the mutations in the protein kinase domain of the kinase family were analyzed. Using our literature extraction system, we were able to recover a total of 643 mutation-protein associations from PubMed abstracts and 6,970 from a large collection of full text articles. When compared to state-of-the-art annotation databases and high throughput genotyping studies, the mutation mentions extracted from the literature overlap to a good extent with the existing knowledgebases, whereas the remaining mentions suggest new mutation records that were not previously annotated in the databases.

Conclusion: Using the proposed residue disambiguation and classification approach, we were able to differentiate between natural variant and mutagenesis types of mutations with an accuracy of 93.88. The resulting system is useful for constructing a Gold Standard set of mutations extracted from the literature by human experts with minimal manual curation effort, providing direct pointers to relevant evidence sentences. Our system is able to recover mutations from the literature that are not present in state-of-the-art databases. Human expert manual validation of a subset of the literature extracted mutations conducted on 100 mutations from PubMed abstracts highlights that almost three quarters (72%) of the extracted mutations turned out to be correct, and more than half of these had not been previously annotated in databases.
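The extraction and sequence validation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression, the function names `extract_mentions` and `validate`, and the toy sequence are all assumptions made for illustration, and only one-letter mentions of the form `V600E` are handled. The scoring of induced versus natural variants is not attempted here.

```python
import re

# The 20 standard amino acids in one-letter code.
AA = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residue, 1-based position, mutant residue (for example "V600E").
# This pattern is an illustrative assumption, not the published method.
MUTATION_RE = re.compile(rf"\b([{AA}])([1-9][0-9]*)([{AA}])\b")


def extract_mentions(text):
    """Return (wild_type, position, mutant) tuples for each mention in text."""
    return [(m.group(1), int(m.group(2)), m.group(3))
            for m in MUTATION_RE.finditer(text)]


def validate(mentions, sequence):
    """Keep mentions whose wild-type residue matches the sequence at that position."""
    return [(wt, pos, mt) for wt, pos, mt in mentions
            if pos <= len(sequence) and sequence[pos - 1] == wt]


# Hypothetical six-residue sequence and sentence, used only to exercise the code.
toy_sequence = "MKVLAG"
text = "The K2R and V3E substitutions were studied; A4T was not found, unlike L4P."

mentions = extract_mentions(text)
valid = validate(mentions, toy_sequence)
```

Here `A4T` is extracted but rejected by the validation step, because position 4 of the toy sequence is `L`, not `A`. A check of this kind is what allows a mention to be linked to a specific protein sequence.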

Source
http://dx.doi.org/10.1186/1471-2105-10-S8-S1 (DOI Listing)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745582 (PMC)
August 2009