Publications by authors named "Florian Mittag"

9 Publications

Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies.

PLoS One 2015 Aug 18;10(8):e0135832. Epub 2015 Aug 18.

Cognitive Systems Group, University of Tübingen, Tübingen, Germany.

Various attempts have been made to predict individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies investigated only one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms to GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it yielded the best predictive performance for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, such as the linear support vector machine (SVM), as they allow for better subsequent interpretation without significant loss of accuracy.
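
As an illustration of what a genotype encoding scheme looks like, the following minimal Python sketch shows an additive encoding, which counts the copies of the minor allele at each SNP. The abstract does not name the other two schemes it compared, so the one-hot variant shown for contrast is an assumption, as are the function names and the example genotypes.

```python
def encode_additive(genotype, minor_allele):
    """Return the number of minor-allele copies (0, 1 or 2) in a genotype like 'AG'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == minor_allele)


def encode_one_hot(genotype, minor_allele):
    """Hypothetical alternative: one indicator per genotype class (0, 1 or 2 copies)."""
    count = encode_additive(genotype, minor_allele)
    return [int(count == k) for k in range(3)]


# Made-up genotypes for a single SNP whose minor allele is assumed to be 'G'.
genotypes = ["AA", "AG", "GG", "GA"]
additive = [encode_additive(g, "G") for g in genotypes]
print(additive)                   # [0, 1, 2, 1]
print(encode_one_hot("AG", "G"))  # [0, 1, 0]
```

The additive form yields a single numeric feature per SNP, which keeps a linear model's weights directly interpretable as per-allele effects.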

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135832
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4540285
May 2016

JSBML 1.0: providing a smorgasbord of options to encode systems biology models.

Bioinformatics 2015 Oct 16;31(20):3383-6. Epub 2015 Jun 16.

University of California, San Diego, La Jolla, CA, USA; Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany.

JSBML, the official pure Java programming library for the Systems Biology Markup Language (SBML) format, has evolved with the advent of different modeling formalisms in systems biology and their ability to be exchanged and represented via extensions of SBML. JSBML has matured into a major, active open-source project with contributions from a growing, international team of developers who not only maintain compatibility with SBML, but also drive steady improvements to the Java interface and promote ease of use for end users.

Availability and Implementation: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML. More information about JSBML can be found in the user guide at http://sbml.org/Software/JSBML/docs/.

Contact: jsbml-development@googlegroups.com or andraeger@eng.ucsd.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btv341
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595895
October 2015

Integrative pathway-based approach for genome-wide association studies: identification of new pathways for rheumatoid arthritis and type 1 diabetes.

PLoS One 2013 Oct 25;8(10):e78577. Epub 2013 Oct 25.

Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany.

Genome-wide association studies (GWAS) have led to the identification of numerous novel loci for a number of complex diseases. Pathway-based approaches using genotypic data provide tangible leads that cannot be identified by single-marker approaches as implemented in GWAS. The available pathway analysis approaches differ mainly in the databases employed and in the statistics applied to determine the significance of the associated disease markers. So far, pathway-based approaches using GWAS data have failed to consider the overlap of genes among different pathways or the influence of protein interactions. We performed a multistage integrative pathway (MIP) analysis on three common diseases, Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), incorporating genotypic, pathway, protein- and domain-interaction data to identify novel associations between these diseases and pathways. Additionally, we assessed the sensitivity of our method by removing the most significant SNPs and comparing the resulting pathway analysis results. Apart from confirming many previously published associations between pathways and RA, CD and T1D, our MIP approach identified three new associations between disease phenotypes and pathways: a relation between the influenza-A pathway and RA, and relations between T1D and the phagosome and toxoplasmosis pathways. These results provide new leads for understanding the molecular underpinnings of these diseases. The software developed for this study is available at http://www.cogsys.cs.uni-tuebingen.de/software/GWASPathwayIdentifier/index.htm.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0078577
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3808349
August 2014

Path2Models: large-scale generation of computational models from biochemical pathway maps.

BMC Syst Biol 2013 Nov 1;7:116. Epub 2013 Nov 1.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Background: Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data.

Results: To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps.

Conclusions: To date, the project has resulted in more than 140 000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized.

Source
DOI: http://dx.doi.org/10.1186/1752-0509-7-116
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4228421
November 2013

A pathway-based analysis provides additional support for an immune-related genetic susceptibility to Parkinson's disease.

Hum Mol Genet 2013 Mar 7;22(5):1039-49. Epub 2012 Dec 7.

Department of Psychological Medicine and Neurology, Institute of Psychological Medicine and Clinical Neurosciences, MRC Centre in Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff CF14 4XN, UK.

Parkinson's disease (PD) is the second most common neurodegenerative disease, affecting 1-2% of people over 60 and 3-4% of people over 80. Genome-wide association (GWA) studies have now found significant evidence for association in at least 18 genomic regions. We have studied a large PD meta-analysis and identified a significant excess of SNPs (P < 1 × 10^-16) that are associated with PD but fall short of the genome-wide significance threshold. This result was independent of variants at the 18 previously implicated regions and implies the presence of additional polygenic risk alleles. To understand how these loci increase risk of PD, we applied a pathway-based analysis, testing for biological functions that were significantly enriched for genes containing variants associated with PD. Analysing two independent GWA studies, we identified that both had a significant excess in the number of functional categories enriched for PD-associated genes (minimum P = 0.014 and P = 0.006, respectively). Moreover, 58 categories were significantly enriched for associated genes in both GWA studies (P < 0.001), implicating genes involved in the 'regulation of leucocyte/lymphocyte activity' and also 'cytokine-mediated signalling' as conferring an increased susceptibility to PD. These results were unaltered by the exclusion of all 178 genes that were present at the 18 genomic regions previously reported to be strongly associated with PD (including the HLA locus). Our findings, therefore, provide independent support to the strong association signal at the HLA locus and imply that the immune-related genetic susceptibility to PD is likely to be more widespread in the genome than previously appreciated.
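
To make the idea of a functional category being "enriched" for associated genes concrete, here is a minimal Python sketch of one common enrichment statistic, the upper tail of a hypergeometric distribution. The abstract does not state which test the authors used, so this is an assumed stand-in rather than their method, and the function name and gene counts are hypothetical.

```python
import math


def enrichment_p_value(total_genes, category_genes, associated_genes, overlap):
    """P(X >= overlap) when `associated_genes` are drawn at random from
    `total_genes`, of which `category_genes` belong to the category."""
    denominator = math.comb(total_genes, associated_genes)
    upper = min(category_genes, associated_genes)
    tail = 0
    for k in range(overlap, upper + 1):
        # math.comb returns 0 when the second argument exceeds the first,
        # so impossible draws contribute nothing to the sum.
        tail += (math.comb(category_genes, k)
                 * math.comb(total_genes - category_genes, associated_genes - k))
    return tail / denominator


# Toy numbers: 10 genes in total, 4 in the category, 3 associated genes,
# 2 of which fall in the category.
p = enrichment_p_value(10, 4, 3, 2)
print(p)  # 40/120, i.e. one third
```

A small P value under this model would indicate that the category contains more associated genes than random sampling would be expected to produce.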

Source
DOI: http://dx.doi.org/10.1093/hmg/dds492
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3561909
March 2013

Qualitative translation of relations from BioPAX to SBML qual.

Bioinformatics 2012 Oct 24;28(20):2648-53. Epub 2012 Aug 24.

Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, 72076 Tübingen, Germany.

Motivation: The biological pathway exchange language (BioPAX) and the systems biology markup language (SBML) are among the most popular modeling and data exchange languages in systems biology. The focus of SBML is quantitative modeling and dynamic simulation of models, whereas the BioPAX specification concentrates mainly on visualization and qualitative analysis of pathway maps. BioPAX describes reactions and relations. In contrast, SBML core exclusively describes quantitative processes such as reactions. With the SBML qualitative models extension (qual), it has recently also become possible to describe relations in SBML. Before the development of SBML qual, relations could not be properly translated into SBML. Until now, no BioPAX to SBML converter has been fully capable of translating both reactions and relations.

Results: The entire Nature Pathway Interaction Database (PID) has been converted from BioPAX (Level 2 and Level 3) into SBML (Level 3 Version 1), including both reactions and relations, by using the new qual extension package. Additionally, we present the new web tool BioPAX2SBML for further BioPAX to SBML conversions. Compared with previous conversion tools, BioPAX2SBML is more comprehensive, more robust and more exact.

Availability: BioPAX2SBML is freely available at http://webservices.cs.uni-tuebingen.de/ and the complete collection of the PID models is available at http://www.cogsys.cs.uni-tuebingen.de/downloads/Qualitative-Models/.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts508
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3467751
October 2012

Using genome-wide complex trait analysis to quantify 'missing heritability' in Parkinson's disease.

Hum Mol Genet 2012 Nov 13;21(22):4996-5009. Epub 2012 Aug 13.

Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA.

Genome-wide association studies (GWASs) have been successful at identifying single-nucleotide polymorphisms (SNPs) highly associated with common traits; however, a great deal of the heritable variation associated with common traits remains unaccounted for within the genome. Genome-wide complex trait analysis (GCTA) is a statistical method that applies a linear mixed model to estimate phenotypic variance of complex traits explained by genome-wide SNPs, including those not associated with the trait in a GWAS. We applied GCTA to 8 cohorts containing 7096 case and 19 455 control individuals of European ancestry in order to examine the missing heritability present in Parkinson's disease (PD). We meta-analyzed our initial results to produce robust heritability estimates for PD types across cohorts. Our results identify 27% (95% CI 17-38, P = 8.08E-08) phenotypic variance associated with all types of PD, 15% (95% CI -0.2 to 33, P = 0.09) phenotypic variance associated with early-onset PD and 31% (95% CI 17-44, P = 1.34E-05) phenotypic variance associated with late-onset PD. This is a substantial increase from the genetic variance identified by top GWAS hits alone (between 3 and 5%) and indicates there are substantially more risk loci to be identified. Our results suggest that although GWASs are a useful tool in identifying the most common variants associated with complex disease, a great many common variants of small effect remain to be discovered.

Source
DOI: http://dx.doi.org/10.1093/hmg/dds335
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3576713
November 2012

Use of support vector machines for disease risk prediction in genome-wide association studies: concerns and opportunities.

Hum Mutat 2012 Dec 3;33(12):1708-18. Epub 2012 Aug 3.

Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany.

The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has fueled expectations that individual risk can also be quantified based on the genetic architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the magnitude of the effect size of risk variants, heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5%) as well as rare variants (frequency <1%) in the etiology of complex diseases. Using a cross-validation model, we were able to achieve predictions with an area under the receiver operating characteristic curve (AUC) of ~0.88 for T1D, highlighting the strong heritable component (~90%). This is in contrast to PD, where we were unable to achieve a satisfactory prediction (AUC ~0.56; heritability ~38%). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The software used is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/.
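
For readers unfamiliar with the AUC figures quoted above, the following minimal Python sketch computes the area under the ROC curve as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name, labels and scores are made up for illustration and are unrelated to the study's data or software.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores: 1 = case, 0 = control.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

An AUC of 0.5 corresponds to random ranking, which is why a value near 0.56 is described as unsatisfactory while a value near 0.88 is not.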

Source
DOI: http://dx.doi.org/10.1002/humu.22161
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5968822
December 2012

Linking the epigenome to the genome: correlation of different features to DNA methylation of CpG islands.

PLoS One 2012 Apr 30;7(4):e35327. Epub 2012 Apr 30.

Center for Bioinformatics Tübingen, ZBIT, University of Tübingen, Tübingen, Germany.

DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. Even today, how DNA methyltransferases determine which regions should be methylated is not completely understood. There are many hypotheses about which genomic features are correlated with the epigenome that have not yet been evaluated. Furthermore, many explorative approaches to measuring DNA methylation are limited to a subset of the genome and thus cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features with DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HEPG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation and ten repetitions. With these models, we extended the existing dataset to the whole genome and thus predicted the methylation landscape for the given cell types. The method used for these predictions is also validated on another external whole-genome dataset. Our results reveal features correlated with DNA methylation and confirm or disprove various hypotheses about DNA methylation-related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. Furthermore, the method has been validated on a genome-wide dataset from the ENCODE consortium. The developed software, as well as the predicted datasets and a web service to compare methylation states of CpG islands, are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/.
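
The two performance measures quoted above, accuracy and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The Python sketch below shows the standard formulas; the confusion-matrix counts are invented for illustration and are not taken from the study, and the function names are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Invented counts: 45 true positives, 45 true negatives, 5 false positives, 5 false negatives.
acc = accuracy(45, 45, 5, 5)
mcc = matthews_corrcoef(45, 45, 5, 5)
print(acc, mcc)  # 0.9 0.8
```

Unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which makes it a useful companion metric when methylated and unmethylated islands are unevenly represented.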

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0035327
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3340366
September 2012