Gene2DisCo: Gene to disease using disease commonalities.

Authors:
Marco Frasca
University of London

Artif Intell Med 2017 Oct 4;82:34-46. Epub 2017 Sep 4.

Università Degli Studi di Milano, Dipartimento di Informatica, Via Comelico 39/41, Milano, Italy.

Objective: Finding the human genes co-causing complex diseases, also known as "disease-genes", is one of the emerging and challenging tasks in biomedicine. This process, termed gene prioritization (GP), is characterized by a scarcity of known disease-genes for most diseases, and by a vast amount of heterogeneous data, usually encoded into networks describing different types of functional relationships between genes. In addition, different diseases may share common profiles (e.g. genetic or therapeutic profiles), and exploiting disease commonalities may significantly enhance the performance of GP methods. This work aims to provide a systematic comparison of several disease similarity measures, and to embed disease similarities and heterogeneous data into a flexible framework for gene prioritization which specifically handles the lack of known disease-genes.

Methods: We present a novel network-based method, Gene2DisCo, based on generalized linear models (GLMs) to effectively prioritize genes by exploiting data regarding disease-genes, gene interaction networks and disease similarities. The scarcity of disease-genes is addressed by applying an efficient negative selection procedure, together with imbalance-aware GLMs. Gene2DisCo is a flexible framework, in the sense that it does not depend on specific types of data or on specific disease ontologies.
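As an illustration only, the general idea of pairing negative selection with an imbalance-aware GLM can be sketched as follows. This is not the Gene2DisCo implementation, which the abstract does not detail. The feature (summed edge weight to known disease-genes), the negative selection heuristic (the unlabeled genes least connected to positives), the class weighting scheme, the toy network and all function names are assumptions made for this sketch.

```python
import math

def positive_link_score(adjacency, positives):
    """Sum, for each gene, the edge weights linking it to known disease-genes."""
    return [sum(w for j, w in enumerate(row) if j in positives) for row in adjacency]

def select_negatives(scores, positives, k):
    """Assumed heuristic: take the k unlabeled genes least connected to positives."""
    unlabeled = [g for g in range(len(scores)) if g not in positives]
    return sorted(unlabeled, key=lambda g: (scores[g], g))[:k]

def predict(a, b, xi):
    """Logistic GLM prediction p = sigmoid(a * x + b)."""
    return 1.0 / (1.0 + math.exp(-(a * xi + b)))

def fit_weighted_logistic(x, y, epochs=2000, lr=0.5):
    """Gradient descent on a class-weighted log-loss (one common imbalance-aware choice)."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Each class contributes half of the total loss, whatever its size.
    w_pos, w_neg = 0.5 / n_pos, 0.5 / n_neg
    a = b = 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            err = predict(a, b, xi) - yi
            weight = w_pos if yi == 1 else w_neg
            grad_a += weight * err * xi
            grad_b += weight * err
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical toy network of six genes; genes 0 and 1 are known disease-genes.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
positives = {0, 1}

scores = positive_link_score(adjacency, positives)
negatives = select_negatives(scores, positives, k=3)
train = sorted(positives) + negatives
x = [scores[g] for g in train]
y = [1 if g in positives else 0 for g in train]
a, b = fit_weighted_logistic(x, y)

# Rank every gene that is not a known disease-gene by predicted probability.
candidates = [g for g in range(len(adjacency)) if g not in positives]
ranking = sorted(candidates, key=lambda g: -predict(a, b, scores[g]))
print(ranking)
```

In this toy setting, gene 2 is the only unlabeled gene linked to both known disease-genes, so it is excluded from the selected negatives and ends up at the top of the ranking.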

Results: On a benchmark dataset composed of nine human networks and 708 Medical Subject Headings (MeSH) diseases, Gene2DisCo largely outperformed the best benchmark algorithm, kernelized score functions, in terms of both area under the ROC curve (0.94 against 0.86) and precision at given recall levels (for recall levels from 0.1 to 1 in steps of 0.1). Furthermore, we enriched and extended the benchmark data to the whole human genome and provided the top-ranked unannotated candidate genes even for MeSH disease terms without known annotations.
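For readers unfamiliar with the two evaluation measures named above, the following sketch shows one way to compute them for a single disease. The abstract does not say how precision at a given recall level was computed, so the interpolated definition used here (best precision among ranking cutoffs that reach the recall level) is an assumption, as are the function names and the toy scores and labels, which are not results from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_recall(scores, labels, recall_levels):
    """Best precision among the ranking cutoffs whose recall reaches each level."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    points = []  # (recall, precision) after each cutoff in the ranking
    tp = 0
    for k, i in enumerate(order, start=1):
        tp += labels[i]
        points.append((tp / n_pos, tp / k))
    # A small tolerance guards against floating-point error in the recall levels.
    return [max(p for r, p in points if r >= level - 1e-12) for level in recall_levels]

# Hypothetical scores for five genes; label 1 marks a known disease-gene.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
recall_levels = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

auc = roc_auc(toy_scores, toy_labels)
prec = precision_at_recall(toy_scores, toy_labels, recall_levels)
print(round(auc, 4))
print([round(p, 4) for p in prec])
```

In a benchmark with many diseases, such per-disease values would typically be averaged across diseases; whether and how the paper averages them is not stated in the abstract.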

Source
http://dx.doi.org/10.1016/j.artmed.2017.08.001
October 2017

Similar Publications

An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods.

Artif Intell Med 2014 Jun 20;61(2):63-78. Epub 2014 Mar 20.

AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy.

Objective: In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization.

June 2014

Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

BMC Bioinformatics 2016 Nov 10;17(1):453. Epub 2016 Nov 10.

College of Information Sciences and Technology, Pennsylvania State University, 332 Information Sciences and Technology Building, University Park, 16802, PA, USA.

Background: Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases.

November 2016

Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores.

PLoS One 2012;7(11):e49634. Epub 2012 Nov 19.

Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal.

Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score.

May 2013

A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records.

BMC Bioinformatics 2014 Sep 24;15:315. Epub 2014 Sep 24.

Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.

Background: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized.

September 2014