Gene2DisCo: Gene to disease using disease commonalities.

Authors:
Marco Frasca
University of London

Artif Intell Med 2017 Oct 4;82:34-46. Epub 2017 Sep 4.

Università Degli Studi di Milano, Dipartimento di Informatica, Via Comelico 39/41, Milano, Italy.

Objective: Finding the human genes co-causing complex diseases, also known as "disease-genes", is one of the emerging and challenging tasks in biomedicine. This process, termed gene prioritization (GP), is characterized by a scarcity of known disease-genes for most diseases, and by a vast amount of heterogeneous data, usually encoded into networks describing different types of functional relationships between genes. In addition, different diseases may share common profiles (e.g. genetic or therapeutic profiles), and exploiting disease commonalities may significantly enhance the performance of GP methods. This work aims to provide a systematic comparison of several disease similarity measures, and to embed disease similarities and heterogeneous data into a flexible framework for gene prioritization which specifically handles the lack of known disease-genes.

Methods: We present a novel network-based method, Gene2DisCo, based on generalized linear models (GLMs), which prioritizes genes by exploiting data on disease-genes, gene interaction networks and disease similarities. The scarcity of disease-genes is addressed by applying an efficient negative selection procedure, together with imbalance-aware GLMs. Gene2DisCo is a flexible framework, in the sense that it does not depend on specific types of data or on specific disease ontologies.
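To make the idea of an imbalance-aware GLM concrete, the following is a minimal sketch in Python and not the Gene2DisCo implementation. It assumes a single hypothetical feature per gene (the summed edge weight to known disease-genes), a toy network, and class weights inversely proportional to class frequency. It also treats every unlabeled gene as a negative, which is a simplification standing in for the negative selection procedure mentioned above. All names (`neighbor_score`, `fit_weighted_logistic`, `toy_network`) are illustrative assumptions.

```python
import math


def neighbor_score(network, gene, positives):
    """Sum of edge weights linking `gene` to known disease-genes."""
    return sum(w for nb, w in network.get(gene, {}).items()
               if nb in positives and nb != gene)


def fit_weighted_logistic(xs, ys, epochs=2000, lr=0.1):
    """Fit p(y=1|x) = sigmoid(a*x + b) by gradient descent on a
    class-weighted log-loss, so the few positives are not swamped."""
    n = len(ys)
    n_pos = sum(ys)
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos)  # larger weight for the rare class
    w_neg = n / (2.0 * n_neg)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            w = w_pos if y == 1 else w_neg
            grad_a += w * (p - y) * x
            grad_b += w * (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy weighted gene network (hypothetical data, symmetric adjacency).
toy_network = {
    "g1": {"g2": 0.9, "g3": 0.8},
    "g2": {"g1": 0.9, "g3": 0.7},
    "g3": {"g1": 0.8, "g2": 0.7, "g4": 0.1},
    "g4": {"g3": 0.1, "g5": 0.6},
    "g5": {"g4": 0.6, "g6": 0.5},
    "g6": {"g5": 0.5},
}
known_disease_genes = {"g1", "g2"}

genes = sorted(toy_network)
xs = [neighbor_score(toy_network, g, known_disease_genes) for g in genes]
ys = [1 if g in known_disease_genes else 0 for g in genes]
a, b = fit_weighted_logistic(xs, ys)

# Rank the unannotated genes by predicted probability of association.
candidates = [g for g in genes if g not in known_disease_genes]
ranking = sorted(
    candidates,
    key=lambda g: -(a * neighbor_score(toy_network, g, known_disease_genes) + b),
)
print(ranking)
```

In this toy setting the gene most strongly connected to the known disease-genes is ranked first among the candidates. A realistic pipeline would combine several networks and disease similarities into the feature set, as the abstract describes.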

Results: On a benchmark dataset composed of nine human networks and 708 medical subject headings (MeSH) diseases, Gene2DisCo largely outperformed the best benchmark algorithm, kernelized score functions, in terms of both area under the ROC curve (0.94 against 0.86) and precision at given recall levels (for recall levels from 0.1 to 1 in steps of 0.1). Furthermore, we enriched and extended the benchmark data to the whole human genome and provided the top-ranked unannotated candidate genes even for MeSH disease terms without known annotations.
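The two evaluation measures named above can be sketched in a few lines of Python. This is an illustrative computation on made-up scores and labels, not the evaluation code or data of the study. The precision-at-recall definition used here (best precision among ranking cutoffs whose recall reaches the given level) is one common convention and is an assumption, since the abstract does not state the exact definition.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a
    random negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, recall_levels):
    """For each recall level, the best precision over all ranking
    cutoffs whose recall is at least that level."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    points = []  # (recall, precision) after each cutoff k
    tp = 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((tp / n_pos, tp / k))
    return {r: max(p for rec, p in points if rec >= r - 1e-12)
            for r in recall_levels}


# Hypothetical prediction scores and true labels for six genes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

levels = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
auc = roc_auc(scores, labels)
pr = precision_at_recall(scores, labels, levels)
print(round(auc, 3), pr[0.5], pr[1.0])
```

On this toy ranking, 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9; precision is 1.0 up to recall 2/3 and 0.75 at full recall.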


Source
http://dx.doi.org/10.1016/j.artmed.2017.08.001

