IEEE/ACM Trans Comput Biol Bioinform 2019 Mar 19. Epub 2019 Mar 19.
How do we integratively profile large-scale multi-platform genomic data that are high dimensional and sparse? Furthermore, how can we incorporate prior knowledge, such as the association between genes, in the analysis systematically to find better latent relationships? To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method (SNeCT). SNeCT adopts parallel stochastic gradient descent approach on the proposed parallelizable network constrained optimization function. SNeCT decomposition is applied to a tensor constructed from a large scale multi-platform multi-cohort cancer data, PanCan12, constrained on a network built from PathwayCommons database. The decomposed factor matrices are applied to stratify cancers, to search for top- k similar patients given a new patient, and to illustrate how the matrices can be used to identify significant genomic patterns in each patient. In the stratification test, combined twelve-cohort data is clustered to form thirteen subclasses. The similarity of the top- k patient to the query was high for 23 clinical features, including estrogen/progesterone receptor statuses of BRCA patients with average precision value ranges from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also illustrate how the factor matrices can be used for identifying significant patterns for each patient. \\ Resources are available at: https://github.com/leesael/GIFT.