Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome.

Genome Biol 2020 03 30;21(1):81. Epub 2020 Mar 30.

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.

The human epigenome has been experimentally characterized by thousands of measurements for every basepair in the human genome. We propose a deep neural network tensor factorization method, Avocado, that compresses this epigenomic data into a dense, information-rich representation. We use this learned representation to impute epigenomic data more accurately than previous methods, and we show that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture.
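
As a rough illustration of the idea of factorizing epigenomic data into latent factors and using them for imputation, the sketch below fits a plain low-rank (CP) factorization to the observed entries of a small synthetic cell type × assay × position tensor, then fills in the held-out entries. This is not the Avocado model: the abstract describes a multi-scale deep neural network, whereas this sketch is a linear stand-in at a single resolution. The function name `fit_cp`, the tensor layout, the rank, the learning rate, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_cp(tensor, mask, rank=2, lr=0.05, epochs=500, seed=0):
    """Fit a rank-`rank` CP factorization to the observed entries of a
    (cell type x assay x position) tensor by full-batch gradient descent.

    Hypothetical helper for illustration; a linear stand-in, not Avocado.
    """
    rng = np.random.default_rng(seed)
    n_cell, n_assay, n_pos = tensor.shape
    # One latent factor matrix per tensor axis.
    C = 0.5 * rng.standard_normal((n_cell, rank))
    A = 0.5 * rng.standard_normal((n_assay, rank))
    P = 0.5 * rng.standard_normal((n_pos, rank))
    n_obs = mask.sum()
    losses = []
    for _ in range(epochs):
        pred = np.einsum("cr,ar,pr->cap", C, A, P)
        err = (pred - tensor) * mask  # unobserved entries contribute nothing
        losses.append(float((err ** 2).sum() / n_obs))
        # Gradients of the mean squared error over observed entries.
        g_c = np.einsum("cap,ar,pr->cr", err, A, P)
        g_a = np.einsum("cap,cr,pr->ar", err, C, P)
        g_p = np.einsum("cap,cr,ar->pr", err, C, A)
        step = 2.0 * lr / n_obs
        C -= step * g_c
        A -= step * g_a
        P -= step * g_p
    return C, A, P, losses


# Synthetic data (assumed for the example): 3 cell types, 4 assays, 50 positions.
rng = np.random.default_rng(1)
true_C, true_A, true_P = (rng.standard_normal((n, 2)) for n in (3, 4, 50))
tensor = np.einsum("cr,ar,pr->cap", true_C, true_A, true_P)
mask = rng.random(tensor.shape) < 0.8  # about 80% of experiments "observed"

C, A, P, losses = fit_cp(tensor, mask)
imputed = np.einsum("cr,ar,pr->cap", C, A, P)  # predictions for every entry

held_out_mse = float(((imputed - tensor)[~mask] ** 2).mean())
print("training loss, first vs. last epoch:", losses[0], losses[-1])
print("held-out mean squared error:", held_out_mse)
```

The rows of `P` play the role of a per-position latent representation. In this toy setting they could be passed to a downstream predictor in place of the raw signal, which is the kind of use the abstract reports for Avocado's learned representation.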

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01977-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7104480
March 2020

Similar Publications

Local Epigenomic Data are more Informative than Local Genome Sequence Data in Predicting Enhancer-Promoter Interactions Using Neural Networks.

Genes (Basel) 2019 12 29;11(1). Epub 2019 Dec 29.

Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA.

Enhancer-promoter interactions (EPIs) are crucial for transcriptional regulation. Mapping such interactions proves useful for understanding disease regulations and discovering risk genes in genome-wide association studies. Some previous studies showed that machine learning methods, as computational alternatives to costly experimental approaches, performed well in predicting EPIs from local sequence and/or local epigenomic data.

December 2019

Predicting functional variants in enhancer and promoter elements using RegulomeDB.

Hum Mutat 2019 09 22;40(9):1292-1298. Epub 2019 Jun 22.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.

Here we present a computational model, Score of Unified Regulatory Features (SURF), that predicts functional variants in enhancer and promoter elements. SURF is trained on data from massively parallel reporter assays and predicts the effect of variants on reporter expression levels. It achieved the top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" challenge.

September 2019

Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network.

Bioinformatics 2020 01;36(2):496-503

MOE Key Laboratory of Bioinformatics, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China.

Motivation: Interactions among cis-regulatory elements such as enhancers and promoters are main driving forces shaping context-specific chromatin structure and gene expression. Although there have been computational methods for predicting gene expression from genomic and epigenomic information, most of them neglect long-range enhancer-promoter interactions, due to the difficulty in precisely linking regulatory enhancers to target genes. Recently, HiChIP, a novel high-throughput experimental approach, has generated comprehensive data on high-resolution interactions between promoters and distal enhancers.

January 2020

Predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations.

Nucleic Acids Res 2021 01;49(1):53-66

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Assessing the causal tissues of human complex diseases is important for the prioritization of trait-associated genetic variants. Yet, the biological underpinnings of trait-associated variants are extremely difficult to infer due to statistical noise in genome-wide association studies (GWAS), and because >90% of genetic variants from GWAS are located in non-coding regions. Here, we collected the largest human epigenomic map from ENCODE and Roadmap consortia and implemented a deep-learning-based convolutional neural network (CNN) model to predict the regulatory roles of genetic variants across a comprehensive list of epigenomic modifications.

January 2021