Seq2seq Fingerprint with Byte-Pair Encoding for Predicting Changes in Protein Stability upon Single Point Mutation.

Authors:
Chie Imamura
Biotechnology Laboratory

IEEE/ACM Trans Comput Biol Bioinform 2019 Apr 1. Epub 2019 Apr 1.

Engineering stable proteins is crucial for many industrial applications. Several machine learning methods have been developed to predict changes in protein stability upon single point mutations. To improve prediction accuracy, we propose a new unsupervised descriptor for protein sequences that is based on a sequence-to-sequence (seq2seq) neural network model combined with a sequence-compression method called byte-pair encoding (BPE). Our results show that BPE can encode a protein sequence into a shorter sequence, thereby enabling efficient training of the seq2seq model. Furthermore, we implement a basic predictor using the proposed descriptor, and our experimental results demonstrate that the predictor achieves state-of-the-art accuracy when tested on proteins that are not included in the training data.
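To illustrate how BPE shortens a protein sequence, here is a minimal sketch of generic byte-pair encoding applied to amino-acid strings. It is not the authors' implementation: the function names (learn_bpe, merge_pair, encode), the number of merges, and the toy sequences are all assumptions chosen for illustration. The idea is to repeatedly merge the most frequent adjacent pair of tokens into a single new token, so that common residue patterns become one symbol.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merge rules from amino-acid strings."""
    corpus = [list(seq) for seq in sequences]  # start from single residues
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        if pair_counts[best] < 2:
            break  # no pair repeats, so merging would not compress further
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def encode(sequence, merges):
    """Apply the learned merge rules, in order, to a new sequence."""
    tokens = list(sequence)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins from the paper's data.
    toy_sequences = ["MKVLAAMKVL", "MKVLMKAA"]
    rules = learn_bpe(toy_sequences, num_merges=5)
    for seq in toy_sequences:
        tokens = encode(seq, rules)
        print(seq, "->", tokens, f"({len(seq)} residues -> {len(tokens)} tokens)")
```

Joining the tokens recovers the original sequence, so the encoding is lossless. The shorter token sequence is what would be fed to a seq2seq model in place of the raw residue string.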

Source
http://dx.doi.org/10.1109/TCBB.2019.2908641
