Publications by authors named "Tatsuhiko Tsunoda"

287 Publications

Forecasting the spread of COVID-19 using LSTM network.

BMC Bioinformatics 2021 Jun 10;22(Suppl 6):316. Epub 2021 Jun 10.

Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan.

Background: The novel coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2, and within a few months it became a global pandemic. This forced many affected countries to take stringent measures such as complete lockdowns, business and trade shutdowns, and travel restrictions, which have had a tremendous economic impact. Therefore, knowledge and foresight about how a country might contain the spread of COVID-19 is of paramount importance to governments, policy makers, business partners and entrepreneurs. To help social and administrative decision making, a model that can forecast when a country might be able to contain the spread of COVID-19 is needed.

Results: The results obtained using our long short-term memory (LSTM) network-based model are promising as we validate our prediction model using New Zealand's data since they have been able to contain the spread of COVID-19 and bring the daily new cases tally to zero. Our proposed forecasting model was able to correctly predict the dates within which New Zealand was able to contain the spread of COVID-19. Similarly, the proposed model has been used to forecast the dates when other countries would be able to contain the spread of COVID-19.

Conclusion: The forecasted dates are only a prediction based on the existing situation. However, these forecasted dates can be used to guide actions and make informed decisions that will be practically beneficial in influencing the real future. The current forecasting trend shows that more stringent actions/restrictions need to be implemented for most of the countries as the forecasting model shows they will take over three months before they can possibly contain the spread of COVID-19.

DOI: http://dx.doi.org/10.1186/s12859-021-04224-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8190741
June 2021

SPECTRA: a tool for enhanced brain wave signal recognition.

BMC Bioinformatics 2021 Jun 2;22(Suppl 6):195. Epub 2021 Jun 2.

Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan.

Background: Brain wave signal recognition has gained increased attention in neuro-rehabilitation applications. This has driven the development of brain-computer interface (BCI) systems. Brain wave signals are acquired using electroencephalography (EEG) sensors, processed and decoded to identify the category to which the signal belongs. Once the signal category is determined, it can be used to control external devices. However, the success of such a system essentially relies on significant feature extraction and classification algorithms. One of the commonly used feature extraction techniques for BCI systems is common spatial pattern (CSP).

Results: The performance of the proposed spatial-frequency-temporal feature extraction (SPECTRA) predictor is analysed using three public benchmark datasets. Our proposed predictor outperformed other competing methods achieving lowest average error rates of 8.55%, 17.90% and 20.26%, and highest average kappa coefficient values of 0.829, 0.643 and 0.595 for BCI Competition III dataset IVa, BCI Competition IV dataset I and BCI Competition IV dataset IIb, respectively.

Conclusions: Our proposed SPECTRA predictor effectively finds features that are more separable and shows improvement in brain wave signal recognition that can be instrumental in developing improved real-time BCI systems that are computationally efficient.
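
The error rates and kappa values reported in the Results can be related by a short calculation. The sketch below assumes balanced two-class problems, so that chance agreement is 0.5 and Cohen's kappa reduces to 1 - 2 x error rate. This is an assumption made for illustration, not a description of how the authors computed their values, and the function name `kappa_from_error_rate` is hypothetical.

```python
def kappa_from_error_rate(error_rate: float, n_classes: int = 2) -> float:
    """Cohen's kappa under the assumption of balanced classes.

    With balanced classes, chance agreement is 1 / n_classes, so
    kappa = (accuracy - chance) / (1 - chance).
    """
    accuracy = 1.0 - error_rate
    chance = 1.0 / n_classes  # assumption: equal class frequencies
    return (accuracy - chance) / (1.0 - chance)


# Average error rates quoted in the abstract (as fractions).
for err in (0.0855, 0.1790, 0.2026):
    print(f"error rate {err:.4f} -> kappa {kappa_from_error_rate(err):.3f}")
```

Under that assumption the three error rates map to approximately 0.829, 0.642 and 0.595, close to the reported kappa values; small differences could arise if the published figures were averaged per subject.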

DOI: http://dx.doi.org/10.1186/s12859-021-04091-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8170968
June 2021

Genotype-Structure-Phenotype Correlations in Disease-Associated IGF1R Variants and Similarities to Those in INSR Variants.

Diabetes 2021 Jun 1. Epub 2021 Jun 1.

Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan

We previously reported genotype-phenotype correlations in 12 missense variants causing severe insulin resistance, located in the second and third fibronectin type III (FnIII) domains of the insulin receptor (INSR), which contain the α-β cleavage site and part of the insulin-binding sites. This study aimed to identify genotype-phenotype correlations in FnIII domain variants of IGF1R, a structurally related homolog of INSR, which may be associated with growth retardation, using the recently reported crystal structures of IGF1R. A structural bioinformatics analysis of five previously reported disease-associated heterozygous missense variants and a likely benign variant in the FnIII domains of IGF1R predicted that the disease-associated variants would severely impair the hydrophobic core formation and stability of the FnIII domains or affect the α-β cleavage site, while the likely benign variant would not affect the folding of the domains. A functional analysis of these variants in CHO cells showed impaired receptor processing and autophosphorylation in cells expressing the disease-associated variants, but not in those expressing the wild-type form or the likely benign variant. These results demonstrated genotype-phenotype correlations in the FnIII domain variants of IGF1R, which are presumably consistent with those of INSR and would help in the early diagnosis of patients with disease-associated variants.

DOI: http://dx.doi.org/10.2337/db20-1145
June 2021

De novo ATP1A3 variants cause polymicrogyria.

Sci Adv 2021 Mar 24;7(13). Epub 2021 Mar 24.

Department of Pediatrics, Tottori Prefectural Central Hospital, Tottori 680-0901, Japan.

Polymicrogyria is a common malformation of cortical development whose etiology remains elusive. We conducted whole-exome sequencing for 124 patients with polymicrogyria and identified de novo ATP1A3 variants in eight patients. Mutated ATP1A3 causes functional brain diseases, including alternating hemiplegia of childhood (AHC), rapid-onset dystonia parkinsonism (RDP), and cerebellar ataxia, areflexia, pes cavus, optic nerve atrophy, and sensorineural deafness (CAPOS). However, our patients showed no clinical features of AHC, RDP, or CAPOS and had a completely different phenotype: a severe form of polymicrogyria with epilepsy and developmental delay. Detected variants had different locations in ATP1A3 and different functional properties compared with AHC-, RDP-, or CAPOS-associated variants. In the developing cerebral cortex of mice, radial neuronal migration was impaired in neurons overexpressing the variant of the most severe patients, suggesting that this variant is involved in cortical malformation pathogenesis. We propose a previously unidentified category of polymicrogyria associated with ATP1A3 abnormalities.

DOI: http://dx.doi.org/10.1126/sciadv.abd2368
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7990330
March 2021

Homozygous ADCY5 mutation causes early-onset movement disorder with severe intellectual disability.

Neurol Sci 2021 Mar 11. Epub 2021 Mar 11.

Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan.

Background: Mutations of the ADCY5 have been identified in patients with familial dyskinesia, early-onset autosomal dominant chorea and dystonia, and benign hereditary chorea. Most of the ADCY5 mutations are de novo or transmitted in an autosomal dominant fashion. Only two pedigrees are known to show autosomal recessive inheritance.

Objectives: We report two siblings with severe ID, dystonic movement, and growth failure with unknown etiology.

Methods: We planned a proband-parent approach using whole exome sequencing.

Results: Homozygous mutation in exon 21 of the ADCY5 (p.R1238W) was identified in the siblings. Although their parents were heterozygous for the mutation, they were free from clinical manifestations.

Conclusions: Our results further expand the phenotype/genotype correlations of the ADCY5-related disorders. Mutations of ADCY5 should be considered in pediatric patients with ID and involuntary movement.

DOI: http://dx.doi.org/10.1007/s10072-021-05152-y
March 2021

Deep Learning Approach for Automated Detection of Myopic Maculopathy and Pathologic Myopia in Fundus Images.

Ophthalmol Retina 2021 Feb 18. Epub 2021 Feb 18.

Department of Ophthalmology and Visual Science, Tokyo Medical and Dental University, Tokyo, Japan.

Purpose: To determine whether eyes with pathologic myopia can be identified and whether each type of myopic maculopathy lesion on fundus photographs can be diagnosed by deep learning (DL) algorithms.

Design: A DL algorithm was developed to recognize myopic maculopathy features and to categorize the myopic maculopathy automatically.

Participants: We examined 7020 fundus images from 4432 highly myopic eyes obtained from the Advanced Clinical Center for Myopia.

Methods: Deep learning (DL) algorithms were developed to recognize the key features of myopic maculopathy with 5176 fundus images. These algorithms were also used to develop a Meta-analysis for Pathologic Myopia (META-PM) study categorizing system (CS) by adding a specific processing layer. Models and the system were evaluated on 1844 fundus images. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were used to determine the performance of each DL algorithm. The rate of correct predictions was used to determine the performance of the META-PM study CS.

Main Outcome Measures: Four trained DL models were able to recognize the lesions of myopic maculopathy accurately with high sensitivity and specificity. The META-PM study CS also showed a high accuracy and was qualified to be used in a semiautomated way during screening for myopic maculopathy in highly myopic eyes.

Results: The sensitivity of the DL models was 84.44% for diffuse atrophy, 87.22% for patchy atrophy, 85.10% for macular atrophy, and 37.07% for choroidal neovascularization, and the AUC values were 0.970, 0.978, 0.982, and 0.881, respectively. The rate of total correct predictions from the META-PM study CS was 87.53%, with rates of 90.18%, 95.28%, 97.50%, and 91.14%, respectively, for each type of lesion. The META-PM study CS showed an overall rate of 92.08% in detecting pathologic myopia correctly, which was defined as having myopic maculopathy equal to or more serious than diffuse atrophy.

Conclusions: The novel DL models and system can achieve high sensitivity and specificity in identifying the different types of lesions of myopic maculopathy. These results will assist in the screening for pathologic myopia and subsequent protection of patients against low vision and blindness caused by myopic maculopathy.

DOI: http://dx.doi.org/10.1016/j.oret.2021.02.006
February 2021

A hypomorphic variant in EYS detected by genome-wide association study contributes toward retinitis pigmentosa.

Commun Biol 2021 Jan 29;4(1):140. Epub 2021 Jan 29.

Department of Ophthalmology, Tohoku University Graduate School of Medicine, Aoba-ku, Sendai, Japan.

The genetic basis of Japanese autosomal recessive retinitis pigmentosa (ARRP) remains largely unknown. Herein, we applied a 2-step genome-wide association study (GWAS) in 640 Japanese patients. Meta-GWAS identified three independent peaks at P < 5.0 × 10, all within the major ARRP gene EYS. Two of the three were each in linkage disequilibrium with a different low frequency variant (allele frequency < 0.05); a known founder Mendelian mutation (c.4957dupA, p.S1653Kfs*2) and a non-synonymous variant (c.2528 G > A, p.G843E) of unknown significance. mRNA harboring c.2528 G > A failed to restore rhodopsin mislocalization induced by morpholino-mediated knockdown of eys in zebrafish, consistent with the variant being pathogenic. c.2528 G > A solved an additional 7.0% of Japanese ARRP cases. The third peak was in linkage disequilibrium with a common non-synonymous variant (c.7666 A > T, p.S2556C), possibly representing an unreported disease-susceptibility signal. GWAS successfully unraveled genetic causes of a rare monogenic disorder and identified a high frequency variant potentially linked to development of local genome therapeutics.

DOI: http://dx.doi.org/10.1038/s42003-021-01662-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7846782
January 2021

ELF3 Overexpression as Prognostic Biomarker for Recurrence of Stage II Colorectal Cancer.

In Vivo 2021 Jan-Feb;35(1):191-201

Department of Gastrointestinal Surgery, Medical Research Institute, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan.

Background/aim: Adjuvant chemotherapy for high-risk Stage II colorectal cancer (CRC) is weakly recommended; however, no consensus exists on "high-risk" definition. Prognostic biomarker identification is important for selecting patients with poor prognosis who may benefit from adjuvant chemotherapy.

Materials and Methods: Using microarray data analyses, ELF3 was identified as a candidate gene highly expressed in Stage II CRC with distant recurrences. ELF3 mRNA expression in 168 Stage II CRC patients was subjected to quantitative RT-PCR analysis and ELF3 protein expression in 185 patients was quantified by immunohistochemical analysis. The relationships between mRNA and protein expression levels and patient characteristics were also investigated.

Results: The overall recurrence rate and relapse-free survival were significantly poorer in the ELF3 high-expression than the low-expression group at the mRNA and protein levels. High ELF3 mRNA and protein expression levels were independent poor prognostic factors.

Conclusion: High ELF3 expression was associated with recurrence of Stage II CRC.

DOI: http://dx.doi.org/10.21873/invivo.12248
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7880722
September 2020

Effects of clovamide and its related compounds on the aggregations of amyloid polypeptides.

J Nat Med 2021 Mar 3;75(2):299-307. Epub 2021 Jan 3.

Faculty of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8572, Japan.

Alzheimer's disease (AD) and type 2 diabetes (T2D) are common diseases in the elderly, and the increasing number of patients with these diseases has become a serious health problem worldwide. The aggregation and development of plaque of amyloid polypeptides (amyloid β; Aβ and human islet amyloid polypeptide; hIAPP, amylin) are found in the brains of patients with AD and the pancreas of patients with T2D and are considered to be, in part, the causes of both diseases, respectively. Therefore, preventing amyloid aggregation may be a promising therapeutic strategy for preventing AD and T2D. In addition, the disaggregation of the already aggregated amyloid polypeptides is expected to contribute to the prevention and treatment of both diseases as amyloid polypeptide aggregations begin several decades before the onset of disease. Therefore, in this study, we investigated the hIAPP aggregation inhibitory activity and Aβ42/hIAPP disaggregation activity of clovamide which had shown inhibitory activity against Aβ42 aggregation in our previous studies. In addition, active sites were identified (structure-activity relationship analysis) using clovamide-related compounds in which hydroxyl groups of these compounds were either eliminated or methylated. Our results showed that the compounds with one or two catechol moieties showed strong hIAPP aggregation inhibitory activity and Aβ42/hIAPP disaggregation activity; and the non-catechol type compounds showed little or no activity. This suggests that the catechol moiety is important in amyloid polypeptide aggregation inhibition and disaggregation. Thus, clovamide and its related compounds may be promising therapeutic strategies for inhibiting amyloid polypeptide-related pathology in AD and T2D.

DOI: http://dx.doi.org/10.1007/s11418-020-01467-w
March 2021

Prognosis prediction model for conversion from mild cognitive impairment to Alzheimer's disease created by integrative analysis of multi-omics data.

Alzheimers Res Ther 2020 Nov 10;12(1):145. Epub 2020 Nov 10.

Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, Japan.

Background: Mild cognitive impairment (MCI) is a precursor to Alzheimer's disease (AD), but not all MCI patients develop AD. Biomarkers for early detection of individuals at high risk for MCI-to-AD conversion are urgently required.

Methods: We used blood-based microRNA expression profiles and genomic data of 197 Japanese MCI patients to construct a prognosis prediction model based on a Cox proportional hazard model. We examined the biological significance of our findings with single nucleotide polymorphism-microRNA pairs (miR-eQTLs) by focusing on the target genes of the miRNAs. We investigated functional modules from the target genes with the occurrence of hub genes through a large-scale protein-protein interaction network analysis. We further examined the expression of the genes in 610 blood samples (271 ADs, 248 MCIs, and 91 cognitively normal elderly subjects [CNs]).

Results: The final prediction model, composed of 24 miR-eQTLs and three clinical factors (age, sex, and APOE4 alleles), successfully classified MCI patients into low and high risk of MCI-to-AD conversion (log-rank test P = 3.44 × 10) and achieved a concordance index of 0.702 on an independent test set. Four important hub genes associated with AD pathogenesis (SHC1, FOXO1, GSK3B, and PTEN) were identified in a network-based meta-analysis of miR-eQTL target genes. RNA-seq data from 610 blood samples showed statistically significant differences in PTEN expression between MCI and AD and in SHC1 expression between CN and AD (PTEN, P = 0.023; SHC1, P = 0.049).

Conclusions: Our proposed model was demonstrated to be effective in MCI-to-AD conversion prediction. A network-based meta-analysis of miR-eQTL target genes identified important hub genes associated with AD pathogenesis. Accurate prediction of MCI-to-AD conversion would enable earlier intervention for MCI patients at high risk, potentially reducing conversion to AD.

DOI: http://dx.doi.org/10.1186/s13195-020-00716-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7656734
November 2020

Association of an IGHV3-66 gene variant with Kawasaki disease.

J Hum Genet 2021 May 26;66(5):475-489. Epub 2020 Oct 26.

Institute of Biomedical Sciences, Academia Sinica, Taipei, 11529, Taiwan.

In a meta-analysis of three GWAS for susceptibility to Kawasaki disease (KD) conducted in Japan, Korea, and Taiwan and follow-up studies with a total of 11,265 subjects (3428 cases and 7837 controls), a significantly associated SNV in the immunoglobulin heavy variable gene (IGHV) cluster in 14q32.33 was identified (rs4774175; OR = 1.20, P = 6.0 × 10). Investigation of nonsynonymous SNVs of the IGHV cluster in 9335 Japanese subjects identified the C allele of rs6423677, located in IGHV3-66, as the most significant reproducible association (OR = 1.25, P = 6.8 × 10 in 3603 cases and 5731 controls). We observed highly skewed allelic usage of IGHV3-66, wherein the rs6423677 A allele was nearly abolished in the transcripts in peripheral blood mononuclear cells of both KD patients and healthy adults. Association of the high-expression allele with KD strongly indicates some active roles of B-cells or endogenous immunoglobulins in the disease pathogenesis. Considering that significant association of SNVs in the IGHV region with disease susceptibility was previously known only for rheumatic heart disease (RHD), a complication of acute rheumatic fever (ARF), these observations suggest that common B-cell related mechanisms may mediate the symptomology of KD and ARF as well as RHD.

DOI: http://dx.doi.org/10.1038/s10038-020-00864-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7585995
May 2021

Integrative immunogenomic analysis of gastric cancer dictates novel immunological classification and the functional status of tumor-infiltrating cells.

Clin Transl Immunology 2020 Oct 17;9(10):e1194. Epub 2020 Oct 17.

Department of Immunotherapeutics, The University of Tokyo Hospital, Tokyo, Japan.

Objectives: A better understanding of antitumor immunity will help predict the prognosis of gastric cancer patients and tailor the appropriate therapies in each patient. Therefore, we propose a novel immunological classification of gastric cancer.

Methods: We performed whole-exome sequencing (WES), RNA-Seq and flow cytometry in 29 gastric cancer patients who received surgery. The TCGA data set of 323 gastric cancer patients and RNA-Seq data of 45 patients who received pembrolizumab (Kim et al. Nat Med 2018; 24: 1449-1458) were also analysed.

Results: Immunogram analysis of cancer-immunity interaction of gastric cancer revealed immune signatures of four main types, designated Hot1, Hot2, Intermediate and Cold. Immunologically hot tumors displayed a dysfunctional T-cell signature, while cold tumors had an exclusion signature. Tumor-infiltrating lymphocyte analysis documented T-cell dysfunction with the expression of checkpoint molecules and impaired cytokine production. The T-cell function was more profoundly damaged in Hot1 than Hot2 tumors. Patients in Hot2 subtypes had better survival in our cohort and TCGA cohort. Although these immunological subtypes overlapped to some degree with the molecular subtypes in the TCGA, intratumoral immune responses cannot be predicted solely based on histological or molecular subtyping of gastric cancer. Molecular and immunological classifications complement each other to predict the responses to anti-PD-1 therapy and have the potential to be a biomarker for the treatment of gastric cancer.

Conclusion: The immunological classification of gastric cancer resulted in four subtypes. Hot tumors were further divided into two subtypes, between which the functional status of T cells was different.

DOI: http://dx.doi.org/10.1002/cti2.1194
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7568758
October 2020

Clinical usefulness of multigene screening with phenotype-driven bioinformatics analysis for the diagnosis of patients with monogenic diabetes or severe insulin resistance.

Diabetes Res Clin Pract 2020 Nov 22;169:108461. Epub 2020 Sep 22.

Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Toranomon Hospital, Tokyo, Japan.

Aims: Monogenic diabetes is clinically heterogeneous and differs from common forms of diabetes (type 1 and 2). We aimed to investigate the clinical usefulness of a comprehensive genetic testing system, comprised of targeted next-generation sequencing (NGS) with phenotype-driven bioinformatics analysis in patients with monogenic diabetes, which uses patient genotypic and phenotypic data to prioritize potentially causal variants.

Methods: We performed targeted NGS of 383 genes associated with monogenic diabetes or common forms of diabetes in 13 Japanese patients with suspected (n = 10) or previously diagnosed (n = 3) monogenic diabetes or severe insulin resistance. We performed in silico structural analysis and phenotype-driven bioinformatics analysis of candidate variants from NGS data.

Results: Among the patients suspected of having monogenic diabetes or insulin resistance, we diagnosed 3 patients as subtypes of monogenic diabetes due to disease-associated variants of INSR, LMNA, and HNF1B. Additionally, in 3 other patients, we detected rare variants with potential phenotypic effects. Notably, we identified a novel missense variant in TBC1D4 and an MC4R variant, which together may cause a mixed phenotype of severe insulin resistance.

Conclusions: This comprehensive approach could assist in the early diagnosis of patients with monogenic diabetes and facilitate the provision of tailored therapy.

DOI: http://dx.doi.org/10.1016/j.diabres.2020.108461
November 2020

Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles.

Anal Biochem 2021 Jan 15;612:113954. Epub 2020 Sep 15.

Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD, Australia.

Background: DNA-binding proteins perform important roles in cellular processes and are involved in many biological activities. These proteins include crucial protein-DNA binding domains and can interact with single-stranded or double-stranded DNA, and are accordingly classified as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Computational prediction of SSBs and DSBs helps in annotating protein functions and understanding protein-binding domains.

Results: Performance is reported using the DNA-binding protein dataset that was recently introduced by Wang et al., [1]. The proposed method achieved a sensitivity of 0.600, specificity of 0.792, AUC of 0.758, MCC of 0.369, accuracy of 0.744, and F-measure of 0.536, on the independent test set.

Conclusion: The proposed method, which uses hidden Markov model (HMM) profiles for feature extraction, outperformed the benchmark method in the literature and achieved an overall improvement of approximately 3%. The source code and supplementary information of the proposed method are available at https://github.com/roneshsharma/Predict-DNA-binding-proteins/wiki.
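
The metrics listed in the Results all derive from a binary confusion matrix. As a minimal sketch, the following computes them from hypothetical counts; the counts are invented for illustration and are not the paper's data, and `binary_metrics` is a hypothetical helper name. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to illustrate the formulas.
for name, value in binary_metrics(tp=30, fp=20, tn=80, fn=20).items():
    print(f"{name}: {value:.3f}")
```

MCC is useful alongside accuracy here because it stays informative when the two classes are unevenly represented.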

DOI: http://dx.doi.org/10.1016/j.ab.2020.113954
January 2021

Predicting protein-peptide binding sites with a deep convolutional neural network.

J Theor Biol 2020 Jul 13;496:110278. Epub 2020 Apr 13.

Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; CREST, JST, Tokyo 113-8510, Japan; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji.

Motivation: Interactions between proteins and peptides influence biological functions. Predicting such bio-molecular interactions can lead to faster disease prevention and help in drug discovery. Experimental methods for determining protein-peptide binding sites are costly and time-consuming. Therefore, computational methods have become prevalent. However, existing models show extremely low detection rates of actual peptide binding sites in proteins. To address this problem, we employed a two-stage technique - first, we extracted the relevant features from protein sequences and transformed them into images applying a novel method and then, we applied a convolutional neural network to identify the peptide binding sites in proteins.

Results: We found that our approach achieves 67% sensitivity or recall (true positive rate) surpassing existing methods by over 35%.

DOI: http://dx.doi.org/10.1016/j.jtbi.2020.110278
July 2020

Variants encoding a restricted carboxy-terminal domain of SLC12A2 cause hereditary hearing loss in humans.

PLoS Genet 2020 Apr 15;16(4):e1008643. Epub 2020 Apr 15.

Division of Hearing and Balance Research, National Institute of Sensory Organs, National Hospital Organization Tokyo Medical Center, Meguro, Tokyo, Japan.

Hereditary hearing loss is challenging to diagnose because of the heterogeneity of the causative genes. Further, some genes involved in hereditary hearing loss have yet to be identified. Using whole-exome analysis of three families with congenital, severe-to-profound hearing loss, we identified a missense variant of SLC12A2 in five affected members of one family showing a dominant inheritance mode, along with de novo splice-site and missense variants of SLC12A2 in two sporadic cases, as promising candidates associated with hearing loss. Furthermore, we detected another de novo missense variant of SLC12A2 in a sporadic case. SLC12A2 encodes Na+, K+, 2Cl- cotransporter (NKCC) 1 and plays critical roles in the homeostasis of K+-enriched endolymph. Slc12a2-deficient mice have congenital, profound deafness; however, no human variant of SLC12A2 has been reported as associated with hearing loss. All identified SLC12A2 variants mapped to exon 21 or its 3'-splice site. In vitro analysis indicated that the splice-site variant generates an exon 21-skipped SLC12A2 mRNA transcript expressed at much lower levels than the exon 21-included transcript in the cochlea, suggesting a tissue-specific role for the exon 21-encoded region in the carboxy-terminal domain. In vitro functional analysis demonstrated that Cl- influx was significantly decreased in all SLC12A2 variants studied. Immunohistochemistry revealed that SLC12A2 is located on the plasma membrane of several types of cells in the cochlea, including the strial marginal cells, which are critical for endolymph homeostasis. Overall, this study suggests that variants affecting exon 21 of the SLC12A2 transcript are responsible for hereditary hearing loss in humans.

DOI: http://dx.doi.org/10.1371/journal.pgen.1008643
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7159186
April 2020

Unveiling synapse pathology in spinal bulbar muscular atrophy by genome-wide transcriptome analysis of purified motor neurons derived from disease specific iPSCs.

Mol Brain 2020 Feb 19;13(1):18. Epub 2020 Feb 19.

Department of Neurology, Aichi Medical University School of Medicine, 1-1 Yazakokarimata, Nagakute, Aichi, 480-1195, Japan.

Spinal bulbar muscular atrophy (SBMA) is an adult-onset, slowly progressive motor neuron disease caused by abnormal CAG repeat expansion in the androgen receptor (AR) gene. Although ligand (testosterone)-dependent mutant AR aggregation has been shown to play important roles in motor neuronal degeneration by the analyses of transgenic mouse models and in vitro cell culture models, the underlying disease mechanisms remain to be fully elucidated because of the discrepancy between model mice and SBMA patients. Thus, novel human disease models that recapitulate SBMA patients' pathology more accurately are required for more precise pathophysiological analysis and the development of novel therapeutics. Here, we established disease specific iPSCs from four SBMA patients, and differentiated them into spinal motor neurons. To investigate motor neuron specific pathology, we purified iPSC-derived motor neurons using flow cytometry and cell sorting based on the motor neuron specific reporter, HB9::Venus, and proceeded to genome-wide transcriptome analysis by RNA sequencing. The results revealed the involvement of the pathology associated with synapses, epigenetics, and endoplasmic reticulum (ER) in SBMA. Notably, we demonstrated the involvement of the neuromuscular synapse via significant upregulation of Synaptotagmin, R-Spondin2 (RSPO2), and WNT ligands in motor neurons derived from SBMA patients, which are known to be associated with neuromuscular junction (NMJ) formation and acetylcholine receptor (AChR) clustering. This aberrant gene expression in neuromuscular synapses might represent a novel therapeutic target for SBMA.
DOI: http://dx.doi.org/10.1186/s13041-020-0561-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7029484
February 2020

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.

Nature 2020 02 5;578(7793):102-111. Epub 2020 Feb 5.

Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA.

The discovery of drivers of cancer has traditionally focused on protein-coding genes. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers, raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
DOI: http://dx.doi.org/10.1038/s41586-020-1965-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054214
February 2020

Quantification of multicellular colonization in tumor metastasis using exome-sequencing data.

Int J Cancer 2020 05 15;146(9):2488-2497. Epub 2020 Feb 15.

Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.

Metastasis is a major cause of cancer-related mortality, and it is essential to understand how metastasis occurs in order to overcome it. One relevant question is the origin of a metastatic tumor cell population. Although the hypothesis of a single-cell origin for metastasis from a primary tumor has long been prevalent, several recent studies using mouse models have supported a multicellular origin of metastasis. Human bulk whole-exome sequencing (WES) studies have also demonstrated a multiple "clonal" origin of metastasis, with different mutational compositions. However, no study has yet rigorously determined how many founder cells colonize a metastatic tumor. To address this question, under the metastatic model of "single bottleneck followed by rapid growth," we developed a method to quantify the "founder cell population size" in a metastasis using paired WES data from primary and metachronous metastatic tumors. Simulation studies demonstrated that the proposed method gives unbiased results with sufficient accuracy in the range of realistic settings. When the proposed method was applied to real WES data from four colorectal cancer patients, all samples supported a multicellular origin of metastasis, and the founder size was quantified as ranging from 3 to 17 cells. Such a wide range of founder sizes estimated by the proposed method suggests that there are large variations in genetic similarity between primary and metastatic tumors in the same subjects, which may explain the observed (dis)similarity of drug responses between tumors.
DOI: http://dx.doi.org/10.1002/ijc.32910
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079087
May 2020

Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix.

BMC Mol Cell Biol 2019 Dec 20;20(Suppl 2):57. Epub 2019 Dec 20.

Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan.

Background: Post-translational modification (PTM) is a biological process that modifies the proteome and thereby affects normal cell biology and, hence, pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as the cost, time consumption and inefficiency of experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification.

Results: We propose a new predictor in this paper called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. A benchmark dataset containing experimentally labelled sites is employed for this purpose, and profile bigram occurrences are calculated from position specific scoring matrices of amino acids in the protein sequences. The sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under the ROC curve of this work are 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively.

Conclusions: The proposed predictor, based on the feature of evolutionary information and support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
DOI: http://dx.doi.org/10.1186/s12860-019-0240-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923822
December 2019
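
The Bigram-PGK abstract above mentions profile bigram occurrences calculated from position specific scoring matrices (PSSMs). A minimal sketch of one common formulation is given here, assuming the bigram entry for columns m and n is the sum over consecutive residue positions i of PSSM[i][m] * PSSM[i+1][n]. The function name `profile_bigram`, the toy two-column matrix, and the absence of any normalization are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def profile_bigram(pssm: List[List[float]]) -> List[List[float]]:
    """Return the n_cols x n_cols matrix B with
    B[m][n] = sum over i of pssm[i][m] * pssm[i + 1][n].

    Assumed formulation for illustration; the published method may
    normalize or window the PSSM differently.
    """
    if not pssm:
        return []
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over every pair of consecutive residue positions.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row[m] * next_row[n]
    return bigram


# Toy 3-residue "PSSM" over a hypothetical 2-letter alphabet
# (a real PSSM has 20 columns, giving a 400-value feature vector).
toy_pssm = [
    [0.5, 0.5],
    [1.0, 0.0],
    [0.0, 1.0],
]
toy_bigram = profile_bigram(toy_pssm)

# Flatten row by row to obtain a fixed-length feature vector for a classifier.
feature_vector = [value for row in toy_bigram for value in row]
```

Because the result has a fixed size regardless of sequence length, it can be fed directly to a classifier such as the support vector machine named in the abstract.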

A comparison of machine learning classifiers for dementia with Lewy bodies using miRNA expression data.

BMC Med Genomics 2019 10 30;12(1):150. Epub 2019 Oct 30.

Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan.

Background: Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans following Alzheimer's disease (AD). Present clinical diagnosis of DLB has high specificity and low sensitivity and finding potential biomarkers of prodromal DLB is still challenging. MicroRNAs (miRNAs) have recently received a lot of attention as a source of novel biomarkers.

Methods: In this study, using serum miRNA expression of 478 Japanese individuals, we investigated potential miRNA biomarkers and constructed an optimal risk prediction model based on several machine learning methods: penalized regression, random forest, support vector machine, and gradient boosting decision tree.

Results: The final risk prediction model, constructed via a gradient boosting decision tree using 180 miRNAs and two clinical features, achieved an accuracy of 0.829 on an independent test set. We further predicted candidate target genes from the miRNAs. Gene set enrichment analysis of the miRNA target genes revealed 6 functional genes included in the DHA signaling pathway associated with DLB pathology. Two of them were further supported by gene-based association studies using a large number of single nucleotide polymorphism markers (BCL2L1: P = 0.012, PIK3R2: P = 0.021).

Conclusions: Our proposed prediction model provides an effective tool for DLB classification. Also, a gene-based association test of rare variants revealed that BCL2L1 and PIK3R2 were statistically significantly associated with DLB.
DOI: http://dx.doi.org/10.1186/s12920-019-0607-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6822471
October 2019

DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture.

Sci Rep 2019 08 6;9(1):11399. Epub 2019 Aug 6.

Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.

It is critical, but difficult, to catch the small variation in genomic or other kinds of data that differentiates phenotypes or categories. A plethora of data is available, but the information from its genes or elements is spread arbitrarily, making it challenging to extract relevant details for identification. However, an arrangement of similar genes into clusters makes these differences more accessible and allows for more robust identification of hidden mechanisms (e.g. pathways) than dealing with elements individually. Here we propose DeepInsight, which converts non-image samples into a well-organized image form. Thereby, the power of a convolution neural network (CNN), including GPU utilization, can be realized for non-image samples. Furthermore, DeepInsight enables feature extraction through the application of CNN to non-image samples to seize imperative information, and has shown promising results. To our knowledge, this is the first work to apply CNN simultaneously to different kinds of non-image datasets: RNA-seq, vowels, text, and artificial.
DOI: http://dx.doi.org/10.1038/s41598-019-47765-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6684600
August 2019
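
The DeepInsight abstract above describes converting a non-image sample into an image in which similar features sit close together. The following is a rough sketch of the final pixel-mapping step only, under several assumptions: each feature already has a 2-D coordinate (for example from a dimensionality reduction over features, which is not shown), coordinates are scaled linearly to the grid, and features landing on the same pixel are averaged. The function name `features_to_image` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def features_to_image(
    values: List[float],
    coords: List[Tuple[float, float]],
    size: int,
) -> List[List[float]]:
    """Place each feature value at the pixel given by its 2-D coordinate.

    Features that fall on the same pixel are averaged (an assumption made
    for this sketch); pixels with no feature stay at 0.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Fall back to a span of 1.0 if all features share one coordinate.
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1.0

    sums: Dict[Tuple[int, int], float] = {}
    counts: Dict[Tuple[int, int], int] = {}
    for value, (x, y) in zip(values, coords):
        # Scale to [0, size) and clamp the maximum onto the last pixel.
        col = min(size - 1, int((x - x_min) / x_span * size))
        row = min(size - 1, int((y - y_min) / y_span * size))
        sums[(row, col)] = sums.get((row, col), 0.0) + value
        counts[(row, col)] = counts.get((row, col), 0) + 1

    image = [[0.0] * size for _ in range(size)]
    for (row, col), total in sums.items():
        image[row][col] = total / counts[(row, col)]
    return image


# Four hypothetical features; the first two are "similar" and share a pixel.
coords = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.0, 0.0)]
sample = [2.0, 4.0, 5.0, 7.0]
image = features_to_image(sample, coords, size=2)
```

The same coordinates are reused for every sample, so each sample becomes an image with a consistent layout that a CNN can consume.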

Brain wave classification using long short-term memory network based OPTICAL predictor.

Sci Rep 2019 06 24;9(1):9153. Epub 2019 Jun 24.

Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan.

Brain-computer interface (BCI) systems with the ability to classify brain waves with greater accuracy are highly desirable. To this end, a number of techniques have been proposed that aim to classify brain waves with high accuracy. However, the ability to classify brain waves and its implementation in real time are still limited. In this study, we introduce a novel scheme for classifying motor imagery (MI) tasks using electroencephalography (EEG) signals that can be implemented in real time with high classification accuracy between different MI tasks. We propose a new predictor, OPTICAL, that uses a combination of common spatial pattern (CSP) and long short-term memory (LSTM) network for obtaining improved MI EEG signal classification. A sliding window approach is proposed to obtain the time-series input from the spatially filtered data, which becomes input to the LSTM network. Moreover, instead of using LSTM directly for classification, we use the regression-based output of the LSTM network as one of the features for classification. In addition, linear discriminant analysis (LDA) is used to reduce the dimensionality of the CSP variance-based features. The features in the reduced dimensional plane after performing LDA are used as input to the support vector machine (SVM) classifier together with the regression-based feature obtained from the LSTM network. The regression-based feature further boosts the performance of the proposed OPTICAL predictor. OPTICAL showed significant improvement in the ability to accurately classify left and right-hand MI tasks on two publicly available datasets. The improvements in the average misclassification rates are 3.09% and 2.07% for BCI Competition IV Dataset I and GigaDB dataset, respectively. The Matlab code is available at https://github.com/ShiuKumar/OPTICAL .
DOI: http://dx.doi.org/10.1038/s41598-019-45605-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6591300
June 2019
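
The OPTICAL abstract above mentions a sliding window approach that turns spatially filtered data into time-series input for the LSTM network. A minimal sketch of such windowing on a single channel is shown here. The window width, the step, the function name `sliding_windows`, and the sample values are illustrative assumptions; the published Matlab code may segment the signal differently.

```python
from typing import List, Sequence


def sliding_windows(
    signal: Sequence[float], width: int, step: int = 1
) -> List[List[float]]:
    """Cut a 1-D signal into overlapping windows of `width` samples,
    advancing `step` samples between consecutive windows.
    Trailing samples that do not fill a whole window are dropped."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [
        list(signal[start:start + width])
        for start in range(0, len(signal) - width + 1, step)
    ]


# Hypothetical values standing in for one spatially filtered channel.
filtered = [0.1, 0.4, 0.9, 1.6, 2.5, 3.6]
windows = sliding_windows(filtered, width=3, step=2)
```

Each window would then be one time step (or one sequence, depending on the design) presented to the recurrent network.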

Author Correction: A missense mutation in the HECT domain of NEDD4L identified in a girl with periventricular nodular heterotopia, polymicrogyria, and cleft palate.

J Hum Genet 2019 Jul;64(7):701-702

Department of Pediatrics and Neonatology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan.

Since the publication of this article, it has been brought to our attention that the identified mutation (NM_015277: c.2617 G > A; p.Glu873Lys) is identical to the mutation (NM_001144967: c.2677 G > A; p.Glu893Lys) reported by Broix et al. (Nature Genetics 48, 1349-1358, 2016 https://doi.org/10.1038/ng.3676 ). Therefore, the mutation is not novel but recurrent. Accordingly, the word "novel" should be deleted throughout the article, including the title. Thus, the title should read "A missense mutation in the HECT domain of NEDD4L identified in a girl with periventricular nodular heterotopia, polymicrogyria, and cleft palate."
DOI: http://dx.doi.org/10.1038/s10038-019-0610-8
July 2019
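
The correction above equates two descriptions of one mutation on different transcripts. The arithmetic can be checked with a short sketch: a 60-base offset in coding position corresponds to a 20-codon offset in the protein, and a G>A change at the first base of a glutamate codon yields lysine under the standard genetic code. The helper names are hypothetical, and the abstract does not state whether the affected codon is GAA or GAG, so both are checked.

```python
def codon_number(cdna_position: int) -> int:
    """1-based codon index that contains a 1-based coding-DNA position."""
    return (cdna_position - 1) // 3 + 1


def offset_in_codon(cdna_position: int) -> int:
    """0-based position of the base inside its codon (0, 1 or 2)."""
    return (cdna_position - 1) % 3


def mutate(codon: str, offset: int, new_base: str) -> str:
    """Return the codon with the base at `offset` replaced."""
    return codon[:offset] + new_base + codon[offset + 1:]


# Only the codons needed here, taken from the standard genetic code.
CODON_TABLE = {"GAA": "Glu", "GAG": "Glu", "AAA": "Lys", "AAG": "Lys"}

pos_nm_015277 = 2617      # c.2617 G > A, reported as p.Glu873Lys
pos_nm_001144967 = 2677   # c.2677 G > A, reported as p.Glu893Lys

nucleotide_shift = pos_nm_001144967 - pos_nm_015277
codon_shift = codon_number(pos_nm_001144967) - codon_number(pos_nm_015277)

# Both glutamate codons become lysine codons when the first base changes G -> A.
products = {
    codon: CODON_TABLE[mutate(codon, offset_in_codon(pos_nm_015277), "A")]
    for codon in ("GAA", "GAG")
}
```

The codon indices computed this way agree with the residue numbers 873 and 893 quoted in the correction, which is consistent with the two transcripts differing by 20 codons upstream of the mutation.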

HseSUMO: Sumoylation site prediction using half-sphere exposures of amino acids residues.

BMC Genomics 2019 Apr 18;19(Suppl 9):982. Epub 2019 Apr 18.

Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.

Background: Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remains challenging. Therefore, the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research.

Results: This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78-0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset.

Conclusions: The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method. The extracted data of this study can be accessed at https://github.com/YosvanyLopez/HseSUMO .
DOI: http://dx.doi.org/10.1186/s12864-018-5206-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402407
April 2019

Risk prediction models for dementia constructed by supervised principal component analysis using miRNA expression data.

Commun Biol 2019 25;2:77. Epub 2019 Feb 25.

Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan.

Alzheimer's disease (AD) is the most common subtype of dementia, followed by vascular dementia (VaD) and dementia with Lewy bodies (DLB). Recently, microRNAs (miRNAs) have received a lot of attention as novel biomarkers for dementia. Here, using serum miRNA expression of 1,601 Japanese individuals, we investigated potential miRNA biomarkers and constructed risk prediction models, based on a supervised principal component analysis (PCA) logistic regression method, according to the subtype of dementia. On a validation cohort, the final risk prediction models achieved accuracies of 0.873 in AD using 78 miRNAs, 0.836 in VaD using 86 miRNAs, and 0.825 in DLB using 110 miRNAs. To our knowledge, this is the first report applying miRNA-based risk prediction models to a dementia prospective cohort. Our study demonstrates our models to be effective in prospective disease risk prediction, and with further improvement they may contribute to practical clinical use in dementia.
DOI: http://dx.doi.org/10.1038/s42003-019-0324-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6389908
April 2020

Discovering MoRFs by trisecting intrinsically disordered protein sequence into terminals and middle regions.

BMC Bioinformatics 2019 Feb 4;19(Suppl 13):378. Epub 2019 Feb 4.

Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan.

Background: Molecular Recognition Features (MoRFs) are short protein regions present in intrinsically disordered protein (IDP) sequences. MoRFs interact with structured partner proteins and, upon interaction, undergo a disorder-to-order transition to perform various biological functions. Analyses of MoRFs are important for understanding their function.

Results: Performance is reported using the MoRF dataset that has previously been used to compare the other existing MoRF predictors. The performance obtained in this study is comparable to that of the benchmarked OPAL predictor: OPAL achieved an AUC of 0.815, whereas the model in this study achieved an AUC of 0.819 on the TEST set.

Conclusion: Achieving comparable performance, the proposed method can be used as an alternative approach for MoRF prediction.
DOI: http://dx.doi.org/10.1186/s12859-018-2396-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7653905
February 2019

de novo MYCN gain-of-function mutation in a patient with a novel megalencephaly syndrome.

J Med Genet 2019 06 20;56(6):388-395. Epub 2018 Dec 20.

Department of Pediatrics and Neonatology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan.

Background: In this study, we aimed to identify the gene abnormality responsible for pathogenicity in an individual with an undiagnosed neurodevelopmental disorder with megalencephaly, ventriculomegaly, hypoplastic corpus callosum, intellectual disability, polydactyly and neuroblastoma. We then explored the underlying molecular mechanism.

Methods: Trio-based, whole-exome sequencing was performed to identify disease-causing gene mutation. Biochemical and cell biological analyses were carried out to elucidate the pathophysiological significance of the identified gene mutation.

Results: We identified a heterozygous missense mutation (c.173C>T; p.Thr58Met) in the MYCN gene, at the Thr58 phosphorylation site essential for ubiquitination and subsequent MYCN degradation. The mutant MYCN (MYCN-T58M) was non-phosphorylatable at Thr58 and subsequently accumulated in cells and appeared to induce CCND1 and CCND2 expression in neuronal progenitor and stem cells in vitro. Overexpression of Mycn mimicking the p.Thr58Met mutation also promoted neuronal cell proliferation, and affected neuronal cell migration during corticogenesis in mouse embryos.

Conclusions: We identified a de novo c.173C>T mutation in MYCN which leads to stabilisation and accumulation of the MYCN protein, leading to prolonged CCND1 and CCND2 expression. This may promote neurogenesis in the developing cerebral cortex, leading to megalencephaly. While loss-of-function mutations in MYCN are known to cause Feingold syndrome, this is the first report of a germline gain-of-function mutation in MYCN identified in a patient with a novel megalencephaly syndrome similar to, but distinct from, CCND2-related megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome. The data obtained here provide new insight into the critical role of MYCN in brain development, as well as the consequences of MYCN defects.
DOI: http://dx.doi.org/10.1136/jmedgenet-2018-105487
June 2019

PhoglyStruct: Prediction of phosphoglycerylated lysine residues using structural properties of amino acids.

Sci Rep 2018 12 18;8(1):17923. Epub 2018 Dec 18.

Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan.

The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. There have been many recently reported PTMs, but lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, the experimental method for detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Instead, computational methods are being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to distinguish phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct .
DOI: http://dx.doi.org/10.1038/s41598-018-36203-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6299098
December 2018

SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure.

Molecules 2018 Dec 10;23(12). Epub 2018 Dec 10.

Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.

Post-translational modification (PTM) is defined as the modification of amino acids along the protein sequence after the translation process. These modifications significantly impact the functioning of proteins. Therefore, a comprehensive understanding of the underlying mechanism of PTMs is critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.
DOI: http://dx.doi.org/10.3390/molecules23123260
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320791
December 2018