Publications by authors named "Hanaa Torkey"

4 Publications

A novel deep autoencoder based survival analysis approach for microarray dataset.

PeerJ Comput Sci 2021 Apr 21;7:e492. Epub 2021 Apr 21.

Faculty of Engineering, Delta University for Science and Technology, Gamasa, Egypt.

Background: Breast cancer is one of the major causes of mortality globally. Different machine learning (ML) techniques have therefore been deployed for diagnosis and for computing survival. Survival analysis methods are used to compute the survival probability and to identify the most important factors affecting that probability. Most survival analysis methods are designed for clinical features (up to hundreds), so applying methods such as Cox regression to RNA-seq microarray data with many features (up to thousands) is considered a major challenge.

Methods: In this paper, we propose a novel approach that applies an autoencoder to reduce the number of features. The approach reconstructs the features and removes both noise within the data and features with zero variance across the samples, which facilitates extraction of the features with the highest variances (across the samples) that most influence the survival probabilities. It then estimates the survival probability for each patient by applying random survival forests and Cox regression. Because applying the autoencoder to thousands of features takes a long time, the model is run on a graphics processing unit (GPU) to speed up the process. The model is evaluated and compared with existing models on three different datasets in terms of run time, concordance index, and calibration curve, and the genes most related to survival are identified. Finally, the biological pathways and GO molecular functions are analyzed for these significant genes.

Results: We fine-tuned our autoencoder model on RNA-seq data from three datasets to train the weights in our survival prediction model, then used different samples in each dataset to test the model. The results show that the proposed AutoCox and AutoRandom algorithms, which are based on our autoencoder feature selection approach, achieve better concordance index results than the most recent deep learning approaches when applied to each dataset. A weight is computed from our autoencoder model for each resulting gene. The weights show the degree of effect of each gene on the survival probability. For instance, four of the most survival-related, experimentally validated genes are at the top of our list of discovered gene weights: PTPRG, MYST1, BG683264, and AK094562 for the breast cancer gene expression dataset. Our approach improves survival analysis by speeding up the process, enhancing the prediction accuracy, and reducing the error rate in the estimated survival probability.
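
The variance-based filtering mentioned in the Methods can be illustrated with a small sketch. This is not the authors' autoencoder; it only shows the idea of dropping zero-variance features and ranking the rest by variance across samples. The function name, gene names, and toy values are hypothetical.

```python
from statistics import pvariance


def rank_features_by_variance(samples, feature_names):
    """Drop zero-variance features and rank the rest, highest variance first."""
    # Transpose the sample-by-feature matrix into one tuple of values per feature.
    columns = list(zip(*samples))
    variances = {name: pvariance(col) for name, col in zip(feature_names, columns)}
    # A feature that is constant across all samples carries no information.
    kept = {name: v for name, v in variances.items() if v > 0}
    return sorted(kept, key=kept.get, reverse=True)


# Toy expression matrix: rows are samples, columns are hypothetical genes.
genes = ["gene_a", "gene_b", "gene_c"]
expression = [
    [1.0, 5.0, 2.0],
    [1.0, 9.0, 2.5],
    [1.0, 1.0, 3.0],
]
ranked = rank_features_by_variance(expression, genes)
print(ranked)
```

Here gene_a is constant across the samples and is removed, while the remaining genes are ordered by how much they vary.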

Source
DOI: http://dx.doi.org/10.7717/peerj-cs.492
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080419
April 2021

Coronavirus disease 2019 (COVID-19): survival analysis using deep learning and Cox regression model.

Pattern Anal Appl 2021 Feb 15:1-13. Epub 2021 Feb 15.

Faculty of Engineering, Delta University for Science and Technology, Gamasa, Egypt.

Coronavirus disease (COVID-19) is one of the most serious problems facing the world and has brought daily life to a halt across the globe. It has spread so widely that hospital places are not available for all patients. Therefore, most hospitals accept patients whose recovery rate is high. Machine learning techniques and artificial intelligence have been deployed for computing infection risks and for performing survival analysis and classification. Survival analysis (time-to-event analysis) is widely used in many areas such as engineering and medicine. This paper presents two systems, Cox_COVID_19 and Deep_Cox_COVID_19, that are based on Cox regression to perform survival analysis for COVID-19, help hospitals choose patients with better chances of survival, and predict the most important symptoms (features) affecting survival probability. Cox_COVID_19 is based on Cox regression, and Deep_Cox_COVID_19 combines an autoencoder deep neural network with Cox regression to enhance prediction accuracy. A clinical dataset of 1085 COVID-19 patients is used. The results show that applying an autoencoder to the data to reconstruct features before applying the Cox regression algorithm improves the results by increasing concordance, accuracy, and precision. The Deep_Cox_COVID_19 system has a concordance of 0.983 for training and 0.999 for testing, whereas the Cox_COVID_19 system has a concordance of 0.923 for training and 0.896 for testing. The most important features affecting mortality are age, muscle pain, pneumonia, and throat pain. Both prediction systems can predict the survival probability and present significant symptoms (features) that differentiate severe cases and death cases, but the accuracy of Deep_Cox_COVID_19 exceeds that of Cox_COVID_19. Both systems can provide definite information for doctors about detection and the intervention to be taken, which can reduce mortality.
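
The concordance values reported above measure how often a model ranks patients in the right order of risk. A minimal sketch of Harrell's concordance index follows, assuming a simplified treatment of tied event times; the data are made up for illustration and are unrelated to the 1085-patient dataset.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores order correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up times, event indicators (1 = death observed,
# 0 = censored), and hypothetical model risk scores.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c = concordance_index(times, events, risk)
print(c)
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering.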

Source
DOI: http://dx.doi.org/10.1007/s10044-021-00958-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7883884
February 2021

MicroTarget: MicroRNA target gene prediction approach with application to breast cancer.

J Bioinform Comput Biol 2017 Aug 2;15(4):1750013. Epub 2017 May 2.

Informatics and Systems Department, National Research Centre, Cairo, Egypt.

MicroRNAs are known to play an essential role in gene regulation in plants and animals. The standard method for understanding microRNA-gene interactions is randomized controlled perturbation experiments. These experiments are costly and time-consuming, so the use of computational methods is essential. Several computational methods have been developed to discover microRNA target genes. However, these methods are limited by the features they use for prediction. The commonly used features are complementarity to the seed region of the microRNA, site accessibility, and evolutionary conservation. Unfortunately, not all microRNA target sites are conserved or adhere to exact seed complementarity, and relying on site accessibility does not guarantee that the interaction exists. Moreover, studying regulatory interactions using expression data for microRNAs and mRNAs from the same tissue is necessary to understand the specificity of regulation and function. We developed MicroTarget to predict a microRNA-gene regulatory network using heterogeneous data sources, especially gene and microRNA expression data. First, MicroTarget employs expression data to learn a candidate target set for each microRNA. Then, it uses sequence data to provide evidence of direct interactions. MicroTarget scores and ranks the predicted targets based on a set of features. The predicted targets overlap with many of the experimentally validated ones. Our results indicate that using expression data in target prediction is more accurate in terms of specificity and sensitivity. Available at: https://bioinformatics.cs.vt.edu/~htorkey/microTarget.
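
The two-step idea described above (expression data first, then sequence evidence) can be sketched roughly as follows. This is not MicroTarget's actual algorithm: it assumes negative Pearson correlation as a stand-in for the expression-based candidate step and an exact match to the seed (microRNA nucleotides 2 to 8) as the sequence evidence. All function names, the threshold, and the toy genes are hypothetical; the microRNA sequence is the commonly cited let-7a sequence.

```python
from math import sqrt

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def seed_site(mirna):
    """Reverse complement of the seed (nucleotides 2-8), as it would appear in an mRNA."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def predict_targets(mirna_seq, mirna_expr, genes, threshold=-0.5):
    """Keep genes that are anti-correlated with the microRNA and contain a seed site."""
    site = seed_site(mirna_seq)
    hits = []
    for name, (utr, expr) in genes.items():
        r = pearson(mirna_expr, expr)
        if r <= threshold and site in utr:
            hits.append((name, r))
    # Most strongly anti-correlated candidates first.
    return sorted(hits, key=lambda hit: hit[1])


let7 = "UGAGGUAGUAGGUUGUAUAGUU"
mirna_expr = [1.0, 2.0, 3.0, 4.0]
genes = {
    "gene_x": ("AAGCUACCUCAUU", [8.0, 6.0, 5.0, 2.0]),   # anti-correlated, has site
    "gene_y": ("GGCUACCUCGGAA", [1.0, 2.0, 2.0, 4.0]),   # has site, positively correlated
    "gene_z": ("AAGGCCUUAAGGCC", [9.0, 7.0, 4.0, 1.0]),  # anti-correlated, no site
}
targets = predict_targets(let7, mirna_expr, genes)
print([name for name, _ in targets])
```

Only the gene that satisfies both the expression filter and the sequence filter is reported, which mirrors the motivation for combining the two data sources.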

Source
DOI: http://dx.doi.org/10.1142/S0219720017500135
August 2017

A system to automatically classify and name any individual genome-sequenced organism independently of current biological classification and nomenclature.

PLoS One 2014 Feb 21;9(2):e89142. Epub 2014 Feb 21.

Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, Virginia, United States of America; This Genomic Life Inc., Blacksburg, Virginia, United States of America.

A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today's speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research.
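
One way to picture automatic, similarity-based code assignment is sketched below. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the code is a tuple with one position per similarity threshold, a new genome inherits the leading positions of its most similar already-coded genome up to the last threshold it meets, and the next position receives a fresh number. The threshold values and genome names are hypothetical.

```python
# Hypothetical similarity cutoffs, one per code position (not from the paper).
THRESHOLDS = [0.60, 0.80, 0.95, 0.99]


def assign_code(similarities, codes):
    """Return a code for a new genome.

    similarities: {existing_genome_id: similarity to the new genome}
    codes:        {existing_genome_id: code tuple already assigned}
    """
    if not codes:
        return (0,) * len(THRESHOLDS)  # the first genome anchors the scheme
    best = max(similarities, key=similarities.get)
    sim = similarities[best]
    # Count how many leading thresholds the best match satisfies.
    shared = 0
    for cutoff in THRESHOLDS:
        if sim < cutoff:
            break
        shared += 1
    if shared == len(THRESHOLDS):
        return codes[best]  # similar enough at every level: same code
    prefix = codes[best][:shared]
    # Numbers already used at the first differing position under this prefix.
    used = {code[shared] for code in codes.values() if code[:shared] == prefix}
    padding = (0,) * (len(THRESHOLDS) - shared - 1)
    return prefix + (max(used) + 1,) + padding


codes = {}
codes["g1"] = assign_code({}, codes)
codes["g2"] = assign_code({"g1": 0.97}, codes)
codes["g3"] = assign_code({"g1": 0.70, "g2": 0.72}, codes)
print(codes)
```

Because existing codes are never rewritten when a new genome arrives, a scheme of this kind illustrates how codes could stay stable as new diversity is discovered, at the cost of some dependence on the order of assignment.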

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089142
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3931686
January 2015