Publications by authors named "Syed Ahmad Chan Bukhari"

14 Publications


The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons.

Front Big Data 2020 17;3:22. Epub 2020 Jun 17.

Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, United States.

The Adaptive Immune Receptor Repertoire (AIRR) Community is a research-driven group that is establishing a clear set of community-accepted data and metadata standards; standards-based reference implementation tools; and policies and practices for infrastructure to support the deposit, curation, storage, and use of high-throughput sequencing data from B-cell and T-cell receptor repertoires (AIRR-seq data). The AIRR Data Commons is a distributed system of data repositories that utilizes a common data model, a common query language, and common interoperability formats for storage, query, and downloading of AIRR-seq data. Here we describe the principal technical standards for the AIRR Data Commons, which consist of the AIRR Data Model for repertoires and rearrangements, the AIRR Data Commons (ADC) API for programmatic query of data repositories, a reference implementation for ADC API services, and tools for querying and validating data repositories that support the ADC API. AIRR-seq data repositories can become part of the AIRR Data Commons by implementing the data model and API. The AIRR Data Commons allows AIRR-seq data to be reused for novel analyses and empowers researchers to discover new biological insights about the adaptive immune system.
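
To make the idea of a common query language concrete, the sketch below assembles a JSON request body in the style of an ADC API query. It is an illustration only: the helper names adc_filter and adc_query are hypothetical, and the field names and the op/content filter shape reflect one reading of the ADC API and AIRR schema, so they should be checked against the current specification before use. No request is sent.

```python
import json


def adc_filter(field, op, value):
    """Build one filter clause (hypothetical helper, ADC-API-style shape)."""
    return {"op": op, "content": {"field": field, "value": value}}


def adc_query(filters, fields=None, size=None):
    """Assemble a request body; several filters are combined with 'and'."""
    if len(filters) == 1:
        body = {"filters": filters[0]}
    else:
        body = {"filters": {"op": "and", "content": list(filters)}}
    if fields is not None:
        body["fields"] = list(fields)  # restrict the returned fields
    if size is not None:
        body["size"] = size  # cap the number of returned records
    return body


# Assumed field names: human repertoires targeting the IGH locus.
query = adc_query(
    [
        adc_filter("subject.species.id", "=", "NCBITAXON:9606"),
        adc_filter("sample.pcr_target.pcr_target_locus", "=", "IGH"),
    ],
    fields=["repertoire_id", "study.study_id"],
    size=10,
)
payload = json.dumps(query, indent=2)
print(payload)
```

A body like this would typically be sent as an HTTP POST to a repository's repertoire endpoint; the exact URL depends on the repository and is not given in the abstract.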

Source
DOI: http://dx.doi.org/10.3389/fdata.2020.00022
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931935
June 2020

Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach.

Interdiscip Sci 2021 Jun 6;13(2):201-211. Epub 2021 Mar 6.

Department of Computer Science and Engineering, SRM Institute of Science & Technology, Kattankulathur, Chengalpattu (D.t), 603 203, Tamilnadu, India.

Background: In the broader healthcare domain, prediction bears more value than explanation, considering the cost of delays in services. Various risk prediction models for cardiovascular diseases (CVDs) exist in the literature for early risk assessment. However, the substantial increase in CVD-related mortality is challenging global health systems, especially in developing countries. This situation motivates researchers to improve CVD prediction models using new features and risk computing methods. This study aims to assess nonclinical features, which are easily available in any healthcare system, for predicting CVDs using advanced and flexible machine learning (ML) algorithms.

Methods: A gender-matched case-control study was conducted in the largest public sector cardiac hospital of Pakistan, and data from 460 subjects were collected. The dataset comprised eight nonclinical features. Four supervised ML algorithms were used to train and test models to predict CVD status, with traditional logistic regression (LR) as the baseline model. The models were validated through train-test split (70:30) and tenfold cross-validation approaches.

Results: Random forest (RF), a nonlinear ML algorithm, performed better than the other ML algorithms and LR. The area under the curve (AUC) of RF was 0.851 and 0.853 in the train-test split and tenfold cross-validation approaches, respectively. The nonclinical features yielded an admissible accuracy (minimum 71%) through the LR and ML models, exhibiting their predictive capability in risk estimation.

Conclusion: The satisfactory performance of nonclinical features reveals that these features and flexible computational methodologies can reinforce the existing risk prediction models for better healthcare services.
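
The two validation ingredients named in the abstract, AUC and tenfold cross-validation, can be sketched in plain Python. This is a generic illustration, not the authors' code: roc_auc and kfold_indices are hypothetical helper names, the labels and scores are made up, and the folds are contiguous and unshuffled for simplicity.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up labels (1 = case, 0 = control) and model scores.
auc = roc_auc([1, 0, 1, 0, 1], [0.9, 0.3, 0.6, 0.6, 0.2])
folds = list(kfold_indices(460, 10))  # 460 subjects, as in the study
print(auc, len(folds), len(folds[0][1]))
```

In practice the subjects would be shuffled (and usually stratified by case status) before folding, and a model would be fitted on each training split.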

Source
DOI: http://dx.doi.org/10.1007/s12539-021-00423-w
June 2021

Modifiable risk factors and overall cardiovascular mortality: Moderation of urbanization.

J Public Health Res 2020 Oct 17;9(4):1893. Epub 2020 Nov 17.

Centre for Mathematical Sciences, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Gambang, Malaysia.

Modifiable risk factors are associated with cardiovascular mortality (CVM), which is a leading form of global mortality. However, the diverse nature of urbanization and its objective measurement can modify their relationship. This study aims to investigate the moderating role of urbanization in the relationship between combined exposure (CE) to modifiable risk factors and CVM. This is the first comprehensive study that considers different forms of urbanization to gauge its manifold impact. Therefore, in addition to the existing original quantitative form and the traditional two categories of urbanization, a new form consisting of four levels of urbanization was introduced. This study used data from 129 countries, mainly retrieved from a WHO report, Non-Communicable Diseases Country Profile 2014. Factor scores obtained through confirmatory factor analysis were used to compute the CE. An age-income adjusted regression model for CVM was tested as a baseline, with three bootstrap regression models developed for the three forms of urbanization. Results revealed that the baseline relationship between CE and CVM was significantly moderated by the original quantitative form of urbanization. By contrast, the two traditional categories of urbanization could not capture the moderating impact. However, the four levels of urbanization objectively estimated the urbanization impact and indicated that the CE was more alarming in causing CVM in level 2 and level 3 urbanized countries, mainly low-middle-income countries. This study concluded that urbanization is a strong moderator that can be gauged effectively through four levels, whereas the sufficiency of the two traditional categories of urbanization is questionable.

Source
DOI: http://dx.doi.org/10.4081/jphr.2020.1893
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686791
October 2020

Multimodal Brain Tumor Classification Using Deep Learning and Robust Feature Selection: A Machine Learning Application for Radiologists.

Diagnostics (Basel) 2020 Aug 6;10(8). Epub 2020 Aug 6.

Division of Computer Science, Mathematics and Science, Collins College of Professional Studies, St. John's University, New York, NY 11439, USA.

Manual identification of brain tumors is an error-prone and tedious process for radiologists; therefore, it is crucial to adopt an automated system. Binary classification, such as malignant versus benign, is relatively trivial, whereas multimodal brain tumor classification (T1, T2, T1CE, and Flair) is a challenging task for radiologists. Here, we present an automated multimodal classification method using deep learning for brain tumor type classification. The proposed method consists of five core steps. In the first step, linear contrast stretching is employed using edge-based histogram equalization and discrete cosine transform (DCT). In the second step, deep learning feature extraction is performed. By utilizing transfer learning, two pre-trained convolutional neural network (CNN) models, namely VGG16 and VGG19, were used for feature extraction. In the third step, a correntropy-based joint learning approach was implemented along with the extreme learning machine (ELM) for the selection of the best features. In the fourth step, the partial least square (PLS)-based robust covariant features were fused in one matrix. Finally, the combined matrix was fed to ELM for classification. The proposed method was validated on the BraTS datasets, and accuracies of 97.8%, 96.9%, and 92.5% were achieved for BraTs2015, BraTs2017, and BraTs2018, respectively.
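
As a rough illustration of the first step, the sketch below performs plain min-max linear contrast stretching on a small grayscale image. It is a deliberate simplification: the paper's variant also involves edge-based histogram equalization and DCT, which are not reproduced here, and the function name stretch_contrast and the pixel values are made up.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale pixel intensities to span [out_min, out_max].

    `image` is a list of rows of numeric pixel values. A flat image
    (all pixels equal) maps to out_min everywhere.
    """
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# Made-up low-contrast 2x2 image with intensities between 50 and 200.
stretched = stretch_contrast([[50, 100], [150, 200]])
print(stretched)
```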

Source
DOI: http://dx.doi.org/10.3390/diagnostics10080565
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459797
August 2020

A statistically rigorous deep neural network approach to predict mortality in trauma patients admitted to the intensive care unit.

J Trauma Acute Care Surg 2020 10;89(4):736-742

From the Department of Pathology (F.S.A.), Yale School of Medicine, Yale University, New Haven, Connecticut; School of Information and Communication Engineering (L.A.), University of Electronic Science and Technology of China (UESTC), Chengdu, China; Department of Electrical Engineering (L.A.), University of Science and Technology, Bannu, Pakistan; Division of Trauma, Acute Care, Burn, and Emergency Surgery (B.A.J.), University of Arizona, Tucson, Arizona; Department of Neurology (A.I.), University of New Mexico, Albuquerque, New Mexico; Department of Computer Science (R.-u.-M.), COMSATS University Islamabad, Islamabad, Pakistan; and Division of Computer Science, Mathematics, and Science (Healthcare Informatics) (S.A.C.B.), St. John's University, New York, New York.

Background: Trauma patients admitted to critical care are at high risk of mortality because of their injuries. Our aim was to develop a machine learning-based model to predict mortality using the Fahad-Liaqat-Ahmad Intensive Machine (FLAIM) framework. We hypothesized that machine learning could be applied to critically ill patients and would outperform currently used mortality scores.

Methods: The current Deep-FLAIM model evaluates the statistically significant risk factors and then supplies these risk factors to a deep neural network (DNN) to predict mortality in trauma patients admitted to the intensive care unit (ICU). We analyzed adult patients (≥18 years) admitted to the trauma ICU in the publicly available database Medical Information Mart for Intensive Care III version 1.4. The first-phase selection of risk factors was done using Cox regression univariate and multivariate analyses. In the second phase, we applied a deep neural network and other traditional machine learning models such as Linear Discriminant Analysis, Gaussian Naïve Bayes, Decision Tree, and k-nearest neighbor models.

Results: We identified a total of 3,041 trauma patients admitted to the trauma surgery ICU. Several clinical and laboratory-based variables were statistically significant in both univariate and multivariate analyses, while others were not. The most significant laboratory variables were abnormalities of serum anion gap (hazard ratio [HR], 2.46; 95% confidence interval [CI], 1.94-3.11), sodium (HR, 2.11; 95% CI, 1.61-2.77), and chloride (HR, 2.11; 95% CI, 1.69-2.64), while the most significant clinical variables were the diagnosis of sepsis (HR, 2.03; 95% CI, 1.23-3.37), Quick Sequential Organ Failure Assessment score (HR, 1.52; 95% CI, 1.32-3.76), and Systemic Inflammatory Response Syndrome criteria (HR, 1.41; 95% CI, 1.24-1.26). After applying various machine learning models to these clinically significant variables, we found that our proposed DNN outperformed all the other methods, with test set accuracy of 92.25%, sensitivity of 79.13%, and specificity of 94.16%; positive predictive value, 66.42%; negative predictive value, 96.87%; and area under the receiver operating characteristic curve of 0.91 (1.45-1.29).

Conclusion: Our novel Deep-FLAIM model outperformed all other machine learning models. The model is easy to implement, user friendly and with high accuracy.

Level Of Evidence: Prognostic study, level II.
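
The performance measures reported in the Results (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from a confusion matrix. The sketch below shows the standard formulas; the function name classification_metrics is hypothetical and the counts are made up, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=140, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a rare outcome such as ICU mortality, a high negative predictive value alongside a modest positive predictive value is the expected pattern, which is why the abstract reports all five measures rather than accuracy alone.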

Source
DOI: http://dx.doi.org/10.1097/TA.0000000000002888
October 2020

Brain Tumor Detection by Using Stacked Autoencoders in Deep Learning.

J Med Syst 2019 Dec 17;44(2):32. Epub 2019 Dec 17.

Division of Computer Science, Mathematics and Science, Collins College of Professional Studies, St. John's University, New York, USA.

Brain tumor detection is a difficult task because of variations in tumor shape, size, and appearance. In this manuscript, a deep learning model is deployed to classify input slices as tumor (unhealthy) or non-tumor (healthy). A high-pass filter image is used to accentuate the inhomogeneity field effect of the MR slices and is fused with the input slices. A median filter is then applied to the fused slices, which improves slice quality by smoothing and highlighting the edges of the input slices. After that, based on the intensity of these slices, a 4-connected seed growing algorithm is applied, in which an optimal threshold clusters similar pixels from the input slices. The segmented slices are then supplied to the proposed fine-tuned two-layer stacked sparse autoencoder (SSAE) model. The hyperparameters of the model are selected after extensive experiments. The first layer uses 200 hidden units and the second layer uses 400 hidden units. Testing is performed on the softmax layer to predict whether images contain tumors. The suggested model is trained and checked on the BRATS datasets, i.e., 2012 (challenge and synthetic), 2013, 2013 Leaderboard, 2014, and 2015. The presented model is evaluated with a number of performance metrics, which demonstrate improved performance.
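
The 4-connected seed growing step can be sketched as a breadth-first search over pixel neighbors. This is a minimal illustration under stated assumptions: the acceptance rule (absolute intensity difference from the seed within a fixed threshold) stands in for the paper's optimal threshold, whose derivation the abstract does not give, and the function name grow_region and the image values are made up.

```python
from collections import deque


def grow_region(image, seed, threshold):
    """Return the set of (row, col) pixels reachable from `seed` through
    4-connected neighbors whose intensity is within `threshold` of the seed.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Up, down, left, right only: diagonal neighbors are not connected.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Made-up 3x3 slice: dark background with a bright blob at the upper right.
slice_ = [
    [10, 12, 90],
    [11, 95, 92],
    [10, 11, 13],
]
background = grow_region(slice_, seed=(0, 0), threshold=5)
print(sorted(background))
```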

Source
DOI: http://dx.doi.org/10.1007/s10916-019-1483-2
December 2019

Fish-Pak: Fish species dataset from Pakistan for visual features based classification.

Data Brief 2019 Dec 4;27:104565. Epub 2019 Oct 4.

Division of Computer Science, Mathematics and Science, Collins College of Professional Studies, St. John's University, New York, USA.

Fishes are the most diverse group of vertebrates, with more than 33,000 species. They are identified based on several visual characters, including their shape, color, and head. It is difficult for common people to directly identify the fish species found in the market. Classifying fish species from images based on visual characteristics using computer vision and machine learning techniques is an interesting problem for researchers. However, a classifier's performance depends upon the quality of the image dataset on which it has been trained. An image dataset is needed to examine classification and recognition algorithms. This article presents Fish-Pak: an image dataset of 6 different fish species, captured by a single camera from different pools located near Head Qadirabad, Chenab River, in Punjab, Pakistan. The Fish-Pak dataset is useful for comparing various factors of classifiers, such as learning rate and momentum, and their impact on overall performance. Convolutional Neural Network (CNN) is one of the most widely used architectures for image classification based on visual features. Six data classes, i.e., Grass carp, Common carp, Cirrhinus mrigala (Mori), Rohu, Silver carp, and Catla (Thala), with different numbers of images, have been included in the dataset. All fish were photographed with one camera to ensure a fair environment for all data. Fish-Pak is hosted by the Zoology Lab under the mutual affiliation of the Department of Computer Science and the Department of Zoology, University of Gujrat, Gujrat, Pakistan.

Source
DOI: http://dx.doi.org/10.1016/j.dib.2019.104565
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6806455
December 2019

A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning.

Data Brief 2019 Oct 22;26:104340. Epub 2019 Aug 22.

Division of Computer Science, Mathematics and Science, College of Professional Studies, St. John's University, New York, USA.

Plants are as vulnerable to diseases as animals. Citrus is a major plant grown mainly in the tropical areas of the world due to its richness in vitamin C and other important nutrients. The production of citrus fruit has been widely affected by citrus diseases, which ultimately degrade the fruit quality and cause financial loss to the growers. During the past decade, image processing and computer vision methods have been broadly adopted for the detection and classification of plant diseases. Early detection of diseases in citrus plants helps prevent them from spreading through the orchards, which minimizes the financial loss to the farmers. In this article, an image dataset of citrus fruits, leaves, and stems is presented. The dataset holds images of citrus fruits and leaves from healthy plants and from plants infected with diseases such as Black spot, Canker, Scab, Greening, and Melanose. Most of the images were captured in December in orchards in the Sargodha region of Pakistan, when the fruit was about to ripen and the most diseases were found on citrus plants. The dataset is hosted by the Department of Computer Science, University of Gujrat, and was acquired under the mutual cooperation of the University of Gujrat and the Citrus Research Center, Government of Punjab, Pakistan. The dataset would potentially be helpful to researchers who use machine learning and computer vision algorithms to develop computer applications that help farmers in the early detection of plant diseases. The dataset is freely available at https://data.mendeley.com/datasets/3f83gxmv57/2.

Source
DOI: http://dx.doi.org/10.1016/j.dib.2019.104340
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6731382
October 2019

Reporting and connecting cell type names and gating definitions through ontologies.

BMC Bioinformatics 2019 Apr 25;20(Suppl 5):182. Epub 2019 Apr 25.

Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA.

Background: Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in such experiments typically has two complementary components: the description of the cell type targeted (e.g. 'T cells'), and the description of the marker pattern utilized (e.g. CD14-, CD3+).

Results: Here we describe our attempts to use ontologies to cross-compare cell types and marker patterns (also referred to as gating definitions). We used a large set of such gating definitions and corresponding cell types, submitted by different investigators to ImmPort, a central database for immunology studies, to examine the ability to parse gating definitions using terms from the Protein Ontology (PRO) and cell type descriptions using the Cell Ontology (CL). We then used logical axioms from CL to detect discrepancies between the two.

Conclusions: We suggest adoption of our proposed format for describing gating and cell type definitions to make comparisons easier. We also suggest a number of new terms to describe gating definitions in flow cytometry that are not based on molecular markers captured in PRO, but on forward- and side-scatter of light during data acquisition, which is more appropriate to capture in the Ontology for Biomedical Investigations (OBI). Finally, our approach results in suggestions on what logical axioms and new cell types could be considered for addition to the Cell Ontology.
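
A toy version of parsing a marker pattern such as "CD14-, CD3+" is sketched below. It is not the format the authors propose, which the abstract does not spell out: marker names are restricted to letters and digits (so names like HLA-DR are out of scope), no ontology lookup against PRO or CL is done, and the function names parse_gating and conflicting_markers are hypothetical. The conflict check is a simple stand-in for the axiom-based discrepancy detection described above.

```python
import re

# One token: an alphanumeric marker name followed by '+' or '-'.
MARKER = re.compile(r"^\s*([A-Za-z0-9]+)\s*([+-])\s*$")


def parse_gating(definition):
    """Map each marker in a comma-separated pattern to True (+) or False (-)."""
    markers = {}
    for token in definition.split(","):
        match = MARKER.match(token)
        if match is None:
            raise ValueError(f"cannot parse marker token: {token!r}")
        name, sign = match.groups()
        markers[name] = sign == "+"
    return markers


def conflicting_markers(a, b):
    """Markers present in both patterns with opposite signs."""
    return sorted(m for m in a.keys() & b.keys() if a[m] != b[m])


t_cells = parse_gating("CD14-, CD3+")
other = parse_gating("CD3-, CD4+")
print(t_cells)
print(conflicting_markers(t_cells, other))
```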

Source
DOI: http://dx.doi.org/10.1186/s12859-019-2725-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509839
April 2019

AIRR Community Standardized Representations for Annotated Immune Repertoires.

Front Immunol 2018 28;9:2206. Epub 2018 Sep 28.

Department of Genetics and Genomic Sciences and Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Increased interest in the immune system's involvement in pathophysiological phenomena coupled with decreased DNA sequencing costs have led to an explosion of antibody and T cell receptor sequencing data collectively termed "adaptive immune receptor repertoire sequencing" (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease-of-use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is composed of a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already utilize this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.
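
Because the format is tab-delimited with a fixed schema, a file can be read with nothing more than a standard TSV parser. The sketch below uses Python's csv module on a tiny in-memory example. The column names are believed to match the AIRR Rearrangement schema, and the T/F encoding of the productive flag is an assumption to verify against the specification; the two records and the function name read_rearrangements are made up.

```python
import csv
import io

# Made-up two-record example; real files carry many more columns.
SAMPLE_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tF\n"
)


def read_rearrangements(handle):
    """Read tab-delimited rearrangement records into a list of dicts.

    Assumes booleans are written as 'T' or 'F' and converts the
    'productive' column accordingly.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["productive"] = row["productive"] == "T"
        rows.append(row)
    return rows


records = read_rearrangements(io.StringIO(SAMPLE_TSV))
productive = [r["sequence_id"] for r in records if r["productive"]]
print(productive)
```

For real analyses, the AIRR Community's own reference libraries would be the natural choice, since they also validate records against the schema.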

Source
DOI: http://dx.doi.org/10.3389/fimmu.2018.02206
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6173121
October 2019

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Front Immunol 2018 16;9:1877. Epub 2018 Aug 16.

Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States.

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.

Source
DOI: http://dx.doi.org/10.3389/fimmu.2018.01877
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105692
September 2019

CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata.

BMC Bioinformatics 2018 07 16;19(1):268. Epub 2018 Jul 16.

Department of Pathology, Yale School of Medicine, New Haven, CT, USA.

Background: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources.

Results: This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry.

Conclusion: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.

Source
DOI: http://dx.doi.org/10.1186/s12859-018-2247-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6048706
July 2018

Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data.

Front Immunol 2017 1;8:1418. Epub 2017 Nov 1.

Department of Microbiology, Boston University School of Medicine, Boston, MA, United States.

High-throughput sequencing (HTS) of immunoglobulin (B-cell receptor, antibody) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009 (1-3). This experimental approach explores the maturation of the adaptive immune system and its response to antigens, pathogens, and disease conditions in exquisite detail. It holds significant promise for diagnostic and therapy-guiding applications. New technology often spreads rapidly, sometimes more rapidly than the understanding of how to make the products of that technology reliable, reproducible, or usable by others. As complex technologies have developed, scientific communities have come together to adopt common standards, protocols, and policies for generating and sharing data sets, such as the MIAME protocols developed for microarray experiments. The Adaptive Immune Receptor Repertoire (AIRR) Community formed in 2015 to address similar issues for HTS data of immune repertoires. The purpose of this perspective is to provide an overview of the AIRR Community's founding principles and present the progress that the AIRR Community has made in developing standards of practice and data sharing protocols. Finally, and most important, we invite all interested parties to join this effort to facilitate sharing and use of these powerful data sets ([email protected]).

Source
DOI: http://dx.doi.org/10.3389/fimmu.2017.01418
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5671925
November 2017