A graph auto-encoder model for miRNA-disease associations prediction.

Brief Bioinform 2021 Jul;22(4)

School of Information Engineering, Xuzhou University of Technology.

Emerging evidence indicates that the abnormal expression of miRNAs is involved in the evolution and progression of various complex human diseases. Identifying disease-related miRNAs as new biomarkers can advance the understanding of disease pathology and clinical medicine. However, designing biological experiments to validate disease-related miRNAs is usually time-consuming and expensive.

CaliForest: Calibrated Random Forest for Health Data.

Proc ACM Conf Health Inference Learn (2020) 2020 Apr 2;2020:40-50. Epub 2020 Apr 2.

Emory University.

Real-world predictive models in healthcare should be evaluated in terms of discrimination, the ability to differentiate between high- and low-risk events, and calibration, the accuracy of the risk estimates. Unfortunately, calibration is often neglected and only discrimination is analyzed. Calibration is crucial for personalized medicine as predictive models play an increasing role in the decision-making process.
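
The difference between the two properties can be illustrated with a toy example (not CaliForest itself). Two hypothetical models rank four patients in the same order, so they share the same ROC AUC, but their risk estimates differ in how closely they match the observed event rates. The metric choices (pairwise ROC AUC, equal-width binned expected calibration error) and the data are assumptions made for this sketch.

```python
# Toy contrast between discrimination (ROC AUC) and calibration (binned ECE).
# Data and metric choices are illustrative assumptions, not CaliForest's method.

def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean |predicted risk - observed event rate| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            event_rate = sum(y for y, _ in b) / len(b)
            ece += len(b) / len(y_true) * abs(mean_pred - event_rate)
    return ece


y = [0, 0, 1, 1]                # observed outcomes (hypothetical)
model_a = [0.1, 0.2, 0.8, 0.9]  # risks close to the observed outcomes
model_b = [0.6, 0.7, 0.8, 0.9]  # same ranking, inflated risks for negatives

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, roc_auc(y, probs), round(expected_calibration_error(y, probs), 3))
```

Both models separate the classes perfectly (AUC of 1.0), yet model B's calibration error is clearly larger (0.4 versus 0.15), which a discrimination-only evaluation would not reveal.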

uCARE Chem Suite and uCAREChemSuiteCLI: Tools for bacterial resistome prediction.

Genes Dis 2021 Sep 30;8(5):721-729. Epub 2020 Jun 30.

Department of Biological Sciences, SHUATS, Prayagraj, Uttar Pradesh, 211007, India.

In the era of antibiotic resistance, predicting the bacterial resistome profiles likely to be associated with the inactivation of new potential antibiotics is of utmost importance. Despite this, to the best of our knowledge, no tool exists for such prediction. Therefore, under the rationale that drugs with similar structures have similar resistome profiles, we developed two models, a deterministic model and a stochastic model, to predict the bacterial resistome likely to neutralize uncharacterized but potential chemical structures.
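
The stated rationale, that structurally similar drugs share resistome profiles, can be illustrated with a simple nearest-neighbour lookup. This is a minimal sketch under assumptions: compounds are represented as sets of fingerprint bit indices, similarity is the Tanimoto coefficient, and a query compound inherits the resistance genes of its most similar reference compound. It is not the deterministic or stochastic model of uCARE Chem Suite, and all compound names, bits and gene labels are hypothetical.

```python
# Nearest-neighbour transfer of resistome profiles by structural similarity.
# Fingerprints, compound names and gene labels below are hypothetical.

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_resistome(query_bits, reference):
    """Return (closest compound, its resistance genes, similarity).

    reference maps compound name -> (fingerprint bit set, set of resistance genes).
    """
    best = max(reference, key=lambda name: tanimoto(query_bits, reference[name][0]))
    bits, genes = reference[best]
    return best, genes, tanimoto(query_bits, bits)


reference = {
    "compound_A": ({1, 2, 3, 4}, {"gene_X"}),
    "compound_B": ({10, 11, 12}, {"gene_Y", "gene_Z"}),
}
print(predict_resistome({1, 2, 3, 5}, reference))
```

The query shares three of five distinct bits with `compound_A` (similarity 0.6) and none with `compound_B`, so it is assigned `gene_X`. A real tool would also need a similarity cutoff below which no prediction is made.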

Reduced Reference Perceptual Quality Model With Application to Rate Control for Video-Based Point Cloud Compression.

IEEE Trans Image Process 2021 26;30:6623-6636. Epub 2021 Jul 26.

In rate-distortion optimization, the encoder settings are determined by maximizing a reconstruction quality measure subject to a constraint on the bitrate. One of the main challenges of this approach is to define a quality measure that can be computed at low computational cost and that correlates well with perceptual quality. While several quality measures that fulfil these two criteria have been developed for images and videos, no such measure exists for point clouds.
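
The constrained formulation described above (choose the setting that maximizes quality while keeping the bitrate within a budget) can be sketched as an exhaustive search over candidate settings. The setting names, bitrates and quality scores are hypothetical; in the paper's context, the quality score would come from a perceptual quality model, which this sketch simply assumes has already been evaluated for each candidate.

```python
from typing import List, NamedTuple, Optional


class EncoderSetting(NamedTuple):
    name: str            # hypothetical label for an encoder configuration
    bitrate_kbps: float  # rate produced by this setting
    quality: float       # score from some quality measure (higher is better)


def choose_setting(candidates: List[EncoderSetting],
                   budget_kbps: float) -> Optional[EncoderSetting]:
    """Maximize quality subject to bitrate <= budget; None if nothing fits."""
    feasible = [c for c in candidates if c.bitrate_kbps <= budget_kbps]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.quality)


candidates = [
    EncoderSetting("high", 900.0, 0.95),
    EncoderSetting("medium", 500.0, 0.90),
    EncoderSetting("low", 250.0, 0.80),
]
print(choose_setting(candidates, budget_kbps=600.0).name)  # medium
```

Exhaustive search is only practical when each candidate's quality is cheap to evaluate, which is why a low-cost quality measure matters for rate control.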

quincunx: an R package to query, download and wrangle PGS Catalog data.

Bioinformatics 2021 Jul 16. Epub 2021 Jul 16.

Center for Research in Health Technologies and Information Systems (CINTESIS-UAlg).

Motivation: The Polygenic Score (PGS) Catalog is a recently established open database of published polygenic scores that, to date, has collected, curated, and made available 721 polygenic scores from over 133 publications. The PGS Catalog REST API is the only method allowing programmatic access to this resource.

Results: Here, we describe quincunx, an R package that provides the first client interface to the PGS Catalog REST API.

pKPDB: a Protein Data Bank extension database of pKa and pI theoretical values.

Bioinformatics 2021 Jul 14. Epub 2021 Jul 14.

BioISI-Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Campo Grande, Lisboa, 1749-016, Portugal.

Summary: pKa values of ionizable residues and isoelectric points of proteins provide valuable local and global insights into their structure and function. These properties can be estimated with reasonably good accuracy using Poisson-Boltzmann and Monte Carlo calculations, at a considerable computational cost (from a few minutes to several hours). pKPDB is a database of over 12 million theoretical pKa values calculated over 120k protein structures deposited in the Protein Data Bank.
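
To make the link between residue pKa values and a protein's isoelectric point concrete, the sketch below sums Henderson-Hasselbalch charge fractions and finds the pH of zero net charge by bisection. This is a textbook simplification that treats every ionizable group independently; it is not how pKPDB computes its values (the pKa estimates themselves come from Poisson-Boltzmann and Monte Carlo calculations). The pKa inputs are illustrative.

```python
def net_charge(ph, basic_pkas, acidic_pkas):
    """Net charge from Henderson-Hasselbalch, treating each group independently."""
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in basic_pkas)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in acidic_pkas)
    return positive - negative


def isoelectric_point(basic_pkas, acidic_pkas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection for the pH where net charge is zero.

    Net charge decreases monotonically with pH, so in practice a single root
    lies in [lo, hi] when there is at least one basic and one acidic group.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, basic_pkas, acidic_pkas) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Toy molecule with one amine (pKa 9.60) and one carboxyl group (pKa 2.34)
print(round(isoelectric_point([9.60], [2.34]), 2))  # 5.97
```

With exactly one basic and one acidic group, the pI is the mean of the two pKa values, which the bisection reproduces; for a protein, the lists would hold one pKa per ionizable residue and terminus.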

Haplotype-based membership inference from summary genomic data.

Bioinformatics 2021 07;37(Suppl_1):i161-i168

Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA.

Motivation: The availability of human genomic data, together with the enhanced capacity to process them, is leading to transformative technological advances in biomedical science and engineering. However, the public dissemination of such data has been difficult due to privacy concerns. Specifically, it has been shown that the presence of a human subject in a case group can be inferred from the shared summary statistics of the group, e. …

Graph transformation for enzymatic mechanisms.

Bioinformatics 2021 07;37(Suppl_1):i392-i400

Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.

Motivation: The design of enzymes is as challenging as it is consequential for making chemical synthesis in medical and industrial applications more efficient, cost-effective and environmentally friendly. While several aspects of this complex problem are computationally assisted, the drafting of catalytic mechanisms, i.e. …

Practical selection of representative sets of RNA-seq samples using a hierarchical approach.

Bioinformatics 2021 07;37(Suppl_1):i334-i341

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: Despite numerous RNA-seq samples available at large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e. …

Deep learning for peptide identification from metaproteomics datasets.

J Proteomics 2021 Jul 8;247:104316. Epub 2021 Jul 8.

Department of Computer Science and Engineering, University of North Texas, TX, USA.

Metaproteomics is becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. In this paper, we propose a deep-learning-based algorithm, named DeepFilter, for improving peptide identifications from a collection of tandem mass spectra.

BtToxin_Digger: a comprehensive and high-throughput pipeline for mining toxin protein genes from Bacillus thuringiensis.

Bioinformatics 2021 Jul 9. Epub 2021 Jul 9.

State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, 430070, China.

Summary: Bacillus thuringiensis (Bt) has been the most successful microbial pesticide for decades. Its toxin genes are used to develop GM crops against pests. We previously developed a web-based insecticidal gene mining tool, BtToxin_scanner.

stk: An extendable Python framework for automated molecular and supramolecular structure assembly and discovery.

J Chem Phys 2021 Jun;154(21):214102

Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom.

Computational software workflows are emerging as all-in-one solutions to speed up the discovery of new materials. Many computational approaches require the generation of realistic structural models for property prediction and candidate screening. However, molecular and supramolecular materials represent classes of materials with many potential applications for which there is no go-to database of existing structures or general protocol for generating structures.

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.

IEEE Trans Pattern Anal Mach Intell 2021 Jul 7;PP. Epub 2021 Jul 7.

Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models taken from NLP. These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on data from UniRef and BFD containing up to 393 billion amino acids.

Consensus clustering applied to multi-omics disease subtyping.

BMC Bioinformatics 2021 Jul 6;22(1):361. Epub 2021 Jul 6.

CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France.

Background: Facing the diversity of omics data and the difficulty of selecting one result over all those produced by several methods, consensus strategies have the potential to reconcile multiple inputs and to produce robust results.

Results: Here, we introduce ClustOmics, a generic consensus clustering tool that we use in the context of cancer subtyping. ClustOmics relies on a non-relational graph database, which allows for the simultaneous integration of both multiple omics data and results from various clustering methods.
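
A common generic way to reconcile several clusterings, shown here purely as an illustration of consensus clustering, is a co-association matrix: for every pair of samples, count the fraction of input partitions that place them together, then link pairs above a threshold. ClustOmics itself relies on a graph database, so this in-memory sketch is not its algorithm; the sample IDs, cluster labels and the 0.5 threshold are assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions that put each pair of samples in the same cluster.

    partitions: list of dicts mapping sample ID -> cluster label (labels are
    only compared within a partition, so they need not match across methods).
    """
    samples = sorted(partitions[0])
    return {
        (a, b): sum(p[a] == p[b] for p in partitions) / len(partitions)
        for a, b in combinations(samples, 2)
    }


def consensus_clusters(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of partitions and
    return the connected components (union-find) as consensus clusters."""
    samples = sorted(partitions[0])
    parent = {s: s for s in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), frac in co_association(partitions).items():
        if frac > threshold:
            parent[find(a)] = find(b)
    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Three hypothetical clusterings of the same four samples
p1 = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
p2 = {"s1": "a", "s2": "a", "s3": "a", "s4": "b"}
p3 = {"s1": 1, "s2": 1, "s3": 2, "s4": 2}
print(consensus_clusters([p1, p2, p3]))  # [['s1', 's2'], ['s3', 's4']]
```

The disagreement in `p2` (which groups `s3` with `s1` and `s2`) is outvoted, because that pairing appears in only one of three partitions.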

WADDAICA: A webserver for aiding protein drug design by artificial intelligence and classical algorithm.

Comput Struct Biotechnol J 2021 14;19:3573-3579. Epub 2021 Jun 14.

Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, P. R. China.

With artificial intelligence, deep learning models can be trained on known drug data for drug design, while classical algorithms can design drugs through established, predefined procedures. Both deep learning and classical algorithms have their merits for drug design. Here, the webserver WADDAICA is built to combine the advantages of deep learning models and classical algorithms for drug design.

Estimating pulse wave velocity from the radial pressure wave using machine learning algorithms.

PLoS One 2021 28;16(6):e0245026. Epub 2021 Jun 28.

Department of Biomedical Engineering, King's College London, London, United Kingdom.

One of the European gold-standard measurements of vascular ageing, a risk factor for cardiovascular disease, is the carotid-femoral pulse wave velocity (cfPWV), which requires an experienced operator to measure pulse waves at two sites. In this work, two machine learning pipelines were proposed to estimate cfPWV from the peripheral pulse wave measured at a single site: the radial pressure wave measured by applanation tonometry. The study populations were the Twins UK cohort containing 3,082 subjects aged from 18 to 110 years, and a database containing 4,374 virtual subjects aged from 25 to 75 years.

MultiDTI: Drug-target interaction prediction based on multi-modal representation learning to bridge the gap between new chemical entities and known heterogeneous network.

Bioinformatics 2021 Jun 28. Epub 2021 Jun 28.

Department of Computer Science, Hunan University, Changsha, 410082, China.

Motivation: Predicting new drug-target interactions is an important step in new drug development, understanding of side effects, and drug repositioning. Heterogeneous data sources can provide comprehensive information and different perspectives for drug-target interaction prediction. Thus, many computational methods relying on heterogeneous networks have been developed.

Aggregating large-scale databases for PubMed author name disambiguation.

J Am Med Inform Assoc 2021 Jun 28. Epub 2021 Jun 28.

School of Information Management, Wuhan University, Wuhan, China.

Objective: PubMed has suffered from the author ambiguity problem for many years. Existing studies on author name disambiguation (AND) for PubMed used only internal metadata for development. However, some of these metadata are incomplete (e.g., a large number of names are only abbreviated and their full names are not available) or insufficiently discriminative.

The DNA methylation haplotype (mHap) format and mHapTools.

Bioinformatics 2021 Jun 19. Epub 2021 Jun 19.

State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science Chinese Academy of Sciences, Shanghai 200031, China.

Summary: Bisulfite sequencing (BS-seq) is currently the gold standard for measuring genome-wide DNA methylation profiles at single-nucleotide resolution. Most analyses focus on mean CpG methylation and ignore methylation states on the same DNA fragments [DNA methylation haplotypes (mHaps)]. Here, we propose mHap, a simple DNA mHap format for storing DNA BS-seq data.
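
Why fragment-level states carry information that mean CpG methylation discards can be shown with two toy samples. Each read is written as a string with one character per CpG site ('1' methylated, '0' unmethylated); this representation is an assumption for illustration and is not the mHap file format.

```python
from collections import Counter


def mean_methylation(reads):
    """Per-CpG fraction of methylated calls; reads are equal-length '0'/'1' strings."""
    n_sites = len(reads[0])
    return [sum(r[i] == "1" for r in reads) / len(reads) for i in range(n_sites)]


def haplotype_counts(reads):
    """Count distinct per-fragment methylation patterns (haplotypes)."""
    return Counter(reads)


# Two hypothetical samples covering the same two CpG sites
sample_a = ["11", "11", "00", "00"]  # sites are methylated together
sample_b = ["10", "10", "01", "01"]  # sites are never methylated together

print(mean_methylation(sample_a), mean_methylation(sample_b))  # [0.5, 0.5] [0.5, 0.5]
print(haplotype_counts(sample_a))
print(haplotype_counts(sample_b))
```

Both samples have 50% methylation at each site, so a mean-level analysis cannot tell them apart, while their haplotype counts are completely different.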

A New Library-Search Algorithm for Mixture Analysis Using DART-MS.

J Am Soc Mass Spectrom 2021 Jul 17;32(7):1725-1734. Epub 2021 Jun 17.

National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States.

Forensic analysis of seized drug evidence often involves determining whether the components of an unknown mixture are illicit compounds. One approach to this task is to screen the evidence using direct analysis in real time mass spectrometry (DART-MS) to make presumptive identifications. This manuscript introduces a new library-search algorithm that enhances presumptive identifications of mixture components using a series of in-source collision-induced dissociation mass spectra collected through DART-MS.

ivTerm: An R package for interactive visualization of functional analysis results of meta-omics data.

J Cell Biochem 2021 Jun 16. Epub 2021 Jun 16.

Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.

Interpreting functional analysis results derived from environmental samples using direct-sequencing meta-omics data, including metagenomics and meta-transcriptomics data, is challenging due to their complexity. Visualization of functional analysis results can help researchers discover relevant biological insights. Despite the availability of many R packages, interactive and comprehensive graphical systems for displaying functional terms and corresponding genes in meta-omics analysis results are lacking.

LigTMap: ligand and structure-based target identification and activity prediction for small molecular compounds.

J Cheminform 2021 Jun 10;13(1):44. Epub 2021 Jun 10.

Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida da Universidade, Taipa, Macau, China.

Target prediction is a crucial step in modern drug discovery. However, existing experimental approaches to target prediction are time-consuming and costly. Here, we introduce LigTMap, an online server with a fully automated workflow that can identify protein targets of chemical compounds among 17 classes of therapeutic proteins extracted from the PDBbind database.

R2DT is a framework for predicting and visualising RNA secondary structure using templates.

Nat Commun 2021 06 9;12(1):3494. Epub 2021 Jun 9.

European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.

Non-coding RNAs (ncRNA) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts.

CytoPy: An autonomous cytometry analysis framework.

PLoS Comput Biol 2021 Jun 8;17(6):e1009071. Epub 2021 Jun 8.

Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, United Kingdom.

Cytometry analysis has seen a considerable expansion in recent years in the maximum number of parameters that can be acquired in a single experiment. In response to this technological advance there has been an increased effort to develop new computational methodologies for handling high-dimensional single cell data acquired by flow or mass cytometry. Despite the success of numerous algorithms and published packages to replicate and outperform traditional manual analysis, widespread adoption of these techniques has yet to be realised in the field of immunology.

SynSig2Vec: Forgery-free Learning of Dynamic Signature Representations by Sigma Lognormal-based Synthesis.

IEEE Trans Pattern Anal Mach Intell 2021 Jun 8;PP. Epub 2021 Jun 8.

Handwritten signature verification is a challenging task because signatures of a writer may be skillfully imitated by a forger. As skilled forgeries are generally difficult to acquire for training, in this paper, we propose a deep learning-based dynamic signature verification framework, SynSig2Vec, to address the skilled forgery attack without training with any skilled forgeries. Specifically, SynSig2Vec consists of a novel learning-by-synthesis method for training and a novel 1D convolutional neural network model, called Sig2Vec, for signature representation extraction.

A vital sign-based prediction algorithm for differentiating COVID-19 versus seasonal influenza in hospitalized patients.

NPJ Digit Med 2021 Jun 4;4(1):95. Epub 2021 Jun 4.

Division of Cardiology, West Virginia University Medicine Heart & Vascular Institute, Morgantown, WV, USA.

Patients with influenza and SARS-CoV-2/coronavirus disease 2019 (COVID-19) infections have different clinical courses and outcomes. We developed and validated a supervised machine learning pipeline to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results. The models achieved an area under the receiver operating characteristic curve (ROC AUC) of at least 97% using our multiclass classifier.

MiMiC: a bioinformatic approach for generation of synthetic communities from metagenomes.

Microb Biotechnol 2021 Jul 3;14(4):1757-1770. Epub 2021 Jun 3.

Functional Microbiome Research Group, Institute of Medical Microbiology, University Hospital of RWTH, Aachen, Germany.

Environmental and host-associated microbial communities are complex ecosystems, of which many members are still unknown. Hence, it is challenging to study community dynamics and important to create model systems of reduced complexity that mimic major community functions. Therefore, we developed MiMiC, a computational approach for data-driven design of simplified communities from shotgun metagenomes.

A deep database of medical abbreviations and acronyms for natural language processing.

Sci Data 2021 06 2;8(1):149. Epub 2021 Jun 2.

Department of Biomedical Informatics, Columbia University, New York, NY, USA.

The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses.

Mantis: flexible and consensus-driven genome annotation.

Gigascience 2021 Jun;10(6)

Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg.

Background: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i. …

CovidExpress: an interactive portal for intuitive investigation on SARS-CoV-2 related transcriptomes.

bioRxiv 2021 May 26. Epub 2021 May 26.

Infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in humans can cause coronavirus disease 2019 (COVID-19). Since its first discovery in December 2019, SARS-CoV-2 has caused a global pandemic and 3.3 million direct/indirect deaths (as of May 2021).
