26,013 results match your criteria Bioinformatics [Journal]


TE-greedy-nester: structure-based detection of LTR retrotransposons and their nesting.

Bioinformatics 2020 Jul 14. Epub 2020 Jul 14.

Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno, Czech Republic.

Motivation: Transposable elements (TEs) in eukaryotes often get inserted into one another, forming sequences that become a complex mixture of full-length elements and their fragments. The reconstruction of full-length elements and the order in which they have been inserted is important for genome and transposon evolution studies. However, the accumulation of mutations and genome rearrangements over evolutionary time makes this process error-prone and decreases the efficiency of software aiming to recover all nested full-length TEs.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa632

qSNE: Quadratic rate t-SNE optimizer with automatic parameter tuning for large data sets.

Bioinformatics 2020 Jul 14. Epub 2020 Jul 14.

Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Haartmaninkatu 8 (PO box 63), 00014 University of Helsinki, Helsinki, Finland.

Motivation: Nonparametric dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE), are the most frequently used methods in the exploratory analysis of single-cell data sets. Current implementations scale poorly to massive data sets and often require downsampling or interpolative approximations, which can leave less frequent populations undiscovered and much information unexploited.

Results: We implemented a fast t-SNE package, qSNE, which uses a quasi-Newton optimizer, allowing quadratic convergence rate, and automatic perplexity (level of detail) optimizer.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa637

RainbowSTORM: An open-source ImageJ plug-in for spectroscopic single-molecule localization microscopy (sSMLM) data analysis and image reconstruction.

Bioinformatics 2020 Jul 14. Epub 2020 Jul 14.

Department of Biomedical Engineering, Northwestern University, Evanston, Illinois, USA.

Summary: Spectroscopic single-molecule localization microscopy (sSMLM) simultaneously captures the spatial locations and full spectra of stochastically emitting fluorescent single-molecules. It provides an optical platform to develop new multi-molecular and functional imaging capabilities. While several open-source software suites provide sub-diffraction localization of fluorescent molecules, software suites for spectroscopic analysis of sSMLM data remain unavailable.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa635

Investigations of sequencing data and sample type on HLA class Ia typing with different computational tools.

Brief Bioinform 2020 Jul 14. Epub 2020 Jul 14.

Yucebio Cancer Translational Research Institute and Chief Medical Officer for Yucebio Technology Co.

Human leukocyte antigen (HLA) genes encode the human major histocompatibility complex (MHC) proteins and play a key role in adaptive and innate immunity. Emerging clinical evidence suggests that the presentation of tumor neoantigens and the neoantigen-specific T cell response associated with MHC class I molecules are of key importance in activating the adaptive immune system in cancer immunotherapy. Therefore, accurate HLA typing is essential for the clinical application of immunotherapy.

DOI: http://dx.doi.org/10.1093/bib/bbaa143

Metabolic networks of the Nicotiana genus in the spotlight: content, progress and outlook.

Brief Bioinform 2020 Jul 14. Epub 2020 Jul 14.

Boyce Thompson Institute.

Manually curated metabolic databases residing at the Sol Genomics Network comprise two taxon-specific databases for the Solanaceae family, i.e. SolanaCyc and the genus Nicotiana, i.

DOI: http://dx.doi.org/10.1093/bib/bbaa136

Integrative pharmacological mechanism of vitamin C combined with glycyrrhizic acid against COVID-19: findings of bioinformatics analyses.

Brief Bioinform 2020 Jul 14. Epub 2020 Jul 14.

Guilin Medical University.

Objective: Coronavirus disease 2019 (COVID-19) is a fatal and fast-spreading viral infection. To date, the number of COVID-19 patients worldwide has crossed over six million with over three hundred and seventy thousand deaths (according to the data from World Health Organization; updated on 2 June 2020). Although COVID-19 can be rapidly diagnosed, efficient clinical treatment of COVID-19 remains unavailable, resulting in high fatality.

DOI: http://dx.doi.org/10.1093/bib/bbaa141

MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package.

BMC Bioinformatics 2020 Jul 13;21(1):301. Epub 2020 Jul 13.

Department of Statistics, Rice University, Houston, TX, USA.

Background: Understanding the relation between the human microbiome and modulating factors, such as diet, may help researchers design intervention strategies that promote and maintain healthy microbial communities. Numerous analytical tools are available to help identify these relations, oftentimes via automated variable selection methods. However, available tools frequently ignore evolutionary relations among microbial taxa, potential relations between modulating factors, as well as model selection uncertainty.

DOI: http://dx.doi.org/10.1186/s12859-020-03640-0

Fully interpretable deep learning model of transcriptional control.

Bioinformatics 2020 Jul;36(Supplement_1):i499-i507

Departments of Statistics, Ecology and Evolution, Molecular Genetics & Cell Biology, Institute of Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA.

Motivation: The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent works in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Typically, such investigations have taken a 'black box' approach in which the internal structure of the model used is set purely by machine learning considerations with little consideration of representing the internal structure of the biological system by the mathematical structure of the DNN. DNNs have not yet been applied to the detailed modeling of transcriptional control in which mRNA production is controlled by the binding of specific transcription factors to DNA, in part because such models are in part formulated in terms of specific chemical equations that appear different in form from those used in neural networks.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa506

CRISPRLand: Interpretable large-scale inference of DNA repair landscape based on a spectral approach.

Bioinformatics 2020 Jul;36(Supplement_1):i560-i568

Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley 94720, CA, USA.

Summary: We propose a new spectral framework for reliable training, scalable inference and interpretable explanation of the DNA repair outcome following a Cas9 cutting. Our framework, dubbed CRISPRLand, relies on an unexploited observation about the nature of the repair process: the landscape of the DNA repair is highly sparse in the (Walsh-Hadamard) spectral domain. This observation enables our framework to address key shortcomings that limit the interpretability and scaling of current deep-learning-based DNA repair models.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa505
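
Illustrative note: the sparsity claim in the abstract above can be made concrete with a small sketch. The code below is not the authors' framework; it only shows what "sparse in the Walsh-Hadamard spectral domain" means for a function over binary inputs. The toy landscape, its coefficients, and the function names (`fwht`, `toy_landscape`) are assumptions chosen for illustration.

```python
from itertools import product


def fwht(values):
    """Normalized fast Walsh-Hadamard transform (natural ordering).

    The input length must be a power of two. Coefficient k equals
    (1/n) * sum_j values[j] * (-1)**popcount(k & j).
    """
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return [v / n for v in a]


def toy_landscape(bits):
    """Hypothetical landscape over 4 binary features with only 3 active terms."""
    x0, x1, x2, x3 = bits
    return 1.0 + 2.0 * (-1) ** x0 - 3.0 * (-1) ** (x1 ^ x3)


# Enumerate all 16 inputs; x0 is the most significant bit of the index.
inputs = list(product([0, 1], repeat=4))
landscape = [toy_landscape(b) for b in inputs]

spectrum = fwht(landscape)
nonzero = {k: c for k, c in enumerate(spectrum) if abs(c) > 1e-9}
print(nonzero)  # only 3 of the 16 coefficients are nonzero
```

Although the landscape takes 16 values, it is fully described by three spectral coefficients: the constant term (index 0), the x0 term (index 8), and the x1-x3 interaction (index 5). A model that learns only such active coefficients is one way a sparse spectrum can aid scaling and interpretability.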

MutCombinator: identification of mutated peptides allowing combinatorial mutations using nucleotide-based graph search.

Bioinformatics 2020 Jul;36(Supplement_1):i203-i209

Department of Computer Science, Hanyang University, Seongdong-gu, Seoul 04763, Republic of Korea.

Motivation: Proteogenomics has proven its utility by integrating genomics and proteomics. Typical approaches use data from next-generation sequencing to infer proteins expressed. A sample-specific protein sequence database is often adopted to identify novel peptides from matched mass spectrometry-based proteomics; nevertheless, there is no software that can practically identify all possible forms of mutated peptides suggested by various genomic information sources.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa504

Discovery of multi-operon colinear syntenic blocks in microbial genomes.

Bioinformatics 2020 Jul;36(Supplement_1):i21-i29

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Motivation: An important task in comparative genomics is to detect functional units by analyzing gene-context patterns. Colinear syntenic blocks (CSBs) are groups of genes that are consistently encoded in the same neighborhood and in the same order across a wide range of taxa. Such CSBs are likely essential for the regulation of gene expression in prokaryotes.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa503

Phylogenetic double placement of mixed samples.

Bioinformatics 2020 Jul;36(Supplement_1):i335-i343

Electrical and Computer Engineering Department, University of California San Diego, San Diego, CA 92093, USA.

Motivation: Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa489

Cancer mutational signatures representation by large-scale context embedding.

Bioinformatics 2020 Jul;36(Supplement_1):i309-i316

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: The accumulation of somatic mutations plays critical roles in cancer development and progression. However, the global patterns of somatic mutations, especially non-coding mutations, and their roles in defining molecular subtypes of cancer have not been well characterized due to the computational challenges in analysing the complex mutational patterns.

Results: Here, we develop a new algorithm, called MutSpace, to effectively extract patient-specific mutational features using an embedding framework for larger sequence context.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa433

Geometric potentials from deep learning improve prediction of CDR H3 loop structures.

Bioinformatics 2020 Jul;36(Supplement_1):i268-i275

Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD 21218, USA.

Motivation: Antibody structure is largely conserved, except for a complementarity-determining region featuring six variable loops. Five of these loops adopt canonical folds which can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven to be effective at capturing the complex patterns of protein structure.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa457

Inference attacks against differentially private query results from genomic datasets including dependent tuples.

Bioinformatics 2020 Jul;36(Supplement_1):i136-i145

Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey.

Motivation: The rapid decrease in the sequencing technology costs leads to a revolution in medical research and clinical care. Today, researchers have access to large genomic datasets to study associations between variants and complex traits. However, availability of such genomic datasets also results in new privacy concerns about personal information of the participants in genomic studies.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa475

TopicNet: a framework for measuring transcriptional regulatory network change.

Bioinformatics 2020 Jul;36(Supplement_1):i474-i481

Department of Molecular Biophysics and Biochemistry.

Motivation: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa403

TinGa: fast and flexible trajectory inference with Growing Neural Gas.

Bioinformatics 2020 Jul;36(Supplement_1):i66-i74

Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium.

Motivation: During the last decade, trajectory inference (TI) methods have emerged as a novel framework to model cell developmental dynamics, most notably in the area of single-cell transcriptomics. At present, more than 70 TI methods have been published, and recent benchmarks showed that even state-of-the-art methods only perform well for certain trajectory types but not others.

Results: In this work, we present TinGa, a new TI model that is fast and flexible, and that is based on Growing Neural Graphs.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa463

Network-based characterization of disease-disease relationships in terms of drugs and therapeutic targets.

Bioinformatics 2020 Jul;36(Supplement_1):i516-i524

Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan.

Motivation: Disease states are distinguished from each other in terms of differing clinical phenotypes, but characteristic molecular features are often common to various diseases. Similarities between diseases can be explained by characteristic gene expression patterns. However, most disease-disease relationships remain uncharacterized.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa439

Graph neural representational learning of RNA secondary structures for predicting RNA-protein interactions.

Bioinformatics 2020 Jul;36(Supplement_1):i276-i284

School of Computer Science, McGill University, Montreal, QC H3A 2B2, Canada.

Motivation: RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa456

Combining phenome-driven drug-target interaction prediction with patients' electronic health records-based clinical corroboration toward drug discovery.

Bioinformatics 2020 Jul;36(Supplement_1):i436-i444

Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA.

Motivation: Predicting drug-target interactions (DTIs) using human phenotypic data has the potential to eliminate the translational gap between animal experiments and clinical outcomes in humans. One challenge in human phenome-driven DTI predictions is integrating and modeling diverse drug and disease phenotypic relationships. Leveraging large amounts of clinically observed phenotypes of drugs and diseases and electronic health records (EHRs) of 72 million patients, we developed a novel integrated computational drug discovery approach by seamlessly combining DTI prediction and clinical corroboration.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa451

Chromatin network markers of leukemia.

Bioinformatics 2020 Jul;36(Supplement_1):i455-i463

Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain.

Motivation: The structure of chromatin impacts gene expression. Its alteration has been shown to coincide with the occurrence of cancer. A key challenge is in understanding the role of chromatin structure (CS) in cellular processes and its implications in diseases.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa445

Efficient exact inference for dynamical systems with noisy measurements using sequential approximate Bayesian computation.

Bioinformatics 2020 Jul;36(Supplement_1):i551-i559

Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg 85764, Germany.

Motivation: Approximate Bayesian computation (ABC) is an increasingly popular method for likelihood-free parameter inference in systems biology and other fields of research, as it allows analyzing complex stochastic models. However, the introduced approximation error is often not clear. It has been shown that ABC actually gives exact inference under the implicit assumption of a measurement noise model.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa397

Factorized embeddings learns rich and biologically meaningful embedding spaces using factorized tensor decomposition.

Bioinformatics 2020 Jul;36(Supplement_1):i417-i426

Department of Computer Science, University of Montreal, Québec, Canada.

Motivation: The recent development of sequencing technologies revolutionized our understanding of the inner workings of the cell as well as the way disease is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, data analysis provides a challenge.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa488

BIRD: identifying cell doublets via biallelic expression from single cells.

Bioinformatics 2020 Jul;36(Supplement_1):i251-i257

Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Givat Ram 91904, Israel.

Summary: Current technologies for single-cell transcriptomics allow thousands of cells to be analyzed in a single experiment. The increased scale of these methods raises the risk of cell doublets contamination. Available tools and algorithms for identifying doublets and estimating their occurrence in single-cell experimental data focus on doublets of different species, cell types or individuals.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa474

Improved survival analysis by learning shared genomic information from pan-cancer data.

Bioinformatics 2020 Jul;36(Supplement_1):i389-i398

Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea.

Motivation: Recent advances in deep learning have offered solutions to many biomedical tasks. However, there remains a challenge in applying deep learning to survival analysis using human cancer transcriptome data. As the number of genes, the input variables of survival model, is larger than the amount of available cancer patient samples, deep-learning models are prone to overfitting.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa462

Finding the direct optimal RNA barrier energy and improving pathways with an arbitrary energy model.

Bioinformatics 2020 Jul;36(Supplement_1):i227-i235

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Chiba 277-8561 Japan.

Motivation: RNA folding kinetics plays an important role in the biological functions of RNA molecules. An important goal in the investigation of the kinetic behavior of RNAs is to find the folding pathway with the lowest energy barrier. For this purpose, most of the existing methods use heuristics because the number of possible pathways is huge even if only the shortest (direct) folding pathways are considered.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa469

Sampling and summarizing transmission trees with multi-strain infections.

Bioinformatics 2020 Jul;36(Supplement_1):i362-i370

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Motivation: The combination of genomic and epidemiological data holds the potential to enable accurate pathogen transmission history inference. However, the inference of outbreak transmission histories remains challenging due to various factors such as within-host pathogen diversity and multi-strain infections. Current computational methods ignore within-host diversity and/or multi-strain infections, often failing to accurately infer the transmission history.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa438

The locality dilemma of Sankoff-like RNA alignments.

Bioinformatics 2020 Jul;36(Supplement_1):i242-i250

Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany.

Motivation: Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa431

QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks.

Bioinformatics 2020 Jul;36(Supplement_1):i285-i291

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA.

Motivation: Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction.

Results: We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets).

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa455

FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models.

Bioinformatics 2020 Jul;36(Supplement_1):i57-i65

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Motivation: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.

Results: We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa444

Interactive visualization and analysis of morphological skeletons of brain vasculature networks with VessMorphoVis.

Bioinformatics 2020 Jul;36(Supplement_1):i534-i541

Blue Brain Project (BBP), École Polytechnique Fédérale de Lausanne (EPFL), Campus Biotech, 1202 Geneva, Switzerland.

Motivation: Accurate morphological models of brain vasculature are key to modeling and simulating cerebral blood flow in realistic vascular networks. This in silico approach is fundamental to revealing the principles of neurovascular coupling. Validating those vascular morphologies entails performing certain visual analysis tasks that cannot be accomplished with generic visualization frameworks.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa461

A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification.

Bioinformatics 2020 Jul;36(Supplement_1):i292-i299

Computer Science Department, University of Maryland, College Park 20742, MD, USA.

Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3' sampling bias make it difficult to disambiguate cases where there is no uniquely mapping read to any of the candidate target genes.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa450

Robust and accurate deconvolution of tumor populations uncovers evolutionary mechanisms of breast cancer metastasis.

Bioinformatics 2020 Jul;36(Supplement_1):i407-i416

Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: Cancer develops and progresses through a clonal evolutionary process. Understanding progression to metastasis is of particular clinical importance, but is not easily analyzed by recent methods because it generally requires studying samples gathered years apart, for which modern single-cell sequencing is rarely an option. Revealing the clonal evolution mechanisms in the metastatic transition thus still depends on unmixing tumor subpopulations from bulk genomic data.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa396

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets.

Bioinformatics 2020 Jul;36(Supplement_1):i177-i185

Institut Pasteur, CNRS, C3BI - USR 3756, 75015 Paris, France.

Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa487
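
Illustrative note: to make concrete what "indexing k-mer presence and abundance across a collection of datasets" means, here is a deliberately naive sketch. It is not REINDEER's data structure, which the abstract above does not describe; a plain dictionary like this would not scale to thousands of experiments. The dataset names, reads, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_abundance_index(datasets, k):
    """Map each k-mer to {dataset name: total count in that dataset}."""
    index = defaultdict(dict)
    for name, reads in datasets.items():
        for read in reads:
            for kmer, n in count_kmers(read, k).items():
                index[kmer][name] = index[kmer].get(name, 0) + n
    return dict(index)


# Hypothetical toy collection of two "experiments".
datasets = {
    "sample_A": ["ACGTAC", "GTACGT"],
    "sample_B": ["ACGACG"],
}
index = build_abundance_index(datasets, k=3)

# Query: in which datasets does a k-mer occur, and how often?
print(index["ACG"])            # present in both samples
print(index.get("TTT", {}))    # absent everywhere
```

A query returns both presence (which datasets contain the k-mer) and abundance (the count in each), which are the two quantities the title refers to.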

The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction.

Bioinformatics 2020 Jul;36(Supplement_1):i219-i226

Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.

Motivation: The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa468

The string decomposition problem and its applications to centromere analysis and assembly.

Bioinformatics 2020 Jul;36(Supplement_1):i93-i101

Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA.

Motivation: Recent attempts to assemble extra-long tandem repeats (such as centromeres) faced the challenge of translating long error-prone reads from the nucleotide alphabet into the alphabet of repeat units. Human centromeres represent a particularly complex type of high-order repeats (HORs) formed by chromosome-specific monomers. Given a set of all human monomers, translating a read from a centromere into the monomer alphabet is modeled as the String Decomposition Problem.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa454

PEDL: extracting protein-protein associations using deep language models and distant supervision.

Bioinformatics 2020 Jul;36(Supplement_1):i490-i498

Computer Science Department, Humboldt-Universität zu Berlin, Berlin 10099, Germany.

Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of functional protein-protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help to gather such pathway information from biomedical publications.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa430

Mutational signature learning with supervised negative binomial non-negative matrix factorization.

Bioinformatics 2020 Jul;36(Supplement_1):i154-i160

Department of Computer Science.

Motivation: Understanding the underlying mutational processes of cancer patients has been a long-standing goal in the community and promises to provide new insights that could improve cancer diagnoses and treatments. Mutational signatures are summaries of the mutational processes, and improving the derivation of mutational signatures can yield new discoveries previously obscured by technical and biological confounders. Results from existing mutational signature extraction methods depend on the size of available patient cohort and solely focus on the analysis of mutation count data without considering the exploitation of metadata.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa473

Toward heterogeneous information fusion: bipartite graph convolutional networks for in silico drug repurposing.

Bioinformatics 2020 Jul;36(Supplement_1):i525-i533

Computational Diagnostics Lab, Departments of Radiology and Pathology, Los Angeles, CA 90095, USA.

Motivation: Mining drug-disease association and related interactions are essential for developing in silico drug repurposing (DR) methods and understanding underlying biological mechanisms. Recently, large-scale biological databases are increasingly available for pharmaceutical research, allowing for deep characterization for molecular informatics and drug discovery. However, DR is challenging due to the molecular heterogeneity of disease and diverse drug-disease associations. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa437

MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model.

Bioinformatics 2020 Jul;36(Supplement_1):i399-i406

International Institute of Information Technology Bangalore, Bangalore 560100, India.

Motivation: Accurate prediction of binding between a major histocompatibility complex (MHC) allele and a peptide plays a major role in the synthesis of personalized cancer vaccines. The immune system struggles to distinguish between a cancerous and a healthy cell. In a cancer patient who has a particular MHC allele, only those peptides that bind to the MHC allele with high affinity help the immune system recognize the cancerous cells. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa479

Identifying tumor clones in sparse single-cell mutation data.

Bioinformatics 2020 Jul;36(Supplement_1):i186-i193

Department of Computer Science, Princeton University, Princeton, NJ 08544, USA.

Motivation: Recent single-cell DNA sequencing technologies enable whole-genome sequencing of hundreds to thousands of individual cells. However, these technologies have ultra-low sequencing coverage (<0.5× per cell) which has limited their use to the analysis of large copy-number aberrations (CNAs) in individual cells. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa449

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization.

Bioinformatics 2020 Jul;36(Supplement_1):i317-i325

Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA.

Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools, when computing their secondary structure prediction, do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa336

Artificial-cell-type aware cell-type classification in CITE-seq.

Bioinformatics 2020 Jul;36(Supplement_1):i542-i550

Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.

Motivation: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping.

Results: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa467

Unsupervised topological alignment for single-cell multi-omics integration.

Bioinformatics 2020 Jul;36(Supplement_1):i48-i56

School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China.

Motivation: Single-cell multi-omics data provide a comprehensive molecular view of cells. However, single-cell multi-omics datasets consist of unpaired cells measured with distinct unmatched features across modalities, making data integration challenging.

Results: In this study, we present a novel algorithm, termed UnionCom, for the unsupervised topological alignment of single-cell multi-omics integration. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa443

Topological and kernel-based microbial phenotype prediction from MALDI-TOF mass spectra.

Bioinformatics 2020 Jul;36(Supplement_1):i30-i38

Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, 4058 Basel, Switzerland.

Motivation: Microbial species identification based on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become a standard tool in clinical microbiology. The resulting MALDI-TOF mass spectra also harbour the potential to deliver prediction results for other phenotypes, such as antibiotic resistance. However, the development of machine learning algorithms specifically tailored to MALDI-TOF MS-based phenotype prediction is still in its infancy. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa429

Privacy-preserving construction of generalized linear mixed model for biomedical computation.

Bioinformatics 2020 Jul;36(Supplement_1):i128-i135

Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47405, USA.

Motivation: The generalized linear mixed model (GLMM) is an extension of the generalized linear model (GLM) in which the linear predictor takes random effects into account. Given its power of precisely modeling the mixed effects from multiple sources of random variations, the method has been widely used in biomedical computation, for instance in the genome-wide association studies (GWASs) that aim to detect genetic variance significantly associated with phenotypes such as human diseases. Collaborative GWAS on large cohorts of patients across multiple institutions is often impeded by the privacy concerns of sharing personal genomic and other health data. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa478
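
For orientation, the GLMM described in the abstract above is commonly written as shown below. The notation is the usual textbook convention and is an assumption here, not taken from the paper: y is the response, X and Z are the fixed-effect and random-effect design matrices, β the fixed effects, u the random effects with covariance G, and g the link function.

```latex
g\bigl(\mathbb{E}[\,y \mid u\,]\bigr) = X\beta + Zu, \qquad u \sim \mathcal{N}(0, G)
```

Dropping the random-effects term Zu recovers the ordinary GLM, g(E[y]) = Xβ.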

LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities.

Bioinformatics 2020 Jul;36(Supplement_1):i258-i267

Baidu Research, Sunnyvale, CA 94089, USA.

Motivation: RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa460

Deep multiple instance learning classifies subtissue locations in mass spectrometry images from tissue-level annotations.

Bioinformatics 2020 Jul;36(Supplement_1):i300-i308

Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.

Motivation: Mass spectrometry imaging (MSI) characterizes the molecular composition of tissues at spatial resolution, and has a strong potential for distinguishing tissue types, or disease states. This can be achieved by supervised classification, which takes as input MSI spectra, and assigns class labels to subtissue locations. Unfortunately, developing such classifiers is hindered by the limited availability of training sets with subtissue labels as the ground truth. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa436

Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data.

Bioinformatics 2020 Jul;36(Supplement_1):i102-i110

Department of Computer Science, University of Maryland, College Park, MD 20742, USA.

Motivation: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa448

Improved design and analysis of practical minimizers.

Bioinformatics 2020 Jul;36(Supplement_1):i119-i127

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: Minimizers are methods to sample k-mers from a string, with the guarantee that similar sets of k-mers will be chosen on similar strings. A minimizer scheme is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of software tools and pipelines to improve computational efficiency and decrease memory usage. Read More

Source
http://dx.doi.org/10.1093/bioinformatics/btaa472
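
To make the three parameters in the abstract above concrete, here is a minimal sketch of a basic window-minimizer scheme. It is not the method proposed in the paper. It assumes that a window is w consecutive k-mers, that the default order is plain lexicographic order, and that ties are broken by taking the leftmost k-mer. The function name `minimizers` and its signature are hypothetical.

```python
def minimizers(s, k, w, order=None):
    """Return the sorted (position, k-mer) pairs picked as window minimizers.

    Assumptions (illustrative only): a window is w consecutive k-mers,
    the default order is lexicographic, and ties go to the leftmost k-mer.
    """
    if order is None:
        order = lambda kmer: kmer  # lexicographic order on the k-mer string
    n_kmers = len(s) - k + 1
    selected = set()
    # Slide a window over every run of w consecutive k-mer start positions.
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Pick the smallest k-mer under `order`; the position breaks ties.
        best = min(window, key=lambda i: (order(s[i:i + k]), i))
        selected.add(best)
    return [(i, s[i:i + k]) for i in sorted(selected)]


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Passing a different `order` callable (for example, a hash of the k-mer) changes which k-mers are sampled without changing the rest of the procedure, which is why the order is treated as a parameter of the scheme.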