9,743 results match your criteria BMC Bioinformatics [Journal]


Extracting chemical reactions from text using Snorkel.

BMC Bioinformatics 2020 May 27;21(1):217. Epub 2020 May 27.

Departments of Medicine, Genetics, Bioengineering, and Biomedical Data Science, Stanford University, Stanford, CA, USA.

Background: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types.

DOI: http://dx.doi.org/10.1186/s12859-020-03542-1

ARTDeco: automatic readthrough transcription detection.

BMC Bioinformatics 2020 May 26;21(1):214. Epub 2020 May 26.

Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA.

Background: Mounting evidence suggests several diseases and biological processes target transcription termination to misregulate gene expression. Disruption of transcription termination leads to readthrough transcription past the 3' end of genes, which can result in novel transcripts, changes in epigenetic states and altered 3D genome structure.

Results: We developed Automatic Readthrough Transcription Detection (ARTDeco), a tool to detect and analyze multiple features of readthrough transcription from RNA-seq and other next-generation sequencing (NGS) assays that profile transcriptional activity.

DOI: http://dx.doi.org/10.1186/s12859-020-03551-0

SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data.

BMC Bioinformatics 2020 May 26;21(1):215. Epub 2020 May 26.

Department of Mathematics & Statistics, University of Massachusetts Amherst, 710 N. Pleasant St., Amherst, 01003, USA.

Background: Recently, it has become possible to collect next-generation DNA sequencing data sets that are composed of multiple samples from multiple biological units where each of these samples may be from a single cell or bulk tissue. Yet, there does not yet exist a tool for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples so that developers of analysis methods can assess accuracy and precision.

Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell.

DOI: http://dx.doi.org/10.1186/s12859-020-03550-1

Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique.

BMC Bioinformatics 2020 May 26;21(1):216. Epub 2020 May 26.

National Biobank of Thailand, National Science and Technology Development Agency, Khong Luang, 12120, Thailand.

Background: The number of porcine Single Nucleotide Polymorphisms (SNPs) used in genetic association studies is very large, suitable for statistical testing. However, in the breed classification problem, one needs a much smaller set of porcine-classifying SNPs (PCSNPs) that could accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods.

DOI: http://dx.doi.org/10.1186/s12859-020-3471-4

Data-based RNA-seq simulations by binomial thinning.

Authors:
David Gerard

BMC Bioinformatics 2020 May 24;21(1):206. Epub 2020 May 24.

Department of Mathematics and Statistics, American University, Massachusetts Ave NW, Washington, DC, 20016, USA.

Background: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations from theoretical models, this can result in unsubstantiated claims of a method's performance.

DOI: http://dx.doi.org/10.1186/s12859-020-3450-9

A novel investigation of the effect of iterations in sliding semi-landmarks for 3D human facial images.

BMC Bioinformatics 2020 May 24;21(1):208. Epub 2020 May 24.

Department of Biomedical Science, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Selangor, Malaysia.

Background: Landmark-based approaches of two- or three-dimensional coordinates are the most widely used in geometric morphometrics (GM). As the human face hosts the organs that act as the central interface for identification, more landmarks are needed to characterize biological shape variation. Because the use of few anatomical landmarks may not be sufficient for variability of some biological patterns and form, sliding semi-landmarks are required to quantify complex shape.

DOI: http://dx.doi.org/10.1186/s12859-020-3497-7

Substitution matrix based color schemes for sequence alignment visualization.

BMC Bioinformatics 2020 May 24;21(1):209. Epub 2020 May 24.

Department of Computational Biology and Simulation, TU Darmstadt, Schnittspahnstraße 2, Darmstadt, 64287, Germany.

Background: Visualization of multiple sequence alignments often includes colored symbols, usually characters encoding amino acids, according to some (physical) properties, such as hydrophobicity or charge. Typically, color schemes are created manually, so that equal or similar colors are assigned to amino acids that share similar properties. However, this assessment is subjective and may not represent the similarity of symbols very well.

DOI: http://dx.doi.org/10.1186/s12859-020-3526-6

RintC: fast and accuracy-aware decomposition of distributions of RNA secondary structures with extended logsumexp.

BMC Bioinformatics 2020 May 24;21(1):210. Epub 2020 May 24.

Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.

Background: Analysis of secondary structures is essential for understanding the functions of RNAs. Because RNA molecules thermally fluctuate, it is necessary to analyze the probability distributions of their secondary structures. Existing methods, however, are not applicable to long RNAs owing to their high computational complexity.

DOI: http://dx.doi.org/10.1186/s12859-020-3535-5

PACVr: plastome assembly coverage visualization in R.

BMC Bioinformatics 2020 May 24;21(1):207. Epub 2020 May 24.

Institut für Bioinformatik, Freie Universität Berlin, Berlin, 14195, Germany.

Background: Plastid genomes typically display a circular, quadripartite structure with two inverted repeat regions, which challenges automatic assembly procedures. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on genome structure and evolution. The average coverage depth of a genome assembly is often used as an indicator of assembly quality.

DOI: http://dx.doi.org/10.1186/s12859-020-3475-0

Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA.

BMC Bioinformatics 2020 May 24;21(1):212. Epub 2020 May 24.

School of Information Science and Engineering, University of Jinan, Jinan, 250022, China.

Background: Apoptosis, also called programmed cell death, refers to the spontaneous and orderly death of cells controlled by genes in order to maintain a stable internal environment. Identifying the subcellular location of apoptosis proteins is very helpful in understanding the mechanism of apoptosis and designing drugs. Therefore, the subcellular localization of apoptosis proteins has attracted increased attention in computational biology.

DOI: http://dx.doi.org/10.1186/s12859-020-3539-1

VADR: validation and annotation of virus sequence submissions to GenBank.

BMC Bioinformatics 2020 May 24;21(1):211. Epub 2020 May 24.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, 20894, MD, USA.

Background: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.

DOI: http://dx.doi.org/10.1186/s12859-020-3537-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7245624

Bio-semantic relation extraction with attention-based external knowledge reinforcement.

BMC Bioinformatics 2020 May 24;21(1):213. Epub 2020 May 24.

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China.

Background: Semantic resources such as knowledge bases contain high-quality structured knowledge and therefore require significant effort from domain experts. Using these resources to reinforce information retrieval from unstructured text may further exploit the potential of such unstructured text resources and their curated knowledge.

Results: The paper proposes a novel method that uses a deep neural network model adopting the prior knowledge to improve performance in the automated extraction of biological semantic relations from the scientific literature.

DOI: http://dx.doi.org/10.1186/s12859-020-3540-8

Uncovering the prognostic gene signatures for the improvement of risk stratification in cancers by using deep learning algorithm coupled with wavelet transform.

BMC Bioinformatics 2020 May 19;21(1):195. Epub 2020 May 19.

College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China.

Background: The aim of gene expression-based clinical modelling in tumorigenesis is not only to accurately predict the clinical endpoints, but also to reveal the genome characteristics for downstream analysis for the purpose of understanding the mechanisms of cancers. Most of the conventional machine learning methods involved a gene filtering step, in which tens of thousands of genes were firstly filtered based on the gene expression levels by a statistical method with an arbitrary cutoff. Although gene filtering procedure helps to reduce the feature dimension and avoid overfitting, there is a risk that some pathogenic genes important to the disease will be ignored.

DOI: http://dx.doi.org/10.1186/s12859-020-03544-z

Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.

BMC Bioinformatics 2020 May 19;21(1):198. Epub 2020 May 19.

Center for Biostatistics, Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr., Columbus, 43210, OH, USA.

Background: Power analysis becomes an inevitable step in experimental design of current biomedical research. Complex designs allowing diverse correlation structures are commonly used in RNA-Seq experiments. However, the field currently lacks statistical methods to calculate sample size and estimate power for RNA-Seq differential expression studies using such designs.

DOI: http://dx.doi.org/10.1186/s12859-020-3541-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7236949

Enhancing SVM for survival data using local invariances and weighting.

BMC Bioinformatics 2020 May 19;21(1):193. Epub 2020 May 19.

Department of Global Health, Boston University, 801 Massachusetts Avenue, Boston, MA, 02118, USA.

Background: The necessity to analyze medium-throughput data in epidemiological studies with small sample size, particularly when studying biomedical data may hinder the use of classical statistical methods. Support vector machines (SVM) models can be successfully applied in this setting because they are a powerful tool to analyze data with large number of predictors and limited sample size, especially when handling binary outcomes. However, biomedical research often involves analysis of time-to-event outcomes and has to account for censoring.

DOI: http://dx.doi.org/10.1186/s12859-020-3481-2

fcScan: a versatile tool to cluster combinations of sites using genomic coordinates.

BMC Bioinformatics 2020 May 19;21(1):194. Epub 2020 May 19.

Department of Biochemistry and Molecular Genetics, Faculty of Medicine, American University of Beirut, Beirut, Lebanon.

Background: Finding combinations of homotypic or heterotypic genomic sites obeying a specific grammar in DNA sequences is a frequent task in bioinformatics. A typical case corresponds to the identification of cis-regulatory modules characterized by a combination of transcription factor binding sites in a defined window size. Although previous studies identified clusters of genomic sites in species with varying genome sizes, the availability of a dedicated and versatile tool to search for such clusters is lacking.

DOI: http://dx.doi.org/10.1186/s12859-020-3536-4

Repetitive DNA profile of the amphibian mitogenome.

BMC Bioinformatics 2020 May 19;21(1):197. Epub 2020 May 19.

Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Cd. Universitaria, 04510, Cd. Mx., Mexico.

Background: Repetitive DNA elements such as direct and inverted repeat sequences are present in every genome, playing numerous biological roles. In amphibians, the functions and effects of the repeat sequences have not been extensively explored. We consider that the data of mitochondrial genomes in the NCBI database are a valuable alternative to generate a better understanding of the molecular dynamic of the repeat sequences in the amphibians.

DOI: http://dx.doi.org/10.1186/s12859-020-3532-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7236288

Direct comparison shows that mRNA-based diagnostics incorporate information which cannot be learned directly from genomic mutations.

BMC Bioinformatics 2020 May 19;21(1):196. Epub 2020 May 19.

Shraga Segal Department of Microbiology, Immunology and Genetics, Ben-Gurion University of the Negev, Beersheba, Israel.

Background: Compared to the many uses of DNA-level testing in clinical oncology, development of RNA-based diagnostics has been more limited. An exception to this trend is the growing use of mRNA-based methods in early-stage breast cancer. Although DNA and mRNA are used together in breast cancer research, the distinct contribution of mRNA beyond that of DNA in clinical challenges has not yet been directly assessed.

DOI: http://dx.doi.org/10.1186/s12859-020-3512-z

CIPR: a web-based R/shiny app and R package to annotate cell clusters in single cell RNA sequencing experiments.

BMC Bioinformatics 2020 May 15;21(1):191. Epub 2020 May 15.

Division of Microbiology and Immunology, Department of Pathology, University of Utah, 15 N. Medical Dr. East, JMRB, Salt Lake City, UT, 84112, USA.

Background: Single cell RNA sequencing (scRNAseq) has provided invaluable insights into cellular heterogeneity and functional states in health and disease. During the analysis of scRNAseq data, annotating the biological identity of cell clusters is an important step before downstream analyses and it remains technically challenging. The current solutions for annotating single cell clusters generally lack a graphical user interface, can be computationally intensive or have a limited scope.

DOI: http://dx.doi.org/10.1186/s12859-020-3538-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227235

CSN: unsupervised approach for inferring biological networks based on the genome alone.

BMC Bioinformatics 2020 May 15;21(1):190. Epub 2020 May 15.

Biomedical Engineering Department, Tel Aviv University, Tel-Aviv, Israel.

Background: Most organisms cannot be cultivated, as they live in unique ecological conditions that cannot be mimicked in the lab. Understanding the functionality of those organisms' genes and their interactions by performing large-scale measurements of transcription levels, protein-protein interactions or metabolism, is extremely difficult and, in some cases, impossible. Thus, efficient algorithms for deciphering genome functionality based only on the genomic sequences with no other experimental measurements are needed.

DOI: http://dx.doi.org/10.1186/s12859-020-3479-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227238

Broad-coverage biomedical relation extraction with SemRep.

BMC Bioinformatics 2020 May 14;21(1):188. Epub 2020 May 14.

Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894, MD, USA.

Background: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets.

DOI: http://dx.doi.org/10.1186/s12859-020-3517-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7222583

ICEKAT: an interactive online tool for calculating initial rates from continuous enzyme kinetic traces.

BMC Bioinformatics 2020 May 14;21(1):186. Epub 2020 May 14.

Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Rd., Milwaukee, 53226, USA.

Background: Continuous enzyme kinetic assays are often used in high-throughput applications, as they allow rapid acquisition of large amounts of kinetic data and increased confidence compared to discontinuous assays. However, data analysis is often rate-limiting in high-throughput enzyme assays, as manual inspection and selection of a linear range from individual kinetic traces is cumbersome and prone to user error and bias. Currently available software programs are specialized and designed for the analysis of complex enzymatic models.

DOI: http://dx.doi.org/10.1186/s12859-020-3513-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7222511

Automated image analysis system for studying cardiotoxicity in human pluripotent stem cell-Derived cardiomyocytes.

BMC Bioinformatics 2020 May 14;21(1):187. Epub 2020 May 14.

Dept of Applied Stem Cell Technologies, MIRA Institute, University of Twente, Drienerlolaan 5, Enschede, 7522 NB, The Netherlands.

Background: Cardiotoxicity, characterized by severe cardiac dysfunction, is a major problem in patients treated with different classes of anticancer drugs. Development of predictable human-based models and assays for drug screening are crucial for preventing potential drug-induced adverse effects. Current animal in vivo models and cell lines are not always adequate to represent human biology.

DOI: http://dx.doi.org/10.1186/s12859-020-3466-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7222481

TAMA: improved metagenomic sequence classification through meta-analysis.

BMC Bioinformatics 2020 May 12;21(1):185. Epub 2020 May 12.

Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea.

Background: Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promote understanding of interactions of microbes in environmental samples. To understand their environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are the collections of genomes of the microorganisms.

DOI: http://dx.doi.org/10.1186/s12859-020-3533-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218625

SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data.

BMC Bioinformatics 2020 May 11;21(1):184. Epub 2020 May 11.

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, 100084, China.

Background: With the rapid development of single-cell genomics, technologies for parallel sequencing of the transcriptome and genome in each single cell are being explored in several labs and are becoming available. This brings us the opportunity to uncover associations between genotypes and gene expression phenotypes at the single-cell level by eQTL analysis on single-cell data. A new method is needed for such tasks due to the special characteristics of single-cell sequencing data.

DOI: http://dx.doi.org/10.1186/s12859-020-3534-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216638

Multi-task learning sparse group lasso: a method for quantifying antigenicity of influenza A(H1N1) virus using mutations and variations in glycosylation of Hemagglutinin.

BMC Bioinformatics 2020 May 11;21(1):182. Epub 2020 May 11.

Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, USA.

Background: In addition to causing the pandemic influenza outbreaks of 1918 and 2009, subtype H1N1 influenza A viruses (IAVs) have caused seasonal epidemics since 1977. The antigenic properties of influenza viruses are determined by both protein sequence and N-linked glycosylation of influenza glycoproteins, especially hemagglutinin (HA). Currently available computational methods consider only features in the protein sequence, not N-linked glycosylation.

DOI: http://dx.doi.org/10.1186/s12859-020-3527-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216668

methCancer-gen: a DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder.

BMC Bioinformatics 2020 May 11;21(1):181. Epub 2020 May 11.

Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea.

Background: Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and informative representation of the cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for publicly available methylome datasets has increased. To satisfy this, large-scale projects were launched to discover biological insights into cancer, providing a collection of datasets.

DOI: http://dx.doi.org/10.1186/s12859-020-3516-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216580

MEPHAS: an interactive graphical user interface for medical and pharmaceutical statistical analysis with R and Shiny.

BMC Bioinformatics 2020 May 11;21(1):183. Epub 2020 May 11.

Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita City, Osaka, 565-0871, Japan.

Background: Even though R is one of the most commonly used statistical computing environments, it lacks a graphical user interface (GUI) that appeals to students, researchers, lecturers, and practitioners in medicine and pharmacy for conducting standard data analytics. Current GUIs built on top of R, such as EZR and R-Commander, aim to facilitate R coding and visualization, but most of the functionalities are still accessed through a command-line interface (CLI). To assist practitioners of medicine and pharmacy and researchers to run most routines in fundamental statistical analysis, we developed an interactive GUI; i...

DOI: http://dx.doi.org/10.1186/s12859-020-3494-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216538

Cluster correlation based method for lncRNA-disease association prediction.

BMC Bioinformatics 2020 May 11;21(1):180. Epub 2020 May 11.

School of Computer Science and Technology, XIDIAN UNIVERSITY, Xi'an, Shaanxi, China.

Background: In recent years, increasing evidence has indicated that long non-coding RNAs (lncRNAs) are deeply involved in a wide range of human biological pathways. The mutations and disorders of lncRNAs are closely associated with many human diseases. Therefore, it is of great importance to predict potential associations between lncRNAs and complex diseases for the diagnosis and cure of complex diseases.

DOI: http://dx.doi.org/10.1186/s12859-020-3496-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216352

Self-analysis of repeat proteins reveals evolutionarily conserved patterns.

BMC Bioinformatics 2020 May 7;21(1):179. Epub 2020 May 7.

Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland.

Background: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric.

Results: Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins.

DOI: http://dx.doi.org/10.1186/s12859-020-3493-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7204011
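
The idea in the entry above, comparing self-similarity dot plots with a Jaccard metric, can be illustrated with a minimal sketch. The listing does not give the paper's exact dot plot construction or any normalization, so the exact-match `window` rule, the exclusion of the main diagonal, and the names `dot_plot` and `jaccard` are assumptions made for illustration only.

```python
from typing import Set, Tuple

Dots = Set[Tuple[int, int]]


def dot_plot(seq: str, window: int = 1) -> Dots:
    """Self-similarity dot plot as a set of off-diagonal (i, j) positions.

    A dot is placed where the length-`window` substrings starting at i and j
    are identical (an assumed, simplified matching rule).
    """
    n = len(seq) - window + 1
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and seq[i:i + window] == seq[j:j + window]
    }


def jaccard(a: Dots, b: Dots) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Toy sequences (hypothetical): two repeat-like strings and one without repeats.
short_repeat = dot_plot("ABAB", window=2)
long_repeat = dot_plot("ABABAB", window=2)
no_repeat = dot_plot("ABCB", window=2)

similarity = jaccard(short_repeat, long_repeat)
```

Because the comparison operates on dot positions rather than on residues, two proteins can score as similar even when their sequences differ, as long as their internal repeat patterns line up.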

Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes.

BMC Bioinformatics 2020 May 7;21(1):178. Epub 2020 May 7.

Department of Computational Biology, Cornell University, Ithaca, NY, USA.

Background: Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact discovery of disease loci. While well appreciated, almost all analyses of GWAS data consider reported disease phenotype values as is without accounting for potential misclassification.

Results: Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset.

DOI: http://dx.doi.org/10.1186/s12859-020-3387-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7204256

Blast2Fish: a reference-based annotation web tool for transcriptome analysis of non-model teleost fish.

BMC Bioinformatics 2020 May 4;21(1):174. Epub 2020 May 4.

Department of Aquaculture, National Taiwan Ocean University, No.2, Beining Rd., Zhongzheng Dist, Keelung City, 20224, Taiwan.

Background: Transcriptome analysis by next-generation sequencing has become a popular technique in recent years. This approach is quite suitable for non-model organism study, as de novo assembly is independent of prior genomic sequences of organisms. De novo sequencing has benefited many studies on commercially important fish species.

DOI: http://dx.doi.org/10.1186/s12859-020-3507-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199347

Exploring the dynamics and interplay of human papillomavirus and cervical tumorigenesis by integrating biological data into a mathematical model.

BMC Bioinformatics 2020 May 5;21(Suppl 7):152. Epub 2020 May 5.

College of Computer Science, Sichuan University, Chengdu, 610065, China.

Background: Cervical cancer is the fourth most common tumor in women worldwide, mostly resulting from high-risk human papillomavirus (HR-HPV) with persistent infection.

Results: The present discoveries are comprised of the following: (i) A total of 16.64% of the individuals were positive for HR-HPV infection, with 13...

DOI: http://dx.doi.org/10.1186/s12859-020-3454-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199323

Comparative analysis of similarity measurements in miRNAs with applications to miRNA-disease association predictions.

BMC Bioinformatics 2020 May 4;21(1):176. Epub 2020 May 4.

School of Computer Science and Engineering, Central South University, Changsha, 410083, China.

Background: As regulators of gene expression, microRNAs (miRNAs) are increasingly recognized as critical biomarkers of human diseases. Till now, a series of computational methods have been proposed to predict new miRNA-disease associations based on similarity measurements. Different categories of features in miRNAs are applied in these methods for miRNA-miRNA similarity calculation.

DOI: http://dx.doi.org/10.1186/s12859-020-3515-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199309

Transfer posterior error probability estimation for peptide identification.

BMC Bioinformatics 2020 May 4;21(1):173. Epub 2020 May 4.

National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.

Background: In shotgun proteomics, database searching of tandem mass spectra results in a great number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) or the posterior error probability (PEP) is the commonly used statistical confidence measure. PEP, also called local FDR, can evaluate the confidence of individual PSMs and thus is more desirable than FDR, which evaluates the global confidence of a collection of PSMs.

DOI: http://dx.doi.org/10.1186/s12859-020-3485-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199311
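
The distinction drawn in the entry above, PEP as a per-PSM (local) measure versus FDR as a measure for a whole collection, can be made concrete with a small sketch. It relies on the standard relationship that the expected number of false positives in an accepted set is the sum of the PEPs in that set, so the set's estimated FDR is their mean. This is not the transfer-learning estimator the paper proposes; the function name `fdr_from_peps` and the PEP values are hypothetical.

```python
from typing import List


def fdr_from_peps(peps: List[float], threshold: float) -> float:
    """Estimated FDR of the PSMs accepted at a given PEP threshold.

    Each PEP is the probability that one PSM is incorrect, so the mean PEP
    of the accepted PSMs estimates the fraction of them that are false.
    Returns 0.0 when nothing is accepted.
    """
    accepted = [p for p in peps if p <= threshold]
    if not accepted:
        return 0.0
    return sum(accepted) / len(accepted)


# Hypothetical PEPs for five PSMs.
peps = [0.01, 0.02, 0.05, 0.20, 0.60]
strict_fdr = fdr_from_peps(peps, threshold=0.05)  # accepts the first three PSMs
loose_fdr = fdr_from_peps(peps, threshold=0.20)   # accepts the first four PSMs
```

Note that the set-level FDR can stay low even when the worst accepted PSM has a much higher PEP, which is the sense in which PEP is the finer-grained measure.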

Main findings and advances in bioinformatics and biomedical engineering- IWBBIO 2018.

BMC Bioinformatics 2020 May 5;21(Suppl 7):153. Epub 2020 May 5.

Department of Electrical Engineering and Computer Science, University of Applied Sciences of Munster, Stegerweldstr 39, Steinfurt, 48565, Germany.

In the current supplement, we are proud to present seventeen relevant contributions from the 6th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2018), which was held during April 25-27, 2018 in Granada (Spain). These contributions have been chosen because of their quality and the importance of their findings. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3467-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199304

Detecting PCOS susceptibility loci from genome-wide association studies via iterative trend correlation based feature screening.

BMC Bioinformatics 2020 May 4;21(1):177. Epub 2020 May 4.

Idaho National Laboratory, Idaho, USA.

Background: Feature screening plays a critical role in handling ultrahigh dimensional data analyses when the number of features exponentially exceeds the number of observations. It is increasingly common in biomedical research to have a case-control (binary) response and extremely large-scale categorical features. However, approaches that consider such data types are limited in the extant literature. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3492-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199379

TS: a powerful truncated test to detect novel disease associated genes using publicly available gWAS summary data.

BMC Bioinformatics 2020 May 4;21(1):172. Epub 2020 May 4.

Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, 76203, TX, USA.

Background: In the last decade, a large number of common variants underlying complex diseases have been identified through genome-wide association studies (GWASs). Summary data of the GWASs are freely and publicly available. The summary data are usually obtained through single marker analysis. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3511-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199321

Prediction model construction of mouse stem cell pluripotency using CpG and non-CpG DNA methylation markers.

BMC Bioinformatics 2020 May 4;21(1):175. Epub 2020 May 4.

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea.

Background: Genome-wide studies of DNA methylation across the epigenetic landscape provide insights into the heterogeneity of pluripotent embryonic stem cells (ESCs). Differentiating into embryonic somatic and germ cells, ESCs exhibit varying degrees of pluripotency, and epigenetic changes occurring in this process have emerged as important factors explaining stem cell pluripotency.

Results: Here, using paired scBS-seq and scRNA-seq data of mice, we constructed a machine learning model that predicts degrees of pluripotency for mouse ESCs. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3448-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199378

Negative binomial additive model for RNA-Seq data analysis.

Authors:
Xu Ren, Pei-Fen Kuan

BMC Bioinformatics 2020 May 1;21(1):171. Epub 2020 May 1.

Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, 11794, NY, USA.

Background: High-throughput sequencing experiments followed by differential expression analysis is a widely used approach for detecting genomic biomarkers. A fundamental step in differential expression analysis is to model the association between gene counts and covariates of interest. Existing models assume linear effect of covariates, which is restrictive and may not be sufficient for certain phenotypes. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3506-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195715

wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data.

BMC Bioinformatics 2020 May 1;21(1):169. Epub 2020 May 1.

Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany.

Background: Analysing whole genome bisulfite sequencing datasets is a data-intensive task that requires comprehensive and reproducible workflows to generate valid results. While many algorithms have been developed for tasks such as alignment, comprehensive end-to-end pipelines are still sparse. Furthermore, previous pipelines lack features or show technical deficiencies, thus impeding analyses. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3470-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195798

Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure.

BMC Bioinformatics 2020 May 1;21(1):170. Epub 2020 May 1.

Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, UK.

Background: Whenever suitable template structures are not available, usage of fragment-based protein structure prediction becomes the only practical alternative as pure ab initio techniques require massive computational resources even for very small proteins. However, inaccuracy of their energy functions and their stochastic nature imposes generation of a large number of decoys to explore adequately the solution space, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3491-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195757

2SigFinder: the combined use of small-scale and large-scale statistical testing for genomic island detection from a single genome.

BMC Bioinformatics 2020 Apr 29;21(1):159. Epub 2020 Apr 29.

College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China.

Background: Genomic islands are associated with microbial adaptations, carrying genomic signatures different from the host. Some methods perform an overall test to identify genomic islands based on their local features. However, regions of different scales will display different genomic features. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3501-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191778

CYPminer: an automated cytochrome P450 identification, classification, and data analysis tool for genome data sets across kingdoms.

BMC Bioinformatics 2020 Apr 29;21(1):160. Epub 2020 Apr 29.

Division of Microbiology, National Center for Toxicological Research (NCTR)/U.S. FDA, Jefferson, AR, 72079, USA.

Background: Cytochrome P450 monooxygenases (termed CYPs or P450s) are hemoproteins ubiquitously found across all kingdoms, playing a central role in intracellular metabolism, especially in metabolism of drugs and xenobiotics. The explosive growth of genome sequencing brings a new set of challenges and issues for researchers, such as a systematic investigation of CYPs across all kingdoms in terms of identification, classification, and pan-CYPome analyses. Such investigation requires an automated tool that can handle an enormous amount of sequencing data in a timely manner. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3473-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191761

Adaptations of Escherichia coli strains to oxidative stress are reflected in properties of their structural proteomes.

BMC Bioinformatics 2020 Apr 29;21(1):162. Epub 2020 Apr 29.

Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.

Background: The reconstruction of metabolic networks and the three-dimensional coverage of protein structures have reached the genome-scale in the widely studied Escherichia coli K-12 MG1655 strain. The combination of the two leads to the formation of a structural systems biology framework, which we have used to analyze differences between the reactive oxygen species (ROS) sensitivity of the proteomes of sequenced strains of E. coli. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3505-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191737

circRNAprofiler: an R-based computational framework for the downstream analysis of circular RNAs.

BMC Bioinformatics 2020 Apr 29;21(1):164. Epub 2020 Apr 29.

Department of Experimental Cardiology, Amsterdam UMC, location AMC, Amsterdam, The Netherlands.

Background: Circular RNAs (circRNAs) are a newly appreciated class of non-coding RNA molecules. Numerous tools have been developed for the detection of circRNAs; however, computational tools to perform downstream functional analysis of circRNAs are scarce.

Results: We present circRNAprofiler, an R-based computational framework that runs after circRNAs have been identified. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3500-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191743

YSMR: a video tracking and analysis program for bacterial motility.

BMC Bioinformatics 2020 Apr 29;21(1):166. Epub 2020 Apr 29.

Institute for Medical Microbiology, University Medical Center Göttingen, Göttingen, Germany.

Background: Motility in bacteria forms the basis for taxis and is in some pathogenic bacteria important for virulence. Video tracking of motile bacteria allows the monitoring of bacterial swimming behaviour and taxis on the level of individual cells, which is a prerequisite to study the underlying molecular mechanisms.

Results: The open-source python program YSMR (Your Software for Motility Recognition) was designed to simultaneously track a large number of bacterial cells on standard computers from video files in various formats. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3495-9

Intrinsic limitations in mainstream methods of identifying network motifs in biology.

BMC Bioinformatics 2020 Apr 29;21(1):165. Epub 2020 Apr 29.

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia.

Background: Network motifs are connectivity structures that occur with significantly higher frequency than chance, and are thought to play important roles in complex biological networks, for example in gene regulation, interactomes, and metabolomes. Network motifs may also become pivotal in the rational design and engineering of complex biological systems underpinning the field of synthetic biology. Distinguishing true motifs from arbitrary substructures, however, remains a challenge. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3441-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191746

Predicting potential adverse events using safety data from marketed drugs.

BMC Bioinformatics 2020 Apr 29;21(1):163. Epub 2020 Apr 29.

Division of Applied Regulatory Science, Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA.

Background: While clinical trials are considered the gold standard for detecting adverse events, often these trials are not sufficiently powered to detect difficult to observe adverse events. We developed a preliminary approach to predict 135 adverse events using post-market safety data from marketed drugs. Adverse event information available from FDA product labels and scientific literature for drugs that have the same activity at one or more of the same targets, structural and target similarities, and the duration of post market experience were used as features for a classifier algorithm. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3509-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191698

GrAPFI: predicting enzymatic function of proteins from domain similarity graphs.

BMC Bioinformatics 2020 Apr 29;21(1):168. Epub 2020 Apr 29.

University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France.

An amendment to this paper has been published and can be accessed via the original article. Read More

Source
DOI: http://dx.doi.org/10.1186/s12859-020-3460-7