    8286 results match your criteria BMC Bioinformatics [Journal]

    The VAAST Variant Prioritizer (VVP): ultrafast, easy to use whole genome variant prioritization tool.
    BMC Bioinformatics 2018 Feb 20;19(1):57. Epub 2018 Feb 20.
    Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
    Background: Prioritization of sequence variants for diagnosis and discovery of Mendelian diseases is challenging, especially in large collections of whole genome sequences (WGS). Fast, scalable solutions are needed for discovery research, for clinical applications, and for curation of massive public variant repositories such as dbSNP and gnomAD. In response, we have developed VVP, the VAAST Variant Prioritizer.

    CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses.
    BMC Bioinformatics 2018 Feb 20;19(1):56. Epub 2018 Feb 20.
    Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, SP, 05508-900, Brazil.
    Background: The analysis of modular gene co-expression networks is a well-established method commonly used for discovering the systems-level functionality of genes. In addition, these studies provide a basis for the discovery of clinically relevant molecular pathways underlying different diseases and conditions.

    Results: In this paper, we present a fast and easy-to-use Bioconductor package named CEMiTool that unifies the discovery and the analysis of co-expression modules.

    StructRNAfinder: an automated pipeline and web server for RNA families prediction.
    BMC Bioinformatics 2018 Feb 17;19(1):55. Epub 2018 Feb 17.
    Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, 8580745, Santiago, Chile.
    Background: The function of many noncoding RNAs (ncRNAs) depends upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers should utilize multiple tools, which require the constant parsing and processing of several intermediate files.

    Oasis 2: improved online analysis of small RNA-seq data.
    BMC Bioinformatics 2018 Feb 14;19(1):54. Epub 2018 Feb 14.
    Laboratory of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany.
    Background: Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing.

    Results: Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data.

    Evaluation of reaction gap-filling accuracy by randomization.
    BMC Bioinformatics 2018 Feb 14;19(1):53. Epub 2018 Feb 14.
    SRI International/Artificial Intelligence Center, 333 Ravenswood Ave, Menlo Park, 94025, USA.
    Background: Completion of genome-scale flux-balance models using computational reaction gap-filling is a widely used approach, but its accuracy is not well known.

    Results: We report on computational experiments of reaction gap filling in which we generated degraded versions of the EcoCyc-20.0-GEM model by randomly removing flux-carrying reactions from a growing model.

    Antigenic cartography of H1N1 influenza viruses using sequence-based antigenic distance calculation.
    BMC Bioinformatics 2018 Feb 12;19(1):51. Epub 2018 Feb 12.
    New York Influenza Center of Excellence at David Smith Center for Immunology and Vaccine Biology, Department of Microbiology and Immunology, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA.
    Background: The ease with which influenza virus sequence data can be used to estimate antigenic relationships between strains and the existence of databases containing sequence data for hundreds of thousands of influenza strains make sequence-based antigenic distance estimates an attractive approach to researchers. Antigenic mismatch between circulating strains and vaccine strains results in significantly decreased vaccine effectiveness. Furthermore, antigenic relatedness between the vaccine strain and the strains an individual was originally primed with can affect the cross-reactivity of the antibody response.

    FMLRC: Hybrid long read error correction using an FM-index.
    BMC Bioinformatics 2018 Feb 9;19(1):50. Epub 2018 Feb 9.
    Department of Biology and Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
    Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes.

    Prioritizing disease genes with an improved dual label propagation framework.
    BMC Bioinformatics 2018 Feb 8;19(1):47. Epub 2018 Feb 8.
    College of Software, Nankai University, TianJin, 300350, China.
    Background: Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and facilitate drug development. Our motivation is inspired by the label propagation algorithm and the false positive protein-protein interactions that exist in the dataset. To the best of our knowledge, the false positive protein-protein interactions have not been considered before in disease gene prioritization.

    OVAS: an open-source variant analysis suite with inheritance modelling.
    BMC Bioinformatics 2018 Feb 8;19(1):46. Epub 2018 Feb 8.
    Division of Medicine, University College London, London, NW3 2PF, UK.
    Background: The advent of modern high-throughput genetics continually broadens the gap between the rising volume of sequencing data, and the tools required to process them. The need to pinpoint a small subset of functionally important variants has now shifted towards identifying the critical differences between normal variants and disease-causing ones. The ever-increasing reliance on cloud-based services for sequence analysis and the non-transparent methods they utilize has prompted the need for more in-situ services that can provide a safer and more accessible environment to process patient data, especially in circumstances where continuous internet usage is limited.

    The research on gene-disease association based on text-mining of PubMed.
    BMC Bioinformatics 2018 Feb 7;19(1):37. Epub 2018 Feb 7.
    Guangdong Key Laboratory of Computer Network, School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China.
    Background: The associations between genes and diseases are of critical significance in aspects of prevention, diagnosis and treatment. Although gene-disease relationships have been investigated extensively, much of the underpinnings of these associations are yet to be elucidated.

    Methods: A novel method integrates the MeSH database, term weight (TW), and co-occurrence methods to predict gene-disease associations based on the cosine similarity between gene vectors and disease vectors.

    Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
    BMC Bioinformatics 2018 Feb 6;19(1):35. Epub 2018 Feb 6.
    Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, 35131, Italy.
    Background: The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function.

    Semantic annotation of consumer health questions.
    BMC Bioinformatics 2018 Feb 6;19(1):34. Epub 2018 Feb 6.
    Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, USA.
    Background: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap.

    G4PromFinder: an algorithm for predicting transcription promoters in GC-rich bacterial genomes based on AT-rich elements and G-quadruplex motifs.
    BMC Bioinformatics 2018 Feb 6;19(1):36. Epub 2018 Feb 6.
    Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.
    Background: Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to the prediction of transcription promoter and terminator sites that represent the essential "punctuation marks" for DNA transcription. Computational prediction of promoters in prokaryotes is a problem whose solution is far from being determined in computational genomics.

    xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria.
    BMC Bioinformatics 2018 Feb 5;19(1):32. Epub 2018 Feb 5.
    Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA.
    Background: Genomic islands play an important role in microbial genome evolution, providing a mechanism for strains to adapt to new ecological conditions. A variety of computational methods, both genome-composition based and comparative, have been developed to identify them. Some of these methods are explicitly designed to work in single strains, while others make use of multiple strains.

    Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine.
    BMC Bioinformatics 2018 Feb 5;19(1):33. Epub 2018 Feb 5.
    Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB39DB, UK.
    Background: Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from a lack of coverage, both in terms of word types included and with respect to the semantic distinctions.

    Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus.
    BMC Bioinformatics 2018 Feb 5;19(1):31. Epub 2018 Feb 5.
    Department of Genetics, The Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
    Background: Identification of differentially methylated regions (DMRs) is the initial step towards the study of DNA methylation-mediated gene regulation. Previous approaches to call DMRs suffer from false prediction, use extreme resources, and/or require library installation and input conversion.

    Results: We developed a new approach called Defiant to identify DMRs.

    Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration.
    BMC Bioinformatics 2018 Feb 1;19(1):30. Epub 2018 Feb 1.
    Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.
    Background: Application Programming Interfaces (APIs) are now widely used to distribute biological data, and many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While usage of a common data format offers significant advantages, that alone is not sufficient for rich integrative queries across APIs.

    Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures.
    BMC Bioinformatics 2018 Feb 1;19(1):29. Epub 2018 Feb 1.
    Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD, 4222, Australia.
    Background: Protein structure can be described by backbone torsion angles: rotational angles about the N-Cα bond (φ) and the Cα-C bond (ψ) or the angle between Cα-Cα-Cα (θ) and the rotational angle about the Cα-Cα bond (τ). Thus, their accurate prediction is useful for structure prediction and model refinement. Early methods predicted torsion angles in a few discrete bins whereas most recent methods have focused on prediction of angles in real, continuous values.

    Germline contamination and leakage in whole genome somatic single nucleotide variant detection.
    BMC Bioinformatics 2018 Jan 31;19(1):28. Epub 2018 Jan 31.
    Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, M5G 0A3, Canada.
    Background: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data.

    Inferring synteny between genome assemblies: a systematic evaluation.
    BMC Bioinformatics 2018 Jan 30;19(1):26. Epub 2018 Jan 30.
    Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan.
    Background: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species.

    Large scale analysis of protein conformational transitions from aqueous to non-aqueous media.
    BMC Bioinformatics 2018 Jan 30;19(1):27. Epub 2018 Jan 30.
    Departamento de Ciencia y Tecnología, CONICET, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, B1876BXD, Bernal, Provincia de Buenos Aires, Argentina.
    Background: Biocatalysis in organic solvents is nowadays a common practice with a large potential in Biotechnology. Several studies report that proteins which are co-crystallized or soaked in organic solvents preserve their fold integrity showing almost identical arrangements when compared to their aqueous forms. However, it is well established that the catalytic activity of proteins in organic solvents is much lower than in water.

    GenIO: a phenotype-genotype analysis web server for clinical genomics of rare diseases.
    BMC Bioinformatics 2018 Jan 27;19(1):25. Epub 2018 Jan 27.
    Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA), CONICET - Partner Institute of the Max Planck Society, Buenos Aires, Argentina.
    Background: GenIO is a novel web server, designed to assist clinical genomics researchers and medical doctors in the diagnostic process of rare genetic diseases. The tool identifies the most probable variants causing a rare disease, using the genomic and clinical information provided by a medical practitioner. Variants identified in whole-genome, whole-exome or target sequencing studies are annotated, classified and filtered by clinical significance.

    SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data.
    BMC Bioinformatics 2018 Jan 26;19(1):24. Epub 2018 Jan 26.
    Computational Biology Group, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany.
    Background: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Currently, there is no ready-to-use software available that provides comprehensive statistical readout for comparing two annotations of the same type with each other, which can be adapted to the application logic of the scientific question.

    Scuba: scalable kernel-based gene prioritization.
    BMC Bioinformatics 2018 Jan 25;19(1):23. Epub 2018 Jan 25.
    CRIBI Biotechnology Center, University of Padova, viale G. Colombo, 3, Padova, Italy.
    Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems.

    CONFOLD2: improved contact-driven ab initio protein structure modeling.
    BMC Bioinformatics 2018 Jan 25;19(1):22. Epub 2018 Jan 25.
    Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, MO, USA.
    Background: Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed.

    Results: We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input.

    Deep learning of mutation-gene-drug relations from the literature.
    BMC Bioinformatics 2018 Jan 25;19(1):21. Epub 2018 Jan 25.
    Department of Computer Science and Engineering, Korea University, Seoul, South Korea.
    Background: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature.

    Shrinkage Clustering: a fast and size-constrained clustering algorithm for biomedical applications.
    BMC Bioinformatics 2018 Jan 23;19(1):19. Epub 2018 Jan 23.
    Department of Bioengineering, Rice University, Main Street, Houston, 77030, USA.
    Background: Many common clustering algorithms require a two-step process that limits their efficiency. The algorithms need to be performed repetitively and need to be implemented together with a model selection criterion. These two steps are needed in order to determine both the number of clusters present in the data and the corresponding cluster memberships.

    Three-dimensional spatial analysis of missense variants in RTEL1 identifies pathogenic variants in patients with Familial Interstitial Pneumonia.
    BMC Bioinformatics 2018 Jan 23;19(1):18. Epub 2018 Jan 23.
    Department of Biological Sciences, Vanderbilt Genetics Institute, and Center for Structural Biology, Vanderbilt University, Nashville, USA.
    Background: Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease.

    Results: To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP).

    Spherical: an iterative workflow for assembling metagenomic datasets.
    BMC Bioinformatics 2018 Jan 24;19(1):20. Epub 2018 Jan 24.
    Institute of Biological, Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, Wales, SY23 3FG, UK.
    Background: The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or even the majority in some cases, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome.

    MutScan: fast detection and visualization of target mutations by scanning FASTQ data.
    BMC Bioinformatics 2018 Jan 22;19(1):16. Epub 2018 Jan 22.
    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
    Background: Some types of clinical genetic tests, such as cancer testing using circulating tumor DNA (ctDNA), require sensitive detection of known target mutations. However, conventional next-generation sequencing (NGS) data analysis pipelines typically involve different steps of filtering, which may cause missed detection of key mutations with low frequencies. Variant validation is also indicated for key mutations detected by bioinformatics pipelines.

    Feature selection for high-dimensional temporal data.
    BMC Bioinformatics 2018 Jan 23;19(1):17. Epub 2018 Jan 23.
    Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013, Greece.
    Background: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constraint-based feature-selection methods to high-dimensional "omics" temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables.

    LocText: relation extraction of protein localizations to assist database curation.
    BMC Bioinformatics 2018 Jan 17;19(1):15. Epub 2018 Jan 17.
    Bioinformatics & Computational Biology, Department of Informatics, Technical University of Munich (TUM), Boltzmannstr. 3, Garching, 85748, Germany.
    Background: The subcellular localization of a protein is an important aspect of its function. However, the experimental annotation of locations is not even complete for well-studied model organisms. Text mining might aid database curators to add experimental annotations from the scientific literature.

    Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.
    BMC Bioinformatics 2018 Jan 15;19(1):12. Epub 2018 Jan 15.
    West Pomeranian University of Technology, Faculty of Computer Science, Zolnierska 49, Szczecin, 71-210, Poland.
    Background: RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithm is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model.

    Protein-protein interface hot spots prediction based on a hybrid feature selection strategy.
    BMC Bioinformatics 2018 Jan 15;19(1):14. Epub 2018 Jan 15.
    Institute of Health Sciences, Anhui University, Hefei, Anhui, 230601, China.
    Background: Hot spots are interface residues that contribute most of the binding affinity to protein-protein interactions. A compact and relevant feature subset is important for building machine learning methods to predict hot spots on protein-protein interfaces. Although different methods have been used to detect the relevant feature subset from a variety of features related to interface residues, it is still a challenge to detect the optimal feature subset for building the final model.

    diceR: an R package for class discovery using an ensemble driven approach.
    BMC Bioinformatics 2018 Jan 15;19(1):11. Epub 2018 Jan 15.
    Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC, Canada.
    Background: Given a set of features, researchers are often interested in partitioning objects into homogeneous clusters. In health research, cancer research in particular, high-throughput data is collected with the aim of segmenting patients into sub-populations to aid in disease diagnosis, prognosis or response to therapy. Cluster analysis, a class of unsupervised learning techniques, is often used for class discovery.

    Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.
    BMC Bioinformatics 2018 Jan 15;19(1):13. Epub 2018 Jan 15.
    IBISC, Univ Evry, Université Paris-Saclay, Evry, 91025, France.
    Background: RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure.

    Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):556. Epub 2017 Dec 21.
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
    Background: Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) is selected as the default ontology for LINCS cell line representation and integration.

    Results: CLO has consistently represented all 1097 LINCS cell lines and included information extracted from the LINCS Data Portal and ChEMBL.

    Cells in experimental life sciences - challenges and solution to the rapid evolution of knowledge.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):560. Epub 2017 Dec 21.
    Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
    Cell cultures used in biomedical experiments come in the form of both sample biopsy primary cells, and maintainable immortalised cell lineages. The rise of bioinformatics and high-throughput technologies has led us to the requirement of ontology representation of cell types and cell lines. The Cell Ontology (CL) and Cell Line Ontology (CLO) have long been established as reference ontologies in the OBO framework.

    Comparison, alignment, and synchronization of cell line information between CLO and EFO.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):557. Epub 2017 Dec 21.
    Center of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
    Background: The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables including cell lines to organize and describe the diverse experimental variables and data residing in the EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications.

    Cell ontology in an age of data-driven cell classification.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):558. Epub 2017 Dec 21.
    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
    Background: Data-driven cell classification is becoming common and is now being implemented on a massive scale by projects such as the Human Cell Atlas. The scale of these efforts poses a challenge. How can the results be made searchable and accessible to biologists in general? How can they be related back to the rich classical knowledge of cell-types, anatomy and development? How will data from the various types of single cell analysis be made cross-searchable? Structured annotation with ontology terms provides a potential solution to these problems.

    Cell type discovery and representation in the era of high-content single cell phenotyping.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):559. Epub 2017 Dec 21.
    J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA, 92037, USA.
    Background: A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery.

    Usage of cell nomenclature in biomedical literature.
    BMC Bioinformatics 2017 Dec 21;18(Suppl 17):561. Epub 2017 Dec 21.
    Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.
    Background: Cell lines and cell types are extensively studied in biomedical research, yielding a significant number of publications each year. Identifying cell lines and cell types precisely in publications is crucial for science reproducibility and knowledge integration. There are efforts for standardisation of the cell nomenclature based on ontology development to support FAIR principles of the cell knowledge.

    Thresher: determining the number of clusters while removing outliers.
    BMC Bioinformatics 2018 Jan 8;19(1). Epub 2018 Jan 8.
    Department of Biomedical Informatics, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, 43210, OH, USA.
    Background: Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier problem.

    Gene flow analysis method, the D-statistic, is robust in a wide parameter space.
    BMC Bioinformatics 2018 Jan 8;19(1):10. Epub 2018 Jan 8.
    Biodiversität und Klima Forschungszentrum, Senckenberg Gesellschaft für Naturforschung, 60325, Frankfurt, Germany.
    Background: We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range has not been systematically studied.

    A powerful parent-of-origin effects test for qualitative traits on X chromosome in general pedigrees.
    BMC Bioinformatics 2018 Jan 5;19(1). Epub 2018 Jan 5.
    State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, Department of Biostatistics, School of Public Health, Southern Medical University, No. 1023, South Shatai Road, Baiyun District, Guangzhou, 510515, China.
    Background: Genomic imprinting is one of the well-known epigenetic factors causing the association between traits and genes, and has generally been examined by detecting parent-of-origin effects of alleles. A lot of methods have been proposed to test for parent-of-origin effects on autosomes based on nuclear families and general pedigrees. Although these parent-of-origin effects tests on autosomes have been available for more than 15 years, there has been no statistical test developed to test for parent-of-origin effects on X chromosome, until the parental-asymmetry test on X chromosome (XPAT) and its extensions were recently proposed.

    Inferring ontology graph structures using OWL reasoning.
    BMC Bioinformatics 2018 Jan 5;19(1). Epub 2018 Jan 5.
    Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia.
    Background: Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity.

    PIVOT: platform for interactive analysis and visualization of transcriptomics data.
    BMC Bioinformatics 2018 Jan 5;19(1). Epub 2018 Jan 5.
    Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
    Background: Many R packages have been developed for transcriptome analysis but their use often requires familiarity with R and integrating results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets such as data subsets or data transformations, which can be difficult to track.

    Results: Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data.

    Strategies for identification of somatic variants using the Ion Torrent deep targeted sequencing platform.
    BMC Bioinformatics 2018 Jan 4;19(1). Epub 2018 Jan 4.
    Departments of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
    Background: 'Next-generation' (NGS) sequencing has wide application in medical genetics, including the detection of somatic variation in cancer. The Ion Torrent-based (IONT) platform is among NGS technologies employed in clinical, research and diagnostic settings. However, identifying mutations from IONT deep sequencing with high confidence has remained a challenge.

    Sequence motif finder using memetic algorithm.
    BMC Bioinformatics 2018 Jan 3;19(1). Epub 2018 Jan 3.
    Department of Computer Science, Bioinformatics Graduate Program, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil.
    Background: De novo prediction of Transcription Factor Binding Sites (TFBS) using computational methods is a difficult task and it is an important problem in Bioinformatics. The correct recognition of TFBS plays an important role in understanding the mechanisms of gene regulation and helps to develop new drugs.

    Results: We here present Memetic Framework for Motif Discovery (MFMD), an algorithm that uses semi-greedy constructive heuristics as a local optimizer.

    Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis.
    BMC Bioinformatics 2018 Jan 3;19(1). Epub 2018 Jan 3.
    Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China.
    Background: Running multiple-chain Markov Chain Monte Carlo (MCMC) provides an efficient parallel computing method for complex Bayesian models, although the efficiency of the approach critically depends on the length of the non-parallelizable burn-in period, for which all simulated data are discarded. In practice, this burn-in period is set arbitrarily and often leads to the performance of far more iterations than required. In addition, the accuracy of genomic predictions does not improve after the MCMC reaches equilibrium.
