    261 results match your criteria BioData Mining [Journal]


    Scalable non-negative matrix tri-factorization.
    BioData Min 2017 29;10:41. Epub 2017 Dec 29.
    Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.
    Background: Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets.

    An automated pipeline for bouton, spine, and synapse detection of in vivo two-photon images.
    BioData Min 2017 20;10:40. Epub 2017 Dec 20.
    Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190 China.
    Background: In the nervous system, the neurons communicate through synapses. The size, morphology, and connectivity of these synapses are significant in determining the functional properties of the neural network. Therefore, they have always been a major focus of neuroscience research.

    Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data.
    BioData Min 2017 19;10:39. Epub 2017 Dec 19.
    Foundation Inflammatory Bowel & Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, 90048 CA USA.
    Background: Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly.

    TSPmap, a tool making use of traveling salesperson problem solvers in the efficient and accurate construction of high-density genetic linkage maps.
    BioData Min 2017 19;10:38. Epub 2017 Dec 19.
    Department of Bioagricultural Sciences & Pest Management, Colorado State University, 1177 Campus Delivery, Fort Collins, CO 80523 USA.
    Background: Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve genome assemblies as well as add much needed resolution for loci controlling variation in ecologically and agriculturally important traits. However, traditional genetic map construction methods from these large marker datasets can be computationally prohibitive and highly error prone.

    Cluster ensemble based on Random Forests for genetic data.
    BioData Min 2017 15;10:37. Epub 2017 Dec 15.
    College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
    Background: Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms.

    PMLB: a large benchmark suite for machine learning evaluation and comparison.
    BioData Min 2017 11;10:36. Epub 2017 Dec 11.
    Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA.
    Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.

    Ten quick tips for machine learning in computational biology.
    BioData Min 2017 8;10:35. Epub 2017 Dec 8.
    Princess Margaret Cancer Centre, PMCR Tower 11-401, 101 College Street, Toronto, Ontario, M5G 1L7 Canada.
    Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects.

    OCDD: an obesity and co-morbid disease database.
    BioData Min 2017 21;10:33. Epub 2017 Nov 21.
    Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108 India.
    Background: Obesity is a medical condition that is known for increased body mass index (BMI). It is also associated with chronic low level inflammation. Obesity disrupts the immune-metabolic homeostasis by changing the secretion of adipocytes.

    Metrics to estimate differential co-expression networks.
    BioData Min 2017 10;10:32. Epub 2017 Nov 10.
    Cátedra de Bioinformática, Escuela de Medicina, Tecnológico de Monterrey, 64710 Monterrey, Nuevo León Mexico.
    Background: Detecting the differences in gene expression data is important for understanding the underlying molecular mechanisms. Although the differentially expressed genes are a large component, differences in correlation are becoming an interesting approach to achieving deeper insights. However, diverse metrics have been used to detect differential correlation, making selection and use of a single metric difficult.

    Methods for enhancing the reproducibility of biomedical research findings using electronic health records.
    BioData Min 2017 11;10:31. Epub 2017 Sep 11.
    EHR Research Group, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT UK.
    Background: The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. With the complexity, volume and variety of electronic health record data sources made available for research steadily increasing, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers.

    RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.
    BioData Min 2017 5;10:30. Epub 2017 Sep 5.
    Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
    Background: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes.

    Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network.
    BioData Min 2017 3;10:29. Epub 2017 Aug 3.
    Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA.
    Background: The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions.

    Genetically improved BarraCUDA.
    BioData Min 2017 2;10:28. Epub 2017 Aug 2.
    University of Cambridge Metabolic Research Laboratories, Addenbrooke's Hospital, Cambridge, UK.
    Background: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement".

    Results: The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.

    nRC: non-coding RNA Classifier based on structural features.
    BioData Min 2017 1;10:27. Epub 2017 Aug 1.
    ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy.
    Motivation: Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in providing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs.

    Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals.
    BioData Min 2017 24;10:25. Epub 2017 Jul 24.
    Biomedical and Translational Informatics, Geisinger Clinic, Danville, PA USA.
    Background: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG).

    Results: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples.

    epiACO - a method for identifying epistasis based on ant Colony optimization algorithm.
    BioData Min 2017 6;10:23. Epub 2017 Jul 6.
    School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China.
    Background: Identifying epistasis or epistatic interactions, which refer to nonlinear interaction effects of single nucleotide polymorphisms (SNPs), is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Although much work has been done on identifying epistatic interactions, algorithmic development is still ongoing because of the methodological and computational challenges involved.

    Results: In this study, epiACO, a method based on the ant colony optimization algorithm, is proposed to identify epistatic interactions.

    Arete - candidate gene prioritization using biological network topology with additional evidence types.
    BioData Min 2017 6;10:22. Epub 2017 Jul 6.
    Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, 230-0045 Japan.
    Background: Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of specific causal genes. Given the qualitative and semantic complexity of biological data, successfully addressing this challenge requires development of flexible and interoperable solutions for making the best possible use of the largest possible fraction of all available data.

    Results: We have developed an easily accessible framework that links two established network-based gene prioritization approaches with a supporting isolation forest-based integrative ranking method.

    EFS: an ensemble feature selection tool implemented as R-package and web-application.
    BioData Min 2017 27;10:21. Epub 2017 Jun 27.
    Straubing Center of Science, Schulgasse 22, Straubing, 94315 Germany.
    Background: Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage of alleviating and compensating for these biases.

    Results: The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs into a quantitative ensemble importance.

    Computational dynamic approaches for temporal omics data with applications to systems medicine.
    BioData Min 2017 17;10:20. Epub 2017 Jun 17.
    Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA.
    Modeling and predicting biological dynamic systems and simultaneously estimating the kinetic structural and functional parameters are extremely important in systems and computational biology. This is key for understanding the complexity of human health, drug response, disease susceptibility and pathogenesis for systems medicine. Temporal omics data used to measure the dynamic biological systems are essential to discover complex biological interactions and clinical mechanisms and causation.

    Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases.
    BioData Min 2017 30;10:19. Epub 2017 May 30.
    Parabon Computation, Inc, Reston, 20190 VA USA.
    Background: Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures.

    Gene Set Enrichment Analyses: lessons learned from the heart failure phenotype.
    BioData Min 2017 26;10:18. Epub 2017 May 26.
    Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands.
    Background: Genetic studies for complex diseases have predominantly discovered main effects at individual loci, but have not focused on genomic and environmental contexts important for a phenotype. Gene Set Enrichment Analysis (GSEA) aims to address this by identifying sets of genes or biological pathways contributing to a phenotype, through gene-gene interactions or other mechanisms, which are not the focus of conventional association methods.

    Results: Approaches that utilize GSEA can now take input from array chips, either gene-centric or genome-wide, but are highly sensitive to study design, SNP selection and pruning strategies, SNP-to-gene mapping, and pathway definitions.

    Vinasse fertirrigation alters soil resistome dynamics: an analysis based on metagenomic profiles.
    BioData Min 2017 23;10:17. Epub 2017 May 23.
    Cell and Molecular Biology Laboratory, Center for Nuclear Energy in Agriculture (CENA), University of São Paulo (USP), Av. Centenário 303, Piracicaba, 13400-970 São Paulo Brazil.
    Every year around 300 Gl of vinasse, a by-product of ethanol distillation in sugarcane mills, are flushed into more than 9 Mha of sugarcane cropland in Brazil. This practice links fermentation waste management to fertilization for plant biomass production, and it is known as fertirrigation. Here we evaluate public datasets of soil metagenomes mining for changes in antibiotic resistance genes (ARGs) of soils from sugarcane mesocosms repeatedly amended with vinasse.

    The optimal crowd learning machine.
    BioData Min 2017 19;10:16. Epub 2017 May 19.
    Center for Information Technology, National Institutes of Health, Bethesda, MD USA.
    Background: Any family of learning machines can be combined into a single learning machine using various methods with myriad degrees of usefulness.

    Results: For making predictions on an outcome, it is provably at least as good as the best machine in the family, given sufficient data. And if one machine in the family minimizes the probability of misclassification, in the limit of large data, then Optimal Crowd does also.

    Study of Meta-analysis strategies for network inference using information-theoretic approaches.
    BioData Min 2017 6;10:15. Epub 2017 May 6.
    Bioinformatics and Systems Biology (BioSys) Lab, Université de Liège, Liège, Belgium.
    Background: Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has, therefore, naturally become a standard procedure in modern computational biology.

    Feature analysis for classification of trace fluorescent labeled protein crystallization images.
    BioData Min 2017 27;10:14. Epub 2017 Apr 27.
    Computer Science Department, University of Alabama in Huntsville, Huntsville, 35899 Alabama USA.
    Background: A large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The excessive number of features and computationally intensive image processing methods to extract these features make utilization of automated classification tools on stand-alone computing systems inconvenient due to the required time to complete the classification tasks. Combinations of image feature sets, feature reduction and classification techniques for crystallization images benefiting from trace fluorescence labeling are investigated.

    Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.
    BioData Min 2017 24;10:13. Epub 2017 Apr 24.
    Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR USA.
    Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy.

    Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution.
    BioData Min 2017 15;10:12. Epub 2017 Mar 15.
    Universidad Distrital FJC, School of Engineering, Bogota, Colombia.
    Background: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so-called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is a less explored ground.

    Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.
    BioData Min 2017 11;10:11. Epub 2017 Mar 11.
    Neuro-Biomorphic Engineering lab, Faculty of Engineering, Jerusalem College of Technology, Jerusalem, Israel.
    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases.

    Label-free data standardization for clinical metabolomics.
    BioData Min 2017 28;10:10. Epub 2017 Feb 28.
    Institute of Biomedical Chemistry, Pogodinskaya st.10, 119121 Moscow, Russia.
    Background: In metabolomics, thousands of substances can be detected in a single assay. This capacity motivates the development of metabolomics testing, which is currently a very promising option for improving laboratory diagnostics. However, the simultaneous measurement of an enormous number of substances leads to metabolomics data often representing concentrations only in conditional units, while laboratory diagnostics generally require actual concentrations.

    Variant Set Enrichment: an R package to identify disease-associated functional genomic regions.
    BioData Min 2017 22;10. Epub 2017 Feb 22.
    Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada.
    Background: Genetic predispositions to diseases populate the noncoding regions of the human genome. Delineating their functional basis can inform on the mechanisms contributing to disease development. However, this remains a challenge due to the poor characterization of the noncoding genome.

    Semantics-based plausible reasoning to extend the knowledge coverage of medical knowledge bases for improved clinical decision support.
    BioData Min 2017 10;10. Epub 2017 Feb 10.
    NICHE Research Group, Faculty of Computer Science, Dalhousie University, Halifax, NS B3H4R2 Canada.
    Background: Capturing complete medical knowledge is challenging, often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians' experiences. To extend the coverage of incomplete medical knowledge-based systems beyond their deductive closure, and thus enhance their decision-support capabilities, we argue that innovative, multi-strategy reasoning approaches should be applied. In particular, plausible reasoning mechanisms apply patterns from human thought processes, such as generalization, similarity and interpolation, based on attributional, hierarchical, and relational knowledge.

    Elevated transcriptional levels of aldolase A (ALDOA) associates with cell cycle-related genes in patients with NSCLC and several solid tumors.
    BioData Min 2017 7;10. Epub 2017 Feb 7.
    Guangdong Provincial Key Laboratory for Breast Cancer Diagnosis and Treatment, Cancer Hospital of Shantou University Medical College, Shantou, 515041 China.
    Background: Aldolase A (ALDOA) is one of the glycolytic enzymes primarily found in the developing embryo and adult muscle. Recently, a new role of ALDOA in several cancers has been proposed. However, the underlying mechanism remains obscure and inconsistent.

    Gene set analysis controlling for length bias in RNA-seq experiments.
    BioData Min 2017 6;10. Epub 2017 Feb 6.
    Department of Biostatistics, SUNY University at Buffalo, Buffalo, 14214 USA.
    Background: In gene set analysis, researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment.

    A feature selection method based on multiple kernel learning with expression profiles of different types.
    BioData Min 2017 2;10. Epub 2017 Feb 2.
    College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China.
    Background: With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these data have a small number of samples and hundreds or thousands of features, extracting informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, a large number of feature selection approaches have been proposed and applied to analyse expression data of different types.

    Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.
    BioData Min 2017 1;10. Epub 2017 Feb 1.
    School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea.
    Background: The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Although many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e.

    Meta-analytic support vector machine for integrating multiple omics data.
    BioData Min 2017 26;10. Epub 2017 Jan 26.
    Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea.
    Background: Of late, high-throughput microarray and sequencing data have been extensively used to monitor biomarkers and biological processes related to many diseases. Under this circumstance, the support vector machine (SVM) has been widely and successfully used for gene selection in many applications. Despite the benefits of SVMs, analysis of a single small- or mid-sized dataset inevitably runs into the problem of low reproducibility and statistical power.

    Accurate prediction of protein relative solvent accessibility using a balanced model.
    BioData Min 2017 24;10. Epub 2017 Jan 24.
    Department of Chemistry, Tongji University, Shanghai, China.
    Background: Protein relative solvent accessibility provides insight into understanding protein structure and function. Prediction of protein relative solvent accessibility is often the first stage of predicting other protein properties. Recent predictors of relative solvent accessibility discriminate against exposed regions as compared with buried regions, resulting in higher prediction accuracy associated with buried regions relative to exposed regions.

    The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.
    BioData Min 2016 19;9:41. Epub 2016 Dec 19.
    Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI 48109 USA.
    Background: The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination.

    Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies.
    BioData Min 2016 12;9:40. Epub 2016 Dec 12.
    Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.
    Background: Bladder cancer is a common disease with a complex etiology that is likely due to many different genetic and environmental factors. The goal of this study was to embrace this complexity using a bioinformatics analysis pipeline designed to use machine learning to measure synergistic interactions between single nucleotide polymorphisms (SNPs) in two genome-wide association studies (GWAS) and then to assess their enrichment within functional groups defined by Gene Ontology. The significance of the results was evaluated using permutation testing and those results that replicated between the two GWAS data sets were reported.

    matK-QR classifier: a patterns based approach for plant species identification.
    BioData Min 2016 9;9:39. Epub 2016 Dec 9.
    Environmental Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, 440020 Maharashtra India.
    Background: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification.
    BioData Min 2016 6;9:38. Epub 2016 Dec 6.
    Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy.
    Background: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g.

    Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.
    BioData Min 2016 1;9:37. Epub 2016 Dec 1.
    Centre for Biomedical Engineering, School of Electrical & Electronic Engineering, University of Adelaide, Adelaide, Australia.
    Background: An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class.

    Compensation of feature selection biases accompanied with improved predictive performance for binary classification by using a novel ensemble feature selection approach.
    BioData Min 2016 18;9:36. Epub 2016 Nov 18.
    Department of Bioinformatics, Straubing, 94315 Germany; University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany; Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany.
    Motivation: Biomarker discovery methods are essential to identify a minimal subset of features (e.g., serum markers in predictive medicine) that are relevant to develop prediction models with high accuracy.

    Considerations for higher efficiency and productivity in research activities.
    BioData Min 2016 9;9:35. Epub 2016 Nov 9.
    Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.
    There are several factors that are known to affect research productivity; some of them imply the need for large financial investments and others are related to work styles. There are some articles that provide suggestions for early career scientists (PhD students and postdocs) but few publications are oriented to professors about scientific leadership. As academic mentoring might be useful at all levels of experience, in this note we suggest several key considerations for higher efficiency and productivity in academic and research activities.

    On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs.
    BioData Min 2016 2;9:34. Epub 2016 Nov 2.
    Institut de Médecine Régénératrice et de Biothérapie, INSERM U1183, CHU Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Université Montpellier, Montpellier, France; Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Université Montpellier, UMR 5506 CNRS, Montpellier, France; PPGCC, Universidade Federal do Pará, Belém, Brazil; Instituto Tecnológico Vale, Belém, Brazil.
    Background: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed particularly in diseases can be used as potential biomarkers in both diagnosis and prognosis.

    Results: The task of discriminating true chRNAs from the false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artifacts and during the analysis process, bioinformatics tools may generate false positives due to methodological biases.

    Developing a modular architecture for creation of rule-based clinical diagnostic criteria.
    BioData Min 2016 21;9:33. Epub 2016 Oct 21.
    Department of Health Sciences Research, Mayo Clinic, 200 First Street, SW, Rochester, MN 55905 USA.
    Background: With recent advances in computerized patient record systems, there is an urgent need for producing computable and standards-based clinical diagnostic criteria. Notably, constructing rule-based clinical diagnosis criteria has become one of the goals in the International Classification of Diseases (ICD)-11 revision. However, few studies have been done in building a unified architecture to support the need for diagnostic criteria computerization.
