    251 results match your criteria BioData Mining [Journal]

    Methods for enhancing the reproducibility of biomedical research findings using electronic health records.
    BioData Min 2017 11;10:31. Epub 2017 Sep 11.
    EHR Research Group, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT UK.
    Background: The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. With the complexity, volume and variety of electronic health record data sources made available for research steadily increasing, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers. Read More

    RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.
    BioData Min 2017 5;10:30. Epub 2017 Sep 5.
    Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
    Background: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Read More

    Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network.
    BioData Min 2017 3;10:29. Epub 2017 Aug 3.
    Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA.
    Background: The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Read More

    Genetically improved BarraCUDA.
    BioData Min 2017 2;10:28. Epub 2017 Aug 2.
    University of Cambridge Metabolic Research Laboratories, Addenbrooke's Hospital, Cambridge, UK.
    Background: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement".

    Results: The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet. Read More

    nRC: non-coding RNA Classifier based on structural features.
    BioData Min 2017 1;10:27. Epub 2017 Aug 1.
    ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy.
    Motivation: Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. Read More

    Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals.
    BioData Min 2017 24;10:25. Epub 2017 Jul 24.
    Biomedical and Translational Informatics, Geisinger Clinic, Danville, PA USA.
    Background: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG).

    Results: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Read More

    epiACO - a method for identifying epistasis based on ant Colony optimization algorithm.
    BioData Min 2017 6;10:23. Epub 2017 Jul 6.
    School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China.
    Background: Identifying epistasis or epistatic interactions, which refer to nonlinear interaction effects of single nucleotide polymorphisms (SNPs), is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Although much work has been done on identifying epistatic interactions, algorithmic development is still ongoing because of the methodological and computational challenges involved.

    Results: In this study, epiACO, a method based on the ant colony optimization algorithm, is proposed to identify epistatic interactions. Read More

    Arete - candidate gene prioritization using biological network topology with additional evidence types.
    BioData Min 2017 6;10:22. Epub 2017 Jul 6.
    Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, 230-0045 Japan.
    Background: Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of specific causal genes. Given the qualitative and semantic complexity of biological data, successfully addressing this challenge requires development of flexible and interoperable solutions for making the best possible use of the largest possible fraction of all available data.

    Results: We have developed an easily accessible framework that links two established network-based gene prioritization approaches with a supporting isolation forest-based integrative ranking method. Read More

    EFS: an ensemble feature selection tool implemented as R-package and web-application.
    BioData Min 2017 27;10:21. Epub 2017 Jun 27.
    Straubing Center of Science, Schulgasse 22, Straubing, 94315 Germany.
    Background: Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas ensemble feature selection has the advantage of alleviating and compensating for these biases.

    Results: The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Read More

    Computational dynamic approaches for temporal omics data with applications to systems medicine.
    BioData Min 2017 17;10:20. Epub 2017 Jun 17.
    Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA.
    Modeling and predicting biological dynamic systems and simultaneously estimating the kinetic structural and functional parameters are extremely important in systems and computational biology. This is key for understanding the complexity of human health, drug response, disease susceptibility and pathogenesis for systems medicine. Temporal omics data used to measure dynamic biological systems are essential to discover complex biological interactions and clinical mechanisms and causation. Read More

    Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases.
    BioData Min 2017 30;10:19. Epub 2017 May 30.
    Parabon Computation, Inc, Reston, 20190 VA USA.
    Background: Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures. Read More

    Gene Set Enrichment Analyses: lessons learned from the heart failure phenotype.
    BioData Min 2017 26;10:18. Epub 2017 May 26.
    Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands.
    Background: Genetic studies for complex diseases have predominantly discovered main effects at individual loci, but have not focused on genomic and environmental contexts important for a phenotype. Gene Set Enrichment Analysis (GSEA) aims to address this by identifying sets of genes or biological pathways contributing to a phenotype, through gene-gene interactions or other mechanisms, which are not the focus of conventional association methods.

    Results: Approaches that utilize GSEA can now take input from array chips, either gene-centric or genome-wide, but are highly sensitive to study design, SNP selection and pruning strategies, SNP-to-gene mapping, and pathway definitions. Read More

    Vinasse fertirrigation alters soil resistome dynamics: an analysis based on metagenomic profiles.
    BioData Min 2017 23;10:17. Epub 2017 May 23.
    Cell and Molecular Biology Laboratory, Center for Nuclear Energy in Agriculture (CENA), University of São Paulo (USP), Av. Centenário 303, Piracicaba, 13400-970 São Paulo Brazil.
    Every year around 300 Gl of vinasse, a by-product of ethanol distillation in sugarcane mills, are flushed into more than 9 Mha of sugarcane cropland in Brazil. This practice links fermentation waste management to fertilization for plant biomass production, and it is known as fertirrigation. Here we evaluate public datasets of soil metagenomes mining for changes in antibiotic resistance genes (ARGs) of soils from sugarcane mesocosms repeatedly amended with vinasse. Read More

    The optimal crowd learning machine.
    BioData Min 2017 19;10:16. Epub 2017 May 19.
    Center for Information Technology, National Institutes of Health, Bethesda, MD USA.
    Background: Any family of learning machines can be combined into a single learning machine using various methods with myriad degrees of usefulness.

    Results: For making predictions on an outcome, it is provably at least as good as the best machine in the family, given sufficient data. And if one machine in the family minimizes the probability of misclassification, in the limit of large data, then Optimal Crowd does also. Read More

    Study of Meta-analysis strategies for network inference using information-theoretic approaches.
    BioData Min 2017 6;10:15. Epub 2017 May 6.
    Bioinformatics and Systems Biology (BioSys) Lab, Université de Liège, Liège, Belgium.
    Background: Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has, therefore, naturally become a standard procedure in modern computational biology. Read More

    Feature analysis for classification of trace fluorescent labeled protein crystallization images.
    BioData Min 2017 27;10:14. Epub 2017 Apr 27.
    Computer Science Department, University of Alabama in Huntsville, Huntsville, 35899 Alabama USA.
    Background: A large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The excessive number of features and the computationally intensive image processing methods needed to extract them make automated classification tools inconvenient to use on stand-alone computing systems because of the time required to complete the classification tasks. Combinations of image feature sets, feature reduction and classification techniques for crystallization images benefiting from trace fluorescence labeling are investigated. Read More

    Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.
    BioData Min 2017 24;10:13. Epub 2017 Apr 24.
    Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR USA.
    Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. Read More

    Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution.
    BioData Min 2017 15;10:12. Epub 2017 Mar 15.
    Universidad Distrital FJC, School of Engineering, Bogota, Colombia.
    Background: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so-called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is a less explored ground. Read More

    Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.
    BioData Min 2017 11;10:11. Epub 2017 Mar 11.
    Neuro-Biomorphic Engineering lab, Faculty of Engineering, Jerusalem College of Technology, Jerusalem, Israel.
    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. Read More

    Label-free data standardization for clinical metabolomics.
    BioData Min 2017 28;10:10. Epub 2017 Feb 28.
    Institute of Biomedical Chemistry, Pogodinskaya st.10, 119121 Moscow, Russia.
    Background: In metabolomics, thousands of substances can be detected in a single assay. This capacity motivates the development of metabolomics testing, which is currently a very promising option for improving laboratory diagnostics. However, the simultaneous measurement of an enormous number of substances leads to metabolomics data often representing concentrations only in conditional units, while laboratory diagnostics generally require actual concentrations. Read More

    Variant Set Enrichment: an R package to identify disease-associated functional genomic regions.
    BioData Min 2017 22;10. Epub 2017 Feb 22.
    Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada.
    Background: Genetic predispositions to diseases populate the noncoding regions of the human genome. Delineating their functional basis can inform on the mechanisms contributing to disease development. However, this remains a challenge due to the poor characterization of the noncoding genome. Read More

    Semantics-based plausible reasoning to extend the knowledge coverage of medical knowledge bases for improved clinical decision support.
    BioData Min 2017 10;10. Epub 2017 Feb 10.
    NICHE Research Group, Faculty of Computer Science, Dalhousie University, Halifax, NS B3H4R2 Canada.
    Background: Capturing complete medical knowledge is challenging, often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians' experiences. To extend the coverage of incomplete medical knowledge-based systems beyond their deductive closure, and thus enhance their decision-support capabilities, we argue that innovative, multi-strategy reasoning approaches should be applied. In particular, plausible reasoning mechanisms apply patterns from human thought processes, such as generalization, similarity and interpolation, based on attributional, hierarchical, and relational knowledge. Read More

    Elevated transcriptional levels of aldolase A (ALDOA) associates with cell cycle-related genes in patients with NSCLC and several solid tumors.
    BioData Min 2017 7;10. Epub 2017 Feb 7.
    Guangdong Provincial Key Laboratory for Breast Cancer Diagnosis and Treatment, Cancer Hospital of Shantou University Medical College, Shantou, 515041 China.
    Background: Aldolase A (ALDOA) is one of the glycolytic enzymes primarily found in the developing embryo and adult muscle. Recently, a new role of ALDOA in several cancers has been proposed. However, the underlying mechanism remains obscure and inconsistent. Read More

    Gene set analysis controlling for length bias in RNA-seq experiments.
    BioData Min 2017 6;10. Epub 2017 Feb 6.
    Department of Biostatistics, SUNY University at Buffalo, Buffalo, 14214 USA.
    Background: In gene set analysis, researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. Read More

    A feature selection method based on multiple kernel learning with expression profiles of different types.
    BioData Min 2017 2;10. Epub 2017 Feb 2.
    College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China.
    Background: With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these data have a small number of samples and hundreds or thousands of features, extracting informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, many feature selection approaches have been proposed and applied to analyse expression data of different types. Read More

    Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.
    BioData Min 2017 1;10. Epub 2017 Feb 1.
    School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea.
    Background: The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e. Read More

    Meta-analytic support vector machine for integrating multiple omics data.
    BioData Min 2017 26;10. Epub 2017 Jan 26.
    Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea.
    Background: Of late, high-throughput microarray and sequencing data have been extensively used to monitor biomarkers and biological processes related to many diseases. Under this circumstance, the support vector machine (SVM) has been widely and successfully used for gene selection in many applications. Despite the benefits of SVMs, analysis of a single small- or mid-sized dataset inevitably runs into the problem of low reproducibility and statistical power. Read More

    Accurate prediction of protein relative solvent accessibility using a balanced model.
    BioData Min 2017 24;10. Epub 2017 Jan 24.
    Department of Chemistry, Tongji University, Shanghai, China.
    Background: Protein relative solvent accessibility provides insight into understanding protein structure and function. Prediction of protein relative solvent accessibility is often the first stage of predicting other protein properties. Recent predictors of relative solvent accessibility discriminate against exposed regions as compared with buried regions, resulting in higher prediction accuracy associated with buried regions relative to exposed regions. Read More

    The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.
    BioData Min 2016 19;9:41. Epub 2016 Dec 19.
    Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI 48109 USA.
    Background: The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. Read More

    Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies.
    BioData Min 2016 12;9:40. Epub 2016 Dec 12.
    Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.
    Background: Bladder cancer is a common disease with a complex etiology that is likely due to many different genetic and environmental factors. The goal of this study was to embrace this complexity using a bioinformatics analysis pipeline designed to use machine learning to measure synergistic interactions between single nucleotide polymorphisms (SNPs) in two genome-wide association studies (GWAS) and then to assess their enrichment within functional groups defined by Gene Ontology. The significance of the results was evaluated using permutation testing and those results that replicated between the two GWAS data sets were reported. Read More

    matK-QR classifier: a patterns based approach for plant species identification.
    BioData Min 2016 9;9:39. Epub 2016 Dec 9.
    Environmental Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, 440020 Maharashtra India.
    Background: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1, 5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i. Read More

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification.
    BioData Min 2016 6;9:38. Epub 2016 Dec 6.
    Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy.
    Background: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g. Read More

    Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.
    BioData Min 2016 1;9:37. Epub 2016 Dec 1.
    Centre for Biomedical Engineering, School of Electrical & Electronic Engineering, University of Adelaide, Adelaide, Australia.
    Background: An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. Read More

    Compensation of feature selection biases accompanied with improved predictive performance for binary classification by using a novel ensemble feature selection approach.
    BioData Min 2016 18;9:36. Epub 2016 Nov 18.
    Department of Bioinformatics, Straubing, 94315 Germany ; University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany ; Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany.
    Motivation: Biomarker discovery methods are essential to identify a minimal subset of features (e.g., serum markers in predictive medicine) that are relevant to develop prediction models with high accuracy. Read More

    Considerations for higher efficiency and productivity in research activities.
    BioData Min 2016 9;9:35. Epub 2016 Nov 9.
    Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.
    There are several factors that are known to affect research productivity; some of them imply the need for large financial investments and others are related to work styles. There are some articles that provide suggestions for early career scientists (PhD students and postdocs) but few publications are oriented to professors about scientific leadership. As academic mentoring might be useful at all levels of experience, in this note we suggest several key considerations for higher efficiency and productivity in academic and research activities. Read More

    On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs.
    BioData Min 2016 2;9:34. Epub 2016 Nov 2.
    Institut de Médecine Régénératrice et de Biothérapie, INSERM U1183, CHU Montpellier, Montpellier, France ; Institut de Biologie Computationnelle, Université Montpellier, Montpellier, France ; Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Université Montpellier, UMR 5506 CNRS, Montpellier, France ; PPGCC, Universidade Federal do Pará, Belém, Brazil ; Instituto Tecnológico Vale, Belém, Brazil.
    Background: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed particularly in diseases can be used as potential biomarkers in both diagnosis and prognosis.

    Results: The task of discriminating true chRNAs from the false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artifacts and during the analysis process, bioinformatics tools may generate false positives due to methodological biases. Read More

    Developing a modular architecture for creation of rule-based clinical diagnostic criteria.
    BioData Min 2016 21;9:33. Epub 2016 Oct 21.
    Department of Health Sciences Research, Mayo Clinic, 200 First Street, SW, Rochester, MN 55905 USA.
    Background: With recent advances in computerized patient record systems, there is an urgent need for producing computable and standards-based clinical diagnostic criteria. Notably, constructing rule-based clinical diagnostic criteria has become one of the goals in the International Classification of Diseases (ICD)-11 revision. However, few studies have been done on building a unified architecture to support the need for diagnostic criteria computerization. Read More

    FEDRR: fast, exhaustive detection of redundant hierarchical relations for quality improvement of large biomedical ontologies.
    BioData Min 2016 10;9:31. Epub 2016 Oct 10.
    Institute of Biomedical Informatics, University of Kentucky, Lexington, 40536 KY USA ; Department of Computer Science, University of Kentucky, Lexington, 40506 KY USA.
    Background: Redundant hierarchical relations refer to such patterns as two paths from one concept to another, one with length one (direct) and the other with length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology quality assurance process. Detecting and eliminating redundant relations would help improve the results of all methods relying on the relevant ontological systems as knowledge source, such as the computation of semantic distance between concepts and for ontology matching and alignment. Read More

    Low-mass-ion discriminant equation (LOME) for ovarian cancer screening.
    BioData Min 2016 12;9:32. Epub 2016 Oct 12.
    Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Ewha Womans University Mokdong Hospital, College of Medicine, Ewha Womans University, Seoul, Republic of Korea.
    Background: A low-mass-ion discriminant equation (LOME) was constructed to investigate whether systematic low-mass-ion (LMI) profiling could be applied to ovarian cancer (OVC) screening.

    Results: Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry was performed to obtain mass spectral data on metabolites detected as LMIs up to a mass-to-charge ratio (m/z) of 2500 for 1184 serum samples collected from healthy individuals and patients with OVC, other types of cancer, or several types of benign tumor. Principal component analysis-based discriminant analysis and two search algorithms were employed to identify discriminative low-mass ions for distinguishing OVC from non-OVC cases. Read More

    ProtNN: fast and accurate protein 3D-structure classification in structural and topological space.
    BioData Min 2016 23;9:30. Epub 2016 Sep 23.
    Department of Computer Science, University of Quebec At Montreal, PO box 8888, Downtown station, Montreal, H3C 3P8 Canada.
    Background: Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has increasingly become extremely large. Still, the classification of a protein structure remains a difficult, costly, and time consuming task. Read More

    The tip of the iceberg: challenges of accessing hospital electronic health record data for biological data mining.
    BioData Min 2016 22;9:29. Epub 2016 Sep 22.
    Institute for Biomedical Informatics, Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.
    Modern cohort studies include self-reported measures on disease, behavior and lifestyle, sensor-based observations from mobile phones and wearables, and rich -omics data. Follow-up is often achieved through electronic health record (EHR) linkages across primary and secondary healthcare providers. Historically, however, researchers typically only get to see the tip of the iceberg: coded administrative data relating to healthcare claims, which mainly record billable diagnoses and procedures. Read More

    Functional networks inference from rule-based machine learning models.
    BioData Min 2016 5;9(1):28. Epub 2016 Sep 5.
    Interdisciplinary Computing and Complex BioSystems (ICOS) research group, School of Computing Science, Newcastle University, Newcastle upon Tyne, UK.
    Background: Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e. Read More

    A biologically informed method for detecting rare variant associations.
    BioData Min 2016 30;9(1):27. Epub 2016 Aug 30.
    Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA.
    Background: BioBin is a bioinformatics software package developed to automate the process of binning rare variants into groups for statistical association analysis using a biological knowledge-driven framework. BioBin collapses variants into biological features such as genes, pathways, evolutionarily conserved regions (ECRs), protein families, regulatory regions, and others based on user-designated parameters. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion, thereby circumventing the need for advanced and time-consuming scripting. Read More

    msBiodat analysis tool, big data analysis for high-throughput experiments.
    BioData Min 2016 19;9:26. Epub 2016 Aug 19.
    Translational Medicine Group, Institut Rudjer Bošković, Division of Molecular Medicine, Bijenička Cesta 54, Zagreb, 10000 Croatia.
    Background: Mass spectrometry (MS) techniques are a group of high-throughput techniques used to increase knowledge about biomolecules. They produce a large amount of data, which is presented as a list of hundreds or thousands of proteins. Filtering those data efficiently is the first step for extracting biologically relevant information. Read More

    Mango: combining and analyzing heterogeneous biological networks.
    BioData Min 2016 2;9:25. Epub 2016 Aug 2.
    Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011 USA.
    Background: Heterogeneous biological data such as sequence matches, gene expression correlations, protein-protein interactions, and biochemical pathways can be merged and analyzed via graphs, or networks. Existing software for network analysis has limited scalability to large data sets or is only accessible to software developers as libraries. In addition, the polymorphic nature of the data sets requires a more standardized method for integration and exploration. Read More

    Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer.
    BioData Min 2016 29;9:24. Epub 2016 Jul 29.
    The Hamamatsu/Queen's PET (Positron Emission Tomography) Imaging Center, Queen's Medical Center, Honolulu, HI 96816 USA.
    Background: Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer, one that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Read More

    Representing and querying disease networks using graph databases.
    BioData Min 2016 25;9:23. Epub 2016 Jul 25.
    European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Lyon, France.
    Background: Systems biology experiments generate large volumes of data of multiple modalities, and this information is challenging to integrate because it combines complexity with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data.

    Results: We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. Read More
