293 results match your criteria BioData Mining [Journal]


Approximate kernel reconstruction for time-varying networks.

BioData Min 2019 6;12. Epub 2019 Feb 6.

School of Medicine, University of Alabama at Birmingham, Birmingham, AL USA.

Background: Most existing algorithms for modeling and analyzing molecular networks assume a static or time-invariant network topology. Such a view, however, does not render the temporal evolution of the underlying biological process, as molecular networks are typically "re-wired" over time in response to cellular development and environmental changes. In our previous work, we formulated the inference of time-varying or dynamic networks as a tracking problem, where the target state is the ensemble of edges in the network.

DOI: http://dx.doi.org/10.1186/s13040-019-0192-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364395
February 2019

A biplot correlation range for group-wise metabolite selection in mass spectrometry.

BioData Min 2019 4;12. Epub 2019 Feb 4.

Department of Industrial Engineering, Hanyang University, Seoul, 04763 South Korea.

Background: Analytic methods are available to acquire extensive metabolic information in a cost-effective manner for personalized medicine, yet disease risk and diagnosis mostly rely upon individual biomarkers based on statistical principles of false discovery rate and correlation. Due to functional redundancies and multiple layers of regulation in complex biologic systems, individual biomarkers, while useful, are inherently limited in disease characterization. Data reduction and discriminant analysis tools such as principal component analysis (PCA), partial least squares (PLS), or orthogonal PLS (O-PLS) provide approaches to separate the metabolic phenotypes, but do not offer a statistical basis for selection of group-wise metabolites as contributors to metabolic phenotypes.

DOI: http://dx.doi.org/10.1186/s13040-019-0191-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360680
February 2019

Predicting opioid dependence from electronic health records with machine learning.

BioData Min 2019 29;12. Epub 2019 Jan 29.

Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA.

Background: The opioid epidemic in the United States is averaging over 100 deaths per day due to overdose. The effectiveness of opioids as pain treatments, and the drug-seeking behavior of opioid addicts, lead physicians in the United States to issue over 200 million opioid prescriptions every year. To better understand the biomedical profile of opioid-dependent patients, we analyzed information from electronic health records (EHR) including lab tests, vital signs, medical procedures, prescriptions, and other data from millions of patients to predict opioid substance dependence.

DOI: http://dx.doi.org/10.1186/s13040-019-0193-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6352440
January 2019

Use case driven evaluation of open databases for pediatric cancer research.

BioData Min 2019 15;12. Epub 2019 Jan 15.

Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria.

Background: A plethora of Web resources are available offering information on clinical, pre-clinical, genomic and theoretical aspects of cancer, including not only comprehensive cancer projects such as ICGC and TCGA, but also less-known and more specialized projects on pediatric diseases such as PCGP. However, in the case of childhood cancer, very little data is openly available. Several web-based resources and tools offer general biomedical data but are purpose-built for neither pediatric nor cancer analysis.

DOI: http://dx.doi.org/10.1186/s13040-018-0190-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334395
January 2019

Application of an interpretable classification model on Early Folding Residues during protein folding.

BioData Min 2019 5;12. Epub 2019 Jan 5.

University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany.

Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models.

DOI: http://dx.doi.org/10.1186/s13040-018-0188-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321665
January 2019

Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype.

BioData Min 2018 14;11:27. Epub 2018 Dec 14.

Department of Statistics, Seoul National University, Shilim-dong, Kwanak-gu, Seoul, 151-742 South Korea.

Background: One strategy for addressing missing heritability in genome-wide association studies is gene-gene interaction analysis, which, unlike a single-gene approach, involves high dimensionality. The multifactor dimensionality reduction (MDR) method has been widely applied to reduce multi-level genotypes into high- or low-risk groups. The Cox-MDR method has been proposed to detect gene-gene interactions associated with the survival phenotype by using the martingale residuals from a Cox model.
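
The reduction of multi-locus genotypes into high- and low-risk groups mentioned above can be sketched as follows. This is only a minimal illustration of the classic case/control form of MDR, not the Cox-MDR or unified method from the article; the function name, the data layout, and the tie-handling rule (a cell whose ratio equals the overall ratio is labelled low risk) are assumptions made for the example.

```python
from collections import defaultdict

def mdr_risk_labels(genotypes, is_case):
    """Label each multi-locus genotype cell 'high' or 'low' risk.

    genotypes: list of tuples, one multi-locus genotype per individual.
    is_case:   list of bools, True for cases and False for controls.
    A cell is 'high' when its case:control ratio exceeds the overall
    case:control ratio (an assumed threshold for this sketch).
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for g, case in zip(genotypes, is_case):
        counts[g][0 if case else 1] += 1
    labels = {}
    for cell, (cases, controls) in counts.items():
        # Cross-multiply so that empty control cells do not divide by zero.
        labels[cell] = "high" if cases * n_controls > controls * n_cases else "low"
    return labels

# Hypothetical toy data: two SNPs, three cases and three controls.
toy_genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
toy_is_case = [True, True, True, False, False, False]
toy_labels = mdr_risk_labels(toy_genotypes, toy_is_case)
```

Once every cell carries a binary label, the original multi-locus genotype collapses to a single two-level variable, which is the dimensionality reduction the abstract refers to.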

DOI: http://dx.doi.org/10.1186/s13040-018-0189-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6295107
December 2018

Distributed retrieval engine for the development of cloud-deployed biological databases.

BioData Min 2018 12;11:26. Epub 2018 Nov 12.

Neuro-biomorphic Engineering Lab, Faculty of Engineering, Jerusalem College of Technology, Jerusalem, Israel.

The integration of cloud resources with federated data retrieval has the potential to improve the maintenance, accessibility and performance of specialized databases in the biomedical field. However, such an integrative approach requires technical expertise in cloud computing, usage of a data retrieval engine and development of a unified data model that can encapsulate the heterogeneity of biological data. Here, a framework for the development of specialized cloud-based biological databases is proposed.

DOI: http://dx.doi.org/10.1186/s13040-018-0185-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6233384
November 2018

Knomics-Biota - a system for exploratory analysis of human gut microbiota data.

BioData Min 2018 6;11:25. Epub 2018 Nov 6.

Research and Development Department, Knomics LLC, Skolkovo Innovation Center, Moscow, Russian Federation.

Background: Metagenomic surveys of human microbiota are becoming increasingly widespread in academic research as well as in the food and pharmaceutical industries and in clinical contexts. Intuitive tools for investigating experimental data are of high interest to researchers.

Results: Knomics-Biota is a web-based resource for exploratory analysis of human gut metagenomes.

DOI: http://dx.doi.org/10.1186/s13040-018-0187-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6220475
November 2018

A fast forward 3D connection algorithm for mitochondria and synapse segmentations from serial EM images.

BioData Min 2018 5;11:24. Epub 2018 Nov 5.

Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190 China.

Background: It is becoming increasingly clear that the quantification of mitochondria and synapses is of great significance for understanding the function of biological nervous systems. Electron microscopy (EM), with the necessary resolution in three directions, is the only available imaging method to look closely into these issues. Therefore, estimating the number of mitochondria and synapses from serial EM images is coming into prominence.

DOI: http://dx.doi.org/10.1186/s13040-018-0183-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6217761
November 2018

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

BioData Min 2018 3;11:23. Epub 2018 Nov 3.

Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA.

Background: ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding.
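
The simple (binary) mismatch difference described above can be illustrated with a short sketch. It covers only the standard baseline metric, not the allele-sharing, GRM, or transition/transversion metrics the article develops; the function names and the 0/1/2 genotype coding are assumptions chosen for the example.

```python
def mismatch_diff(a, b):
    """Binary mismatch difference for one categorical attribute:
    0 if the two values are equal, 1 otherwise."""
    return 0 if a == b else 1

def genotype_distance(x, y):
    """Sum of per-attribute mismatch differences between two genotype vectors."""
    if len(x) != len(y):
        raise ValueError("genotype vectors must have the same length")
    return sum(mismatch_diff(a, b) for a, b in zip(x, y))

# Hypothetical genotypes coded as minor-allele counts (0, 1, 2).
sample_a = [0, 1, 2, 1]
sample_b = [0, 2, 2, 0]
distance = genotype_distance(sample_a, sample_b)  # positions 1 and 3 differ
```

Under this metric a heterozygote (1) is as far from a homozygote (0 or 2) as the two homozygotes are from each other, which is the kind of coarseness that more graded metrics aim to address.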

DOI: http://dx.doi.org/10.1186/s13040-018-0186-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6215626
November 2018

Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction.

BioData Min 2018 25;11:22. Epub 2018 Oct 25.

Department of Engineering, Uninettuno University, Corso Vittorio Emanuele II, 39, Rome, 00186 Italy.

Background: In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to allow enhanced accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer.

DOI: http://dx.doi.org/10.1186/s13040-018-0184-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6203208
October 2018

To know the objective is not (necessarily) to know the objective function.

BioData Min 2018 4;11:21. Epub 2018 Oct 4.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, 19104-6021 PA USA.

DOI: http://dx.doi.org/10.1186/s13040-018-0182-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6172724
October 2018

Grasping frequent subgraph mining for bioinformatics applications.

BioData Min 2018 3;11:20. Epub 2018 Sep 3.

Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.

Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application.

DOI: http://dx.doi.org/10.1186/s13040-018-0181-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122726
September 2018

Functional relevance for central cornea thickness-associated genetic variants by using integrative analyses.

BioData Min 2018 15;11:19. Epub 2018 Aug 15.

Department of Ophthalmology and Visual Science, Eye Institute, Eye & ENT Hospital, Shanghai Medical College of Fudan University, NHC Key Laboratory of myopia (Fudan University), Shanghai, China.

Background: The genetic architecture underlying central cornea thickness (CCT) is far from understood. Most of the CCT-associated variants are located in non-coding regions, which makes subsequent functional characterization difficult. Thus, integrative functional analyses of CCT-associated loci might help overcome these issues by prioritizing the hub genes that are located at the center of the CCT genetic network.

DOI: http://dx.doi.org/10.1186/s13040-018-0179-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6094462
August 2018

Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases.

BioData Min 2018 14;11:18. Epub 2018 Aug 14.

Department of Environmental and Biological Sciences, University of Eastern Finland, Yliopistonranta 1 E, 70211 Kuopio, Finland.

Background: The redundancy of information is becoming a critical issue for epidemiologists. High-dimensional datasets require new effective variable selection methods to be developed. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling.

DOI: http://dx.doi.org/10.1186/s13040-018-0180-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6092817
August 2018

TRIQ: a new method to evaluate triclusters.

BioData Min 2018 6;11:15. Epub 2018 Aug 6.

Department of Computer Science, University of Seville, Seville, Spain.

Background: Triclustering has been shown to be a valuable tool for the analysis of microarray data since its appearance as an improvement of classical clustering and biclustering techniques. The standard for validation of triclustering is based on three different measures: correlation, graphic similarity of the patterns and functional annotations for the genes extracted from the Gene Ontology project (GO).

Results: We propose TRIQ, a single evaluation measure that combines the three measures previously described: correlation, graphic validation and functional annotation, providing a single value as result of the validation of a tricluster solution and therefore simplifying the steps inherent to research of comparison and selection of solutions.

DOI: http://dx.doi.org/10.1186/s13040-018-0177-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6091209
August 2018

Protein folding prediction in the HP model using ions motion optimization with a greedy algorithm.

BioData Min 2018 8;11:17. Epub 2018 Aug 8.

Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan.

Background: The function of a protein is determined by its native protein structure. Among many protein prediction methods, the Hydrophobic-Polar (HP) model, an ab initio method, simplifies the protein folding prediction process in order to reduce the prediction complexity.

Results: In this study, the ions motion optimization (IMO) algorithm was combined with the greedy algorithm (namely IMOG) and applied to the HP model for protein folding prediction based on the 2D-triangular-lattice model.

DOI: http://dx.doi.org/10.1186/s13040-018-0176-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6083565
August 2018

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

BioData Min 2018 7;11:16. Epub 2018 Aug 7.

Centre for Biotechnology and Bioengineering (CeBiB), Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile.

Background: Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. In order to analyse this data, clustering arises as one of the main techniques used, and it aims at finding groups of genes that have some criterion in common, such as a similar expression profile.

DOI: http://dx.doi.org/10.1186/s13040-018-0178-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6081857
August 2018

Soft document clustering using a novel graph covering approach.

BioData Min 2018 14;11:11. Epub 2018 Jun 14.

Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Schloss Birlinghoven, Sankt Augustin, Germany.

Background: In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation.

Results: In this paper we present and discuss a novel graph-theoretical approach for document clustering and its application to a real-world data set.

DOI: http://dx.doi.org/10.1186/s13040-018-0172-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047369
June 2018

Feature selection for gene prediction in metagenomic fragments.

BioData Min 2018 7;11. Epub 2018 Jun 7.

College of Computer and Information Sciences, Computer Science Department, King Saud University, Riyadh, Saudi Arabia.

Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences.

Results: In this study, we apply a filter method to select relevant features from a large set of known features instead of combining them using linear classifiers or ignoring their individual coding potential.

DOI: http://dx.doi.org/10.1186/s13040-018-0170-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047368
June 2018

PathCORE-T: identifying and visualizing globally co-occurring pathways in large transcriptomic compendia.

BioData Min 2018 3;11:14. Epub 2018 Jul 3.

Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA.

Background: Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments.

DOI: http://dx.doi.org/10.1186/s13040-018-0175-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6029133
July 2018

Integrative analysis of gene expression and methylation data for breast cancer cell lines.

BioData Min 2018 25;11:13. Epub 2018 Jun 25.

Department of Mathematics, Texas State University, San Marcos, TX USA.

Background: The deadly costs of cancer and necessity for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression.

DOI: http://dx.doi.org/10.1186/s13040-018-0174-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019806
June 2018

Measuring associations between the microbiota and repeated measures of continuous clinical variables using a lasso-penalized generalized linear mixed model.

BioData Min 2018 15;11:12. Epub 2018 Jun 15.

Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 USA.

Background: Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods.

DOI: http://dx.doi.org/10.1186/s13040-018-0173-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6003033
June 2018

Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi.

BioData Min 2018 13;11:10. Epub 2018 Jun 13.

Bioinformatics Research Center, North Carolina State University, 1 Lampe Dr, Raleigh, 27695 NC USA.

Background: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources ("assays"), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry's (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources.

DOI: http://dx.doi.org/10.1186/s13040-018-0169-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998548
June 2018

Gene set analysis methods: a systematic comparison.

BioData Min 2018 31;11. Epub 2018 May 31.

Bioinformatics Research Center, North Carolina State University, Raleigh, NC USA.

Background: Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope.

DOI: http://dx.doi.org/10.1186/s13040-018-0166-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984476
May 2018

Connecting genetics and gene expression data for target prioritisation and drug repositioning.

BioData Min 2018 31;11. Epub 2018 May 31.

Computational Biology, Target Sciences, GSK, 1250 S. Collegeville Road, UP12-100, Collegeville, PA 19426-0989 USA.

Developing new drugs continues to be a highly inefficient and costly business. By repurposing an existing compound for a different indication, drug repositioning offers an attractive alternative to traditional drug discovery. Most of these approaches work by matching transcriptional disease signatures to anti-correlated gene expression profiles of drug perturbations.

DOI: http://dx.doi.org/10.1186/s13040-018-0171-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984374
May 2018

Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV).

BioData Min 2018 19;11. Epub 2018 Apr 19.

Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA.

Background: Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole.
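
The imbalance problem described above can be made concrete with a small sketch that assigns samples to folds so that each joint genotype appears in roughly the same proportion in every fold. This is a generic stratified split written for illustration, not the PICV procedure from the article; the function name, the use of the joint two-SNP genotype as the stratum, and the round-robin assignment are all assumptions.

```python
from collections import defaultdict

def stratified_folds(strata, k):
    """Assign each sample index to one of k folds, round-robin within each stratum.

    strata: list with one hashable, sortable stratum key per sample
            (here, a hypothetical joint genotype at two interacting SNPs).
    Returns a list of k lists of sample indices.
    """
    by_stratum = defaultdict(list)
    for idx, s in enumerate(strata):
        by_stratum[s].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # carry the position across strata so fold sizes stay balanced
    for s in sorted(by_stratum):
        for j, idx in enumerate(by_stratum[s]):
            folds[(offset + j) % k].append(idx)
        offset += len(by_stratum[s])
    return folds

# Hypothetical data: four samples with genotype (0, 0), two with (1, 1).
toy_strata = [(0, 0)] * 4 + [(1, 1)] * 2
toy_folds = stratified_folds(toy_strata, k=2)
```

With the toy data each fold receives two (0, 0) samples and one (1, 1) sample, so training and testing splits built from these folds see the rare joint genotype at the same rate as the full data set.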

DOI: http://dx.doi.org/10.1186/s13040-018-0167-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907739
April 2018

Collective feature selection to identify crucial epistatic variants.

BioData Min 2018 19;11. Epub 2018 Apr 19.

Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA.

Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features.

DOI: http://dx.doi.org/10.1186/s13040-018-0168-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907720
April 2018

Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

BioData Min 2018 27;11. Epub 2018 Mar 27.

Área de Informática, Universidad Pablo de Olavide, Ctra. Utrera km. 1, Seville, 41013 Spain.

Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance between genes can be defined according to their information stored in Gene Ontology (GO).

DOI: http://dx.doi.org/10.1186/s13040-018-0165-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872503
March 2018

A novel joint analysis framework improves identification of differentially expressed genes in cross disease transcriptomic analysis.

Authors: Wenyi Qin, Hui Lu

BioData Min 2018 20;11. Epub 2018 Feb 20.

Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan, Rm 218, Chicago, IL 60607 USA.

Motivation: Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses in genome-wide transcriptomic data. Since most studies don't have a lot of samples, researchers have used meta-analysis to group different datasets for the same disease. Even then, in many cases the statistical power is still not enough.

DOI: http://dx.doi.org/10.1186/s13040-018-0163-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819186
February 2018

Investigating the parameter space of evolutionary algorithms.

BioData Min 2018 17;11. Epub 2018 Feb 17.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, 19104-6021 PA USA.

Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, and crossover and mutation rates. Through a series of experiments over multiple evolutionary algorithm implementations and 25 problems we show that parameter space tends to be rife with viable parameters, at least for the problems studied herein.

DOI: http://dx.doi.org/10.1186/s13040-018-0164-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5816380
February 2018

Identification of influential observations in high-dimensional cancer survival data through the rank product test.

BioData Min 2018 14;11. Epub 2018 Feb 14.

IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Rovisco Pais, 1, Lisbon, Portugal.

Background: Survival analysis is a statistical technique that studies the time until an event of interest occurs and is widely used in many fields of science, in particular in the medical area. Outlier detection in this context has gained great importance because the identification of long- or short-term survivors may lead to the detection of new prognostic factors. However, the results obtained using different outlier detection methods and residuals are seldom the same and are strongly dependent on the specific Cox proportional hazards model selected.

DOI: http://dx.doi.org/10.1186/s13040-018-0162-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813402
February 2018

Scalable non-negative matrix tri-factorization.

BioData Min 2017 29;10:41. Epub 2017 Dec 29.

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.

Background: Matrix factorization is a well-established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets.

DOI: http://dx.doi.org/10.1186/s13040-017-0160-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5746986
December 2017

An automated pipeline for bouton, spine, and synapse detection of in vivo two-photon images.

BioData Min 2017 20;10:40. Epub 2017 Dec 20.

Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190 China.

Background: In the nervous system, neurons communicate through synapses. The size, morphology, and connectivity of these synapses are significant in determining the functional properties of the neural network. Therefore, they have always been a major focus of neuroscience research.

DOI: http://dx.doi.org/10.1186/s13040-017-0161-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5738741
December 2017

Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data.

BioData Min 2017 19;10:39. Epub 2017 Dec 19.

Foundation Inflammatory Bowel & Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, 90048 CA USA.

Background: Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly.

DOI: http://dx.doi.org/10.1186/s13040-017-0159-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735537
December 2017

TSPmap, a tool making use of traveling salesperson problem solvers in the efficient and accurate construction of high-density genetic linkage maps.

BioData Min 2017 19;10:38. Epub 2017 Dec 19.

Department of Bioagricultural Sciences & Pest Management, Colorado State University, 1177 Campus Delivery, Fort Collins, CO 80523 USA.

Background: Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve genome assemblies as well as add much needed resolution for loci controlling variation in ecologically and agriculturally important traits. However, traditional genetic map construction methods from these large marker datasets can be computationally prohibitive and highly error prone. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0158-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735504
December 2017

Cluster ensemble based on Random Forests for genetic data.

BioData Min 2017 15;10:37. Epub 2017 Dec 15.

College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Background: Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0156-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5732374
December 2017

PMLB: a large benchmark suite for machine learning evaluation and comparison.

BioData Min 2017 11;10:36. Epub 2017 Dec 11.

Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA.

Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0154-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725843
December 2017

Ten quick tips for machine learning in computational biology.

Authors:
Davide Chicco

BioData Min 2017 8;10:35. Epub 2017 Dec 8.

Princess Margaret Cancer Centre, PMCR Tower 11-401, 101 College Street, Toronto, Ontario, M5G 1L7 Canada.

Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0155-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5721660
December 2017

Artificial intelligence: more human with human.

BioData Min 2017 1;10:34. Epub 2017 Dec 1.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, 19104-6021 PA USA.

DOI: http://dx.doi.org/10.1186/s13040-017-0157-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5709984
December 2017

OCDD: an obesity and co-morbid disease database.

BioData Min 2017 21;10:33. Epub 2017 Nov 21.

Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108 India.

Background: Obesity is a medical condition that is known for increased body mass index (BMI). It is also associated with chronic low level inflammation. Obesity disrupts the immune-metabolic homeostasis by changing the secretion of adipocytes. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0153-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5697160
November 2017

Metrics to estimate differential co-expression networks.

BioData Min 2017 10;10:32. Epub 2017 Nov 10.

Cátedra de Bioinformática, Escuela de Medicina, Tecnológico de Monterrey, 64710 Monterrey, Nuevo León Mexico.

Background: Detecting the differences in gene expression data is important for understanding the underlying molecular mechanisms. Although the differentially expressed genes are a large component, differences in correlation are becoming an interesting approach to achieving deeper insights. However, diverse metrics have been used to detect differential correlation, making selection and use of a single metric difficult. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0152-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5681815
November 2017

Methods for enhancing the reproducibility of biomedical research findings using electronic health records.

BioData Min 2017 11;10:31. Epub 2017 Sep 11.

EHR Research Group, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT UK.

Background: The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. With the complexity, volume and variety of electronic health record data sources made available for research steadily increasing, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0151-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5594436
September 2017

RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

BioData Min 2017 5;10:30. Epub 2017 Sep 5.

Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.

Background: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0150-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584328
September 2017

Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network.

BioData Min 2017 3;10:29. Epub 2017 Aug 3.

Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA.

Background: The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0146-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543747
August 2017

Genetically improved BarraCUDA.

BioData Min 2017 2;10:28. Epub 2017 Aug 2.

University of Cambridge Metabolic Research Laboratories, Addenbrooke's Hospital, Cambridge, UK.

Background: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement".

Results: The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0149-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5541657
August 2017

nRC: non-coding RNA Classifier based on structural features.

BioData Min 2017 1;10:27. Epub 2017 Aug 1.

ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy.

Motivation: Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0148-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5540506
August 2017

Evolutionary computation: the next major transition of artificial intelligence?

BioData Min 2017 29;10:26. Epub 2017 Jul 29.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, 19104-6021 PA USA.

DOI: http://dx.doi.org/10.1186/s13040-017-0147-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5534026
July 2017

Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals.

BioData Min 2017 24;10:25. Epub 2017 Jul 24.

Biomedical and Translational Informatics, Geisinger Clinic, Danville, PA USA.

Background: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG).

Results: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Read More

DOI: http://dx.doi.org/10.1186/s13040-017-0145-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525436
July 2017

The Dark Proteome Database.

BioData Min 2017 20;10:24. Epub 2017 Jul 20.

Garvan Institute of Medical Research, Sydney, NSW 2010 Australia.

DOI: http://dx.doi.org/10.1186/s13040-017-0144-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5520327
July 2017