312 results match your criteria BioData Mining [Journal]


Ideas for how informaticians can get involved with COVID-19 research.

BioData Min 2020;13. Epub 2020 May 12.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA.

The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on population health and wellbeing. Biomedical informatics is central to COVID-19 research efforts and to the delivery of healthcare for COVID-19 patients. Critical to this effort is the participation of informaticians who typically work on other basic science or clinical problems.

DOI: http://dx.doi.org/10.1186/s13040-020-00213-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216865

A network pharmacology-based study on Alzheimer disease prevention and treatment of Qiong Yu Gao.

BioData Min 2020;13. Epub 2020 Apr 25.

School of Basic Medical Sciences, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China.

Background and Objective: As the pathological mechanisms of AD are complex, increasing evidence has demonstrated that Chinese Medicine, with its multiple ingredients and multiple targets, may be more suitable for treating diseases with complex pathogenesis. Therefore, this study aimed to preliminarily decipher the bioactive compounds and potential mechanisms of Qiong Yu Gao (QYG) for AD prevention and treatment using an integrated network pharmacology approach.

Methods: Putative ingredients of QYG and significant genes of AD were retrieved from public databases after screening.

DOI: http://dx.doi.org/10.1186/s13040-020-00212-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7183652

Correction to: Investigating the parameter space of evolutionary algorithms.

BioData Min 2019;12:22. Epub 2019 Nov 18.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104-6021 USA.

[This corrects the article DOI: 10.1186/s13040-018-0164-x.]

DOI: http://dx.doi.org/10.1186/s13040-019-0210-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6862728
November 2019

SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500.

BioData Min 2019;12:21. Epub 2019 Nov 15.

BGI-Wuhan Clinical Laboratories, Building B2, No.666 Gaoxin Road, Wuhan East lake Hi-tech Development zone, Wuhan, 430074 China.

Background: The sequencing platform BGISEQ-500 is based on DNBSEQ technology and provides high throughput at low cost. This sequencer has been widely used in various areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing the sequencing process, accurately interpreting sequencing results and efficiently solving sequencing problems.

DOI: http://dx.doi.org/10.1186/s13040-019-0209-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857306
November 2019

Screening for mouse genes lost in mammals with long lifespans.

BioData Min 2019;12:20. Epub 2019 Nov 9.

Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia.

Background: Gerontogenes include those that modulate life expectancy in various species and may be the actual longevity genes. We believe that a long (relative to body weight) lifespan in individual rodent and primate species can be due, among other things, to the loss of particular genes that are present in short-lived species of the same orders. These genes can also explain the widely different rates of aging among diverse species as well as why similarly sized rodents or primates sometimes have anomalous life expectancies (e.

DOI: http://dx.doi.org/10.1186/s13040-019-0208-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6842137
November 2019

Predicting metabolite-disease associations based on KATZ model.

BioData Min 2019;12:19. Epub 2019 Oct 26.

School of Computer Science, Shaanxi Normal University, Xi'an, 710119 Shaanxi China.

Background: A growing body of evidence shows that metabolites can respond to pathological changes. However, identifying disease-related metabolites is a major challenge in biology and medicine. Traditional medical equipment is not only limited in accuracy but also expensive and time-consuming.

DOI: http://dx.doi.org/10.1186/s13040-019-0206-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6815005
October 2019

Integrative analysis of genetic and epigenetic profiling of lung squamous cell carcinoma (LSCC) patients to identify smoking level relevant biomarkers.

BioData Min 2019;12:18. Epub 2019 Oct 21.

Department of Gynecology and Oncology, Wen Zhou Medical University affiliated People's Hospital, Wen Zhou, Zhe Jiang province, People's Republic of China.

Background: The incidence and mortality of lung cancer have decreased dramatically over recent decades, yet approximately 160,000 deaths per year still occur in the United States. Smoking intensity, duration, and starting age, as well as environmental cofactors including air pollution, have shown strong associations with the major types of lung cancer. Lung squamous cell carcinoma is a subtype of non-small cell lung cancer, which represents 25% of the cases.

DOI: http://dx.doi.org/10.1186/s13040-019-0207-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802182
October 2019
1.535 Impact Factor

RNSCLC-PRSP software to predict the prognostic risk and survival in patients with resected TN M non-small cell lung cancer.

BioData Min 2019;12:17. Epub 2019 Aug 23.

Department of Thoracic Surgery, The First Affiliated Hospital of Soochow University, No. 899 Pinghai Road, Suzhou, 215006 China.

Background: The clinical outcomes of patients with resected TNM non-small cell lung cancer (NSCLC) with the same tumor-node-metastasis (TNM) stage are diverse. Although other prognostic factors and prognostic prediction tools have been reported in many published studies, convenient, accurate and specific prognostic prediction software for clinicians has not been developed. The purpose of our research was to develop software of this type that can analyze subdivided T and N staging and additional factors to predict prognostic risk and the corresponding mean and median survival time and 1- to 5-year survival rates of patients with resected TNM NSCLC.

DOI: http://dx.doi.org/10.1186/s13040-019-0205-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6708148
August 2019

ViSEAGO: a Bioconductor package for clustering biological functions using Gene Ontology and semantic similarity.

BioData Min 2019;12:16. Epub 2019 Aug 6.

BOA, INRA, Université de Tours, 37380 Nouzilly, France.

The main objective of the ViSEAGO package is to carry out data mining of biological functions and establish links between genes involved in the study. We developed ViSEAGO in R to facilitate functional Gene Ontology (GO) analysis of complex experimental designs with multiple comparisons of interest. It makes it possible to study large-scale datasets together and to visualize GO profiles to capture biological knowledge.

DOI: http://dx.doi.org/10.1186/s13040-019-0204-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6685253
August 2019

ClickGene: an open cloud-based platform for big pan-cancer data genome-wide association study, visualization and exploration.

BioData Min 2019;12:12. Epub 2019 Jun 26.

School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072 China.

A tremendous amount of whole-genome sequencing data has been provided by large consortium projects such as TCGA (The Cancer Genome Atlas) and COSMIC, which creates remarkable opportunities for functional gene research and for uncovering cancer-associated mechanisms. While the existing web servers are valuable and widely used, many whole-genome analysis functions urgently needed by experimental biologists are still not adequately addressed. A cloud-based platform named CG (ClickGene) was therefore developed for do-it-yourself analysis of users' private in-house data or public genome data without any software installation or system configuration.

DOI: http://dx.doi.org/10.1186/s13040-019-0202-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6595587

Disease associations depend on visit type: results from a visit-wide association study.

BioData Min 2019;12:15. Epub 2019 Jul 11.

Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA 19104 USA.

Introduction: Widespread adoption of Electronic Health Records (EHR) has increased the number of reported disease association studies, or Phenome-Wide Association Studies (PheWAS). Traditional PheWAS studies ignore (i.e.

DOI: http://dx.doi.org/10.1186/s13040-019-0203-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6625053

Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies.

BioData Min 2019;12:14. Epub 2019 Jul 9.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA USA.

Background: The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is, individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in their assumptions and the type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding.

DOI: http://dx.doi.org/10.1186/s13040-019-0201-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617598

On the utilization of deep and ensemble learning to detect milk adulteration.

BioData Min 2019;12:13. Epub 2019 Jul 8.

Department of Computer Science, Federal University of Minas Gerais, Av. Antônio Carlos, 6627, Belo Horizonte, 31270-901 MG Brazil.

Background: Fraudulent milk adulteration is a dangerous practice in the dairy industry that is harmful to consumers since milk is one of the most consumed food products. Milk quality can be assessed by Fourier Transform Infrared Spectroscopy (FTIR), a simple and fast method for obtaining its compositional information. The spectral data produced by this technique can be explored using machine learning methods, such as neural networks and decision trees, in order to create models that represent the characteristics of pure and adulterated milk samples.

DOI: http://dx.doi.org/10.1186/s13040-019-0200-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615233
July 2019

Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies.

BioData Min 2019;12:11. Epub 2019 Jun 10.

BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium.

Background: In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important because it makes it possible to identify genetic markers that tag the actual causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may also interfere with the detection of genuine epistasis signals in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-019-0199-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6558841
June 2019

Innovative strategies for annotating the "relationSNP" between variants and molecular phenotypes.

BioData Min 2019;12:10. Epub 2019 May 14.

Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA.

Characterizing how variation at the level of individual nucleotides contributes to traits and diseases has been an area of growing interest since the sequencing of the first human genome was completed. Understanding how a single nucleotide polymorphism (SNP) leads to a pathogenic phenotype on a genome-wide scale is a fruitful endeavor for anyone interested in developing diagnostic tests or therapeutics, or simply wanting to understand the etiology of a disease or trait. To this end, many datasets and algorithms have been developed as resources and tools to annotate SNPs.

DOI: http://dx.doi.org/10.1186/s13040-019-0197-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518798

Within-sample co-methylation patterns in normal tissues.

BioData Min 2019;12. Epub 2019 May 9.

Department of Mathematics, Texas State University, San Marcos, TX USA.

Background: DNA methylation is an epigenetic event that may regulate gene expression. Because of this regulatory role, aberrant DNA methylation is often associated with disease. Within-sample DNA co-methylation is the similarity of methylation in nearby cytosine sites of a chromosome.

DOI: http://dx.doi.org/10.1186/s13040-019-0198-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506960
May 2019

Characterizing human genomic coevolution in locus-gene regulatory interactions.

BioData Min 2019;12. Epub 2019 Mar 15.

Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106 OH USA.

Background: Coevolution has been used to identify and predict interactions and functional relationships between proteins of many different organisms including humans. Current efforts in annotating the human genome increasingly show that non-coding DNA sequence has important functional and regulatory interactions. Furthermore, regulatory elements do not necessarily reside in close proximity to the coding region of their target genes.

DOI: http://dx.doi.org/10.1186/s13040-019-0195-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419833
March 2019

Encodings and models for antimicrobial peptide classification for multi-resistant pathogens.

BioData Min 2019;12. Epub 2019 Mar 4.

Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany.

Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g.

DOI: http://dx.doi.org/10.1186/s13040-019-0196-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6399931
March 2019

Testing the assumptions of parametric linear models: the need for biological data mining in disciplines such as human genetics.

BioData Min 2019;12. Epub 2019 Feb 11.

Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106 USA.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-019-0194-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371539
February 2019

Approximate kernel reconstruction for time-varying networks.

BioData Min 2019;12. Epub 2019 Feb 6.

School of Medicine, University of Alabama at Birmingham, Birmingham, AL USA.

Background: Most existing algorithms for modeling and analyzing molecular networks assume a static or time-invariant network topology. Such a view, however, does not capture the temporal evolution of the underlying biological process, as molecular networks are typically "re-wired" over time in response to cellular development and environmental changes. In our previous work, we formulated the inference of time-varying or dynamic networks as a tracking problem, where the target state is the ensemble of edges in the network.

DOI: http://dx.doi.org/10.1186/s13040-019-0192-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364395
February 2019

A biplot correlation range for group-wise metabolite selection in mass spectrometry.

BioData Min 2019;12. Epub 2019 Feb 4.

Department of Industrial Engineering, Hanyang University, Seoul, 04763 South Korea.

Background: Analytic methods are available to acquire extensive metabolic information in a cost-effective manner for personalized medicine, yet disease risk and diagnosis mostly rely upon individual biomarkers based on statistical principles of false discovery rate and correlation. Due to functional redundancies and multiple layers of regulation in complex biologic systems, individual biomarkers, while useful, are inherently limited in disease characterization. Data reduction and discriminant analysis tools such as principal component analysis (PCA), partial least squares (PLS), or orthogonal PLS (O-PLS) provide approaches to separate the metabolic phenotypes, but do not offer a statistical basis for selection of group-wise metabolites as contributors to metabolic phenotypes.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-019-0191-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360680
February 2019

Predicting opioid dependence from electronic health records with machine learning.

BioData Min 2019;12. Epub 2019 Jan 29.

Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA.

Background: The opioid epidemic in the United States is averaging over 100 deaths per day due to overdose. The effectiveness of opioids as pain treatments and the drug-seeking behavior of opioid addicts lead physicians in the United States to issue over 200 million opioid prescriptions every year. To better understand the biomedical profile of opioid-dependent patients, we analyzed information from electronic health records (EHR) including lab tests, vital signs, medical procedures, prescriptions, and other data from millions of patients to predict opioid substance dependence.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-019-0193-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6352440
January 2019

Use case driven evaluation of open databases for pediatric cancer research.

BioData Min 2019;12. Epub 2019 Jan 15.

Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria.

Background: A plethora of Web resources are available offering information on clinical, pre-clinical, genomic and theoretical aspects of cancer, including not only comprehensive cancer projects such as ICGC and TCGA, but also less-known and more specialized projects on pediatric diseases such as PCGP. However, in the case of childhood cancer, very little data is openly available. Several web-based resources and tools offer general biomedical data that are purpose-built for neither pediatric nor cancer analysis.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-018-0190-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334395
January 2019

Application of an interpretable classification model on Early Folding Residues during protein folding.

BioData Min 2019;12. Epub 2019 Jan 5.

University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany.

Background: Machine learning strategies are prominent tools for data analysis. Especially in the life sciences, they have become increasingly important for handling the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models.

DOI: http://dx.doi.org/10.1186/s13040-018-0188-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321665
January 2019

Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype.

BioData Min 2018;11:27. Epub 2018 Dec 14.

Department of Statistics, Seoul National University, Shilim-dong, Kwanak-gu, Seoul, 151-742 South Korea.

Background: One strategy for addressing missing heritability in genome-wide association studies is gene-gene interaction analysis, which, unlike a single-gene approach, involves high dimensionality. The multifactor dimensionality reduction method (MDR) has been widely applied to reduce multi-levels of genotypes into high or low risk groups. The Cox-MDR method has been proposed to detect gene-gene interactions associated with the survival phenotype by using the martingale residuals from a Cox model.

DOI: http://dx.doi.org/10.1186/s13040-018-0189-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6295107
December 2018

Distributed retrieval engine for the development of cloud-deployed biological databases.

BioData Min 2018;11:26. Epub 2018 Nov 12.

Neuro-biomorphic Engineering Lab, Faculty of Engineering, Jerusalem College of Technology, Jerusalem, Israel.

The integration of cloud resources with federated data retrieval has the potential of improving the maintenance, accessibility and performance of specialized databases in the biomedical field. However, such an integrative approach requires technical expertise in cloud computing, usage of a data retrieval engine and development of a unified data-model, which can encapsulate the heterogeneity of biological data. Here, a framework for the development of cloud-based biological specialized databases is proposed.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-018-0185-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6233384
November 2018

Knomics-Biota - a system for exploratory analysis of human gut microbiota data.

BioData Min 2018;11:25. Epub 2018 Nov 6.

Research and Development Department, Knomics LLC, Skolkovo Innovation Center, Moscow, Russian Federation.

Background: Metagenomic surveys of human microbiota are becoming increasingly widespread in academic research as well as in the food and pharmaceutical industries and in clinical contexts. Intuitive tools for investigating experimental data are of high interest to researchers.

Results: Knomics-Biota is a web-based resource for exploratory analysis of human gut metagenomes.

DOI: http://dx.doi.org/10.1186/s13040-018-0187-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6220475
November 2018

A fast forward 3D connection algorithm for mitochondria and synapse segmentations from serial EM images.

BioData Min 2018;11:24. Epub 2018 Nov 5.

Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190 China.

Background: It is becoming increasingly clear that the quantification of mitochondria and synapses is of great significance for understanding the function of biological nervous systems. Electron microscopy (EM), with the necessary resolution in three directions, is the only available imaging method to look closely into these issues. Therefore, estimating the number of mitochondria and synapses from serial EM images is gaining prominence.

DOI: http://dx.doi.org/10.1186/s13040-018-0183-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6217761
November 2018

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

BioData Min 2018;11:23. Epub 2018 Nov 3.

Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA.

Background: ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding.

DOI: http://dx.doi.org/10.1186/s13040-018-0186-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6215626
November 2018

Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction.

BioData Min 2018;11:22. Epub 2018 Oct 25.

Department of Engineering, Uninettuno University, Corso Vittorio Emanuele II, 39, Rome, 00186 Italy.

Background: In the Next Generation Sequencing (NGS) era, a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to improve accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer.

Publisher Site: https://biodatamining.biomedcentral.com/articles/10.1186/s13
DOI: http://dx.doi.org/10.1186/s13040-018-0184-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6203208
October 2018

To know the objective is not (necessarily) to know the objective function.

BioData Min 2018;11:21. Epub 2018 Oct 4.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104-6021 USA.

DOI: http://dx.doi.org/10.1186/s13040-018-0182-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6172724
October 2018

Grasping frequent subgraph mining for bioinformatics applications.

BioData Min 2018;11:20. Epub 2018 Sep 3.

Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium.

Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application.

DOI: http://dx.doi.org/10.1186/s13040-018-0181-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122726
September 2018

Functional relevance for central cornea thickness-associated genetic variants by using integrative analyses.

BioData Min 2018;11:19. Epub 2018 Aug 15.

Department of Ophthalmology and Visual Science, Eye Institute, Eye & ENT Hospital, Shanghai Medical College of Fudan University, NHC Key Laboratory of Myopia (Fudan University), Shanghai, China.

Background: The genetic architecture underlying central cornea thickness (CCT) is far from understood. Most of the CCT-associated variants are located in non-coding regions, which complicates subsequent functional characterization. Thus, integrative functional analyses of CCT-associated loci might help overcome these issues by prioritizing the hub genes located at the center of the CCT genetic network.

DOI: http://dx.doi.org/10.1186/s13040-018-0179-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6094462
August 2018

Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases.

BioData Min 2018;11:18. Epub 2018 Aug 14.

Department of Environmental and Biological Sciences, University of Eastern Finland, Yliopistonranta 1 E, 70211 Kuopio, Finland.

Background: The redundancy of information is becoming a critical issue for epidemiologists. High-dimensional datasets require new, effective variable selection methods to be developed. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling.

DOI: http://dx.doi.org/10.1186/s13040-018-0180-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6092817
August 2018

TRIQ: a new method to evaluate triclusters.

BioData Min 2018;11:15. Epub 2018 Aug 6.

Department of Computer Science, University of Seville, Seville, Spain.

Background: Triclustering has been shown to be a valuable tool for the analysis of microarray data since its appearance as an improvement on classical clustering and biclustering techniques. The standard for validation of triclustering is based on three different measures: correlation, graphic similarity of the patterns and functional annotations for the genes extracted from the Gene Ontology project (GO).

Results: We propose TRIQ, a single evaluation measure that combines the three measures previously described: correlation, graphic validation and functional annotation. It provides a single value as the result of validating a tricluster solution and therefore simplifies the comparison and selection of solutions.

DOI: http://dx.doi.org/10.1186/s13040-018-0177-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6091209
August 2018

Protein folding prediction in the HP model using ions motion optimization with a greedy algorithm.

BioData Min 2018;11:17. Epub 2018 Aug 8.

Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan.

Background: The function of a protein is determined by its native protein structure. Among many protein prediction methods, the Hydrophobic-Polar (HP) model, an ab initio method, simplifies the protein folding prediction process in order to reduce the prediction complexity.

Results: In this study, the ions motion optimization (IMO) algorithm was combined with the greedy algorithm (namely IMOG) and applied to the HP model for protein folding prediction based on the 2D-triangular-lattice model.

DOI: http://dx.doi.org/10.1186/s13040-018-0176-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6083565
August 2018

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

BioData Min 2018 7;11:16. Epub 2018 Aug 7.

Centre for Biotechnology and Bioengineering (CeBiB), Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile.

Background: Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genes under different conditions. Clustering is one of the main techniques used to analyse these data; it aims to find groups of genes that share some criterion, such as a similar expression profile.

Source
http://dx.doi.org/10.1186/s13040-018-0178-4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6081857
August 2018

Soft document clustering using a novel graph covering approach.

BioData Min 2018 14;11:11. Epub 2018 Jun 14.

Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany.

Background: In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation.

Results: In this paper we present and discuss a novel graph-theoretical approach for document clustering and its application to a real-world data set.

Source
http://dx.doi.org/10.1186/s13040-018-0172-x
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047369
June 2018

Feature selection for gene prediction in metagenomic fragments.

BioData Min 2018 7;11. Epub 2018 Jun 7.

College of Computer and Information Sciences, Computer Science Department, King Saud University, Riyadh, Saudi Arabia.

Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences.

Results: In this study, we apply a filter method to select relevant features from a large set of known features instead of combining them using linear classifiers or ignoring their individual coding potential.

Source
http://dx.doi.org/10.1186/s13040-018-0170-z
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047368
June 2018

PathCORE-T: identifying and visualizing globally co-occurring pathways in large transcriptomic compendia.

BioData Min 2018 3;11:14. Epub 2018 Jul 3.

Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA.

Background: Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments.

Source
http://dx.doi.org/10.1186/s13040-018-0175-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6029133
July 2018

Integrative analysis of gene expression and methylation data for breast cancer cell lines.

BioData Min 2018 25;11:13. Epub 2018 Jun 25.

Department of Mathematics, Texas State University, San Marcos, TX USA.

Background: The deadly costs of cancer and the need for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression.

Source
http://dx.doi.org/10.1186/s13040-018-0174-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019806
June 2018

Measuring associations between the microbiota and repeated measures of continuous clinical variables using a lasso-penalized generalized linear mixed model.

BioData Min 2018 15;11:12. Epub 2018 Jun 15.

Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 USA.

Background: Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods.

Source
http://dx.doi.org/10.1186/s13040-018-0173-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6003033
June 2018

Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi.

BioData Min 2018 13;11:10. Epub 2018 Jun 13.

Bioinformatics Research Center, North Carolina State University, 1 Lampe Dr, Raleigh, NC 27695 USA.

Background: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources ("assays"), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry's (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources.

Source
http://dx.doi.org/10.1186/s13040-018-0169-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998548
June 2018

Gene set analysis methods: a systematic comparison.

BioData Min 2018 31;11. Epub 2018 May 31.

Bioinformatics Research Center, North Carolina State University, Raleigh, NC USA.

Background: Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope.

Source
http://dx.doi.org/10.1186/s13040-018-0166-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984476
May 2018

Connecting genetics and gene expression data for target prioritisation and drug repositioning.

BioData Min 2018 31;11. Epub 2018 May 31.

Computational Biology, Target Sciences, GSK, 1250 S. Collegeville Road, UP12-100, Collegeville, PA 19426-0989 USA.

Developing new drugs continues to be a highly inefficient and costly business. By repurposing an existing compound for a different indication, drug repositioning offers an attractive alternative to traditional drug discovery. Most of these approaches work by matching transcriptional disease signatures to anti-correlated gene expression profiles of drug perturbations.

Source
http://dx.doi.org/10.1186/s13040-018-0171-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984374
May 2018

Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV).

BioData Min 2018 19;11. Epub 2018 Apr 19.

Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA.

Background: Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly in the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole.

Source
http://dx.doi.org/10.1186/s13040-018-0167-7
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907739
April 2018

Collective feature selection to identify crucial epistatic variants.

BioData Min 2018 19;11. Epub 2018 Apr 19.

Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA.

Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex diseases and traits. Detection of epistatic interactions remains a challenge because of the large number of features and the relatively small sample size of the input, which leads to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features.

Source
http://dx.doi.org/10.1186/s13040-018-0168-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907720
April 2018

Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

BioData Min 2018 27;11. Epub 2018 Mar 27.

Área de Informática, Universidad Pablo de Olavide, Ctra. Utrera km. 1, 41013 Seville, Spain.

Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO).

Source
http://dx.doi.org/10.1186/s13040-018-0165-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872503
March 2018

A novel joint analysis framework improves identification of differentially expressed genes in cross disease transcriptomic analysis.

Authors:
Wenyi Qin, Hui Lu

BioData Min 2018 20;11. Epub 2018 Feb 20.

Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan, Rm 218, Chicago, IL 60607 USA.

Motivation: Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses of genome-wide transcriptomic data. Since most studies have few samples, researchers have used meta-analysis to group different datasets for the same disease. Even then, in many cases the statistical power is still insufficient.

Source
http://dx.doi.org/10.1186/s13040-018-0163-y
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819186
February 2018

Investigating the parameter space of evolutionary algorithms.

BioData Min 2018 17;11. Epub 2018 Feb 17.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104-6021 USA.

Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, and crossover and mutation rates. Through a series of experiments over multiple evolutionary algorithm implementations and 25 problems, we show that parameter space tends to be rife with viable parameters, at least for the problems studied herein.

Source
http://dx.doi.org/10.1186/s13040-018-0164-x
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5816380
February 2018