MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration.

BMC Bioinformatics 2017 Jan 17;18(1):36. Epub 2017 Jan 17.

Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003, Barcelona, Spain.

Background: The falling cost of genomic assays has generated large amounts of biomedical data. As a result, current studies often perform multiple experiments on the same subjects. While Bioconductor's methods and classes, implemented in different packages, manage individual experiments, there is no standard class to properly manage different omic data sets from the same subjects. In addition, most R/Bioconductor packages designed to integrate and visualize biological data use basic data structures that lack clear general methods, such as subsetting features or selecting samples.

Results: To address this need, we have developed MultiDataSet, a new R class based on Bioconductor standards and designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and incomplete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new, not yet implemented data types from any biological experiment.
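The idea of encapsulating several data sets and selecting samples across them can be illustrated with a minimal conceptual sketch. It is written in Python purely for illustration and is not the MultiDataSet API: the class `MultiTable`, its methods, and the sample and feature names are all hypothetical. It assumes each data set is a mapping from sample identifier to feature values.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTable:
    """Toy container (hypothetical): table name -> {sample id -> {feature -> value}}."""
    tables: dict = field(default_factory=dict)

    def add(self, name, table):
        # Refuse to silently overwrite an existing data set.
        if name in self.tables:
            raise ValueError(f"table {name!r} already present")
        self.tables[name] = table

    def common_samples(self):
        """Return the sample ids present in every table, sorted."""
        if not self.tables:
            return []
        sample_sets = [set(t) for t in self.tables.values()]
        return sorted(set.intersection(*sample_sets))

    def select_samples(self, samples):
        """Return a new container keeping only the requested samples in each table."""
        wanted = set(samples)
        out = MultiTable()
        for name, table in self.tables.items():
            out.add(name, {s: v for s, v in table.items() if s in wanted})
        return out


# Two incomplete data sets: not every sample appears in both (made-up values).
multi = MultiTable()
multi.add("expression", {"s1": {"geneA": 2.1}, "s2": {"geneA": 1.7}, "s3": {"geneA": 3.0}})
multi.add("methylation", {"s2": {"cpg1": 0.4}, "s3": {"cpg1": 0.9}, "s4": {"cpg1": 0.1}})

shared = multi.common_samples()          # samples measured in both tables
complete = multi.select_samples(shared)  # restrict every table to those samples
```

Keeping sample selection as a single operation on the container, rather than on each table separately, is what keeps the data sets aligned when some subjects are missing from some experiments.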

Conclusions: MultiDataSet is a suitable class for data integration within the R and Bioconductor framework.

Source
DOI: http://dx.doi.org/10.1186/s12859-016-1455-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5240259
