Precision annotation of digital samples in NCBI's Gene Expression Omnibus.

Sci Data 2017 Sep 19;4:170125. Epub 2017 Sep 19.

Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA.

The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remain poorly described by unstructured free-text attributes, which prevents large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO (STARGEO), a web application (http://STARGEO.org), to curate better annotations of sample phenotypes uniformly across different studies and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we engage a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.

DOI: http://dx.doi.org/10.1038/sdata.2017.125
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604135
September 2017