FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

J Biomed Semantics 2016 Jun 13;7:39. Epub 2016 Jun 13.

The James Hutton Institute, Dundee, DD2 5DA, UK.

Background: Nucleotide and protein sequence feature annotations are essential to understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, however, there was no standard for describing this potentially complex location information as subject-predicate-object triples.

Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Representing sequence positions in a single data format, independent of file formats, allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations.
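As an illustration of what a feature location expressed as triples might look like, the following minimal Python sketch prints a Turtle description of a region. It is not taken from the paper: the function name `region_to_turtle` and the `example.org` IRIs are hypothetical, and the sketch assumes the FALDO terms `faldo:Region`, `faldo:begin`, `faldo:end`, `faldo:ExactPosition`, `faldo:position`, `faldo:reference`, and the strand position classes, together with 1-based coordinates, which the abstract itself does not spell out.

```python
# Assumed FALDO namespace; not stated in the abstract above.
FALDO = "http://biohackathon.org/resource/faldo#"


def region_to_turtle(feature_iri, reference_iri, begin, end, forward=True):
    """Return a Turtle string locating one feature as a FALDO region.

    Hypothetical helper for illustration only. Positions are assumed to be
    1-based. begin/end are not reordered, because on the reverse strand the
    begin position may be numerically larger than the end position.
    """
    if begin < 1 or end < 1:
        raise ValueError("positions are assumed to be 1-based")

    strand = (
        "faldo:ForwardStrandPosition" if forward
        else "faldo:ReverseStrandPosition"
    )

    def position(value):
        # Each boundary is its own node: an exact position on a given
        # strand of a given reference sequence.
        return (
            f"[ a faldo:ExactPosition, {strand} ;\n"
            f"      faldo:position {value} ;\n"
            f"      faldo:reference <{reference_iri}> ]"
        )

    return (
        f"@prefix faldo: <{FALDO}> .\n\n"
        f"<{feature_iri}> faldo:location [\n"
        f"    a faldo:Region ;\n"
        f"    faldo:begin {position(begin)} ;\n"
        f"    faldo:end {position(end)}\n"
        f"] .\n"
    )


if __name__ == "__main__":
    print(region_to_turtle(
        "http://example.org/feature/gene1",      # hypothetical feature IRI
        "http://example.org/sequence/chr1",      # hypothetical reference IRI
        100,
        200,
    ))
```

Because each boundary carries its own reference sequence and strand, the same pattern can in principle describe a nucleotide feature or a protein annotation without depending on the file format the annotation came from.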

Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data from sources that use FALDO can prospectively be retrieved with federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
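To make the idea of a federated query concrete, here is a hedged Python sketch that only builds the text of a SPARQL 1.1 query using a `SERVICE` clause. The function name `federated_feature_query`, the endpoint URL, and the reference IRI are all hypothetical, the FALDO property names are assumed rather than quoted from the abstract, and the range filter assumes forward-strand features where the begin position is not larger than the end position.

```python
def federated_feature_query(remote_endpoint, reference_iri, start, end):
    """Build a SPARQL query string that asks a remote endpoint for features
    whose FALDO location falls inside [start, end] on one reference sequence.

    Hypothetical helper for illustration; it does not send the query.
    """
    start, end = int(start), int(end)  # keep only integers in the FILTER
    return f"""PREFIX faldo: <http://biohackathon.org/resource/faldo#>
SELECT ?feature ?begin ?end WHERE {{
  SERVICE <{remote_endpoint}> {{
    ?feature faldo:location ?loc .
    ?loc faldo:begin ?b ;
         faldo:end ?e .
    ?b faldo:position ?begin ;
       faldo:reference <{reference_iri}> .
    ?e faldo:position ?end .
    FILTER(?begin >= {start} && ?end <= {end})
  }}
}}"""


if __name__ == "__main__":
    print(federated_feature_query(
        "https://example.org/sparql",            # hypothetical endpoint
        "http://example.org/sequence/chr1",      # hypothetical reference IRI
        1000,
        5000,
    ))
```

The resulting string could be submitted to a local triple store that supports federation, which would then forward the inner pattern to the remote endpoint; a query with several `SERVICE` blocks would combine annotations from several sources in the same way.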

DOI: http://dx.doi.org/10.1186/s13326-016-0067-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4907002
