Publications by authors named "Ugis Sarkans"

43 Publications

Data-deposition protocols for correlative soft X-ray tomography and super-resolution structured illumination microscopy applications.

STAR Protoc 2021 Mar 13;2(1):100253. Epub 2021 Jan 13.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

This protocol illustrates the steps necessary to deposit correlated 3D cryo-imaging data from cryo-structured illumination microscopy and cryo-soft X-ray tomography with the BioStudies and EMPIAR deposition databases of the European Bioinformatics Institute. There is currently a real need for a robust method of data deposition to ensure unhindered access to and independent validation of correlative light and X-ray microscopy data to allow use in further comparative studies, educational activities, and data mining. For complete details on the use and execution of this protocol, please refer to Kounatidis et al. (2020).
DOI: http://dx.doi.org/10.1016/j.xpro.2020.100253
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7811169
March 2021

From ArrayExpress to BioStudies.

Nucleic Acids Res 2021 01;49(D1):D1502-D1506

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments have involved multiple technologies assaying different biological modalities, such as epigenetics and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.
DOI: http://dx.doi.org/10.1093/nar/gkaa1062
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778911
January 2021

Network integration and modelling of dynamic drug responses at multi-omics levels.

Commun Biol 2020 10 15;3(1):573. Epub 2020 Oct 15.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.

Uncovering cellular responses from heterogeneous genomic data is crucial for molecular medicine, in particular for drug safety. This can be realized by integrating the molecular activities in networks of interacting proteins. As proof-of-concept we challenge network modeling with time-resolved proteome, transcriptome and methylome measurements in iPSC-derived human 3D cardiac microtissues to elucidate adverse mechanisms of anthracycline cardiotoxicity measured with four different drugs (doxorubicin, epirubicin, idarubicin and daunorubicin). Dynamic molecular analysis at in vivo drug exposure levels reveals a network of 175 disease-associated proteins and identifies common modules of anthracycline cardiotoxicity in vitro, related to mitochondrial and sarcomere function as well as remodeling of extracellular matrix. These in vitro-identified modules are transferable and are evaluated with biopsies of cardiomyopathy patients. This study, to our knowledge the most comprehensive on anthracycline cardiotoxicity, demonstrates a reproducible workflow for molecular medicine and serves as a template for detecting adverse drug responses from complex omics data.
DOI: http://dx.doi.org/10.1038/s42003-020-01302-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7567116
October 2020

The EU-ToxRisk method documentation, data processing and chemical testing pipeline for the regulatory use of new approach methods.

Arch Toxicol 2020 07 6;94(7):2435-2461. Epub 2020 Jul 6.

Unit of Toxicology Sciences, Swedish Toxicology Sciences Research Center (Swetox), Karolinska Institutet, Forskargatan 20, 151 36, Södertälje, Sweden.

Hazard assessment, based on new approach methods (NAM), requires the use of batteries of assays, where individual tests may be contributed by different laboratories. A unified strategy for such collaborative testing is presented. It details all procedures required to allow test information to be usable for integrated hazard assessment, strategic project decisions and/or for regulatory purposes. The EU-ToxRisk project developed a strategy to provide regulatorily valid data, and exemplified this using a panel of > 20 assays (with > 50 individual endpoints), each exposed to 19 well-known test compounds (e.g. rotenone, colchicine, mercury, paracetamol, rifampicine, paraquat, taxol). Examples of strategy implementation are provided for all aspects required to ensure data validity: (i) documentation of test methods in a publicly accessible database; (ii) deposition of standard operating procedures (SOP) at the European Union DB-ALM repository; (iii) test readiness scoring according to defined criteria; (iv) disclosure of the pipeline for data processing; (v) link of uncertainty measures and metadata to the data; (vi) definition of test chemicals, their handling and their behavior in test media; (vii) specification of the test purpose and overall evaluation plans. Moreover, data generation was exemplified by providing results from 25 reporter assays. A complete evaluation of the entire test battery will be described elsewhere. A major learning from the retrospective analysis of this large testing project was the need for thorough definitions of the above strategy aspects, ideally in form of a study pre-registration, to allow adequate interpretation of the data and to ensure overall scientific/toxicological validity.
DOI: http://dx.doi.org/10.1007/s00204-020-02802-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367925
July 2020

A call for public archives for biological image data.

Nat Methods 2018 11;15(11):849-854

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK.

DOI: http://dx.doi.org/10.1038/s41592-018-0195-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884425
November 2018

ArrayExpress update - from bulk to single-cell expression data.

Nucleic Acids Res 2019 01;47(D1):D711-D715

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in experiments investigating single cells, rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool Annotare which, along with raw and processed data, collects all metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test for data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new more general archival resource the BioStudies Database has been developed, which will eventually supersede ArrayExpress. Data submissions will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies and the existing accession numbers and application programming interfaces will be maintained.
DOI: http://dx.doi.org/10.1093/nar/gky964
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323929
January 2019

Publisher Correction: Image Data Resource: a bioimage data integration and publication platform.

Nat Methods 2018 Nov;15(11):984

Centre for Gene Regulation and Expression, University of Dundee, Dundee, UK.

This paper was originally published under standard Nature America Inc. copyright. As of the date of this correction, the Resource is available online as an open-access paper with a CC-BY license. No other part of the paper has been changed.
DOI: http://dx.doi.org/10.1038/s41592-018-0169-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608298
November 2018

The BioStudies database-one stop shop for all data supporting a life sciences study.

Nucleic Acids Res 2018 01;46(D1):D1266-D1270

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK.

BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces. The actual data can be stored either in BioStudies or remotely, or both. BioStudies imports supplementary data from Europe PMC, and is a resource for authors and publishers for packaging data during the manuscript preparation process. It can also support the data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research means that studies result in a diversity of data outputs in multiple locations. BioStudies presents a solution to ensuring that all these data and the associated publication(s) can be found coherently in the longer term.
DOI: http://dx.doi.org/10.1093/nar/gkx965
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753238
January 2018

The Image Data Resource: A Bioimage Data Integration and Publication Platform.

Nat Methods 2017 Aug 19;14(8):775-781. Epub 2017 Jun 19.

Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, Dundee, Scotland, UK.

Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links data from several imaging modalities, including high-content screening, super-resolution and time-lapse microscopy, digital pathology, public genetic or chemical databases, and cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open source platform that others can use to publish their own image data. Thus IDR provides both a novel on-line resource and a software infrastructure that promotes and extends publication and re-analysis of scientific image data.
DOI: http://dx.doi.org/10.1038/nmeth.4326
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5536224
August 2017

Orchestrating differential data access for translational research: a pilot implementation.

BMC Med Inform Decis Mak 2017 03 23;17(1):30. Epub 2017 Mar 23.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.

Background: Translational researchers need robust IT solutions to access a range of data types, varying from public data sets to pseudonymised patient information with restricted access, provided on a case by case basis. The reason for this complication is that managing access policies to sensitive human data must consider issues of data confidentiality, identifiability, extent of consent, and data usage agreements. All these ethical, social and legal aspects must be incorporated into a differential management of restricted access to sensitive data.

Methods: In this paper we present a pilot system that uses several common open source software components in a novel combination to coordinate access to heterogeneous biomedical data repositories containing open data (open access) as well as sensitive data (restricted access) in the domain of biobanking and biosample research. Our approach is based on a digital identity federation and software to manage resource access entitlements.

Results: Open source software components were assembled and configured in such a way that they allow for different ways of restricted access according to the protection needs of the data. We have tested the resulting pilot infrastructure and assessed its performance, feasibility and reproducibility.

Conclusions: Common open source software components are sufficient to allow for the creation of a secure system for differential access to sensitive data. The implementation of this system is exemplary for researchers facing similar requirements for restricted access data. Here we report experience and lessons learnt of our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios.
DOI: http://dx.doi.org/10.1186/s12911-017-0424-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5363029
March 2017

The BioStudies database.

Mol Syst Biol 2015 Dec 23;11(12):847. Epub 2015 Dec 23.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton Cambridge, UK.

DOI: http://dx.doi.org/10.15252/msb.20156658
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4704487
December 2015

Cellular phenotype database: a repository for systems microscopy data.

Bioinformatics 2015 Aug 9;31(16):2736-40. Epub 2015 Apr 9.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

Motivation: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field.

Results: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms.

Availability And Implementation: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym.

Contact: [email protected] or [email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btv199
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528631
August 2015

diXa: a data infrastructure for chemical safety assessment.

Bioinformatics 2015 May 12;31(9):1505-7. Epub 2014 Dec 12.

Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, 6200 MD Maastricht, The Netherlands, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, 02215, MA, USA, European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SD, UK, Computational and Systems Medicine, Department of Surgery and Cancer, Imperial College London, South Kensington, London SW7 2AZ, UK, Department of Bioinformatics - BiGCaT, Maastricht University, 6200 MD Maastricht, The Netherlands, Genedata AG, CH-4053 Basel, Switzerland, Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Center of Physiology and Pathophysiology, Institute of Neurophysiology, University of Cologne, Cologne 50931, Germany and European Commission, Joint Research Centre, 21027 Ispra VA, Italy.

Motivation: The field of toxicogenomics (the application of '-omics' technologies to risk assessment of compound toxicities) has expanded in the last decade, partly driven by new legislation, aimed at reducing animal testing in chemical risk assessment but mainly as a result of a paradigm change in toxicology towards the use and integration of genome wide data. Many research groups worldwide have generated large amounts of such toxicogenomics data. However, there is no centralized repository for archiving and making these data and associated tools for their analysis easily available.

Results: The Data Infrastructure for Chemical Safety Assessment (diXa) is a robust and sustainable infrastructure storing toxicogenomics data. A central data warehouse is connected to a portal with links to chemical information and molecular and phenotype data. diXa is publicly available through a user-friendly web interface. New data can be readily deposited into diXa using guidelines and templates available online. Analysis descriptions and tools for interrogating the data are available via the diXa portal.

Availability And Implementation: http://www.dixa-fp7.eu

Contact: [email protected]; [email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btu827
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410652
May 2015

ArrayExpress update--simplifying data submissions.

Nucleic Acids Res 2015 Jan 31;43(Database issue):D1113-6. Epub 2014 Oct 31.

European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
DOI: http://dx.doi.org/10.1093/nar/gku1057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383899
January 2015

Updates to BioSamples database at European Bioinformatics Institute.

Nucleic Acids Res 2014 Jan 21;42(Database issue):D50-2. Epub 2013 Nov 21.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

The BioSamples database at the EBI (http://www.ebi.ac.uk/biosamples) provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI's databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.
DOI: http://dx.doi.org/10.1093/nar/gkt1081
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965081
January 2014

Biomarkers in autism spectrum disorder: the old and the new.

Psychopharmacology (Berl) 2014 Mar 6;231(6):1201-16. Epub 2013 Oct 6.

MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, De Crespigny Park, London, SE5 8AF, UK.

Rationale: Autism spectrum disorder (ASD) is a complex heterogeneous neurodevelopmental disorder with onset during early childhood and typically a life-long course. The majority of ASD cases stems from complex, 'multiple-hit', oligogenic/polygenic underpinnings involving several loci and possibly gene-environment interactions. These multiple layers of complexity spur interest into the identification of biomarkers able to define biologically homogeneous subgroups, predict autism risk prior to the onset of behavioural abnormalities, aid early diagnoses, predict the developmental trajectory of ASD children, predict response to treatment and identify children at risk for severe adverse reactions to psychoactive drugs.

Objectives: The present paper reviews (a) similarities and differences between the concepts of 'biomarker' and 'endophenotype', (b) established biomarkers and endophenotypes in autism research (biochemical, morphological, hormonal, immunological, neurophysiological and neuroanatomical, neuropsychological, behavioural), (c) -omics approaches towards the discovery of novel biomarker panels for ASD, (d) bioresource infrastructures and (e) data management for biomarker research in autism.

Results: Known biomarkers, such as abnormal blood levels of serotonin, oxytocin, melatonin, immune cytokines and lymphocyte subtypes, multiple neuropsychological, electrophysiological and brain imaging parameters, will eventually merge with novel biomarkers identified using unbiased genomic, epigenomic, transcriptomic, proteomic and metabolomic methods, to generate multimarker panels. Bioresource infrastructures, data management and data analysis using artificial intelligence networks will be instrumental in supporting efforts to identify these biomarker panels.

Conclusions: Biomarker research has great heuristic potential in targeting autism diagnosis and treatment.
DOI: http://dx.doi.org/10.1007/s00213-013-3290-7
March 2014

ArrayExpress update--trends in database growth and links to data analysis tools.

Nucleic Acids Res 2013 Jan 27;41(Database issue):D987-90. Epub 2012 Nov 27.

Functional Genomics Team, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
DOI: http://dx.doi.org/10.1093/nar/gks1174
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531147
January 2013

graph2tab, a library to convert experimental workflow graphs into tabular formats.

Bioinformatics 2012 Jun 3;28(12):1665-7. Epub 2012 May 3.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

Motivations: Spreadsheet-like tabular formats are ever more popular in the biomedical field as a means of experimental reporting. The problem of converting the graph of an experimental workflow into a table-based representation occurs in many such formats and is not easy to solve.

Results: We describe graph2tab, a library that implements methods to realise such a conversion in a size-optimised way. Our solution is generic and can be adapted to specific cases of data exporters or data converters that need to be implemented.

Availability And Implementation: The library source code and documentation are available at http://github.com/ISA-tools/graph2tab.
DOI: http://dx.doi.org/10.1093/bioinformatics/bts258
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371871
June 2012
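
The graph-to-table problem described in the graph2tab abstract above can be illustrated with a small sketch. This is not the library's algorithm or API: it is a naive approach, assumed here purely for illustration, that writes one table row for every source-to-sink path of a directed acyclic workflow graph. The function name `paths_to_rows` and the sample workflow are hypothetical.

```python
from typing import Dict, List


def paths_to_rows(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return one row per source-to-sink path of a directed acyclic graph.

    `graph` maps each node to the list of its successors. This is a naive
    enumeration for illustration only, not a size-optimised conversion.
    """
    # Sources are nodes that never appear as the successor of another node.
    targets = {t for successors in graph.values() for t in successors}
    nodes = set(graph) | targets
    sources = sorted(nodes - targets)

    rows: List[List[str]] = []

    def walk(node: str, prefix: List[str]) -> None:
        path = prefix + [node]
        successors = graph.get(node, [])
        if not successors:
            # Sink reached: the accumulated path becomes one table row.
            rows.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for src in sources:
        walk(src, [])
    return rows


# Hypothetical workflow: two samples pooled into one extract, assayed twice.
workflow = {
    "sample1": ["extract1"],
    "sample2": ["extract1"],
    "extract1": ["assay1", "assay2"],
}

for row in paths_to_rows(workflow):
    print("\t".join(row))
```

On this hypothetical workflow the naive enumeration yields four rows, although two rows (sample1, extract1, assay1 and sample2, extract1, assay2) would already cover every edge. Reducing the row count in that way is presumably what a size-optimised conversion aims for, and this sketch does not attempt it.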

The BioSample Database (BioSD) at the European Bioinformatics Institute.

Nucleic Acids Res 2012 Jan 16;40(Database issue):D64-70. Epub 2011 Nov 16.

EMBL-EBI, the European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

The BioSample Database (http://www.ebi.ac.uk/biosamples) is a new database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and (iii) supporting cross database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to [email protected]
DOI: http://dx.doi.org/10.1093/nar/gkr937
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245134
January 2012

A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection.

PLoS Genet 2011 Sep 8;7(9):e1002270. Epub 2011 Sep 8.

Department of Statistics, University of Oxford, Oxford, United Kingdom.

We have performed a metabolite quantitative trait locus (mQTL) study of the ¹H nuclear magnetic resonance spectroscopy (¹H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by ¹H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10⁻¹¹
DOI: http://dx.doi.org/10.1371/journal.pgen.1002270
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3169529
September 2011

Human metabolic profiles are stably controlled by genetic and environmental variation.

Mol Syst Biol 2011 Aug 30;7:525. Epub 2011 Aug 30.

Department of Statistics, University of Oxford, Oxford, UK.

¹H Nuclear Magnetic Resonance spectroscopy (¹H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired ¹H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in ¹H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect ¹H NMR-based biomarkers quantifying predisposition to disease.
DOI: http://dx.doi.org/10.1038/msb.2011.57
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3202796
August 2011

ArrayExpress update--an archive of microarray and high-throughput sequencing-based functional genomics experiments.

Nucleic Acids Res 2011 Jan 10;39(Database issue):D1002-4. Epub 2010 Nov 10.

Functional Genomics Team, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.

The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.

Source
DOI: http://dx.doi.org/10.1093/nar/gkq1040
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013660
January 2011

A System for Information Management in BioMedical Studies--SIMBioMS.

Bioinformatics 2009 Oct 24;25(20):2768-9. Epub 2009 Jul 24.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented.

Availability: The source code, documentation and initialization scripts are available at http://simbioms.org.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btp420
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759553
October 2009

ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression.

Nucleic Acids Res 2009 Jan 10;37(Database issue):D868-72. Epub 2008 Nov 10.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository--a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse--a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas--a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200,000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra-high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.

Source
DOI: http://dx.doi.org/10.1093/nar/gkn889
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686529
January 2009

Data storage and analysis in ArrayExpress and Expression Profiler.

Curr Protoc Bioinformatics 2008 Sep;Chapter 7:Unit 7.13

European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

ArrayExpress at the European Bioinformatics Institute is a public database for MIAME-compliant microarray and transcriptomics data. It consists of two parts: the ArrayExpress Repository, which is a public archive of microarray data, and the ArrayExpress Warehouse of Gene Expression Profiles, which contains additionally curated subsets of data from the Repository. Archived experiments can be queried by experimental attributes, such as keywords, species, array platform, publication details, or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and the resulting profiles can be visualized. The data can be exported and analyzed using the online data analysis tool Expression Profiler. Data analysis components, such as data preprocessing, filtering, differentially expressed gene finding, clustering methods, and ordination-based techniques, as well as other statistical tools, are all available in Expression Profiler via integration with the statistical package R.

Source
DOI: http://dx.doi.org/10.1002/0471250953.bi0713s23
September 2008

The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics.

Nat Biotechnol 2007 Oct;25(10):1127-33

School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.

The Functional Genomics Experiment data model (FuGE) has been developed to facilitate convergence of data standards for high-throughput, comprehensive analyses in biology. FuGE models the components of an experimental activity that are common across different technologies, including protocols, samples and data. FuGE provides a foundation for describing entire laboratory workflows and for the development of new data formats. The Microarray Gene Expression Data society and the Proteomics Standards Initiative have committed to using FuGE as the basis for defining their respective standards, and other standards groups, including the Metabolomics Standards Initiative, are evaluating FuGE in their development efforts. Adoption of FuGE by multiple standards bodies will enable uniform reporting of common parts of functional genomics workflows, simplify data-integration efforts and ease the burden on researchers seeking to fulfill multiple minimum reporting requirements. Such advances are important for transparent data management and mining in functional genomics and systems biology.

Source
DOI: http://dx.doi.org/10.1038/nbt1347
October 2007