Publications by authors named "Susanna-Assunta Sansone"

108 Publications

ISA API: An open platform for interoperable life science experimental metadata.

Gigascience 2021 Sep;10(9)

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK.

Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed.

Results: In this work, we present the ISA API, a Python library for creating, editing, parsing, and validating the ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community.

Conclusions: The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the two ISA formats and other life science data formats required for depositing data in public databases.
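
The short sketch below illustrates the kind of programmatic metadata handling described above, assuming the isatools package from PyPI; the class and function names follow its documented model and serialization modules, but the exact calls are not taken from the paper and may vary between releases.

    # Minimal sketch of ISA API usage, assuming the `isatools` package
    # (pip install isatools); names follow its documented modules but may
    # differ between releases.
    import json

    from isatools import isatab
    from isatools.isajson import ISAJSONEncoder
    from isatools.model import Investigation, Person, Study

    investigation = Investigation(identifier="i1", title="Example investigation")
    study = Study(identifier="s1", title="Example study", filename="s_study.txt")
    study.contacts.append(Person(first_name="Jane", last_name="Doe"))
    investigation.studies.append(study)

    # One in-memory model, two serializations: ISA-Tab and ISA-JSON.
    print(isatab.dumps(investigation))
    print(json.dumps(investigation, cls=ISAJSONEncoder, indent=2))

Writing the same in-memory objects out in both formats is the interoperability point made in the Conclusions.
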
DOI: http://dx.doi.org/10.1093/gigascience/giab060
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8444265
September 2021

Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group.

Wellcome Open Res 2020;5:267. Epub 2021 May 26.

Laurentian University, Ontario, P3E 2C6, Canada.

The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for clinicians, researchers, policy- and decision-makers, funders, publishers, public health experts, disaster preparedness and response experts, infrastructure providers, and other potential users, written from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes emerge from this document, such as the need to balance the creation of data adherent to the FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.
DOI: http://dx.doi.org/10.12688/wellcomeopenres.16378.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7808050.2
May 2021

Barely sufficient practices in scientific computing.

Patterns (N Y) 2021 Feb 12;2(2):100206. Epub 2021 Feb 12.

Department of Computer Science, University of Oxford, Oxford, UK.

The importance of software to modern research is well understood, as is the way in which software developed for research can support or undermine important research principles of findability, accessibility, interoperability, and reusability (FAIR). We propose a minimal subset of common software engineering principles that enable FAIRness of computational research and can be used as a baseline for software engineering in any research discipline.
DOI: http://dx.doi.org/10.1016/j.patter.2021.100206
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7892476
February 2021

Community standards for open cell migration data.

Gigascience 2020 May;9(5)

VIB-UGent Center for Medical Biotechnology, VIB, A. Baertsoenkaai 3, B-9000, Ghent, Belgium.

Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited, owing to the diversity of experimental protocols and non-standardized output formats. In addition, the datasets are typically not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. Here we present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open, community-driven organization that facilitates the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
DOI: http://dx.doi.org/10.1093/gigascience/giaa041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7317087
May 2020

Enabling reusability of plant phenomic datasets with MIAPPE 1.1.

New Phytol 2020 Jul 25;227(1):260-273. Epub 2020 Apr 25.

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK.

Enabling data reuse and knowledge discovery is increasingly critical in modern science and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage (to support perennial plants), in structure (through an explicit data model), and in clarity (through definitions and examples). We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, demonstrating its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development and will be a key part of ensuring adoption of the standard.
DOI: http://dx.doi.org/10.1111/nph.16544
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7317793
July 2020

Semantic concept schema of the linear mixed model of experimental observations.

Sci Data 2020 Feb 27;7(1):70. Epub 2020 Feb 27.

Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland.

In the information age, smart data modelling and data management can be carried out to address the wealth of data produced in scientific experiments. In this paper, we propose a semantic model for the statistical analysis of datasets by linear mixed models. We tie together disparate statistical concepts in an interdisciplinary context through the application of ontologies, in particular the Statistics Ontology (STATO), to produce FAIR data summaries. We hope to improve the general understanding of statistical modelling and thus contribute to a better description of the statistical conclusions from data analysis, allowing their efficient exploration and automated processing.
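
For readers unfamiliar with the model being described semantically, the standard matrix form of a linear mixed model (a textbook formulation, not reproduced from the paper) is:

    \mathbf{y} = X\boldsymbol{\beta} + Z\mathbf{u} + \boldsymbol{\varepsilon},
    \qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0},\, G),
    \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0},\, R)

where y is the vector of observations, X and Z are the design matrices for the fixed and random effects, beta and u are the fixed- and random-effect vectors, and G and R are covariance matrices. Fixed effects, random effects, and variance components of this kind are the sort of statistical concepts that an ontology-based description, such as one using STATO, would need to capture.
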
DOI: http://dx.doi.org/10.1038/s41597-020-0409-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7046786
February 2020

TeSS: a platform for discovering life-science training opportunities.

Bioinformatics 2020 May;36(10):3290-3291

Department of Computer Science, The University of Manchester, Manchester M13 9PL, UK.

Summary: Dispersed across the Internet is an abundance of disparate, disconnected training information, making it hard for researchers to find training opportunities that are relevant to them. To address this issue, we have developed a new platform, TeSS, which aggregates geographically distributed information and presents it in a central, feature-rich portal. Data are gathered automatically from content providers via bespoke scripts. These resources are cross-linked with related data and tools registries, and made available via a search interface, a data API, and widgets.

Availability And Implementation: https://tess.elixir-europe.org.
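
As an illustration of the data API mentioned above, a client query might look like the sketch below; the endpoint path and response fields are assumptions for illustration and should be checked against the portal's current documentation.

    # Illustrative client for a training-registry JSON API such as the one
    # TeSS is described as exposing; the endpoint path and response fields
    # are assumptions, not taken from the paper.
    import requests

    resp = requests.get(
        "https://tess.elixir-europe.org/materials.json",  # assumed endpoint
        params={"q": "metadata standards"},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json():  # assumed to be a list of training materials
        print(item.get("title"), item.get("url"))
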
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa047
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214044
May 2020

Experiment design driven FAIRification of omics data matrices, an exemplar.

Sci Data 2019 Dec 12;6(1):271. Epub 2019 Dec 12.

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, United Kingdom.

DOI: http://dx.doi.org/10.1038/s41597-019-0286-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6908569
December 2019

FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources.

Cell Syst 2019 Nov 30;9(5):417-421. Epub 2019 Oct 30.

Department of Population Health and Reproduction, School of Veterinary Medicine, UC Davis, Davis, CA 95616, USA.

As more digital resources are produced by the research community, it is becoming increasingly important to harmonize and organize them for synergistic utilization. The findable, accessible, interoperable, and reusable (FAIR) guiding principles have prompted many stakeholders to consider strategies for tackling this challenge. The FAIRshake toolkit was developed to enable the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments. FAIR assessments are visualized as an insignia that can be embedded within the websites that host digital resources. Using FAIRshake, a variety of biomedical digital resources were manually and automatically evaluated for their level of FAIRness.
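
As a toy illustration of the idea of rubric-based scoring only (the metric names and aggregation below are invented for this sketch and are not FAIRshake's actual rubrics, API, or insignia logic):

    # Toy rubric scoring: each metric is answered in [0, 1] and the rubric
    # score is the mean over answered metrics. Invented for illustration;
    # not FAIRshake's metrics or scoring rules.
    from statistics import mean

    rubric = ["persistent_identifier", "explicit_licence", "machine_readable_metadata"]
    assessment = {
        "persistent_identifier": 1.0,
        "explicit_licence": 0.0,
        "machine_readable_metadata": 0.5,
    }

    score = mean(assessment[metric] for metric in rubric if metric in assessment)
    print(f"FAIRness score: {score:.2f}")  # 0.50 for this toy assessment
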
DOI: http://dx.doi.org/10.1016/j.cels.2019.09.011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7316196
November 2019

Author Correction: Evaluating FAIR maturity through a scalable, automated, community-governed framework.

Sci Data 2019 Oct 21;6(1):230. Epub 2019 Oct 21.

GO FAIR International Support and Coordination Office, Leiden, The Netherlands.

An amendment to this paper has been published and can be accessed via a link at the top of the paper.
DOI: http://dx.doi.org/10.1038/s41597-019-0248-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6803632
October 2019

Evaluating FAIR maturity through a scalable, automated, community-governed framework.

Sci Data 2019 Sep 20;6(1):174. Epub 2019 Sep 20.

GO FAIR International Support and Coordination Office, Leiden, The Netherlands.

Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain-relevant, community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators, community-authored specifications that delimit a specific automatically measurable FAIR behavior; (2) Compliance Tests, small Web apps that test digital resources against individual Maturity Indicators; and (3) the Evaluator, a Web application that registers, assembles, and applies community-relevant sets of Compliance Tests against a digital resource, and provides a detailed report about what a machine "sees" when it visits that resource. We discuss the technical and social considerations of FAIR assessments, and how this translates to our community-driven infrastructure. We then illustrate how the output of the Evaluator tool can serve as a roadmap to assist data stewards to incrementally and realistically improve the FAIRness of their resources.
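
The sketch below gives a rough sense of what a single compliance-style check can involve; it is not the framework's actual Compliance Test code or any published Maturity Indicator, just an assumed example testing whether an identifier resolves and offers machine-readable metadata.

    # Rough sketch of one compliance-style check (not the framework's code):
    # does a resource identifier resolve over HTTP, and does content
    # negotiation yield machine-readable (JSON-LD) metadata?
    import requests

    def check_resolves_with_metadata(identifier_url: str) -> dict:
        result = {"resolves": False, "machine_readable": False}
        try:
            resp = requests.get(
                identifier_url,
                headers={"Accept": "application/ld+json"},
                timeout=30,
                allow_redirects=True,
            )
            result["resolves"] = resp.status_code == 200
            result["machine_readable"] = "json" in resp.headers.get("Content-Type", "")
        except requests.RequestException:
            pass
        return result

    print(check_resolves_with_metadata("https://doi.org/10.1038/s41597-019-0184-5"))
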
DOI: http://dx.doi.org/10.1038/s41597-019-0184-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6754447
September 2019

Interoperable and scalable data analysis with microservices: applications in metabolomics.

Bioinformatics 2019 Oct;35(19):3752-3760

CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB, Gif-sur-Yvette, France.

Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We show that the method scales dynamically with increasing availability of computational resources, and we demonstrate that it facilitates interoperability by integrating the major software suites into a turn-key workflow encompassing all steps of mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens the way to new types of large-scale integrative science.
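
The Results describe tools packaged as Docker containers and executed via Kubernetes; the sketch below shows the underlying pattern of submitting one containerized step as a Kubernetes Job using the official Python client. The image name and command are placeholders, and in the VRE itself this orchestration is handled by the workflow layer rather than by hand-written calls.

    # Pattern sketch only: run one containerized tool as a Kubernetes Job via
    # the official Python client. Image and command are placeholders, not
    # tools from the paper.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() in-cluster

    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="example-metabolomics-step"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="tool",
                        image="registry.example.org/metabolomics-tool:latest",  # placeholder
                        command=["tool", "--in", "/data/input.mzML"],
                    )],
                )
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
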

Availability And Implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.

Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btz160
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761976
October 2019

PhenoMeNal: processing and analysis of metabolomics data in the cloud.

Gigascience 2019 Feb;8(2)

Leibniz Institute of Plant Biochemistry, Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany.

Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.

Findings: PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open-source tools that are tested and packaged as Docker containers through the project's continuous integration process and deployed on a Kubernetes orchestration framework. It also provides a number of standardized, automated, and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi, and Pachyderm.

Conclusions: PhenoMeNal constitutes a keystone solution among the cloud e-infrastructures available for metabolomics. It provides a complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public or private cloud environment. By harmonizing and automating software installation and configuration, and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible, and shareable metabolomics data analysis platforms that are interfaced through standard data formats, ship with representative datasets, are versioned, and have been tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adaptation of the infrastructure to other application areas and 'omics research domains.
DOI: http://dx.doi.org/10.1093/gigascience/giy149
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6377398
February 2019

A design framework and exemplar metrics for FAIRness.

Sci Data 2018 Jun 26;5:180118. Epub 2018 Jun 26.

Institute of Data Science, Maastricht University, Maastricht, The Netherlands.

DOI: http://dx.doi.org/10.1038/sdata.2018.118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6018520
June 2018

High-quality science requires high-quality open data infrastructure.

Sci Data 2018 Feb 27;5:180027. Epub 2018 Feb 27.

STFC Rutherford Appleton Laboratory, Scientific Computing, Harwell Campus, Didcot OX11 0QX, UK.

Resources for data management, discovery and (re)use are numerous and diverse; more specifically, we need data resources that enable the FAIR principles of Findability, Accessibility, Interoperability and Reusability of data.
DOI: http://dx.doi.org/10.1038/sdata.2018.27
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827691
February 2018

DataMed - an open source discovery index for finding biomedical datasets.

J Am Med Inform Assoc 2018 Mar;25(3):300-308

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials And Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results And Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
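
For readers unfamiliar with the retrieval metrics quoted, precision at 10 is computed as in the generic sketch below (standard information-retrieval bookkeeping, not the bioCADDIE benchmark code).

    # Generic precision-at-k (here k=10): the fraction of the top-k ranked
    # results that are relevant. Standard IR bookkeeping, not the benchmark code.
    def precision_at_k(ranked_ids, relevant_ids, k=10):
        top_k = ranked_ids[:k]
        hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
        return hits / k

    ranked = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6", "d5", "d10", "d11"]
    relevant = {"d1", "d2", "d4", "d5", "d7", "d9"}
    print(precision_at_k(ranked, relevant))  # 0.6: six of the top ten are relevant
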
DOI: http://dx.doi.org/10.1093/jamia/ocx121
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7378878
March 2018

Data discovery with DATS: exemplar adoptions and lessons learned.

J Am Med Inform Assoc 2018 Jan;25(1):13-16

Oxford e-Research Centre, Engineering Science, University of Oxford, Oxford, UK.

The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a "PubMed for datasets." The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS's entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.
DOI: http://dx.doi.org/10.1093/jamia/ocx119
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481379
January 2018

The future of metabolomics in ELIXIR.

F1000Res 2017;6. Epub 2017 Sep 6.

Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, L-4367, Luxembourg.

Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established as the critical area where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the participants discussed the existing four ELIXIR Use Cases from which the metabolomics community, both industry and academia, would benefit most, and how these could be mapped exhaustively onto the current five ELIXIR Platforms. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.
DOI: http://dx.doi.org/10.12688/f1000research.12342.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5627583
September 2017

Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action.

F1000Res 2017;6. Epub 2017 Jul 3.

ELIXIR-UK, Software Sustainability Institute UK, School of Computer Science, University of Manchester, Manchester, M13 9PL, UK.

Quality training in computational skills for life scientists is essential to allow them to deliver robust, reproducible and cutting-edge research. A pan-European bioinformatics programme, ELIXIR, has adopted a well-established and progressive programme of computational lab and data skills training from Software and Data Carpentry, aimed at increasing the number of skilled life scientists and building a sustainable training community in this field. This article describes the Pilot action, which introduced the Carpentry training model to the ELIXIR community.
DOI: http://dx.doi.org/10.12688/f1000research.11718.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5516217
July 2017

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

PLoS Biol 2017 Jun 29;15(6):e2001414. Epub 2017 Jun 29.

Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
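
As a small, concrete illustration of resolvability, one of the identifier qualities discussed, the sketch below resolves a compact identifier (CURIE) through identifiers.org and reports where it lands; the resolver pattern follows identifiers.org's published scheme, and the example CURIE is arbitrary.

    # Resolve a compact identifier (prefix:accession) via identifiers.org and
    # report the landing URL; the example CURIE is arbitrary.
    import requests

    def resolve_curie(curie: str) -> str:
        resp = requests.get(f"https://identifiers.org/{curie}",
                            timeout=30, allow_redirects=True)
        resp.raise_for_status()
        return resp.url  # where the identifier currently resolves to

    print(resolve_curie("taxonomy:9606"))  # NCBI Taxonomy entry for Homo sapiens
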
DOI: http://dx.doi.org/10.1371/journal.pbio.2001414
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5490878
June 2017

DATS, the data tag suite to enable discoverability of datasets.

Sci Data 2017 Jun 6;4:170059. Epub 2017 Jun 6.

University of California San Diego, 9500 Gilman Dr, La Jolla, California 92093, USA.

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
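
As a purely illustrative sketch of dataset-level metadata in the spirit of DATS's generic core elements (field names here are paraphrased for illustration; this is not a validated DATS or schema.org instance):

    # Illustrative dataset description loosely modelled on DATS's generic core
    # elements; field names are paraphrased and this is not a validated instance.
    import json

    dataset = {
        "identifier": {"identifier": "doi:10.0000/example", "identifierSource": "DOI"},
        "title": "Example transcription profiling dataset",
        "description": "Minimal illustration of dataset-level metadata.",
        "creators": [{"firstName": "Jane", "lastName": "Doe"}],
        "types": [{"value": "transcription profiling"}],
        "distributions": [{"access": {"landingPage": "https://repository.example.org/ds1"}}],
    }

    print(json.dumps(dataset, indent=2))
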
DOI: http://dx.doi.org/10.1038/sdata.2017.59
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5460592
June 2017

Measures for interoperability of phenotypic data: minimum information requirements and formatting.

Plant Methods 2016;12:44. Epub 2016 Nov 9.

Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479 Poznań, Poland.

Background: Plant phenotypic data holds a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse.

Results: In this paper we propose guidelines for the proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised practically within a dataset. We provide examples of ISA-Tab-formatted phenotypic data and a general description of a few systems where the recommendations have been implemented.

Conclusions: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.
DOI: http://dx.doi.org/10.1186/s13007-016-0144-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103589
November 2016