Publications by authors named "Erwin Winder"

2 Publications

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Bioinformatics 2016 07 21;32(14):2176-83. Epub 2016 Mar 21.

Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands; Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.

Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect auto-generated 27% of the algorithms perfectly, and an additional 46% needed only minor editing, reducing the human effort and expertise needed to pool data.
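
The three kinds of transformation named above (unit conversion, categorical value matching and a BMI calculation) can be illustrated with a minimal Python sketch. This is not code produced by MOLGENIS/connect and does not reflect its algorithm format; the attribute names (gender_code, height_cm, weight_kg), the category mapping and all function names are assumptions made for illustration only.

```python
# Hypothetical illustration; not output of MOLGENIS/connect.
# All attribute names and category codes below are assumed.

def cm_to_m(height_cm: float) -> float:
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


# Categorical value matching: assumed source codes mapped to target categories.
SEX_SOURCE_TO_TARGET = {"1": "male", "2": "female", "M": "male", "F": "female"}


def match_category(source_value, mapping):
    """Return the target category, or None when the source value has no match."""
    return mapping.get(str(source_value).strip())


def bmi(weight_kg: float, height_cm: float) -> float:
    """Complex conversion: BMI = weight (kg) / height (m) squared."""
    height_m = cm_to_m(height_cm)
    return weight_kg / (height_m ** 2)


def transform(source_row: dict) -> dict:
    """Map one source record onto the assumed target attributes."""
    return {
        "sex": match_category(source_row["gender_code"], SEX_SOURCE_TO_TARGET),
        "height_m": cm_to_m(source_row["height_cm"]),
        "bmi": round(bmi(source_row["weight_kg"], source_row["height_cm"]), 1),
    }


if __name__ == "__main__":
    print(transform({"gender_code": "2", "height_cm": 180, "weight_kg": 81}))
```

Returning None for an unmatched category keeps unmapped values visible, so that a human curator can review them rather than have them silently coerced.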

Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect

Contact: m.a.swertz@rug.nl

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btw155
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937195
July 2016

Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration.

BMC Res Notes 2014 Dec 11;7:901. Epub 2014 Dec 11.

University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands.

Background: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference.

Findings: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as the Java 'Genotype-IO' API. All software is open source under the LGPLv3 license and available from http://www.molgenis.org/systemsgenetics.
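
To show why A/T and G/C SNPs are called ambiguous, the following Python sketch classifies a SNP by its alleles and aligns the unambiguous ones to a reference by complementing them. It is an illustration under stated assumptions, not GH's implementation and not the Genotype-IO API: all function names are hypothetical, and the linkage disequilibrium step that GH applies to ambiguous SNPs is deliberately left out.

```python
# Hypothetical illustration; not the Genotype Harmonizer implementation.
# Ambiguous SNPs are only detected here; resolving them needs LD information.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele1: str, allele2: str) -> bool:
    """True for A/T and G/C SNPs: the allele pair reads the same on both strands."""
    return COMPLEMENT[allele1] == allele2


def flip(alleles):
    """Return the alleles as read from the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def align_to_reference(study, reference):
    """Return (alleles, status) for one SNP given study and reference allele pairs."""
    if is_ambiguous(*study):
        # Allele names alone cannot reveal the strand for these SNPs.
        return study, "ambiguous"
    if set(study) == set(reference):
        return study, "same_strand"
    if set(flip(study)) == set(reference):
        return flip(study), "flipped"
    return study, "mismatch"


if __name__ == "__main__":
    print(align_to_reference(("A", "G"), ("T", "C")))
    print(align_to_reference(("A", "T"), ("A", "T")))
```

For an A/G SNP a strand swap turns the alleles into T/C, which is detectable by comparison with the reference. For an A/T SNP a strand swap yields T/A, the same pair of letters, which is why additional information such as linkage disequilibrium with neighbouring SNPs is needed.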

Conclusions: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.

Source
DOI: http://dx.doi.org/10.1186/1756-0500-7-901
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4307387
December 2014