Publications by authors named "Michael L Heuer"

6 Publications

The impact of Docker containers on the performance of genomic pipelines.

PeerJ 2015 Sep 24;3:e1273. Epub 2015 Sep 24.

Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Genomic pipelines consist of several pieces of third-party software and, because of their experimental nature, frequent changes and updates are commonly necessary, thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. The question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.
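The observation that the impact becomes negligible for long jobs is consistent with a simple model in which each containerised job pays a roughly constant extra cost (for example, container start-up) on top of its native runtime. The Python sketch below assumes that model; the function name, the 1-second overhead and the job lengths are hypothetical values chosen for illustration, not measurements from the paper.

```python
# Hypothetical model: every containerised job pays a fixed extra cost
# (e.g. container start-up) on top of its native runtime.

def slowdown(job_seconds: float, overhead_seconds: float) -> float:
    """Ratio of containerised runtime to native runtime for one job."""
    return (job_seconds + overhead_seconds) / job_seconds


if __name__ == "__main__":
    OVERHEAD = 1.0  # hypothetical fixed overhead per job, in seconds
    for label, seconds in [("10 s", 10), ("10 min", 600), ("10 h", 36_000)]:
        pct = (slowdown(seconds, OVERHEAD) - 1) * 100
        print(f"{label:>6} job: {pct:.3f}% slower")
```

Under this assumed model, the same fixed cost inflates a 10-second job by about 10%, a 10-minute job by under 0.2%, and a 10-hour job by only a few thousandths of a percent.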

Source
DOI: http://dx.doi.org/10.7717/peerj.1273
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4586803
September 2015

Minimum information for reporting next generation sequence genotyping (MIRING): Guidelines for reporting HLA and KIR genotyping via next generation sequencing.

Hum Immunol 2015 Dec 25;76(12):954-62. Epub 2015 Sep 25.

National Marrow Donor Program, Minneapolis, MN, USA.

The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high-quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information (message annotation, reference context, full genotype, consensus sequence and novel polymorphism) and references to three categories of accessory information (NGS platform documentation, read processing documentation and primary data). These eight categories of information ensure the long-term portability and broad application of these NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.

Source
DOI: http://dx.doi.org/10.1016/j.humimm.2015.09.011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674382
December 2015

BioJava: an open-source framework for bioinformatics in 2012.

Bioinformatics 2012 Oct 9;28(20):2693-5. Epub 2012 Aug 9.

San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.

Summary: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality.

Results: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model.

Availability: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts494
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3467744
October 2012

The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data.

Cancer Discov 2012 May;2(5):401-4

Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.

The cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access resource for interactive exploration of multidimensional cancer genomics data sets, currently providing access to data from more than 5,000 tumor samples from 20 cancer studies. The cBio Cancer Genomics Portal significantly lowers the barriers between complex genomic data and cancer researchers who want rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects and empowers researchers to translate these rich data sets into biologic insights and clinical applications.

Source
DOI: http://dx.doi.org/10.1158/2159-8290.CD-12-0095
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3956037
May 2012

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.

Nucleic Acids Res 2010 Apr 16;38(6):1767-71. Epub 2009 Dec 16.

Plant Pathology, SCRI, Invergowrie, Dundee DD2 5DA, UK.

FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. As this is an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
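The variants differ in ASCII offset and in score scale: the Sanger variant stores PHRED scores with an offset of 33, the Illumina 1.3+ variant stores PHRED scores with an offset of 64, and the older Solexa variant stores Solexa scores with an offset of 64. A Solexa score converts to PHRED as Q_PHRED = 10 log10(10^(Q_Solexa/10) + 1). The Python sketch below decodes a quality line and re-encodes it in the Sanger variant. The function names are hypothetical, the variant keys follow the naming used by the OBF projects but should be treated as an assumption, and the code does not validate out-of-range characters.

```python
import math

# ASCII offsets per FASTQ variant (keys are assumed names for illustration).
OFFSETS = {"fastq-sanger": 33, "fastq-solexa": 64, "fastq-illumina": 64}


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa score to the nearest integer PHRED score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def phred_qualities(qual: str, variant: str) -> list[int]:
    """Decode a FASTQ quality line into PHRED scores."""
    offset = OFFSETS[variant]
    scores = [ord(ch) - offset for ch in qual]
    if variant == "fastq-solexa":
        # Solexa scores use a different scale, so convert them to PHRED.
        scores = [solexa_to_phred(q) for q in scores]
    return scores


def to_sanger(qual: str, variant: str) -> str:
    """Re-encode a quality line as Sanger FASTQ (PHRED + 33, capped at 93)."""
    return "".join(chr(min(q, 93) + 33) for q in phred_qualities(qual, variant))
```

For example, the Illumina 1.3+ quality line "hh" and the Sanger line "II" both decode to PHRED 40 at each position. Solexa and PHRED scores nearly coincide at high quality but diverge at the low end, which is why the Solexa branch converts rather than just shifting the offset.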

Source
DOI: http://dx.doi.org/10.1093/nar/gkp1137
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217
April 2010

Databases and information integration for the Medicago truncatula genome and transcriptome.

Plant Physiol 2005 May;138(1):38-46

Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA.

An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de).

Source
DOI: http://dx.doi.org/10.1104/pp.104.059204
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1104158
May 2005