StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

PeerJ 2017 18;5:e3353. Epub 2017 May 18.

Department of Bioinformatics, University of Tartu, Tartu, Estonia.

Background: Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees.

Results: A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm that compares the observed and expected fractions of node-specific k-mers to test for the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and to assign it to a clade, whereas other tools assign each read to a reference genome. Using a dataset of 100 isolates, we demonstrate that StrainSeeker can predict clades with 92% accuracy and assign the correct tree branch with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain.
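The idea of node-specific k-mers and a per-node presence test can be illustrated with a minimal Python sketch. This is not StrainSeeker's implementation: the `Node` class, the function names, the toy genomes and the fixed `min_fraction` threshold are all assumptions made for illustration. In particular, the abstract does not say how the expected fraction is computed, so a constant cutoff stands in for that comparison here. A k-mer is treated as specific to a node if it occurs in every genome below the node and in no genome outside it.

```python
from typing import List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class Node:
    """Guide-tree node (hypothetical structure, not StrainSeeker's own)."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None,
                 genome: str = "") -> None:
        self.name = name
        self.children = children or []
        self.genome = genome              # only meaningful for leaves
        self.specific: Set[str] = set()   # filled in by annotate()

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def annotate(root: Node, k: int) -> None:
    """Assign node-specific k-mers: shared by every leaf under the node,
    absent from every leaf outside it."""
    leaf_sets = {leaf.name: kmers(leaf.genome, k) for leaf in root.leaves()}
    stack = [root]
    while stack:
        node = stack.pop()
        inside = {leaf.name for leaf in node.leaves()}
        shared = set.intersection(*(leaf_sets[n] for n in inside))
        outside = set().union(*(s for n, s in leaf_sets.items() if n not in inside))
        node.specific = shared - outside
        stack.extend(node.children)


def observed_fraction(node: Node, sample: Set[str]) -> float:
    """Fraction of the node's specific k-mers that were seen in the sample."""
    if not node.specific:
        return 0.0
    return len(node.specific & sample) / len(node.specific)


def identify(root: Node, reads: List[str], k: int,
             min_fraction: float = 0.5) -> Optional[str]:
    """Walk down the tree while some child passes the presence test.
    min_fraction is an assumed stand-in for the expected fraction."""
    sample: Set[str] = set()
    for read in reads:
        sample |= kmers(read, k)
    if observed_fraction(root, sample) < min_fraction:
        return None                       # sample does not fit the tree at all
    node = root
    while node.children:
        best = max(node.children, key=lambda c: observed_fraction(c, sample))
        if observed_fraction(best, sample) < min_fraction:
            break                         # isolate branches off at this node
        node = best
    return node.name


# Toy guide tree ((A,B)AB,C)root with made-up genomes.
CORE, CLADE = "ACGTACGT", "GGGGCCCC"
leaf_a = Node("A", genome=CORE + CLADE + "AAAAAAAA")
leaf_b = Node("B", genome=CORE + CLADE + "TTTTTTTT")
leaf_c = Node("C", genome=CORE + "CACACACA")
tree = Node("root", [Node("AB", [leaf_a, leaf_b]), leaf_c])
annotate(tree, k=4)
```

With this toy tree, reads drawn from genome A descend to leaf A, while an isolate that carries the clade sequence but neither leaf's unique region stops at the internal node AB, which mirrors the clade-level assignment described above.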

Conclusion: StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.


Source
DOI: http://dx.doi.org/10.7717/peerj.3353
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438578
