FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads.

Sci Rep 2017 05 31;7(1):2537. Epub 2017 May 31.

Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.

We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina "Platinum" genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress array is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
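The general idea of alignment-free genotyping by k-mer counting can be illustrated with a minimal Python sketch. This is not the FastGT implementation: the function names, the 7-mers, the toy reads, and the fixed count threshold in `call_genotype` are all assumptions made for illustration, and the abstract does not describe the statistical model FastGT actually uses to turn counts into genotypes. The sketch assumes a diploid site and that each allele of a known SNV is tagged by one k-mer that is unique in the genome.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_target_kmers(sequences, targets, k):
    """Count how often each target k-mer occurs in the reads, on either strand."""
    # Map both orientations of every target k-mer back to the target itself.
    lookup = {}
    for kmer in targets:
        lookup[kmer] = kmer
        lookup[reverse_complement(kmer)] = kmer
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            hit = lookup.get(seq[i:i + k])
            if hit is not None:
                counts[hit] += 1
    return counts


def call_genotype(ref_count, alt_count, min_count=2):
    """Naive threshold call (illustrative assumption, not FastGT's model)."""
    has_ref = ref_count >= min_count
    has_alt = alt_count >= min_count
    if has_ref and has_alt:
        return "0/1"   # heterozygous
    if has_ref:
        return "0/0"   # homozygous reference
    if has_alt:
        return "1/1"   # homozygous alternate
    return "./."       # too few supporting k-mers to call


# Hypothetical SNV: the reference and alternate alleles differ at the 4th base.
REF_KMER, ALT_KMER, K = "GATTACA", "GATGACA", 7

# Toy FASTQ content; the third read carries the alternate allele on the reverse strand.
fastq_lines = [
    "@r1", "CCGATTACAGG", "+", "IIIIIIIIIII",
    "@r2", "AAGATTACATT", "+", "IIIIIIIIIII",
    "@r3", "CCTGTCATCGG", "+", "IIIIIIIIIII",
    "@r4", "TTGATGACAAA", "+", "IIIIIIIIIII",
]

counts = count_target_kmers(read_fastq_sequences(fastq_lines), [REF_KMER, ALT_KMER], K)
genotype = call_genotype(counts[REF_KMER], counts[ALT_KMER])
print(counts[REF_KMER], counts[ALT_KMER], genotype)
```

Because only a fixed set of target k-mers is looked up, the reads never need to be aligned to a reference, which is what makes this style of genotyping fast. A real tool would also have to handle sequencing errors, uneven coverage, and haploid regions such as the Y chromosome, none of which this sketch attempts.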

Source
DOI: http://dx.doi.org/10.1038/s41598-017-02487-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5451431
May 2017