641 results match your criteria k-mer analysis


Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy.

PLoS One 2021 14;16(10):e0258693. Epub 2021 Oct 14.

Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel.

Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database.

October 2021
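
The idea of "sequence space coverage" in the abstract above can be illustrated with a minimal sketch: the fraction of all 4^k possible DNA words of length k that actually occur in a sequence. The function name `kmer_space_coverage` and the choice to skip windows containing non-ACGT characters are assumptions made for illustration, not details taken from the paper.

```python
def kmer_space_coverage(seq: str, k: int) -> float:
    """Fraction of the 4**k possible DNA k-mers observed in seq.

    Hypothetical helper for illustration only; windows that contain
    characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            seen.add(kmer)
    return len(seen) / 4 ** k


if __name__ == "__main__":
    # Short k saturates quickly, long k leaves most of the space unused.
    for k in (1, 2, 3):
        print(k, kmer_space_coverage("ACGTACGTTTGACA", k))
```

As k grows, the denominator 4^k quickly exceeds the genome length, so coverage necessarily drops towards zero; that trade-off is what makes the choice of k matter.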

Draft genome of Semisulcospira libertina, a species of freshwater snail.

Genomics Inform 2021 Sep 30;19(3):e32. Epub 2021 Sep 30.

Department of Orthopaedic Surgery, Gyeongsang National University Hospital, Jinju 52727, Korea.

Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasites.

September 2021

Anti-cancer Peptide Recognition Based on Grouped Sequence and Spatial Dimension Integrated Networks.

Interdiscip Sci 2021 Oct 12. Epub 2021 Oct 12.

People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, Xinjiang, China.

The diversification of the characteristic sequences of anti-cancer peptides has imposed difficulties on research. To effectively predict new anti-cancer peptides, this paper proposes a more suitable feature grouping sequence and spatial dimension-integrated network algorithm for anti-cancer peptide sequence prediction called GRCI-Net. The main process is as follows: First, we implemented the fusion reduction of binary structure features and K-mer sparse matrix features through principal component analysis and generated a set of new features; second, we constructed a new bidirectional long- and short-term memory network.

October 2021

Genome-Wide Association Study Reveals Genetic Markers for Antimicrobial Resistance in Mycoplasma bovis.

Microbiol Spectr 2021 Oct 6:e0026221. Epub 2021 Oct 6.

Department of Pathology, Bacteriology, and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Merelbeke, Belgium.

Mycoplasma bovis causes many health and welfare problems in cattle. Due to the absence of clear insights regarding transmission dynamics and the lack of a registered vaccine in Europe, control of an outbreak depends mainly on antimicrobial therapy. Unfortunately, antimicrobial susceptibility testing (AST) is usually not performed, because it is time-consuming and no standard protocol or clinical breakpoints are available.

October 2021

K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes.

Plant Genome 2021 Sep 25:e20143. Epub 2021 Sep 25.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster.

September 2021

Unlocking inaccessible historical genomes preserved in formalin.

Mol Ecol Resour 2021 Sep 22. Epub 2021 Sep 22.

National Research Collections Australia, Commonwealth Scientific Industrial Research Organisation, Canberra, ACT, Australia.

Museum specimens represent an unparalleled record of historical genomic data. However, the widespread practice of formalin preservation has thus far impeded genomic analysis of a large proportion of specimens. Limited DNA sequencing from formalin-preserved specimens has yielded low genomic coverage with unpredictable success.

September 2021

STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions.

Genome Biol 2021 Sep 20;22(1):270. Epub 2021 Sep 20.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.

September 2021
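
For readers unfamiliar with MinHash, the following is a minimal sketch of a generic bottom-s MinHash sketch over k-mers and the Jaccard estimate derived from it. It is not STAT's implementation: the helper names, the use of SHA-1 truncated to 64 bits as the hash, and the parameters `k` and `s` are all assumptions chosen for illustration.

```python
import hashlib


def _hash64(kmer: str) -> int:
    # Deterministic 64-bit hash (assumption: any well-mixed hash would do).
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq: str, k: int, s: int) -> list:
    """Bottom-s sketch: the s smallest hash values of the distinct k-mers."""
    hashes = {_hash64(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def jaccard_estimate(a: list, b: list, s: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(a) | set(b))[:s]  # bottom-s sketch of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    x = sketch("ACGTACGT", k=3, s=100)
    y = sketch("ACGTTTTT", k=3, s=100)
    print(jaccard_estimate(x, y, s=100))
```

When `s` is at least the number of distinct k-mers, the estimate equals the exact Jaccard index; smaller `s` trades accuracy for a fixed-size summary, which is what makes the approach scale to large read sets.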

Extraction of long k-mers using spaced seeds.

IEEE/ACM Trans Comput Biol Bioinform 2021 Sep 16;PP. Epub 2021 Sep 16.

The extraction of k-mers from reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the used k-mers are unique in the analyzed DNA, and thus the use of longer k-mers is preferred. When the read lengths of short read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of k.

September 2021
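
A spaced seed can be pictured as a binary mask slid along a read: positions marked `1` are kept and positions marked `0` are wildcards. The sketch below shows only this general idea; the function name `spaced_kmers` and the string mask format are hypothetical, and it does not reproduce the extraction method of the paper.

```python
def spaced_kmers(seq: str, mask: str) -> list:
    """Apply a spaced-seed mask (e.g. "1101") at every position of seq.

    Characters under a '1' are kept; characters under a '0' are ignored.
    Illustrative helper only, not the published algorithm.
    """
    span = len(mask)
    keep = [i for i, c in enumerate(mask) if c == "1"]
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]


if __name__ == "__main__":
    print(spaced_kmers("ACGTAC", "1101"))
    # A substitution under the wildcard position does not change the result:
    print(spaced_kmers("ACGT", "1101") == spaced_kmers("ACTT", "1101"))
```

Because mismatches at wildcard positions are invisible to the seed, a long span can still be matched in the presence of sequencing errors, which is the motivation for using spaced seeds with longer k.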

Characterization of spp. Strains Isolated From Wild Birds in Turkey.

Front Microbiol 2021 18;12:712106. Epub 2021 Aug 18.

German Federal Institute for Risk Assessment, Department of Biological Safety, National Reference Laboratory for Campylobacter, Berlin, Germany.

Turkey is an important stopover site for many migrating birds between Europe, Asia and Africa. spp. are frequently found in wildlife, in particular waterfowl, and distinct strains are disseminated within this reservoir.


Alignment-free methods for polyploid genomes: Quick and reliable genetic distance estimation.

Mol Ecol Resour 2021 Sep 3. Epub 2021 Sep 3.

Biology Department, Wesleyan University, Middletown, CT, USA.

Polyploid genomes pose several inherent challenges to population genetic analyses. While alignment-based methods are fundamentally limited in their applicability to polyploids, alignment-free methods bypass most of these limits. We investigated the use of Mash, a k-mer analysis tool that uses the MinHash method to reduce complexity in large genomic data sets, for basic population genetic analyses of polyploid sequences.

September 2021
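
Mash converts a Jaccard estimate j between two k-mer sets into a distance using D = -(1/k) ln(2j / (1 + j)). The function below is a small sketch of that formula only; the name `mash_distance`, the handling of j = 0 as distance 1, and the example value of k are assumptions for illustration and are not taken from the study above.

```python
import math


def mash_distance(jaccard: float, k: int) -> float:
    """Mash distance from a Jaccard estimate, D = -(1/k) * ln(2j / (1 + j)).

    Returns 1.0 when no k-mers are shared (the logarithm is undefined at 0).
    """
    if jaccard <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


if __name__ == "__main__":
    for j in (1.0, 0.9, 0.5, 0.1):
        print(j, round(mash_distance(j, k=21), 5))
```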

Tissue-specific DNase I footprint analysis confirms the association of Q470* variant with intellectual disability.

J Genet 2021 ;100

Department of Chemistry, Savitribai Phule Pune University, Pune 411 007, India

Intellectual disability (ID) is a neurodevelopmental disorder in which genetics play a key aetiological role. The GATA zinc finger domain-containing 2B (GATAD2B) gene encodes a zinc-finger protein transcriptional repressor which is a part of the methyl-CpG binding protein-1 complex. Pathogenic variants in this gene are linked to ID, dysmorphic features, and cognitive disability.

January 2021

KARGA: Multi-platform Toolkit for k-mer-based Antibiotic Resistance Gene Analysis of High-throughput Sequencing Data.

IEEE EMBS Int Conf Biomed Health Inform 2021 Jul 10;2021. Epub 2021 Aug 10.

Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.

High-throughput sequencing is widely used for strain detection and characterization of antibiotic resistance in microbial metagenomic samples. Current analytical tools use curated antibiotic resistance gene (ARG) databases to classify individual sequencing reads or assembled contigs. However, identifying ARGs from raw read data can be time consuming (especially if assembly or alignment is required) and challenging, due to genome rearrangements and mutations.


Sequencing and de Novo Assembly of Abaca ( Née) var. Abuab Genome.

Genes (Basel) 2021 Aug 2;12(8). Epub 2021 Aug 2.

Sustainable Perennial Crops Laboratory, USDA-ARS, Beltsville, MD 20705, USA.

Abaca ( Née), an indigenous crop to the Philippines, is known to be the source of the strongest natural fiber. Despite its huge economic contributions, research on crop improvement is limited due to the lack of genomic data. In this study, the whole genome of the abaca var.


Ensemble Classifiers for Multiclass MicroRNA Classification.

Methods Mol Biol 2022 ;2257:235-254

Department of Information System, Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel.

Gene regulation is of utmost importance to cell homeostasis; thus, any dysregulation in it often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation and, consequently, their dysregulation has been associated with many diseases. miRBase version 21 contains microRNAs from about 200 species organized into about 70 clades.

January 2022

Estimation of Genome Size in the Endemic Species and the Locally Rare Species Using Comparative Analyses of Flow Cytometry and K-Mer Approaches.

Plants (Basel) 2021 Jul 3;10(7). Epub 2021 Jul 3.

Department of Botany and Microbiology, College of Science bldg5, King Saud University, Riyadh 11451, Saudi Arabia.

Genome size is one of the fundamental cytogenetic features of a species. It is critical for the design and initiation of any genome sequencing project and can provide essential insights for taxonomic, cytogenetic, phylogenetic, and evolutionary studies. However, this key cytogenetic information is almost entirely lacking for the endemic species and the locally rare species in Saudi Arabia. Therefore, genome size was analyzed by propidium iodide (PI) flow cytometry and compared to k-mer analysis methods.
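
A common back-of-the-envelope form of k-mer-based genome size estimation divides the total number of k-mers in the reads by the depth at the main peak of the k-mer multiplicity histogram. The sketch below assumes that simple model; the function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all hypothetical and do not represent the procedure used in the study.

```python
def estimate_genome_size(histogram: dict, min_depth: int = 5) -> float:
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    histogram maps multiplicity -> number of distinct k-mers seen that often.
    Multiplicities below min_depth are treated as sequencing errors and
    ignored (an assumption; real tools model errors and heterozygosity).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = {1: 1000, 2: 50, 19: 200, 20: 500, 21: 200}
    print(estimate_genome_size(toy))  # toy numbers, not real data
```

Estimates of this kind are usually cross-checked against an independent measurement such as flow cytometry, since repeats and heterozygosity distort the histogram.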


A chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush).

Mol Ecol Resour 2021 Aug 5. Epub 2021 Aug 5.

Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, QC, Canada.

Here, we present an annotated, chromosome-anchored, genome assembly for Lake Trout (Salvelinus namaycush) - a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long-read sequencing, paired-end Illumina sequencing, proximity ligation (Hi-C) sequencing, and a previously published linkage map to produce a highly contiguous assembly composed of 7378 contigs (contig N50 = 1.8 Mb) assigned to 4120 scaffolds (scaffold N50 = 44.


Competitiveness for Nodule Colonization in Sinorhizobium meliloti: Combined -Tagged Strain Competition and Genome-Wide Association Analysis.

mSystems 2021 Aug 27;6(4):e0055021. Epub 2021 Jul 27.

Department of Biology, University of Bari Aldo Moro, Bari, Italy.

Associations between leguminous plants and symbiotic nitrogen-fixing rhizobia are a classic example of mutualism between a eukaryotic host and a specific group of prokaryotic microbes. Although this symbiosis is in part species specific, different rhizobial strains may colonize the same nodule. Some rhizobial strains are commonly known as better competitors than others, but detailed analyses that aim to predict rhizobial competitive abilities based on genomes are still scarce.


Minimum functional length analysis of k-mer based on BP neural network.

IEEE/ACM Trans Comput Biol Bioinform 2021 Jul 26;PP. Epub 2021 Jul 26.

The BP neural network (BPNN), a multilayer feed-forward network, can achieve deep recognition of target data and highly accurate output. However, no BPNN-based research on k-mers has been reported so far. In the present study, a BPNN was used to train and test binary classification data for each classification mode separately.


Analysis of DNA Sequence Classification Using CNN and Hybrid Models.

Comput Math Methods Med 2021 15;2021:1835056. Epub 2021 Jul 15.

Department of Computer Science, Ambo University, Ambo, Post Box No.: 19, Ethiopia.

In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19.


The karyotype, genome survey, and assembly of Mud artemisia (Artemisia selengensis).

Mol Biol Rep 2021 Aug 23;48(8):5897-5904. Epub 2021 Jul 23.

Hubei Engineering Research Center for Protection and Utilization of Special Biological Resources in the Hanjiang River Basin, School of Life Science, Jianghan University, Wuhan, 430056, China.

Background: Artemisia selengensis is a traditional Chinese medicine, and phytochemical analysis has indicated that A. selengensis contains essential oils, fatty acids and phenolic acids. The lack of reference genomic information may lead to tardiness in molecular biology research of A.


Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach.

Nucleic Acids Res 2021 Oct;49(18):e106

Data Science Institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia.

Raw sequencing reads of miRNAs contain machine-made substitution errors, or even insertions and deletions (indels). Although the error rate can be low at 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution such as novel isomiR discovery, editing events understanding, differential expression analysis, or tissue-specific isoform identification is very sensitive to base positions and copy counts of the reads.

October 2021

Harmonization of whole-genome sequencing for outbreak surveillance of and .

Microb Genom 2021 Jul;7(7)

Department of Medical Microbiology, Care and Public Health Research Institute (CAPHRI), Maastricht University Medical Center+, Maastricht, The Netherlands.

Whole-genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. However, interoperability for WGS of bacterial outbreaks is poorly understood. We hypothesized that harmonization of WGS for outbreak surveillance is achievable through the use of identical protocols for both data generation and data analysis.


An algorithmic approach to determine expertise development using object-related gaze pattern sequences.

Behav Res Methods 2021 Jul 13. Epub 2021 Jul 13.

ETH Zurich, Leonhardstrasse 21, 8092, Zurich, Switzerland.

Eye tracking (ET) technology is increasingly utilized to quantify visual behavior in the study of the development of domain-specific expertise. However, the identification and measurement of distinct gaze patterns using traditional ET metrics has been challenging, and the insights gained have been shown to be inconclusive about the nature of expert gaze behavior. In this article, we introduce an algorithmic approach for the extraction of object-related gaze sequences and determine task-related expertise by investigating the development of gaze sequence patterns during a multi-trial study of a simplified airplane assembly task.


Sequence-specific minimizers via polar sets.

Bioinformatics 2021 07;37(Suppl_1):i187-i195

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: Minimizers are efficient methods to sample k-mers from genomic sequences that unconditionally preserve sufficiently long matches between sequences. Well-established methods to construct efficient minimizers focus on sampling fewer k-mers on a random sequence and use universal hitting sets (sets of k-mers that appear frequently enough) to upper bound the sketch size. In contrast, the problem of sequence-specific minimizers, which is to construct efficient minimizers to sample fewer k-mers on a specific sequence such as the reference genome, is less studied.
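
As background to the abstract above, a minimal sketch of a plain window minimizer scheme follows: in every window of w consecutive k-mers, the smallest k-mer under some ordering is selected. The lexicographic ordering, the leftmost tie-break, and the function name `minimizers` are assumptions for illustration; the polar-set construction described in the paper is not implemented here.

```python
def minimizers(seq: str, k: int, w: int) -> list:
    """Return sorted (position, k-mer) pairs chosen by a (w, k) minimizer.

    In each window of w consecutive k-mers the lexicographically smallest
    k-mer is picked, with ties broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        j = min(range(w), key=lambda i: (window[i], i))
        picked.add((start + j, window[j]))
    return sorted(picked)


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Adjacent windows often select the same k-mer, so fewer positions are sampled than there are windows. Any two sequences that share a full window of w consecutive k-mers select the same minimizer from it, which is the match-preserving property the abstract refers to.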


Practical selection of representative sets of RNA-seq samples using a hierarchical approach.

Bioinformatics 2021 07;37(Suppl_1):i334-i341

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Motivation: Despite numerous RNA-seq samples available at large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e.


Genome survey and microsatellite motif identification of Pogonophryne albipinna.

Biosci Rep 2021 Jul;41(7)

Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea.

The genus Pogonophryne is a speciose group that includes 28 species inhabiting the coastal or deep waters of the Antarctic Southern Ocean. The genus has been divided into five species groups, among which the P. albipinna group is the most deep-living group and is characterized by a lack of spots on the top of the head.


Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter.

Sci Rep 2021 07 1;11(1):13701. Epub 2021 Jul 1.

Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Barasat-Barrackpore Rd, Jagannathpur, Kolkata, West Bengal, 700126, India.

We describe a novel algorithm for information recovery from DNA sequences by using a digital filter. This work proposes a three-part algorithm to decide the k-mer or q-gram word density. Employing a finite impulse response digital filter, one can calculate the sequence's k-mer or q-gram word density.
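
One way to picture a filter-based k-mer density is to mark every position where the k-mer occurs with a 1 and then smooth that indicator signal with a finite impulse response (FIR) filter. The sketch below does exactly that with user-supplied filter taps. The function name, the causal convolution, and the moving-average taps in the example are assumptions for illustration; the filter design and the three-part algorithm of the paper are not reproduced.

```python
def kmer_density(seq: str, kmer: str, taps: list) -> list:
    """Smooth the occurrence indicator of kmer with a causal FIR filter.

    y[n] = sum_j taps[j] * x[n - j], where x[n] is 1.0 if kmer starts at
    position n of seq and 0.0 otherwise (samples before position 0 are 0).
    """
    k = len(kmer)
    x = [1.0 if seq[i:i + k] == kmer else 0.0 for i in range(len(seq) - k + 1)]
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            if n - j >= 0:
                acc += h * x[n - j]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Two equal taps give a length-2 moving average of the indicator.
    print(kmer_density("AAAAC", "AA", taps=[0.5, 0.5]))
```

Regions where the smoothed signal stays high are candidate clusters of the k-mer; longer filters give smoother but less localized density profiles.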


Pincho: A Modular Approach to High Quality De Novo Transcriptomics.

Genes (Basel) 2021 06 22;12(7). Epub 2021 Jun 22.

Department of Biology, St. John's University, Queens, NY 11439, USA.

Transcriptomic reconstructions without reference (i.e., de novo) are common for data samples derived from non-model biological systems.


Fast and Accurate Algorithms for Mapping and Aligning Long Reads.

J Comput Biol 2021 08 23;28(8):789-803. Epub 2021 Jun 23.

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.

For DNA sequence analysis, we are facing challenging tasks such as the identification of structural variants, sequencing repetitive regions, and phasing of alleles. Those challenging tasks suffer from the short length of sequencing reads, where each read may cover fewer than two single nucleotide polymorphisms (SNPs), or fewer than two occurrences of a repeated region. It is believed that long reads can help to solve those challenging tasks.


Schistosome W-linked genes inform temporal dynamics of sex chromosome evolution and suggest candidate for sex determination.

Mol Biol Evol 2021 Jun 19. Epub 2021 Jun 19.

Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria.

Schistosomes, the human parasites responsible for snail fever, are female-heterogametic. Different parts of their ZW sex chromosomes have stopped recombining in distinct lineages, creating "evolutionary strata" of various ages. While the Z-chromosome is well characterized at the genomic and molecular level, the W-chromosome has remained largely unstudied from an evolutionary perspective, as only a few W-linked genes have been detected outside of the model species Schistosoma mansoni.
