Pubfacts - Scientific Publication Data

NovoGraph: Human genome graph construction from multiple long-read assemblies.

Authors:
Evan Biederstedt, Jeffrey C Oliver, Nancy F Hansen, Aarti Jajoo, Nathan Dunn, Andrew Olson, Ben Busby, Alexander T Dilthey

F1000Res 2018 3;7:1391. Epub 2018 Sep 3.

National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20817, USA.

Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.
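The merging step described in the abstract can be illustrated on a toy alignment. The sketch below is not NovoGraph's implementation. It assumes a small in-memory alignment given as equal-length strings with `-` for gaps, treats the first row as the reference, and uses a hypothetical helper name, `variant_sites`. Columns where every row carries the same character are treated as merged graph positions; each run of non-identical columns becomes one variant site.

```python
from typing import List, Tuple


def variant_sites(msa: List[str]) -> List[Tuple[int, str, List[str]]]:
    """Return (start_column, ref_allele, alt_alleles) per run of variable columns.

    Assumptions for this sketch: msa[0] is the reference row and '-' is a gap.
    """
    if not msa or any(len(row) != len(msa[0]) for row in msa):
        raise ValueError("all rows must be present and equally long")
    n_cols = len(msa[0])
    sites = []
    col = 0
    while col < n_cols:
        # Identical column: the sequences are merged here, no variant.
        if len({row[col] for row in msa}) == 1:
            col += 1
            continue
        # Extend over the whole run of columns that are not identical.
        start = col
        while col < n_cols and len({row[col] for row in msa}) > 1:
            col += 1
        # An allele is the row's slice over the run with gaps removed.
        alleles = [row[start:col].replace("-", "") for row in msa]
        ref = alleles[0]
        alts = sorted({a for a in alleles[1:] if a != ref})
        if alts:  # skip runs that differ only in gap placement
            sites.append((start, ref, alts))
    return sites


toy = ["ACGTTA", "ACCT-A", "ACGTTA"]
sites = variant_sites(toy)
n_sites = len(sites)
n_alt_alleles = sum(len(alts) for _, _, alts in sites)
```

An empty alternative allele here stands for a deletion. A real VCF record would additionally need an anchoring base and reference coordinates, which this sketch leaves out.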


Source
DOI: http://dx.doi.org/10.12688/f1000research.15895.2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6305223
November 2019

Publication Analysis

Top Keywords

genome graph: 20
genome graphs: 8
rate reads: 8
human genomes: 8
human genome: 8
genome: 8
graph: 7
novograph: 5
wide spectrum: 4
spectrum genetic: 4
na19240 initial: 4
initial evaluations: 4
graphs encompass: 4
encompass wide: 4
third-party genome: 4
hx1 na19240: 4
assembly including: 4
including large: 4
hg004 hx1: 4
large structural: 4

Altmetric Statistics

83 Total Shares
1 Blogs
1 Google+ Users
72 Tweets
74 Citations

Similar Publications

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Authors:
Peter Ebert, Peter A Audano, Qihui Zhu, Bernardo Rodriguez-Martin, David Porubsky, Marc Jan Bonder, Arvis Sulovari, Jana Ebler, Weichen Zhou, Rebecca Serra Mari, Feyza Yilmaz, Xuefang Zhao, PingHsun Hsieh, Joyce Lee, Sushant Kumar, Jiadong Lin, Tobias Rausch, Yu Chen, Jingwen Ren, Martin Santamarina, Wolfram Höps, Hufsah Ashraf, Nelson T Chuang, Xiaofei Yang, Katherine M Munson, Alexandra P Lewis, Susan Fairley, Luke J Tallon, Wayne E Clarke, Anna O Basile, Marta Byrska-Bishop, André Corvelo, Uday S Evani, Tsung-Yu Lu, Mark J P Chaisson, Junjie Chen, Chong Li, Harrison Brand, Aaron M Wenger, Maryam Ghareghani, William T Harvey, Benjamin Raeder, Patrick Hasenfeld, Allison A Regier, Haley J Abel, Ira M Hall, Paul Flicek, Oliver Stegle, Mark B Gerstein, Jose M C Tubio, Zepeng Mu, Yang I Li, Xinghua Shi, Alex R Hastie, Kai Ye, Zechen Chong, Ashley D Sanders, Michael C Zody, Michael E Talkowski, Ryan E Mills, Scott E Devine, Charles Lee, Jan O Korbel, Tobias Marschall, Evan E Eichler

Science 2021 Feb 25. Epub 2021 Feb 25.

Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Ave NE, Seattle, WA 98195-5065, USA

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci.

February 2021

StLiter: A Novel Algorithm to Iteratively Build the Compacted de Bruijn Graph from Many Complete Genomes.

Authors:
Changong Yu, Keming Mao, Yuhai Zhao, Cheng Chang, Guoren Wang

IEEE/ACM Trans Comput Biol Bioinform 2021 Feb 25;PP. Epub 2021 Feb 25.

Recently, the compacted de Bruijn graph (cDBG) of complete genome sequences has been used successfully in read mapping because it copes well with repeats in genomes. However, current approaches are not flexible enough to rebuild the graph frequently for different k-mer lengths. Instead of building the graph directly, how can we build the compacted de Bruijn graph for a longer k-mer from the graph for a shorter k-mer? In this article, we present StLiter, a novel algorithm that builds the compacted de Bruijn graph either directly from genome sequences or indirectly from the graph of a shorter k-mer.
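For readers unfamiliar with the data structure, the following is a minimal sketch of what a compacted de Bruijn graph contains. It is a naive direct construction, not the StLiter algorithm, which this excerpt does not describe. It assumes forward-strand k-mers only (no reverse complements), and the function name `compacted_dbg` is hypothetical. Nodes are k-mers, edges come from adjacent k-mers in the input, and every non-branching path is merged into a single string (a unitig).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def compacted_dbg(seqs: Iterable[str], k: int) -> List[str]:
    """Return the unitigs of the de Bruijn graph of `seqs` (forward strand only)."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            nodes.add(s[i:i + k])
        for i in range(len(s) - k):
            u, v = s[i:i + k], s[i + 1:i + k + 1]
            succ[u].add(v)
            pred[v].add(u)

    def mergeable(u: str) -> bool:
        # u can be merged with its successor if the edge is the only way
        # out of u and the only way into the successor.
        if len(succ[u]) != 1:
            return False
        (v,) = succ[u]
        return v != u and len(pred[v]) == 1

    seen: Set[str] = set()
    unitigs: List[str] = []

    def walk(start: str) -> None:
        path, cur = start, start
        seen.add(start)
        while mergeable(cur):
            (nxt,) = succ[cur]
            if nxt in seen:  # closed an isolated cycle
                break
            path += nxt[-1]
            seen.add(nxt)
            cur = nxt
        unitigs.append(path)

    # First pass: start at nodes that cannot be merged into a predecessor.
    for node in sorted(nodes):
        preds = pred[node]
        if not (len(preds) == 1 and mergeable(next(iter(preds)))):
            walk(node)
    # Second pass: whatever is left lies on isolated cycles.
    for node in sorted(nodes):
        if node not in seen:
            walk(node)
    return sorted(unitigs)


unitigs = compacted_dbg(["ATGGCGT", "ATGGCAT"], 3)
```

In the toy input the two sequences share a prefix and then diverge, so the shared part collapses into one unitig and each branch becomes its own.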

February 2021

AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads.

Authors:
Shien Huang, Xinyu He, Guohua Wang, Ergude Bao

Brief Bioinform 2021 Feb 23. Epub 2021 Feb 23.

Interdisciplinary Information Sciences, School of Software Engineering, Beijing Jiaotong University, China.

Contigs assembled from third-generation long reads are usually more complete than those assembled from second-generation short reads. However, current algorithms still have difficulty assembling long reads into the ideal complete and accurate genome, or the theoretical best result [1]. To improve long-read contigs, and with more and more fully sequenced genomes available, it could still be possible to use the similar genome-assisted reassembly method [2], which was initially proposed for short reads and makes use of a genome closely related (similar genome) to the genome being sequenced (target genome).

February 2021

Complete Characterization of Incorrect Orthology Assignments in Best Match Graphs.

Authors:
David Schaller, Manuela Geiß, Peter F Stadler, Marc Hellmuth

J Math Biol 2021 Feb 19;82(3):20. Epub 2021 Feb 19.

Department of Mathematics, Faculty of Science, Stockholm University, SE 106 91, Stockholm, Sweden.

Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph.
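The notion of a reciprocal best match can be made concrete with a small sketch. It is an illustration under stated assumptions, not the construction used in the paper: it uses a pairwise similarity score as a stand-in for evolutionary relatedness, resolves ties by keeping the first match seen, and the function name `reciprocal_best_matches` and the gene labels are hypothetical.

```python
from typing import Dict, List, Tuple


def reciprocal_best_matches(
    scores: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str]]:
    """scores maps (gene_in_A, gene_in_B) to a similarity; higher is better."""
    best_for_a: Dict[str, str] = {}
    best_for_b: Dict[str, str] = {}
    for (a, b), s in scores.items():
        # Best match of a among the genes of species B.
        if a not in best_for_a or s > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        # Best match of b among the genes of species A.
        if b not in best_for_b or s > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep only pairs that choose each other.
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b.get(b) == a)


toy_scores = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.4,
    ("a2", "b1"): 0.8,
    ("a2", "b2"): 0.3,
}
pairs = reciprocal_best_matches(toy_scores)
```

In the toy data, a2's best match is b1, but b1 prefers a1, so only (a1, b1) is reciprocal.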

February 2021

MinYS: mine your symbiont by targeted genome assembly in symbiotic communities.

Authors:
Cervin Guyomar, Wesley Delage, Fabrice Legeai, Christophe Mougel, Jean-Christophe Simon, Claire Lemaitre

NAR Genom Bioinform 2020 Sep 3;2(3):lqaa047. Epub 2020 Jul 3.

Univ. Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.

Most metazoans are associated with symbionts. Characterizing the effect of a particular symbiont often requires getting access to its genome, which is usually done by sequencing the whole community. We present MinYS, a targeted assembly approach to assemble a particular genome of interest from such metagenomic data.

September 2020