Publications by authors named "Kerstin Howe"

40 Publications

The complete genome sequence of Eimeria tenella (Tyzzer 1929), a common gut parasite of chickens.

Wellcome Open Res 2021 9;6:225. Epub 2021 Sep 9.

Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

We present a genome assembly from a clonal population of Eimeria tenella Houghton parasites (Apicomplexa; Conoidasida; Eucoccidiorida; Eimeriidae). The genome sequence is 53.25 megabases in span. The entire assembly is scaffolded into 15 chromosomal pseudomolecules, with complete mitochondrion and apicoplast organellar genomes also present.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.17100.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515493
September 2021

The genome sequence of the European golden eagle, Aquila chrysaetos chrysaetos Linnaeus 1758.

Wellcome Open Res 2021 14;6:112. Epub 2021 May 14.

Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

We present a genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae). The genome sequence is 1.23 gigabases in span. The majority of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the W and Z sex chromosomes.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.16631.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8499043
May 2021

The genome sequence of the Norway rat, Rattus norvegicus Berkenhout 1769.

Wellcome Open Res 2021 18;6:118. Epub 2021 May 18.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

We present a genome assembly from an individual male Rattus norvegicus (the Norway rat; Chordata; Mammalia; Rodentia; Muridae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled. This genome assembly, mRatBN7.2, represents the new reference genome for Rattus norvegicus and has been adopted by the Genome Reference Consortium.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.16854.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8495504
May 2021

The genome sequence of the brown trout, Salmo trutta Linnaeus 1758.

Wellcome Open Res 2021 13;6:108. Epub 2021 May 13.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein-coding genes.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.16838.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8488904
May 2021

A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus).

Genome Biol Evol 2021 Sep;13(9)

Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway.

The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.

Source
DOI: http://dx.doi.org/10.1093/gbe/evab212
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8459166
September 2021

Hi-C scaffolded short- and long-read genome assemblies of the California sea lion are broadly consistent for syntenic inference across 45 million years of evolution.

Mol Ecol Resour 2021 Oct 27;21(7):2455-2470. Epub 2021 Jun 27.

Division of Evolutionary Biology, Faculty of Biology, LMU Munich, München, Germany.

With the advent of chromatin-interaction maps, chromosome-level genome assemblies have become a reality for a wide range of organisms. Scaffolding quality is, however, difficult to judge. To explore this gap, we generated multiple chromosome-scale genome assemblies of an emerging wild animal model for carcinogenesis, the California sea lion (Zalophus californianus). Short-read assemblies were scaffolded with two independent chromatin interaction mapping data sets (Hi-C and Chicago), and long-read assemblies with three data types (Hi-C, optical maps and 10X linked reads) following the "Vertebrate Genomes Project (VGP)" pipeline. In both approaches, 18 major scaffolds recovered the karyotype (2n = 36), with scaffold N50s of 138 and 147 Mb, respectively. Synteny relationships at the chromosome level with other pinniped genomes (2n = 32-36), ferret (2n = 34), red panda (2n = 36) and domestic dog (2n = 78) were consistent across approaches and recovered known fissions and fusions. Comparative chromosome painting and multicolour chromosome tiling with a panel of 264 genome-integrated single-locus canine bacterial artificial chromosome probes provided independent evaluation of genome organization. Broad-scale discrepancies between the approaches were observed within chromosomes, most commonly in translocations centred around centromeres and telomeres, which were better resolved in the VGP assembly. Genomic and cytological approaches agreed on near-perfect synteny of the X chromosome, and in combination allowed detailed investigation of autosomal rearrangements between dog and sea lion. This study presents high-quality genomes of an emerging cancer model and highlights that even highly fragmented short-read assemblies scaffolded with Hi-C can yield reliable chromosome-level scaffolds suitable for comparative genomic analyses.

Source
DOI: http://dx.doi.org/10.1111/1755-0998.13443
October 2021

Towards complete and error-free genome assemblies of all vertebrate species.

Nature 2021 Apr 28;592(7856):737-746. Epub 2021 Apr 28.

UQ Genomics, University of Queensland, Brisbane, Queensland, Australia.

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03451-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8081667
April 2021

Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.

Genome Biol 2021 04 29;22(1):120. Epub 2021 Apr 29.

Oxford Nanopore Technologies Ltd, Oxford Science Park, Oxford, UK.

Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.

Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.

Conclusions: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

Source
DOI: http://dx.doi.org/10.1186/s13059-021-02336-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8082918
April 2021

Evolutionary and biomedical insights from a marmoset diploid genome assembly.

Nature 2021 06 28;594(7862):227-233. Epub 2021 Apr 28.

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternal. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03535-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189906
June 2021

Construction of Whole Genomes from Scaffolds Using Single Cell Strand-Seq Data.

Int J Mol Sci 2021 Mar 31;22(7). Epub 2021 Mar 31.

Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada.

Accurate reference genome sequences provide the foundation for modern molecular biology and genomics, as the interpretation of sequence data to study evolution, gene expression, and epigenetics depends heavily on the quality of the genome assembly used for its alignment. Correctly organising sequenced fragments such as contigs and scaffolds in relation to each other is a critical and often challenging step in the construction of robust genome references. We previously identified misoriented regions in the mouse and human reference assemblies using Strand-seq, a single cell sequencing technique that preserves DNA directionality. Here we demonstrate the ability of Strand-seq to build and correct full-length chromosomes by identifying which scaffolds belong to the same chromosome and determining their correct order and orientation, without the need for overlapping sequences. We demonstrate that Strand-seq exquisitely maps assembly fragments into large related groups and chromosome-sized clusters without using new assembly data. Using template strand inheritance as a bi-allelic marker, we employ genetic mapping principles to cluster scaffolds that are derived from the same chromosome and order them within the chromosome based solely on directionality of DNA strand inheritance. We prove the utility of our approach by generating improved genome assemblies for several model organisms including the ferret, pig, Xenopus, zebrafish, Tasmanian devil and the Guinea pig.

Source
DOI: http://dx.doi.org/10.3390/ijms22073617
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8037727
March 2021

A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.).

G3 (Bethesda) 2021 05;11(5)

Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK.

Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF) is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Bioscience, 10X Genomics linked read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.

Source
DOI: http://dx.doi.org/10.1093/g3journal/jkab085
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8104945
May 2021

Significantly improving the quality of genome assemblies through curation.

Gigascience 2021 01;10(1)

Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK.

Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giaa153
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7794651
January 2021

Platypus and echidna genomes reveal mammalian biology and evolution.

Nature 2021 Apr 6;592(7856):756-762. Epub 2021 Jan 6.

Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK.

Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-03039-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8081666
April 2021

The genome sequence of the eastern grey squirrel, Sciurus carolinensis Gmelin, 1788.

Wellcome Open Res 2020 13;5:27. Epub 2020 Feb 13.

Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.15721.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7653645
February 2020

The genome sequence of the channel bull blenny, Cottoperca gobio (Günther, 1861).

Wellcome Open Res 2020 24;5:148. Epub 2020 Jun 24.

Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

We present a genome assembly for Cottoperca gobio (channel bull blenny, (Günther, 1861)); Chordata; Actinopterygii (ray-finned fishes), a temperate water outgroup for Antarctic Notothenioids. The size of the genome assembly is 609 megabases, with the majority of the assembly scaffolded into 24 chromosomal pseudomolecules. Gene annotation on Ensembl of this assembly has identified 21,662 coding genes.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.16012.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7649722
June 2020

Reference genome and demographic history of the most endangered marine mammal, the vaquita.

Mol Ecol Resour 2021 May 20;21(4):1008-1020. Epub 2020 Nov 20.

Marine Mammal Research, Department of Bioscience, Aarhus University, Roskilde, Denmark.

The vaquita is the most critically endangered marine mammal, with fewer than 19 remaining in the wild. First described in 1958, the vaquita has been in rapid decline for more than 20 years resulting from inadvertent deaths due to the increasing use of large-mesh gillnets. To understand the evolutionary and demographic history of the vaquita, we used combined long-read sequencing and long-range scaffolding methods with long- and short-read RNA sequencing to generate a near error-free annotated reference genome assembly from cell lines derived from a female individual. The genome assembly consists of 99.92% of the assembled sequence contained in 21 nearly gapless chromosome-length autosome scaffolds and the X-chromosome scaffold, with a scaffold N50 of 115 Mb. Genome-wide heterozygosity is the lowest (0.01%) of any mammalian species analysed to date, but heterozygosity is evenly distributed across the chromosomes, consistent with long-term small population size at genetic equilibrium, rather than low diversity resulting from a recent population bottleneck or inbreeding. Historical demography of the vaquita indicates long-term population stability at less than 5,000 (Ne) for over 200,000 years. Together, these analyses indicate that the vaquita genome has had ample opportunity to purge highly deleterious alleles and potentially maintain diversity necessary for population health.

Source
DOI: http://dx.doi.org/10.1111/1755-0998.13284
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8247363
May 2021

Telomere-to-telomere assembly of a complete human X chromosome.

Nature 2020 09 14;585(7823):79-84. Epub 2020 Jul 14.

Arima Genomics, San Diego, CA, USA.

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

Source
DOI: http://dx.doi.org/10.1038/s41586-020-2547-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7484160
September 2020

The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758.

Wellcome Open Res 2020 3;5:18. Epub 2020 Feb 3.

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

We present a genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.15679.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309416
February 2020

An improved pig reference genome sequence to enable pig genetics and genomics research.

Gigascience 2020 06;9(6)

Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK.

Background: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility.

Results: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2.

Conclusions: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giaa051
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7448572
June 2020

The gene-rich genome of the scallop Pecten maximus.

Gigascience 2020 05;9(5)

Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK.

Background: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management.

Findings: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species.

Conclusions: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.

Source
DOI: http://dx.doi.org/10.1093/gigascience/giaa037
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191990
May 2020

The genome sequence of the Eurasian river otter, Lutra lutra Linnaeus 1758.

Wellcome Open Res 2020 19;5:33. Epub 2020 Feb 19.

Wellcome Genome Campus, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK.

We present a genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.

Source
DOI: http://dx.doi.org/10.12688/wellcomeopenres.15722.1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7097881
February 2020

Identifying and removing haplotypic duplication in primary genome assemblies.

Bioinformatics 2020 05;36(9):2896-2898

Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.

Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors.

Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines.

Availability And Implementation: The source code is written in C and is available at https://github.com/dfguan/purge_dups.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa025
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203741
May 2020

Birth, expansion, and death of VCY-containing palindromes on the human Y chromosome.

Genome Biol 2019 10 14;20(1):207. Epub 2019 Oct 14.

The Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.

Background: Large palindromes (inverted repeats) make up substantial proportions of mammalian sex chromosomes, often contain genes, and have high rates of structural variation arising via ectopic recombination. As a result, they underlie many genomic disorders. Maintenance of the palindromic structure by gene conversion between the arms has been documented, but over longer time periods, palindromes are remarkably labile. Mechanisms of origin and loss of palindromes have, however, received little attention.

Results: Here, we use fiber-FISH, 10x Genomics Linked-Read sequencing, and breakpoint PCR sequencing to characterize the structural variation of the P8 palindrome on the human Y chromosome, which contains two copies of the VCY (Variable Charge Y) gene. We find a deletion of almost an entire arm of the palindrome, leading to death of the palindrome, a size increase by recruitment of adjacent sequence, and other complex changes including the formation of an entire new palindrome nearby. Together, these changes are found in ~1% of men, and we can assign likely molecular mechanisms to these mutational events. As a result, healthy men can have 1-4 copies of VCY.

Conclusions: Gross changes, especially duplications, in palindrome structure can be relatively frequent and facilitate the evolution of sex chromosomes in humans, and potentially also in other mammalian species.

Source
DOI: http://dx.doi.org/10.1186/s13059-019-1816-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6790999
October 2019

The Genome of C57BL/6J "Eve", the Mother of the Laboratory Mouse Genome Reference Strain.

G3 (Bethesda) 2019 06 5;9(6):1795-1805. Epub 2019 Jun 5.

The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME.

Isogenic laboratory mouse strains enhance reproducibility because individual animals are genetically identical. For the most widely used isogenic strain, C57BL/6, there exists a wealth of genetic, phenotypic, and genomic data, including a high-quality reference genome (GRCm38.p6). Now 20 years after the first release of the mouse reference genome, C57BL/6J mice are at least 26 inbreeding generations removed from GRCm38 and the strain is now maintained with periodic reintroduction of cryorecovered mice derived from a single breeder pair, aptly named Adam and Eve. To provide an update to the mouse reference genome that more accurately represents the genome of today's C57BL/6J mice, we took advantage of long read, short read, and optical mapping technologies to generate a de novo assembly of the C57BL/6J Eve genome (B6Eve). Using these data, we have addressed recurring variants observed in previous mouse genomic studies. We have also identified structural variations, closed gaps in the mouse reference assembly, and revealed previously unannotated coding sequences. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses, and will inform a reference genome that is more representative of the C57BL/6J mice that are in use today.

Source
DOI: http://dx.doi.org/10.1534/g3.119.400071
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6553538
June 2019

Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

Nat Genet 2018 11 1;50(11):1574-1583. Epub 2018 Oct 1.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.

Source
DOI: http://dx.doi.org/10.1038/s41588-018-0223-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6205630
November 2018

Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes.

Genome Res 2018 04 21;28(4):448-459. Epub 2018 Mar 21.

Yale University Medical School, Computational Biology and Bioinformatics Program, New Haven, Connecticut 06520, USA.

Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

Source
DOI: http://dx.doi.org/10.1101/gr.234096.117
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5880236
April 2018

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Genome Res 2017 05 10;27(5):849-864. Epub 2017 Apr 10.

Pacific Biosciences, Menlo Park, California 94025, USA.

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Source
DOI: http://dx.doi.org/10.1101/gr.213611.116
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411779
May 2017

A New Chicken Genome Assembly Provides Insight into Avian Genome Structure.

G3 (Bethesda) 2017 01 5;7(1):109-117. Epub 2017 Jan 5.

The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, United Kingdom.

The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.

Source
DOI: http://dx.doi.org/10.1534/g3.116.035923
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217101
January 2017

Structure and evolutionary history of a large family of NLR proteins in the zebrafish.

Open Biol 2016 Apr 27;6(4):160009. Epub 2016 Apr 27.

Institut für Genetik, Universität zu Köln, Köln, Germany; The European Molecular Biology Laboratory, Heidelberg, Germany.

Multicellular eukaryotes have evolved a range of mechanisms for immune recognition. A widespread family involved in innate immunity are the NACHT-domain and leucine-rich-repeat-containing (NLR) proteins. Mammals have small numbers of NLR proteins, whereas in some species, mostly those without adaptive immune systems, NLRs have expanded into very large families. We describe a family of nearly 400 NLR proteins encoded in the zebrafish genome. The proteins share a defining overall structure, which arose in fishes after a fusion of the core NLR domains with a B30.2 domain, but can be subdivided into four groups based on their NACHT domains. Gene conversion acting differentially on the NACHT and B30.2 domains has shaped the family and created the groups. Evidence of positive selection in the B30.2 domain indicates that this domain rather than the leucine-rich repeats acts as the pathogen recognition module. In an unusual chromosomal organization, the majority of the genes are located on one chromosome arm, interspersed with other large multigene families, including a new family encoding zinc-finger proteins. The NLR-B30.2 proteins represent a new family with diversity in the specific recognition module that is present in fishes in spite of the parallel existence of an adaptive immune system.

Source
DOI: http://dx.doi.org/10.1098/rsob.160009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852459
April 2016

gEVAL - a web-based browser for evaluating genome assemblies.

Bioinformatics 2016 08 7;32(16):2508-10. Epub 2016 Apr 7.

Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

Motivation: For most research approaches, genome analyses depend on the existence of a high-quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation, including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly.

Availability and Implementation: Web Browser: http://geval.sanger.ac.uk, Plugin: http://wchow.github.io/wtsi-geval-plugin

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btw159
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978925
August 2016
-->