Publications by authors named "Jedidiah Carlson"

6 Publications

  • Page 1 of 1

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 02 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41586-021-03205-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770PMC
February 2021

Quantifying and contextualizing the impact of bioRxiv preprints through automated social media audience segmentation.

PLoS Biol 2020 09 22;18(9):e3000860. Epub 2020 Sep 22.

Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America.

Engagement with scientific manuscripts is frequently facilitated by Twitter and other social media platforms. As such, the demographics of a paper's social media audience provide a wealth of information about how scholarly research is transmitted, consumed, and interpreted by online communities. By paying attention to public perceptions of their publications, scientists can learn whether their research is stimulating positive scholarly and public thought. They can also become aware of potentially negative patterns of interest from groups that misinterpret their work in harmful ways, either willfully or unintentionally, and devise strategies for altering their messaging to mitigate these impacts. In this study, we collected 331,696 Twitter posts referencing 1,800 highly tweeted bioRxiv preprints and leveraged topic modeling to infer the characteristics of various communities engaging with each preprint on Twitter. We agnostically learned the characteristics of these audience sectors from keywords each user's followers provide in their Twitter biographies. We estimate that 96% of the preprints analyzed are dominated by academic audiences on Twitter, suggesting that social media attention does not always correspond to greater public exposure. We further demonstrate how our audience segmentation method can quantify the level of interest from nonspecialist audience sectors such as mental health advocates, dog lovers, video game developers, vegans, bitcoin investors, conspiracy theorists, journalists, religious groups, and political constituencies. Surprisingly, we also found that 10% of the preprints analyzed have sizable (>5%) audience sectors that are associated with right-wing white nationalist communities. Although none of these preprints appear to intentionally espouse any right-wing extremist messages, cases exist in which extremist appropriation comprises more than 50% of the tweets referencing a given preprint. These results present unique opportunities for improving and contextualizing the public discourse surrounding scientific research.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1371/journal.pbio.3000860DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7508356PMC
September 2020

Inferring evolutionary dynamics of mutation rates through the lens of mutation spectrum variation.

Curr Opin Genet Dev 2020 06 30;62:50-57. Epub 2020 Jun 30.

Department of Genome Sciences, Foege Hall, University of Washington, Seattle, WA 98105, United States; Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Eastlake Ave E, Seattle, WA 98109, United States. Electronic address:

There are many possible failure points in the transmission of genetic information that can produce heritable germline mutations. Once a mutation has been passed from parents to offspring for several generations, it can be difficult or impossible to identify its root cause; however, sometimes the nature of the ancestral and derived DNA sequences can provide mechanistic clues about a genetic change that happened hundreds or thousands of generations ago. Here, we review evidence that the sequence context 'spectrum' of germline mutagenesis has been evolving surprisingly rapidly over the history of humans and other species. We go on to discuss possible causal factors that might underlie rapid mutation spectrum evolution.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.gde.2020.05.024DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7646088PMC
June 2020

A community-maintained standard library of population genetic models.

Elife 2020 06 23;9. Epub 2020 Jun 23.

Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States.

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.7554/eLife.54967DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7438115PMC
June 2020

Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets.

BMC Genomics 2018 Nov 28;19(1):845. Epub 2018 Nov 28.

Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.

Background: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants.

Results: We introduce Helmsman, a program designed to perform mutation signature analysis on arbitrarily large sequencing datasets. Helmsman is up to 300 times faster than existing software. Helmsman's memory usage is independent of the number of variants, resulting in a small enough memory footprint to analyze datasets that would otherwise exceed the memory limitations of other programs.

Conclusions: Helmsman is a computationally efficient tool that enables users to evaluate mutational signatures in massive sequencing datasets that are otherwise intractable with existing software. Helmsman is freely available at https://github.com/carjed/helmsman .
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1186/s12864-018-5264-yDOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6263557PMC
November 2018

Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans.

Nat Commun 2018 09 14;9(1):3753. Epub 2018 Sep 14.

Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA.

A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
View Article and Find Full Text PDF

Download full-text PDF

Source
http://dx.doi.org/10.1038/s41467-018-05936-5DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138700PMC
September 2018
-->