Publications by authors named "Marc Fiume"

14 Publications

Publisher Correction: Federated discovery and sharing of genomic data using Beacons.

Nat Biotechnol 2019 04;37(4):480

ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK.

In the version of this article initially published, Lena Dolman's second affiliation was given as Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK. The correct second affiliation is Ontario Institute for Cancer Research, Toronto, Ontario, Canada. The error has been corrected in the HTML and PDF versions of the article.

Source
DOI: http://dx.doi.org/10.1038/s41587-019-0094-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608460
April 2019

ClinGen advancing genomic data-sharing standards as a GA4GH driver project.

Hum Mutat 2018 11;39(11):1686-1689

Broad Institute of MIT and Harvard, Cambridge, Massachusetts.

The Clinical Genome Resource (ClinGen)'s work to develop a knowledge base to support the understanding of genes and variants for use in precision medicine and research depends on robust, broadly applicable, and adaptable technical standards for sharing data and information. To forward this goal, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH) to support the development of open, freely-available technical standards and regulatory frameworks for secure and responsible sharing of genomic and health-related data. In its capacity as one of the 15 inaugural GA4GH "Driver Projects," ClinGen is providing input on the key standards needs of the global genomics community, and has committed to participate on GA4GH Work Streams to support the development of: (1) a standard model for computer-readable variant representation; (2) a data model for linking variant data to annotations; (3) a specification to enable sharing of genomic variant knowledge and associated clinical interpretations; and (4) a set of best practices for use of phenotype and disease ontologies. ClinGen's participation as a GA4GH Driver Project will provide a robust environment to test drive emerging genomic knowledge sharing standards and prove their utility among the community, while accelerating the construction of the ClinGen evidence base.

Source
DOI: http://dx.doi.org/10.1002/humu.23625
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188700
November 2018

Registered access: authorizing data access.

Eur J Hum Genet 2018 12 2;26(12):1721-1731. Epub 2018 Aug 2.

Vanderbilt University Medical Center, Nashville, TN, USA.

The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access," to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.

Source
DOI: http://dx.doi.org/10.1038/s41431-018-0219-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6244209
December 2018

Simplifying research access to genomics and health data with Library Cards.

Sci Data 2018 03 14;5:180039. Epub 2018 Mar 14.

Microsoft, Redmond, WA 98052, USA.

The volume of genomics and health data is growing rapidly, driven by sequencing for both research and clinical use. However, under current practices, the data is fragmented into many distinct datasets, and researchers must go through a separate application process for each dataset. This is time-consuming both for the researchers and the data stewards, and it reduces the velocity of research and new discoveries that could improve human health. We propose to simplify this process, by introducing a standard Library Card that identifies and authenticates researchers across all participating datasets. Each researcher would only need to apply once to establish their bona fides as a qualified researcher, and could then use the Library Card to access a wide range of datasets that use a compatible data access policy and authentication protocol.

Source
DOI: http://dx.doi.org/10.1038/sdata.2018.39
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851345
March 2018

The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

CMAJ 2018 02;190(5):E126-E136

The Centre for Applied Genomics (Reuter, Walker, Thiruvahindrapuram, Whitney, Yuen, Trost, Paton, Pereira, Herbrick, Wintle, Merico, Howe, MacDonald, Lu, Nalpathamkalam, Sung, Wang, Patel, Pellecchia, J. Wei, Strug, Bell, Kellam, Mahtani, Hosseini, Fiume, Marshall, Buchanan, Scherer); Divisions of Clinical Pharmacology and Toxicology (I. Cohn), or Clinical, and Metabolic Genetics (Sondheimer, Weksberg, Shuman, Bowdin, Meyn, Monfared), The Hospital for Sick Children; Departments of Paediatrics (Sondheimer, R. Cohn) and Molecular Genetics (Yuen, Weksberg, Shuman, R. Cohn, Ellis, Meyn), University of Toronto; Deep Genomics Inc. (Merico); Department of Psychiatry (Bassett), University Health Network and Centre for Addiction and Mental Health, University of Toronto; Li Ka Shing Knowledge Institute (Bombard), St. Michael's Hospital; Institute of Health Policy, Management and Evaluation (Bombard), University of Toronto; Centre for Genetic Medicine (Stavropoulos, Bowdin, Ray, Monfared); Molecular Genetics Laboratory (Stavropoulos, Ray, Marshall), Division of Genome Diagnostics, Paediatric Laboratory Medicine; Developmental and Stem Cell Biology (Hildebrandt, W. Wei, Romm, Pasceri, Ellis); Ted Rogers Cardiac Genome Clinic (Hosseini); Cytogenetics Laboratory (Joseph-George), Division of Genome Diagnostics, Paediatric Laboratory Medicine, The Hospital for Sick Children; Departments of Biochemistry and Laboratory Medicine, and Pathobiology (Keeley), University of Toronto; DNAstack (Cook, Fiume); McLaughlin Centre (Lee, Scherer), University of Toronto; Medcan Health Management Inc. (Davies, Hazell); Dalla Lana School of Public Health (Szego), Department of Family and Community Medicine, and The Joint Centre for Bioethics, University of Toronto; Centre for Clinical Ethics (Szego), St. Joseph's Health Centre, Toronto, Ont.

Background: The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers.

Methods: Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant.

Results: Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual.

Interpretation: Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care.

Source
DOI: http://dx.doi.org/10.1503/cmaj.171151
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5798982
February 2018

Consent Codes: Upholding Standard Data Use Conditions.

PLoS Genet 2016 Jan 21;12(1):e1005772. Epub 2016 Jan 21.

National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, Maryland, United States of America.

A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).

Source
DOI: http://dx.doi.org/10.1371/journal.pgen.1005772
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721915
January 2016

Similarity network fusion for aggregating data types on a genomic scale.

Nat Methods 2014 Mar 26;11(3):333-7. Epub 2014 Jan 26.

[1] Genetics and Genome Biology, SickKids Research Institute, Toronto, Ontario, Canada. [2] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
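
The fusion step described in this abstract can be illustrated with a heavily simplified sketch. It is not the authors' implementation: the Gaussian similarity, the parameter names (`sigma`, `k`, `iterations`), the helper functions, and the toy two-data-type cohort are all assumptions made for illustration, and the published method includes normalization and scaling details that are omitted here. The sketch builds one patient similarity network per data type, then repeatedly diffuses each network through the other's nearest-neighbour structure before averaging them.

```python
import math


def affinity(data, sigma=1.0):
    """Gaussian similarity between every pair of samples (rows of data)."""
    n = len(data)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
            w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def row_normalize(matrix):
    """Scale each row so that it sums to 1."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def knn_kernel(w, k):
    """Keep each sample's k strongest similarities (itself included), then row-normalize."""
    out = []
    for row in w:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return row_normalize(out)


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fuse(w1, w2, k=2, iterations=10):
    """Cross-diffuse two similarity networks and return their symmetrized average."""
    p1, p2 = row_normalize(w1), row_normalize(w2)
    s1, s2 = knn_kernel(w1, k), knn_kernel(w2, k)
    for _ in range(iterations):
        # Each network is updated using the *other* network's current state.
        new_p1 = row_normalize(matmul(matmul(s1, p2), transpose(s1)))
        new_p2 = row_normalize(matmul(matmul(s2, p1), transpose(s2)))
        p1, p2 = new_p1, new_p2
    n = len(p1)
    avg = [[(p1[i][j] + p2[i][j]) / 2 for j in range(n)] for i in range(n)]
    return [[(avg[i][j] + avg[j][i]) / 2 for j in range(n)] for i in range(n)]


# Toy cohort (made-up values): patients 0-1 and 2-3 form two groups in both data types.
expression = [[0.0], [0.1], [5.0], [5.1]]
methylation = [[1.0], [1.2], [4.0], [4.1]]
fused = fuse(affinity(expression), affinity(methylation))
```

In this toy cohort the fused network keeps patients from the same group more similar to each other than to patients from the other group, which is the property a downstream clustering step would rely on.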

Source
DOI: http://dx.doi.org/10.1038/nmeth.2810
March 2014

PhenoTips: patient phenotyping software for clinical and research use.

Hum Mutat 2013 Aug 24;34(8):1057-65. Epub 2013 May 24.

Department of Computer Science, University of Toronto, Ontario, Canada.

We have developed PhenoTips: open source software for collecting and analyzing phenotypic information for patients with genetic disorders. Our software combines an easy-to-use interface, compatible with any device that runs a Web browser, with a standardized database back end. The PhenoTips' user interface closely mirrors clinician workflows so as to facilitate the recording of observations made during the patient encounter. Collected data include demographics, medical history, family history, physical and laboratory measurements, physical findings, and additional notes. Phenotypic information is represented using the Human Phenotype Ontology; however, the complexity of the ontology is hidden behind a user interface, which combines simple selection of common phenotypes with error-tolerant, predictive search of the entire ontology. PhenoTips supports accurate diagnosis by analyzing the entered data, then suggesting additional clinical investigations and providing Online Mendelian Inheritance in Man (OMIM) links to likely disorders. By collecting, classifying, and analyzing phenotypic information during the patient encounter, PhenoTips allows for streamlining of clinic workflow, efficient data entry, improved diagnosis, standardization of collected patient phenotypes, and sharing of anonymized patient phenotype data for the study of rare disorders. Our source code and a demo version of PhenoTips are available at http://phenotips.org.

Source
DOI: http://dx.doi.org/10.1002/humu.22347
August 2013

iReckon: simultaneous isoform discovery and abundance estimation from RNA-seq data.

Genome Res 2013 Mar 29;23(3):519-29. Epub 2012 Nov 29.

Department of Computer Science, University of Toronto, Ontario M5S 2E4, Canada.

High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and our abundance accuracy prediction outmatches that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct the complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
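
To illustrate the expectation-maximization idea mentioned in this abstract, here is a minimal sketch of plain EM for estimating isoform abundances from ambiguously mapped reads. It is not iReckon's model: it omits regularization, isoform lengths, novel isoform discovery, and the biases listed above. The function name, the read representation (a set of compatible isoforms per read), and the isoform labels are hypothetical.

```python
def estimate_abundances(read_compat, isoforms, iterations=50):
    """Plain EM over reads, where each read is a set of isoforms it is compatible with.

    Assumes every read is compatible with at least one isoform in `isoforms`.
    """
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(iterations):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read across its compatible isoforms
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[iso] for iso in compat)
            for iso in compat:
                counts[iso] += theta[iso] / total
        # M-step: new abundance = expected fraction of reads from each isoform.
        theta = {iso: counts[iso] / len(read_compat) for iso in isoforms}
    return theta


# Toy input: two reads unique to isoform A, one unique to B, one ambiguous.
reads = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
abundances = estimate_abundances(reads, ["A", "B"])
```

For this toy input the estimates approach 2/3 for isoform A and 1/3 for isoform B, because the ambiguous read is shared in proportion to the evidence from the uniquely mapped reads.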

Source
DOI: http://dx.doi.org/10.1101/gr.142232.112
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3589540
March 2013

Savant Genome Browser 2: visualization and analysis for population-scale genomics.

Nucleic Acids Res 2012 Jul 25;40(Web Server issue):W615-21. Epub 2012 May 25.

Department of Computer Science, University of Toronto, Ontario, Canada M5S 2E4.

High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

Source
DOI: http://dx.doi.org/10.1093/nar/gks427
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394255
July 2012

Detecting copy number variation with mated short reads.

Genome Res 2010 Nov 30;20(11):1613-22. Epub 2010 Aug 30.

Department of Computer Science, University of Toronto, Toronto, Ontario M5R 3G4, Canada.

The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
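
The depth-of-coverage signal that this abstract builds on can be sketched as follows. This is only the basic coverage-ratio idea, not CNVer: it ignores paired-end mapping, the donor graph, and any bias correction, and the function names, thresholds (`gain_ratio`, `loss_ratio`), and window depths are assumptions chosen for illustration.

```python
from statistics import median


def call_windows(depths, gain_ratio=1.5, loss_ratio=0.5):
    """Label each fixed-size window by its read depth relative to the median depth."""
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in depths:
        ratio = depth / baseline
        if ratio >= gain_ratio:
            calls.append("gain")
        elif ratio <= loss_ratio:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


def merge_segments(calls):
    """Merge runs of adjacent windows with the same non-normal call.

    Returns (start, end, call) tuples in window indices, with end exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "normal":
                segments.append((start, i, calls[start]))
            start = i
    return segments


# Made-up mean depths for ten consecutive windows.
window_depths = [30, 31, 29, 60, 62, 61, 30, 12, 13, 30]
segments = merge_segments(call_windows(window_depths))
```

With these depths, windows 3 to 5 are merged into one gain segment and windows 7 to 8 into one loss segment. Window-level calls like these have coarse breakpoints, which is the limitation the abstract says discordant mate pairs help address.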

Source
DOI: http://dx.doi.org/10.1101/gr.106344.110
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2963824
November 2010

Savant: genome browser for high-throughput sequencing data.

Bioinformatics 2010 Aug 20;26(16):1938-44. Epub 2010 Jun 20.

Department of Computer Science, University of Toronto, Ontario, Canada.

Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets.

Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations.

Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btq332
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3271355
August 2010

SHRiMP: accurate mapping of short color-space reads.

PLoS Comput Biol 2009 May 22;5(5):e1000386. Epub 2009 May 22.

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25-70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1000386
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2678294
May 2009