Publications by authors named "Anne-Katrin Emde"

27 Publications

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Nature 2021 Feb 10;590(7845):290-299. Epub 2021 Feb 10.

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
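
The rare-variant and singleton figures above are per-site tallies. A minimal sketch of that bookkeeping, assuming a hypothetical input that maps each variant ID to per-individual alternate-allele counts (0, 1 or 2) in diploid genomes. It is an illustration only, not the TOPMed analysis pipeline:

```python
from typing import Dict, List


def summarize_variants(genotypes: Dict[str, List[int]]) -> Dict[str, float]:
    """Return the fraction of variants that are rare (<1%) and that are singletons.

    `genotypes` (hypothetical format) maps a variant ID to the alternate-allele
    count (0, 1 or 2) carried by each diploid individual.
    """
    n_variants = len(genotypes)
    rare = singletons = 0
    for counts in genotypes.values():
        total_alleles = 2 * len(counts)            # two alleles per individual
        alt_freq = sum(counts) / total_alleles     # alternate allele frequency
        if alt_freq < 0.01:
            rare += 1
        carriers = sum(1 for c in counts if c > 0)
        if carriers == 1:                          # present in only one individual
            singletons += 1
    return {
        "rare_fraction": rare / n_variants,
        "singleton_fraction": singletons / n_variants,
    }


# Toy cohort of 100 individuals: one private variant and one common variant.
toy = {"var1": [1] + [0] * 99, "var2": [1] * 50 + [0] * 50}
print(summarize_variants(toy))
# {'rare_fraction': 0.5, 'singleton_fraction': 0.5}
```

Singletons are counted here by carriers rather than by allele count, following the abstract's wording ("present in only one individual"), so a single homozygous carrier still counts as a singleton.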

Source
DOI: http://dx.doi.org/10.1038/s41586-021-03205-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770
February 2021

Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs.

Cell 2020 Oct;183(1):197-210.e32

Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.

Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2020.08.006
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7912537
October 2020

Sequencing and curation strategies for identifying candidate glioblastoma treatments.

BMC Med Genomics 2019 Apr 25;12(1):56. Epub 2019 Apr 25.

Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY, 10065, USA.

Background: Prompted by the revolution in high-throughput sequencing and its potential impact for treating cancer patients, we initiated a clinical research study to compare the ability of different sequencing assays and analysis methods to analyze glioblastoma tumors and generate real-time potential treatment options for physicians.

Methods: A consortium of seven institutions in New York City enrolled 30 patients with glioblastoma and performed tumor whole genome sequencing (WGS) and RNA sequencing (RNA-seq; collectively WGS/RNA-seq); 20 of these patients were also analyzed with independent targeted panel sequencing. We also compared results of expert manual annotations with those from an automated annotation system, Watson Genomic Analysis (WGA), to assess the reliability and time required to identify potentially relevant pharmacologic interventions.

Results: WGS/RNA-seq identified more potentially actionable clinical results than targeted panels in 90% of cases, with an average of 16-fold more unique potentially actionable variants identified per individual; 84 clinically actionable calls were made using WGS/RNA-seq that were not identified by panels. Expert annotation and WGA had good agreement on identifying variants [mean sensitivity = 0.71, SD = 0.18 and positive predictive value (PPV) = 0.80, SD = 0.20] and drug targets when the same variants were called (mean sensitivity = 0.74, SD = 0.34 and PPV = 0.79, SD = 0.23) across patients. Clinicians used the information to modify their treatment plan 10% of the time.

Conclusion: These results present the first comprehensive comparison of technical and machine augmented analysis of targeted panel and WGS/RNA-seq to identify potential cancer treatments.
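
The sensitivity and PPV figures in the Results compare two sets of calls per patient and then average across patients. A minimal sketch of that arithmetic, assuming the expert annotations serve as the reference set and the automated calls as predictions; the variant labels are hypothetical, and the study's exact matching rules are not described in the abstract:

```python
from statistics import mean, stdev
from typing import Iterable, Set, Tuple


def sensitivity_ppv(reference: Set[str], predicted: Set[str]) -> Tuple[float, float]:
    """Sensitivity = TP / |reference|; PPV = TP / |predicted|."""
    tp = len(reference & predicted)                      # calls made by both
    sens = tp / len(reference) if reference else float("nan")
    ppv = tp / len(predicted) if predicted else float("nan")
    return sens, ppv


def summarize(patients: Iterable[Tuple[Set[str], Set[str]]]) -> dict:
    """Mean and sample SD of per-patient sensitivity and PPV (needs >= 2 patients)."""
    pairs = [sensitivity_ppv(ref, pred) for ref, pred in patients]
    sens = [s for s, _ in pairs]
    ppv = [p for _, p in pairs]
    return {"sens_mean": mean(sens), "sens_sd": stdev(sens),
            "ppv_mean": mean(ppv), "ppv_sd": stdev(ppv)}


# Hypothetical (expert calls, automated calls) for two patients.
patients = [
    ({"v1", "v2", "v3", "v4"}, {"v1", "v2", "v5"}),
    ({"v6", "v7"}, {"v6", "v7"}),
]
print(summarize(patients))
```

`statistics.stdev` computes the sample standard deviation and raises an error for fewer than two values, so a single-patient summary would need separate handling.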

Source
DOI: http://dx.doi.org/10.1186/s12920-019-0500-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485090
April 2019

Genetic mechanisms of primary chemotherapy resistance in pediatric acute myeloid leukemia.

Leukemia 2019 Aug 13;33(8):1934-1943. Epub 2019 Feb 13.

Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Acute myeloid leukemias (AML) are characterized by mutations of tumor suppressor and oncogenes, involving distinct genes in adults and children. While certain mutations have been associated with the increased risk of AML relapse, the genomic landscape of primary chemotherapy-resistant AML is not well defined. As part of the TARGET initiative, we performed whole-genome DNA and transcriptome RNA and miRNA sequencing analysis of pediatric AML with failure of induction chemotherapy. We identified at least three genetic groups of patients with induction failure, including those with NUP98 rearrangements, somatic mutations of WT1 in the absence of apparent NUP98 mutations, and additional recurrent variants including those in KMT2C and MLLT10. Comparison of specimens before and after chemotherapy revealed distinct and invariant gene expression programs. While exhibiting overt therapy resistance, these leukemias nonetheless showed diverse forms of clonal evolution upon chemotherapy exposure. This included selection for mutant alleles of FRMD8, DHX32, PIK3R1, SHANK3, MKLN1, as well as persistence of WT1 and TP53 mutant clones, and elimination of FLT3, PTPN11, and NRAS mutant clones. These findings delineate genetic mechanisms of primary chemotherapy resistance in pediatric AML, which should inform improved approaches for its diagnosis and therapy.

Source
DOI: http://dx.doi.org/10.1038/s41375-019-0402-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687545
August 2019

Human papillomavirus and the landscape of secondary genetic alterations in oral cancers.

Genome Res 2019 Jan 18;29(1):1-17. Epub 2018 Dec 18.

Department of Lymphoma and Myeloma, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.

Human papillomavirus (HPV) is a necessary but insufficient cause of a subset of oral squamous cell carcinomas (OSCCs) that is increasing markedly in frequency. To identify contributory, secondary genetic alterations in these cancers, we used comprehensive genomics methods to compare 149 HPV-positive and 335 HPV-negative OSCC tumor/normal pairs. Different behavioral risk factors underlying the two OSCC types were reflected in distinctive genomic mutational signatures. In HPV-positive OSCCs, the signatures of APOBEC cytosine deaminase editing, associated with anti-viral immunity, were strongly linked to overall mutational burden. In contrast, in HPV-negative OSCCs, T>C substitutions in the sequence context 5'-ATN-3' correlated with tobacco exposure. Universal expression of HPV and oncogenes was a sine qua non of HPV-positive OSCCs. Significant enrichment of somatic mutations was confirmed or newly identified in , , , , , , , , , , , , , , , , and Of these, many affect host pathways already targeted by HPV oncoproteins, including the p53 and pRB pathways, or disrupt host defenses against viral infections, including interferon (IFN) and nuclear factor kappa B signaling. Frequent copy number changes were associated with concordant changes in gene expression. Chr 11q (including ) and 14q (including and ) were recurrently lost in HPV-positive OSCCs, in contrast to their gains in HPV-negative OSCCs. High-ranking variant allele fractions implicated , , and mutations as candidate driver events in HPV-positive cancers. We conclude that virus-host interactions cooperatively shape the unique genetic features of these cancers, distinguishing them from their HPV-negative counterparts.

Source
DOI: http://dx.doi.org/10.1101/gr.241141.118
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6314162
January 2019

Genome-wide somatic variant calling using localized colored de Bruijn graphs.

Commun Biol 2018 Mar 22;1:20. Epub 2018 Mar 22.

New York Genome Center, New York, NY, 10013, USA.

Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system, which is essential for variant prioritization, and detects low-frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local-assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.
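
To make the colored de Bruijn graph idea concrete: each k-mer edge records how often it occurs in tumor and in normal reads, and edges seen only in tumor reads point to candidate somatic variants. A toy sketch under simplifying assumptions (exact k-mers, forward strand only, no error pruning, no local windows); it is not Lancet's implementation, and `min_support` is a hypothetical threshold:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (prefix (k-1)-mer, suffix (k-1)-mer)


def build_colored_dbg(reads_by_color: Dict[str, List[str]], k: int) -> Dict[Edge, Dict[str, int]]:
    """Count k-mer edge occurrences separately for each color ("tumor"/"normal")."""
    edges: Dict[Edge, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for color, reads in reads_by_color.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                edges[(kmer[:-1], kmer[1:])][color] += 1
    return edges


def tumor_only_kmers(edges: Dict[Edge, Dict[str, int]], min_support: int = 2) -> List[str]:
    """Return k-mers with tumor count >= min_support and no normal occurrences."""
    hits = []
    for (prefix, suffix), support in edges.items():
        if support.get("normal", 0) == 0 and support.get("tumor", 0) >= min_support:
            hits.append(prefix + suffix[-1])   # rebuild the k-mer from its edge
    return sorted(hits)


reads = {"normal": ["ACGTACGT"], "tumor": ["ACGTTCGT", "ACGTTCGT"]}
print(tumor_only_kmers(build_colored_dbg(reads, k=4)))
# ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

The single A>T change at 0-based read position 4 yields four tumor-only k-mers, exactly the k-mers that overlap the changed base; the shared k-mer `ACGT` is excluded because the normal sample also contains it.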

Source
DOI: http://dx.doi.org/10.1038/s42003-018-0023-9
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6123722
March 2018

Analytical Validation of Clinical Whole-Genome and Transcriptome Sequencing of Patient-Derived Tumors for Reporting Targetable Variants in Cancer.

J Mol Diagn 2018 Nov 21;20(6):822-835. Epub 2018 Aug 21.

New York Genome Center, New York, New York; Columbia University Medical Center, Columbia University, New York, New York.

We developed and validated a clinical whole-genome and transcriptome sequencing (WGTS) assay that provides a comprehensive genomic profile of a patient's tumor. We analyzed the ability to fully capture the mappable genome with sufficient sequencing coverage to precisely call DNA somatic single nucleotide variants, insertions/deletions, copy number variants, structural variants, and RNA gene fusions. New York State's Department of Health next-generation DNA sequencing guidelines were expanded to establish performance validation applicable to whole-genome and transcriptome sequencing. Whole-genome sequencing laboratory protocols were validated for the Illumina HiSeq X Ten platform, and RNA sequencing protocols for the Illumina HiSeq 2500 platform, for fresh or frozen and formalin-fixed, paraffin-embedded tumor samples. Various bioinformatics tools were also tested, and CIs for sensitivity and specificity thresholds in calling clinically significant somatic aberrations were determined. The validation was performed on a set of 125 tumor-normal pairs. RNA sequencing was performed to call fusions and to confirm the DNA variants or exonic alterations. Here, we present our results and WGTS standards for variant allele frequency, reproducibility, and analytical sensitivity, together with limit of detection analyses for single nucleotide variant calling, copy number identification, and structural variants. We show that the New York Genome Center WGTS clinical assay can provide a comprehensive patient variant discovery approach suitable for directed oncologic therapeutic applications.
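
The abstract mentions confidence intervals for sensitivity and specificity. One common way to put a CI on such a proportion is the Wilson score interval, sketched below; the abstract does not say which interval method the validation used, so treat this choice and the 95-of-100 example counts as assumptions:

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp only to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# e.g. 95 of 100 known variants detected (hypothetical counts)
low, high = wilson_interval(95, 100)
print(f"sensitivity 0.95, 95% CI {low:.3f}-{high:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains sensible when sensitivity is close to 1, which is the usual regime for a validated assay.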

Source
DOI: http://dx.doi.org/10.1016/j.jmoldx.2018.06.007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6198246
November 2018

Whole Genome Sequencing-Based Discovery of Structural Variants in Glioblastoma.

Methods Mol Biol 2018;1741:1-29

New York Genome Center, New York, NY, USA.

Next-generation DNA sequencing (NGS) technologies are currently being applied in both research and clinical settings for the understanding and management of disease. The goal is to use high-throughput sequencing to identify specific variants that drive tumorigenesis within each individual's tumor genomic profile. The significance of copy number and structural variants in glioblastoma makes it essential to broaden the search beyond oncogenic single nucleotide variants toward whole genome profiles of genetic aberrations that may contribute to disease progression. The heterogeneity of glioblastoma and its variability of cancer driver mutations necessitate a more robust examination of a patient's tumor genome. Here, we present patient whole genome sequencing (WGS) information to identify oncogenic structural variants that may contribute to glioblastoma pathogenesis. We provide WGS protocols and bioinformatics approaches to identify copy number and structural variations in 41 glioblastoma patient samples. We present how WGS can identify structural diversity within glioblastoma samples. We specifically show how to apply current bioinformatics tools to detect EGFR variants and other structural aberrations from DNA whole genome sequencing and how to validate those variants within the laboratory. These comprehensive WGS protocols can provide additional information directing more precise therapeutic options in the treatment of glioblastoma.
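
Copy number changes such as EGFR amplification appear in WGS as shifts in read depth between tumor and matched normal. A minimal read-depth sketch, assuming per-bin read counts are already available; the median normalization and the ±0.4 log2 thresholds are illustrative assumptions, not the chapter's protocol or any specific tool's defaults:

```python
from math import log2
from statistics import median
from typing import List


def call_copy_number(tumor: List[int], normal: List[int],
                     gain: float = 0.4, loss: float = -0.4) -> List[str]:
    """Label each genomic bin as gain/loss/neutral from tumor/normal read counts."""
    # Median tumor/normal ratio across usable bins serves as the diploid baseline.
    baseline = median(t / n for t, n in zip(tumor, normal) if t > 0 and n > 0)
    calls = []
    for t, n in zip(tumor, normal):
        if t == 0 or n == 0:
            calls.append("no_call")        # log ratio undefined
            continue
        log_ratio = log2((t / n) / baseline)
        if log_ratio >= gain:
            calls.append("gain")
        elif log_ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Four bins: two unchanged, one amplified (4x), one single-copy loss (0.5x).
print(call_copy_number([100, 100, 400, 50], [100, 100, 100, 100]))
# ['neutral', 'neutral', 'gain', 'loss']
```

Median-centering keeps a single strongly amplified bin from shifting the baseline; real callers additionally correct for GC content, mappability and tumor purity.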

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-7659-1_1
December 2018

Next-Generation Rapid Autopsies Enable Tumor Evolution Tracking and Generation of Preclinical Models.

JCO Precis Oncol 2017 Jun 14;2017. Epub 2017 Jun 14.

Weill Cornell Medicine. The Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine and NewYork-Presbyterian Hospital.

Purpose: Patients with cancer who graciously consent for autopsy represent an invaluable resource for the study of cancer biology. To advance the study of tumor evolution, metastases, and resistance to treatment, we developed a next-generation rapid autopsy program integrated within a broader precision medicine clinical trial that interrogates pre- and postmortem tissue samples for patients of all ages and cancer types.

Materials And Methods: One hundred twenty-three (22%) of 554 patients who consented to the clinical trial also consented for rapid autopsy. This report comprises the first 15 autopsies, including patients with metastatic carcinoma (n = 10), melanoma (n = 1), and glioma (n = 4). Whole-exome sequencing (WES) was performed on frozen autopsy tumor samples from multiple anatomic sites and on non-neoplastic tissue. RNA sequencing (RNA-Seq) was performed on a subset of frozen samples. Tissue was also used for the development of preclinical models, including tumor organoids and patient-derived xenografts.

Results: Three hundred forty-six frozen samples were procured in total. WES was performed on 113 samples and RNA-Seq on 72 samples. Successful cell strain, tumor organoid, and/or patient-derived xenograft development was achieved in four samples, including an inoperable pediatric glioma. WES data were used to assess clonal evolution and molecular heterogeneity of tumors in individual patients. Mutational profiles of primary tumors and metastases yielded candidate mediators of metastatic spread and organotropism including and in metastatic ependymoma and in metastatic melanoma to the lung. RNA-Seq data identified novel gene fusion candidates.

Conclusion: A next-generation sequencing-based autopsy program in conjunction with a pre-mortem precision medicine pipeline for diverse tumors affords a valuable window into clonal evolution, metastasis, and alterations underlying treatment. Moreover, such an autopsy program yields robust preclinical models of disease.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5761727
DOI: http://dx.doi.org/10.1200/PO.16.00038
June 2017

Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

Neurol Genet 2017 Aug 11;3(4):e164. Epub 2017 Jul 11.

New York Genome Center (K.O.W., M.O.F., N.R., A.-K.E., B.-J.C., K.A., M.S., V.V., E.A.B., J.L.M.V., M.C.Z., V.J., R.B.D.); IBM Thomas J. Watson Research Center (T.K., K.R., F.U., R.N., E.B., L.P., A.K.R.); Columbia University Medical Center (J.N.B., A.B.L., P.C., V.J.); Memorial Sloan-Kettering Cancer Center (C.G.), New York, NY; IBM Watson Health (S.H., V.V.M.), Boca Raton, FL; Laboratory of Molecular Neuro-Oncology (M.O.F., R.B.D.), and Howard Hughes Medical Institute (R.B.D.), The Rockefeller University, New York, NY. B.-J.C. is currently affiliated with Google, New York, NY. V.V. is currently affiliated with 23andMe, Inc., Mountain View, CA. E.A.B. is currently affiliated with Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany.

Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each.

Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs.

Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts.

Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible.

Clinicaltrialsgov Identifier: NCT02725684.

Source
DOI: http://dx.doi.org/10.1212/NXG.0000000000000164
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506390
August 2017

PGBD5 promotes site-specific oncogenic mutations in human tumors.

Nat Genet 2017 Jul 15;49(7):1005-1014. Epub 2017 May 15.

Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Genomic rearrangements are a hallmark of human cancers. Here, we identify the piggyBac transposable element derived 5 (PGBD5) gene as encoding an active DNA transposase expressed in the majority of childhood solid tumors, including lethal rhabdoid tumors. Using assembly-based whole-genome DNA sequencing, we found previously undefined genomic rearrangements in human rhabdoid tumors. These rearrangements involved PGBD5-specific signal (PSS) sequences at their breakpoints and recurrently inactivated tumor-suppressor genes. PGBD5 was physically associated with genomic PSS sequences that were also sufficient to mediate PGBD5-induced DNA rearrangements in rhabdoid tumor cells. Ectopic expression of PGBD5 in primary immortalized human cells was sufficient to promote cell transformation in vivo. This activity required specific catalytic residues in the PGBD5 transposase domain as well as end-joining DNA repair and induced structural rearrangements with PSS breakpoints. These results define PGBD5 as an oncogenic mutator and provide a plausible mechanism for site-specific DNA rearrangements in childhood and adult solid tumors.

Source
DOI: http://dx.doi.org/10.1038/ng.3866
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5489359
July 2017

Whole-Exome Sequencing of Metastatic Cancer and Biomarkers of Treatment Response.

JAMA Oncol 2015 Jul;1(4):466-74

Centre of Integrative Biology, University of Trento, Trento, Italy.

Importance: Understanding molecular mechanisms of response and resistance to anticancer therapies requires prospective patient follow-up and clinical and functional validation of both common and low-frequency mutations. We describe a whole-exome sequencing (WES) precision medicine trial focused on patients with advanced cancer.

Objective: To understand how WES data affect therapeutic decision making in patients with advanced cancer and to identify novel biomarkers of response.

Design, Setting, And Patients: Patients with metastatic and treatment-resistant cancer were prospectively enrolled at a single academic center for paired metastatic tumor and normal tissue WES during a 19-month period (February 2013 through September 2014). A comprehensive computational pipeline was used to detect point mutations, indels, and copy number alterations. Mutations were categorized as category 1, 2, or 3 on the basis of actionability; clinical reports were generated and discussed in precision tumor board. Patients were observed for 7 to 25 months for correlation of molecular information with clinical response.

Main Outcomes And Measures: Feasibility, use of WES for decision making, and identification of novel biomarkers.

Results: A total of 154 tumor-normal pairs from 97 patients with a range of metastatic cancers were sequenced, with a mean coverage of 95X and 16 somatic alterations detected per patient. In total, 16 mutations were category 1 (targeted therapy available), 98 were category 2 (biologically relevant), and 1474 were category 3 (unknown significance). Overall, WES provided informative results in 91 cases (94%), including alterations for which there is an approved drug, there are therapies in clinical or preclinical development, or they are considered drivers and potentially actionable (category 1-2); however, treatment was guided in only 5 patients (5%) on the basis of these recommendations because of access to clinical trials and/or off-label use of drugs. Among unexpected findings, a patient with prostate cancer with exceptional response to treatment was identified who harbored a somatic hemizygous deletion of the DNA repair gene FANCA and putative partial loss of function of the second allele through germline missense variant. Follow-up experiments established that loss of FANCA function was associated with platinum hypersensitivity both in vitro and in patient-derived xenografts, thus providing biologic rationale and functional evidence for his extreme clinical response.

Conclusions And Relevance: The majority of advanced, treatment-resistant tumors across tumor types harbor biologically informative alterations. The establishment of a clinical trial for WES of metastatic tumors with prospective follow-up of patients can help identify candidate predictive biomarkers of response.

Source
DOI: http://dx.doi.org/10.1001/jamaoncol.2015.1313
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4505739
July 2015

Disease variants in genomes of 44 centenarians.

Mol Genet Genomic Med 2014 Sep 15;2(5):438-50. Epub 2014 Jun 15.

The Litwin-Zucker Research Center for the Study of Alzheimer's Disease and Memory Disorders, The Feinstein Institute for Medical Research, North Shore-LIJ Manhasset, New York, 11030.

To identify previously reported disease mutations that are compatible with extraordinary longevity, we screened the coding regions of the genomes of 44 Ashkenazi Jewish centenarians. Individual genome sequences were generated with 30× coverage on the Illumina HiSeq 2000 and single-nucleotide variants were called with the genome analysis toolkit (GATK). We identified 130 coding variants that were annotated as "pathogenic" or "likely pathogenic" based on the ClinVar database and that are infrequent in the general population. These variants were previously reported to cause a wide range of degenerative, neoplastic, and cardiac diseases with autosomal dominant, autosomal recessive, and X-linked inheritance. Several of these variants are located in genes that harbor actionable incidental findings, according to the recommendations of the American College of Medical Genetics. In addition, we found risk variants for late-onset neurodegenerative diseases, such as the APOE ε4 allele that was even present in a homozygous state in one centenarian who did not develop Alzheimer's disease. Our data demonstrate that the incidental finding of certain reported disease variants in an individual genome may not preclude an extraordinarily long life. When the observed variants are encountered in the context of clinical sequencing, it is thus important to exercise caution in justifying clinical decisions.

Source
DOI: http://dx.doi.org/10.1002/mgg3.86
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4190879
September 2014

Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions.

Genome Biol 2014 Aug 28;15(8):454. Epub 2014 Aug 28.

Background: Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely available for clinical use and additional molecularly targeted agents are considered as treatment options in colorectal cancer, it is important to characterize the extent of tumor heterogeneity between primary and metastatic tumors.

Results: We performed deep coverage, targeted next-generation sequencing of 230 key cancer-associated genes for 69 matched primary and metastatic tumors and normal tissue. Mutation profiles were 100% concordant for KRAS, NRAS, and BRAF, and were highly concordant for recurrent alterations in colorectal cancer. Additionally, whole genome sequencing of four patient trios did not reveal any additional site-specific targetable alterations.

Conclusions: Colorectal cancer primary tumors and metastases exhibit high genomic concordance. As current clinical practices in colorectal cancer revolve around KRAS, NRAS, and BRAF mutation status, diagnostic sequencing of either primary or metastatic tissue as available is acceptable for most patients. Additionally, consistency between targeted sequencing and whole genome sequencing results suggests that targeted sequencing may be a suitable strategy for clinical diagnostic applications.

Source
DOI: http://dx.doi.org/10.1186/s13059-014-0454-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189196
August 2014

Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone.

Bioinformatics 2014 Dec 14;30(24):3484-90. Epub 2014 Jul 14.

Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany, Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany and New York Genome Center, New York, NY 10013, USA.

Motivation: The landscape of structural variation (SV) including complex duplication and translocation patterns is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation and struggle to correctly classify the type and exact size of SVs.

Results: We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split SV detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥ 30 bp. Our approach is based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution. We show that Gustaf correctly identifies SVs, especially in the range from 30 to 100 bp, which we call the next-generation sequencing (NGS) twilight zone of SVs, as well as larger SVs >500 bp. Gustaf performs better than similar tools in our benchmark and is furthermore able to correctly identify size and location of dispersed duplications and translocations, which otherwise might be wrongly classified, for example, as large deletions.
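
For intuition about why classifying SVs from split reads is error-prone, here is a naive classifier for a read split into two aligned segments, given in read order. It is a deliberately simple heuristic, not Gustaf's multi-split approach; the `Segment` type is hypothetical and the 30 bp default merely borrows the size bound from the abstract. A single two-segment split like this cannot tell a dispersed duplication from a deletion, which is the kind of misclassification the abstract describes:

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive
    strand: str     # "+" or "-"


def classify_split(first: Segment, second: Segment, min_size: int = 30) -> str:
    """Naively classify a two-segment split alignment (segments in read order)."""
    if first.chrom != second.chrom:
        return "translocation"
    if first.strand != second.strand:
        return "inversion"
    if first.strand == "-":
        # On the reverse strand, read order runs right-to-left on the reference.
        first, second = second, first
    gap = second.ref_start - first.ref_end
    if gap >= min_size:
        return "deletion"        # reference sequence skipped by the read
    if -gap >= min_size:
        return "duplication"     # read jumps back: reference sequence seen twice
    return "unclassified"


print(classify_split(Segment("chr1", 100, 150, "+"), Segment("chr1", 250, 300, "+")))
# deletion
```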

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btu431
December 2014

Detection of a recurrent DNAJB1-PRKACA chimeric transcript in fibrolamellar hepatocellular carcinoma.

Science 2014 Feb;343(6174):1010-4

Laboratory of Cellular Biophysics, Rockefeller University, 1230 York Avenue, New York, NY 10065, USA.

Fibrolamellar hepatocellular carcinoma (FL-HCC) is a rare liver tumor affecting adolescents and young adults with no history of primary liver disease or cirrhosis. We identified a chimeric transcript that is expressed in FL-HCC but not in adjacent normal liver and that arises as the result of a ~400-kilobase deletion on chromosome 19. The chimeric RNA is predicted to code for a protein containing the amino-terminal domain of DNAJB1, a homolog of the molecular chaperone DNAJ, fused in frame with PRKACA, the catalytic domain of protein kinase A. Immunoprecipitation and Western blot analyses confirmed that the chimeric protein is expressed in tumor tissue, and a cell culture assay indicated that it retains kinase activity. Evidence supporting the presence of the DNAJB1-PRKACA chimeric transcript in 100% of the FL-HCCs examined (15/15) suggests that this genetic alteration contributes to tumor pathogenesis.

Source
DOI: http://dx.doi.org/10.1126/science.1249484
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286414
February 2014

Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads.

Bioinformatics 2012 Apr 1;28(7):1024-5. Epub 2012 Feb 1.

Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.

Summary: We developed Breakpointer, a fast algorithm to locate breakpoints of structural variants (SVs) from single-end reads produced by next-generation sequencing. By taking advantage of local non-uniform read distribution and misalignments created by SVs, Breakpointer scans the alignment of single-end reads to identify regions containing potential breakpoints. The detection of such breakpoints can indicate insertions longer than the read length and SVs located in repetitive regions, which might be missed by other methods. Thus, Breakpointer complements existing methods to locate SVs from single-end reads.

Availability: https://github.com/ruping/Breakpointer

Supplementary Information: Supplementary material is available at Bioinformatics online.
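
To illustrate the read-distribution signal described in the summary, the following Python sketch flags fixed-size windows whose count of single-end read starts departs sharply from the median over all windows. It is a sketch under stated assumptions, not Breakpointer's algorithm: the function name `candidate_breakpoint_windows`, the window size and the `fold` threshold are hypothetical, and the misalignment signal that Breakpointer also exploits is not modelled.

```python
from collections import Counter
from statistics import median


def candidate_breakpoint_windows(read_starts, genome_length, window=100, fold=3.0):
    """Hypothetical coverage scan: return the start coordinate of every window
    whose number of read starts is more than `fold` times above or below the
    median over all windows. Only the read-distribution signal is modelled."""
    n_windows = (genome_length + window - 1) // window
    # Bin single-end read start positions into fixed-size windows.
    counts = Counter(pos // window for pos in read_starts if 0 <= pos < genome_length)
    per_window = [counts.get(i, 0) for i in range(n_windows)]
    baseline = median(per_window)
    flagged = []
    for i, c in enumerate(per_window):
        pile_up = c > fold * baseline   # e.g. reads piling up next to a junction
        drop_out = c * fold < baseline  # e.g. reads failing to map across an SV
        if pile_up or drop_out:
            flagged.append(i * window)
    return flagged


# Toy data: 10 reads per window, a pile-up in window 3 and no reads in window 7.
starts = [w * 100 + j
          for w in range(10)
          for j in range(50 if w == 3 else 0 if w == 7 else 10)]
print(candidate_breakpoint_windows(starts, 1000))  # [300, 700]
```

Using the median rather than the mean as the baseline keeps a single large pile-up from masking itself; a real tool would also have to account for mappability and GC bias.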

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts064
April 2012

Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS.

Bioinformatics 2012 Mar 11;28(5):619-27. Epub 2012 Jan 11.

Department of Computer Science, Freie Universität Berlin, Takustrasse 9, Max-Planck-Institute for Molecular Genetics, Berlin, Germany.

Motivation: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map.

Results: Here we present a method for 'split' read mapping, in which the prefix and suffix matches of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, applying SplazerS to targeted resequencing data led to the discovery of a complete, possibly functional gene retrocopy variant.

Availability: SplazerS is available from http://www.seqan.de/projects/splazers.

Supplementary Information: Supplementary data are available at Bioinformatics online.
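
The split-read idea can be illustrated with a deliberately naive Python sketch for the deletion case only: every split point `p` of the read is tried, the prefix is matched exactly in the reference, and the suffix is searched further downstream, the skipped reference bases giving the deletion length. Exact matching, the function name `split_map_deletion` and its minimum-length parameters are assumptions for illustration; SplazerS itself tolerates sequencing errors, also handles insertions, and does not work by repeated `str.find` calls.

```python
def split_map_deletion(read, ref, min_prefix=5, min_suffix=5, min_del=1):
    """Naive exact split mapping for deletions (illustrative assumption).

    Returns tuples (ref_start, prefix_len, deletion_start, deletion_len):
    read[:prefix_len] matches ref at ref_start, and the rest of the read
    matches ref again after skipping deletion_len bases at deletion_start.
    """
    hits = []
    for p in range(min_prefix, len(read) - min_suffix + 1):
        prefix, suffix = read[:p], read[p:]
        start = ref.find(prefix)
        while start != -1:
            split = start + p  # first reference base not covered by the prefix
            # The suffix must resume at least `min_del` bases downstream.
            j = ref.find(suffix, split + min_del)
            while j != -1:
                hits.append((start, p, split, j - split))
                j = ref.find(suffix, j + 1)
            start = ref.find(prefix, start + 1)
    return hits


ref = "ACGTTGCA" + "G" * 10 + "TCAGATCC"
read = "ACGTTGCA" + "TCAGATCC"        # the 10 G's are deleted in the sample
print(split_map_deletion(read, ref))  # [(0, 8, 8, 10)]
```

When the sequence flanking a deletion is repetitive, several split points explain the same read equally well, so a practical tool must decide how to report such ambiguous breakpoints.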

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts019
March 2012

A novel and well-defined benchmarking method for second generation read mapping.

BMC Bioinformatics 2011 May 26;12:210. Epub 2011 May 26.

Department of Computer Science, Free University of Berlin, Takustr, Germany.

Background: Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task.

Results: We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format.

Conclusions: We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html.
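
To make "a strict definition of the read mapping problem" more concrete, here is a small Python sketch in the spirit of such a benchmark; it is an assumption, not Rabema's code. For a read, it computes with Sellers' dynamic program the edit distance of the best alignment ending at every reference position, groups consecutive end positions within `max_errors` into intervals, and counts a mapper's reported end position as correct if it falls inside one of them. Grouping by intervals and all function names are choices made for this illustration.

```python
def end_position_distances(read, ref):
    """Sellers' algorithm: entry j is the smallest edit distance between `read`
    and any substring of `ref` that ends at ref[j]."""
    m = len(read)
    prev = list(range(m + 1))  # aligning i read bases to an empty prefix costs i
    best = []
    for t in ref:
        cur = [0] * (m + 1)  # row 0 is 0: an alignment may start anywhere in ref
        for i in range(1, m + 1):
            cost = 0 if read[i - 1] == t else 1
            cur[i] = min(prev[i - 1] + cost, prev[i] + 1, cur[i - 1] + 1)
        best.append(cur[m])
        prev = cur
    return best


def match_intervals(read, ref, max_errors):
    """Group consecutive end positions within `max_errors` into (lo, hi) intervals."""
    intervals, lo = [], None
    for j, d in enumerate(end_position_distances(read, ref)):
        if d <= max_errors and lo is None:
            lo = j
        elif d > max_errors and lo is not None:
            intervals.append((lo, j - 1))
            lo = None
    if lo is not None:
        intervals.append((lo, len(ref) - 1))
    return intervals


def mapper_found(intervals, reported_end):
    """Count a mapper's hit as correct if its end position lies in an interval."""
    return any(lo <= reported_end <= hi for lo, hi in intervals)


print(match_intervals("ACGT", "TTACGTTT", 0))  # [(5, 5)]
print(match_intervals("ACGT", "TTACGTTT", 1))  # [(4, 6)]
```

Treating a whole interval as one match avoids penalising a mapper for reporting a slightly shifted end position of what is biologically the same alignment.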

Source
DOI: http://dx.doi.org/10.1186/1471-2105-12-210
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128034
May 2011

MicroRazerS: rapid alignment of small RNA reads.

Bioinformatics 2010 Jan 29;26(1):123-4. Epub 2009 Oct 29.

Department of Computer Science, Free University of Berlin, Takustr. 9, Berlin, Germany.

Motivation: Deep sequencing has become the method of choice for determining the small RNA content of a cell. Mapping the sequenced reads onto their reference genome serves as the basis for all further analyses, namely for identification and quantification. A method frequently used is Mega BLAST followed by several filtering steps, even though it is slow and inefficient for this task. Also, none of the currently available short read aligners has established itself for the particular task of small RNA mapping.

Results: We present MicroRazerS, a tool optimized for mapping small RNAs onto a reference genome. It is an order of magnitude faster than Mega BLAST and comparable in speed with other short read mapping tools. In addition, it is more sensitive and easier to handle and adjust.

Availability: MicroRazerS is part of the SeqAn C++ library and can be downloaded from http://www.seqan.de/projects/MicroRazerS.html.
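
One strategy often used for small-RNA reads is to require a near-exact match only for a 5' seed, so that untrimmed 3' adapter bases do not prevent a hit. The Python sketch below shows that idea as a naive forward-strand scan; it is an assumption for illustration, not a description of MicroRazerS internals, and the function name `seed_hits` and its default parameters are hypothetical.

```python
def seed_hits(read, ref, seed_len=16, max_mismatches=1):
    """Return (position, mismatches) for every forward-strand reference position
    where the read's 5' seed matches with at most `max_mismatches` mismatches.
    Bases after the seed (e.g. untrimmed adapter) are ignored."""
    seed = read[:seed_len]
    hits = []
    for pos in range(len(ref) - len(seed) + 1):
        mismatches = 0
        for a, b in zip(seed, ref[pos:pos + len(seed)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this position cannot qualify
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


ref = "GGACGTTGCAGGACGATGCA"
read = "ACGTTGCA" + "TTTT"  # 8-base seed followed by an adapter-like tail
print(seed_hits(read, ref, seed_len=8))  # [(2, 0), (12, 1)]
```

A production mapper would replace the linear scan with an index or filter and would also search the reverse complement.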

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btp601
January 2010

RazerS--fast read mapping with sensitivity control.

Genome Res 2009 Sep 10;19(9):1646-54. Epub 2009 Jul 10.

Department of Computer Science, Free University of Berlin, 14195 Berlin, Germany.

Second-generation sequencing technologies deliver DNA sequence data at unprecedented high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can run either losslessly or, at higher speed, with a user-defined loss rate. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time.
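
Lossless filtering of the kind described above is commonly built on q-gram counting. The Python sketch below illustrates the standard q-gram lemma in its Hamming-distance form: a read of length n that differs from a reference window in at most k positions shares at least t = n + 1 - (k + 1)q of its q-grams at the same offsets, because each mismatch destroys at most q of the n - q + 1 q-grams. Assuming this lemma as the basis of RazerS's filter, and the function names used here, are illustrative choices; the paper's loss-rate computation is not reproduced.

```python
def qgram_threshold(n, q, k):
    """q-gram lemma: minimum number of shared q-grams for <= k errors."""
    return n + 1 - (k + 1) * q


def shared_qgrams(read, window, q):
    """Count offsets i at which read and window have the same q-gram."""
    return sum(read[i:i + q] == window[i:i + q] for i in range(len(read) - q + 1))


def passes_filter(read, window, q, k):
    """Lossless filter for Hamming distance <= k: a window failing this test
    cannot be a match, so only passing windows need exact verification."""
    t = qgram_threshold(len(read), q, k)
    if t <= 0:
        return True  # the lemma gives no guarantee; every window must be verified
    return shared_qgrams(read, window, q) >= t


read = "ACGTACGTAC"
print(qgram_threshold(len(read), 3, 1))         # 5
print(passes_filter(read, "ACGTTCGTAC", 3, 1))  # True (one mismatch)
print(passes_filter(read, "TTTTTTTTTT", 3, 1))  # False
```

A lossy variant would use a threshold above t, discarding more windows before verification at the price of occasionally missing a true match; choosing such parameters for a target loss rate is the subject of the paper.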

Source
DOI: http://dx.doi.org/10.1101/gr.088823.108
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2752123
September 2009

A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads.

Bioinformatics 2009 May 5;25(9):1118-24. Epub 2009 Mar 5.

International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr. 63-73, Algorithmische Bioinformatik, Institut für Informatik, Takustr. 9, 14195 Berlin, Germany.

Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.

Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools.

Availability: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.
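
Once a multi-read alignment exists, the simplest way to read a consensus off it is a column-wise majority vote. The Python sketch below shows only that final step under stated assumptions (reads given as gapped strings, '-' for gaps, a space for uncovered columns); it does not implement the consistency-based alignment described above, and `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(rows, pad=" "):
    """Column-wise majority vote over a gapped multi-read alignment.

    `rows` are aligned reads; '-' is a gap and `pad` marks columns a read does
    not cover. Ties go to the character seen first in the column."""
    width = max(len(r) for r in rows)
    rows = [r.ljust(width, pad) for r in rows]
    consensus = []
    for col in range(width):
        column = [r[col] for r in rows if r[col] != pad]
        if not column:
            continue  # no read covers this column
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":  # a majority gap means the column is dropped
            consensus.append(symbol)
    return "".join(consensus)


alignment = ["ACGT-ACG",
             "  GTTACGT",
             "ACGA-AC"]
print(majority_consensus(alignment))  # ACGTACGT
```

In the toy alignment the single 'A' in column 4 of the third read and the single inserted 'T' in the second read are both outvoted, which is the behaviour a consensus step is meant to have on isolated sequencing errors.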

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btp131
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732307
May 2009

Segment-based multiple sequence alignment.

Bioinformatics 2008 Aug;24(16):i187-92

International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr 63-73, 14195 Berlin, Germany.

Motivation: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and widespread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far.

Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences.

Availability: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A new version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in the documentation of both.
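
The consistency notion applied to segments can be sketched with the T-Coffee-style triplet extension: the score of aligning segment u with segment v is raised by min(w(u, z), w(z, v)) for every segment z of a third sequence that is aligned to both. The Python sketch below assumes segments are already defined and weighted; it does not address the segmentation problem the paper solves, the exact extension formula used in the paper is assumed rather than confirmed, and `extend_library` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(weights):
    """T-Coffee-style triplet extension (assumed here, not taken from the paper).

    `weights` maps frozenset({u, v}) to a positive score, where nodes are
    (sequence_id, segment_id) tuples. The extended score of u-v is its direct
    score plus, for every node z of a third sequence, min(w(u, z), w(z, v))."""
    adj = defaultdict(dict)
    for edge, w in weights.items():
        u, v = tuple(edge)
        adj[u][v] = w
        adj[v][u] = w
    extended = {}
    for u, v in combinations(sorted(adj), 2):
        if u[0] == v[0]:
            continue  # segments of the same sequence are never aligned
        score = adj[u].get(v, 0.0)
        for z, w_uz in adj[u].items():
            if z[0] in (u[0], v[0]):
                continue  # only intermediate segments from a third sequence
            w_zv = adj[z].get(v)
            if w_zv is not None:
                score += min(w_uz, w_zv)
        if score > 0:
            extended[frozenset((u, v))] = score
    return extended


lib = {frozenset({("A", 0), ("B", 0)}): 1.0,
       frozenset({("B", 0), ("C", 0)}): 2.0}
ext = extend_library(lib)
print(ext[frozenset({("A", 0), ("C", 0)})])  # 1.0, inferred via B
```

The toy library never states that segment 0 of A and segment 0 of C belong together, yet the extension infers it from their shared partner in B; a progressive aligner would then use these extended scores instead of the raw pairwise ones.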

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btn281
August 2008

Analytical model of peptide mass cluster centres with applications.

Proteome Sci 2006 Sep 23;4:18. Epub 2006 Sep 23.

School of Mathematics and Statistics, Merz Court, University of Newcastle upon Tyne, NE1 7RU, UK.

Background: The elemental composition of peptides results in the formation of distinct, equidistantly spaced clusters across the mass range. The property of peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks and for data reduction.

Results: We developed an analytical model of the peptide mass cluster centres. Inputs to the model included the amino acid frequencies in the sequence database, the average length of the proteins in the database, the cleavage specificity of the proteolytic enzyme used and the cleavage probability. We examined the accuracy of our model by comparing it with a model based on an in silico digest of the sequence database. To identify the crucial parameters, we analysed how the cluster centre location depends on the inputs. The distance to the nearest cluster was used to calibrate mass spectrometric peptide peak lists and to identify non-peptide peaks.

Conclusion: The model introduced here enables us to predict the location of the peptide mass cluster centres. It explains how the location of the cluster centres depends on the input parameters. Fast and efficient calibration and filtering of non-peptide peaks is achieved by a distance measure suggested by Wool and Smilansky.
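
The filtering use of the model can be illustrated with a distance-to-nearest-cluster check. The Python sketch below assumes, for simplicity, that cluster centres lie at integer multiples of a fixed spacing; the default of 1.000495 Da is a commonly quoted value used here only as a placeholder, since the point of the paper is that the centre location depends on the database, the enzyme and the cleavage probability. The tolerance and function names are hypothetical, and this simple rounding distance is not claimed to be the Wool and Smilansky measure.

```python
def distance_to_nearest_cluster(mass, spacing=1.000495):
    """Signed distance (Da) from `mass` to the nearest cluster centre, assuming
    centres lie at integer multiples of `spacing` (a placeholder value)."""
    k = round(mass / spacing)
    return mass - k * spacing


def flag_non_peptide(masses, spacing=1.000495, tolerance=0.2):
    """Return masses that fall between clusters, i.e. likely non-peptide peaks."""
    return [m for m in masses
            if abs(distance_to_nearest_cluster(m, spacing)) > tolerance]


print(flag_non_peptide([1000.495, 1000.0, 2000.99]))  # [1000.0]
```

For calibration, one would instead fit a correction to a whole peak list so that the distances of its peaks to their nearest centres become small; that fitting step is not shown.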

Source
DOI: http://dx.doi.org/10.1186/1477-5956-4-18
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617084
September 2006