Publications by authors named "Arek Kasprzyk"

19 Publications

Pathway-based subnetworks enable cross-disease biomarker discovery.

Nat Commun 2018 11 12;9(1):4746. Epub 2018 Nov 12.

Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, M5G 0A3, Canada.

Biomarkers lie at the heart of precision medicine. Surprisingly, while rapid genomic profiling is becoming ubiquitous, the development of biomarkers usually involves the application of bespoke techniques that cannot be directly applied to other datasets. There is an urgent need for a systematic methodology to create biologically-interpretable molecular models that robustly predict key phenotypes. Here we present SIMMS (Subnetwork Integration for Multi-Modal Signatures): an algorithm that fragments pathways into functional modules and uses these to predict phenotypes. We apply SIMMS to multiple data types across five diseases, and in each it reproducibly identifies known and novel subtypes, and makes superior predictions to the best bespoke approaches. To demonstrate its ability on a new dataset, we profile 33 genes/nodes of the PI3K pathway in 1734 FFPE breast tumors and create a four-subnetwork prediction model. This model out-performs a clinically-validated molecular test in an independent cohort of 1742 patients. SIMMS is generic and enables systematic data integration for robust biomarker discovery.

Source
DOI: http://dx.doi.org/10.1038/s41467-018-07021-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6232113
November 2018

The BioMart community portal: an innovative alternative to large, centralized data repositories.

Nucleic Acids Res 2015 Jul 20;43(W1):W589-98. Epub 2015 Apr 20.

Oncology Computational Biology, Pfizer, La Jolla, USA.

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart Community Portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.

Source
DOI: http://dx.doi.org/10.1093/nar/gkv350
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489294
July 2015

Integrating RAS status into prognostic signatures for adenocarcinomas of the lung.

Clin Cancer Res 2015 Mar 21;21(6):1477-86. Epub 2015 Jan 21.

Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Canada. Princess Margaret Cancer Centre, University Health Network, Toronto, Canada. Department of Medical Biophysics, University of Toronto, Toronto, Canada. Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada.

Purpose: While the dysregulation of specific pathways in cancer influences both treatment response and outcome, few current prognostic markers explicitly consider differential pathway activation. Here we explore this concept, focusing on K-Ras mutations in lung adenocarcinoma (present in 25%-35% of patients).

Experimental Design: The effect of K-Ras mutation status on prognostic accuracy of existing signatures was evaluated in 404 patients. Genes associated with K-Ras mutation status were identified and used to create a RAS pathway activation classifier to provide a more accurate measure of RAS pathway status. Next, 8 million random signatures were evaluated to assess differences in prognosing patients with or without RAS activation. Finally, a prognostic signature was created to target patients with RAS pathway activation.

Results: We first show that K-Ras status influences the accuracy of existing prognostic signatures, which are effective in K-Ras-wild-type patients but fail in patients with K-Ras mutations. Next, we show that it is fundamentally more difficult to predict the outcome of patients with RAS activation (RAS(mt)) than that of those without (RAS(wt)). More importantly, we demonstrate that different signatures are prognostic in RAS(wt) and RAS(mt). Finally, to exploit this discovery, we create separate prognostic signatures for RAS(wt) and RAS(mt) patients and show that combining them significantly improves predictions of patient outcome.

Conclusions: We present a nested model for integrated genomic and transcriptomic data. This model is general and is not limited to lung adenocarcinomas but can be expanded to other tumor types and oncogenes.

Source
DOI: http://dx.doi.org/10.1158/1078-0432.CCR-14-1749
March 2015

The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.

J Biomed Semantics 2013 Feb 11;4(1). Epub 2013 Feb 11.

Database Center for Life Science, Research Organization of Information and Systems, 2-11-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.

Background: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Science (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.

Results: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and the interoperability of resources. We consequently developed tools and clients for analysis and visualization.

Conclusion: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.

Source
DOI: http://dx.doi.org/10.1186/2041-1480-4-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3598643
February 2013

The prognostic value of temporal in vitro and in vivo derived hypoxia gene-expression signatures in breast cancer.

Radiother Oncol 2012 Mar 20;102(3):436-43. Epub 2012 Feb 20.

Department of Radiation Oncology, MaastRO, GROW-School for Oncology and Developmental Biology, Maastricht University Medical Center, Maastricht, The Netherlands.

Background And Purpose: Recent data suggest that in vitro and in vivo derived hypoxia gene-expression signatures have prognostic power in breast and possibly other cancers. However, both tumour hypoxia and the biological adaptation to this stress are highly dynamic. Assessment of time-dependent gene-expression changes in response to hypoxia may thus provide additional biological insights and assist in predicting the impact of hypoxia on patient prognosis.

Materials And Methods: Transcriptome profiling was performed for three cell lines derived from diverse tumour-types after hypoxic exposure at eight time-points, which include a normoxic time-point. Time-dependent sets of co-regulated genes were identified from these data. Subsequently, gene ontology (GO) and pathway analyses were performed. The prognostic power of these novel signatures was assessed in parallel with previous in vitro and in vivo derived hypoxia signatures in a large breast cancer microarray meta-dataset (n=2312).

Results: We identified seven recurrent temporal and two general hypoxia signatures. GO and pathway analyses revealed regulation of both common and unique underlying biological processes within these signatures. None of the new or previously published in vitro signatures consisting of hypoxia-induced genes were prognostic in the large breast cancer dataset. In contrast, signatures of repressed genes, as well as the in vivo derived signatures of hypoxia-induced genes showed clear prognostic power.

Conclusions: Only a subset of hypoxia-induced genes in vitro demonstrates prognostic value when evaluated in a large clinical dataset. Despite clear evidence of temporal patterns of gene-expression in vitro, the subset of prognostic hypoxia regulated genes cannot be identified based on temporal pattern alone. In vivo derived signatures appear to identify the prognostic hypoxia induced genes. The prognostic value of hypoxia-repressed genes is likely a surrogate for the known importance of proliferation in breast cancer outcome.

Source
DOI: http://dx.doi.org/10.1016/j.radonc.2012.02.002
March 2012

BioMart Central Portal: an open database network for the biological community.

Database (Oxford) 2011 18;2011:bar041. Epub 2011 Sep 18.

Ontario Institute for Cancer Research, Toronto, M5G 0A3, Canada.

BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.

Source
DOI: http://dx.doi.org/10.1093/database/bar041
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263598
January 2012

BioMart: a data federation framework for large collaborative projects.

Database (Oxford) 2011 19;2011:bar038. Epub 2011 Sep 19.

Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada.

BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.
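
The idea of federation described in this abstract, where independently hosted sources are presented as one queryable resource, can be illustrated with a toy sketch. This is not BioMart's implementation: the function name `federated_query`, the in-memory "sources", and the field names (`gene_id`, `symbol`, `variant_count`) are all hypothetical and chosen only for illustration.

```python
# Toy illustration of data federation (NOT BioMart's actual code).
# Each "source" stands in for a database hosted by a different group;
# here they are plain lists of dicts so the example is self-contained.

def federated_query(sources, link_key, wanted):
    """Join rows from independent sources on link_key.

    Only records for which every field in `wanted` is available
    (after combining all sources) are returned.
    """
    merged = {}
    for rows in sources:
        for row in rows:
            key = row[link_key]
            # Create the combined record on first sight, then add fields.
            merged.setdefault(key, {link_key: key}).update(row)
    return [
        {field: record[field] for field in wanted}
        for record in merged.values()
        if all(field in record for field in wanted)
    ]


# Hypothetical data held by two separate groups.
gene_source = [
    {"gene_id": "G1", "symbol": "ABC1"},
    {"gene_id": "G2", "symbol": "XYZ2"},
]
variant_source = [
    {"gene_id": "G1", "variant_count": 3},
]

result = federated_query(
    [gene_source, variant_source], "gene_id", ["symbol", "variant_count"]
)
print(result)
```

The user asks one question and receives one combined answer, while each source remains separately maintained. A real federated system would also have to handle remote access, query optimization and differing schemas, which this sketch leaves out.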

Source
DOI: http://dx.doi.org/10.1093/database/bar038
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175789
January 2012

International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data.

Database (Oxford) 2011 19;2011:bar026. Epub 2011 Sep 19.

Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada.

The International Cancer Genome Consortium (ICGC) is a collaborative effort to characterize genomic abnormalities in 50 different cancer types. To make this data available, the ICGC has created the ICGC Data Portal. Powered by the BioMart software, the Data Portal allows each ICGC member institution to manage and maintain its own databases locally, while seamlessly presenting all the data in a single access point for users. The Data Portal currently contains data from 24 cancer projects, including ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project. It consists of 3478 genomes and 13 cancer types and subtypes. Available open access data types include simple somatic mutations, copy number alterations, structural rearrangements, gene expression, microRNAs, DNA methylation and exon junctions. Additionally, simple germline variations are available as controlled access data. The Data Portal uses a web-based graphical user interface (GUI) to offer researchers multiple ways to quickly and easily search and analyze the available data. The web interface can assist in constructing complicated queries across multiple data sets. Several application programming interfaces are also available for programmatic access. Here we describe the organization, functionality, and capabilities of the ICGC Data Portal.

Source
DOI: http://dx.doi.org/10.1093/database/bar026
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263593
January 2012

The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.

J Biomed Semantics 2011 Aug 2;2. Epub 2011 Aug 2.

Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.

Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.

Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs.

Conclusions: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

Source
DOI: http://dx.doi.org/10.1186/2041-1480-2-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170566
August 2011

International network of cancer genome projects.

Nature 2010 Apr;464(7291):993-8

The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

Source
DOI: http://dx.doi.org/10.1038/nature08987
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2902243
April 2010

BioMart Central Portal--unified access to biological data.

Nucleic Acids Res 2009 Jul 6;37(Web Server issue):W23-7. Epub 2009 May 6.

EMBL-European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK.

BioMart Central Portal (www.biomart.org) offers a one-stop shop solution to access a wide array of biological databases. These include major biomolecular sequence, pathway and annotation databases such as Ensembl, Uniprot, Reactome, HGNC, Wormbase and PRIDE; for a complete list, visit http://www.biomart.org/biomart/martview. Moreover, the web server features seamless data federation, enabling cross-querying of these data sources in a user-friendly and unified way. The web server not only provides access through a web interface (MartView) but also supports programmatic access through a Perl API as well as RESTful and SOAP-oriented web services. The website is free and open to all users and there is no login requirement.
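
As a rough illustration of the RESTful access mentioned in this abstract, the sketch below builds a BioMart-style XML query and encodes it into a request URL without sending it. The details are assumptions rather than facts from the abstract: the `martservice` path, the dataset name `hsapiens_gene_ensembl`, the filter and attribute names, and the helper functions `build_biomart_query` and `martservice_url` are illustrative, and the server at www.biomart.org may no longer respond.

```python
# Minimal sketch of composing a BioMart-style XML query (no network access).
# Dataset, filter and attribute names below are assumed for illustration.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_biomart_query(dataset, attributes, filters=None):
    """Return an XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "0",            # no header row
        "uniqueRows": "1",        # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns are returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(base_url, query_xml):
    """Encode the XML query as the `query` parameter of a GET request."""
    return base_url + "?" + urlencode({"query": query_xml})


query_xml = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "hgnc_symbol"],
    {"chromosome_name": "5"},
)
# Assumed endpoint path; adjust to whichever BioMart server is in use.
url = martservice_url("http://www.biomart.org/biomart/martservice", query_xml)
print(url)
```

The resulting URL could be fetched with any HTTP client; the same XML can usually be sent as a POST body when the query is too long for a URL.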

Source
DOI: http://dx.doi.org/10.1093/nar/gkp265
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2703988
July 2009

BioMart--biological queries made easy.

BMC Genomics 2009 Jan 14;10:22. Epub 2009 Jan 14.

Ontario Institute for Cancer Research, MaRS Centre, 101 College Street, Toronto, Ontario, Canada.

Background: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides an advanced query interface, each of which must be learnt by the biologist before they can begin to query them. Frequently, more than one data source is required and for high-throughput analysis, cutting and pasting results between websites is certainly very time consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution.

Results: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape and Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results.

Conclusion: BioMart is an easy-to-use, generic and scalable system and has therefore become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible at http://www.biomart.org.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-10-22
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2649164
January 2009

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis.

Bioinformatics 2005 Aug;21(16):3439-40

Department of Electronical Engineering, ESAT-SCD, K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor creating a powerful environment for biological data mining.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bti525
August 2005

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

PLoS Biol 2004 Jun 20;2(6):e162. Epub 2004 Apr 20.

Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan.

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

Source
DOI: http://dx.doi.org/10.1371/journal.pbio.0020162
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC393292
June 2004

An overview of Ensembl.

Genome Res 2004 May 12;14(5):925-8. Epub 2004 Apr 12.

EMBL European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.

Source
DOI: http://dx.doi.org/10.1101/gr.1860604
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC479121
May 2004

EnsMart: a generic system for fast and flexible access to biological data.

Genome Res 2004 Jan;14(1):160-9

European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SH, UK.

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets.

Source
DOI: http://dx.doi.org/10.1101/gr.1645104
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC314293
January 2004

Narrowing and genomic annotation of the commonly deleted region of the 5q- syndrome.

Blood 2002 Jun;99(12):4638-41

Leukaemia Research Fund Molecular Haematology Unit, Nuffield Department of Clinical Laboratory Science, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom.

The 5q- syndrome is the most distinct of the myelodysplastic syndromes, and the molecular basis for this disorder remains unknown. We describe the narrowing of the common deleted region (CDR) of the 5q- syndrome to the approximately 1.5-megabases interval at 5q32 flanked by D5S413 and the GLRA1 gene. The Ensembl gene prediction program has been used for the complete genomic annotation of the CDR. The CDR is gene rich and contains 24 known genes and 16 novel (predicted) genes. Of 40 genes in the CDR, 33 are expressed in CD34(+) cells and, therefore, represent candidate genes since they are expressed within the hematopoietic stem/progenitor cell compartment. A number of the genes assigned to the CDR represent good candidates for the 5q- syndrome, including MEGF1, G3BP, and several of the novel gene predictions. These data now afford a comprehensive mutational/expression analysis of all candidate genes assigned to the CDR.

Source
DOI: http://dx.doi.org/10.1182/blood.v99.12.4638
June 2002