Publications by authors named "Simon Twigger"

34 Publications

Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder.

Nat Neurosci 2017 Apr 6;20(4):602-611. Epub 2017 Mar 6.

The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Canada.

We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible on a cloud platform and through a controlled-access internet portal. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions and deletions or copy number variations per ASD subject. We identified 18 new candidate ASD-risk genes and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (P = 6 × 10⁻⁴). In 294 of 2,620 (11.2%) ASD cases, a molecular basis could be determined and 7.2% of these carried copy number variations and/or chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.

Source: http://dx.doi.org/10.1038/nn.4524
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501701
April 2017

InterMOD: integrated data and tools for the unification of model organism research.

Sci Rep 2013;3:1802

Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, United Kingdom.

Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.

Source: http://dx.doi.org/10.1038/srep01802
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3647165
February 2014

The Rat Genome Database pathway portal.

Database (Oxford) 2011 Apr 8;2011:bar010. Epub 2011 Apr 8.

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226-3548, USA.

The set of interacting molecules collectively referred to as a pathway or network represents a fundamental structural unit, the building block of the larger, highly integrated networks of biological systems. The scientific community's interest in understanding the fine details of how pathways work, communicate with each other and synergize, and how alterations in one or several pathways may converge into a disease phenotype, places heightened demands on pathway data and information providers. To meet such demands, the Rat Genome Database [(RGD) http://rgd.mcw.edu] has adopted a multitiered approach to pathway data acquisition and presentation. Resources and tools are continuously added or expanded to offer more comprehensive pathway data sets as well as enhanced pathway data manipulation, exploration and visualization capabilities. At RGD, users can easily identify genes in pathways, see how pathways relate to each other and visualize pathways in a dynamic and integrated manner. They can access these and other components from several entry points and effortlessly navigate between them and they can download the data of interest. The Pathway Portal resources at RGD are presented, and future directions are discussed. Database URL: http://rgd.mcw.edu.

Source: http://dx.doi.org/10.1093/database/bar010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072770
September 2011

The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data.

Database (Oxford) 2011 Feb 14;2011:bar002. Epub 2011 Feb 14.

Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI 53226-3548, USA.

The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40,000 rat gene records as well as human and mouse orthologs, 1771 rat and 1911 human quantitative trait loci (QTLs) and 2209 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. A suite of tools has been developed to aid curators in acquiring and validating data objects, assigning nomenclature, attaching biological information to objects and making connections among data types. The software used to assign nomenclature, to create and edit objects and to make annotations to the data objects has been specifically designed to make the curation process as fast and efficient as possible. The user interfaces have been adapted to the work routines of the curators, creating a suite of tools that is intuitive and powerful. Database URL: http://rgd.mcw.edu.

Source: http://dx.doi.org/10.1093/database/bar002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041158
May 2011

The rat genome database curators: who, what, where, why.

PLoS Comput Biol 2009 Nov 26;5(11):e1000582. Epub 2009 Nov 26.

Rat Genome Database, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.

Source: http://dx.doi.org/10.1371/journal.pcbi.1000582
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775909
November 2009

Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

J Proteome Res 2009 Jun;8(6):3148-53

Biotechnology and Bioengineering Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin 53226, USA.

One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).

Source: http://dx.doi.org/10.1021/pr800970z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2691775
June 2009

The Rat Genome Database 2009: variation, ontologies and pathways.

Nucleic Acids Res 2009 Jan 7;37(Database issue):D744-9. Epub 2008 Nov 7.

Department of Physiology and Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA.

The Rat Genome Database (RGD, http://rgd.mcw.edu) was developed to provide a core resource for rat researchers combining genetic, genomic, pathway, phenotype and strain information with a focus on disease. RGD users are provided with access to structured and curated data from the molecular level through to the level of the whole organism, including the variations associated with disease phenotypes. To fully support use of the rat as a translational model for biological systems and human disease, RGD continues to curate these datasets while enhancing and developing tools to allow efficient and effective access to the data in a variety of formats including linear genome viewers, pathway diagrams and biological ontologies. To support pathophysiological analysis of data, RGD Disease Portals provide an entryway to integrated gene, QTL and strain data specific to a particular disease. In addition to tool and content development and maintenance, RGD promotes rat research and provides user education by creating and disseminating tutorials on the curated datasets, submission processes, and tools available at RGD. By curating, storing, integrating, visualizing and promoting rat data, RGD ensures that the investment made into rat genomics and genetics can be leveraged by all interested investigators.

Source: http://dx.doi.org/10.1093/nar/gkn842
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686558
January 2009

Big data: The future of biocuration.

Nature 2008 Sep;455(7209):47-50

The Zebrafish Information Network, 5291 University of Oregon, Eugene, Oregon 97403-5291, USA.

Source: http://dx.doi.org/10.1038/455047a
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2819144
September 2008

Using multiple ontologies to integrate complex biological data.

Comp Funct Genomics 2005;6(7-8):373-8

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA.

The strength of the rat as a model organism lies in its utility in pharmacology, biochemistry and physiology research. Data resulting from such studies is difficult to represent in databases and the creation of user-friendly data mining tools has proved difficult. The Rat Genome Database has developed a comprehensive ontology-based data structure and annotation system to integrate physiological data along with environmental and experimental factors, as well as genetic and genomic information. RGD uses multiple ontologies to integrate complex biological information from the molecular level to the whole organism, and to develop data mining and presentation tools. This approach allows RGD to indicate not only the phenotypes seen in a strain but also the specific values under each diet and atmospheric condition, as well as gender differences. Harnessing the power of ontologies in this way allows the user to gather and filter data in a customized fashion, so that a researcher can retrieve all phenotype readings for which a high hypoxia is a factor. Utilizing the same data structure for expression data, pathways and biological processes, RGD will provide a comprehensive research platform which allows users to investigate the conditions under which biological processes are altered and to elucidate the mechanisms of disease.

Source: http://dx.doi.org/10.1002/cfg.498
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447497
July 2011

Protein composition of plasminogen activator inhibitor type 1-derived endothelial microparticles.

Shock 2008 Apr;29(4):504-11

Department of Surgery, Medical College of Wisconsin and Children's Research Institute, Milwaukee, Wisconsin 53226, USA.

Endothelial microparticles (EMPs) are small vesicles released from the plasma membrane of endothelial cells in response to cell injury, apoptosis, or activation. Low levels of MPs are shed into the blood from the endothelium, but in some pathologic states, the number of EMPs is elevated. The mechanism of MP formation and the wide-ranging effects of elevated EMPs are poorly understood. Here, we report the protein composition of EMPs derived from human umbilical cord endothelial cells stimulated with plasminogen activator inhibitor type 1 (PAI-1). Two-dimensional gel electrophoresis followed by mass spectrometry identified 58 proteins, of which some were verified by Western blot analysis. Gene Ontology database searches revealed that proteins identified on PAI-1-derived EMPs are highly diverse. Endothelial microparticles are composed of proteins from different cellular components that exhibit multiple molecular functions and are involved in a variety of biological processes. Important insight is provided into the generation and protein composition of PAI-1-derived EMPs.

Source: http://dx.doi.org/10.1097/shk.0b013e3181454898
April 2008

Comparative proteomic analysis of PAI-1 and TNF-alpha-derived endothelial microparticles.

Proteomics 2008 Jun;8(12):2430-46

Department of Surgery, Medical College of Wisconsin and Children's Research Institute, Milwaukee, WI 53226, USA.

Endothelium-derived microparticles (EMPs) are small vesicles released from endothelial cells in response to cell injury, apoptosis, or activation. Elevated concentrations of EMPs have been associated with many inflammatory and vascular diseases. EMPs also mediate long range signaling and alter downstream cell function. Unfortunately, the molecular and cellular basis of microparticle production and downstream cell function is poorly understood. We hypothesize that EMPs generated by different agonists will produce distinct populations of EMPs with unique protein compositions. To test this hypothesis, different EMP populations were generated from human umbilical vein endothelial cells by stimulation with plasminogen activator inhibitor type 1 (PAI-1) or tumor necrosis factor-alpha (TNF-alpha) and subjected to proteomic analysis by LC/MS. We identified 432 common proteins in all EMP populations studied. Also identified were 231 proteins unique to control EMPs, 104 proteins unique to PAI-1 EMPs and 70 proteins unique to TNF-alpha EMPs. Interestingly, variations in protein abundance were found among many of the common EMP proteins, suggesting that differences exist between EMPs on a relative scale. Finally, gene ontology (GO) and KEGG pathway analysis revealed many functional similarities and few differences between the EMP populations studied. In summary, our results clearly indicate that EMPs generated by PAI-1 and TNF-alpha produce EMPs with overlapping but distinct protein compositions. These observations provide fundamental insight into the mechanisms regulating the production of these particles and their physiological role in numerous diseases.

Source: http://dx.doi.org/10.1002/pmic.200701029
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4753841
June 2008
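
The common-versus-unique protein counts reported in the abstract above (432 common, 231 unique to control, 104 unique to PAI-1, 70 unique to TNF-alpha) are the kind of result a simple set comparison produces. The following is only a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `partition` and the single-letter protein identifiers are hypothetical placeholders, and no real identification data is used.

```python
def partition(populations: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split protein IDs into those common to every population and
    those unique to exactly one population (hypothetical helper)."""
    # Proteins present in all populations.
    common = set.intersection(*populations.values())
    unique = {}
    for name, proteins in populations.items():
        # Union of every other population's proteins.
        others = set().union(*(p for n, p in populations.items() if n != name))
        unique[name] = proteins - others
    return common, unique


# Toy identifiers only; these are not proteins from the study.
toy = {
    "control": {"A", "B", "C"},
    "PAI-1": {"A", "B", "D"},
    "TNF-alpha": {"A", "E"},
}
common, unique = partition(toy)
```

In this toy input, "B" is shared by two of the three populations, so it counts as neither common nor unique; a three-way comparison leaves such partially shared proteins in their own category.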

What everybody should know about the rat genome and its online resources.

Nat Genet 2008 May;40(5):523-7

Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin 53226, USA.

It has been four years since the original publication of the draft sequence of the rat genome. Five groups are now working together to assemble, annotate and release an updated version of the rat genome. As the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for the rat's genomic resources to keep pace with the rat's prominence in the laboratory. In this commentary, we describe the current status of the rat genome sequence and the plans for its impending 'upgrade'. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, the RefSeq and Genes databases at the US National Center for Biotechnology Information, Genome Browser at the University of California Santa Cruz and the disease portals for cardiovascular disease and obesity at the Rat Genome Database.

Source: http://dx.doi.org/10.1038/ng0508-523
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2505193
May 2008

Progress and prospects in rat genetics: a community view.

Nat Genet 2008 May;40(5):516-22

Medical Research Council Clinical Sciences Centre and Imperial College London, Du Cane Road, London W12 0NN, UK.

The rat is an important system for modeling human disease. Four years ago, the rich 150-year history of rat research was transformed by the sequencing of the rat genome, ushering in an era of exceptional opportunity for identifying genes and pathways underlying disease phenotypes. Genome-wide association studies in human populations have recently provided a direct approach for finding robust genetic associations in common diseases, but identifying the precise genes and their mechanisms of action remains problematic. In the context of significant progress in rat genomic resources over the past decade, we outline achievements in rat gene discovery to date, show how these findings have been translated to human disease, and document an increasing pace of discovery of new disease genes, pathways and mechanisms. Finally, we present a set of principles that justify continuing and strengthening genetic studies in the rat model, and further development of genomic infrastructure for rat research.

Source: http://dx.doi.org/10.1038/ng.147
May 2008

PubSearch and PubFetch: a simple management system for semiautomated retrieval and annotation of biological information from the literature.

Curr Protoc Bioinformatics 2006 Mar;Chapter 9:Unit9.7

Carnegie Institution, Stanford, California, USA.

For most systems in biology, a large body of literature exists that describes the complexity of the system based on experimental results. Manual review of this literature to extract targeted information into biological databases is difficult and time consuming. To address this problem, we developed PubSearch and PubFetch, which store literature, keyword, and gene information in a relational database, index the literature with keywords and gene names, and provide a Web user interface for annotating the genes from experimental data found in the associated literature. A set of protocols is provided in this unit for installing, populating, running, and using PubSearch and PubFetch. In addition, we provide support protocols for performing controlled vocabulary annotations. Intended users of PubSearch and PubFetch are database curators and biology researchers interested in tracking the literature and capturing information about genes of interest in a more effective way than with conventional spreadsheets and lab notebooks.

Source: http://dx.doi.org/10.1002/0471250953.bi0907s13
March 2006

Exploring phenotypic data at the rat genome database.

Curr Protoc Bioinformatics 2006 Jul;Chapter 1:Unit 1.14

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.

The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have direct relevance to human-based research. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model-organism database that provides access to a wide variety of curated rat data such as genes and their homologs, quantitative trait loci, phenotypes, comparative mapping, and genome analysis. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. We show how to make associations with the genome and use comparative tools to link the rat with human and mouse in order to integrate results from these three species of critical biomedical importance.

Source: http://dx.doi.org/10.1002/0471250953.bi0114s14
July 2006

Interoperability with Moby 1.0--it's better than sharing your toothbrush!

Brief Bioinform 2008 May 31;9(3):220-31. Epub 2008 Jan 31.

The BioMoby project was initiated in 2001 from within the model organism database community. It aimed to standardize methodologies to facilitate information exchange and access to analytical resources, using a consensus driven approach. Six years later, the BioMoby development community is pleased to announce the release of the 1.0 version of the interoperability framework, registry Application Programming Interface and supporting Perl and Java code-bases. Together, these provide interoperable access to over 1400 bioinformatics resources worldwide through the BioMoby platform, and this number continues to grow. Here we highlight and discuss the features of BioMoby that make it distinct from other Semantic Web Service and interoperability initiatives, and that have been instrumental to its deployment and use by a wide community of bioinformatics service providers. The standard, client software, and supporting code libraries are all freely available at http://www.biomoby.org/.

Source: http://dx.doi.org/10.1093/bib/bbn003
May 2008

The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

Authors:
Chisato Yamasaki Katsuhiko Murakami Yasuyuki Fujii Yoshiharu Sato Erimi Harada Jun-ichi Takeda Takayuki Taniya Ryuichi Sakate Shingo Kikugawa Makoto Shimada Motohiko Tanino Kanako O Koyanagi Roberto A Barrero Craig Gough Hong-Woo Chun Takuya Habara Hideki Hanaoka Yosuke Hayakawa Phillip B Hilton Yayoi Kaneko Masako Kanno Yoshihiro Kawahara Toshiyuki Kawamura Akihiro Matsuya Naoki Nagata Kensaku Nishikata Akiko Ogura Noda Shin Nurimoto Naomi Saichi Hiroaki Sakai Ryoko Sanbonmatsu Rie Shiba Mami Suzuki Kazuhiko Takabayashi Aiko Takahashi Takuro Tamura Masayuki Tanaka Susumu Tanaka Fusano Todokoro Kaori Yamaguchi Naoyuki Yamamoto Toshihisa Okido Jun Mashima Aki Hashizume Lihua Jin Kyung-Bum Lee Yi-Chueh Lin Asami Nozaki Katsunaga Sakai Masahito Tada Satoru Miyazaki Takashi Makino Hajime Ohyanagi Naoki Osato Nobuhiko Tanaka Yoshiyuki Suzuki Kazuho Ikeo Naruya Saitou Hideaki Sugawara Claire O'Donovan Tamara Kulikova Eleanor Whitfield Brian Halligan Mary Shimoyama Simon Twigger Kei Yura Kouichi Kimura Tomohiro Yasuda Tetsuo Nishikawa Yutaka Akiyama Chie Motono Yuri Mukai Hideki Nagasaki Makiko Suwa Paul Horton Reiko Kikuno Osamu Ohara Doron Lancet Eric Eveno Esther Graudens Sandrine Imbeaud Marie Anne Debily Yoshihide Hayashizaki Clara Amid Michael Han Andreas Osanger Toshinori Endo Michael A Thomas Mika Hirakawa Wojciech Makalowski Mitsuteru Nakao Nam-Soon Kim Hyang-Sook Yoo Sandro J De Souza Maria de Fatima Bonaldo Yoshihito Niimura Vladimir Kuryshev Ingo Schupp Stefan Wiemann Matthew Bellgard Masafumi Shionyu Libin Jia Danielle Thierry-Mieg Jean Thierry-Mieg Lukas Wagner Qinghua Zhang Mitiko Go Shinsei Minoshima Masafumi Ohtsubo Kousuke Hanada Peter Tonellato Takao Isogai Ji Zhang Boris Lenhard Sangsoo Kim Zhu Chen Ursula Hinz Anne Estreicher Kenta Nakai Izabela Makalowska Winston Hide Nicola Tiffin Laurens Wilming Ranajit Chakraborty Marcelo Bento Soares Maria Luisa Chiusano Yutaka Suzuki Charles Auffray Yumi Yamaguchi-Kabata Takeshi Itoh Teruyoshi Hishiki Satoshi Fukuchi Ken Nishikawa Sumio Sugano Nobuo Nomura Yoshio Tateno Tadashi Imanishi Takashi Gojobori

Nucleic Acids Res 2008 Jan 18;36(Database issue):D793-9. Epub 2007 Dec 18.

Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted sub cellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

Source: http://dx.doi.org/10.1093/nar/gkm999
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238988
January 2008

Structures of proteins of biomedical interest from the Center for Eukaryotic Structural Genomics.

J Struct Funct Genomics 2007 Sep 6;8(2-3):73-84. Epub 2007 Sep 6.

Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA.

The Center for Eukaryotic Structural Genomics (CESG) produces and solves the structures of proteins from eukaryotes. We have developed and operate a pipeline to both solve structures and to test new methodologies. Both NMR and X-ray crystallography methods are used for structure solution. CESG chooses targets based on sequence dissimilarity to known structures, medical relevance, and nominations from members of the scientific community. Many times proteins qualify in more than one of these categories. Here we review some of the structures that have connections to human health and disease.

Source: http://dx.doi.org/10.1007/s10969-007-9023-6
September 2007

The Rat Genome Database, update 2007--easing the path from disease to data and back again.

Nucleic Acids Res 2007 Jan 6;35(Database issue):D658-62. Epub 2006 Dec 6.

Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.

The Rat Genome Database (RGD, http://rgd.mcw.edu) is one of the core resources for rat genomics and recent developments have focused on providing support for disease-based research using the rat model. Recognizing the importance of the rat as a disease model we have employed targeted curation strategies to curate genes, QTL and strain data for neurological and cardiovascular disease areas. This work has centered on rat but also includes data for mouse and human to create 'disease portals' that provide a unified view of the genes, QTL and strain models for these diseases across the three species. The disease curation efforts combined with normal curation activities have served to greatly increase the content of the database, particularly for biological information, including gene ontology, disease, pathway and phenotype ontology annotations. In addition to improving the features and database content, community outreach has been expanded to demonstrate how investigators can leverage the resources at RGD to facilitate their research and to elicit suggestions and needs for future developments. We have published a number of papers that provide additional information on the ontology annotations and the tools at RGD for data mining and analysis to better enable researchers to fully utilize the database.

Source: http://dx.doi.org/10.1093/nar/gkl988
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1761441
January 2007

A new method for identifying informative genetic markers in selectively bred rats.

Mamm Genome 2005 Oct 29;16(10):784-91. Epub 2005 Oct 29.

Department of Human Genetics, Emory University, Whitehead 301, 615 Michael Street, Atlanta, Georgia 30322, USA.

Microsatellite length polymorphisms are useful for the mapping of heritable traits in rats. Over 4000 such microsatellites have been characterized for 48 inbred rat strains and used successfully to map phenotypes that differ between strains. At present, however, it is difficult to use this microsatellite database for mapping phenotypes in selectively bred rats of unknown genotype derived from outbred populations because it is not immediately obvious which markers might differ between strains and be informative. We predicted that markers represented by many alleles among the known inbred rat strains would also be most likely to differ between selectively bred strains derived from outbred populations. Here we describe the development and successful application of a new genotyping tool (HUMMER) that assigns "heterozygosity" (Het) and "uncertainty" (Unc) scores to each microsatellite marker that corresponds to its degree of heterozygosity among the 48 genotyped inbred strains. We tested the efficiency of HUMMER on two rat strains that were selectively bred from an outbred Sprague-Dawley stock for either high or low activity in the forced swim test (SwHi rats and SwLo rats, respectively). We found that the markers with high Het and Unc scores allowed the efficient selection of markers that differed between SwHi and SwLo rats, while markers with low Het and Unc scores typically identified markers that did not differ between strains. Thus, picking markers based on Het and Unc scores is a valuable method for identifying informative microsatellite markers in selectively bred rodent strains derived from outbred populations.

Source: http://dx.doi.org/10.1007/s00335-005-0047-6
October 2005
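
The abstract above says HUMMER scores each microsatellite marker by its degree of heterozygosity among the genotyped inbred strains, but it does not give the formula. As an illustration only, the sketch below assumes the standard expected-heterozygosity measure (one minus the sum of squared allele frequencies) as a stand-in for the Het score. The "Unc" score is not defined in the abstract and is not modeled here. The function names, marker names, and allele sizes are all hypothetical.

```python
from collections import Counter


def het_score(alleles):
    """Expected heterozygosity, 1 - sum(p_i^2), over observed allele sizes.

    ASSUMPTION: this is a generic population-genetics measure, not
    necessarily the Het score that HUMMER computes. Entries of None
    (strains without a genotype for this marker) are skipped.
    """
    typed = [a for a in alleles if a is not None]
    if not typed:
        return 0.0
    n = len(typed)
    return 1.0 - sum((count / n) ** 2 for count in Counter(typed).values())


def rank_markers(genotypes):
    """Return marker names ordered from most to least variable."""
    return sorted(genotypes, key=lambda m: het_score(genotypes[m]), reverse=True)


# Hypothetical markers with allele sizes (bp) across four strains.
genotypes = {
    "marker_1": [150, 150, 150, 150],   # monomorphic
    "marker_2": [150, 152, 154, 156],   # every strain differs
    "marker_3": [150, 150, 152, None],  # one strain untyped
}
ranking = rank_markers(genotypes)
```

Under this assumed measure, a marker with many distinct alleles across the inbred panel ranks first, which matches the abstract's prediction that such markers are the most likely to differ between selectively bred strains.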

Tools and strategies for physiological genomics: the Rat Genome Database.

Physiol Genomics 2005 Oct 16;23(2):246-56. Epub 2005 Aug 16.

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.

The broad goal of physiological genomics research is to link genes to their functions using appropriate experimental and computational techniques. Modern genomics experiments enable the generation of vast quantities of data, and interpretation of this data requires the integration of information derived from many diverse sources. Computational biology and bioinformatics offer the ability to manage and channel this information torrent. The Rat Genome Database (RGD; http://rgd.mcw.edu) has developed computational tools and strategies specifically supporting the goal of linking genes to their functional roles in rat and, using comparative genomics, to human and mouse. We present an overview of the database with a focus on these unique computational tools and describe strategies for the use of these resources in the area of physiological genomics.

Source: http://dx.doi.org/10.1152/physiolgenomics.00040.2005
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4505745
October 2005

DeNovoID: a web-based tool for identifying peptides from sequence and mass tags deduced from de novo peptide sequencing by mass spectroscopy.

Nucleic Acids Res 2005 Jul;33(Web Server issue):W376-81

Bioinformatics Research Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53213, USA.

One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
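
To illustrate why searching by composition avoids enumerating sequences, here is a minimal sketch that matches database peptides against an unordered set of tag residues plus a parent mass. It is not DeNovoID's algorithm and does not implement its geometric indexing. The function names, the tolerance, and the tiny database are assumptions made for illustration, and only a few residue masses are included.

```python
from collections import Counter

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
}
WATER = 18.01056  # added once per peptide

def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER

def matches(seq, tag_residues, parent_mass, tol=0.5):
    """True if seq has the parent mass and contains the tag residues in any order."""
    if abs(peptide_mass(seq) - parent_mass) > tol:
        return False
    missing = Counter(tag_residues) - Counter(seq)  # residues the peptide lacks
    return not missing

# Hypothetical database and query: a disordered tag "LVP" plus a parent mass.
database = ["GASPVLK", "KLVPSAG", "GAEPVLK", "AAVLK"]
query_mass = peptide_mass("GASPVLK")
hits = [p for p in database if matches(p, "LVP", query_mass)]
print(hits)
```

Both permutations of the same composition are returned, which shows the trade-off: one composition test covers every ordering of the residues, at the cost of not distinguishing between those orderings.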

Source
DOI: http://dx.doi.org/10.1093/nar/gki461
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1160222
July 2005

Simultaneous quantification and identification using 18O labeling with an ion trap mass spectrometer and the analysis software application "ZoomQuant".

J Am Soc Mass Spectrom 2005 Jun 15;16(6):916-25. Epub 2005 Apr 15.

National Proteomics Research Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53213, USA.

Stable isotope labeling with (18)O is a promising technique for obtaining both qualitative and quantitative information from a single differential protein expression experiment. The small 4 Da mass shift produced by incorporation of two molecules of (18)O and the lack of available methods for automated quantification of large data sets have limited the use of this approach with electrospray ionization-ion trap (ESI-IT) mass spectrometers. In this paper, we describe a method of acquiring ESI-IT mass spectrometric data that provides accurate calculation of relative ratios of peptides that have been differentially labeled using (18)O. The method utilizes zoom scans to provide high resolution data. This allows for accurate calculation of (18)O/(16)O ratios for peptides even when as much as 50% of an (18)O-labeled peptide is present as the singly labeled species. The use of zoom scan data also provides sufficient resolution for calculating accurate ratios for peptides of +3 and lower charge states. Sequence coverage is comparable to that obtained with data acquisition modes that use only MS and MS/MS scans. We have employed a newly developed analysis software tool, ZoomQuant, which allows for the automated analysis of large data sets. We show that the combination of zoom scan data acquisition and analysis using ZoomQuant provides calculation of isotopic ratios accurate to approximately 21%. This compares well with data produced from (18)O labeling experiments using time of flight (TOF) and Fourier transform-ion cyclotron resonance (FT-ICR) MS instruments.

Source
DOI: http://dx.doi.org/10.1016/j.jasms.2005.02.024
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2771642
June 2005

Using Comparative Genomics to Leverage Animal Models in the Identification of Cancer Genes. Examples in Prostate Cancer.

Cancer Genomics Proteomics 2005 May-Jun;2(3):137-144. Epub 2005 May 1.

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin, U.S.A.

The identification of cancer biomarkers that will predict susceptibility to disease and subsequent clinical outcome is a key component of future genomics-based tailored medical care. Animal models of disease provide a rich resource for the identification of potential cancer biomarkers. Animal models of prostate cancer in particular offer the potential to identify cancer genes associated with dietary and environmental factors. The key issue is the timely and efficient identification of candidate genes that are likely to impact on human prostate cancer. Here, we demonstrate comparative genomics-based methods for the identification of candidate genes in animal models that are associated with human chromosomal regions implicated in prostate cancer. Using publicly available bioinformatics tools, comparisons can be made between cancer-specific datasets, genomic sequencing data and cross-species comparative maps to identify potential cancer biomarkers. This process is demonstrated by using rat models of prostate cancer to identify candidate human prostate cancer genes. Genes identified through these techniques can be screened as biomarkers for response to chemopreventive agents, as well as being used in transgenic or knockout mice to engineer better animal models of human prostate cancer. The bioinformatics techniques outlined here can be used to leverage genomic data from any animal cancer model for use in the study and treatment of human cancer.

May 2005

ZoomQuant: an application for the quantitation of stable isotope labeled peptides.

J Am Soc Mass Spectrom 2005 Mar 13;16(3):302-6. Epub 2005 Jan 13.

Bioinformatics, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin 53213, USA.

The main goal of comparative proteomics is the quantitation of the differences in abundance of many proteins between two different biological samples in a single experiment. By differentially labeling the peptides from the two samples and combining them in a single analysis, relative ratios of protein abundance can be accurately determined. Protease catalyzed (18)O exchange is a simple method to differentially label peptides, but the lack of robust software tools to analyze the data from mass spectra of (18)O labeled peptides generated by common ion trap mass spectrometers has been a limitation. ZoomQuant is a stand-alone computational tool that analyzes the mass spectra of (18)O labeled peptides from ion trap instruments and determines relative abundance ratios between two samples. Starting with a filtered list of candidate peptides that have been successfully identified by Sequest, ZoomQuant analyzes the isotopic forms of the peptides using high-resolution zoom scan spectrum data. The theoretical isotope distribution is determined from the peptide sequence and is used to deconvolute the peak areas associated with the unlabeled, partially labeled, and fully labeled species. The ratio between the labeled and unlabeled peptides is then calculated using several different methods. ZoomQuant's graphical user interface allows the user to view and adjust the parameters for peak calling and quantitation and select which peptides should contribute to the overall abundance ratio calculation. Finally, ZoomQuant generates a summary report of the relative abundance of the peptides identified in the two samples.
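
The deconvolution step described above can be sketched as a sequential subtraction of overlapping isotope envelopes. This is a simplified illustration under stated assumptions, not ZoomQuant's actual code: the function names are hypothetical, the isotope distribution is a made-up toy example rather than one computed from a peptide sequence, and peak areas are assumed to be already integrated at 1 Da spacing from the monoisotopic peak.

```python
def synthesize(a0, a2, a4, iso):
    """Build peak areas from unlabeled (a0), singly (a2) and doubly (a4) labeled amounts.

    Each labeled form has the same natural isotope envelope `iso`,
    shifted by +2 Da per incorporated 18O.
    """
    peaks = [0.0] * (len(iso) + 4)
    for shift, amount in ((0, a0), (2, a2), (4, a4)):
        for k, frac in enumerate(iso):
            peaks[k + shift] += amount * frac
    return peaks

def deconvolve_18o(peaks, iso):
    """Recover (a0, a2, a4) from the peaks at M, M+2 and M+4.

    The M peak comes only from the unlabeled form; each later peak is corrected
    for the isotope tails of the lighter forms before being assigned.
    """
    a0 = peaks[0] / iso[0]
    a2 = max((peaks[2] - a0 * iso[2]) / iso[0], 0.0)
    a4 = max((peaks[4] - a0 * iso[4] - a2 * iso[2]) / iso[0], 0.0)
    return a0, a2, a4

# Toy natural isotope distribution (assumed values, index = Da offset).
iso = [0.55, 0.30, 0.10, 0.04, 0.01]
peaks = synthesize(100.0, 20.0, 80.0, iso)

a0, a2, a4 = deconvolve_18o(peaks, iso)
ratio = (a2 + a4) / a0  # labeled : unlabeled, counting both labeled forms
print(round(a0, 3), round(a2, 3), round(a4, 3), round(ratio, 3))
```

Summing the singly and doubly labeled amounts is one of several possible ways to form the ratio; the abstract notes that ZoomQuant calculates it using several different methods.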

Source
DOI: http://dx.doi.org/10.1016/j.jasms.2004.11.014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2793075
March 2005

The Rat Genome Database (RGD): developments towards a phenome database.

Nucleic Acids Res 2005 Jan;33(Database issue):D485-91

Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53213, USA.

The Rat Genome Database (RGD) (http://rgd.mcw.edu) aims to meet the needs of its community by providing genetic and genomic infrastructure while also annotating the strengths of rat research: biochemistry, nutrition, pharmacology and physiology. Here, we report on RGD's development towards creating a phenome database. Recent developments can be categorized into three groups: (i) improved data collection and integration to match increased volume and biological scope of research; (ii) knowledge representation augmented by the implementation of a new ontology and annotation system; and (iii) the addition of quantitative trait loci data from rat, mouse and human to our advanced comparative genomics tools, as well as the creation of new, and enhancement of existing, tools to enable users to efficiently browse and survey research data. The emphasis is on helping researchers find genes responsible for disease through the use of rat models. These improvements, combined with the genomic sequence of the rat, have led to a successful year at RGD, with more than two million page accesses representing a more than 4-fold increase within a year. Future plans call for increased annotation of biological information on the rat elucidated through its use as a model for human pathobiology. The continued development of toolsets will facilitate integration of these data into the context of rat genomic sequence, as well as allow comparisons of biological and genomic data with the human genomic sequence and those of an increasing number of organisms.

Source
DOI: http://dx.doi.org/10.1093/nar/gki050
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC540004
January 2005

Peptide identification using peptide amino acid attribute vectors.

J Proteome Res 2004 Jul-Aug;3(4):813-20

Bioinformatics Research Center, Medical College of Wisconsin, Milwaukee, Wisconsin 53213, USA.

We describe the theoretical basis for a peptide identification method wherein peptides are represented as vectors based on their amino acid composition and grouped into clusters. Unknown peptides are identified by finding the database cluster and peptide entries with the shortest Euclidean distance. We demonstrate that the amino acid composition of peptides is virtually as informative as the sequence and allows rapid peptide identification more accurately than peptide mass alone.
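
A minimal sketch of the vector representation may help make this concrete. It assumes a plain 20-dimensional residue-count vector and a brute-force nearest-neighbour search; the clustering step mentioned above is omitted, and the function names and toy database are hypothetical rather than taken from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed ordering of the 20 standard residues

def composition_vector(seq):
    """Count of each standard amino acid in seq, in AMINO_ACIDS order."""
    return [seq.count(aa) for aa in AMINO_ACIDS]

def nearest_peptide(query_vec, database):
    """Database peptide whose composition vector is closest in Euclidean distance."""
    return min(database, key=lambda p: math.dist(query_vec, composition_vector(p)))

# Toy database (invented sequences).
database = ["ACDK", "GGLR", "PEPTIDEK"]

# A query with the same composition as "ACDK" but a different residue order.
exact = nearest_peptide(composition_vector("KDCA"), database)
# A query that is one residue off (E instead of D) still lands on "ACDK".
near = nearest_peptide(composition_vector("ACEK"), database)
print(exact, near)
```

A linear scan like this costs one distance computation per database entry, which is presumably why the method groups peptides into clusters before comparing individual entries.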

Source
DOI: http://dx.doi.org/10.1021/pr0499444
January 2005

ProMoST (Protein Modification Screening Tool): a web-based tool for mapping protein modifications on two-dimensional gels.

Nucleic Acids Res 2004 Jul;32(Web Server issue):W638-44

Bioinformatics Research Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53213, USA.

ProMoST is a flexible web tool that calculates the effect of single or multiple posttranslational modifications (PTMs) on protein isoelectric point (pI) and molecular weight and displays the calculated patterns as two-dimensional (2D) gel images. PTMs of proteins control many biological regulatory and signaling mechanisms and 2D gel electrophoresis is able to resolve many PTM-induced isoforms, such as those due to phosphorylation, acetylation, deamination, alkylation, cysteine oxidation or tyrosine nitration. These modifications cause changes in the pI of the protein by adding, removing or changing titratable groups. Proteins differ widely in buffering capacity and pI and therefore the same PTMs may give rise to quite different patterns of pI shifts in different proteins. It is impossible by visual inspection of a pattern of spots on a gel to determine which modifications are most likely to be present. The patterns of PTM shifts for different proteins can be calculated and are often quite distinctive. The theoretical gel images produced by ProMoST can be compared to the experimental 2D gel results to implicate probable PTMs and focus efforts on more detailed study of modified proteins. ProMoST has been implemented as a CGI script in Perl and is available on a WWW server at http://proteomics.mcw.edu/promost.
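
The effect of adding titratable groups on pI can be sketched with a standard Henderson-Hasselbalch charge model and a bisection search for the pH of zero net charge. This is not ProMoST's code. The pKa values below are one assumed, textbook-style set (published sets differ), the two phosphate pKa values are illustrative placeholders, and the function names and test sequence are hypothetical.

```python
# Assumed pKa values for titratable groups (illustrative only).
POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PHOSPHATE_PKAS = (2.0, 6.0)  # placeholder values for the two acidic protons

def net_charge(seq, ph, n_phospho=0):
    """Net charge of a peptide at a given pH, with n_phospho phosphate groups added."""
    pos_pkas = [POSITIVE["Nterm"]] + [POSITIVE[r] for r in seq if r in POSITIVE]
    neg_pkas = [NEGATIVE["Cterm"]] + [NEGATIVE[r] for r in seq if r in NEGATIVE]
    neg_pkas += list(PHOSPHATE_PKAS) * n_phospho
    pos = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos_pkas)
    neg = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg_pkas)
    return pos - neg

def isoelectric_point(seq, n_phospho=0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

seq = "ACDEFGHIKLMNPQRSTVWY"  # arbitrary test sequence
pi_plain = isoelectric_point(seq)
pi_phospho = isoelectric_point(seq, n_phospho=1)
print(round(pi_plain, 2), round(pi_phospho, 2))
```

Because net charge decreases monotonically with pH in this model, bisection converges, and each added phosphate group shifts the computed pI toward the acidic side. The size of the shift depends on the other titratable groups in the protein, which is the point the abstract makes about distinctive PTM patterns.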

Source
DOI: http://dx.doi.org/10.1093/nar/gkh356
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC441494
July 2004

ChromSorter PC: a database of chromosomal regions associated with human prostate cancer.

BMC Genomics 2004 Apr 28;5(1):27. Epub 2004 Apr 28.

Department of Pathology, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA.

Background: Our increasing use of genetic and genomic strategies to understand human prostate cancer means that we need access to simplified and integrated information present in the associated biomedical literature. In particular, microarray gene expression studies and associated genetic mapping studies in prostate cancer would benefit from a generalized understanding of the prior work associated with this disease. This would allow us to focus subsequent laboratory studies to genomic regions already related to prostate cancer by other scientific methods. We have developed a database of prostate cancer related chromosomal information from the existing biomedical literature. The input material was based on a broad literature search with subsequent hand annotation of information relevant to prostate cancer.

Description: The database was then analyzed for identifiable trends in the whole scale literature. We have used this database, named ChromSorter PC, to present graphical summaries of chromosomal regions associated with prostate cancer broken down by age, ethnicity and experimental method. In addition we have placed the database information on the human genome using the Generic Genome Browser tool that allows the visualization of the data with respect to user generated datasets.

Conclusions: We have used this database as an additional dataset for the filtering of genes identified through genetics and genomics studies as warranting follow-up validation studies. We would like to make this dataset publicly available for use by other groups. Using the Genome Browser allows for the graphical analysis of the associated data (http://www.prostategenomics.org/datamining/chrom-sorter_pc.html). Additional material from the database can be obtained by contacting the authors (mdatta@mcw.edu).

Source
DOI: http://dx.doi.org/10.1186/1471-2164-5-27
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC416659
April 2004

Integrative genomics: in silico coupling of rat physiology and complex traits with mouse and human data.

Genome Res 2004 Apr;14(4):651-60

Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA.

Integration of the large variety of genome maps from several organisms provides the mechanism by which physiological knowledge obtained in model systems such as the rat can be projected onto the human genome to further the research on human disease. The release of the rat genome sequence provides new information for studies using the rat model and is a key reference against which existing and new rat physiological results can be aligned. Previously, we described comparative maps of the rat, mouse, and human based on EST sequence comparisons combined with radiation hybrid maps. Here, we use new data and introduce the Integrated Genomics Environment, an extensive database of curated and integrated maps, markers, and physiological results. These results are integrated by using VCMapview, a Java-based map integration and visualization tool. This unique environment allows researchers to relate results from cytogenetic, genetic, and radiation hybrid studies to the genome sequence and compare regions of interest between human, mouse, and rat. Integrating rat physiology with mouse genetics and clinical results from human by using the respective genomes provides a novel route to capitalize on comparative genomics and the strengths of model organism biology.

Source
DOI: http://dx.doi.org/10.1101/gr.1974504
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC383309
April 2004