Publications by authors named "Igor Rodchenkov"

10 Publications


Pathway Commons 2019 Update: integration, analysis and exploration of pathway data.

Nucleic Acids Res 2020 01;48(D1):D489-D497

cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA.

Pathway Commons (https://www.pathwaycommons.org) is an integrated resource of publicly available information about biological pathways including biochemical reactions, assembly of biomolecular complexes, transport and catalysis events and physical interactions involving proteins, DNA, RNA, and small molecules (e.g. metabolites and drug compounds). Data is collected from multiple providers in standard formats, including the Biological Pathway Exchange (BioPAX) language and the Proteomics Standards Initiative Molecular Interactions format, and then integrated. Pathway Commons provides biologists with (i) tools to search this comprehensive resource, (ii) a download site offering integrated bulk sets of pathway data (e.g. tables of interactions and gene sets), (iii) reusable software libraries for working with pathway information in several programming languages (Java, R, Python and JavaScript) and (iv) a web service for programmatically querying the entire dataset. Visualization of pathways is supported using the Systems Biology Graphical Notation (SBGN). Pathway Commons currently contains data from 22 databases with 4794 detailed human biochemical processes (i.e. pathways) and ∼2.3 million interactions. To enhance the usability of this large resource for end-users, we develop and maintain interactive web applications and training materials that enable pathway exploration and advanced analysis.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz946
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145667
January 2020

Using biological pathway data with paxtools.

PLoS Comput Biol 2013 Sep 19;9(9):e1003194. Epub 2013 Sep 19.

Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America.

A rapidly growing corpus of formal, computable pathway information can be used to answer important biological questions including finding non-trivial connections between cellular processes, identifying significantly altered portions of the cellular network in a disease state and building predictive models that can be used for precision medicine. Due to its complexity and fragmented nature, however, working with pathway data is still difficult. We present Paxtools, a Java library that contains algorithms, software components and converters for biological pathways represented in the standard BioPAX language. Paxtools allows scientists to focus on their scientific problem by removing technical barriers to accessing and analysing pathway information. Paxtools can run on any platform that has a Java Runtime Environment and was tested on most modern operating systems. Paxtools is open source and is available under the GNU Lesser General Public License (LGPL), which allows users to freely use the code in their software systems with a requirement for attribution. Source code for the current release (4.2.0) can be found in Software S1. A detailed manual for obtaining and using Paxtools can be found in Protocol S1. The latest sources and release bundles can be obtained from biopax.org/paxtools.

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1003194
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3777916
April 2014

Pattern search in BioPAX models.

Bioinformatics 2014 Jan 16;30(1):139-40. Epub 2013 Sep 16.

Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA, Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY 10065, USA and Banting and Best Department of Medical Research, The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada.

Motivation: BioPAX is a standard language for representing complex cellular processes, including metabolic networks, signal transduction and gene regulation. Owing to the inherent complexity of a BioPAX model, searching for a specific type of subnetwork can be non-trivial and difficult.

Results: We developed an open source and extensible framework for defining and searching graph patterns in BioPAX models. We demonstrate its use with a sample pattern that captures directed signaling relations between proteins. We provide search results for the pattern obtained from the Pathway Commons database and compare these results with the current data in signaling databases SPIKE and SignaLink. Results show that a pattern search in public pathway data can identify a substantial amount of signaling relations that do not exist in signaling databases.

Availability: BioPAX-pattern software was developed in Java. Source code and documentation are freely available at http://code.google.com/p/biopax-pattern under the GNU Lesser General Public License.
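
As a rough illustration of the kind of pattern described above (a directed signaling relation between two proteins), the following Python sketch searches a toy list of conversions. It does not use BioPAX-pattern or any BioPAX data model: the `Conversion` record, its field names and the example protein states are all hypothetical simplifications introduced here.

```python
from typing import NamedTuple


class Conversion(NamedTuple):
    """Hypothetical, heavily simplified stand-in for a controlled conversion."""
    controllers: tuple  # names of proteins that control the conversion
    left: tuple         # (protein, state) pairs before the conversion
    right: tuple        # (protein, state) pairs after the conversion


def directed_relations(conversions):
    """Return sorted (controller, target) pairs where the controller
    controls a conversion that changes the state of the target protein."""
    found = set()
    for conv in conversions:
        before = dict(conv.left)
        after = dict(conv.right)
        for protein, state in after.items():
            # The target must appear on both sides with a different state.
            if protein in before and before[protein] != state:
                for controller in conv.controllers:
                    if controller != protein:
                        found.add((controller, protein))
    return sorted(found)


# Toy data: one controlled state change, one uncontrolled, one with no change.
toy_conversions = [
    Conversion(("MAP2K1",), (("MAPK1", "unmodified"),), (("MAPK1", "phosphorylated"),)),
    Conversion((), (("A", "inactive"),), (("A", "active"),)),
    Conversion(("X",), (("B", "active"),), (("B", "active"),)),
]
relations = directed_relations(toy_conversions)
```

Only the first toy conversion satisfies the pattern, because the second has no controller and the third does not change the state of its participant.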

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt539
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866551
January 2014

The BioPAX Validator.

Bioinformatics 2013 Oct 5;29(20):2659-60. Epub 2013 Aug 5.

The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada and Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY, USA.

Summary: BioPAX is a community-developed standard language for biological pathway data. A key functionality required for efficient BioPAX data exchange is validation: detecting errors and inconsistencies in BioPAX documents. The BioPAX Validator is a command-line tool, Java library and online web service for BioPAX that performs >100 classes of consistency checks.

Availability And Implementation: The validator recognizes common syntactic errors and semantic inconsistencies and reports them in a customizable human-readable format. It can also automatically fix some errors and normalize BioPAX data. Since its release, the validator has become a critical tool for the pathway informatics community, detecting thousands of errors and helping substantially increase the conformity and uniformity of BioPAX-formatted data. The BioPAX Validator is open source and released under the LGPL v3 license. All sources, binaries and documentation can be found at sf.net/p/biopax, and the latest stable version of the web application is available at biopax.org/validator.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btt452
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3789551
October 2013

Pathway Commons, a web resource for biological pathway data.

Nucleic Acids Res 2011 Jan 10;39(Database issue):D685-90. Epub 2010 Nov 10.

Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, New York, NY 10065, USA.

Pathway Commons (http://www.pathwaycommons.org) is a collection of publicly available pathway data from multiple organisms. Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data. Database providers can share their pathway data via a common repository. Pathways include biochemical reactions, complex assembly, transport and catalysis events and physical interactions involving proteins, DNA, RNA, small molecules and complexes. Pathway Commons aims to collect and integrate all public pathway data available in standard formats. Pathway Commons currently contains data from nine databases with over 1400 pathways and 687,000 interactions and will be continually expanded and updated.

Source
DOI: http://dx.doi.org/10.1093/nar/gkq1039
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013659
January 2011

The BioPAX community standard for pathway data sharing.

Nat Biotechnol 2010 Sep 9;28(9):935-42. Epub 2010 Sep 9.

Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, New York, USA.

Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

Source
DOI: http://dx.doi.org/10.1038/nbt.1666
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001121
September 2010

CCancer: a bird's eye view on gene lists reported in cancer-related studies.

Nucleic Acids Res 2010 Jul 6;38(Web Server issue):W118-23. Epub 2010 Jun 6.

Freiburg Center for Data Analysis and Modeling, University of Freiburg, Eckerstr. 1, D-79104 Freiburg, Germany.

CCancer is an automatically collected database of gene lists, which were reported mostly by experimental studies in various biological and clinical contexts. At the moment, the database covers 3369 gene lists extracted from 2644 papers published in approximately 80 peer-reviewed journals. As input, CCancer accepts a gene list. An enrichment analysis is implemented to generate, as output, a highly informative survey of recently published studies that report gene lists which significantly intersect with the query gene list. A report on gene pairs from the input list that were frequently reported together by other biological studies is also provided. CCancer is freely available at http://mips.helmholtz-muenchen.de/proj/ccancer.
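
The abstract does not say which statistic the enrichment analysis uses. One common way to test whether a query list "significantly intersects" a published list is a one-sided hypergeometric test, sketched below in Python. The function name, the gene symbols and the universe size are illustrative assumptions, not CCancer's implementation.

```python
from math import comb


def overlap_p_value(query, reference, universe_size):
    """One-sided hypergeometric p-value: the probability of an overlap at
    least as large as the observed one when the query genes are drawn at
    random from a universe of `universe_size` genes."""
    query, reference = set(query), set(reference)
    observed = len(query & reference)
    n, k_ref = len(query), len(reference)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(k_ref, i) * comb(universe_size - k_ref, n - i)
        for i in range(observed, min(n, k_ref) + 1)
    )
    return tail / comb(universe_size, n)


# Toy example with a hypothetical universe of 20 genes.
p = overlap_p_value(
    {"TP53", "BRCA1", "EGFR"},
    {"TP53", "BRCA1", "MYC", "KRAS"},
    universe_size=20,
)
```

In a tool that compares one query against thousands of stored lists, the resulting p-values would normally also need a multiple-testing correction, which this sketch omits.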

Source
DOI: http://dx.doi.org/10.1093/nar/gkq515
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896190
July 2010

PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.

Proteomics 2009 May;9(10):2740-9

GSF National Research Center for Environment and Health, Institute for Bioinformatics, Ingolstädter Landstrasse 1, Neuherberg, Germany.

Recent advances in experimental technologies allow for the detection of a complete cell proteome. Proteins that are expressed at a particular cell state or in a particular compartment as well as proteins with differential expression between various cell states are commonly delivered by many proteomics studies. Once a list of proteins is derived, a major challenge is to interpret the identified set of proteins in the biological context. Protein-protein interaction (PPI) data represents abundant information that can be employed for this purpose. However, these data have not yet been fully exploited due to the absence of a methodological framework that can integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider (http://mips.helmholtz-muenchen.de/proj/ppispider). To support the practical significance of PPI spider, we collected several hundred recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that are helpful for understanding the protein list.
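
The general idea of a Monte Carlo significance test on a PPI network can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PPI spider procedure: it uses the number of interactions among the listed proteins as the test statistic and random protein sets of the same size as the null model, both of which are choices made here for simplicity.

```python
import random


def edges_within(proteins, edges):
    """Count interactions whose two endpoints both belong to `proteins`."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def monte_carlo_p_value(protein_list, edges, n_samples=1000, seed=0):
    """Estimate how often a random protein set of the same size is at least
    as densely connected as the query list (assumed null model)."""
    rng = random.Random(seed)
    nodes = sorted({p for edge in edges for p in edge})
    # Only proteins present in the network can contribute to the statistic.
    query = set(protein_list) & set(nodes)
    observed = edges_within(query, edges)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(nodes, len(query))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


# Toy network: a fully connected group of four proteins plus isolated pairs.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("G", "H"), ("I", "J"), ("K", "L"),
]
p_dense = monte_carlo_p_value(["A", "B", "C", "D"], toy_edges)
p_sparse = monte_carlo_p_value(["A", "E"], toy_edges)
```

A realistic null model would usually also preserve properties such as node degree, since highly connected proteins are over-represented in many experimental lists.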

Source
DOI: http://dx.doi.org/10.1002/pmic.200800612
May 2009

PLIPS, an automatically collected database of protein lists reported by proteomics studies.

J Proteome Res 2009 Mar;8(3):1193-7

Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Institute for Bioinformatics and System Biology, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.

The spectrum of problems covered by proteomics studies ranges from the discovery of compartment-specific cell proteomes to clinical applications, including the identification of diagnostic markers and monitoring the effects of drug treatments. In most cases, the ultimate results of a proteomics study are lists of proteins found to be present (or differentially present) at the cell physiological conditions under study. Normally, the results are published directly in the article in one or several tables. In many cases, this type of information remains disseminated in hundreds of proteomics publications. We have developed a Web mining tool which allows the collection of this information by searching through full text papers and automatically selecting tables which report a list of protein identifiers. By searching through major proteomics journals, we have collected approximately 800 independent studies published recently, which reported about 1000 different protein lists. On the basis of these data, we developed a computational tool PLIPS (Protein Lists Identified in Proteomics Studies). PLIPS accepts as input a list of protein/gene identifiers. With the use of statistical analyses, PLIPS infers recently published proteomics studies which report protein lists that significantly intersect with a query list. PLIPS is a freely available Web-based tool (http://mips.helmholtz-muenchen.de/proj/plips).

Source
DOI: http://dx.doi.org/10.1021/pr800804d
March 2009

Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information.

Bioinformatics 2008 Mar 3;24(5):621-8. Epub 2008 Jan 3.

Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Neuherberg, Germany.

Motivation: Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods.

Results: The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. Following training we scored all pairs from the validation sets, selected a pair with the highest predicted score and annotated the target protein with functional categories of the prototype protein. The data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable versus non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes.

Availability: The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at http://mips.gsf.de/proj/funat.
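
The annotation-transfer step described in the Results (score every target-prototype pair, keep the best-scoring pair, copy the prototype's categories) can be sketched in Python as below. The scoring function stands in for the trained model, and the protein names, category labels and the 0.5 reliability threshold are hypothetical values chosen for illustration; none of them come from FUNAT.

```python
def annotate_by_best_pair(targets, prototypes, score_pair, threshold=0.5):
    """For each target, pick the prototype with the highest pair score and
    transfer its categories. `threshold` is an assumed cut-off used to flag
    annotations as reliable."""
    annotations = {}
    for target in targets:
        best = max(prototypes, key=lambda proto: score_pair(target, proto))
        best_score = score_pair(target, best)
        annotations[target] = {
            "prototype": best,
            "categories": prototypes[best],
            "score": best_score,
            "reliable": best_score >= threshold,
        }
    return annotations


# Toy stand-in for a trained pair scorer: a fixed lookup table.
toy_scores = {
    ("t1", "protA"): 0.9, ("t1", "protB"): 0.2,
    ("t2", "protA"): 0.1, ("t2", "protB"): 0.4,
}
toy_prototypes = {"protA": {"01.05"}, "protB": {"20.01"}}

result = annotate_by_best_pair(
    ["t1", "t2"],
    toy_prototypes,
    lambda target, proto: toy_scores[(target, proto)],
)
```

Here `t1` receives the categories of `protA` with a score above the assumed threshold, while `t2` is assigned to `protB` but flagged as not reliable.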

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btm633
March 2008