Publications by authors named "Stefan Verhoeven"

19 Publications

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.

PLoS Comput Biol 2021 Feb 16;17(2):e1008724. Epub 2021 Feb 16.

Bioinformatics Group, Wageningen University, Wageningen, the Netherlands.

Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses, such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by the natural language processing algorithm Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries, including spectra for nearly 13,000 unique molecules, we show that Spec2Vec scores correlate better with structural similarity than cosine-based scores do. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is also computationally more scalable, allowing structural analogue searches in large databases within seconds.
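To make the embedding idea concrete, here is a minimal Python sketch (not the spec2vec package itself) of how two spectra can be compared once every fragment peak has a learned vector: each peak becomes a "word", the word vectors are summed with an intensity weight, and the spectra are compared by the cosine of the summed vectors. The `peak@<m/z>` token format, the two-decimal binning, the square-root intensity weighting and the toy vectors are assumptions for illustration; in practice the vectors would come from training a Word2Vec model on a large spectral library.

```python
# Illustrative sketch: token format, binning and weighting are assumptions,
# not the spec2vec package's API.
import math
from typing import Dict, List, Tuple

Vector = List[float]


def peak_word(mz: float, decimals: int = 2) -> str:
    """Turn a fragment m/z into a discrete token (this format is an assumption)."""
    return f"peak@{mz:.{decimals}f}"


def spectrum_embedding(peaks: List[Tuple[float, float]],
                       word_vectors: Dict[str, Vector],
                       intensity_power: float = 0.5) -> Vector:
    """Sum the vectors of known peak words, each weighted by intensity ** power."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for mz, intensity in peaks:
        vec = word_vectors.get(peak_word(mz))
        if vec is None:
            continue  # peaks never seen during training contribute nothing
        weight = intensity ** intensity_power
        for i, x in enumerate(vec):
            total[i] += weight * x
    return total


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two embeddings (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

Because the score is computed on learned peak vectors rather than on exact peak overlap, two spectra that share few identical peaks can still score high when their peaks tend to co-occur in the training data, which is the property the abstract contrasts with cosine-based scores.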

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008724
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7909622
February 2021

sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data.

PeerJ 2020 Jan 6;8:e8214. Epub 2020 Jan 6.

Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands.

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases, including cancer. Despite the advances in whole-genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration, and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.

Source
DOI: http://dx.doi.org/10.7717/peerj.8214
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951283
January 2020

3D-e-Chem: Structural Cheminformatics Workflows for Computer-Aided Drug Discovery.

ChemMedChem 2018 Mar 14;13(6):614-626. Epub 2018 Feb 14.

Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

eScience technologies are needed to process the information available in the many heterogeneous types of protein-ligand interaction data and to capture these data in models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allow the re-use of these protocols and facilitate the design of customized computer-aided drug discovery workflows.

Source
DOI: http://dx.doi.org/10.1002/cmdc.201700754
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5900740
March 2018

A Structural Framework for GPCR Chemogenomics: What's In a Residue Number?

Methods Mol Biol 2018;1705:73-113

Department of Medicinal Chemistry, Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HV, Amsterdam, The Netherlands.

The recent surge of crystal structures of G protein-coupled receptors (GPCRs), as well as comprehensive collections of sequence, structural, ligand bioactivity, and mutation data, has enabled the development of integrated chemogenomics workflows for this important target family. This chapter will focus on cross-family and cross-class studies of GPCRs that have pinpointed the need for, and the implementation of, a generic numbering scheme for referring to specific structural elements of GPCRs. Sequence- and structure-based numbering schemes for different receptor classes will be introduced and the remaining caveats will be discussed. The use of these numbering schemes has facilitated many chemogenomics studies such as consensus binding site definition, binding site comparison, ligand repurposing (e.g. for orphan receptors), sequence-based pharmacophore generation for homology modeling or virtual screening, and class-wide chemogenomics studies of GPCRs.
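As an illustration of what a generic numbering scheme does, the Python sketch below follows the Ballesteros-Weinstein convention, one widely used scheme (the abstract does not list which schemes the chapter covers): the most conserved residue of each transmembrane helix is labelled X.50, and every other residue in that helix is numbered by its offset from that reference. The function names are hypothetical, and the reference position of each helix is a user-supplied input; the example uses the human beta-2 adrenergic receptor, where R131 of the DRY motif is 3.50.

```python
# Illustrative sketch of Ballesteros-Weinstein-style numbering; function names
# and the reference-position table are hypothetical inputs, not a GPCRdb API.
from typing import Dict


def generic_number(helix: int, seq_pos: int, ref_pos: int) -> str:
    """Label a residue X.YY, where the helix's reference residue is X.50.

    Assumes the index stays two-digit (10-99), i.e. the residue lies within
    40 positions of the reference.
    """
    return f"{helix}.{50 + (seq_pos - ref_pos)}"


def number_helix(helix: int, start: int, end: int, ref_pos: int) -> Dict[int, str]:
    """Label every residue of one helix segment (start..end, inclusive)."""
    return {pos: generic_number(helix, pos, ref_pos) for pos in range(start, end + 1)}


def sequence_position(label: str, ref_positions: Dict[int, int]) -> int:
    """Inverse mapping: '3.32' -> sequence position, given each helix's X.50 residue."""
    helix, index = (int(part) for part in label.split("."))
    return ref_positions[helix] + index - 50
```

With R131 as 3.50 in TM3, `generic_number(3, 113, 131)` gives "3.32" (the conserved aspartate D113). A plain offset like this assumes no helix bulges or constrictions between a residue and its reference, which is one of the caveats that motivates structure-based corrections to sequence-based numbering.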

Source
DOI: http://dx.doi.org/10.1007/978-1-4939-7465-8_4
July 2018

3D-e-Chem-VM: Structural Cheminformatics Research Infrastructure in a Freely Available Virtual Machine.

J Chem Inf Model 2017 Feb 14;57(2):115-121. Epub 2017 Feb 14.

Division of Medicinal Chemistry, Faculty of Sciences, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, 1081 HZ Amsterdam, The Netherlands.

3D-e-Chem-VM is an open-source, freely available Virtual Machine (http://3d-e-chem.github.io/3D-e-Chem-VM/) that integrates cheminformatics and bioinformatics tools for the analysis of protein-ligand interaction data. 3D-e-Chem-VM consists of software libraries and database and workflow tools that can analyze and combine small-molecule and protein structural information in a graphical programming environment. New chemical and biological data analytics tools and workflows have been developed for the efficient exploitation of structural and pharmacological protein-ligand interaction data from proteome-wide databases (e.g., ChEMBLdb and PDB), as well as customized information systems focused on, e.g., G protein-coupled receptors (GPCRdb) and protein kinases (KLIFS). The integrated structural cheminformatics research infrastructure compiled in the 3D-e-Chem-VM enables the design of new approaches in virtual ligand screening (Chemdb4VS), ligand-based metabolism prediction (SyGMa), and structure-based protein binding site comparison and bioisosteric replacement for ligand design (KRIPOdb).

Source
DOI: http://dx.doi.org/10.1021/acs.jcim.6b00686
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5342320
February 2017

In silico prediction and automatic LC-MS(n) annotation of green tea metabolites in urine.

Anal Chem 2014 May 29;86(10):4767-74. Epub 2014 Apr 29.

Laboratory of Biochemistry, Wageningen University, Dreijenlaan 3, 6703 HA, Wageningen, The Netherlands.

The colonic breakdown and human biotransformation of small molecules present in food can give rise to a large variety of potentially bioactive metabolites in the human body. However, the absence of reference data for many of these components limits their identification in complex biological samples, such as plasma and urine. We present an in silico workflow for automatic chemical annotation of metabolite profiling data from liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)), which we used to systematically screen for the presence of tea-derived metabolites in human urine samples after green tea consumption. Reaction rules for intestinal degradation and human biotransformation were systematically applied to chemical structures of 75 green tea components, resulting in a virtual library of 27,245 potential metabolites. All matching precursor ions in the urine LC-MS(n) data sets, as well as the corresponding fragment ions, were automatically annotated by in silico generated (sub)structures. The results were evaluated based on 74 previously identified urinary metabolites and led to the putative identification of 26 additional green tea-derived metabolites. In total, 77% of all annotated metabolites were not present in the PubChem database, demonstrating the benefit of in silico metabolite prediction for the automatic annotation of yet unknown metabolites in LC-MS(n) data from nutritional metabolite profiling experiments.

Source
DOI: http://dx.doi.org/10.1021/ac403875b
May 2014

Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa.

Mass Spectrom (Tokyo) 2014 Jul 2;3(Spec Iss 2):S0033. Epub 2014 Jul 2.

Netherlands eScience Center.

The MAGMa software for automatic annotation of mass spectrometry based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignments) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and limitations of substructure based scoring are discussed.
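Because the abstract names the mass tolerance as one of the conditions for success, the following Python sketch shows how a relative (ppm) tolerance is typically applied when matching an observed MS/MS peak to the masses of in silico generated fragments. The function names and the 5 ppm value used in the example are illustrative assumptions, not MAGMa's API or its settings for CASMI 2013.

```python
# Illustrative sketch of ppm-tolerance peak matching; names and tolerance
# values are assumptions, not MAGMa's implementation.
from typing import List, Optional


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed m/z lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def annotate_peak(observed_mz: float, fragment_mzs: List[float],
                  tol_ppm: float) -> Optional[float]:
    """Return the in silico fragment m/z closest to the peak, or None if none match."""
    hits = [m for m in fragment_mzs if within_tolerance(observed_mz, m, tol_ppm)]
    if not hits:
        return None  # unexplained peak; a penalty-based ranking could count this against the candidate
    return min(hits, key=lambda m: abs(observed_mz - m))
```

A window that is too narrow leaves true fragments unexplained, while one that is too wide lets many wrong substructures match, which is why the tolerance affects how candidates are ranked.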

Source
DOI: http://dx.doi.org/10.5702/massspectrometry.S0033
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321337
January 2016

Automatic chemical structure annotation of an LC-MS(n) based metabolic profile from green tea.

Anal Chem 2013 Jun;85(12):6033-40. Epub 2013 May 31.

Laboratory of Biochemistry, Wageningen University, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands.

Liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)) can generate comprehensive spectral information of metabolites in crude extracts. To support structural characterization of the many metabolites present in such complex samples, we present a novel method (http://www.emetabolomics.org/magma) to automatically process and annotate the LC-MS(n) data sets on the basis of candidate molecules from chemical databases, such as PubChem or the Human Metabolome Database. Multistage MS(n) spectral data are automatically annotated with hierarchical trees of in silico generated substructures of candidate molecules to explain the observed fragment ions, and alternative candidates are ranked on the basis of the calculated matching score. We tested this method on an untargeted LC-MS(n) (n ≤ 3) data set of a green tea extract, generated on an LC-LTQ/Orbitrap hybrid MS system. For the 623 spectral trees obtained in a single LC-MS(n) run, a total of 116,240 candidate molecules with monoisotopic masses matching within 5 ppm mass accuracy were retrieved from the PubChem database, ranging from 4 to 1327 candidates per molecular ion. The matching scores were used to rank the candidate molecules for each LC-MS(n) component. The median and third quartile fractional ranks for 85 previously identified tea compounds were 3.5 and 7.5, respectively. The substructure annotations and rankings provided detailed structural information on the detected components, beyond annotation with elemental formula only. Twenty-four additional components were putatively identified by expert interpretation of the automatically annotated data set, illustrating the potential to support systematic and untargeted metabolite identification.
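The fractional ranks reported above can be computed as in the Python sketch below. It assumes that a lower score is better (as with penalty scores) and uses the usual tie convention for fractional ranking, in which tied candidates share the mean of the ordinal ranks they span; whether the study used exactly this convention and this quartile method is an assumption, and the function names are hypothetical.

```python
# Illustrative sketch; lower-is-better scoring and the "inclusive" quartile
# method are assumptions, not necessarily what the study used.
from statistics import median, quantiles
from typing import List, Tuple


def fractional_rank(scores: List[float], correct_index: int) -> float:
    """Rank of the correct candidate, assuming a lower score is better.

    Candidates tied with it share the mean of the ordinal ranks they span,
    so a three-way tie for first place gives rank 2.0.
    """
    s = scores[correct_index]
    better = sum(1 for x in scores if x < s)
    tied = sum(1 for x in scores if x == s)  # includes the correct candidate
    return better + (tied + 1) / 2


def summarize(ranks: List[float]) -> Tuple[float, float]:
    """Median and third quartile of the fractional ranks (needs >= 2 values)."""
    q3 = quantiles(ranks, n=4, method="inclusive")[2]
    return median(ranks), q3
```

Fractional ranking avoids rewarding a candidate for being tied with many others: a correct structure that shares the best score with two alternatives is credited with rank 2.0, not 1.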

Source
DOI: http://dx.doi.org/10.1021/ac400861a
June 2013

Identification of new biomarker candidates for glucocorticoid induced insulin resistance using literature mining.

BioData Min 2013 Feb 4;6(1). Epub 2013 Feb 4.

Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands.

Background: Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, their use is limited by metabolic side effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid-induced insulin resistance, it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and are thus a good resource for studying these effects.

Results: We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid-induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that are already considered markers of glucocorticoid (GC) induced insulin resistance (IR), such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC-induced IR.

Conclusions: With this approach we are able to construct a robust, informative literature network of insulin resistance-related genes that gave new insights into the mechanisms behind GC-induced IR. The method has been set up in a generic way so that it can be applied to a wide variety of disease networks.

Source
DOI: http://dx.doi.org/10.1186/1756-0381-6-2
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577498
February 2013

Substructure-based annotation of high-resolution multistage MS(n) spectral trees.

Rapid Commun Mass Spectrom 2012 Oct;26(20):2461-71

Netherlands eScience Center, Science Park 140, 1098 XG, Amsterdam, The Netherlands.

Rationale: High-resolution multistage MS(n) data contains detailed information that can be used for structural elucidation of compounds observed in metabolomics studies. However, full exploitation of this complex data requires significant analysis efforts by human experts. In silico methods currently used to support data annotation by assigning substructures of candidate molecules are limited to a single level of MS fragmentation.

Methods: We present an extended substructure-based approach which allows annotation of hierarchical spectral trees obtained from high-resolution multistage MS(n) experiments. The algorithm yields a hierarchical tree of substructures of a candidate molecule to explain the fragment peaks observed at consecutive levels of the multistage MS(n) spectral tree. A matching score is calculated that indicates how well the candidate structure can explain the observed hierarchical fragmentation pattern.

Results: The method is applied to MS(n) spectral trees of a set of compounds representing important chemical classes in metabolomics. Based on the calculated score, the correct molecules were successfully prioritized among extensive sets of candidate structures retrieved from the PubChem database.

Conclusions: The results indicate that the inclusion of subsequent levels of fragmentation in the automatic annotation of MS(n) data improves the identification of the correct compounds. We show that, especially in the case of lower mass accuracy, this improvement is not only due to the inclusion of additional fragment ions in the analysis, but also to the specific hierarchical information present in the MS(n) spectral trees. This method may significantly reduce the time required by MS experts to analyze complex MS(n) data.

Source
DOI: http://dx.doi.org/10.1002/rcm.6364
October 2012

A prospective cross-screening study on G-protein-coupled receptors: lessons learned in virtual compound library design.

J Med Chem 2012 Jun 23;55(11):5311-25. Epub 2012 May 23.

Computational Drug Discovery Group, Radboud University Nijmegen Medical Centre, Geert Grooteplein, Nijmegen, The Netherlands.

We present the systematic prospective evaluation of a protein-based and a ligand-based virtual screening platform against a set of three G-protein-coupled receptors (GPCRs): the β-2 adrenoreceptor (ADRB2), the adenosine A(2A) receptor (AA2AR), and the sphingosine 1-phosphate receptor (S1PR1). Novel bioactive compounds were identified using a consensus scoring procedure combining ligand-based (frequent substructure ranking) and structure-based (Snooker) tools, and all 900 selected compounds were screened against all three receptors. A striking number of ligands showed affinity/activity for GPCRs other than the intended target, which could be partly attributed to the fuzziness and overlap of protein-based pharmacophore models. Surprisingly, the phosphodiesterase 5 (PDE5) inhibitor sildenafil was found to possess submicromolar affinity for AA2AR. Overall, this is one of the first published prospective chemogenomics studies that demonstrate the identification of novel cross-pharmacology between unrelated protein targets. The lessons learned from this study can be used to guide future virtual ligand design efforts.
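The abstract does not spell out how the ligand-based and structure-based scores were combined. One common way to build a consensus is to average each compound's rank across methods, sketched below in Python; rank averaging, higher-is-better scores, the tie-breaking rule and the function names are all assumptions, not the study's documented procedure.

```python
# Illustrative rank-averaging consensus; not the study's documented procedure.
from typing import Dict, List


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Ordinal ranks (1 = best), assuming a higher score is better."""
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return {compound: i + 1 for i, compound in enumerate(ordered)}


def consensus_selection(score_sets: List[Dict[str, float]], top_n: int) -> List[str]:
    """Select the top_n compounds by mean rank over all scoring methods."""
    # Only compounds scored by every method can be compared fairly.
    common = set.intersection(*(set(s) for s in score_sets))
    rank_maps = [ranks(s) for s in score_sets]
    mean_rank = {c: sum(r[c] for r in rank_maps) / len(rank_maps) for c in common}
    # Break ties on the compound ID so the selection is deterministic.
    return sorted(mean_rank, key=lambda c: (mean_rank[c], c))[:top_n]
```

Working on ranks rather than raw scores sidesteps the fact that a substructure-frequency score and a pharmacophore fit score live on unrelated scales.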

Source
DOI: http://dx.doi.org/10.1021/jm300280e
June 2012

Snooker: a structure-based pharmacophore generation tool applied to class A GPCRs.

J Chem Inf Model 2011 Sep 6;51(9):2277-92. Epub 2011 Sep 6.

Computational Drug Discovery Group, CMBI, Radboud University Nijmegen, Nijmegen, The Netherlands.

G-protein coupled receptors (GPCRs) are important drug targets for various diseases and of major interest to pharmaceutical companies. The function of individual members of this protein family can be modulated by the binding of small molecules at the extracellular side of the structurally conserved transmembrane (TM) domain. Here, we present Snooker, a structure-based approach to generate pharmacophore hypotheses for compounds binding to this extracellular side of the TM domain. Snooker does not require knowledge of ligands, is therefore suitable for apo-proteins, and can be applied to all receptors of the GPCR protein family. The method comprises the construction of a homology model of the TM domains and the prioritization of residues by their probability of being involved in ligand binding. Subsequently, protein properties are converted to ligand space, and pharmacophore features are generated at positions where protein-ligand interactions are likely. Using this semiautomated, knowledge-driven bioinformatics approach, we have created pharmacophore hypotheses for 15 different GPCRs from several different subfamilies. For the beta-2-adrenergic receptor, we show that ligand poses predicted by Snooker pharmacophore hypotheses reproduce literature-supported binding modes for ∼75% of compounds fulfilling pharmacophore constraints. All 15 pharmacophore hypotheses represent interactions with residues essential for ligand binding as observed in mutagenesis experiments, and compound selections based on these hypotheses are shown to be target specific. For 8 out of 15 targets, enrichment factors above 10-fold are observed in the top 0.5% ranked compounds in a virtual screen. Additionally, prospectively predicted ligand binding poses in the human dopamine D3 receptor based on Snooker pharmacophores were ranked among the best models in the community-wide GPCR Dock 2010.
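The enrichment factor quoted above compares the hit rate at the top of a ranked screening list with the hit rate expected from random selection. A minimal Python sketch of the standard definition follows; the function name is hypothetical, and how the study converted "top 0.5%" into a whole number of compounds is not stated, so the rounding used here is an assumption.

```python
# Illustrative sketch of the standard enrichment factor; the rounding of the
# top-fraction cutoff is an assumption.
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """EF at the top `fraction` of a ranked screening list.

    EF = (actives in top subset / size of top subset) / (all actives / all compounds)
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)
```

For example, with 10 actives among 1,000 compounds, placing 5 of them in the top 5 (0.5%) gives EF = (5/5)/(10/1000) = 100, the maximum possible at that cutoff; an EF above 10 means the top of the list is more than ten times richer in actives than a random pick.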

Source
DOI: http://dx.doi.org/10.1021/ci200088d
September 2011

ss-TEA: Entropy based identification of receptor specific ligand binding residues from a multiple sequence alignment of class A GPCRs.

BMC Bioinformatics 2011 Aug 10;12:332. Epub 2011 Aug 10.

Computational Drug Discovery Group, Radboud University Nijmegen Medical Centre, Geert Grooteplein, Nijmegen, The Netherlands.

Background: G-protein coupled receptors (GPCRs) are involved in many different physiological processes, and their function can be modulated by small molecules that bind in the transmembrane (TM) domain. Because of their structural and sequence conservation, the TM domains are often used in bioinformatics approaches to first create a multiple sequence alignment (MSA) and subsequently identify ligand binding positions. So far, methods have been developed to predict the common ligand binding residue positions for class A GPCRs.

Results: Here we present 1) ss-TEA, a method to identify specific ligand binding residue positions for any receptor, predicated on high-quality sequence information, and 2) the largest MSA of class A non-olfactory GPCRs in the public domain, consisting of 13,324 sequences covering most of the species homologues of the human set of GPCRs. A set of ligand binding residue positions extracted from the literature for 10 different receptors shows that our method gives the best ligand binding residue prediction for 9 of these 10 receptors compared to another state-of-the-art method.

Conclusions: The combination of the large multi-species alignment and the newly introduced residue selection method ss-TEA can be used to rapidly identify subfamily-specific ligand binding residues. This approach can aid the design of site-directed mutagenesis experiments, explain receptor function and improve modelling. The method is also available online via GPCRDB at http://www.gpcr.org/7tm/.
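The entropy idea behind ss-TEA can be illustrated with Shannon entropy per alignment column: a position that is conserved among one receptor's species homologues (low entropy) but variable across the whole family (high entropy) is a candidate receptor-specific residue. The Python sketch below is a simplified illustration of that idea, not ss-TEA's actual scoring function; the two thresholds, the function names and the toy sequences are assumptions, and gap characters are simply treated as an extra symbol.

```python
# Simplified two-entropy illustration; thresholds and names are assumptions,
# not ss-TEA's actual score.
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def columns(alignment: List[str]) -> List[str]:
    """Transpose equal-length aligned sequences into per-position columns."""
    return ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]


def specific_positions(homologues: List[str], family: List[str],
                       max_within: float = 0.5, min_family: float = 1.0) -> List[int]:
    """Positions conserved within the homologues but variable across the family."""
    return [
        i
        for i, (own, fam) in enumerate(zip(columns(homologues), columns(family)))
        if column_entropy(own) <= max_within and column_entropy(fam) >= min_family
    ]
```

A column that is invariant across the whole family (entropy 0 everywhere) is excluded: it points to a common binding or structural position rather than a receptor-specific one, which is the distinction the abstract draws.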

Source
DOI: http://dx.doi.org/10.1186/1471-2105-12-332
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3162937
August 2011

CoPub update: CoPub 5.0 a text mining system to answer biological questions.

Nucleic Acids Res 2011 Jul 27;39(Web Server issue):W450-4. Epub 2011 May 27.

Computational Drug Discovery, CMBI, NCMLS, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, The Netherlands.

In this article, we present CoPub 5.0, a publicly available text mining system that uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened its scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features and provided a new advanced interface, which can be used to answer a variety of biological questions. CoPub 5.0 allows searching for keywords of interest and their relations to curated thesauri, and provides highlighting and sorting mechanisms, based on its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape Web, allowing for literature-based systems biology. Finally, the operations of the CoPub 5.0 Web service make it possible to integrate the CoPub technology into bioinformatics workflows. CoPub 5.0 can be accessed through the CoPub portal http://www.copub.org.
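The ABC principle described above amounts to a two-step walk over a co-occurrence graph. The Python sketch below assumes the co-occurrences have already been reduced to term pairs (CoPub's statistical thresholds are not modelled) and lists every C term reachable from A through a shared B intermediate, excluding terms already linked to A directly. The function names and term names are hypothetical.

```python
# Illustrative ABC-style indirect-relation search; function and term names are
# hypothetical, and CoPub's co-occurrence statistics are not modelled.
from typing import Dict, Iterable, Set, Tuple


def build_links(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected co-occurrence graph from (term, term) pairs."""
    links: Dict[str, Set[str]] = {}
    for x, y in pairs:
        links.setdefault(x, set()).add(y)
        links.setdefault(y, set()).add(x)
    return links


def indirect_relations(a: str, links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each C term linked to `a` only via intermediates to those B terms."""
    direct = links.get(a, set())
    found: Dict[str, Set[str]] = {}
    for b in direct:
        for c in links.get(b, set()):
            if c != a and c not in direct:  # skip A itself and direct A-C links
                found.setdefault(c, set()).add(b)
    return found
```

Keeping the set of B intermediates for each C term, rather than only the C term, lets a user inspect why a hidden relation was proposed.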

Source
DOI: http://dx.doi.org/10.1093/nar/gkr310
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125746
July 2011

GPCRDB: information system for G protein-coupled receptors.

Nucleic Acids Res 2011 Jan 2;39(Database issue):D309-19. Epub 2010 Nov 2.

CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA Nijmegen, The Netherlands.

The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis. The GPCRDB can be found online at http://www.gpcr.org/7tm/.

Source
DOI: http://dx.doi.org/10.1093/nar/gkq1009
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013641
January 2011

[An internship in another country has its perks].

Tijdschr Diergeneeskd 2008 Sep;133(18):766-7

September 2008

Literature-based compound profiling: application to toxicogenomics.

Pharmacogenomics 2007 Nov;8(11):1521-34

Radboud University Nijmegen Medical Centre, Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, PO Box 9101, 6500 HB Nijmegen, The Netherlands.

Introduction: To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity.

Methods: Gene annotations were built by text mining in Medline abstracts for retrieval of co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, representing over-represented keywords calculated in a set of regulated genes after compound administration. To see whether keyword fingerprints can be used for assessment of compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants.

Results: Analysis of keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by the individual compounds. By contrast, the random sets produced a flat, nonspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences in the keyword fingerprints of these three compounds are based upon known distinct modes of action. Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy, which is in agreement with known effects of MPy in the literature.

Conclusion: Compound keyword fingerprinting based on information retrieved from literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action.
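The over-representation step can be illustrated with a one-sided hypergeometric test, a common choice for asking whether a keyword is linked to more regulated genes than chance would predict. The abstract does not name the statistic used, so the test, the 0.01 cutoff and the function names in this Python sketch are assumptions.

```python
# Illustrative keyword over-representation test; the hypergeometric test and
# the alpha cutoff are assumptions, not the study's documented statistic.
from math import comb
from typing import Dict, Set


def hypergeom_sf(k: int, universe: int, annotated: int, drawn: int) -> float:
    """P(X >= k): chance of at least k annotated genes among `drawn` random genes."""
    upper = min(annotated, drawn)
    hits = sum(comb(annotated, i) * comb(universe - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(universe, drawn)


def keyword_fingerprint(regulated: Set[str], annotations: Dict[str, Set[str]],
                        universe: int, alpha: float = 0.01) -> Dict[str, float]:
    """Keywords over-represented among the regulated genes, with their p-values."""
    fingerprint = {}
    for keyword, genes in annotations.items():
        k = len(regulated & genes)
        p = hypergeom_sf(k, universe, len(genes), len(regulated))
        if p < alpha:
            fingerprint[keyword] = p
    return fingerprint
```

Under this scheme a random gene set rarely passes the cutoff for any keyword, which is consistent with the flat profile the abstract reports for the randomly generated sets.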

Source
DOI: http://dx.doi.org/10.2217/14622416.8.11.1521
November 2007

CoPub Mapper: mining MEDLINE based on search term co-publication.

BMC Bioinformatics 2005 Mar 11;6:51. Epub 2005 Mar 11.

Department of Molecular Design & Informatics, Organon NV, P.O. Box 20, 5340 BH Oss, The Netherlands.

Background: High throughput microarray analyses result in many differentially expressed genes that are potentially responsible for the biological process of interest. In order to identify biological similarities between genes, publications from MEDLINE were identified in which pairs of gene names and combinations of gene name with specific keywords were co-mentioned.

Results: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE, and the relative probability of co-occurrence was determined for all gene-gene and gene-keyword pairs. To assess gene clustering according to literature co-publication, 150 genes consisting of 8 sets with known connections (e.g., same pathway, same protein complex, or same cellular localization) were run through the program. Receiver operating characteristic (ROC) analyses showed that most gene sets were clustered much better than expected by random chance. To test grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which resulted in several relevant clusters of genes with biological process and disease keywords. In addition, all genes versus keywords were hierarchically clustered to reveal a complete grouping of published genes based on co-occurrence.

Conclusion: The CoPub Mapper program allows for quick and versatile querying of co-published genes and keywords and can be successfully used to cluster predefined groups of genes and microarray data.
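Given the PubMed IDs retrieved for each search string, a co-occurrence statistic can be computed from set intersections. The Python sketch below uses the ratio of observed to expected co-publications under independence; this particular measure and the function name are assumptions for illustration, since the abstract does not give CoPub Mapper's formula for the relative probability.

```python
# Illustrative observed/expected co-publication ratio; a hypothetical measure,
# not CoPub Mapper's published formula.
from typing import Set


def cooccurrence_ratio(pmids_a: Set[int], pmids_b: Set[int], total_articles: int) -> float:
    """Observed co-publications divided by the count expected if A and B were independent."""
    if not pmids_a or not pmids_b:
        return 0.0
    observed = len(pmids_a & pmids_b)
    expected = len(pmids_a) * len(pmids_b) / total_articles
    return observed / expected
```

A ratio well above 1 suggests that two terms are mentioned together more often than their individual publication counts would predict; such pairwise values could then feed a clustering step like the one described in the Results.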

Source
DOI: http://dx.doi.org/10.1186/1471-2105-6-51
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274248
March 2005