Publications by authors named "Bill Andreopoulos"

18 Publications

Large-scale genome sequencing of mycorrhizal fungi provides insights into the early evolution of symbiotic traits.

Nat Commun 2020 10 12;11(1):5125. Epub 2020 Oct 12.

Université de Lorraine, Institut national de recherche pour l'agriculture, l'alimentation et l'environnement, UMR Interactions Arbres/Microorganismes, Centre INRAE Grand Est-Nancy, 54280, Champenoux, France.

Mycorrhizal fungi are mutualists that play crucial roles in nutrient acquisition in terrestrial ecosystems. Mycorrhizal symbioses arose repeatedly across multiple lineages of Mucoromycotina, Ascomycota, and Basidiomycota. Considerable variation exists in the capacity of mycorrhizal fungi to acquire carbon from soil organic matter. Here, we present a combined analysis of 135 fungal genomes from 73 saprotrophic, endophytic and pathogenic species, and 62 mycorrhizal species, including 29 new mycorrhizal genomes. This study samples ecologically dominant fungal guilds for which there were previously no symbiotic genomes available, including ectomycorrhizal Russulales, Thelephorales and Cantharellales. Our analyses show that transitions from saprotrophy to symbiosis involve (1) widespread losses of degrading enzymes acting on lignin and cellulose, (2) co-option of genes present in saprotrophic ancestors to fulfill new symbiotic functions, (3) diversification of novel, lineage-specific symbiosis-induced genes, (4) proliferation of transposable elements and (5) divergent genetic innovations underlying the convergent origins of the ectomycorrhizal guild.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-18795-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7550596
October 2020

Phage-specific metabolic reprogramming of virocells.

ISME J 2020 04 2;14(4):881-895. Epub 2020 Jan 2.

Department of Ecology and Evolutionary Biology, University of Michigan, 1105 North University Ave, Ann Arbor, MI, 48109, USA.

Ocean viruses are abundant and infect 20-40% of surface microbes. Infected cells, termed virocells, are thus a predominant microbial state. Yet, virocells and their ecosystem impacts are understudied, thus precluding their incorporation into ecosystem models. Here we investigated how unrelated bacterial viruses (phages) reprogram one host into contrasting virocells with different potential ecosystem footprints. We independently infected the marine Pseudoalteromonas bacterium with siphovirus PSA-HS2 and podovirus PSA-HP1. Time-resolved multi-omics unveiled drastically different metabolic reprogramming and resource requirements by each virocell, which were related to phage-host genomic complementarity and viral fitness. Namely, HS2 was more complementary to the host in nucleotides and amino acids, and fitter during infection than HP1. Functionally, HS2 virocells hardly differed from uninfected cells, with minimal host metabolism impacts. HS2 virocells repressed energy-consuming metabolisms, including motility and translation. Contrastingly, HP1 virocells substantially differed from uninfected cells. They repressed host transcription, responded to infection continuously, and drastically reprogrammed resource acquisition, central carbon and energy metabolisms. Ecologically, this work suggests that one cell, infected versus uninfected, can have immensely different metabolisms that affect the ecosystem differently. Finally, we relate phage-host genome complementarity, virocell metabolic reprogramming, and viral fitness in a conceptual model to guide incorporating viruses into ecosystem models.

Source
DOI: http://dx.doi.org/10.1038/s41396-019-0580-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7082346
April 2020

Comprehensive genomic and transcriptomic analysis of polycyclic aromatic hydrocarbon degradation by a mycoremediation fungus, Dentipellis sp. KUC8613.

Appl Microbiol Biotechnol 2019 Oct 3;103(19):8145-8155. Epub 2019 Sep 3.

Department of Biotechnology, College of Life Sciences and Biotechnology and Graduate School, Korea University, Seoul, 02841, South Korea.

The environmental accumulation of polycyclic aromatic hydrocarbons (PAHs) is of great concern due to their potential carcinogenic and mutagenic risks, as well as their resistance to remediation. While many fungi have been reported to break down PAHs in the environment, the gene-based metabolic pathways involved are not yet comprehensively understood. In particular, genome-scale transcriptional responses during fungal PAH degradation have rarely been reported. In this study, we report the genomic and transcriptomic basis of PAH bioremediation by a potent fungal degrader, Dentipellis sp. KUC8613. The genome of this fungus is 36.71 Mbp long and encodes 14,320 putative protein-coding genes. The strain efficiently removed more than 90% of PAHs supplied at 100 mg/l within 10 days. The genomic and transcriptomic analysis of this white-rot fungus shows that the strain primarily used non-ligninolytic enzymes to remove various PAHs, rather than the ligninolytic enzymes typically known to play important roles in PAH degradation. PAH removal by non-ligninolytic enzymes was initiated by both PAH-specific and common upregulation of P450s, followed by downstream PAH-transforming enzymes such as epoxide hydrolases, dehydrogenases, FAD-dependent monooxygenases, dioxygenases, and glycosyl- or glutathione transferases. Among the various PAHs, phenanthrene induced a more dynamic transcriptomic response, possibly due to its greater cytotoxicity, leading to highly upregulated genes involved in the translocation of PAHs, a defense system against reactive oxygen species, and ATP synthesis. Our genomic and transcriptomic data provide a foundation for understanding the mycoremediation of PAHs and for applying this strain to polluted environments.

Source
DOI: http://dx.doi.org/10.1007/s00253-019-10089-6
October 2019

Draft Genome Sequences of Three Monokaryotic Isolates of the White-Rot Basidiomycete Fungus Dichomitus squalens.

Microbiol Resour Announc 2019 May 2;8(18). Epub 2019 May 2.

Fungal Physiology, Westerdijk Fungal Biodiversity Centre, Utrecht, The Netherlands

Here, we report the draft genome sequences of three isolates of the wood-decaying white-rot basidiomycete fungus Dichomitus squalens. The genomes of these monokaryons were sequenced to provide more information on the intraspecies genomic diversity of this fungus and were compared to the previously sequenced genome of strain LYAD-421 SS1.

Source
DOI: http://dx.doi.org/10.1128/MRA.00264-19
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6498232
May 2019

Leveraging single-cell genomics to expand the fungal tree of life.

Nat Microbiol 2018 12 8;3(12):1417-1428. Epub 2018 Oct 8.

US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA.

Environmental DNA surveys reveal that most fungal diversity represents uncultured species. We sequenced the genomes of eight uncultured species across the fungal tree of life using a new single-cell genomics pipeline. We show that, despite a large variation in genome and gene space recovery from each single amplified genome (SAG), ≥90% can be recovered by combining multiple SAGs. SAGs provide robust placement for early-diverging lineages and infer a diploid ancestor of fungi. Early-diverging fungi share metabolic deficiencies and show unique gene expansions correlated with parasitism and unculturability. Single-cell genomics holds great promise in exploring fungal diversity, life cycles and metabolic potential.
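Why pooling several single amplified genomes recovers more than any one of them can be illustrated with set unions. The sketch below is a toy illustration, assuming each SAG is reduced to the set of reference genes it recovers; the gene names and counts are hypothetical and not taken from the study.

```python
def recovery_fraction(recovered, reference):
    """Fraction of reference genes present in a set of recovered genes."""
    return len(recovered & reference) / len(reference)


def combined_recovery(sag_gene_sets, reference):
    """Recovery after pooling several SAGs (union of their recovered genes)."""
    pooled = set().union(*sag_gene_sets)
    return recovery_fraction(pooled, reference)


# Hypothetical toy data: 10 reference genes, three partial SAGs.
reference = {f"g{i}" for i in range(1, 11)}
sags = [
    {"g1", "g2", "g3", "g4", "g5"},
    {"g4", "g5", "g6", "g7", "g8"},
    {"g8", "g9"},
]
for sag in sags:
    print(recovery_fraction(sag, reference))  # 0.5, then 0.5, then 0.2
print(combined_recovery(sags, reference))     # 0.9
```

Each toy SAG recovers at most half of the genes, while their union recovers 90%, because the partial SAGs cover largely different parts of the genome.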

Source
DOI: http://dx.doi.org/10.1038/s41564-018-0261-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6784888
December 2018

Widespread adenine N6-methylation of active genes in fungi.

Nat Genet 2017 Jun 8;49(6):964-968. Epub 2017 May 8.

US Department of Energy Joint Genome Institute, Walnut Creek, California, USA.

N6-methyldeoxyadenine (6mA) is a noncanonical DNA base modification present at low levels in plant and animal genomes, but its prevalence and association with genome function in other eukaryotic lineages remain poorly understood. Here we report that abundant 6mA is associated with transcriptionally active genes in early-diverging fungal lineages. Using single-molecule long-read sequencing of 16 diverse fungal genomes, we observed that up to 2.8% of all adenines were methylated in early-diverging fungi, far exceeding levels observed in other eukaryotes and more derived fungi. 6mA occurred symmetrically at ApT dinucleotides and was concentrated in dense methylated adenine clusters surrounding the transcriptional start sites of expressed genes; its distribution was inversely correlated with that of 5-methylcytosine. Our results show a striking contrast in the genomic distributions of 6mA and 5-methylcytosine and reinforce a distinct role for 6mA as a gene-expression-associated epigenomic mark in eukaryotes.
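Symmetric 6mA at ApT sites follows from base pairing: the dinucleotide 5'-ApT-3' is its own reverse complement, so both strands carry an adenine at the same site. The sketch below is an illustration only, not part of the study's pipeline; it locates ApT sites in a forward-strand sequence and reports the two adenine positions (in forward-strand coordinates) that a symmetric mark would cover.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def apt_sites(seq):
    """0-based positions i where the forward strand reads 5'-A-T-3' at i, i+1."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AT"]


def symmetric_6ma_positions(seq):
    """For each ApT site: the forward-strand A (at i) and the reverse-strand A,
    which pairs with the forward-strand T at i + 1."""
    return [(i, i + 1) for i in apt_sites(seq)]


print(reverse_complement("AT"))           # AT  (ApT is palindromic)
print(apt_sites("GATCAT"))                # [1, 4]
print(symmetric_6ma_positions("GATCAT"))  # [(1, 2), (4, 5)]
```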

Source
DOI: http://dx.doi.org/10.1038/ng.3859
June 2017

Next generation sequencing data of a defined microbial mock community.

Sci Data 2016 Sep 27;3:160081. Epub 2016 Sep 27.

DOE Joint Genome Institute, Walnut Creek, California 94598, USA.

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, and the assessment of critical components such as sequencing errors and biases, rely on such datasets. We here report the next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, cover a range of GC contents, genome sizes and repeat content, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can help improve the current sequence data analysis toolkit and spur interest in the development of new tools.

Source
DOI: http://dx.doi.org/10.1038/sdata.2016.81
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037974
September 2016

Integrated Analysis Reveals hsa-miR-142 as a Representative of a Lymphocyte-Specific Gene Expression and Methylation Signature.

Cancer Inform 2012 12;11:61-75. Epub 2012 Mar 12.

Center for Computational Biology and Bioinformatics, Department of Electrical Engineering, Columbia University, New York, NY 10027, USA.

Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated gene expression, miRNA expression and DNA methylation data to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M-) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectal cancers. There is a 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer, and a 60% overlap of genes in R between GBM and ovarian cancer (P = 1.3e-11). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142-associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ sites and hypermethylated M- sites, while mesenchymal samples have the opposite profile.
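The M+/M- split amounts to correlating each methylation site with hsa-miR-142 expression across the same samples and grouping sites by the sign of that correlation. A minimal sketch, assuming a plain Pearson correlation and a hypothetical cut-off (the abstract states neither); the site names and values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_sites(mir_expression, site_methylation, threshold=0.5):
    """Assign methylation sites to M+ or M- by the sign of their correlation
    with miRNA expression. threshold is a hypothetical cut-off; sites with
    |r| < threshold are left out of both groups."""
    m_plus, m_minus = [], []
    for site, values in site_methylation.items():
        r = pearson(mir_expression, values)
        if r >= threshold:
            m_plus.append(site)
        elif r <= -threshold:
            m_minus.append(site)
    return sorted(m_plus), sorted(m_minus)


# Hypothetical toy data: miRNA expression and methylation levels in four samples.
mir = [1.0, 2.0, 3.0, 4.0]
sites = {
    "site_a": [0.1, 0.2, 0.3, 0.4],  # rises with miRNA expression
    "site_b": [0.4, 0.3, 0.2, 0.1],  # falls with miRNA expression
    "site_c": [0.1, 0.3, 0.3, 0.1],  # essentially uncorrelated
}
print(split_sites(mir, sites))  # (['site_a'], ['site_b'])
```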

Source
DOI: http://dx.doi.org/10.4137/CIN.S9037
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3306237
August 2012

Efficient unfolding pattern recognition in single molecule force spectroscopy data.

Algorithms Mol Biol 2011 Jun 6;6(1):16. Epub 2011 Jun 6.

Department of Bioinformatics, Biotechnological Center, University of Technology Dresden, Dresden, Germany.

Background: Single-molecule force spectroscopy (SMFS) is a technique that measures the force necessary to unfold a protein. SMFS experiments generate Force-Distance (F-D) curves. A statistical analysis of a set of F-D curves reveals different unfolding pathways. Information on protein structure, conformation, functional states, and inter- and intra-molecular interactions can be derived.

Results: In the present work, we propose a pattern recognition algorithm and apply it to datasets from SMFS experiments on the membrane protein bacteriorhodopsin (bR). We discuss the unfolding pathways found in bR, which are characterised by main peaks and side peaks. A main peak is the result of the pairwise unfolding of the transmembrane helices. In contrast, a side peak is an unfolding event within an alpha-helix or another secondary structural element. The algorithm is capable of detecting side peaks along with main peaks. Therefore, we can detect an individual unfolding pathway as the sequence of events labeled with their occurrences and co-occurrences specific to bR's unfolding pathway. We find that side peaks do not co-occur with one another in curves as frequently as main peaks do, which may imply a synergistic effect occurring between helices. While main peaks co-occur as pairs in at least 50% of curves, side peaks co-occur with one another in less than 10% of curves. Moreover, the algorithm's runtime scales well as the dataset size increases.

Conclusions: Our algorithm satisfies the requirements of an automated methodology that combines high accuracy with efficiency in analyzing SMFS datasets. The algorithm tackles the force spectroscopy analysis bottleneck, leading to more consistent and reproducible results.
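Once the peaks in each F-D curve have been detected and labeled, co-occurrence statistics such as "main peaks co-occur as pairs in at least 50% of curves" reduce to counting peak pairs per curve. The sketch below covers only that counting step; it assumes peak detection and labeling are already done (the labels are hypothetical) and is not the authors' pattern recognition algorithm.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_fractions(curves):
    """curves: one set of peak labels per F-D curve.

    Returns {(label_a, label_b): fraction of curves containing both peaks}.
    """
    counts = Counter()
    for peaks in curves:
        for pair in combinations(sorted(peaks), 2):
            counts[pair] += 1
    return {pair: c / len(curves) for pair, c in counts.items()}


# Hypothetical labels: M* = main peaks, S* = side peaks.
curves = [
    {"M1", "M2"},
    {"M1", "M2", "S1"},
    {"M1", "S2"},
    {"M2"},
]
fractions = cooccurrence_fractions(curves)
print(fractions[("M1", "M2")])  # 0.5
print(fractions[("M1", "S2")])  # 0.25
```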

Source
DOI: http://dx.doi.org/10.1186/1748-7188-6-16
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3126767
June 2011

Triangle network motifs predict complexes by complementing high-error interactomes with structural information.

BMC Bioinformatics 2009 Jun 27;10:196. Epub 2009 Jun 27.

Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany.

Background: Many high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second-level neighbors in PPINs and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles.

Results: We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from the Munich Information Center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than second-level neighbors in PPINs without SDDIs. The biological interpretation of triangles is that an SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary data types in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes.

Conclusion: Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN.
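A PPI-SDDI-PPI triangle can be read as two second-level neighbors (proteins sharing an interaction partner) that are also linked by a structural domain-domain interaction. The sketch below enumerates such triangles from two edge lists; the protein names are hypothetical, and it assumes SDDIs have already been mapped from domains onto the proteins that contain them.

```python
from collections import defaultdict
from itertools import combinations


def ppi_sddi_ppi_triangles(ppi_edges, sddi_edges):
    """Return (a, c, b) triples where a-c and c-b are PPIs and a-b is an SDDI."""
    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    sddi = {frozenset(edge) for edge in sddi_edges}
    triangles = set()
    for c, partners in neighbors.items():
        # a and b are second-level neighbors linked through the shared partner c.
        for a, b in combinations(sorted(partners), 2):
            if frozenset((a, b)) in sddi:
                triangles.add((a, c, b))
    return triangles


ppi = [("A", "C"), ("B", "C"), ("B", "D")]
sddi = [("A", "B")]
print(ppi_sddi_ppi_triangles(ppi, sddi))  # {('A', 'C', 'B')}
```

The pair C-D is also a second-level neighbor pair (through B), but no SDDI supports it, so it does not form a triangle.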

Source
DOI: http://dx.doi.org/10.1186/1471-2105-10-196
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714575
June 2009

A roadmap of clustering algorithms: finding a match for a biomedical application.

Brief Bioinform 2009 May 24;10(3):297-314. Epub 2009 Feb 24.

Biotechnological Centre, Technische Universität Dresden, Germany.

Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

Source
DOI: http://dx.doi.org/10.1093/bib/bbn058
May 2009

Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy.

BMC Bioinformatics 2009 Jan 21;10:28. Epub 2009 Jan 21.

Biotechnology Center (BIOTEC), Technische Universität Dresden, 01062, Dresden, Germany.

Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation that use ontologies or metadata.

Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path from co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but it requires training data, which the other methods do not. To evaluate these approaches, we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Across all conditions, the approaches achieve an 80% success rate on average. The 'MetaData' approach performed best, with 96%, when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves an 80% success rate on average.

Conclusion: Metadata is valuable for disambiguation but requires high-quality training data. Closest Sense requires no training but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well-structured ontologies can play a very important role in improving disambiguation.

Availability: The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1.
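To make the idea of a log-odds score over co-occurring terms concrete, the sketch below scores each candidate sense by summing, over the context terms, the log odds ratio of a term appearing in training documents of that sense versus the other senses, then picks the highest-scoring sense. This is a generic illustration, not the paper's 'Term Cooc' formula: it ignores the ontology-inferred co-occurrences, and the add-one smoothing, sense names and counts are hypothetical.

```python
from math import log


def log_odds(k_sense, n_sense, k_other, n_other, pseudo=1.0):
    """Log odds ratio that a term appears in documents of one sense vs. the others.

    k_*: documents containing the term; n_*: total documents.
    pseudo is a hypothetical smoothing count that avoids log(0).
    """
    p = (k_sense + pseudo) / (n_sense + 2 * pseudo)
    q = (k_other + pseudo) / (n_other + 2 * pseudo)
    return log(p / (1 - p)) - log(q / (1 - q))


def disambiguate(context_terms, cooc, ndocs):
    """Pick the sense whose summed log odds over the context terms is highest."""
    scores = {}
    for sense in ndocs:
        others = [s for s in ndocs if s != sense]
        n_other = sum(ndocs[s] for s in others)
        scores[sense] = sum(
            log_odds(
                cooc[sense].get(term, 0),
                ndocs[sense],
                sum(cooc[s].get(term, 0) for s in others),
                n_other,
            )
            for term in context_terms
        )
    return max(scores, key=scores.get)


# Hypothetical training counts for two senses of an ambiguous term.
ndocs = {"developmental_biology": 10, "general": 10}
cooc = {
    "developmental_biology": {"embryo": 8},
    "general": {"software": 7},
}
print(disambiguate(["embryo"], cooc, ndocs))    # developmental_biology
print(disambiguate(["software"], cooc, ndocs))  # general
```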

Source
DOI: http://dx.doi.org/10.1186/1471-2105-10-28
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2663782
January 2009

Efficient layered density-based clustering of categorical data.

J Biomed Inform 2009 Apr 10;42(2):365-76. Epub 2008 Dec 10.

Biotechnological Centre, Technische Universität Dresden, 47-51 Tatzberg, 01307 Dresden Sachsen, Germany.

A challenge involved in applying density-based clustering to categorical biomedical data is that the "cube" of attribute values has no ordering defined, making the search for dense subspaces slow. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data, and a complementary index for searching for dense subspaces efficiently. The HIERDENC index is updated when new objects are introduced, such that clustering does not need to be repeated on all objects. The updating and cluster retrieval are efficient. Comparisons with several other clustering algorithms showed that on large datasets HIERDENC achieved better runtime scalability with the number of objects, as well as better cluster quality. By rapidly collapsing bicliques in large networks, we achieved an edge reduction of as much as 86.5%. HIERDENC is suitable for large and quickly growing datasets, since it is independent of object ordering, does not require re-clustering when new data emerges, and requires no user-specified input parameters.
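Although categorical values have no ordering, a neighborhood in the attribute "cube" can still be defined by counting mismatched attributes (Hamming distance): the neighborhood of an object is the set of objects differing from it in at most r attributes. The sketch below computes such neighborhood densities by brute force. It assumes Hamming distance as the neighborhood measure and illustrates only the density notion; it is not the HIERDENC algorithm or its index, and the toy objects are hypothetical.

```python
def hamming(x, y):
    """Number of attributes on which two categorical objects differ."""
    return sum(a != b for a, b in zip(x, y))


def hypercube_density(objects, r):
    """For each object, count objects (itself included) within Hamming radius r.

    This brute-force scan costs O(n^2) distance computations; avoiding such
    full scans is the kind of cost an index for dense subspaces targets.
    """
    return [sum(1 for other in objects if hamming(obj, other) <= r) for obj in objects]


# Hypothetical objects with three categorical attributes each.
objects = [("a", "x", "1"), ("a", "x", "2"), ("b", "y", "3")]
print(hypercube_density(objects, 1))  # [2, 2, 1]
```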

Source
DOI: http://dx.doi.org/10.1016/j.jbi.2008.11.004
April 2009

Word Sense Disambiguation in biomedical ontologies with term co-occurrence analysis and document clustering.

Int J Data Min Bioinform 2008;2(3):193-215

Biotechnological Centre, Technische Universität Dresden, Germany.

With more and more genomes being sequenced, much effort is devoted to their annotation with terms from controlled vocabularies such as the Gene Ontology. Manual annotation based on relevant literature is tedious, but automating this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as 'development' can refer to developmental biology or to development in the more general sense. Here, we present two approaches to address this problem using term co-occurrences and document clustering. To evaluate our method, we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an F-measure of 77%. Additionally, applying document clustering improves precision to 82%. We applied the same approach to disambiguate 'nucleus', 'transport', and 'spindle' and achieved consistent results. Thus, our method is a viable approach towards the automation of literature-based genome annotation.

Source
DOI: http://dx.doi.org/10.1504/ijdmb.2008.020522
December 2008

Unraveling protein networks with power graph analysis.

PLoS Comput Biol 2008 Jul 11;4(7):e1000108. Epub 2008 Jul 11.

Biotechnology Center, Technische Universität Dresden, Dresden, Germany.

Networks play a crucial role in computational biology, yet their analysis and representation remain an open problem. Power Graph Analysis is a lossless transformation of biological networks into a compact, less redundant representation, exploiting the abundance of cliques and bicliques as elementary topological motifs. We demonstrate the advantages of Power Graph Analysis with five examples. Investigating protein-protein interaction networks, we show how the catalytic subunits of the casein kinase II complex are distinguishable from the regulatory subunits, how interaction profiles and sequence phylogeny of SH3 domains correlate, and how false positive interactions among high-throughput interactions are spotted. Additionally, we demonstrate the generality of Power Graph Analysis by applying it to two other types of networks. We show how power graphs induce a clustering of both transcription factors and target genes in bipartite transcription networks, and how the erosion of a phosphatase domain in type 22 non-receptor tyrosine phosphatases is detected. We apply Power Graph Analysis to high-throughput protein interaction networks and show that up to 85% (56% on average) of the information is redundant. Experimental networks are more compressible than rewired networks with the same degree distribution, indicating that experimental networks are rich in cliques and bicliques. Power Graphs are a novel representation of networks that reduces network complexity by explicitly representing recurring network motifs. Power Graphs compress up to 85% of the edges in protein interaction networks and are applicable to all types of networks, such as protein interactions, regulatory networks, or homology networks.
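The compression behind power graphs can be illustrated with a single biclique: a complete bipartite block between node sets U and V contains |U|·|V| edges but is represented by one power edge meaning "every node in U links to every node in V". The sketch below assumes the biclique is already known (finding cliques and bicliques, the hard part, is not shown) and counts edges before and after that one substitution; the node names are hypothetical.

```python
def compress_biclique(edges, left, right):
    """Edge count after replacing a complete biclique left x right by one power edge."""
    edge_set = {frozenset(e) for e in edges}
    covered = {frozenset((u, v)) for u in left for v in right}
    if not covered <= edge_set:
        raise ValueError("left x right is not a complete biclique in this graph")
    # Lossless: the single power edge stands for all |left| * |right| covered edges.
    return len(edge_set - covered) + 1


edges = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"),
         ("C", "X"), ("C", "Y"), ("X", "Z")]
compressed = compress_biclique(edges, {"A", "B", "C"}, {"X", "Y"})
print(compressed)                             # 2
print(round(1 - compressed / len(edges), 3))  # 0.714
```

Here the six biclique edges collapse into one power edge, so seven edges become two, an edge reduction of about 71%.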

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1000108
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2424176
July 2008

Bi-level clustering of mixed categorical and numerical biomedical data.

Int J Data Min Bioinform 2006;1(1):19-56

Department of Computer Science and Engineering, York University, M3J1P3, Toronto, Ontario, Canada.

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for 'Bi-Level Clustering of Mixed categorical and numerical data types'. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than when using either data type alone.

Source
DOI: http://dx.doi.org/10.1504/ijdmb.2006.009920
May 2008

Finding molecular complexes through multiple layer clustering of protein interaction networks.

Int J Bioinform Res Appl 2007;3(1):65-85

Department of Computer Science and Engineering, York University, M3J1P3, Toronto, Ontario, Canada.

Clustering protein-protein interaction networks (PINs) helps to identify complexes that guide the cell machinery. Clustering algorithms often create a flat clustering, without considering the layered structure of PINs. We propose the MULIC clustering algorithm that produces layered clusters. We applied MULIC to five PINs. Clusters correlate with known MIPS protein complexes. For example, a cluster of 79 proteins overlaps with a known complex of 88 proteins. Proteins in top cluster layers tend to be more representative of complexes than proteins in bottom layers. Lab work on finding unknown complexes or determining drug effects can be guided by top layer proteins.

Source
DOI: http://dx.doi.org/10.1504/IJBRA.2007.011835
January 2008

Clustering by common friends finds locally significant proteins mediating modules.

Bioinformatics 2007 May 21;23(9):1124-31. Epub 2007 Feb 21.

Biotechnological Centre, Technische Universität Dresden, Germany.

Motivation: Much research has been dedicated to large-scale protein interaction networks including the analysis of scale-free topologies, network modules and the relation of domain-domain to protein-protein interaction networks. Identifying locally significant proteins that mediate the function of modules is still an open problem.

Method: We use a layered clustering algorithm for interaction networks, which groups proteins by the similarity of their direct neighborhoods. We identify locally significant proteins, called mediators, which link different clusters. We apply the algorithm to a yeast network.

Results: Clusters and mediators are organized in hierarchies, where clusters are mediated by and act as mediators for other clusters. We compare the clusters and mediators to known yeast complexes and find agreement with precision of 71% and recall of 61%. We analyzed the functions, processes and locations of mediators and clusters. We found that 55% of mediators to a cluster are enriched with a set of diverse processes and locations, often related to translocation of biomolecules. Additionally, 82% of clusters are enriched with one or more functions. The important role of mediators is further corroborated by a comparatively higher degree of conservation across genomes. We illustrate the above findings with an example of membrane protein translocation from the cytoplasm to the inner nuclear membrane.

Availability: All software is freely available as Supplementary information.
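Grouping proteins by the similarity of their direct neighborhoods ("common friends") requires a neighborhood similarity measure. The sketch below uses the Jaccard index of neighbor sets as an illustrative choice; the measure used by the layered clustering algorithm may differ, and the protein names are hypothetical.

```python
from collections import defaultdict


def neighborhoods(edges):
    """Adjacency sets for an undirected interaction network."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def common_friends_similarity(nbrs, p, q):
    """Jaccard similarity of the direct neighborhoods of proteins p and q."""
    a, b = nbrs.get(p, set()), nbrs.get(q, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


edges = [("P1", "H1"), ("P1", "H2"), ("P2", "H1"),
         ("P2", "H2"), ("P3", "H2"), ("P3", "H3")]
nbrs = neighborhoods(edges)
print(common_friends_similarity(nbrs, "P1", "P2"))  # 1.0
print(common_friends_similarity(nbrs, "P1", "P3"))  # 0.3333333333333333
```

P1 and P2 share all their partners and would be grouped together, whereas P3 shares only H2 with them.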

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btm064
May 2007