Publications by authors named "Philipp Bucher"

71 Publications

Intraductal xenografts show lobular carcinoma cells rely on their own extracellular matrix and LOXL1.

EMBO Mol Med 2021 Mar 22;13(3):e13180. Epub 2021 Feb 22.

ISREC - Swiss Institute for Experimental Cancer Research, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Invasive lobular carcinoma (ILC) is the most frequent special histological subtype of breast cancer, typically characterized by loss of E-cadherin. It has clinical features distinct from other estrogen receptor-positive (ER+) breast cancers but the molecular mechanisms underlying its characteristic biology are poorly understood because we lack experimental models to study them. Here, we recapitulate the human disease, including its metastatic pattern, by grafting ILC-derived breast cancer cell lines, SUM-44 PE and MDA-MB-134-VI cells, into the mouse milk ducts. Using patient-derived intraductal xenografts from lobular and non-lobular ER+ HER2- tumors to compare global gene expression, we identify extracellular matrix modulation as a lobular carcinoma cell-intrinsic trait. Analysis of TCGA patient datasets shows that a matrisome signature is enriched in lobular carcinomas, with overexpression of elastin, collagens, and the collagen-modifying enzyme LOXL1. Treatment with the pan-LOX inhibitor BAPN and silencing of LOXL1 expression decrease tumor growth, invasion, and metastasis by disrupting ECM structure, resulting in decreased ER signaling. We conclude that LOXL1 inhibition is a promising therapeutic strategy for ILC.

Source
DOI: http://dx.doi.org/10.15252/emmm.202013180
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7933935
March 2021

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: The positional weight matrix (PWM) is the de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
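
The all-against-all idea can be illustrated with a minimal Python sketch. It is not the study's benchmarking protocol: the best-window scoring, the AUC-ROC measure, and all names and toy data below are assumptions chosen for illustration. Every PWM is scored against every experiment, so the best performing PWM for an experiment can be read from the resulting table even when it belongs to a different TF.

```python
from itertools import product

BASES = "ACGT"  # column order assumed for each PWM row


def best_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def auc_roc(pos_scores, neg_scores):
    """Probability that a bound sequence outscores an unbound one (ties count 0.5)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def all_against_all(pwms, experiments):
    """Return {(pwm_name, experiment_name): AUC} for every combination."""
    table = {}
    for (pwm_name, pwm), (exp_name, (bound, unbound)) in product(
        pwms.items(), experiments.items()
    ):
        pos = [best_score(pwm, s) for s in bound]
        neg = [best_score(pwm, s) for s in unbound]
        table[(pwm_name, exp_name)] = auc_roc(pos, neg)
    return table


# Hypothetical toy data: two 3-bp PWMs and one experiment.
pwms = {
    "pwm_acg": [[2, -1, -1, -1], [-1, 2, -1, -1], [-1, -1, 2, -1]],
    "pwm_ttt": [[-1, -1, -1, 2]] * 3,
}
experiments = {"exp_acg": (["TTACGT", "ACGAAA"], ["TTTTTT", "GGGGGG"])}
table = all_against_all(pwms, experiments)
```

In this toy case the ACG matrix separates bound from unbound sequences perfectly, while the TTT matrix performs below chance.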

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020

The secreted protease Adamts18 links hormone action to activation of the mammary stem cell niche.

Nat Commun 2020 03 26;11(1):1571. Epub 2020 Mar 26.

Ecole Polytechnique Fédérale de Lausanne, Station 19, CH-1015, Lausanne, Switzerland.

Estrogens and progesterone control breast development and carcinogenesis via their cognate receptors expressed in a subset of luminal cells in the mammary epithelium. How they control the extracellular matrix, important to breast physiology and tumorigenesis, remains unclear. Here we report that both hormones induce the secreted protease Adamts18 in myoepithelial cells by controlling Wnt4 expression with consequent paracrine canonical Wnt signaling activation. Adamts18 is required for stem cell activation, has multiple binding partners in the basement membrane and interacts genetically with the basal membrane-specific proteoglycan, Col18a1, pointing to the basement membrane as part of the stem cell niche. In vitro, ADAMTS18 cleaves fibronectin; in vivo, Adamts18 deletion causes increased collagen deposition during puberty, which results in impaired Hippo signaling and reduced Fgfr2 expression both of which control stem cell function. Thus, Adamts18 links luminal hormone receptor signaling to basement membrane remodeling and stem cell activation.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-15357-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7099066
March 2020

Common genetic variants associated with Parkinson's disease display widespread signature of epigenetic plasticity.

Sci Rep 2019 12 5;9(1):18464. Epub 2019 Dec 5.

Department of Neurology, University Clinic Bonn, Bonn, Germany.

Parkinson's disease (PD) is characterized by a progressive loss of substantia nigra dopaminergic neurons and aggregation of α-synuclein protein encoded by the SNCA gene. Genome-wide association studies identified almost 100 sequence variants linked to PD in SNCA. However, the consequences of this genetic variability are rather unclear. Herein, our analysis of selected single nucleotide polymorphisms (SNPs) that are highly associated with PD susceptibility revealed that several SNP sites localize to nucleosomes and overlap with bivalent regions poised to adopt either active or repressed chromatin states. We also identified a large number of transcription factor (TF) binding sites associated with these variants. In addition, we located two docking sites in the intron-1 methylation-prone region of SNCA which are required for the putative interactions with DNMT1. Taken together, our analysis reflects an additional layer of epigenomic contribution to the regulation of the SNCA gene in PD.

Source
DOI: http://dx.doi.org/10.1038/s41598-019-54865-w
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6895091
December 2019

EPD in 2020: enhanced data visualization and extension to ncRNA promoters.

Nucleic Acids Res 2020 01;48(D1):D65-D69

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz1014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145694
January 2020

Opposing chromatin remodelers control transcription initiation frequency and start site selection.

Nat Struct Mol Biol 2019 08 5;26(8):744-754. Epub 2019 Aug 5.

Department of Molecular Biology and Institute of Genetics and Genomics of Geneva (iGE3), Geneva, Switzerland.

Precise nucleosome organization at eukaryotic promoters is thought to be generated by multiple chromatin remodeler (CR) enzymes and to affect transcription initiation. Using an integrated analysis of chromatin remodeler binding and nucleosome occupancy following rapid remodeler depletion, we investigated the interplay between these enzymes and their impact on transcription in yeast. We show that many promoters are affected by multiple CRs that operate in concert or in opposition to position the key transcription start site (TSS)-associated +1 nucleosome. We also show that nucleosome movement after CR inactivation usually results from the activity of another CR and that in the absence of any remodeling activity, +1 nucleosomes largely maintain their positions. Finally, we present functional assays suggesting that +1 nucleosome positioning often reflects a trade-off between maximizing RNA polymerase recruitment and minimizing transcription initiation at incorrect sites. Our results provide a detailed picture of fundamental mechanisms linking promoter nucleosome architecture to transcription initiation.

Source
DOI: http://dx.doi.org/10.1038/s41594-019-0273-3
August 2019

SPar-K: a method to partition NGS signal data.

Bioinformatics 2019 11;35(21):4440-4441

The Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne 1015, Switzerland.

Summary: We present SPar-K (Signal Partitioning with K-means), a method to search for archetypical chromatin architectures by partitioning a set of genomic regions characterized by chromatin signal profiles around ChIP-seq peaks and other kinds of functional sites. This method efficiently deals with problems of data heterogeneity, limited misalignment of anchor points and unknown orientation of asymmetric patterns.

Availability And Implementation: SPar-K is a C++ program available on GitHub https://github.com/romaingroux/SPar-K and Docker Hub https://hub.docker.com/r/rgroux/spar-k.

Supplementary Information: Supplementary data are available at Bioinformatics online.
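
To make the ideas of limited anchor misalignment and unknown orientation concrete, here is a minimal Python sketch of a shift- and flip-tolerant distance between two signal profiles. It is not taken from the SPar-K source code: the use of 1 minus the Pearson correlation, the fixed central reference window, and all function names are assumptions. A K-means-style partitioning could use such a distance when assigning regions to cluster references.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def best_alignment(profile, reference, max_shift=2, allow_flip=True):
    """Return (distance, shift, flipped) minimising 1 - correlation.

    The central window of `reference` is compared with every window of
    `profile` shifted by -max_shift..+max_shift bins, in both orientations.
    """
    width = len(reference) - 2 * max_shift
    ref_window = reference[max_shift:max_shift + width]
    orientations = [(False, profile)]
    if allow_flip:
        orientations.append((True, profile[::-1]))
    best = None
    for flipped, oriented in orientations:
        for offset in range(2 * max_shift + 1):
            window = oriented[offset:offset + width]
            distance = 1.0 - pearson(window, ref_window)
            if best is None or distance < best[0]:
                best = (distance, offset - max_shift, flipped)
    return best


# Hypothetical toy profiles: the second is a flipped, shifted copy of the first.
reference = [0, 0, 1, 3, 9, 0, 0, 0, 0]
profile = [0, 0, 0, 0, 0, 9, 3, 1, 0]
distance, shift, flipped = best_alignment(profile, reference, max_shift=2)
```

Here the asymmetric peak is only recovered after reversing the profile and shifting it by one bin, which is the situation the abstract describes.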

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btz416
November 2019

Short-lived AUF1 p42-binding mRNAs of RANKL and BCL6 have two distinct instability elements each.

PLoS One 2018 12;13(11):e0206823. Epub 2018 Nov 12.

Ecole Polytechnique Fédérale de Lausanne (EPFL), SV-Sciences de la Vie, ISREC-Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland.

Regulation of mRNA stability by RNA-protein interactions contributes significantly to quantitative aspects of gene expression. We have identified potential mRNA targets of the AU-rich element binding protein AUF1. Myc-tagged AUF1 p42 was induced in mouse NIH/3T3 cells and RNA-protein complexes isolated using anti-myc tag antibody beads. Bound mRNAs were analyzed with Affymetrix microarrays. We have identified 508 potential target mRNAs that were at least 3-fold enriched compared to control cells without myc-AUF1. 22.3% of the enriched mRNAs had an AU-rich cluster in the ARED Organism database, against 16.3% of non-enriched control mRNAs. The enrichment towards AU-rich elements was also visible by AREScore with an average value of 5.2 in the enriched mRNAs versus 4.2 in the control group. Yet, numerous mRNAs were enriched without a high ARE score. The enrichment of tetrameric and pentameric sequences suggests a broad AUF1 p42-binding spectrum at short U-rich sequences flanked by A or G. Still, some enriched mRNAs were highly unstable, as those of TNFSF11 (known as RANKL), KLF10, HES1, CCNT2, SMAD6, and BCL6. We have mapped some of the instability determinants. HES1 mRNA appeared to have a coding region determinant. Detailed analysis of the RANKL and BCL6 3'UTR revealed for both that full instability required two elements, which are conserved in evolution. In RANKL mRNA both elements are AU-rich and separated by 30 bases, while in BCL6 mRNA one is AU-rich and 60 bases from a non AU-rich element that potentially forms a stem-loop structure.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206823
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6231638
April 2019

Detection and benchmarking of somatic mutations in cancer genomes using RNA-seq data.

PeerJ 2018 31;6:e5362. Epub 2018 Jul 31.

Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Livestrong Cancer Institutes, University of Texas at Austin, Austin, TX, USA.

To detect functional somatic mutations in tumor samples, whole-exome sequencing (WES) is often used for its reliability and relatively low cost. RNA-seq, while generally used to measure gene expression, can potentially also be used for identification of somatic mutations. However, there has been little systematic evaluation of the utility of RNA-seq for identifying somatic mutations. Here, we develop and evaluate a pipeline for processing RNA-seq data from glioblastoma multiforme (GBM) tumors in order to identify somatic mutations. The pipeline entails the use of the STAR aligner 2-pass procedure jointly with MuTect2 from the Genome Analysis Toolkit (GATK) to detect somatic variants. Variants identified from RNA-seq data were evaluated by comparison against the COSMIC and dbSNP databases, and also compared to somatic variants identified by exome sequencing. We also estimated the putative functional impact of coding variants in the most frequently mutated genes in GBM. Interestingly, variants identified by RNA-seq alone showed better representation of GBM-related mutations cataloged by COSMIC. RNA-seq-only data substantially outperformed WES in revealing potentially new somatic mutations in known GBM-related pathways, and allowed us to build a high-quality set of somatic mutations common to exome and RNA-seq calls. Using RNA-seq data in parallel with WES data to detect somatic mutations in cancer genomes can thus broaden the scope of discoveries and lend additional support to somatic variants identified by exome sequencing alone.

Source
DOI: http://dx.doi.org/10.7717/peerj.5362
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074801
July 2018

PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.

Bioinformatics 2018 07;34(14):2483-2484

The Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL).

Summary: Transcription factors regulate gene expression by binding to specific short DNA sequences of 5-20 bp to regulate the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or transcription factor binding site model from a public database.

Availability And Implementation: The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively.

Supplementary Information: Supplementary data are available at Bioinformatics online.
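
The basic operation behind a PWM scan can be sketched in a few lines of Python. This is a naive illustration rather than PWMScan's implementation, which is optimized for whole genomes; the matrix representation (one dictionary of base scores per position), the cutoff, and the function names are assumptions. Each window is scored on both strands and reported when it reaches the cutoff.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, cutoff):
    """Return (start, strand, score) for every window scoring >= cutoff.

    `pwm` is a list with one {base: score} dict per motif position.
    `start` is the 0-based position of the window on the forward strand.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score >= cutoff:
                hits.append((start, strand, score))
    return hits


# Hypothetical 3-bp matrix favouring the site ACG.
pwm = [
    {"A": 2, "C": -2, "G": -2, "T": -2},
    {"A": -2, "C": 2, "G": -2, "T": -2},
    {"A": -2, "C": -2, "G": 2, "T": -2},
]
hits = scan("TTACGTT", pwm, cutoff=6)
```

The example sequence contains ACG on the forward strand and, one base further, its reverse complement CGT, so the scan reports one hit per strand. The sketch assumes input restricted to the letters A, C, G and T.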

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bty127
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041753
July 2018

MGA repository: a curated data resource for ChIP-seq and other genome annotated data.

Nucleic Acids Res 2018 01;46(D1):D175-D180

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

The Mass Genome Annotation (MGA) repository is a resource designed to store published next generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) in a completely standardised format. Each sample has undergone local processing in order to meet the strict MGA format requirements. The original data source, the reformatting procedure and the biological characteristics of the samples are described in an accompanying documentation file manually edited by data curators. Ten model organisms are currently represented: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Apis mellifera, Caenorhabditis elegans, Arabidopsis thaliana, Zea mays, Saccharomyces cerevisiae and Schizosaccharomyces pombe. As of today, the resource contains over 24 000 samples. In conjunction with other tools developed by our group (the ChIP-Seq and SSA servers), it allows users to carry out a great variety of analysis tasks with MGA samples, such as making aggregation plots and heat maps for selected genomic regions, finding peak regions, generating custom tracks for visualizing genomic features in a UCSC genome browser window, or downloading chromatin data in a table format suitable for local processing with more advanced statistical analysis software such as R. Home page: http://ccg.vital-it.ch/mga/.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx995
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753388
January 2018

SMiLE-seq identifies binding motifs of single and dimeric transcription factors.

Nat Methods 2017 03 16;14(3):316-322. Epub 2017 Jan 16.

Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Resolving the DNA-binding specificities of transcription factors (TFs) is of critical value for understanding gene regulation. Here, we present a novel, semiautomated protein-DNA interaction characterization technology, selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq). SMiLE-seq is neither limited by DNA bait length nor biased toward strong affinity binders; it probes the DNA-binding properties of TFs over a wide affinity range in a fast and cost-effective fashion. We validated SMiLE-seq by analyzing 58 full-length human, mouse, and Drosophila TFs from distinct structural classes. All tested TFs yielded DNA-binding models with predictive power comparable to or greater than that of other in vitro assays. De novo motif discovery on all JUN-FOS heterodimers and several nuclear receptor-TF complexes provided novel insights into partner-specific heterodimer DNA-binding preferences. We also successfully analyzed the DNA-binding properties of uncharacterized human C2H2 zinc-finger proteins and validated several using ChIP-exo.

Source
DOI: http://dx.doi.org/10.1038/nmeth.4143
March 2017

The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms.

Nucleic Acids Res 2017 01 28;45(D1):D51-D55. Epub 2016 Nov 28.

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

We present an update of the Eukaryotic Promoter Database EPD (http://epd.vital-it.ch), more specifically on the EPDnew division, which contains comprehensive organisms-specific transcription start site (TSS) collections automatically derived from next generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap) the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository containing promoter-relevant chromatin profiling data and on improvements for the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210552
January 2017

SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity.

Nucleic Acids Res 2017 01 28;45(D1):D139-D144. Epub 2016 Nov 28.

Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland

SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.
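
How a PWM model can label a SNP as abolishing, creating or changing a binding site is sketched below in Python. This is an illustration under stated assumptions, not the SNP2TFBS procedure: the cutoff, the forward-strand-only scoring, the labels, and all names are hypothetical. The best score among windows overlapping the SNP is computed for the reference and the alternate allele, and the two are compared against the cutoff.

```python
def site_score(pwm, site):
    """Sum of per-position scores; pwm holds one {base: score} dict per position."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_overlapping_score(seq, snp_pos, pwm):
    """Best score among windows that cover position snp_pos (forward strand only)."""
    width = len(pwm)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(seq) - width)
    return max(site_score(pwm, seq[s:s + width]) for s in range(first, last + 1))


def snp_effect(seq, snp_pos, alt, pwm, cutoff):
    """Return (label, score_change) for substituting `alt` at 0-based snp_pos."""
    alt_seq = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    ref_score = best_overlapping_score(seq, snp_pos, pwm)
    alt_score = best_overlapping_score(alt_seq, snp_pos, pwm)
    ref_hit, alt_hit = ref_score >= cutoff, alt_score >= cutoff
    if ref_hit and not alt_hit:
        label = "abolished"
    elif alt_hit and not ref_hit:
        label = "created"
    elif ref_hit and alt_hit and ref_score != alt_score:
        label = "changed"
    else:
        label = "none"
    return label, alt_score - ref_score


# Hypothetical 3-bp matrix favouring the site ACG.
pwm = [
    {"A": 2, "C": -2, "G": -2, "T": -2},
    {"A": -2, "C": 2, "G": -2, "T": -2},
    {"A": -2, "C": -2, "G": 2, "T": -2},
]
lost = snp_effect("TTACGTT", 3, "T", pwm, cutoff=4)
gained = snp_effect("TTATGTT", 3, "C", pwm, cutoff=4)
```

Replacing the central C of the ACG site drops the score below the cutoff, and the reverse substitution restores it. A fuller version would also score the reverse strand and handle sequences shorter than the matrix.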

Source
DOI: http://dx.doi.org/10.1093/nar/gkw1064
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210548
January 2017

The ChIP-Seq tools and web server: a resource for analyzing ChIP-seq and other types of genomic data.

BMC Genomics 2016 11 18;17(1):938. Epub 2016 Nov 18.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: ChIP-seq and related high-throughput chromatin profiling assays generate ever increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data.

Results: Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade.

Conclusions: The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/ .

Source
DOI: http://dx.doi.org/10.1186/s12864-016-3288-8
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5116162
November 2016

Influence of Rotational Nucleosome Positioning on Transcription Start Site Selection in Animal Promoters.

PLoS Comput Biol 2016 Oct 7;12(10):e1005144. Epub 2016 Oct 7.

Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.

The recruitment of RNA-Pol-II to the transcription start site (TSS) is an important step in gene regulation in all organisms. Core promoter elements (CPE) are conserved sequence motifs that guide Pol-II to the TSS by interacting with specific transcription factors (TFs). However, only a minority of animal promoters contains CPEs. It is still unknown how Pol-II selects the TSS in their absence. Here we present a comparative analysis of promoters' sequence composition and chromatin architecture in five eukaryotic model organisms, which shows the presence of common and unique DNA-encoded features used to organize chromatin. Analysis of Pol-II initiation patterns reveals that, in the absence of certain CPEs, there is a strong correlation between the spread of initiation and the intensity of the 10 bp periodic signal in the nearest downstream nucleosome. Moreover, promoters' primary and secondary initiation sites show a characteristic 10 bp periodicity in the absence of CPEs. We also show that natural DNA variants in the region immediately downstream of the TSS are able to affect both the nucleosome-DNA affinity and the Pol-II initiation pattern. These findings support the notion that, in addition to CPE-mediated selection, sequence-induced nucleosome positioning could be a common and conserved mechanism of TSS selection in animals.
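
One simple way to quantify the intensity of a 10 bp periodic signal in a sequence is sketched below in Python. The abstract does not state how the intensity was measured, so everything here is an assumption: the choice of WW dinucleotides (W = A or T) as the signal, the single-frequency Fourier power, and the function names.

```python
import cmath


def ww_indicator(seq):
    """1 where a WW dinucleotide (W = A or T) starts, else 0."""
    return [
        1 if seq[i] in "AT" and seq[i + 1] in "AT" else 0
        for i in range(len(seq) - 1)
    ]


def periodic_power(signal, period=10):
    """Mean-centred Fourier power of `signal` at the given period, per position."""
    n = len(signal)
    centre = sum(signal) / n
    coefficient = sum(
        (value - centre) * cmath.exp(-2j * cmath.pi * index / period)
        for index, value in enumerate(signal)
    )
    return abs(coefficient) ** 2 / n


# Hypothetical toy sequences: AA repeated every 10 bp versus every 7 bp.
period_10_seq = ("AA" + "G" * 8) * 10
period_7_seq = ("AA" + "G" * 5) * 14
power_10 = periodic_power(ww_indicator(period_10_seq))
power_7 = periodic_power(ww_indicator(period_7_seq))
```

The sequence with AA spaced every 10 bp yields a much larger power at period 10 than the one spaced every 7 bp, and a sequence with no WW dinucleotides yields zero.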

Source
DOI: http://dx.doi.org/10.1371/journal.pcbi.1005144
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5055345
October 2016

A maximum-likelihood approach for building cell-type trees by lifting.

BMC Genomics 2016 Jan 11;17 Suppl 1:14. Epub 2016 Jan 11.

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL IC IIF LCBB, INJ 211 (Batiment INJ), Station 14, Lausanne, CH-1015, Switzerland.

Background: In cell differentiation, a less specialized cell differentiates into a more specialized one, even though all cells in one organism have (almost) the same genome. Epigenetic factors such as histone modifications are known to play a significant role in cell differentiation. We previously introduced cell-type trees to represent the differentiation of cells into more specialized types, a representation that partakes of both ontogeny and phylogeny.

Results: We propose a maximum-likelihood (ML) approach to build cell-type trees and show that this ML approach outperforms our earlier distance-based and parsimony-based approaches. We then study the reconstruction of ancestral cell types; since both ancestral and derived cell types can coexist in adult organisms, we propose a lifting algorithm to infer internal nodes. We present results on our lifting algorithm obtained both through simulations and on real datasets.

Conclusions: We show that our ML-based approach outperforms previously proposed techniques such as distance-based and parsimony-based methods. We show our lifting-based approach works well on both simulated and real data.

Source
DOI: http://dx.doi.org/10.1186/s12864-015-2297-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895258
January 2016

Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features.

BMC Bioinformatics 2016 Jan 11;17 Suppl 1. Epub 2016 Jan 11.

Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, EPFL, Station 15, Lausanne, CH-1015, Switzerland.

Background: Understanding the mechanisms by which transcription factors (TF) are recruited to their physiological target sites is crucial for understanding gene regulation. DNA sequence intrinsic features such as predicted binding affinity are often not very effective in predicting in vivo site occupancy and in any case could not explain cell-type specific binding events. Recent reports show that chromatin accessibility, nucleosome occupancy and specific histone post-translational modifications greatly influence TF site occupancy in vivo. In this work, we use machine-learning methods to build predictive models and assess the relative importance of different sequence-intrinsic and chromatin features in the TF-to-target-site recruitment process.

Methods: Our study primarily relies on recent data published by the ENCODE consortium. Five dissimilar TFs assayed in multiple cell-types were selected as examples: CTCF, JunD, REST, GABP and USF2. We used two types of candidate target sites: (a) predicted sites obtained by scanning the whole genome with a position weight matrix, and (b) cell-type specific peak lists provided by ENCODE. Quantitative in vivo occupancy levels in different cell-types were based on ChIP-seq data for the corresponding TFs. In parallel, we computed a number of associated sequence-intrinsic and experimental features (histone modification, DNase I hypersensitivity, etc.) for each site. Machine learning algorithms were then used in a binary classification and regression framework to predict site occupancy and binding strength, for the purpose of assessing the relative importance of different contextual features.

Results: We observed striking differences in the feature importance rankings between the five factors tested. PWM-scores were amongst the most important features only for CTCF and REST but of little value for JunD and USF2. Chromatin accessibility and active histone marks are potent predictors for all factors except REST. Structural DNA parameters, repressive and gene body associated histone marks are generally of little or no predictive value.

Conclusions: We define a general and extensible computational framework for analyzing the importance of various DNA-intrinsic and chromatin-associated features in determining cell-type specific TF binding to target sites. The application of our methodology to ENCODE data has led to new insights on transcription regulatory processes and may serve as example for future studies encompassing even larger datasets.

Source
DOI: http://dx.doi.org/10.1186/s12859-015-0846-z
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895346
January 2016

The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools.

Nucleic Acids Res 2015 Jan 6;43(Database issue):D92-6. Epub 2014 Nov 6.

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland.

We present an update of EPDNew (http://epd.vital-it.ch), a recently introduced new part of the Eukaryotic Promoter Database (EPD) which has been described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.

Source
DOI: http://dx.doi.org/10.1093/nar/gku1111
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383928
January 2015

Study of cell differentiation by phylogenetic analysis using histone modification data.

BMC Bioinformatics 2014 Aug 8;15:269. Epub 2014 Aug 8.

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL IC IIF LCBB, INJ 211 (Batiment INJ), Station 14, CH-1015 Lausanne, Switzerland.

Background: In cell differentiation, a cell of a less specialized type becomes one of a more specialized type, even though all cells have the same genome. Transcription factors and epigenetic marks like histone modifications can play a significant role in the differentiation process.

Results: In this paper, we present a simple analysis of cell types and differentiation paths using phylogenetic inference based on ChIP-Seq histone modification data. We precisely define the notion of cell-type trees and provide a procedure for building such trees. We propose new data representation techniques and distance measures for ChIP-Seq data and use these together with standard phylogenetic inference methods to build biologically meaningful cell-type trees that indicate how diverse types of cells are related. We demonstrate our approach on various kinds of histone modifications for various cell types, also using the datasets to explore issues surrounding replicate data, variability between cells of the same type, and robustness. We use the results to derive biological findings, such as important patterns of histone modification changes during the cell differentiation process.

Conclusions: We introduced and studied the novel problem of inferring cell-type trees from histone modification data. The promising results we obtain point the way to a new approach to the study of cell differentiation. We also discuss how cell-type trees can be used to study the evolution of cell types.

Source
DOI: http://dx.doi.org/10.1186/1471-2105-15-269
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138389
August 2014

Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers.

Genomics 2014 Aug 22;104(2):79-86. Epub 2014 Jul 22.

Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece.

Little work has been done on the composition of conserved non-coding elements (CNEs) that are identified by comparisons of two or more genomes and are found to exist in all metazoan genomes. Here we present the analysis of CNEs with a methodology that takes into account word occurrence at various length scales in the form of feature vector representation and rule-based classifiers. We apply our approach to both protein-coding exons and CNEs, originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, that are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes along with surrogates are presented.
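
As an illustration of the alignment-free feature vector idea described above, the following minimal sketch counts overlapping words (k-mers) of several lengths in a DNA sequence and concatenates their relative frequencies into one vector. The function name `kmer_feature_vector`, the word lengths, and the frequency normalization are assumptions made for illustration; the abstract does not specify the representation actually used in the paper.

```python
from collections import Counter
from itertools import product


def kmer_feature_vector(seq, ks=(1, 2)):
    """Return (labels, values): relative frequencies of all DNA words of each length in ks.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    seq = seq.upper()
    labels, values = [], []
    for k in ks:
        # Count overlapping words of length k along the sequence.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        # Enumerate every word over A/C/G/T; words containing other symbols are ignored.
        words = ["".join(p) for p in product("ACGT", repeat=k)]
        total = sum(counts[w] for w in words)
        for w in words:
            labels.append(w)
            values.append(counts[w] / total if total else 0.0)
    return labels, values


labels, values = kmer_feature_vector("ACGTACGT", ks=(1, 2))
```

A vector of this kind could then be passed to any rule-based or other classifier; which classifier and which word lengths work best is a question the paper itself addresses.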

Source
DOI: http://dx.doi.org/10.1016/j.ygeno.2014.07.004
August 2014

Probabilistic partitioning methods to find significant patterns in ChIP-Seq data.

Bioinformatics 2014 Sep 7;30(17):2406-13. Epub 2014 May 7.

Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne and Swiss Institute for Bioinformatics, 1015 Lausanne, Switzerland.

Motivation: We have witnessed an enormous increase in ChIP-Seq data for histone modifications in the past few years. Discovering significant patterns in these data is an important problem for understanding biological mechanisms.

Results: We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our methods take into account signal magnitude, shape, strand orientation and shifts. We compare our methods with some current methods and demonstrate significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-Seq data analysis. Specifically, we exemplify its merits in the context of peak finding and partitioning of nucleosome positioning patterns in human promoters.

Availability And Implementation: The software and code are available in the supplementary material.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/btu318
September 2014

Fifteen years SIB Swiss Institute of Bioinformatics: life science databases, tools and support.

Nucleic Acids Res 2014 Jul 3;42(Web Server issue):W436-41. Epub 2014 May 3.

SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland; University of Geneva, CH-1211 Geneva 4, Switzerland.

The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL and STRING, which are all accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently been offering to the life science community for more than 15 years.

Source
DOI: http://dx.doi.org/10.1093/nar/gku380
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086091
July 2014

Nuclear Factor I genomic binding associates with chromatin boundaries.

BMC Genomics 2013 Feb 12;14:99. Epub 2013 Feb 12.

Institute of Biotechnology and Center for Biotechnology UNIL-EPFL, University of Lausanne, 1015, Lausanne, Switzerland.

Background: The Nuclear Factor I (NFI) family of DNA binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and gene expression regulation. Using chromatin immuno-precipitation and high throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA binding sites in primary mouse embryonic fibroblasts.

Results: We found that in vivo and in vitro NFI DNA binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as histone modifications H3K4me3 and H3K36me3, even outside of annotated transcribed loci, implicating NFI in the control of the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosomal length of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding prominently occurred at boundaries displaying discontinuities in histone modifications specific to expressed and silent chromatin, such as loci subject to parental allele-specific imprinted expression.

Conclusions: Our data thus suggest that NFI nucleosomal interaction may contribute to the partitioning of distinct chromatin domains and to epigenetic gene expression regulation. NFI ChIP-Seq and input control DNA data were deposited at the Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are available under GEO accession number GSE15871.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-14-99
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610271
February 2013

EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era.

Nucleic Acids Res 2013 Jan 27;41(Database issue):D157-64. Epub 2012 Nov 27.

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

The Eukaryotic Promoter Database (EPD), available online at http://epd.vital-it.ch, is a collection of experimentally defined eukaryotic POL II promoters which has been maintained for more than 25 years. A promoter is represented by a single position in the genome, typically the major transcription start site (TSS). EPD primarily serves biologists interested in analysing the motif content, chromatin structure or DNA methylation status of co-regulated promoter subsets. Initially, promoter evidence came from TSS mapping experiments targeted at single genes and published in journal articles. Today, the TSS positions provided by EPD are inferred from next-generation sequencing data distributed in electronic form. Traditionally, EPD has been a high-quality database with low coverage. The focus of recent efforts has been to reach complete gene coverage for important model organisms. To this end, we introduced a new section called EPDnew, which is automatically assembled from multiple, carefully selected input datasets. As another novelty, we started to use chromatin signatures in addition to mRNA 5' tags to locate promoters of weakly expressed genes. Regarding user interfaces, we introduced a new promoter viewer which enables users to explore promoter-defining experimental evidence in a UCSC genome browser window.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1233
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531148
January 2013

UCNEbase--a database of ultraconserved non-coding elements and genomic regulatory blocks.

Nucleic Acids Res 2013 Jan 27;41(Database issue):D101-9. Epub 2012 Nov 27.

Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.

UCNEbase (http://ccg.vital-it.ch/UCNEbase) is a free, web-accessible information resource on the evolution and genomic organization of ultra-conserved non-coding elements (UCNEs). It currently covers 4351 such elements in 18 different species. The majority of UCNEs are supposed to be transcriptional regulators of key developmental genes. As most of them occur as clusters near potential target genes, the database is organized along two hierarchical levels: individual UCNEs and ultra-conserved genomic regulatory blocks (UGRBs). UCNEbase introduces a coherent nomenclature for UCNEs reflecting their respective associations with likely target genes. Orthologous and paralogous UCNEs share components of their names and are systematically cross-linked. Detailed synteny maps between the human and other genomes are provided for all UGRBs. UCNEbase is managed by a relational database system and can be accessed by a variety of web-based query pages. As it relies on the UCSC genome browser as visualization platform, a large part of its data content is also available as browser viewable custom track files. UCNEbase is potentially useful to any computational, experimental or evolutionary biologist interested in conserved non-coding DNA elements in vertebrates.

Source
DOI: http://dx.doi.org/10.1093/nar/gks1092
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531063
January 2013

Genomic context analysis reveals dense interaction network between vertebrate ultraconserved non-coding elements.

Bioinformatics 2012 Sep;28(18):i395-i401

Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.

Motivation: Genomic context analysis, also known as phylogenetic profiling, is widely used to infer functional interactions between proteins but rarely applied to non-coding cis-regulatory DNA elements. We asked whether this approach could provide insights into ultraconserved non-coding elements (UCNEs). These elements are organized as large clusters, so-called gene regulatory blocks (GRBs), around key developmental genes. Their molecular functions and the reasons for their high degree of conservation remain enigmatic.

Results: In a special setting of genomic context analysis, we analyzed the fate of GRBs after a whole-genome duplication event in five fish genomes. We found that in most cases all UCNEs were retained together as a single block, whereas the corresponding target genes were often retained in two copies, one completely devoid of UCNEs. This 'winner-takes-all' pattern suggests that UCNEs of a GRB function in a highly cooperative manner. We propose that the multitude of interactions between UCNEs is the reason for their extreme sequence conservation.

Supplementary Information: Supplementary data are available at Bioinformatics online and at http://ccg.vital-it.ch/ucne/

Source
DOI: http://dx.doi.org/10.1093/bioinformatics/bts400
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436827
September 2012

ChIPnorm: a statistical method for normalizing and identifying differential regions in histone modification ChIP-seq libraries.

PLoS One 2012 3;7(8):e39573. Epub 2012 Aug 3.

Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

The advent of high-throughput technologies such as ChIP-seq has made possible the study of histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data, and to find differential regions in the genome, given two libraries of histone modifications of different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that histone modifications H3K27me3 and H3K4me3 act as respectively a repressor and an activator of genes. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 move into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

Source
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0039573
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3411705
February 2013

Practicality and time complexity of a sparsified RNA folding algorithm.

J Bioinform Comput Biol 2012 Apr;10(2):1241007

Swiss Institute of Bioinformatics and Swiss Institute for Experimental Cancer Research, Swiss Federal Institute of Technology, Lausanne, 1015, Switzerland.

Commonly used RNA folding programs compute the minimum free energy structure of a sequence under the pseudoknot exclusion constraint. They are based on Zuker's algorithm which runs in time O(n^3). Recently, it has been claimed that RNA folding can be achieved in average time O(n^2) using a sparsification technique. A proof of quadratic time complexity was based on the assumption that computational RNA folding obeys the "polymer-zeta property". Several variants of sparse RNA folding algorithms were later developed. Here, we present our own version, which is readily applicable to existing RNA folding programs, as it is extremely simple and does not require any new data structure. We applied it to the widely used Vienna RNAfold program, to create sibRNAfold, the first public sparsified version of a standard RNA folding program. To gain a better understanding of the time complexity of sparsified RNA folding in general, we carried out a thorough run time analysis with synthetic random sequences, both in the context of energy minimization and base pairing maximization. Contrary to previous claims, the asymptotic time complexity of a sparsified RNA folding algorithm using standard energy parameters remains O(n^3) under a wide variety of conditions. Consistent with our run-time analysis, we found that RNA folding does not obey the "polymer-zeta property" as claimed previously. Yet, a basic version of a sparsified RNA folding algorithm provides 15- to 50-fold speed gain. Surprisingly, the same sparsification technique has a different effect when applied to base pairing optimization. There, its asymptotic running time complexity appears to be either quadratic or cubic depending on the base composition. The code used in this work is available at: .
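
To make the cubic running time concrete, here is a minimal sketch of the classical, non-sparsified dynamic programme for base pairing maximization (Nussinov-style). It is not the sparsified algorithm of the paper and not sibRNAfold; the function name `max_base_pairs`, the set of allowed pairs, and the `min_loop` parameter (minimum number of unpaired bases in a hairpin) are assumptions made for illustration. The three nested loops over `span`, `i` and `k` are what give O(n^3) time.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in an RNA sequence.

    Illustrative Nussinov-style recursion; hypothetical helper, not the paper's code.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: j pairs with some k, leaving at least min_loop bases in between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

Sparsification, as discussed in the abstract, aims to skip most candidate split points `k` in the innermost loop; whether that lowers the asymptotic cost is precisely what the paper examines.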

Source
DOI: http://dx.doi.org/10.1142/S0219720012410077
April 2012

RNA profiling and chromatin immunoprecipitation-sequencing reveal that PTF1a stabilizes pancreas progenitor identity via the control of MNX1/HLXB9 and a network of other transcription factors.

Mol Cell Biol 2012 Mar 9;32(6):1189-99. Epub 2012 Jan 9.

Swiss Institute for Experimental Cancer Research and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.

Pancreas development is initiated by the specification and expansion of a small group of endodermal cells. Several transcription factors are crucial for progenitor maintenance and expansion, but their interactions and the downstream targets mediating their activity are poorly understood. Among those factors, PTF1a, a basic helix-loop-helix (bHLH) transcription factor which controls pancreas exocrine cell differentiation, maintenance, and functionality, is also needed for the early specification of pancreas progenitors. We used RNA profiling and chromatin immunoprecipitation (ChIP) sequencing to identify a set of targets in pancreas progenitors. We demonstrate that Mnx1, a gene that is absolutely required in pancreas progenitors, is a major direct target of PTF1a and is regulated by a distant enhancer element. Pdx1, Nkx6.1, and Onecut1 are also direct PTF1a targets whose expression is promoted by PTF1a. These proteins, most of which were previously shown to be necessary for pancreas bud maintenance or formation, form a transcription factor network that allows the maintenance of pancreas progenitors. In addition, we identify Bmp7, Nr5a2, RhoV, and P2rx1 as new targets of PTF1a in pancreas progenitors.

Source
DOI: http://dx.doi.org/10.1128/MCB.06318-11
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3295004
March 2012