Publications by authors named "Romain Groux"

8 Publications

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: The positional weight matrix (PWM) is the de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
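
The all-against-all setting described above can be illustrated with a minimal sketch. It is not the study's protocol: the abstract does not name the performance measure, so AUC-ROC over best-window scores is assumed here, and the PWMs, experiment names, and sequences are made-up toy data.

```python
from itertools import product

ALPHABET = "ACGT"


def best_score(pwm, sequence):
    """Highest-scoring window of a PWM (list of 4-element rows, A/C/G/T), forward strand only."""
    width = len(pwm)
    return max(
        sum(pwm[i][ALPHABET.index(b)] for i, b in enumerate(sequence[s:s + width]))
        for s in range(len(sequence) - width + 1)
    )


def auc(pos_scores, neg_scores):
    """AUC-ROC as the fraction of (bound, unbound) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p, n in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))


def benchmark(pwms, experiments):
    """Return {(pwm_name, experiment_name): AUC} for every PWM-experiment combination."""
    results = {}
    for pwm_name, pwm in pwms.items():
        for exp_name, (bound, unbound) in experiments.items():
            pos = [best_score(pwm, s) for s in bound]
            neg = [best_score(pwm, s) for s in unbound]
            results[(pwm_name, exp_name)] = auc(pos, neg)
    return results


# Hypothetical toy data: two 2-bp score matrices and two "experiments",
# each given as (bound sequences, unbound sequences).
PWMS = {
    "TF_A": [[1, -1, -1, -1], [1, -1, -1, -1]],  # prefers AA
    "TF_B": [[-1, -1, 1, -1], [-1, 1, -1, -1]],  # prefers GC
}
EXPERIMENTS = {
    "exp_A": (["TTAAT", "CAAGT"], ["TTTTT", "CGCGT"]),
    "exp_B": (["TGCTT", "AGCAT"], ["TTTTT", "TAATT"]),
}
RESULTS = benchmark(PWMS, EXPERIMENTS)
```

With this toy input, each matrix separates bound from unbound sequences perfectly on its own experiment and performs below chance on the other. Picking the best row per experiment from such a table is one way to see how the best-performing PWM for a TF can be identified regardless of which TF it is annotated to.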

DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020

EPD in 2020: enhanced data visualization and extension to ncRNA promoters.

Nucleic Acids Res 2020 01;48(D1):D65-D69

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.

DOI: http://dx.doi.org/10.1093/nar/gkz1014
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145694
January 2020

SPar-K: a method to partition NGS signal data.

Bioinformatics 2019 11;35(21):4440-4441

The Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne 1015, Switzerland.

Summary: We present SPar-K (Signal Partitioning with K-means), a method to search for archetypical chromatin architectures by partitioning a set of genomic regions characterized by chromatin signal profiles around ChIP-seq peaks and other kinds of functional sites. This method efficiently deals with problems of data heterogeneity, limited misalignment of anchor points and unknown orientation of asymmetric patterns.

Availability And Implementation: SPar-K is a C++ program available on GitHub https://github.com/romaingroux/SPar-K and Docker Hub https://hub.docker.com/r/rgroux/spar-k.

Supplementary Information: Supplementary data are available at Bioinformatics online.
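
As a rough illustration of the ideas named in the summary (tolerating limited misalignment and unknown orientation when partitioning signal profiles), the sketch below shows one k-means assignment step in which each profile is compared with each centroid under every allowed shift and in both orientations. This is an assumption-laden toy, not SPar-K's implementation: the squared Euclidean distance, the function names, and the data are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length signal vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_alignment(profile, centroid, max_shift=1):
    """Compare every window of `profile`, forward and reversed, with `centroid`.

    `profile` must be longer than `centroid` by 2 * max_shift bins.
    Returns (distance, shift, flipped) for the closest window.
    """
    width = len(centroid)
    assert len(profile) == width + 2 * max_shift
    best = None
    for flipped in (False, True):
        oriented = profile[::-1] if flipped else profile
        for shift in range(2 * max_shift + 1):
            d = sq_dist(oriented[shift:shift + width], centroid)
            if best is None or d < best[0]:
                best = (d, shift, flipped)
    return best


def assign(profiles, centroids, max_shift=1):
    """Assignment step: nearest centroid for each profile, allowing shift and flip."""
    labels = []
    for profile in profiles:
        scored = [
            best_alignment(profile, c, max_shift) + (k,)
            for k, c in enumerate(centroids)
        ]
        _, shift, flipped, k = min(scored)
        labels.append((k, shift, flipped))
    return labels


# Hypothetical toy data: 5-bin profiles, 3-bin centroids, shifts of up to 1 bin.
CENTROIDS = [[0, 5, 1], [2, 2, 2]]
PROFILES = [
    [0, 0, 5, 1, 0],  # asymmetric peak, forward orientation
    [0, 1, 5, 0, 0],  # same peak, mirrored
    [2, 2, 2, 2, 2],  # flat signal
]
LABELS = assign(PROFILES, CENTROIDS)
```

Here the first two profiles land in the same cluster even though one is mirrored, because the flipped window matches the centroid exactly. A full partitioning would alternate this step with a centroid update computed from the aligned windows.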

DOI: http://dx.doi.org/10.1093/bioinformatics/btz416
November 2019

PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.

Bioinformatics 2018 07;34(14):2483-2484

The Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL).

Summary: Transcription factors regulate gene expression by binding to specific short DNA sequences of 5-20 bp, thereby controlling the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or transcription factor binding site model from a public database.

Availability And Implementation: The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively.

Supplementary Information: Supplementary data are available at Bioinformatics online.
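
To make the notion of "matches to a PWM" concrete, here is a minimal sketch of scanning a sequence with a log-odds matrix on both strands. It is a naive illustration only and does not reflect how PWMScan is implemented; the count matrix, pseudocount, uniform background, and threshold are assumed values chosen for the example.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical count matrix: one row per motif position, columns A, C, G, T.
COUNTS = [
    [8, 1, 0, 1],
    [0, 0, 10, 0],
    [1, 7, 1, 1],
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def score(pwm, site):
    """Sum the per-position scores of one candidate site."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(site))


def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            s = score(pwm, site)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


PWM = log_odds(COUNTS)
HITS = scan(PWM, "TTAGCTT", threshold=3.0)
```

In this toy sequence the consensus AGC is found once on the forward strand and once as its reverse complement GCT. Start positions are 0-based and refer to the forward strand in both cases.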

DOI: http://dx.doi.org/10.1093/bioinformatics/bty127
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041753
July 2018

MGA repository: a curated data resource for ChIP-seq and other genome annotated data.

Nucleic Acids Res 2018 01;46(D1):D175-D180

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

The Mass Genome Annotation (MGA) repository is a resource designed to store published next generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) in a completely standardised format. Each sample has undergone local processing in order to meet the strict MGA format requirements. The original data source, the reformatting procedure and the biological characteristics of the samples are described in an accompanying documentation file manually edited by data curators. Ten model organisms are currently represented: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Apis mellifera, Caenorhabditis elegans, Arabidopsis thaliana, Zea mays, Saccharomyces cerevisiae and Schizosaccharomyces pombe. As of today, the resource contains over 24 000 samples. In conjunction with other tools developed by our group (the ChIP-Seq and SSA servers), it allows users to carry out a great variety of analysis tasks with MGA samples, such as making aggregation plots and heat maps for selected genomic regions, finding peak regions, generating custom tracks for visualizing genomic features in a UCSC genome browser window, or downloading chromatin data in a table format suitable for local processing with more advanced statistical analysis software such as R. Home page: http://ccg.vital-it.ch/mga/.

DOI: http://dx.doi.org/10.1093/nar/gkx995
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753388
January 2018

SMiLE-seq identifies binding motifs of single and dimeric transcription factors.

Nat Methods 2017 03 16;14(3):316-322. Epub 2017 Jan 16.

Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Resolving the DNA-binding specificities of transcription factors (TFs) is of critical value for understanding gene regulation. Here, we present a novel, semiautomated protein-DNA interaction characterization technology, selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq). SMiLE-seq is neither limited by DNA bait length nor biased toward strong affinity binders; it probes the DNA-binding properties of TFs over a wide affinity range in a fast and cost-effective fashion. We validated SMiLE-seq by analyzing 58 full-length human, mouse, and Drosophila TFs from distinct structural classes. All tested TFs yielded DNA-binding models with predictive power comparable to or greater than that of other in vitro assays. De novo motif discovery on all JUN-FOS heterodimers and several nuclear receptor-TF complexes provided novel insights into partner-specific heterodimer DNA-binding preferences. We also successfully analyzed the DNA-binding properties of uncharacterized human C2H2 zinc-finger proteins and validated several using ChIP-exo.

DOI: http://dx.doi.org/10.1038/nmeth.4143
March 2017

The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms.

Nucleic Acids Res 2017 01 28;45(D1):D51-D55. Epub 2016 Nov 28.

Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland.

We present an update of the Eukaryotic Promoter Database (EPD; http://epd.vital-it.ch), focusing on the EPDnew division, which contains comprehensive organism-specific transcription start site (TSS) collections automatically derived from next generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap), the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository containing promoter-relevant chromatin profiling data and on improvements for the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.

DOI: http://dx.doi.org/10.1093/nar/gkw1069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210552
January 2017

Endocytosis of the anthrax toxin is mediated by clathrin, actin and unconventional adaptors.

PLoS Pathog 2010 Mar 5;6(3):e1000792. Epub 2010 Mar 5.

Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Faculty of Life Sciences, Lausanne, Switzerland.

The anthrax toxin is a tripartite toxin, where the two enzymatic subunits require the third subunit, the protective antigen (PA), to interact with cells and be escorted to their cytoplasmic targets. PA binds to cells via one of two receptors, TEM8 and CMG2. Interestingly, the toxin times and triggers its own endocytosis, in particular through the heptamerization of PA. Here we show that PA triggers the ubiquitination of its receptors in a beta-arrestin-dependent manner and that this step is required for clathrin-mediated endocytosis. In addition, we find that endocytosis is dependent on the heterotetrameric adaptor AP-1 but not the more conventional AP-2. Finally, we show that endocytosis of PA is strongly dependent on actin. Unexpectedly, actin was also found to be essential for efficient heptamerization of PA, but only when bound to one of its two receptors, TEM8, due to the active organization of TEM8 into actin-dependent domains. Endocytic pathways are highly modular systems. Here we identify some of the key players that allow efficient heptamerization of PA and subsequent ubiquitin-dependent, clathrin-mediated endocytosis of the anthrax toxin.

DOI: http://dx.doi.org/10.1371/journal.ppat.1000792
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832758
March 2010