Publications by authors named "Benoit Ballester"

27 Publications

A predictable conserved DNA base composition signature defines human core DNA replication origins.

Nat Commun 2020 09 21;11(1):4826. Epub 2020 Sep 21.

Institute of Human Genetics, CNRS - University of Montpellier, Montpellier, France.

DNA replication initiates from multiple genomic locations called replication origins. In metazoa, DNA sequence elements involved in origin specification remain elusive. Here, we examine pluripotent, primary, differentiating, and immortalized human cells, and demonstrate that a class of origins, termed core origins, is shared by different cell types and hosts ~80% of all DNA replication initiation events in any cell population. We detect a shared G-rich DNA sequence signature that coincides with most core origins in both human and mouse genomes. Transcription and G-rich elements can independently associate with replication origin activity. Computational algorithms show that core origins can be predicted, based solely on DNA sequence patterns but not on consensus motifs. Our results demonstrate that, despite an attributed stochasticity, core origins are chosen from a limited pool of genomic regions. Immortalization through oncogenic gene expression, but not normal cellular differentiation, results in increased stochastic firing from heterochromatin and decreased origin density at TAD borders.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-18527-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506530
September 2020

Author Correction: Involvement of G-quadruplex regions in mammalian replication origin activity.

Nat Commun 2020 06 11;11(1):3058. Epub 2020 Jun 11.

Institute of Human Genetics, CNRS-University of Montpellier, 141 rue de la Cardonille, 34396, Montpellier, France.

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Source
DOI: http://dx.doi.org/10.1038/s41467-020-16122-x
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289864
June 2020

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Genome Biol 2020 05 11;21(1):114. Epub 2020 May 11.

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Background: The positional weight matrix (PWM) is a de facto standard model for describing transcription factor (TF) DNA-binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models against large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.

Source
DOI: http://dx.doi.org/10.1186/s13059-020-01996-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7212583
May 2020
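
The benchmarking abstract above summarizes an all-against-all comparison of PWMs and binding experiments without spelling out the scoring protocol. As a rough, hypothetical illustration of the idea only, the Python sketch below scores each sequence by its best PWM window hit (forward strand only) and summarizes every PWM-dataset pair with ROC AUC, one common performance measure. The toy motifs, sequences, and function names are invented here and are not the study's benchmarking tools.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one {base: log-odds score} dict per motif position.
PWM = List[Dict[str, float]]


def toy_pwm(consensus: str) -> PWM:
    """Toy PWM: +1 for the consensus base at each position, -1 otherwise."""
    return [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in consensus]


def best_site_score(pwm: PWM, seq: str) -> float:
    """Best PWM score over all windows of an uppercase ACGT sequence (forward strand only)."""
    width = len(pwm)
    best = -math.inf  # stays -inf if seq is shorter than the motif
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        best = max(best, sum(col[base] for col, base in zip(pwm, window)))
    return best


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC as P(positive scores higher than negative), counting ties as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def all_against_all(pwms: Dict[str, PWM],
                    datasets: Dict[str, Tuple[List[str], List[str]]]) -> Dict[Tuple[str, str], float]:
    """Score every PWM on every (bound, unbound) sequence set and report ROC AUC."""
    results = {}
    for motif, pwm in pwms.items():
        for name, (bound, unbound) in datasets.items():
            pos = [best_site_score(pwm, s) for s in bound]
            neg = [best_site_score(pwm, s) for s in unbound]
            results[(motif, name)] = roc_auc(pos, neg)
    return results


pwms = {"motif_ACG": toy_pwm("ACG"), "motif_TTT": toy_pwm("TTT")}
datasets = {"toy_chip": (["TTACGTT", "ACGAAAA"], ["TTTTTTT", "GGGGGGG"])}
results = all_against_all(pwms, datasets)
for key, auc in sorted(results.items()):
    print(key, auc)
```

With real data, PWMs would come from a motif collection and positives/negatives from ChIP-seq, HT-SELEX, or PBM experiments; scanning both strands and handling ambiguous bases such as N would also be needed.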

JASPAR 2020: update of the open-access database of transcription factor binding profiles.

Nucleic Acids Res 2020 01;48(D1):D87-D92

Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway.

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz1001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145627
January 2020
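
JASPAR stores profiles as position frequency matrices, i.e. raw base counts per motif position. Before scanning sequences, such counts are commonly converted into log-odds position weight matrices. A minimal Python sketch of that conversion follows, assuming a uniform background and a pseudocount of 1 spread over the bases in proportion to the background; both choices, the dictionary layout, and the function name are assumptions for illustration, not part of the JASPAR API or the JASPAR2020 package.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"


def pfm_to_pwm(pfm: Dict[str, List[int]],
               pseudocount: float = 1.0,
               background: Optional[Dict[str, float]] = None) -> Dict[str, List[float]]:
    """Convert per-position base counts (a PFM) into log2-odds scores (a PWM).

    The pseudocount is distributed over the bases in proportion to the
    background, so zero counts never produce log(0).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(pfm["A"])
    pwm: Dict[str, List[float]] = {b: [] for b in BASES}
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            prob = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(prob / background[b]))
    return pwm


# Toy two-column PFM: column 1 is always A, column 2 is uninformative.
toy_pfm = {"A": [10, 5], "C": [0, 5], "G": [0, 5], "T": [0, 5]}
pwm = pfm_to_pwm(toy_pfm)
for base in BASES:
    print(base, [round(x, 3) for x in pwm[base]])
```

A uniform column scores 0 for every base, while a column dominated by one base rewards that base and penalizes the others; a genome-specific background (for example, GC-poor) would shift these scores.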

ReMap 2020: a database of regulatory regions from an integrative analysis of Human and Arabidopsis DNA-binding sequencing experiments.

Nucleic Acids Res 2020 01;48(D1):D180-D188

Aix Marseille Univ, INSERM, TAGC, Marseille, France.

ReMap (http://remap.univ-amu.fr) aims to provide the largest catalogs of high-quality regulatory regions resulting from a large-scale integrative analysis of hundreds of transcription factors and regulators from DNA-binding experiments in Human and Arabidopsis (Arabidopsis thaliana). In this 2020 update of ReMap we have collected, analyzed and retained after quality control 2764 new human ChIP-seq and 208 ChIP-exo datasets available from public sources. The updated human atlas totals 5798 datasets covering a total of 1135 transcriptional regulators (TRs) with a catalog of 165 million (M) peaks. This ReMap update comes with two unique Arabidopsis regulatory catalogs. First, a catalog of 372 Arabidopsis TRs across 2.6M peaks as a result of the integration of 509 ChIP-seq and DAP-seq datasets. Second, a catalog of 33 histone modifications and variants across 4.5M peaks from the integration of 286 ChIP-seq datasets. All catalogs are made available through track hubs at Ensembl and UCSC Genome Browsers. Additionally, this update comes with a new web framework providing an interactive user-interface, including improved search features. Finally, full programmatic access to the underlying data is available using a RESTful API together with a new R Shiny interface for a TR binding enrichment analysis tool.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz945
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145625
January 2020

Involvement of G-quadruplex regions in mammalian replication origin activity.

Nat Commun 2019 07 22;10(1):3274. Epub 2019 Jul 22.

Institute of Human Genetics, CNRS-University of Montpellier, 141 rue de la Cardonille, 34396, Montpellier, France.

Genome-wide studies of DNA replication origins revealed that origins preferentially associate with an Origin G-rich Repeated Element (OGRE), potentially forming G-quadruplexes (G4). Here, we functionally address their requirements for DNA replication initiation in a series of independent approaches. Deletion of the OGRE/G4 sequence strongly decreased the corresponding origin activity. Conversely, the insertion of an OGRE/G4 element created a new replication origin. This element also promoted replication of episomal EBV vectors lacking the viral origin, but not if the OGRE/G4 sequence was deleted. A potent G4 ligand, PhenDC3, stabilized G4s but did not alter the global origin activity. However, a set of new, G4-associated origins was created, whereas suppressed origins were largely G4-free. In vitro Xenopus laevis replication systems showed that OGRE/G4 sequences are involved in the activation of DNA replication, but not in the pre-replication complex formation. Altogether, these results converge on the functional importance of OGRE/G4 elements in DNA replication initiation.

Source
DOI: http://dx.doi.org/10.1038/s41467-019-11104-0
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6646384
July 2019
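
The OGRE/G4 elements above are described only as G-rich and potentially G4-forming. A widely used heuristic for putative G-quadruplexes looks for four runs of at least three guanines separated by short loops; the Python sketch below applies that rule with loops of 1 to 7 nt on both strands. The loop bounds and the function name are assumptions, and this regex is a generic G4 heuristic, not the OGRE definition used in the study.

```python
import re
from typing import List, Tuple

# Heuristic "G3+ N1-7" rule: four runs of >=3 G separated by loops of 1-7 nt.
# The loop-length bounds are an assumption; published tools use varying settings.
G4_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# A C-rich run on this strand means a G-rich run (potential G4) on the opposite strand.
G4_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for non-overlapping putative G4 motifs, 0-based half-open."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


telomeric = "GGGTTAGGGTTAGGGTTAGGG"  # human telomeric repeat, a classic G4 former
print(find_g4_motifs(telomeric))                  # [(0, 21, '+')]
print(find_g4_motifs("CCCTAACCCTAACCCTAACCC"))    # [(0, 21, '-')]
print(find_g4_motifs("ACGTACGTACGT"))             # []
```

A sequence-only rule like this flags candidates; whether a G4 actually forms in cells, or supports origin activity, is what the experiments in the abstract address.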

A map of direct TF-DNA interactions in the human genome.

Nucleic Acids Res 2019 Aug;47(14):7715

Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway.

Source
DOI: http://dx.doi.org/10.1093/nar/gkz582
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6698730
August 2019

A map of direct TF-DNA interactions in the human genome.

Nucleic Acids Res 2019 02;47(4):e21

Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway.

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated in predicted interactions covering >4% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high-quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF-DNA interactions and cis-regulatory modules through the UniBind web-interface (http://unibind.uio.no).

Source
DOI: http://dx.doi.org/10.1093/nar/gky1210
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6393237
February 2019

JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework.

Nucleic Acids Res 2018 01;46(D1):D260-D266

Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway.

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1126
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753243
January 2018

ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments.

Nucleic Acids Res 2018 01;46(D1):D267-D275

INSERM, UMR1090 TAGC, Marseille F-13288, France.

With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in-depth view of the complexity of the regulatory landscape in human.

Source
DOI: http://dx.doi.org/10.1093/nar/gkx1092
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753247
January 2018

The chromatin environment shapes DNA replication origin organization and defines origin classes.

Genome Res 2015 Dec 11;25(12):1873-85. Epub 2015 Nov 11.

Institute of Human Genetics, CNRS, 34396 Montpellier, France.

To unveil the still-elusive nature of metazoan replication origins, we identified them genome-wide and at unprecedented high-resolution in mouse ES cells. This allowed initiation sites (IS) and initiation zones (IZ) to be differentiated. We then characterized their genetic signatures and organization and integrated these data with 43 chromatin marks and factors. Our results reveal that replication origins can be grouped into three main classes with distinct organization, chromatin environment, and sequence motifs. Class 1 contains relatively isolated, low-efficiency origins that are poor in epigenetic marks and are enriched in an asymmetric AC repeat at the initiation site. Late origins are mainly found in this class. Class 2 origins are particularly rich in enhancer elements. Class 3 origins are the most efficient and are associated with open chromatin and polycomb protein-enriched regions. The presence of Origin G-rich Repeated elements (OGRE) potentially forming G-quadruplexes (G4) was confirmed at most origins. These coincide with nucleosome-depleted regions located upstream of the initiation sites, which are associated with a labile nucleosome containing H3K64ac. These data demonstrate that specific chromatin landscapes and combinations of specific signatures regulate origin localization. They explain the frequently observed links between DNA replication and transcription. They also emphasize the plasticity of metazoan replication origins and suggest that in multicellular eukaryotes, the combination of distinct genetic features and chromatin configurations act in synergy to define and adapt the origin profile.

Source
DOI: http://dx.doi.org/10.1101/gr.192799.115
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665008
December 2015

High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq.

Nat Commun 2015 Apr 15;6:6905. Epub 2015 Apr 15.

[1] Inserm U1090, Technological Advances for Genomics and Clinics (TAGC), F-13009 Marseille, France; [2] Aix-Marseille University UMR-S 1090, TAGC, F-13009 Marseille, France.

Cell-type-specific regulation of gene expression requires the activation of promoters by distal genomic elements defined as enhancers. The identification and characterization of enhancers are challenging in mammals due to the complexity of their genomes. Here we develop CapStarr-seq, a novel high-throughput strategy to quantitatively assess enhancer activity in mammals. This approach couples capture of regions of interest to the previously developed STARR-seq technique. Extensive assessment of CapStarr-seq demonstrates accurate quantification of enhancer activity. Furthermore, we find that enhancer strength is associated with binding complexity of tissue-specific transcription factors and super-enhancers, while additive enhancer activity isolates key genes involved in cell identity and function. CapStarr-seq thus provides a fast and cost-effective approach to assess the activity of potential enhancers for a given cell type and will be helpful in deciphering transcriptional regulation mechanisms.

Source
DOI: http://dx.doi.org/10.1038/ncomms7905
April 2015

Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape.

Nucleic Acids Res 2015 Feb 3;43(4):e27. Epub 2014 Dec 3.

INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France.

The large collections of ChIP-seq data rapidly accumulating in public data warehouses provide genome-wide binding site maps for hundreds of transcription factors (TFs). However, the extent of the regulatory occupancy space in the human genome has not yet been fully explored by integrating public ChIP-seq data sets and combining them with the ENCODE TF map. To enable genome-wide identification of regulatory elements we have collected, analysed and retained 395 available ChIP-seq data sets merged with ENCODE peaks covering a total of 237 TFs. This enhanced repertoire complements and refines current genome-wide occupancy maps by increasing the human genome regulatory search space by 14% compared to ENCODE alone, and also increases the complexity of the regulatory dictionary. As a direct application we used this unified binding repertoire to annotate variant enhancer loci (VELs) from H3K4me1 mark in two cancer cell lines (MCF-7, CRC) and observed enrichments of specific TFs involved in key biological functions in cancer development and proliferation. These enrichments of TFs within VELs provide a direct annotation of non-coding regions detected in cancer genomes. Finally, full access to this catalogue is available online together with the TF enrichment analysis tool (http://tagc.univ-mrs.fr/remap/).

Source
DOI: http://dx.doi.org/10.1093/nar/gku1280
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344487
February 2015
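
The 14% gain in regulatory search space reported in the abstract above can be thought of as the extra genome bases covered once public peaks are merged with ENCODE peaks. The Python sketch below computes that quantity on toy BED-like intervals by merging overlapping intervals per chromosome. Defining search space as union coverage is an assumption here, the function names are hypothetical, and the paper's exact computation may differ.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open as in BED


def merged_coverage(intervals: List[Interval]) -> int:
    """Number of bases covered by the union of intervals (overlaps counted once)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def search_space_increase(reference: List[Interval], extra: List[Interval]) -> float:
    """Fractional gain in covered bases when extra peaks are merged with reference peaks."""
    base = merged_coverage(reference)
    return (merged_coverage(reference + extra) - base) / base


encode_like = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 100)]
public_like = [("chr1", 250, 350), ("chr3", 10, 60)]
print(merged_coverage(encode_like))  # 300
print(search_space_increase(encode_like, public_like))
```

Sorting per chromosome keeps the merge linear after an O(n log n) sort; for genome-scale peak sets, interval tools that operate on sorted BED files do the same job.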

Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways.

Elife 2014 Oct 3;3:e02626. Epub 2014 Oct 3.

Genetics and Genome Biology Program, SickKids Research Institute, Toronto, Canada.

As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.

Source
DOI: http://dx.doi.org/10.7554/eLife.02626
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4359374
October 2014

TAF4, a subunit of transcription factor II D, directs promoter occupancy of nuclear receptor HNF4A during post-natal hepatocyte differentiation.

Elife 2014 Sep 10;3:e03613. Epub 2014 Sep 10.

Department of Functional Genomics and Cancer, Institut de Genetique et de Biologie Moleculaire et Cellulaire, CNRS/INSERM/UDS, Illkirch, France.

The functions of the TAF subunits of mammalian TFIID in physiological processes remain poorly characterised. In this study, we describe a novel function of TAFs in directing genomic occupancy of a transcriptional activator. Using liver-specific inactivation in mice, we show that the TAF4 subunit of TFIID is required for post-natal hepatocyte maturation. TAF4 promotes pre-initiation complex (PIC) formation at post-natal expressed liver function genes and down-regulates a subset of embryonic expressed genes by increased RNA polymerase II pausing. The TAF4-TAF12 heterodimer interacts directly with HNF4A and in vivo TAF4 is necessary to maintain HNF4A-directed embryonic gene expression at post-natal stages and promotes HNF4A occupancy of functional cis-regulatory elements adjacent to the transcription start sites of post-natal expressed genes. Stable HNF4A occupancy of these regulatory elements requires TAF4-dependent PIC formation highlighting that these are mutually dependent events. Local promoter-proximal HNF4A-TFIID interactions therefore act as instructive signals for post-natal hepatocyte differentiation.

Source
DOI: http://dx.doi.org/10.7554/eLife.03613
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4359380
September 2014

A CpG mutational hotspot in a ONECUT binding site accounts for the prevalent variant of hemophilia B Leyden.

Am J Hum Genet 2013 Mar;92(3):460-7

School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW 2052, Australia.

Hemophilia B, or the "royal disease," arises from mutations in coagulation factor IX (F9). Mutations within the F9 promoter are associated with a remarkable hemophilia B subtype, termed hemophilia B Leyden, in which symptoms ameliorate after puberty. Mutations at the -5/-6 site (nucleotides -5 and -6 relative to the transcription start site, designated +1) account for the majority of Leyden cases and have been postulated to disrupt the binding of a transcriptional activator, the identity of which has remained elusive for more than 20 years. Here, we show that ONECUT transcription factors (ONECUT1 and ONECUT2) bind to the -5/-6 site. The various hemophilia B Leyden mutations that have been reported in this site inhibit ONECUT binding to varying degrees, which correlate well with their associated clinical severities. In addition, expression of F9 is crucially dependent on ONECUT factors in vivo, and as such, mice deficient in ONECUT1, ONECUT2, or both exhibit depleted levels of F9. Taken together, our findings establish ONECUT transcription factors as the missing hemophilia B Leyden regulators that operate through the -5/-6 site.

Source
DOI: http://dx.doi.org/10.1016/j.ajhg.2013.02.003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591849
March 2013

Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.

Cell 2012 Jan 12;148(1-2):335-48. Epub 2012 Jan 12.

Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

CTCF-binding locations represent regulatory sequences that are highly constrained over the course of evolution. To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. Our data indicate that activation of retroelements has produced species-specific expansions of CTCF binding in rodents, dogs, and opossum, which often functionally serve as chromatin and transcriptional insulators. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.

Source
DOI: http://dx.doi.org/10.1016/j.cell.2011.11.058
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3368268
January 2012

Identification of proteomic signatures of mantle cell lymphoma, small lymphocytic lymphoma, and marginal zone lymphoma biopsies by surface enhanced laser desorption/ionization-time of flight mass spectrometry.

Leuk Lymphoma 2011 Apr 11;52(4):648-58. Epub 2011 Jan 11.

INSERM U836, Equipe 7, Université Joseph Fourier, Grenoble, France.

Mantle cell lymphoma (MCL), small lymphocytic lymphoma (SLL), and marginal zone lymphoma (MZL) are small B-cell non-Hodgkin lymphomas (NHLs) that may be difficult to distinguish. In order to identify specific proteomic biomarkers, differential proteomic analysis of these three NHLs was performed using surface enhanced laser desorption/ionization-time of flight mass spectrometry (SELDI-TOF-MS). Whole cell lysates obtained from 18 MCL, 20 SLL, and 20 MZL biopsies were applied on two different ProteinChips (cationic and anionic). Hierarchical clustering and discriminating scores combined with an innovative bio-informatics microdissection strategy allowed us to distinguish specific lymphoma proteomic signatures based on the expression of 37 protein peaks. SELDI-assisted protein purification combined with nano-liquid chromatography (LC) quadrupole-time of flight tandem mass spectrometry (Q-TOF MS/MS) was used to identify proteins overexpressed in both MCL and SLL tumors. Among them two histones, H2B and H4, were identified in MCL tumor biopsies and the signal recognition particle 9 kDa protein, SRP9, in SLL tumor biopsies.

Source
DOI: http://dx.doi.org/10.3109/10428194.2010.549256
April 2011

Consistent annotation of gene expression arrays.

BMC Genomics 2010 May 11;11:294. Epub 2010 May 11.

European Bioinformatics Institute EMBL, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Background: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases.

Results: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor.

Conclusions: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-11-294
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2894801
May 2010

Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding.

Science 2010 May 8;328(5981):1036-40. Epub 2010 Apr 8.

Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

Transcription factors (TFs) direct gene expression by binding to DNA regulatory regions. To explore the evolution of gene regulation, we used chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) to determine experimentally the genome-wide occupancy of two TFs, CCAAT/enhancer-binding protein alpha and hepatocyte nuclear factor 4 alpha, in the livers of five vertebrates. Although each TF displays highly conserved DNA binding preferences, most binding is species-specific, and aligned binding events present in all five species are rare. Regions near genes with expression levels that are dependent on a TF are often bound by the TF in multiple species yet show no enhanced DNA sequence constraint. Binding divergence between species can be largely explained by sequence changes to the bound motifs. Among the binding events lost in one lineage, only half are recovered by another binding event within 10 kilobases. Our results reveal large interspecies differences in transcriptional regulation and provide insight into regulatory evolution.

Source
DOI: http://dx.doi.org/10.1126/science.1186176
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3008766
May 2010

Ensembl's 10th year.

Nucleic Acids Res 2010 Jan 11;38(Database issue):D557-62. Epub 2009 Nov 11.

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.

Source
DOI: http://dx.doi.org/10.1093/nar/gkp972
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808936
January 2010

BioMart Central Portal--unified access to biological data.

Nucleic Acids Res 2009 Jul 6;37(Web Server issue):W23-7. Epub 2009 May 6.

EMBL-European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK.

BioMart Central Portal (www.biomart.org) offers a one-stop shop solution to access a wide array of biological databases. These include major biomolecular sequence, pathway and annotation databases such as Ensembl, Uniprot, Reactome, HGNC, Wormbase and PRIDE; for a complete list, visit http://www.biomart.org/biomart/martview. Moreover, the web server features seamless data federation, making it possible to cross-query these data sources in a user-friendly and unified way. The web server not only provides access through a web interface (MartView), it also supports programmatic access through a Perl API as well as RESTful and SOAP-oriented web services. The website is free and open to all users and there is no login requirement.

Source
DOI: http://dx.doi.org/10.1093/nar/gkp265
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2703988
July 2009

Strand selective generation of endo-siRNAs from the Na/phosphate transporter gene Slc34a1 in murine tissues.

Nucleic Acids Res 2009 Apr 23;37(7):2274-82. Epub 2009 Feb 23.

Institute for Cell and Molecular Biosciences, Newcastle University, Framlington Place, Newcastle, UK.

Natural antisense transcripts (NATs) are important regulators of gene expression. Recently, a link between antisense transcription and the formation of endo-siRNAs has emerged. We investigated the bi-directionally transcribed Na/phosphate cotransporter gene (Slc34a1) with respect to endo-siRNA processing. Mouse Slc34a1 produces an antisense transcript that represents an alternative splice product of the Pfn3 gene located downstream of Slc34a1. The antisense transcript is prominently found in testis and in kidney. Co-expression of in vitro synthesized sense/antisense transcripts in Xenopus oocytes indicated processing of the overlapping transcripts into endo-siRNAs in the nucleus. Truncation experiments revealed that an overlap of at least 29 base-pairs is required to induce processing. By northern blotting, we detected endo-siRNAs in mouse tissues that co-express Slc34a1 sense/antisense transcripts. The orientation of endo-siRNAs was tissue-specific in mouse kidney and testis: in kidney, where the Na/phosphate cotransporter fulfils its physiological function, endo-siRNAs complementary to the NAT were detected, whereas in testis both orientations were found. Considering the widespread expression of NATs and the gene-silencing potential of endo-siRNAs, we hypothesized a genome-wide link between antisense transcription and monoallelic expression. A significant correlation between random imprinting and antisense transcription could indeed be established. Our findings suggest a novel, more general role for NATs in gene regulation.

Source
DOI: http://dx.doi.org/10.1093/nar/gkp088
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673434
April 2009

BioMart--biological queries made easy.

BMC Genomics 2009 Jan 14;10:22. Epub 2009 Jan 14.

Ontario Institute for Cancer Research, MaRS Centre, 101 College Street, Toronto, Ontario, Canada.

Background: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides its own advanced query interface, which the biologist must learn before querying that resource. Frequently, more than one data source is required, and for high-throughput analysis, cutting and pasting results between websites is very time-consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces, if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution.

Results: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape and Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results.

Conclusion: BioMart is an easy-to-use, generic and scalable system and has therefore become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible at http://www.biomart.org.

Source
DOI: http://dx.doi.org/10.1186/1471-2164-10-22
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2649164
January 2009

Gene profiling reveals specific oncogenic mechanisms and signaling pathways in oncocytic and papillary thyroid carcinoma.

Oncogene 2005 Jun;24(25):4155-61

INSERM EMI-U 0018, Laboratoire de Biochimie et Biologie Moléculaire, CHU, 4 rue Larrey, Angers F-49033, France.

The oncogenic pathways in mitochondrial-rich thyroid carcinomas are not clearly understood. To investigate the possible implication of mitochondrial abundance in the genesis of thyroid tumors, we explored the gene expression profiles of six oncocytic carcinomas and six mitochondrial-rich papillary carcinomas using cDNA-microarray technology. A supervised approach allowed us to identify 83 genes differentially expressed between the two types of carcinoma. These genes were classified according to their ontologic profiles. Three genes suspected of playing a role in tumorigenesis, NOS3, alpha-actinin-2 and alpha-catenin, were explored by quantitative RT-PCR analysis and immunohistochemistry. Of the 59 genes overexpressed in papillary carcinomas, 51% were involved in cell communication. Of the 24 genes overexpressed in oncocytic carcinomas, 84% were involved in mitochondrial and cellular metabolism. Our results suggest that mitochondrial respiratory chain complexes III and IV play a significant role in the regulation of reactive oxygen species production by oncocytic tumors.

Source
DOI: http://dx.doi.org/10.1038/sj.onc.1208578
June 2005

The Sgp3 locus on mouse chromosome 13 regulates nephritogenic gp70 autoantigen expression and predisposes to autoimmunity.

J Immunol 2003 Oct;171(7):3872-7

Institut National de la Santé et de la Recherche Médicale Unité 399, Faculté de Médecine, Marseille, France.

By interval mapping of a backcross progeny between New Zealand White (NZW) and C57BL/6 (B6) mice bearing the Y chromosome-linked autoimmune acceleration gene Yaa, we previously identified a genetic locus on mid-chromosome 13, here designated Sgp3, showing a major effect on the expression of a nephritogenic autoantigen, gp70. In this study, the NZW-derived Sgp3 region was transferred by backcross procedure and marker-assisted selection onto the B6 background to produce three independent congenic strains: B6.NZW-Sgp3/1, -Sgp3/2, and -Sgp3/3. We show that NZW homozygosity at a single 3-centiMorgan (approximately 12 megabases (Mb)) interval between markers D13Mit142 and D13Mit254 mediates increased basal serum levels of gp70 in B6.NZW-Sgp3/1 and B6.NZW-Sgp3/2 mice, to a greater degree in males (approximately 15 µg/ml) than in females (approximately 9 µg/ml) as compared with B6 (approximately 2 µg/ml), revealing a gender effect. However, their gp70 levels are still lower than those of NZW mice (approximately 60 µg/ml). In addition, B6.NZW-Sgp3/1 and B6.NZW-Sgp3/2 mice showed a moderate 2- to 3-fold increase in serum gp70 in response to LPS, which contrasted with an over 10-fold increase in NZW mice. Although both B6.NZW-Sgp3/1 and B6.NZW-Sgp3/2 mice failed to produce significant amounts of gp70-anti-gp70 immune complexes, unexpectedly, aged B6.NZW-Sgp3/2 congenic males bearing the Yaa gene developed increased titers of IgG autoantibodies to DNA and chromatin. Our data indicate that Sgp3 is involved in a complex process of gp70 production under polygenic control and may contribute significantly to lupus susceptibility, not only through up-regulation of gp70 autoantigen production but also through predisposition to autoimmunity.

Source
DOI: http://dx.doi.org/10.4049/jimmunol.171.7.3872
October 2003