Publications by authors named "Claude Thermes"

42 Publications

Contrasting Gene Decay in Subterranean Vertebrates: Insights from Cavefishes and Fossorial Mammals.

Mol Biol Evol 2021 01;38(2):589-605

CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Université Paris-Saclay, Gif-sur-Yvette, France.

Evolution sometimes proceeds by loss, especially when structures and genes become dispensable after an environmental shift relaxes functional constraints. Subterranean vertebrates are outstanding models to analyze this process, and gene decay can serve as a readout. We sought to understand some general principles on the extent and tempo of the decay of genes involved in vision, circadian clock, and pigmentation in cavefishes. The analysis of the genomes of two Cuban species belonging to the genus Lucifuga provided evidence for the largest loss of eye-specific genes and nonvisual opsin genes reported so far in cavefishes. Comparisons with a recently evolved cave population of Astyanax mexicanus and three species belonging to the Chinese tetraploid genus Sinocyclocheilus revealed the combined effects of the level of eye regression, time, and genome ploidy on eye-specific gene pseudogenization. The limited extent of gene decay in all these cavefishes and the very small number of loss-of-function mutations per pseudogene suggest that their eye degeneration may not be very ancient, ranging from early to late Pleistocene. This is in sharp contrast with the identification of several vision genes carrying many loss-of-function mutations in ancient fossorial mammals, further suggesting that blind fishes cannot thrive more than a few million years in cave ecosystems.
DOI: http://dx.doi.org/10.1093/molbev/msaa249
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7826195
January 2021

Transcription-mediated organization of the replication initiation program across large genes sets common fragile sites genome-wide.

Nat Commun 2019 12 13;10(1):5693. Epub 2019 Dec 13.

Curie Institute, PSL Research University, CNRS UMR 3244, F-75005, Paris, France.

Common fragile sites (CFSs) are chromosome regions prone to breakage upon replication stress known to drive chromosome rearrangements during oncogenesis. Most CFSs nest in large expressed genes, suggesting that transcription could elicit their instability; however, the underlying mechanisms remain elusive. Genome-wide replication timing analyses here show that stress-induced delayed/under-replication is the hallmark of CFSs. Extensive genome-wide analyses of nascent transcripts, replication origin positioning and fork directionality reveal that 80% of CFSs nest in large transcribed domains poor in initiation events, replicated by long-travelling forks. Forks that travel long in late S phase explain CFS replication features, whereas formation of sequence-dependent fork barriers or head-on transcription-replication conflicts do not. We further show that transcription inhibition during S phase, which suppresses transcription-replication encounters and prevents origin resetting, could not rescue CFS stability. Altogether, our results show that transcription-dependent suppression of initiation events delays replication of large gene bodies, committing them to instability.
DOI: http://dx.doi.org/10.1038/s41467-019-13674-5
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6911102
December 2019

Improving Small RNA-seq: Less Bias and Better Detection of 2'-O-Methyl RNAs.

J Vis Exp 2019 09 16(151). Epub 2019 Sep 16.

Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Univ Paris-Sud, Université Paris-Saclay.

The study of small RNAs (sRNAs) by next-generation sequencing (NGS) is challenged by bias issues during library preparation. Several types of sRNA such as plant microRNAs (miRNAs) carry a 2'-O-methyl (2'-OMe) modification at their 3' terminal nucleotide. This modification adds another difficulty as it inhibits 3' adapter ligation. We previously demonstrated that modified versions of the 'TruSeq (TS)' protocol have less bias and an improved detection of 2'-OMe RNAs. Here we describe in detail protocol 'TS5', which showed the best overall performance. TS5 can be followed either using homemade reagents or reagents from the TS kit, with equal performance.
DOI: http://dx.doi.org/10.3791/60056
September 2019

The Third Revolution in Sequencing Technology.

Trends Genet 2018 09 22;34(9):666-681. Epub 2018 Jun 22.

Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, 9198 Gif sur Yvette Cedex, France.

Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives.
DOI: http://dx.doi.org/10.1016/j.tig.2018.05.008
September 2018

The evolution of the temporal program of genome replication.

Nat Commun 2018 06 6;9(1):2199. Epub 2018 Jun 6.

Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France.

Genome replication is highly regulated in time and space, but the rules governing the remodeling of these programs during evolution remain largely unknown. We generated genome-wide replication timing profiles for ten Lachancea yeasts, covering a continuous evolutionary range from closely related to more divergent species. We show that replication programs primarily evolve through a highly dynamic evolutionary renewal of the cohort of active replication origins. We found that gained origins appear with low activity yet become more efficient and fire earlier as they evolutionarily age. By contrast, origins that are lost comprise the complete range of firing strength. Additionally, they preferentially occur in close vicinity to strong origins. Interestingly, despite high evolutionary turnover, active replication origins remain regularly spaced along chromosomes in all species, suggesting that origin distribution is optimized to limit large inter-origin intervals. We propose a model on the evolutionary birth, death, and conservation of active replication origins.
DOI: http://dx.doi.org/10.1038/s41467-018-04628-4
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5989221
June 2018

Complete Sequence of the Intronless Mitochondrial Genome of the Saccharomyces cerevisiae Strain CW252.

Genome Announc 2018 Apr 26;6(17). Epub 2018 Apr 26.

I2BC Next-Generation Sequencing Facility, Institut de Biologie Intégrative de la Cellule, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France.

The mitochondrial genomes of Saccharomyces cerevisiae strains contain up to 13 introns. An intronless recombinant genome introduced into the nuclear background of strain W303 gave the CW252 strain, which is used to model mitochondrial respiratory pathologies. The complete sequence of this mitochondrial genome was obtained using a hybrid assembly methodology.
DOI: http://dx.doi.org/10.1128/genomeA.00219-18
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5920167
April 2018

Evidence for late Pleistocene origin of Astyanax mexicanus cavefish.

BMC Evol Biol 2018 04 18;18(1):43. Epub 2018 Apr 18.

Évolution, Génomes, Comportement, Écologie, CNRS, IRD, Univ Paris-Sud, Université Paris-Saclay, F-91198, Gif-sur-Yvette, France.

Background: Cavefish populations belonging to the Mexican tetra species Astyanax mexicanus are outstanding models to study the tempo and mode of adaptation to a radical environmental change. They are currently assigned to two main groups, the so-called "old" and "new" lineages, which would have populated several caves independently and at different times. However, we do not yet have accurate estimates of the time frames of evolution of these populations.

Results: We reanalyzed the geographic distribution of mitochondrial and nuclear DNA polymorphisms and we found that these data do not support the existence of two cavefish lineages. Using IMa2, a program that allows dating population divergence in addition to demographic parameters, we found that microsatellite polymorphism strongly supports a very recent origin of cave populations (< 20,000 years). We identified a large number of single-nucleotide polymorphisms (SNPs) in transcript sequences of pools of embryos (Pool-seq) belonging to Pachón cave population and a surface population from Texas. Based on summary statistics that can be computed with this SNP data set together with simulations of evolution of SNP polymorphisms in two recently isolated populations, we looked for sets of demographic parameters that allow the computation of summary statistics with simulated populations that are similar to the ones with the sampled populations. In most simulations for which we could find a good fit between the summary statistics of observed and simulated data, the best fit occurred when the divergence between simulated populations was less than 30,000 years.

Conclusions: Although it is often assumed that some cave populations have a very ancient origin, a recent origin of these populations is strongly supported by our analyses of independent sets of nuclear DNA polymorphism. Moreover, the observation of two divergent haplogroups of mitochondrial and nuclear genes with different geographic distributions supports a recent admixture of two divergent surface populations, before the isolation of cave populations. If cave populations are indeed only several thousand years old, many phenotypic changes observed in cavefish would thus have mainly involved the fixation of genetic variants present in surface fish populations and within a very short period of time.
DOI: http://dx.doi.org/10.1186/s12862-018-1156-7
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5905186
April 2018

Systematic comparison of small RNA library preparation protocols for next-generation sequencing.

BMC Genomics 2018 02 5;19(1):118. Epub 2018 Feb 5.

Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Univ Paris-Sud, Université Paris-Saclay, 9198, Gif sur Yvette Cedex, France.

Background: Next-generation sequencing technologies have revolutionized the study of small RNAs (sRNAs) on a genome-wide scale. However, classical sRNA library preparation methods introduce serious bias, mainly during adapter ligation steps. Several types of sRNA including plant microRNAs (miRNAs), piwi-interacting RNAs (piRNAs) in insects, nematodes and mammals, and small interfering RNAs (siRNAs) in insects and plants contain a 2'-O-methyl (2'-OMe) modification at their 3' terminal nucleotide. This inhibits 3' adapter ligation and makes library preparation particularly challenging. To reduce bias, the NEBNext kit (New England Biolabs) uses polyethylene glycol (PEG), the NEXTflex V2 kit (BIOO Scientific) uses both randomised adapters and PEG, and the novel SMARTer (Clontech) and CATS (Diagenode) kits avoid ligation altogether. Here we compared these methods with Illumina's classical TruSeq protocol regarding the detection of normal and 2'-OMe RNAs. In addition, we modified the TruSeq and NEXTflex protocols to identify conditions that improve performance.

Results: Among the five kits tested with their respective standard protocols, the SMARTer and CATS kits had the lowest levels of bias but also had a strong formation of side products, and as a result performed relatively poorly with biological samples; NEXTflex detected the largest numbers of different miRNAs. The use of a novel type of randomised adapters called MidRand-Like (MRL) adapters and PEG improved the detection of 2'-OMe RNAs in both the TruSeq and the NEXTflex protocols.

Conclusions: While it is commonly accepted that biases in sRNA library preparation protocols are mainly due to adapter ligation steps, the ligation-free protocols were not the best performing methods. Our modified versions of the TruSeq and NEXTflex protocols provide an improved tool for the study of 2'-OMe RNAs.
DOI: http://dx.doi.org/10.1186/s12864-018-4491-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799908
February 2018

Postembryonic Fish Brain Proliferation Zones Exhibit Neuroepithelial-Type Gene Expression Profile.

Stem Cells 2017 06 14;35(6):1505-1518. Epub 2017 Mar 14.

INRA CASBAH Group, Neuro-PSI, UMR 9197, CNRS, Gif-sur-Yvette, France.

In mammals, neuroepithelial cells play an essential role in embryonic neurogenesis, whereas glial stem cells are the principal source of neurons at postembryonic stages. By contrast, neuroepithelial-like stem/progenitor (NE) cells have been shown to be present throughout life in teleosts. We used three-dimensional (3D) reconstructions of cleared transgenic wdr12:GFP medaka brains to demonstrate that this cell type is widespread in juveniles and to identify new regions containing NE cells. We established the gene expression profile of optic tectum (OT) NE cells by cell sorting followed by RNA-seq. Our results demonstrate that most OT NE cells are indeed active stem cells and that some of them exhibit long G2 phases. We identified several novel pathways (e.g., DNA repair pathways) potentially involved in NE cell homeostasis. In situ hybridization studies showed that all NE populations in the postembryonic medaka brain have a similar molecular signature. Our findings highlight the importance of NE progenitors in medaka and improve our understanding of NE-cell biology. These cells are potentially useful not only for neural stem cell studies but also for improving the characterization of neurodevelopmental diseases, such as microcephaly.
DOI: http://dx.doi.org/10.1002/stem.2588
June 2017

Replication landscape of the human genome.

Nat Commun 2016 Jan 11;7:10208. Epub 2016 Jan 11.

Ecole Normale Supérieure, Institut de Biologie de l'ENS (IBENS), and Inserm U1024, and CNRS UMR 8197, 46 rue d'Ulm, Paris F-75005, France.

Despite intense investigation, human replication origins and termini remain elusive. Existing data have shown strong discrepancies. Here we sequenced highly purified Okazaki fragments from two cell types and, for the first time, quantitated replication fork directionality and delineated initiation and termination zones genome-wide. Replication initiates stochastically, primarily within non-transcribed, broad (up to 150 kb) zones that often abut transcribed genes, and terminates dispersively between them. Replication fork progression is significantly co-oriented with transcription. Initiation and termination zones are frequently contiguous, sometimes separated by regions of unidirectional replication. Initiation zones are enriched in open chromatin and enhancer marks, even when not flanked by genes, and often border 'topologically associating domains' (TADs). Initiation zones are enriched in origin recognition complex (ORC)-binding sites and better align to origins previously mapped using bubble-trap than λ-exonuclease. This novel panorama of replication reveals how chromatin and transcription modulate the initiation process to create cell-type-specific replication programs.
DOI: http://dx.doi.org/10.1038/ncomms10208
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4729899
January 2016

Nucleoid organization in the radioresistant bacterium Deinococcus radiodurans.

Mol Microbiol 2015 Aug 25;97(4):759-74. Epub 2015 Jun 25.

Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, Bâtiment 409, Orsay, 91405, France.

Processes favoring the exceptional resistance to genotoxic stress of Deinococcus radiodurans are not yet completely characterized. It was postulated that its nucleoid and chromosome(s) organization could participate in the DNA double strand break repair process. Here, we investigated the organization of chromosome 1 by localization of three chromosomal loci including oriC, Ter and a locus located in its left arm. For this purpose, we used a ParB-parS system to visualize the position of the loci before and after exposure to γ-rays. By comparing the number of fluorescent foci with the number of copies of the studied loci present in the cells measured by quantitative polymerase chain reaction (qPCR), we demonstrated that the 4-10 copies of chromosome 1 per cell are dispersed within the nucleoid before irradiation, indicating that the chromosome copies are not prealigned. Chromosome segregation is progressive but not co-ordinated, allowing each locus to be paired with its sister during part of the cell cycle. After irradiation, the nucleoid organization is modified, involving a transient alignment of the loci in the late stage of DNA repair and a delay of segregation of the Ter locus. We discuss how these events can influence DNA double strand break repair.
DOI: http://dx.doi.org/10.1111/mmi.13064
August 2015

Large replication skew domains delimit GC-poor gene deserts in human.

Comput Biol Chem 2014 Dec 27;53 Pt A:153-65. Epub 2014 Aug 27.

Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France.

Besides their large-scale organization in isochores, mammalian genomes display megabase-sized regions, spanning both genes and intergenes, where the strand nucleotide composition asymmetry decreases linearly, possibly due to replication activity. These so-called skew-N domains cover about a third of the human genome and are bordered by two skew upward jumps that were hypothesized to compose a subset of "master" replication origins active in the germline. Skew-N domains were shown to exhibit a particular gene organization. Genes with CpG-rich promoters likely expressed in the germline are overrepresented near the master replication origins, with large genes being co-oriented with replication fork progression, which suggests some coordination of replication and transcription. In this study, we describe another skew structure that covers ∼13% of the human genome and that is bordered by putative master replication origins similar to the ones flanking skew-N domains. These skew-split-N domains have a shape reminiscent of an N, but split in half, leaving in the center a region of null skew whose length increases with domain size. These central regions (median size ∼860 kb) have a homogeneous composition, i.e. both a null and constant skew and a constant and low GC content. They correspond to heterochromatin gene deserts found in low-GC isochores with an average gene density of 0.81 promoters/Mb as compared to 7.73 promoters/Mb genome-wide. The analysis of epigenetic marks and replication timing data confirms that, in these late replicating heterochromatic regions, the initiation of replication is likely to be random. This contrasts with the transcriptionally active euchromatin state found around the bordering well-positioned master replication origins. Altogether, skew-N domains and skew-split-N domains cover about 50% of the human genome.
DOI: http://dx.doi.org/10.1016/j.compbiolchem.2014.08.020
December 2014

Ten years of next-generation sequencing technology.

Trends Genet 2014 Sep 6;30(9):418-26. Epub 2014 Aug 6.

Centre de Génétique Moléculaire - CNRS, Avenue de la Terrasse, 91198 Gif sur Yvette, France.

Ten years ago next-generation sequencing (NGS) technologies appeared on the market. During the past decade, tremendous progress has been made in terms of speed, read length, and throughput, along with a sharp reduction in per-base cost. Together, these advances democratized NGS and paved the way for the development of a large number of novel NGS applications in basic science as well as in translational research areas such as clinical diagnostics, agrigenomics, and forensic science. Here we provide an overview of the evolution of NGS and discuss the most significant improvements in sequencing technologies and library preparation protocols. We also explore the current landscape of NGS applications and provide a perspective for future developments.
DOI: http://dx.doi.org/10.1016/j.tig.2014.07.001
September 2014

CIRCUS: a package for Circos display of structural genome variations from paired-end and mate-pair sequencing data.

BMC Bioinformatics 2014 Jun 18;15:198. Epub 2014 Jun 18.

Plateforme Intégrée IMAGIF - CNRS, Avenue de la Terrasse, Gif sur Yvette 91198, France.

Background: Detection of large genomic rearrangements, such as large indels, duplications or translocations is now commonly achieved by next generation sequencing (NGS) approaches. Recently, several tools have been developed to analyze NGS data but the resulting files are difficult to interpret without an additional visualization step. Circos (Genome Res, 19:1639-1645, 2009), a Perl script, is a powerful visualization software that requires setting up numerous configuration files with a large number of parameters to handle. R packages like RCircos (BMC Bioinformatics, 14:244, 2013) or ggbio (Genome Biol, 13:R77, 2012) provide functions to display genomic data as circular Circos-like plots. However, these tools are very general and lack the functions needed to filter, format and adjust specific input genomic data.

Results: We implemented an R package called CIRCUS to analyze genomic structural variations. It generates both the data and the configuration files that Circos needs to produce graphs. Only a few R prerequisites are necessary. Options are available to deal with heterogeneous data, various chromosome numbers and multi-scale analysis.

Conclusion: CIRCUS allows fast and versatile analysis of genomic structural variants with Circos plots for users with limited coding skills.
DOI: http://dx.doi.org/10.1186/1471-2105-15-198
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4071023
June 2014

Library preparation methods for next-generation sequencing: tone down the bias.

Exp Cell Res 2014 Mar 15;322(1):12-20. Epub 2014 Jan 15.

Centre de Génétique Moléculaire - CNRS, Avenue de la Terrasse, 91198 Gif sur Yvette, France.

Next-generation sequencing (NGS) has caused a revolution in biology. NGS requires the preparation of libraries in which (fragments of) DNA or RNA molecules are fused with adapters followed by PCR amplification and sequencing. It is evident that robust library preparation methods that produce a representative, non-biased source of nucleic acid material from the genome under investigation are of crucial importance. Nevertheless, it has become clear that NGS libraries for all types of applications contain biases that compromise the quality of NGS datasets and can lead to their erroneous interpretation. A detailed knowledge of the nature of these biases will be essential for a careful interpretation of NGS data on the one hand and will help to find ways to improve library quality or to develop bioinformatics tools to compensate for the bias on the other hand. In this review we discuss the literature on bias in the most common NGS library preparation protocols, both for DNA sequencing (DNA-seq) as well as for RNA sequencing (RNA-seq). Strikingly, almost all steps of the various protocols have been reported to introduce bias, especially in the case of RNA-seq, which is technically more challenging than DNA-seq. For each type of bias we discuss methods for improvement with a view to providing some useful advice to the researcher who wishes to convert any kind of raw nucleic acid into an NGS library.
DOI: http://dx.doi.org/10.1016/j.yexcr.2014.01.008
March 2014

From simple bacterial and archaeal replicons to replication N/U-domains.

J Mol Biol 2013 Nov 3;425(23):4673-89. Epub 2013 Oct 3.

Ecole Normale Supérieure, IBENS UMR8197 U1024, Paris 75005, France.

The Replicon Theory proposed 50 years ago has proven to apply for replicons of the three domains of life. Here, we review our knowledge of genome organization into single and multiple replicons in bacteria, archaea and eukarya. Bacterial and archaeal replicator/initiator systems are quite specific and efficient, whereas eukaryotic replicons show degenerate specificity and efficiency, allowing for complex regulation of origin firing time. We expand on recent evidence that ~50% of the human genome is organized as ~1,500 megabase-sized replication domains with a characteristic parabolic (U-shaped) replication timing profile and linear (N-shaped) gradient of replication fork polarity. These N/U-domains correspond to self-interacting segments of the chromatin fiber bordered by open chromatin zones and replicate by cascades of origin firing initiating at their borders and propagating to their center, possibly by fork-stimulated initiation. The conserved occurrence of this replication pattern in the germline of mammals has resulted over evolutionary times in the formation of megabase-sized domains with an N-shaped nucleotide compositional skew profile due to replication-associated mutational asymmetries. Overall, these results reveal an evolutionarily conserved but developmentally plastic organization of replication that is driving mammalian genome evolution.
DOI: http://dx.doi.org/10.1016/j.jmb.2013.09.021
November 2013

Zinc-mediated RNA fragmentation allows robust transcript reassembly upon whole transcriptome RNA-Seq.

Methods 2013 Sep 21;63(1):25-31. Epub 2013 Mar 21.

ncRNA, epigenetic and genome fluidity, Institut Curie, Centre de Recherche, CNRS UMR 3244, Université Pierre et Marie Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, France.

Whole transcriptome RNA-Seq has emerged as a powerful tool in transcriptomics, enabling genome-wide quantitative analysis of gene expression and qualitative identification of novel coding or non-coding RNA species through transcriptome reassembly. Common protocols for preparation of RNA-Seq libraries include an RNA fragmentation step for which several RNA sizing techniques are commercially available. To date, there is no global information about their putative bias on transcriptome analysis. Here we compared the effects of RNase III- and zinc-mediated RNA fragmentation on transcript expression measurement and transcriptome reassembly in the budding yeast Saccharomyces cerevisiae. We observed that RNA cleavage by RNase III is heterogeneous along transcripts with a striking decrease of autocorrelation between adjacent nucleotides along the transcriptome. This had little impact on mRNA expression measurement, but specific classes of transcripts such as abundant non-coding RNAs were underrepresented in the libraries constructed using RNase III. Furthermore, zinc-mediated fragmentation allows proper reassembly of more transcripts, with more precise 5' and 3' ends. Together, our results show that transcriptome reassembly from RNA-Seq data is very sensitive to the RNA fragmentation technique, and that zinc-mediated fragmentation provides more robust and accurate transcript identification than cleavage by RNase III.
DOI: http://dx.doi.org/10.1016/j.ymeth.2013.03.009
September 2013

Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signal-processing algorithm.

Nat Protoc 2013 Jan 13;8(1):98-110. Epub 2012 Dec 13.

Université de Lyon, Lyon, France.

In this protocol, we describe the use of the LastWave open-source signal-processing command language (http://perso.ens-lyon.fr/benjamin.audit/LastWave/) for analyzing cellular DNA replication timing profiles. LastWave makes use of a multiscale, wavelet-based signal-processing algorithm that is based on a rigorous theoretical analysis linking timing profiles to fundamental features of the cell's DNA replication program, such as the average replication fork polarity and the difference between replication origin density and termination site density. We describe the flow of signal-processing operations to obtain interactive visual analyses of DNA replication timing profiles. We focus on procedures for exploring the space-scale map of apparent replication speeds to detect peaks in the replication timing profiles that represent preferential replication initiation zones, and for delimiting U-shaped domains in the replication timing profile. In comparison with the generally adopted approach that involves genome segmentation into regions of constant timing separated by timing transition regions, the present protocol enables the recognition of more complex patterns of the spatio-temporal replication program and has a broader range of applications. Completing the full procedure should not take more than 1 h, although learning the basics of the program can take a few hours and achieving full proficiency in the use of the software may take days.
DOI: http://dx.doi.org/10.1038/nprot.2012.145
January 2013

Megabase replication domains along the human genome: relation to chromatin structure and genome organisation.

Subcell Biochem 2013 ;61:57-80

Université de Lyon, F-69000, Lyon, France.

In higher eukaryotes, the absence of specific sequence motifs marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origin activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabase-sized replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.

Source
http://dx.doi.org/10.1007/978-94-007-4525-4_3
February 2014

3D chromatin conformation correlates with replication timing and is conserved in resting cells.

Nucleic Acids Res 2012 Oct 8;40(19):9470-81. Epub 2012 Aug 8.

Laboratoire Joliot-Curie, Ecole Normale Supérieure de Lyon, CNRS, F-69007 Lyon, France.

Although chromatin folding is known to be of functional importance to control the gene expression program, less is known regarding its interplay with DNA replication. Here, using Circular Chromatin Conformation Capture combined with high-throughput sequencing, we identified megabase-sized self-interacting domains in the nucleus of a human lymphoblastoid cell line, as well as in cycling and resting peripheral blood mononuclear cells (PBMC). Strikingly, the boundaries of those domains coincide with early-initiation zones in every cell type. Preferential interactions have been observed between consecutive early-initiation zones, but also between those separated by several tens of megabases. Thus, the 3D conformation of chromatin is strongly correlated with the replication timing along the whole chromosome. We furthermore provide direct clues that, in addition to the timing value per se, the shape of the timing profile at a given locus defines its set of genomic contacts. As this timing-related scheme of chromatin organization exists in lymphoblastoid cells and in resting and cycling PBMC, this indicates that it is maintained several weeks or months after the previous S-phase. Lastly, our work highlights that the major chromatin changes accompanying PBMC entry into the cell cycle occur while the long-range chromatin contacts remain largely unchanged.

Source
http://dx.doi.org/10.1093/nar/gks736
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3479194
October 2012

BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms.

Adv Bioinformatics 2012 23;2012:893048. Epub 2012 May 23.

LaBRI, UMR 5800 CNRS, Université Bordeaux, 351, Cours de la Libération, 33405 Talence Cédex, France.

The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.

Source
http://dx.doi.org/10.1155/2012/893048
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3366197
August 2012

Replication fork polarity gradients revealed by megabase-sized U-shaped replication timing domains in human cell lines.

PLoS Comput Biol 2012 5;8(4):e1002443. Epub 2012 Apr 5.

Université de Lyon, Lyon, France.

In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided into domains that display a characteristic U-shaped replication timing profile with early initiation zones at borders and late replication at centers. Significant overlap is observed between U-domains of different cell lines and also with germline replication domains exhibiting an N-shaped nucleotide compositional skew. From the demonstration that the average fork polarity is directly reflected by both the compositional skew and the derivative of the replication timing profile, we argue that the N-shape displayed by this derivative in U-domains supports the existence of large-scale gradients of replication fork polarity in somatic and germline cells. Analysis of chromatin interaction (Hi-C) and chromatin marker data reveals that U-domains correspond to high-order chromatin structural units. We discuss possible models for replication origin activation within U/N-domains. The compartmentalization of the genome into replication U/N-domains provides new insights on the organization of the replication program in the human genome.

Source
http://dx.doi.org/10.1371/journal.pcbi.1002443
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320577
August 2012

Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome.

PLoS Comput Biol 2011 Dec 29;7(12):e1002322. Epub 2011 Dec 29.

Institut de Biologie de l'Ecole Normale Supérieure (IBENS), CNRS UMR8197, Inserm U1024, Paris, France.

Genome-wide replication timing studies have suggested that mammalian chromosomes consist of megabase-scale domains of coordinated origin firing separated by large originless transition regions. Here, we report a quantitative genome-wide analysis of DNA replication kinetics in several human cell types that contradicts this view. DNA combing in HeLa cells sorted into four temporal compartments of S phase shows that replication origins are spaced at 40 kb intervals and fire as small clusters whose synchrony increases during S phase and that replication fork velocity (mean 0.7 kb/min, maximum 2.0 kb/min) remains constant and narrowly distributed through S phase. However, multi-scale analysis of a genome-wide replication timing profile shows a broad distribution of replication timing gradients with practically no regions larger than 100 kb replicating at less than 2 kb/min. Therefore, HeLa cells lack large regions of unidirectional fork progression. Temporal transition regions are replicated by sequential activation of origins at a rate that increases during S phase and replication timing gradients are set by the delay and the spacing between successive origin firings rather than by the velocity of single forks. Activation of internal origins in a specific temporal transition region is directly demonstrated by DNA combing of the IGH locus in HeLa cells. Analysis of published origin maps in HeLa cells and published replication timing and DNA combing data in several other cell types corroborates these findings, with the interesting exception of embryonic stem cells where regions of unidirectional fork progression seem more abundant. These results can be explained if origins fire independently of each other but under the control of long-range chromatin structure, or if replication forks progressing from early origins stimulate initiation in nearby unreplicated DNA. These findings shed new light on the replication timing program of mammalian genomes and provide a general model for their replication kinetics.
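
The comparison between timing gradients and fork velocity can be illustrated with a short sketch that converts a replication timing profile into apparent replication speeds, taken here as the distance between consecutive loci divided by the difference in their replication times. The function name `apparent_speeds`, the finite-difference scheme, and the input values in the usage note are hypothetical; the study itself relies on a multi-scale analysis that this sketch does not reproduce.

```python
def apparent_speeds(positions_kb, times_min):
    """Apparent replication speed (kb/min) between consecutive loci.

    positions_kb: genomic coordinates in kb, in increasing order.
    times_min: replication time of each locus, in minutes.
    """
    speeds = []
    for i in range(1, len(positions_kb)):
        dx = positions_kb[i] - positions_kb[i - 1]
        dt = times_min[i] - times_min[i - 1]
        # A flat timing profile corresponds to an unbounded apparent speed.
        speeds.append(float("inf") if dt == 0 else abs(dx / dt))
    return speeds
```

For instance, hypothetical loci at 0, 100 and 200 kb replicated at 0, 20 and 70 min give apparent speeds of 5.0 and 2.0 kb/min. An apparent speed well above the fork velocity measured by DNA combing cannot be produced by a single fork travelling across the region, which is the reasoning behind the conclusion that timing gradients reflect successive origin firings.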

Source
http://dx.doi.org/10.1371/journal.pcbi.1002322
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248390
December 2011

Tight protein-DNA interactions favor gene silencing.

Genes Dev 2011 Jul;25(13):1365-70

Institut Curie, Paris, F-75248 France.

The heterochromatin-like structure formed by the yeast silent information regulator complex (SIR) represses transcription at the silent mating type loci and telomeres. Here, we report that tight protein-DNA complexes induce ectopic recruitment of the SIR complex, promoting gene silencing and changes in subnuclear localization when cis-acting elements are nearby. Importantly, lack of the replication fork-associated helicase Rrm3 enhances this induced gene repression. Additionally, Sir3 and Sir4 are enriched genome-wide at natural replication pause sites, including tRNA genes. Consistently, inserting a tRNA gene promotes SIR-mediated silencing of a nearby gene. These results reveal that replication stress arising from tight DNA-protein interactions favors heterochromatin formation.

Source
http://dx.doi.org/10.1101/gad.611011
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3134080
July 2011

Replication-associated mutational asymmetry in the human genome.

Mol Biol Evol 2011 Aug 2;28(8):2327-37. Epub 2011 Mar 2.

Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique (CNRS), Gif-sur-Yvette, France.

During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands, a difference that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date, the existence of such asymmetries associated with replication has not yet been established. Here, we compute the nucleotide substitution matrices around replication initiation zones identified as sharp peaks in replication timing profiles and associated with abrupt jumps in the compositional skew profile. We show that the substitution matrices computed in these regions fully explain the jumps in the compositional skew profile when crossing initiation zones. In intergenic regions, we observe mutational asymmetries measured as differences between complementary substitution rates; their sign changes when crossing initiation zones. These mutational asymmetries are unlikely to result from cryptic transcription but can be explained by a model based on replication errors and strand-biased repair. In transcribed regions, mutational asymmetries associated with replication superimpose on the previously described mutational asymmetries associated with transcription. We separate the substitution asymmetries associated with both mechanisms, which allows us to determine, for the first time in eukaryotes, the mutational asymmetries associated with replication and to reevaluate those associated with transcription. Replication-associated mutational asymmetry may result from unequal rates of complementary base misincorporation by the DNA polymerases coupled with DNA mismatch repair (MMR) acting with different efficiencies on the leading and lagging strands. Replication, acting in germ line cells during long evolutionary times, contributed equally with transcription to produce the present abrupt jumps in the compositional skew. These results demonstrate that DNA replication is one of the major processes that shape human genome composition.
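
The measure described above, a difference between complementary substitution rates, can be sketched as follows. The function name `mutational_asymmetries`, the dictionary layout, and the rate values in the test are hypothetical and serve only to illustrate the arithmetic; they are not data or code from the study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutational_asymmetries(rates):
    """Map each substitution X->Y to rate(X->Y) - rate(comp(X)->comp(Y)).

    rates: dict keyed by (source_base, target_base), read on one strand.
    """
    asymmetries = {}
    for (src, dst), rate in rates.items():
        partner = (COMPLEMENT[src], COMPLEMENT[dst])
        # Report each complementary pair once; skip if the partner is missing.
        if partner in asymmetries or partner not in rates:
            continue
        asymmetries[(src, dst)] = rate - rates[partner]
    return asymmetries
```

Under this convention a value of zero means the substitution and its complement occur at the same rate on the reference strand. Computing the same quantities on either side of an initiation zone is one way to look for the sign change reported in the abstract.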

Source
http://dx.doi.org/10.1093/molbev/msr056
August 2011

Evolution of Hox gene clusters in gnathostomes: insights from a survey of a shark (Scyliorhinus canicula) transcriptome.

Mol Biol Evol 2010 Dec 8;27(12):2829-38. Epub 2010 Jul 8.

Laboratoire Evolution, Génomes et Spéciation, UPR 9034 Centre National de la Recherche Scientifique and Université Paris Diderot-Paris 7, 91198 Gif sur Yvette, France.

It is now well established that there were four Hox gene clusters in the genome of the last common ancestor of extant gnathostomes. To better understand the evolution of the organization and expression of these genomic regions, we have studied the Hox gene clusters of a shark (Scyliorhinus canicula). We sequenced 225,580 expressed sequence tags from several embryonic cDNA libraries. Blast searches identified corresponding transcripts to almost all the HoxA, HoxB, and HoxD cluster genes. No HoxC transcript was identified, suggesting that this cluster is absent or highly degenerate. Using Hox gene sequences as probes, we selected and sequenced seven clones from a bacterial artificial chromosome library covering the complete region of the three gene clusters. Mapping of cDNAs to these genomic sequences showed extensive alternative splicing and untranslated exon sharing between neighboring Hox genes. Homologous noncoding exons could not be identified in transcripts from other species using sequence similarity. However, by comparing conserved noncoding sequences upstream of these exons in different species, we were able to identify homology between some exons. Some alternative splicing variants are probably very ancient and were already coded for by the ancestral Hox gene cluster. We also identified several transcripts that do not code for Hox proteins, are probably not translated, and all but one are in the reverse orientation to the Hox genes. This survey of the transcriptome of the Hox gene clusters of a shark shows that the high complexity observed in mammals is a gnathostome ancestral feature.

Source
http://dx.doi.org/10.1093/molbev/msq172
December 2010

Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes.

Genome Res 2010 Apr 26;20(4):447-57. Epub 2010 Jan 26.

Centre de Génétique Moléculaire, Allée de la Terrasse, 91198 Gif-sur-Yvette, France.

Neutral nucleotide substitutions occur at varying rates along genomes, and it remains a major issue to unravel the mechanisms that cause these variations and to analyze their evolutionary consequences. Here, we study the role of replication in the neutral substitution pattern. We obtained a high-resolution replication timing profile of the whole human genome by massively parallel sequencing of nascent BrdU-labeled replicating DNA. These data were compared to the neutral substitution rates along the human genome, obtained by aligning human and chimpanzee genomes using macaque and orangutan as outgroups. All substitution rates increase monotonically with replication timing even after controlling for local or regional nucleotide composition, crossover rate, distance to telomeres, and chromatin compaction. The increase in non-CpG substitution rates might result from several mechanisms including the increase in mutation-prone activities or the decrease in efficiency of DNA repair during the S phase. In contrast, the rate of C → T transitions in CpG dinucleotides increases in later-replicating regions due to increasing DNA methylation level that reflects a negative correlation between timing and gene expression. Similar results are observed in the mouse, which indicates that replication timing is a main factor affecting nucleotide substitution dynamics at non-CpG sites and constitutes a major neutral process driving mammalian genome evolution.

Source
http://dx.doi.org/10.1101/gr.098947.109
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847748
April 2010

A novel strategy of transcription regulation by intragenic nucleosome ordering.

Genome Res 2010 Jan 26;20(1):59-67. Epub 2009 Oct 26.

Université de Lyon, Lyon, France.

Numerous studies of chromatin structure showed that nucleosome-free regions (NFRs) located at 5' gene ends contribute to transcription initiation regulation. Here, we determine the role of intragenic chromatin structure in gene expression regulation. We show that, along Saccharomyces cerevisiae genes, nucleosomes are highly organized following two types of architecture that depend only on the distance between the NFRs located at the 5' and 3' gene ends. In the first type, this distance constrains in vivo the positioning of n nucleosomes regularly organized in a "crystal-like" array. In the second type, this distance is such that the corresponding genes can accommodate either n or (n + 1) nucleosomes, thereby displaying two possible crystal-like arrays of n weakly compacted or n + 1 highly compacted nucleosomes. This adaptability confers "bi-stable" properties to chromatin and is a key to its dynamics. Compared to crystal-like genes, bi-stable genes present higher transcriptional plasticity, higher sensitivity to chromatin regulators, higher H3 turnover rate, and lower H2A.Z enrichment. The results strongly suggest that transcription elongation is facilitated by higher chromatin compaction. The data allow us to propose a new paradigm of transcriptional control mediated by the stability and the level of compaction of the intragenic chromatin architecture and open new ways for investigating eukaryotic gene expression regulation.

Source
http://dx.doi.org/10.1101/gr.096644.109
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2798831
January 2010

Open chromatin encoded in DNA sequence is the signature of 'master' replication origins in human cells.

Nucleic Acids Res 2009 Oct 10;37(18):6064-75. Epub 2009 Aug 10.

Université de Lyon, F-69000 Lyon, France.

For years, progress in elucidating the mechanisms underlying replication initiation and its coupling to transcriptional activities and to local chromatin structure has been hampered by the small number (approximately 30) of well-established origins in the human genome and more generally in mammalian genomes. Recent in silico studies of compositional strand asymmetries revealed a high level of organization of human genes around 1000 putative replication origins. Here, by comparing with recently experimentally identified replication origins, we provide further support that these putative origins are active in vivo. We show that regions approximately 300-kb wide surrounding most of these putative replication origins that replicate early in the S phase are hypersensitive to DNase I cleavage, hypomethylated and present a significant enrichment in genomic energy barriers that impair nucleosome formation (nucleosome-free regions). This suggests that these putative replication origins are specified by an open chromatin structure favored by the DNA sequence. We discuss how this distinctive attribute makes these origins, further qualified as 'master' replication origins, privileged loci for future research to decipher the human spatio-temporal replication program. Finally, we argue that these 'master' origins are likely to play a key role in genome dynamics during evolution and in pathological situations.

Source
http://dx.doi.org/10.1093/nar/gkp631
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2764438
October 2009

Novel long non-protein coding RNAs involved in Arabidopsis differentiation and stress responses.

Genome Res 2009 Jan 7;19(1):57-69. Epub 2008 Nov 7.

Institut des Sciences du Végétal (ISV), CNRS, 91198 Gif-sur-Yvette, France.

Long non-protein coding RNAs (npcRNAs) represent an emerging class of riboregulators, which either act directly in this long form or are processed to shorter miRNA and siRNA. Genome-wide bioinformatic analysis of full-length cDNA databases identified 76 Arabidopsis npcRNAs. Fourteen npcRNAs were antisense to protein-coding mRNAs, suggesting cis-regulatory roles. Numerous 24-nt siRNAs matched five different npcRNAs, suggesting that these npcRNAs are precursors of this type of siRNA. Expression analyses of the 76 npcRNAs identified a novel npcRNA that accumulates in a dcl1 mutant but does not appear to produce trans-acting siRNA or miRNA. Additionally, another npcRNA was the precursor of miR869 and was shown to be up-regulated in dcl4 but not in dcl1 mutants, indicative of a young miRNA gene. Abiotic stress altered the accumulation of 22 npcRNAs among the 76, a fraction significantly higher than that observed for the RNA binding protein-coding fraction of the transcriptome. Overexpression analyses in Arabidopsis identified two npcRNAs as regulators of root growth during salt stress and leaf morphology, respectively. Hence, together with small RNAs, long npcRNAs encompass a sensitive component of the transcriptome that has diverse roles during growth and differentiation.

Source
http://dx.doi.org/10.1101/gr.080275.108
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2612962
January 2009