BMC Genomics 2012 Dec 18;13:708. Epub 2012 Dec 18.
International Centre for Genetic Engineering and Biotechnology (ICGEB), Padriciano 99, I 34149, Trieste, Italy.
Background: In higher eukaryotes, gene expression is regulated at different levels. In particular, 3'UTRs play a central role in the translation, stability and subcellular localization of transcripts. In recent years, the development of high-throughput sequencing techniques has facilitated the acquisition of transcriptional data at a genome-wide level. However, annotation of the 3' ends of genes is still incomplete, which limits the interpretation of the data generated. For example, we have previously reported two genes, ADD2 and CPEB3, with conserved alternative 3'UTR isoforms that are not annotated in the current versions of the Ensembl and RefSeq human databases.
Results: To evaluate whether other conserved 3' ends are missing from these databases, we used comparative genomics and transcriptomics across several vertebrate species. In general, we observed that 3'UTR conservation is lost beyond the end of the mature transcript. Using this change in conservation before and after the 3' end of mature transcripts, we showed that many conserved ends are still not annotated. In addition, we used orthologous transcripts to predict 3'UTR extensions and validated these predictions using total RNA sequencing data. Finally, we used this method to identify unannotated 3' ends in rats and dogs. As a result, we report several hundred novel 3'UTR extensions in rats and a few thousand in dogs.
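The idea of comparing conservation before and after an annotated 3' end can be illustrated with a minimal sketch. It is not the pipeline used in this study: the per-base conservation scores, the window size, the threshold and the function names `conservation_drop` and `looks_extended` are all assumptions chosen for illustration.

```python
from statistics import mean


def conservation_drop(scores, end_index, window=100):
    """Mean conservation upstream of a 3' end minus mean conservation downstream.

    scores    : per-base conservation values along the transcript strand
                (hypothetical input; any 0-1 score would do)
    end_index : position of the annotated 3' end within `scores`
    window    : number of bases compared on each side (assumed value)
    """
    upstream = scores[max(0, end_index - window):end_index]
    downstream = scores[end_index:end_index + window]
    if not upstream or not downstream:
        raise ValueError("end_index leaves an empty window")
    return mean(upstream) - mean(downstream)


def looks_extended(scores, end_index, window=100, min_drop=0.2):
    """Flag an annotated end where conservation does NOT fall off.

    A small drop suggests the conserved region continues past the annotated
    end, so the real 3' end may lie further downstream. The threshold
    `min_drop` is an arbitrary illustrative value.
    """
    return conservation_drop(scores, end_index, window) < min_drop


# Toy profile: 300 conserved bases followed by 200 poorly conserved bases.
scores = [0.9] * 300 + [0.1] * 200

# An end annotated at position 200 sits inside the conserved block,
# so it is flagged as a candidate extension.
print(looks_extended(scores, 200))

# An end annotated at position 300 coincides with the loss of conservation.
print(looks_extended(scores, 300))
```

In this toy profile the end at position 200 is flagged and the end at position 300 is not. A real analysis would also need to handle strand orientation, alignment gaps and neighbouring genes, which this sketch ignores.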
Conclusions: The methods presented here can facilitate the efficient identification of not-yet-annotated conserved 3'UTR extensions. Their application will increase confidence in orthologous gene models across vertebrates.