Hum Mol Genet 2002 Feb;11(4):451-64
Advanced Computation and Modeling Center, University of Queensland, St Lucia, 4072, Australia.
By spliced alignment of human DNA and transcript sequence data, we constructed a data set of transcript-confirmed exons and introns from 2793 genes, 796 of which (28%) were seen to have multiple isoforms. We find that over one-third of human exons can translate in more than one frame, and that this is highly correlated with G+C content. Introns containing adenosine at donor site position +3 (A3), rather than guanosine (G3), are more common in low G+C regions, while the converse is true in high G+C regions. These two classes of introns are shown to have distinct lengths, consensus sequences and correlations among splice signals, leading to the hypothesis that A3 donor sites are associated with exon definition, and G3 donor sites with intron definition. Minor classes of introns, including GC-AG, U12-type GT-AG, weak, and putative AG-dependent introns, are identified and characterized. Cassette exons are more prevalent in low G+C regions, while exon isoforms are more prevalent in high G+C regions. Cassette exon events outnumber other alternative events, while exon isoform events involve truncation twice as often as extension, and occur at acceptor sites twice as often as at donor sites. Alternative splicing is usually associated with weak splice signals and, in a majority of cases, preserves the coding frame. The reported characteristics of constitutive and alternative splice signals, and the hypotheses offered regarding alternative splicing and genome organization, have important implications for experimental research into RNA processing. The 'AltExtron' data sets are available at http://www.bit.uq.edu.au/altExtron/ and http://www.ebi.ac.uk/~thanaraj/altExtron/.
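
As an illustration of the A3/G3 distinction and the frame-preservation criterion described above, the following is a minimal sketch and not the authors' pipeline. It assumes an intron is supplied as a plain 5'-to-3' string beginning at donor position +1, so that position +3 is the third intron base. It also assumes that an alternative event preserves the coding frame when the length difference it introduces is a multiple of three, which is a simplification. The function names `gc_content`, `donor_class` and `preserves_frame` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def donor_class(intron: str) -> str:
    """Classify an intron by the base at donor site position +3.

    Assumption: the string starts at intron position +1 (the G of a
    typical GT donor), so index 2 holds position +3.
    """
    intron = intron.upper().replace("U", "T")  # accept RNA or DNA input
    if len(intron) < 3:
        return "other"
    base = intron[2]
    if base == "A":
        return "A3"
    if base == "G":
        return "G3"
    return "other"


def preserves_frame(length_difference: int) -> bool:
    """Simplified check: an inserted or removed segment whose length is a
    multiple of three leaves the downstream reading frame unchanged."""
    return length_difference % 3 == 0


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not AltExtron data.
    for intron in ("GTAAGTATTT", "GTGAGCGGCC"):
        print(intron, donor_class(intron), round(gc_content(intron), 2))
    print(preserves_frame(90), preserves_frame(91))
```

Tallying `donor_class` against the `gc_content` of the surrounding region over a set of confirmed introns would give the kind of A3-versus-G3 comparison the abstract reports, although the window size and the definition of a region are choices this sketch leaves open.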