Hum Mol Genet 2013 Sep 2;22(17):3449-59. Epub 2013 May 2.
Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow 127994, Russia.
Proper splicing is often crucial for gene function, and its disruption may be strongly deleterious. Nevertheless, even the canonical dinucleotides of splice sites, which are essential for splicing, are often polymorphic. Here, we use data from the 1000 Genomes Project to study single-nucleotide polymorphisms (SNPs) in the canonical dinucleotides. Splice sites carrying SNPs are enriched in weakly expressed genes and in rarely used alternative splice sites. Genes with disrupted splice sites tend to have low selective constraint, and the splice sites disrupted by SNPs are less likely to be conserved in mouse. Furthermore, SNPs are enriched in splice sites whose effects on gene function are minor: splice sites located outside of protein-coding regions, in shorter exons, closer to the 3'-ends of proteins, and outside of functional protein domains. Most of these effects are more pronounced for high-frequency SNPs. Despite these trends, many of the polymorphic sites may still substantially affect the function of the corresponding genes. A number of the observed splice site-disrupting SNPs, including several high-frequency ones, were found among mutations described in OMIM.
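
To illustrate what it means for a SNP to fall within a canonical dinucleotide, the following minimal Python sketch checks a single substitution against the GT (donor) and AG (acceptor) dinucleotides at the ends of an intron. It is not the authors' pipeline. The function names, the restriction to GT-AG introns, and the convention that the intron sequence is given 5'-to-3' on the transcribed strand are all assumptions made for illustration.

```python
from typing import Optional

# Assumption: only the major GT-AG intron class is considered here.
DONOR = "GT"      # first two intronic bases, 5' end of the intron
ACCEPTOR = "AG"   # last two intronic bases, 3' end of the intron


def canonical_site_at(intron_seq: str, offset: int) -> Optional[str]:
    """Return 'donor' or 'acceptor' if the 0-based offset lies in a
    canonical dinucleotide of the intron, otherwise None."""
    if len(intron_seq) < 4:
        raise ValueError("intron too short to hold both dinucleotides")
    if not 0 <= offset < len(intron_seq):
        raise ValueError("offset lies outside the intron")
    if offset < 2:
        return "donor"
    if offset >= len(intron_seq) - 2:
        return "acceptor"
    return None


def disrupts_canonical_site(intron_seq: str, offset: int, alt_base: str) -> bool:
    """Return True if substituting alt_base at offset turns an intact
    GT...AG intron into one lacking a canonical dinucleotide."""
    site = canonical_site_at(intron_seq, offset)
    if site is None:
        return False  # the SNP is outside both dinucleotides
    seq = intron_seq.upper()
    if seq[:2] != DONOR or seq[-2:] != ACCEPTOR:
        return False  # the reference intron is not canonical to begin with
    mutated = seq[:offset] + alt_base.upper() + seq[offset + 1:]
    return mutated[:2] != DONOR or mutated[-2:] != ACCEPTOR
```

For a hypothetical intron `GTAAACAG`, a G-to-A change at offset 0 would be reported as disrupting the donor site, whereas a change at offset 3 would not be counted. A real analysis would also need to map genomic coordinates and strand onto intron offsets, which this sketch leaves out.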