The genetic code not only determines the amino acid sequence of proteins but also defines the 'splicing code' of cis- and trans-acting regulatory elements that control pre-mRNA splicing. Single nucleotide variants (SNVs) at key regions in pre-mRNA may disrupt splicing, resulting in disease [1, 2]. Distinguishing SNVs that cause aberrant splicing from those that are benign is important for understanding disease pathogenesis. SNVs at consensus splice sites, located at exon-intron junctions, are known to cause aberrant splicing and contribute to at least 10% of inherited diseases [2]. However, SNVs outside consensus splice sites can also disrupt splicing [3]. Current bioinformatics tools limit analysis to SNVs at or near consensus splice sites and do not generalize to SNVs beyond those sites [4-7]. In this application, I propose to substantially improve the ability to interpret the consequences of mutations on pre-mRNA splicing. This goal will be achieved by: 1) developing novel features useful in predicting the impact of variation on cis-acting splicing regulation; 2) training a supervised machine learning algorithm that uses the novel features to predict the impact of SNVs; 3) sharing the algorithm in a publicly available software package; and 4) comparing algorithm predictions to the relationships between SNVs and splicing patterns derived from matched DNA- and RNA-sequencing studies.