The objective of this project is to facilitate the interpretation of genetic variants identified in clinical whole-genome and whole-exome sequencing studies by developing computational methods that predict the functional effects of individual variants. Genome-scale sequencing technologies increasingly enable studies of genetic variation in large numbers of individuals; however, interpreting the clinical significance of the hundreds of thousands of genetic variants identified in these studies remains a critical challenge. Regulation of gene expression is one mechanism by which variants can result in disease or other clinically significant phenotypes. This mechanism is likely to be particularly important for disease-causing variants that do not directly affect protein structure by altering an amino acid sequence. The methods developed here will enable researchers to predict whether genetic variants are likely to have a regulatory effect on gene expression. The first stage of the project is to build computational models that predict such regulatory effects using a random forest machine learning approach. These models will be trained to recognize regulatory variation using a set of variants shown to be involved in expression regulation in a recent study of gene expression across hundreds of individuals. Separate models will be developed to predict two different types of regulatory effects: changes in the total amount of RNA produced from a particular gene (expression level variation) and changes in the specific form of RNA produced from a particular gene (splicing or isoform ratio variation). The second stage of the project is to evaluate the performance of these models on gene expression datasets from a separate human population and from different tissues within the human body, both to explore their generalizability and to determine to what extent the characteristics of regulatory variants are conserved across tissues and populations.
The final stage of the project is to use known pathogenic variants from publicly available databases to characterize how well these models predict clinical significance. This stage will test the hypothesis that variants that regulate gene expression are more likely to be clinically significant than variants that do not. This project will impact public health by providing tools that improve prediction of the clinical significance of genetic variants identified in genome-scale sequencing studies. In addition, the project will provide biological insight into the tissue-specificity and population-specificity of the genomic features that characterize regulatory genetic variants.
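As an illustration only, the first two stages (training a random forest on variants labeled as regulatory or not, then scoring variants from a separate population) might be sketched as follows. The abstract does not name a software library, a feature set, or a dataset, so everything here is an assumption: scikit-learn is used for convenience, the feature names are hypothetical, and the data are simulated rather than drawn from any real expression study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical per-variant genomic features (not specified in the abstract).
FEATURES = ["dist_to_tss", "conservation", "in_open_chromatin", "motif_disruption"]


def simulate_variants(n, effect_scale, seed):
    """Simulate a feature matrix and regulatory/non-regulatory labels.

    Stand-in for a real training set of variants with known regulatory
    status; effect_scale weakens or strengthens the feature/label link
    so that a second "population" can differ from the training one.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    signal = effect_scale * (1.5 * X[:, 1] + 1.0 * X[:, 3] - 0.5 * X[:, 0])
    y = (signal + rng.normal(size=n) > 0).astype(int)
    return X, y


# Stage 1: train on variants from the discovery population.
X_train, y_train = simulate_variants(2000, effect_scale=1.0, seed=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Stage 2: score variants from a separate population and measure ranking quality.
X_other, y_other = simulate_variants(1000, effect_scale=0.7, seed=1)
scores = model.predict_proba(X_other)[:, 1]
auc_other = roc_auc_score(y_other, scores)

# Feature importances hint at which genomic features drive the predictions.
importances = dict(zip(FEATURES, model.feature_importances_))
print(f"AUC on second population: {auc_other:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

In this sketch, separate models for expression level variation and for splicing or isoform ratio variation would simply be two classifiers fit to different label sets, and comparing feature importances across tissues or populations is one possible way to examine how far the characteristics of regulatory variants are shared.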