Naturally occurring polymorphisms can affect gene expression and thereby influence human diseases, such as cancer and diabetes. A deeper understanding of cis-regulatory variation will also facilitate the design of better algorithms for predicting cis-regulatory sites and help us elucidate the impact of changes in gene regulation on speciation and phenotypic evolution. Previously, by analyzing polymorphisms in human microRNA binding sites, we identified several candidate causal variants of human disease and a set of human microRNA binding sites not conserved in other mammals. Here we propose to extend this work to transcription factor binding sites. We will use the yeast S. cerevisiae as a model system because it offers experimental tractability, a well-studied gene regulatory network, and multiple fully sequenced strains. Our specific aims are to (1) extend our previous techniques to accommodate degenerate motifs and insertion/deletion polymorphisms, and use the new techniques to study the function and evolution of computationally predicted yeast transcription factor binding sites; (2) fit a statistical model of transcriptional regulation to publicly available microarray data for a set of 112 segregants from an experimental cross between two S. cerevisiae strains, and use the model to predict polymorphisms that significantly affect gene expression; and (3) experimentally validate the candidate polymorphisms using site-directed mutagenesis and quantitative PCR. The applicant's long-term goal is to use computational and experimental approaches to identify cis-regulatory variants that differ between human populations and between pathological conditions (e.g., cancers). Since the applicant's training is in computational biology, the main impact of this award would be to provide training in a unified computational-experimental approach with Mark Siegal, an experimentalist, and Nikolaus Rajewsky, a bioinformatician.
Relevance: Many human diseases, such as cancer and diabetes, are caused in part by the aberrant regulation of specific genes. Identifying the genetic mutations responsible for the changes in control of these genes is the first step towards diagnosing and ultimately curing these diseases. The long-term aim of this project is to develop and validate computational methods that can be used to compile a comprehensive catalogue of gene regulatory variation in the human genome.