Naturally occurring polymorphisms can affect gene expression and thereby underlie human diseases, such as cancer and diabetes. A deeper understanding of cis-regulatory variation will also facilitate the design of better algorithms for predicting cis-regulatory sites and help us elucidate the impact of changes in gene regulation on speciation and phenotypic evolution. Previously, by analyzing polymorphisms in human microRNA binding sites, we identified several candidate causal variants of human disease and a set of human microRNA binding sites not conserved in other mammals. Here we propose to extend this work to cis-regulatory sites that mediate transcriptional and post-transcriptional control. We will use the yeast S. cerevisiae as a model system because it offers experimental tractability, a well-studied regulatory network and multiple fully sequenced strains. Our specific aims are to: (1) extend our techniques to accommodate degenerate motifs and insertion/deletion polymorphisms, and use them to predict polymorphisms in yeast transcription factor binding sites; (2) experimentally validate our predicted cis-regulatory polymorphisms using quantitative PCR and pyrosequencing; (3) design an Expectation-Maximization algorithm to predict cis-regulatory sites from both polymorphism and divergence data, and use it to predict cis-regulatory sites mediating yeast transcriptional and post-transcriptional regulation; and (4) extend statistical models of transcription to include post-transcriptional regulation, and use them to improve our predictions of cis-regulatory polymorphisms. The P.I.'s long-term goal is to use computational and experimental approaches to identify cis-regulatory variants between different human populations and pathological conditions (e.g., cancers).
Since the P.I.'s training is in computational biology, the main impact of this award would be to provide training in a unified computational-experimental approach with Nikolaus Rajewsky, a bioinformatician, and Mark Siegal, an experimentalist.

Relevance: Many human diseases, such as cancer and diabetes, are partly caused by the aberrant regulation of specific genes. Identifying the genetic mutations responsible for the changes in control of these genes is the first step towards diagnosing and ultimately curing these diseases. The aim of this project is to develop and validate computational methods that can eventually be used to create a comprehensive catalogue of gene regulatory variation in the human genome.