The ability to control the activity level of different genes is key to fundamental biological processes such as development and differentiation, and many human diseases are caused by defects in this regulatory process. This regulation is encoded within specific regions of the genome, termed regulatory regions. Indeed, in many studies of cancer and of other diseases and human phenotypes, changes in gene activity that are tightly linked to the disease state have in turn been linked to changes in the DNA sequence of the genes' regulatory regions. However, we currently have a poor understanding of how gene activity is encoded by DNA sequence, and thus we do not understand the mechanism by which these disease-linked sequence changes cause the observed changes in gene activity. Given the many studies of gene regulation that have been carried out, it is surprising how little we know about this mapping between DNA sequence and gene activity. In principle, such questions can be answered directly through accurate measurements of the activity of regulatory regions in which various sequence elements are varied systematically. However, such data do not currently exist, most likely due to the technical difficulties in constructing such sequences and accurately measuring their activity. Here, we aim to derive a mechanistic understanding of how gene activity patterns are encoded in DNA sequence, and to arrive at a quantitative model that describes the entire process: from the activity of the regulating proteins, termed transcription factors, to their binding to regulatory regions, through the important role of DNA packaging in this process, and up to the resulting gene activity patterns. A systematic study of such interactions requires the ability to efficiently synthesize many different regulatory sequences and accurately measure their activity.
We have recently developed such capabilities, which we will utilize in this project. Specifically, we will design regulatory sequences that systematically test the quantitative contribution of various types of sequence elements to gene activity, measure their activity, and integrate the resulting data into a unified model of gene regulation. We will then use this model to examine how such regulatory sequence elements are used in native promoters to achieve biologically meaningful activity patterns, and how changes in these sequence elements during evolution contribute to evolutionary changes in gene activity. Finally, we will apply the model to predict gene activity changes among human individuals, using the genotype data that are rapidly being collected. If successful, our project should have far-reaching implications. Most notably, since changes in gene activity levels play a key role in the development of cancer and of many other diseases, even a partial ability to predict gene activity changes among human individuals from their genotype information could have important medical implications.

PUBLIC HEALTH RELEVANCE: The ability to control the activity level of different genes is key to fundamental biological processes such as development and differentiation, and many human diseases are caused by defects in this regulatory process. This proposal aims to unravel the rules by which this control is encoded in the language of DNA sequence, and to arrive at a quantitative model that can be used to predict changes in gene activity levels across human individuals, based on the genotype data that are rapidly being collected for them. Since changes in gene activity levels play a key role in the development of cancer and of many other diseases, such a predictive ability could have important medical implications.