As we approach a time when human genome sequencing will be routine and widely available, we remain surprisingly ignorant about the functional consequences of sequence variation. Although imperfect, methods that draw on our extensive knowledge of protein structure, function and evolution can identify those polymorphisms in protein-coding genes that are most likely to affect their function. However, our poor understanding of the relationship between DNA sequence and function in non-coding DNA, where the overwhelming majority of polymorphisms are found, makes it essentially impossible to predict which polymorphisms will have biochemical, cellular or organismal consequences. My research is focused on understanding the relationship between sequence variation and function in the non-coding sequences that control gene expression. I use as a model early anterior-posterior patterning in Drosophila melanogaster and its relatives. Gene regulation in the early Drosophila embryo is a superb model for such studies, as there is a long history of genetic and biochemical dissection of the system, and it has been the target of extensive experimental genomic studies in the past few years. In Aim 1, I proposed to use 37 currently available genomes of genetically distinct D. melanogaster lines to study patterns of selection on sequences involved in gene regulation. Completed population genetic analyses of these genomic data suggest that the selective forces that have acted on the transcription factor binding sites that regulate anterior-posterior patterning do not differ significantly from those that have acted on the rest of the genome. In Aim 2, I will extend these studies to include data from experimental surveys of transcription factor binding in embryos from all of these fully sequenced lines (these data are currently being generated in my thesis lab).
My goal is to relate observed variation (or lack of variation) in transcription factor binding with measures of selection on transcription factor binding sites, and to subsequently develop models that can predict the consequences of variation in binding sites. Although these studies will be carried out in a model organism, there is extensive conservation in the basic mechanisms of gene regulation between flies and humans, and I expect that the lessons learned in Drosophila will be of immediate use in both identifying and understanding human non-coding sequence variants linked to disease and other clinically relevant phenotypes.

PUBLIC HEALTH RELEVANCE: As we approach a time when human genome sequencing will be routine and widely available, we remain surprisingly ignorant about the functional consequences of sequence variation, particularly in non-coding DNA where the overwhelming majority of polymorphisms are found. The model of early anterior-posterior patterning in Drosophila melanogaster provides thousands of functional non-coding sequences in which we can assay the impact of naturally occurring sequence variations. The lessons learned in Drosophila will be of immediate use in both identifying and understanding human non-coding sequence variants linked to disease and other clinically relevant phenotypes.