Project Summary
A major function of the non-protein-coding genome is to direct specific patterns of gene expression by encoding binding sites for transcription factors. However, large genomes contain millions of spurious copies of the short, degenerate sequence motifs that transcription factors recognize. It is not understood how genuinely functional binding sites are distinguished from non-functional motifs. This is a critical unsolved problem: growing evidence indicates that genetic variants that disrupt or create transcription factor binding sites play a widespread role in disease, but we currently cannot accurately identify the variants that affect active binding sites. To address this issue, our long-term goal is to understand how active transcription factor binding sites are specified by their DNA sequence features. Recent results suggest that functional binding sites are distinguished from spurious motifs by critical local sequences that flank the core motif. Using the mammalian retina as a physiologically relevant model system, we will investigate binding sites for the photoreceptor transcription factor CRX. Our specific aims are: First, to understand how flanking DNA sequence features distinguish transcriptionally active CRX binding sites from inactive genomic sequences with spurious CRX motifs. Second, to quantify and model the effects of local flanking sequence features using a tractable system of synthetic CRX binding sites. The major innovation of this proposal is to measure both transcription factor binding and cis-regulatory activity on a large set of wild-type, mutant, and synthetic sequences, and thereby directly quantify the relationship between flanking DNA sequence, binding, and activity.
Using recently developed high-throughput assays to measure cis-regulatory activity and CRX binding affinity, we will directly test the functional role of local flanking sequences by disrupting flanking sequence features of CRX binding sites. By combining functional genomics and synthetic biology to investigate natural and synthetic CRX binding sites, we will discover how different DNA sequence features combine to specify active CRX sites in the genome. The results will improve our understanding of how sequence variants outside core motifs affect transcription factor binding sites.