The interactions between transcription factors (TFs) and their DNA binding sites are an integral part of the regulatory networks within cells. These interactions control critical steps in development and responses to environmental stresses, and in humans their dysfunction can contribute to the progression of various diseases. Thus far, only a handful of sequence-specific TFs have been characterized well enough for us to know all the sequences that they can and, just as importantly, cannot bind. The sparseness of these binding site sequence data is highly problematic, because the sparse datasets are then used to search for genomic occurrences of these sites, and many false positive and false negative binding sites are predicted as a result. Ultimately, what the biological community needs is much more complete TF binding site data covering all possible DNA sequence variants. These data will allow us to improve the accuracy with which we can predict functional cis regulatory elements within genomic sequence. We have calculated what we believe is a maximally compact representation of all possible binding sites that still allows the sequence specificities of DNA binding molecules to be recovered. The advantage of this technology is that all possible DNA sequence variants can be represented on DNA microarrays in a space- and cost-efficient manner, so that only a minimal number of individual DNA sequences and individual DNA spots need to be synthesized.
In this project, we will: (1) develop the use of compact combinatorial DNA microarrays in protein binding microarray (PBM) experiments for identifying all possible DNA binding sites of sequence-specific TFs; (2) determine the binding affinities of all possible DNA binding sites for ~15 Saccharomyces cerevisiae TFs using compact combinatorial DNA microarrays and create a database of these data; and (3) evaluate the utility of complete binding specificity data and binding affinity data for improved prediction of in vivo TF binding sites. No other technology for determining the relative binding affinities of all candidate DNA binding sites for TFs is as high-throughput as the compact combinatorial DNA microarray PBM technology. These studies should permit a better understanding of the importance of the binding affinities of TF binding sites in eukaryotic genomes. Such data may also increase the accuracy with which cis regulatory modules can be predicted in higher eukaryotic genomes.
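
The idea of compactly representing all possible binding site variants can be illustrated with a small sketch. The abstract does not specify which construction underlies the compact combinatorial design, so the example below assumes one standard option, a de Bruijn sequence, in which every DNA k-mer occurs exactly once in overlapping fashion. The function names `de_bruijn` and `linear_cover` are hypothetical and chosen only for illustration; practical array design would also need to consider reverse complements, gapped sites, and splitting the sequence across spots, none of which is handled here.

```python
from itertools import product


def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Uses the standard recursive construction that concatenates Lyndon
    words whose lengths divide k. This is an illustrative assumption,
    not necessarily the design used for the arrays described above.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def linear_cover(k: int) -> str:
    """Linearize the cyclic sequence by appending its first k-1 bases,
    so that every k-mer appears as a contiguous window."""
    cyclic = de_bruijn("ACGT", k)
    return cyclic + cyclic[:k - 1]


if __name__ == "__main__":
    k = 3
    s = linear_cover(k)
    windows = {s[i:i + k] for i in range(len(s) - k + 1)}
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    # Every possible k-mer is covered by a sequence of length 4**k + k - 1.
    print(len(s), windows == all_kmers)
```

Under this assumption, all 4^k variants of a k-base site are covered by 4^k + k - 1 bases in total, rather than by 4^k separate k-base sequences, which conveys why overlapping designs reduce the number of sequences and spots that must be synthesized.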