The human genome contains over 700 genes encoding proteins with zinc finger (ZF) domains, more than half of which contain eight or more fingers organized in tandem. Many of the encoded proteins function as transcription factors, insulator binding proteins, or chromatin modifiers. Despite their importance, we still lack comprehensive knowledge of the rules that determine these proteins' binding to DNA, and the existing prediction programs do not perform satisfactorily. Recently, we have developed two new methods for isolation and deep sequencing of zinc finger protein binding sites. The first, Affinity-seq, directly determines the relative affinity of tens of thousands of binding sites genome-wide with high binding specificity. It also provides the opportunity for mutational analysis of binding site specificities using alternate sources of genomic DNA. The second, Spec-seq, determines the changes in binding energy for thousands of variants of a preferred sequence, as well as their sensitivity to DNA methylation. We propose to apply these methods to a comprehensive analysis of the DNA binding sites of over twenty natural mouse and human protein variants of the recombination regulator PRDM9, as well as over one hundred other human and mouse zinc finger proteins, which represent different groups of long zinc finger array proteins (laZFPs) and whose binding sites have not been determined previously. In Aim 1, we will determine the specificities of PRDM9 protein variants binding to DNA. Aim 1a will use Affinity-seq to determine how systematic changes in the contact amino acids of ZFs, the number of ZFs, and the interactions between ZFs in PRDM9 protein variants affect their DNA binding. Aim 1b will use Spec-seq to determine the quantitative specificity and sensitivity to DNA methylation of each PRDM9 protein variant. Aim 1c will use cell culture approaches to determine how conserved features of ZF arrays and combinations of motifs in the same array affect the biological activity of engineered PRDM9 protein variants.
In Aim 2, we will determine whether the DNA-binding specificities of different laZFP groups co-evolve with their additional domains. Aim 2a will use Affinity-seq to determine the commonality or uniqueness of the rules governing DNA binding of laZFPs belonging to the BTB-, SCAN-, SET-, and KRAB-containing groups, and of those without additional domains. Aim 2b will use Spec-seq to determine their quantitative specificity and sensitivity to CpG methylation (mCpG) status. In Aim 3, we will develop new and improved computational algorithms for binding site modeling and motif prediction based on laZFP sequences, including mCpG sensitivity. Aim 3a will develop enhanced specificity representations of ZFPs that take full advantage of the Spec-seq data and do not impose the positional independence inherent in position weight matrix (PWM) models. Aim 3b will develop improved motif prediction models that include methylation sensitivity.