PROJECT SUMMARY

It has long been known that methylation of genomic DNA influences gene expression. The underlying structural mechanisms, however, remain largely obscure. In this project, we will pursue a new strategy for predicting how methylation affects transcription factor (TF) binding, thereby influencing the intricate genome-wide landscape of local chromatin structure and gene expression that characterizes each cell. We will explore the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. Motivation comes from our recent analysis of the intrinsic specificity of the endonuclease DNase I. We found that cytosine methylation greatly increases the rate at which DNase I cleaves the DNA backbone adjacent to CpG dinucleotides. The explanation is that adding a methyl group in the major groove causes changes in DNA shape that locally narrow the minor groove and enhance the electrostatic interaction between negative backbone phosphates of the DNA and positive amino-acid residues of DNase I. Recognition of DNA shape via the minor groove can also contribute to the binding specificity of eukaryotic TFs, suggesting that methylation sensitivity can be predicted from a shape-based analysis of TF binding preferences among unmethylated DNA sequences, for which ample high-throughput in vitro binding data are available. To explore this, we will first develop and fit models of TF binding specificity that integrate DNA base and shape readout by extending the biophysical model underlying our FeatureREDUCE algorithm to include information about DNA shape from computer simulations of free DNA molecules. Next, we will use these integrated base/shape recognition models to make predictions regarding the methylation sensitivity of TFs, and validate these predictions experimentally.
In a parallel approach, we will extend our recently developed SELEX-seq method by using barcoded mixtures of methylated and unmethylated DNA ligands to create detailed maps of the effect of methylation on binding affinity for a representative set of TFs. Finally, we will analyze how the binding specificity of a TF depends on its amino-acid sequence using family-level modeling. Using biophysical base and shape recognition parameters estimated for a large number of TFs from the same structural TF family, along with a novel geometric representation of base preference, we will predict how the binding specificity of basic helix-loop-helix (bHLH) and basic leucine zipper (bZIP) proteins changes when amino-acid residues are mutated, and experimentally validate these predictions. We will use the same family-based approach to demonstrate the existence of alternative dimeric binding modes for bHLH factors, and investigate whether the propensity of a TF to use these alternative modes can be predicted from its protein sequence.