Sequence-specific transcription factors (TFs) regulate gene expression through their interactions with DNA sequences in the genome. The overarching goal of this project is to continue developing approaches and data sets for understanding the DNA binding specificities of TFs, and to understand the effects of coding polymorphisms within TFs' DNA binding domains on their DNA binding preferences. Identification of TFs' DNA binding specificities is important for understanding transcriptional regulatory networks, in particular for the prediction of cis regulatory modules, inference of cis regulatory codes, and interpretation of in vivo TF binding data and gene expression data. Identification of the DNA binding effects of polymorphic TF variants will be essential in studies aimed at understanding the gene regulatory effects resulting from natural genetic variation. This project will focus on human TFs associated with diseases, in particular Crohn's disease, ulcerative colitis, diabetes, autism, and other neuropsychiatric disorders. Our results will provide data that will likely be of importance to other systems; more generally, our data, approaches, technologies, and database will be useful not only for human TFs but also for model organism studies.
Specifically, we will: (1) develop systematic approaches for extracting fine-level features of TFs' DNA binding specificities when comparing very similar proteins; (2) generate clone resources and proteins for ~200 total human proteins (reference and polymorphic variants) with disease/trait associations; (3) determine the DNA binding specificities of the ~200 total reference and polymorphic TFs, and identify the effects of coding polymorphisms within TFs on their DNA binding specificities; (4) develop high-throughput, array-based technology for determining the effects of CpG DNA methylation on TF-DNA binding; and (5) maintain the UniPROBE database, which hosts high-resolution, universal protein binding microarray data on TFs' DNA binding specificities.