It has long been puzzling how a relatively small number of transcription factors (TFs) can precisely control the expression of ~21,000 ORFs and probably 10-fold more non-coding RNAs in humans. Another major gap in the transcription field is the lack of a simple set of rules that explains the specificity of protein-DNA interactions. These gaps represent a major problem because, until they are filled, understanding of the transcription circuitry and its underlying principles will remain highly incomplete. The long-term goals are to characterize the human protein-DNA interaction (PDI) network and to elucidate the underlying molecular mechanisms of transcriptional regulation by combining protein microarray technologies with bioinformatics. The objectives of this particular application are to identify a comprehensive list of sequence-specific unconventional DNA-binding proteins (uDBPs) and to better define the rules that specify TF-DNA interactions. The central hypothesis is that unbiased, high-throughput profiling of PDIs will reveal the rules and organization of transcriptional regulatory networks and pathways. This hypothesis has been formulated on the basis of preliminary data produced in the applicants' laboratories. The rationale for the proposed research is that, once a comprehensive list of sequence-specific uDBPs has been generated and a comprehensive analysis of PDIs and crystal structures of TFs has been completed, we will be able to predict and test novel physiological roles of uDBPs and to generate better rules that define DNA-binding specificity for many TF subfamilies in humans. Guided by strong preliminary data, this hypothesis will be tested by pursuing two specific aims: 1) Comprehensively identify DNA-binding proteins using human proteome microarrays; and 2) Identify and characterize TF recognition domains that define DNA-binding preference.
Under the first aim, 500 predicted and known DNA motifs will be probed against a human proteome microarray composed of ~17,000 individually purified proteins, a new tool fabricated in the applicants' laboratory, to generate a comprehensive list of uDBPs. On the basis of a series of bioinformatics analyses and predictions, 1-2 uDBPs will be characterized in depth in the applicants' laboratories to elucidate their physiological roles in transcriptional regulation. Under the second aim, the existing crystal structures of TF-DNA complexes from various species will be analyzed to identify non-contacting amino acid residues (AAs) of TFs that dictate PDIs. A selected set of the identified residues will be further tested experimentally. The approach is innovative because it uses activity-based screens for uDBPs in the human proteome and an unbiased survey of the contribution of non-contacting AAs to DNA-binding specificity. The proposed research is significant because it will be the first systematic profiling of DNA-binding activities in humans, offering a comprehensive list of participants in transcriptional control and regulation, and because a set of better-defined rules will provide the scientific community with a Rosetta Stone for decoding human transcriptional regulatory circuitry. Ultimately, such knowledge has the potential to inform the development of better therapeutics for TF-related diseases. PUBLIC HEALTH RELEVANCE: The proposed research is relevant to public health because generating a comprehensive list of participants in human transcriptional control and regulation and characterizing the rules that define DNA-binding specificity are ultimately expected to increase understanding of diseases that result from defects in transcription factor function.
The proposed research is thus relevant to part of the NIH's mission, because this advance in fundamental biological knowledge will help identify better therapeutics for a broad range of human diseases.