The control of gene expression is the most fundamental process in the life of any cell, and it is primarily mediated (at the single-gene level) by transcription factors, the DNA-binding regulatory proteins. It has been reported that DNA target recognition in vivo sometimes differs from the in vitro-based models. Understanding the mechanisms that govern specific DNA recognition in a cellular environment will profoundly augment our understanding of transcription factor function and will also have a major impact on biomedical research. Furthermore, it has become apparent that new motif-finding algorithms need to be developed specifically for high-throughput in vivo protein-DNA interaction data. The immediate goal of the proposed work is to develop the methodologies and tools to efficiently analyze high-throughput in vivo protein-DNA association data (such as ChIP-on-chip) and to identify the biologically important cis-regulatory elements. The more distant goal is to understand the rules that govern the interactions of transcription factors with their genomic DNA targets. The proposed activity aims, initially, to develop such new motif-finding software by expanding and testing various methods and strategies. Tests will be based on artificial and "real" data, and the strengths and weaknesses of the various methods will be assessed. The best-performing methods will be used to analyze existing and new ChIP-on-chip data and to predict the cis-regulatory motifs, which will subsequently be confirmed with biochemical methods. Example transcription factors will be used to study the effect of particular cis-regulatory modules on gene expression, with the goal of developing a methodology that will allow complete computational models of gene regulation to be built. Finally, a database and web interface will be developed around the tools and data we produce, allowing efficient data dissemination, analysis, and mining.
To accomplish these goals, a combination of biochemical experimentation and computational algorithm development is needed. Chromatin immunoprecipitation experiments will be coupled with promoter microarray hybridization (ChIP-on-chip) to identify possible targets of TGFbeta1-induced transcription factors in primary lung cells. The data will be analyzed statistically to infer appropriate quantitative models of transcription factor binding. Publicly available and newly generated gene expression data will also be analyzed statistically to assess the effect of certain cis-regulatory modules on the expression of the downstream genes.
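
As an illustration of what a quantitative model of transcription factor binding can look like, the sketch below builds a position weight matrix (PWM) from a handful of aligned binding sites and scans a promoter sequence for its best-scoring window. This is a minimal sketch of one commonly used representation, not necessarily the model the proposed work will adopt. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount, the uniform background frequency, and the toy sequences are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        # Log2 ratio of the smoothed base frequency to the background.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Toy aligned sites and promoter fragment, invented for illustration only.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_start = best_hit(toy_pwm, "CCCCTGACTCAGGGG")
print(hit_start, round(hit_score, 2))
```

In this toy case the best window starts where the embedded site begins. With real ChIP-on-chip data, the threshold separating true sites from background would have to be estimated statistically rather than taken from a single best hit.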