The human genome is estimated to contain approximately 80,000 genes (Fields et al., 1994). To turn this genetic blueprint into a functional organism, genes must be expressed in a specific temporal and spatial pattern. Finding the signals that control this expression and understanding their language is one of the major challenges of the post-genome era. Laboratory identification of regulatory elements, modules, and regions in genomic sequences is often an arduous, time-consuming, and expensive process. If specific approaches can be developed, computational analyses promise to accelerate this process at minimal cost. The long-term goal of the proposed research is to develop and apply Bayesian computational bioinformatics methods that will describe the complete wiring diagram for a genome's transcription regulation system. This description will include four components: 1) the identification of all superfamilies of transcription factors and their classification into functionally related subclasses based on both the DNA recognition motifs and the activator domains; 2) the identification and characterization of a genome's transcriptional regulatory modules and all factor binding elements within them; 3) the full delineation of the connections between factors and their binding elements; and 4) a characterization of alternative transcriptional regulatory motifs, including those based on DNA composition and on DNA and RNA structure. These goals will be addressed using Bayesian statistical models and algorithms, the foundations for which we developed during the current award period.
These include Gibbs sampling algorithms to assemble superfamilies of transcription factors and multiply align them; transcription factor classification algorithms; exact Bayesian algorithms for the description of compositional and structural heterogeneity, RNA secondary structure, and phylogenetic footprinting; and a recursive Gibbs sampling hidden Markov model (HMM) for regulatory module identification and characterization.