We propose to develop and apply a comprehensive set of computational and experimental methods to fully characterize the regulation of mammalian gene expression at the sequence level. At the core of our approach is an information-theoretic framework for sensitive and highly specific identification of DNA and RNA regulatory elements from large-scale gene expression data and genomic sequence information. We will develop and apply a non-alignment-based approach that relies on network-level conservation to identify comprehensive catalogues of regulatory elements conserved between pairs of mammalian genomes. These high-confidence predictions will then be used to identify distal regulatory elements composed of clusters of transcription factor binding sites. A Bayesian network learning algorithm will be employed to learn the context-dependent and combinatorial rules by which the discovered elements function to affect gene expression, both within local promoters/3'UTRs and through distal regulatory modules such as enhancers and silencers. We propose a versatile approach based on microarray profiling of phage-display selections to rapidly and efficiently identify the protein trans factors that specifically interact with the hundreds of novel DNA and RNA regulatory elements we expect to identify. The proposed research will significantly advance the rate and scale at which regulatory networks are characterized, both in humans and across a range of other complex genomes of biomedical and industrial importance. PUBLIC HEALTH RELEVANCE: The proposed research will yield tools that enable biologists to understand the regulatory code that orchestrates gene expression patterns in the human genome. The research is focused on aberrations of gene expression that accompany human cancers. As such, it promises to significantly advance our basic understanding of the cancer phenotype, with potentially important implications for therapy.