The long-term goal is to understand how human gene transcription is controlled and regulated. The hypothesis is that such an understanding may be achieved by developing mathematical models that predict promoter position and tissue-specific activity from local genetic and epigenetic information. Recently, large-scale experimental technologies have mapped a great number of active promoters in a genome. While these technologies are powerful, their rates of false positives (due to aberrant, likely nonfunctional mRNA transcripts), false negatives (due to incomplete sampling of tissues and developmental stages), and other errors (due to protocol biases) remain uncertain. Consequently, it is important to have additional approaches that incorporate more comprehensive or stringent criteria, and to examine sequence characteristics that, in addition to illuminating molecular mechanisms, may permit computational prediction and direct experimental detection of additional promoters. Even when all human promoters are mapped, merely documenting their positions will not tell us how they are recognized and deployed for transcription. Therefore, as more experimental mapping data become available, it becomes ever more essential to develop mathematical models to understand promoter architecture, function and evolution. Now that the human genome has been completely sequenced and almost all of the protein-coding genes have been localized, understanding how each of these genes is controlled and regulated has become a major challenge in genome research.
Since a gene can often produce multiple transcripts through alternative promoter usage in different cells, at different developmental stages and/or in response to different signals, understanding the key elements that define and regulate alternative promoters will be a crucial task before more comprehensive gene regulation networks can be constructed. Powered by the ENCODE project, new high-throughput genomics technologies for attacking such problems are being developed at a rapid pace. Advanced computational approaches coupled with experimental validation are essential for the ultimate understanding of the regulatory mechanisms of gene expression. The new specific aims are: A1. Extract, compare and classify tissue-specific promoters in mammals so that they may be grouped into different (not necessarily mutually exclusive) expression and/or epigenetic classes; A2. Identify cis-regulatory motifs/modules as promoter architecture features and determine their relation to tissue-specific chromatin and expression patterns; A3. Build mathematical models for tissue-specific promoter and expression predictions; A4. Conduct case studies of real regulatory pathways in selected tissues. The proposed research will combine experimental and computational approaches and technologies in order to better understand mammalian promoters in terms of genetic and epigenetic cis-regulatory codes. The resulting models are likely to offer new insights into mechanisms of gene regulation or mis-regulation, and will generate many hypotheses for further functional studies on global regulation of gene expression.