Protein-DNA interaction is a basic mechanism for regulating target gene expression. Deciphering this mechanism is challenging because protein-bound DNA is difficult to characterize on a genomic scale. The recent arrival of ultra-high-throughput sequencing technologies has revolutionized this field by allowing rapid, cost-effective, quantitative sequencing analysis of target DNA. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, yields millions of short-read sequences that represent tags of DNA bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we propose a hidden Markov model (HMM)-based algorithm to detect genomic regions that are significantly enriched in ChIP-Seq data. Our method will address complications such as sequencing bias and read alignment uncertainty. We also propose a multi-level hierarchical HMM that will allow integration of data from both ChIP-Seq and ChIP-chip. Next, we will build model-based de novo motif-finding strategies that utilize ChIP-Seq data. We believe that efficient mining of all sequences identified by ChIP-Seq will allow us to characterize protein-DNA interaction sites precisely. Our long-term biomedical research interest is in prostate cancer. We will apply ChIP-Seq and the data analysis tools developed in this project to investigate transcriptional (dys)regulation in prostate cancer. We believe that effective data integration under a coherent probability framework will eventually lead to an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.
PUBLIC HEALTH RELEVANCE: Transcriptional regulation plays an important role in cancer progression. The statistical and computational strategies proposed here will help us gain an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.