Proper regulation of gene expression is essential to the normal development and health of organisms, whereas aberrant gene regulation is known to cause many genetic diseases, including some inherited anemias, and is thought to be a major contributor to complex phenotypes such as susceptibility to common diseases. Understanding the molecular mechanisms of gene regulation may provide novel candidates for therapeutic interventions. Our studies aim for a deeper molecular understanding of global aspects of gene regulation in an important biological process, the maturation of erythroid precursor cells to become red blood cells. Building on our progress in using patterns in sequence alignments to predict cis-regulatory modules for erythroid genes and in deciphering functional correlates of their evolutionary history, we propose to acquire genome-wide information on biochemical features associated with regulation to reach a more complete understanding of gene regulation in erythroid cells. Specifically, we propose to use high-throughput biochemical assays such as chromatin immunoprecipitation followed by hybridization to microarrays and deep re-sequencing to acquire data on genomic DNA sequences (Aim 1) occupied in vivo by critical tissue-specific transcription factors, (Aim 2) bound by histones with modifications associated with gene activation or repression, (Aim 3) in chromatin with an altered structure, and (Aim 4) transcribed in a mouse erythroid cell model that undergoes maturation upon restoration of the critical transcription factor GATA-1. Then we will (Aim 5) apply existing software and develop new data-processing algorithms to determine peaks of signals that are likely to represent the locations of the features targeted in Aims 1-4.
Aim 6 will mine the peak-calling results, along with raw data, multiple sequence alignments and other information, to investigate their covariation structure and integrate them to predict cis-regulatory modules, classify the modules by function, identify motifs associated with specific protein occupancy, and deduce the phylogenetic depth of preservation of critical motifs in the regulatory modules. Aim 7 will experimentally test biological hypotheses that arise from the analyses in Aims 5 and 6, determining the extent to which we can validate the locations of protein occupancy and transcripts, the predictions of both positive and negative cis-regulatory modules by gain-of-function cell transfection assays, and the role of motifs implicated in occupancy by directed mutagenesis and in vivo binding assays. We will test whether the motif-constraint hypothesis for protein-occupied DNA segments involved in enhancement applies to transcription factors in addition to GATA-1, and we will conduct additional experiments probing deeper biological issues. This research will provide global insights into mechanisms and effects of gene regulation during erythroid maturation, and the techniques and analytical tools developed here can be applied to better understand the development and differentiation of any tissue.
PUBLIC HEALTH RELEVANCE: Proper regulation of gene expression is essential to the normal development and health of organisms, whereas aberrant gene regulation can cause genetic diseases and appears to be a major contributor to susceptibility to common diseases. Understanding the molecular mechanisms of gene regulation may provide novel candidates for therapeutic interventions. By collecting genome-wide data on many biochemical features associated with gene regulation, mining the data deeply to predict functional DNA sequences, and experimentally testing those bioinformatic predictions, our studies will provide global insights into mechanisms and effects of gene regulation during erythroid maturation and yield techniques and analytical tools to better understand the development and differentiation of any tissue.