The goal of the ENCODE Project is to provide the scientific community with a complete annotation of the human genome by delineating the DNA sequence features that comprise all genes, including exons, introns, promoters and cis-regulatory sequences. The pilot ENCODE project sought to develop and test a variety of experimental, computational and analytical platforms to determine the best ways to approach this problem by focusing on a defined 1% of the human genome. During this initial phase of ENCODE, the applicants of this proposal developed robust high-throughput methods for detecting and validating functional transcription promoters, DNA methylation patterns, and transcription factor occupancy in the pilot regions, and demonstrated that these approaches can be scaled fully to the entire human genome with high robustness, sensitivity and specificity. These experiences, together with the resulting technology and analysis platforms and an existing, highly productive infrastructure, lead to this response to NHGRI's RFA-HG-07-030. This application presents an ambitious proposal to expand a program to map and functionally annotate cis-regulatory sequences of the human genome. The plan emphasizes full-genome comprehensiveness for three experimental pipelines: 1) a new sequence-based method called ChIPSeq to elucidate more than 600 comprehensive transcription factor-DNA interactomes; 2) a similar new method called MethSeq to determine the methylation status of all the CpG-rich regions in the human genome in more than 1,000 human cell types and cell states; and 3) a high-throughput transfection assay pipeline to measure transcriptional activities of 25,000 human "promoter-plus" proximal cis-regulatory domains, including at least one major promoter for each of the annotated protein-coding genes.
A second major product of the promoter pipeline will be a physical resource of proximate reporter constructs for all human genes, designed to accommodate future fine-structure dissection of the promoter regulatory motifs and testing of long-distance elements. All of the experimental work in this project will be subjected to analysis with appropriate quality metrics. In addition, comparative genomics and other computational analyses will be integrated with the experimental production to help prioritize and shape input to the pipelines and to capture information in forms useful to both biologists and genomicists. These analyses will produce several large-scale deliverables, including ChIP data-driven sequence motif models for each of hundreds of transcription factors, some of which additionally leverage evolutionary conservation.