This application addresses broad Challenge Area (08) Genomics, and specific Challenge Topic 08-OD-101, Computational approaches for epigenomic analysis. While the primary DNA sequence of the human genome is ultimately responsible for the encoding and functioning of each cell, a plethora of chromatin and DNA modifications have been described in recent years that can modulate the interpretation of this primary sequence. These epigenetic modifications lead to the diversity of function across different human cell types, and play key roles in the establishment and maintenance of cellular identity during development, as well as in health and disease. The human ENCODE project, the NIH Epigenome Roadmap, and several other large-scale experimental efforts are currently underway to map dozens of histone and DNA modifications across multiple human cell types and disease states, generating a diversity of rich epigenomic datasets. This creates a pressing need for rigorous computational methods for the systematic integrative analysis of epigenomic datasets, and for understanding their relationship to other genomic datasets, including gene expression, disease association, and phenotypic profiling. In this proposal, we will develop and apply probabilistic graphical models for describing chromatin modifications, based on multivariate hidden Markov models. We will use these models to discover the set of underlying chromatin states, based on recurrent combinations of epigenetic marks across the entire genome (Aim 1). We will validate and functionally characterize these states based on their enrichments and positional biases with respect to existing functional elements, as well as large-scale gene expression and disease association datasets (Aim 2). Lastly, we will extend these methods to study the dynamics of chromatin state across both healthy and disease cell types, and how these dynamics correlate with functional differences between the observed cell types (Aim 3).
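As a rough illustration of the idea behind Aim 1, the sketch below decodes hidden chromatin states from binarized mark data with a small multivariate hidden Markov model. It is a minimal sketch under stated assumptions, not the proposed method: the genome is assumed to be divided into fixed-size bins, each mark is assumed to be called present or absent per bin, emissions are assumed to be independent Bernoulli variables given the state, and every number, state count, and name (`viterbi`, `log_emission`, `toy_marks`) is hypothetical. In practice the parameters would be learned from data rather than fixed by hand.

```python
import math

def log_emission(obs, probs):
    """Log-probability of one bin's binary mark vector, assuming each mark
    is an independent Bernoulli variable given the hidden state."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, start, trans, emit_probs):
    """Return the most probable hidden-state path for a sequence of bins.

    observations: list of tuples of 0/1 mark calls, one tuple per bin
    start:        start[k] = probability of beginning in state k
    trans:        trans[j][k] = probability of moving from state j to k
    emit_probs:   emit_probs[k][m] = probability mark m is present in state k
    """
    n_states = len(start)
    # score[k] = best log-probability of any path ending in state k
    score = [math.log(start[k]) + log_emission(observations[0], emit_probs[k])
             for k in range(n_states)]
    backptr = []
    for obs in observations[1:]:
        prev, score, step_ptr = score, [], []
        for k in range(n_states):
            best = max(range(n_states),
                       key=lambda j: prev[j] + math.log(trans[j][k]))
            step_ptr.append(best)
            score.append(prev[best] + math.log(trans[best][k])
                         + log_emission(obs, emit_probs[k]))
        backptr.append(step_ptr)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model: 2 hidden states, 3 marks (all values are made up).
start = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]
emit_probs = [[0.9, 0.8, 0.1],   # state 0: marks 1 and 2 usually present
              [0.1, 0.1, 0.9]]   # state 1: mark 3 usually present

# One tuple of mark calls per genomic bin.
toy_marks = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 1)]
print(viterbi(toy_marks, start, trans, emit_probs))  # → [0, 0, 0, 1, 1, 1]
```

The third bin lacks the second mark yet is still assigned to state 0, which shows how a state captures a recurrent combination of marks rather than requiring every mark to be present in every bin.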
We will work closely with the scientists involved in data production to facilitate communication and data integration among them, and with the data analysis and coordination centers already established, to share methods and results across the ENCODE and Epigenome Roadmap consortia and with the larger community. Overall, the proposed integrative analysis of large-scale genomic and epigenomic datasets will provide a unified view of current and planned epigenomic datasets, towards a systematic understanding of gene and genome regulation in health and disease.
While the primary DNA sequence of the human genome is ultimately responsible for the encoding and functioning of each cell, a plethora of chromatin and DNA modifications have been described in recent years that can modulate the interpretation of this primary sequence, leading to the diversity of function across different human cell types. This project will create a computational framework and resource to integrate large-scale genomic and epigenomic datasets, to understand their functional role in health and disease, and to understand their dynamics across different cell lines and disease states. The knowledge gained can play a key role in understanding the establishment and maintenance of cellular identity during healthy development, and how dysregulation of these processes can lead to the onset of disease.