The NIH Roadmap Epigenomics and ENCODE projects have generated a collection of more than 3,000 epigenomic datasets, including histone modification, DNA methylation, gene expression, and DNaseI hypersensitivity data profiled across 190 cell and tissue types. Novel computational analyses are needed to maximize the impact of this collection on our understanding of gene regulation, cellular differentiation, and human health. To address this challenge, we will develop new methods for epigenomic analysis, building on our extensive experience interpreting epigenomic information and on our preliminary studies building chromatin states, activity clusters, and regulatory motif maps for the Roadmap Epigenomics and ENCODE datasets. In Aim 1, we will characterize epigenomic differences and changes during lineage differentiation: we will develop new tools for systematic comparison of groups of epigenomes that directly exploit the complexity of epigenomic datasets; we will develop methods for clustering epigenomes into developmental lineages based on automatically learned, diverse epigenomic features that distinguish them; and we will develop methods that learn the unidirectional epigenomic changes that pluripotent cells undergo during lineage commitment, both to gain insight into differentiation and to automatically classify lineages and differentiation trajectories.
In Aim 2, we will characterize higher-order chromatin architecture and chromatin conformation to enable systematic interpretation of cis-regulatory modules: we will develop a novel statistical approach for enhancer-enhancer and enhancer-gene linking to reveal interacting regions and their target genes based on their coordinated activity patterns across cell and tissue types; we will train a supervised learning method for predicting both constitutive and tissue-specific chromatin conformation based on chromatin states, individual chromatin marks, genomic distance, activity, regulatory motif information, and DNA sequence; and we will use these higher-order interaction maps to predict gene expression levels based on the combined action of multiple regulatory regions and to define the cis-regulatory architecture of each gene in the human genome. The resulting resources will be invaluable for studies of gene regulation, by revealing the set of regulatory elements that are linked to each gene, and for the interpretation of genetic studies, by revealing the regulatory elements that jointly act to regulate each target gene and the potential target genes of non-coding variants associated with human disease.
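
As an illustration of the enhancer-gene linking idea in Aim 2, the following minimal sketch pairs each enhancer with nearby genes whose expression is correlated with the enhancer's activity across cell and tissue types. It is not the proposed statistical approach, which is not specified here: the use of Pearson correlation, the distance window, the correlation threshold, and every function and parameter name (`link_enhancers_to_genes`, `max_dist`, `min_corr`) are assumptions made only for this example.

```python
import numpy as np


def _zscore_rows(matrix):
    """Standardize each row across samples; constant rows become all zeros."""
    m = np.asarray(matrix, dtype=float)
    centered = m - m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, keepdims=True)
    sd[sd == 0] = np.inf  # avoid division by zero; yields correlation 0
    return centered / sd


def link_enhancers_to_genes(enh_activity, gene_expr, enh_pos, gene_tss,
                            max_dist=1_000_000, min_corr=0.7):
    """Return (enhancer_index, gene_index, r) for candidate links.

    enh_activity: enhancers x samples activity matrix (hypothetical input)
    gene_expr:    genes x samples expression matrix, same sample order
    enh_pos, gene_tss: genomic coordinates on one chromosome
    A pair is kept if it lies within max_dist base pairs and its Pearson
    correlation across samples is at least min_corr.
    """
    ez = _zscore_rows(enh_activity)
    gz = _zscore_rows(gene_expr)
    n_samples = ez.shape[1]
    # Dot product of z-scored rows divided by n gives Pearson r.
    corr = ez @ gz.T / n_samples
    dist = np.abs(np.asarray(enh_pos)[:, None] - np.asarray(gene_tss)[None, :])
    keep = (dist <= max_dist) & (corr >= min_corr)
    return [(int(i), int(j), float(corr[i, j]))
            for i, j in zip(*np.nonzero(keep))]


# Toy data: two enhancers and one gene measured in four samples.
toy_links = link_enhancers_to_genes(
    enh_activity=[[1, 2, 3, 4], [4, 3, 2, 1]],
    gene_expr=[[2, 4, 6, 8]],
    enh_pos=[1_000, 2_000],
    gene_tss=[5_000],
)
print(toy_links)
```

In the toy data, only the first enhancer tracks the gene's expression, so it is the only one linked. A realistic analysis would also need a null model to assess significance and a way to handle correlations that arise from shared lineage rather than direct regulation; neither is attempted in this sketch.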