ABSTRACT - PROJECT 2: UW-CNOF DATA ANALYSIS AND MODELING

Making effective use of the large and diverse nucleome data sets generated by the UW-CNOF and other members of the 4D Nucleome Consortium requires sophisticated computational methods deployed through robust, user-friendly software. Here, we propose to create and validate such methods and to disseminate the resulting software tools to the wider scientific community. Interpreting genomic and epigenomic data requires methods that scale to very large data sets and that handle heterogeneous data types, each with its own idiosyncratic patterns of statistical dependence and noise. In addition, 4D nucleome data of the type to be generated by the UW-CNOF give rise to new challenges. First, Hi-C data are defined over pairs of genomic loci, which makes time-series analyses based on, for example, Markov chains inapplicable. Instead, the data are best understood via a projection into three-dimensional coordinates, using a hierarchical model that captures multiple levels of chromatin conformation. Second, the 4D nucleome involves two distinct notions of time: the relatively fast, cyclic time of the cell cycle and the slower, branching time of differentiation. Third, as we move from bulk Hi-C data to single-cell Hi-C data, potentially coupled with concurrent measurements of RNA expression and chromatin accessibility in the same cells, we must explicitly account for cell-to-cell variability while retaining computational tractability and statistical power. The project will produce two complementary software toolkits that directly address these challenges. The first toolkit (Aims 1 and 2) uses a hierarchical probabilistic mixture modeling approach to model 3D and 4D nucleome architecture, taking into account diploidy and cell-to-cell variability. In particular, we employ a cylindrical "pseudotime" projection that jointly models the cell-cycle and differentiation time scales.
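To illustrate why a cylinder suits these two notions of time, the following is a minimal sketch, not the toolkit's actual model. It assumes each cell has already been assigned a cell-cycle phase in [0, 1) and a differentiation pseudotime; the function names `cylindrical_pseudotime` and `cylinder_distance` and the fixed `radius` are hypothetical. The phase becomes an angle around the cylinder, so cells at the end and the start of the cycle sit close together, while differentiation runs along the axis. Branching of differentiation trajectories is deliberately ignored here; the sketch covers a single lineage only.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def cylindrical_pseudotime(phase: float, diff_time: float, radius: float = 1.0) -> Point3D:
    """Map a cell to a point on a cylinder (hypothetical illustration).

    phase:     assumed cell-cycle position in [0, 1); it wraps, so 0.0 and 1.0 coincide.
    diff_time: assumed differentiation pseudotime; a non-cyclic axis (single lineage, no branching).
    radius:    arbitrary scale trading off cell-cycle versus differentiation distances.
    """
    theta = 2.0 * math.pi * (phase % 1.0)  # cyclic time -> angle around the axis
    return (radius * math.cos(theta), radius * math.sin(theta), diff_time)


def cylinder_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean (chord) distance between two (phase, diff_time) cells after embedding."""
    return math.dist(cylindrical_pseudotime(*a), cylindrical_pseudotime(*b))


if __name__ == "__main__":
    late_g2m = (0.98, 0.5)  # end of the cycle
    early_g1 = (0.02, 0.5)  # start of the next cycle, same differentiation stage
    mid_cycle = (0.50, 0.5)
    # Wrap-around: cells on either side of the cycle boundary are near neighbours,
    # whereas a linear time axis would place them at opposite ends.
    print(cylinder_distance(late_g2m, early_g1) < cylinder_distance(late_g2m, mid_cycle))
```

The design choice is that distance along the axis reflects only differentiation, while distance around the circumference reflects only cell-cycle position; a real model would additionally need to learn the phase and pseudotime for each cell and allow the axis to branch.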
The second toolkit (Aim 3) provides a general framework for relating Hi-C data, or the corresponding 3D or 4D models, to more traditional genomic and epigenomic data sets, with particular emphasis on relating 4D nucleome data to gene regulation and replication timing. The proposed project builds upon the two investigators' expertise in 3D modeling of Hi-C data (Noble) and single-cell analyses (Trapnell). The software tools will be developed in close collaboration with other investigators in the UW-CNOF: they will help to validate the novel assays developed in Project 1, will in turn be validated by the experiments described in Project 3, and will be applied to disease-relevant systems in Project 4. The software tools themselves will be made available under an open-source license and will be disseminated (Aim 4) via published articles and protocols, as well as through hands-on training activities.