Project Abstract/Summary

The three-dimensional organization of the genome plays a major role in long-range gene regulation, in which regulatory elements such as enhancers affect the expression of genes hundreds of kilobases away. Changes in three-dimensional organization are associated with tissue-specific gene expression and have been implicated in several human diseases, including cancer, diabetes and obesity. Advances in chromosome conformation capture (3C) technologies have expanded our repertoire of long-range interactions between enhancers and promoters in model cell lines. They have also shown that such interactions are established through a complex interplay of chromatin state, transcription factor binding and the three-dimensional proximity of genomic regions. However, our current understanding of the dynamics of long-range gene regulation is limited, both across cell types and across species. There are three reasons for this. Such datasets are absent for most species and cell types. Systematic methods to predict and interpret these interactions are lacking. Approaches to compare both the regions and their interactions across cell types, and especially across species, are limited. The overarching goals of this proposal are to develop novel computational methods that jointly identify candidate regulatory elements in multiple species and predict their long-range interactions in new cell types and species where high-throughput 3C datasets are unavailable or difficult to obtain. In Aim 1, we will develop a phylogenetically aware method for jointly identifying regulatory elements such as enhancers in multiple species. In Aim 2, we will develop multi-task and transfer learning approaches to predict interactions in new species and cell types by integrating available high-throughput 3C datasets from multiple cell types and 3C platforms.
In Aim 3, we will collect a novel chromatin mark dataset in endothelial cells from multiple species to enable a systematic study of the dynamics of long-range gene regulation. We will apply the computational approaches developed in Aims 1 and 2 to this multi-species epigenomic dataset to identify different regulatory elements and predict long-range interactions in multiple species. We will develop rigorous computational measures based on published 3C datasets. These measures will evaluate the quality of the predictions from our novel methods and their improvement over existing methods. We will further validate predicted interactions experimentally, using Capture Hi-C in multiple species and CRISPR/Cas9 experiments. We will examine individual interactions and groups of interactions to identify species-specific and clade-specific interactions. We will then interpret the corresponding genes in the context of known pathways and curated gene sets associated with cardiovascular diseases. Our methods will be broadly applicable to dissecting long-range gene regulation in complex phenotypes, including diseases. Software tools, resources, original data and experimental protocols developed by this project will be made publicly available.