During this reporting period the Laboratory of Genetics and Physiology has made major progress in understanding the strengths and weaknesses of CRISPR/Cas9 genome editing technologies. These tools are widely used to introduce mutations into the mammalian genome, both for basic and translational research. We have also made progress in understanding the molecular mechanisms by which hormones activate genes in the mammary gland. Lastly, we were part of a project that developed software for the convenient analysis of large numbers of genome-wide data sets aimed at investigating the binding of proteins to the genome.

Applying CRISPR/Cas9 editing in the mouse genome
CRISPR/Cas9 genome editing technologies provide unprecedented opportunities to conduct tailored engineering of the genome. They permit scientists to address fundamental problems, both in basic science and in translational medicine. However, successful use of CRISPR/Cas9 editing requires exceptional fidelity, i.e. the introduction of only the intended genetic changes and a complete absence of unintended changes. This is particularly important for its application in gene therapy, where precision cannot be compromised. Unwanted genetic changes introduced by CRISPR/Cas9 could disrupt vital cellular functions. It was therefore critical to determine the degree of fidelity of CRISPR/Cas9 genome editing. There was little published information on potential negative byproducts of CRISPR/Cas9 genome editing in mammals. We addressed this issue using two distinct approaches. First, we used CRISPR/Cas9 editing to mutate 17 sites in the mouse genome and subsequently analyzed the molecular consequences at the target sites. Second, we used whole genome sequencing (WGS) to determine whether CRISPR/Cas9 introduces unwanted mutations in any of the more than three billion bases of the mammalian genome.
Our comprehensive analysis of 17 genomic sites demonstrated a high frequency of large deletions and insertions at target sites (PMID: 28561021). This study has now been expanded to more than 30 genomic sites edited by CRISPR/Cas9. Notably, the simultaneous introduction of mutations in linked loci (sites on the same chromosome) resulted in the removal of sequences between target sites, demonstrating severe limitations of CRISPR/Cas9 genome editing. To avoid such problems, we are currently experimenting with deaminase base-editing tools. Our studies published in May 2017 (PMID: 28561021) have been supported by papers published in 2018 in Nature (PMID: 30089924, 30089922) and Nature Biotechnology (PMID: 30010673).

A key concern with CRISPR/Cas9 genome editing is the potential for creating mutations at non-target sites, and the identification of hundreds of non-targeted mutations in CRISPR/Cas9-treated mice has been reported by others (PMID: 28557981). That analysis had two shortcomings. First, it did not compare parents to progeny, a necessary prerequisite for discriminating de novo mutations from pre-existing variants in the strain background. Second, it examined only a small number of samples (one control plus two CRISPR/Cas9-edited animals). As discussed in an Editorial in Nature Methods, there is a need for an understanding of the in vivo genomic effects of CRISPR. To address this question, we designed a parent-progeny study and conducted unbiased whole genome sequencing (WGS) on six CRISPR/Cas9-edited mice, six control mice and their 24 wild-type parents (C57BL/6N strain), with the goal of determining the frequency of de novo mutations. We did not detect an increased number of spurious off-target mutations in edited mice (Willi et al., accepted for publication). However, unwanted deletions and insertions at target sites remain a concern (PMID: 28557981).
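The logic of the parent-progeny comparison described above can be illustrated with a minimal Python sketch. It assumes that variant calls for each animal are already available as sets of (chromosome, position, reference, alternate) records. The names `Variant` and `de_novo_variants` and all coordinates are hypothetical and chosen for illustration only; they do not represent the pipeline used in the study. A real analysis would also need filtering on read depth and call quality, which is omitted here.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """A single variant call (hypothetical minimal representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def de_novo_variants(
    progeny: Iterable[Variant],
    sire: Iterable[Variant],
    dam: Iterable[Variant],
) -> Set[Variant]:
    """Return variants seen in the progeny but in neither parent.

    Variants shared with a parent are treated as pre-existing strain
    background or inherited alleles, not as de novo events.
    """
    parental = set(sire) | set(dam)
    return set(progeny) - parental


# Toy data (made-up coordinates, for illustration only).
background = Variant("chr1", 1000, "A", "G")   # present in both parents
maternal = Variant("chr2", 500, "C", "T")      # present in the dam only
novel = Variant("chr5", 42, "G", "A")          # present in the progeny only

sire_calls = {background}
dam_calls = {background, maternal}
progeny_calls = {background, maternal, novel}

candidates = de_novo_variants(progeny_calls, sire_calls, dam_calls)
print(sorted(candidates))
```

Without the parental calls, all three progeny variants would be indistinguishable from editing-induced mutations; with them, only the single variant absent from both parents remains a de novo candidate. Comparing such per-animal counts between edited and control progeny is the kind of comparison the study design allows.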
Structure and function of super-enhancers
Super-enhancers comprise dense transcription factor platforms that are highly enriched for active chromatin marks. A paucity of functional data led us, over the past two years, to investigate the role of super-enhancers in the mammary gland, an organ characterized by exceptional gene regulatory dynamics during pregnancy. We previously identified 440 mammary-specific super-enhancers where the master regulator STAT5A integrates the glucocorticoid receptor, H3K27ac and MED1 to strongly activate genetic programs during pregnancy (PMID: 27376239). As part of that study we discovered a hierarchy of enhancers within super-enhancers, pointing to complex interactions between regulatory elements. We have now built on these discoveries and expanded our knowledge of the mechanisms by which super-enhancers activate genetic programs during development. We investigated how persistent hormonal exposure during lactation shapes an evolving enhancer landscape and impacts the biology of mammary cells. Employing ChIP-seq, we uncovered changing transcription factor occupancy at mammary enhancers, suggesting that their activities evolve with advancing differentiation. Using mouse genetics, we demonstrated that the functions of individual enhancers within the Wap super-enhancer evolve as lactation progresses. Most notably, a seed enhancer, which is mandatory for the activation of the Wap super-enhancer during pregnancy, is not required during lactation, suggesting compensatory flexibility. Combinatorial deletions of structurally equivalent constituent enhancers demonstrated differentiation-specific compensatory activities during lactation. In summary, our studies represent the first comprehensive in vivo approach to understanding complex genetic programs activated by cytokines.
Developing a software package to conveniently analyze large data sets
Next-generation sequencing technologies permit the relatively easy generation of vast data sets, and results from thousands of RNA-seq and ChIP-seq studies have been deposited in public databases. A bottleneck in understanding the biological significance of these results is the complexity of retrieving and analyzing large-scale data sets. In a collaborative effort with a research group at Dankook University headed by Keunsoo Kang, a former member of our lab, a pipeline was developed that permits automated analysis of public epigenomic and transcriptomic next-generation sequencing data (PMID: 29420797). Octopus-toolkit (PMID: 29420797) is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. It is an automated set-up-and-analysis pipeline. Once installed, Octopus-toolkit can automatically retrieve the original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the Gene Expression Omnibus (GEO) data repository. The downloaded files can then be processed sequentially for advanced analyses and visualization. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data.