PROJECT SUMMARY
My long-term research goal is to understand the organization and function of cis-regulatory modules (CRMs) in the human genome, with a focus on their impact on development and disease. CRMs, such as promoters, enhancers, and insulators, are DNA elements that regulate gene expression. Genome-wide association studies (GWAS) have shown that most variants associated with a phenotype or disease are located outside protein-coding regions and are postulated to affect gene expression levels through CRMs. Understanding the organization and function of CRMs is therefore key to identifying the causes of genetic diseases and provides an essential backbone for precision medicine. Although millions of putative CRMs have recently been identified with high-throughput assays, it remains challenging to pinpoint the functional CRMs that regulate tissue- and developmental stage-specific transcription. In fact, a large proportion of the CRM variants identified so far have no or only mild effects on phenotype. As a result, these insights have had very limited clinical application. Over the next five years, the goal of my research is to accurately identify causal CRM variants that affect normal blood cell development and impact childhood blood disorders. Several major hurdles must be overcome to achieve this goal. First, mounting evidence indicates that expression fluctuation is an important trait of genes, and that the tolerance of expression fluctuation varies among genes. We reason that CRMs modulating transcription of highly expression-sensitive genes tend to be essential to cell function and to harbor pathological non-coding variants. However, our understanding of expression-sensitive genes and their underlying biology is still rudimentary. Second, epigenetic modification markers are routinely used to map potential CRMs. However, at many loci, those markers are not required for CRM function.
Overreliance on associative, rather than causative, markers can confound the accurate identification of biologically important CRMs. Third, while the genetic code for protein-coding sequences has been known for decades, a comparable "grammar" for non-coding sequences, and for CRMs in particular, is still lacking. As a result, we cannot predict how CRM variants affect regulatory function. Based on these challenges, we ask three fundamental questions: 1) How can expression-sensitive genes be systematically identified? 2) How can the causative mechanisms of CRMs be deciphered? 3) How do single-nucleotide variants (SNVs) affect CRM function? If successful, the proposed studies will identify functionally important CRMs controlling health-related traits and pinpoint pathological non-coding variants within those CRMs. A better understanding of the anatomy and function of CRMs will facilitate precision medicine by enabling the treatment of genetic diseases through manipulation of CRM function via gene editing or pharmacological approaches.