Project Summary/Abstract
Transcription factors and microRNAs are essential regulatory molecules (RMs) that control messenger RNAs (mRNAs) and are known to be dysregulated in human diseases. Each RM may affect multiple cellular pathways, which is both a blessing and a curse. If a therapy targets the proper RM, it can attack the disease on multiple fronts and increase efficacy. On the other hand, targeted therapy may cause serious adverse effects because of our limited knowledge of the downstream causal effects of RM manipulation. Although the local binding between single RMs and their targets has been studied computationally and experimentally, the magnitude of the functional consequences of such binding on the transcriptome is unclear. Here, I propose statistical machine learning techniques and causal inference methods to predict the observed variability of gene expression using only RMs and to estimate their downstream causal effect on the entire transcriptome. To achieve this goal, I start in Aim 1 by building a multi-response predictive model that predicts the whole transcriptome using only RMs. This is challenging because the dimension of the response vector exceeds the number of samples; I will use techniques from high-dimensional statistics to address this issue. In Aim 2, I will go beyond predictive modeling by estimating the causal effect of RMs on the transcriptome using invariant causal prediction. I will leverage the rapidly growing literature that connects causal inference to the invariance of prediction accuracy across heterogeneous data sources to infer the causal effect of RMs on mRNAs. Having developed both predictive and causal models of the contribution of RMs to gene regulation, in Aim 3, during the R00 phase, I will focus on the most recent advances in double/debiased machine learning, which allows the use of scalable machine learning methods for reliable estimation of the causal effect of RMs on transcription.
My proposed research will bring the most advanced statistical machine learning and causal inference techniques to genomics research and help design more effective targeted therapies by providing insights into the global role of RMs in gene expression regulation. During the training phase of the award, with the support of my outstanding mentoring team and scientific advisory committee, I will gain expertise in molecular biology and genomics while perfecting my knowledge of causal inference and machine learning. The Ohio State University Comprehensive Cancer Center – James Hospital and the Mathematical Biosciences Institute will provide me with the ideal interdisciplinary environment to bridge data science and genomics and will help me achieve my career development goals and transition to a tenure-track faculty position.