The sequence of DNA plays an essential role in initiating and controlling the activities of proteins that bind to and operate on DNA templates. There has been tremendous interest in determining the molecular mechanisms that explain the DNA sequence dependence of these activities. Deciphering these mechanisms will significantly expand our understanding of many basic cellular processes and will also be important for developing means to control or inhibit the activities of these proteins for therapeutic purposes. Despite significant progress in this area, much remains to be learned. This project focuses on RNA polymerase (RNAP), an outstanding example of a protein whose many functions depend on DNA sequence. RNAP performs DNA template-directed synthesis of mRNA. Each step in this multistep reaction may depend on the sequence of the DNA template in many, sometimes convoluted, ways that are difficult to sort out. We propose that understanding of the DNA template dependence of RNAP functions (and, more generally, of any protein operating on a DNA template) could be greatly enhanced and accelerated by experimental approaches that allow highly parallel analysis of a large number of DNA template sequence variants. The goal of this project is to develop a Next Generation Sequencing (NGS)-based approach that will allow rapid accumulation of experimental data relating DNA template sequence to RNAP activity. Our focus will be on the role of the first ~20 bp of the DNA template transcribed by RNAP (the Initially Transcribed Sequence; ITS) in controlling promoter escape, the process in which RNAP leaves the promoter to begin processive elongation of the RNA product. The ITS has a remarkable effect on the outcomes of gene expression, but the mechanisms linking ITS sequence to its effects on transcription remain among the least understood aspects of promoter DNA function.
Our preliminary data demonstrate that, with our proposed approach, the ITS sequence dependence of RNAP activity can be studied in parallel for hundreds of thousands of sequence variants. We will use this approach to collect an exhaustive set of experimental data correlating ITS sequence with RNAP activities that might be important in promoter escape. Computational analysis and follow-up experiments will be employed to obtain mechanistic insights into the role of the ITS. The impact of this project will be threefold. First, we will obtain data that will fill important gaps in our understanding of the factors that control the activity of RNA polymerase and determine gene expression. Second, we will develop a template for an experimental approach that could be replicated in studies of other systems involving proteins that operate on DNA templates. Third, we will establish an exhaustive database of experimental results correlating DNA template sequence with RNAP activity, which could be used as a resource for hypothesis testing and hypothesis generation.