Our ability to identify the majority of exons in the human genome has been dramatically facilitated by the availability of extensive experimental data (EST, cDNA, and protein sequences), which has provided training sets for the development of effective algorithms for the de novo prediction of such elements. In stark contrast, the vocabulary of gene regulatory regions in the human genome remains poorly defined, largely because comparable experimental training sets for these sequences are lacking. Recent advances in our ability to use deep evolutionary conservation to predict which non-coding sequences are more likely to act as transcriptional enhancers have provided some leverage for addressing this problem. In preliminary studies, we examined 150 extremely conserved non-coding sequences in a transgenic mouse reporter assay and demonstrated that 58 of these sequences have distinct tissue-specific enhancer activity. Building on these results, we propose to couple our expertise in comparative genomics and high-throughput mouse transgenesis to define the enhancer activity of 1,500 deeply conserved non-coding elements located throughout the human genome. We will make the results of our in vivo studies publicly available through an online database with extensive search capabilities, allowing users to group sequences that produce similar expression patterns and to identify their shared sequence features. These datasets will provide an essential resource for a broad group of investigators in computational, developmental, and clinical biology who are working to decipher the rules that govern human gene expression. Accordingly, this proposal aims to classify the gene regulatory properties of non-coding DNA in the human genome through: (1) the characterization of 1,500 extremely conserved human DNA fragments for spatial enhancer activity in transgenic mice and (2) the development of a publicly available in vivo enhancer database to display these results.
In addition, to provide the bioinformatics community with a means to test ab initio enhancer predictions derived from their analyses of the data generated in Aim 1, we further propose to (3) test 15-20 enhancers per year predicted by outside investigators in our transgenic mouse system.

Lay Person Summary: The human genome sequence now serves as a routine starting point for a very large community of investigators and has helped define the majority of genes in our genome. However, our understanding of the sequences that regulate these genes remains meager, even though alterations in these regulatory sequences are presumed to contribute to human disease. Here, we propose to use human-fish genome comparisons to identify deeply conserved non-gene sequences and to test their ability to act as gene regulatory sequences in transgenic mice. Such a community resource is expected to substantially fill the current gap in the gene regulatory annotation of the human genome and to help determine how mutations in these sequences cause human disease.