Gene regulation is the framework on which neuronal cellular diversity is built. The substantial cellular diversity that characterizes the central nervous system of vertebrates, such as humans, must therefore require immense regulatory complexity. Although regulatory control acts at many levels, we will focus on the roles played by cis-regulatory elements (REs) in controlling the timing, location and levels of neuronal transcripts. However, the biological relevance of non-coding sequences cannot be inferred by examination of sequence alone. Perhaps the most commonly used indicator of non-coding REs is evolutionary sequence conservation. Although conservation can uncover functionally constrained sequences, it cannot predict biological function, and regulatory function is not always confined to conserved sequences. At the simplest level, regulatory instructions are inscribed in transcription factor binding sites (TFBS) within REs. Yet, while many TFBS have been identified, TFBS combinations predictive of specific regulatory control have not yet emerged for vertebrates. We posit that motif combinations accounting for tissue-specific regulatory control can be identified in the REs of genes expressed in the corresponding cell types. The long-range goal of this application is to begin to identify TFBS combinations that can predict neuronal REs, a first step in developing a neuronal regulatory lexicon. We propose 3 aims to directly approach this important challenge. First, we will evaluate ~500 putative neuronal REs in vivo, prioritizing genes critical in catecholaminergic (CA) neurogenesis and function because of the prominent role of these neurons in neurodegenerative and psychiatric disorders (Aim 1), thereby establishing a repository of regulatory data to support the study of neuronal development and dysfunction. Critically, such an undertaking would not be cost-effective in mice.
We have developed a highly efficient reporter transgene system in zebrafish that can accurately evaluate the regulatory control of mammalian sequences, enabling characterization of reporter expression during development at a fraction of the cost. Second, we will directly determine what fraction of regulatory information may be overlooked by conservation, tiling across 4 loci (approximately 150 amplicons) and testing all non-coding sequences in our in vivo assay (Aim 2). Third, we will use these and published data sets to improve upon existing computational tools, predicting and evaluating the biological relevance of sequences at genes not tested in Aims 1 and 2 (Aim 3). This application is a crucial first step towards a neuronal regulatory lexicon that is independent of conservation and, subsequently, towards lexicons for other cell types. PUBLIC HEALTH RELEVANCE: We wish to better understand how the regulatory instructions of critical developmental and disease genes are encoded in DNA sequence. We will focus on genes important for the neurons that are lost in disorders like Parkinson's disease. We also aim to establish new computational paradigms and generate reagents that will be widely applicable to understanding the wealth of information arising from genome sequencing efforts.