Although numerous genomes, including the human genome, have been completely sequenced, the specific function of most of the DNA remains unknown. Identifying all the functional components of genomes has become an important goal of the NIH (e.g., via the ENCODE and modENCODE initiatives). A significant fraction of this DNA is believed to be involved in regulating gene expression, a fundamental process that plays key roles in both normal development and disease. A basic unit of gene regulation is the cis-regulatory module (CRM; often referred to as an "enhancer"), but identifying these modules on a genomic scale has proven difficult. For the most part, computational methods for CRM discovery have been effective only where there is already an extensive body of knowledge about the transcription factors that bind to the CRMs and the sequences (motifs) to which they bind. In this proposal, we develop novel computational tools for CRM discovery. In particular, we depart from current approaches by developing algorithms that do not rely on prior knowledge of transcription factor binding motifs. By doing so, we are able to identify CRMs even in less well-studied biological contexts where prior knowledge is minimal or lacking. We then expand upon this approach by developing methods that utilize partial prior knowledge of CRMs known to be involved in a particular biological process. We will combine our new methods with promising existing approaches to generate a computational pipeline that uses complementary strategies for sensitive and specific CRM discovery, and we will conduct extensive prediction of CRMs that function in many tissues and cell types.
We will take advantage of the powerful genomic and experimental resources available for the model organism Drosophila melanogaster to validate all of our methods both in silico and in vivo, using a large body of existing CRM data that we have compiled and extensive empirical testing in transgenic animals, respectively. The methods we develop here will be instrumental in helping to identify an important class of genomic functional element, the cis-regulatory module, in any metazoan genome. cis-Regulatory modules (CRMs) are key mediators of normal phenotypic variation, drivers of evolutionary change, and causes of birth defects as well as chronic and acute disease. Identifying CRMs genome-wide is an important first step toward comprehending both normal and pathological aspects of gene regulation and gene function, with broad implications for understanding disease, predicting disease risk, and preventing and curing disease.