The regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein-coding sequences have been used with remarkable success in the annotation of recently published genomes, the development of computational methods to analyse noncoding regions and delineate transcriptional control elements is still in its infancy. Using Drosophila melanogaster as a model, we intend to develop and validate computational strategies to locate transcriptional control modules, the stretches of noncoding sequence in the genomes of higher eukaryotes where multiple inputs come together to control a gene in a particular context, and to parse these modules into individual transcription factor binding sites. Focusing on two transcriptional paradigms, the patterning of the early embryo by the segmentation genes and the specification of the glial cell fate by the master transcriptional regulator Glial cells missing, we will develop algorithms that use raw genomic sequence together with supplemental information such as the regulatory region of the orthologous gene in a related species, examples of transcription factor binding sites pertinent to a certain class of sequence modules, and one or more related sequence modules with unknown protein binding sites. All computational strategies will rely on probabilistic models of how the genome encodes regulatory information and will be built in part on algorithms we have developed for yeast. They will exploit the frequent occurrence of multiple copies of the same binding motif in a module, and the enrichment of certain combinations of motifs in a module in comparison with the genome at large.
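The idea of exploiting multiple copies of the same binding motif within a module can be illustrated with a minimal sketch. It is not the probabilistic model proposed here but a simpler stand-in: a log-odds position weight matrix is built from example binding sites, the sequence is scanned for matches above a score threshold, and the window containing the most matches is reported as a candidate module. The function names, the toy motif TTGACA, the uniform background, the threshold, and the window length are all assumptions made for illustration, and the sequence is assumed to contain only the uppercase letters A, C, G and T.

```python
import math

BASES = "ACGT"

def build_log_odds(sites, pseudocount=1.0):
    """Build a log-odds matrix from aligned example sites of equal length.

    Assumes a uniform background (0.25 per base); a real genome would
    need measured background frequencies.
    """
    width = len(sites[0])
    matrix = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return matrix

def find_hits(seq, matrix, threshold):
    """Return start positions where the log-odds score reaches the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(matrix[i][seq[start + i]] for i in range(width))
        if score >= threshold:
            hits.append(start)
    return hits

def densest_window(hits, window):
    """Return (first_hit, count) for the window holding the most hit starts.

    `hits` must be sorted; returns (None, 0) when there are no hits.
    """
    best_start, best_count = None, 0
    left = 0
    for right in range(len(hits)):
        # Shrink from the left until all hits fit inside one window.
        while hits[right] - hits[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = hits[left], count
    return best_start, best_count

# Toy data (invented): three example sites and a sequence with a
# cluster of three motif copies followed by one isolated copy.
example_sites = ["TTGACA", "TTGACA", "TTGACT"]
seq = "AAAAA" + "TTGACA" + "CC" + "TTGACA" + "C" + "TTGACA" + "A" * 30 + "TTGACA"

matrix = build_log_odds(example_sites)
hits = find_hits(seq, matrix, threshold=6.0)
cluster = densest_window(hits, window=30)
print(hits, cluster)
```

In this toy case the three closely spaced copies fall within one 30-base window while the isolated copy does not, which is the kind of local clustering signal a module finder would look for. Judging whether such a cluster is enriched relative to the genome at large would require a background model, which this sketch omits.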
The proposed research involves close collaboration between a computational and an experimental group. To validate computational predictions in vivo, we will use reporter gene constructs to test putative regulatory modules, whole-mount in situ hybridizations to determine whether a gene is expressed in a specific tissue, and DNA chips for genome-wide expression profiling; the DNA chip data will also furnish raw input for further computational analysis. The results of the validation experiments will be used to refine and improve our algorithms. Finally, the analysis will be extended to the genomes of other higher eukaryotes, and the computational tools we develop will be made available to the scientific community at large.