Transcription is at the heart of the regulation of gene expression, yet the computational analysis of transcription regulation currently faces a number of challenges and opportunities. The large number of sequenced genomes allows us to study and exploit the conservation of regulatory sequences, but algorithms that do so in a rigorous framework are still scarce. Detailed spatiotemporal gene expression data have become available, enabling us to use this information to elucidate regulatory interactions in the development of complex organisms. The long-term goal is to build computational models to infer regulatory networks and their evolution in the development of model organisms and, ultimately, humans. The objective of this particular proposal is to develop algorithms to analyze the conservation of gene regulation at the sequence level, as well as an integrated approach to model conserved regulatory regions important for development. Its specific aims are: (1) To decipher the precise requirements for defining a functional transcription start site, based on a comparative study of the conservation of core promoter elements in two fly genomes, and to build a model for genome-wide comparative annotation. (2) To develop and implement an efficient progressive multiple alignment algorithm for non-coding regulatory sequences based on phylogenetic hidden Markov models, and to study the evolution of core promoters in a wider set of species. (3) To extend the framework set by this algorithm to more complex regulatory modules (such as developmental enhancers and E2F target genes), and to incorporate prior information on putative upstream factors to predict regulatory interactions. Computational predictions will be validated by a small number of experiments. The proposed research is expected to advance the understanding of the evolution of regulatory regions and of how to build computational models that accurately utilize sequence information from several species.
Relevance to public health: Understanding how gene regulation is encoded in the genome is undoubtedly one of the most interesting challenges in molecular biology today, and it is intuitive that errors in this machinery lead to misexpression of genes and may often be important in genetically based diseases. Our research will help to find the exact regulatory regions in DNA, both computationally and experimentally, and to learn the mechanisms that control the expression of genes in model organisms and humans.