Gene expression is largely controlled by regulating the transcription of specific segments of the genome. Cataloging and understanding the protein-DNA interactions that control gene expression is essential to understanding the normal network of interactions and how it is perturbed in disease states. New technologies have greatly increased the amount of DNA sequence available and have generated many new types of data on protein-DNA interactions. This gives us the opportunity to develop computational models that are much more comprehensive than those previously available, but increasing the accuracy of the models is essential to maximizing the biological information obtained from the data. The objectives of this proposal are to develop improved algorithms for modeling protein-DNA specificity that have greatly reduced false positive and false negative rates compared to current methods. These methods will take various types of data as input, including qualitative and quantitative binding site data, and will produce models of appropriate complexity for each important factor. We will also develop improved software for the discovery of regulatory sites using data currently being generated in high-throughput experiments. Finally, we will develop improved methods for determining which transcription factors encoded by the genome interact with which of the motifs identified by motif discovery methods.

Project Narrative: Many diseases are associated with mis-regulation of gene expression. Our studies will help to understand the normal mechanisms of gene regulation and to pinpoint the specific points of error in those disease states.