This project will develop computer programs that use the Human Microbiome Project (HMP) DNA sequences to better understand DNA-protein interactions. The interactions between transcription factors and the DNA sites they bind are critical to controlling gene expression within each species, and therefore also shape the characteristics of each species and its interactions with the human host. The transcription factors themselves can be readily identified from DNA sequences, and we will take advantage of the fact that most bacterial transcription factors regulate their own expression and/or that of adjacent genes on the chromosome. Transcription factors can be clustered into groups that are expected to recognize the same DNA patterns, based on known structures of similar proteins from well-studied bacteria. Together, the clusters of proteins with very similar specificity and the probable regulatory regions of nearby promoters will give us a very large number of potential DNA-protein interaction sites to which pattern discovery algorithms can be applied. This should not only help us learn about the regulatory networks within the HMP species, but also lead to a more general understanding of the relationships between transcription factor proteins and the DNA patterns they recognize. This understanding will have broader implications across several areas of biological research and may lead to the design of new proteins with novel specificities that could be useful as research tools and as therapeutics.

PUBLIC HEALTH RELEVANCE: The Human Microbiome Project will obtain DNA sequences from many different species inhabiting many different microenvironments of the human body. This project will develop computer programs to analyze those DNA sequences and help discover how gene expression in those species is regulated. The regulation of gene expression is key to understanding the interactions between the microbial communities and the human host.