Genome microevolution in natural populations of pathogenic bacteria is accompanied by acquisition of virulence determinants and antibiotic resistance mechanisms, both of which constitute an ongoing threat to public health. This diversity among genomes is also a powerful source of information for typing, tracking, and inferring the evolutionary history of bacterial pathogens. In particular, genomic comparisons of pathogenic and non-pathogenic strains of the same species can reveal the molecular basis of virulence. DNA microarray-based comparative genome hybridization (CGH) is a powerful, high-resolution tool for discerning molecular differences between related strains of sequenced bacterial species. Although the method has become increasingly popular, appropriate analytical methods for interpreting CGH data are lacking. Thus, the long-term goal of this proposal is to develop a set of computational tools that automate the analysis of CGH data and facilitate its interpretation in the context of the genome annotation. The first proposed program will perform two key tasks. First, it will statistically classify sequences as conserved or divergent, based on microarray hybridization intensity ratios, using a mixture model. Second, it will infer phylogenetic relationships between strains from CGH data by maximum likelihood and generalized parsimony methods. The second program will be a visualization tool that integrates graphical representations of microarray data and genome annotation, and provides rapid search and retrieval of annotation data. This tool will be written in the Java programming language and will build on the wealth of available open source code. The third tool will be a database for storage, search, and retrieval of genome annotation and CGH data from multiple species.
This database will be based upon the Genomics Unified Schema (GUS), an open source set of definitions and tools for combined storage of genome annotation and gene expression microarray data. Modifications to GUS will permit storage and linkage of CGH data. These three tools will be developed in a phased approach, capitalizing on the combined R21/R33 application mechanism. All tools will be freely distributed to the scientific community as they become available. In addition, the classification and phylogeny program will be offered as an on-line tool, and a Bordetella genome and CGH database will be built, maintained, and served to the community on-line.
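As an illustration of the mixture-model classification step described above, the following is a minimal sketch and not the proposed program itself. It assumes that the input is one log2 hybridization ratio (test strain over reference strain) per array element, that two Gaussian components are adequate, and that the component with the lower mean corresponds to divergent or absent sequences. The function names, the starting values, and the 0.5 posterior threshold are hypothetical choices made for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(log_ratios, n_iter=200, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds), each a list of length 2.
    """
    n = len(log_ratios)
    overall_mean = sum(log_ratios) / n
    overall_var = sum((x - overall_mean) ** 2 for x in log_ratios) / n
    overall_sd = max(min_sd, math.sqrt(overall_var))

    # Assumed starting values: seed the two means at the data extremes.
    means = [min(log_ratios), max(log_ratios)]
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each element.
        resp = []
        for x in log_ratios:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
            total = p[0] + p[1]
            if total > 0.0:
                resp.append((p[0] / total, p[1] / total))
            else:
                resp.append((0.5, 0.5))  # both densities underflowed

        # M-step: re-estimate weight, mean, and spread of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component is empty; keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, log_ratios)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, log_ratios)) / nk
            sds[k] = max(min_sd, math.sqrt(var))

    return weights, means, sds


def classify(log_ratios, weights, means, sds, threshold=0.5):
    """Label each element by its posterior probability of being divergent."""
    divergent = 0 if means[0] < means[1] else 1  # assumption: lower mean
    calls = []
    for x in log_ratios:
        p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
        total = p[0] + p[1]
        posterior = p[divergent] / total if total > 0.0 else 0.5
        calls.append("divergent" if posterior > threshold else "conserved")
    return calls


# Made-up log2 ratios for illustration only: eight elements near 0 and
# four elements with strongly reduced signal.
example_ratios = [0.1, -0.05, 0.0, 0.2, -0.1, 0.05, 0.15, -0.2,
                  -3.1, -2.8, -3.3, -2.9]
weights, means, sds = fit_two_component_mixture(example_ratios)
calls = classify(example_ratios, weights, means, sds)
```

Real CGH data would likely need more care than this sketch provides, for example normalization of the intensity ratios before fitting and a check that two components describe the data adequately.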