We propose to determine, on a genome-wide and population-wide scale, the genetic variation across all major diarrheal and extra-intestinal E. coli pathotypes, and to analyze footprints of positive selection in genes shared by multiple strains. The main focus of the analysis will be to characterize in detail the small nucleotide variations (point substitutions and indels) that occur under positive selection in the core genes of E. coli and are adaptive for pathogenic strains. Our plan is to assemble a pathotype-representative collection of E. coli encompassing approximately 2,500 isolates selected from over 20,000 E. coli strains from around the globe. The clonal diversity of these strains will be determined, and clonally diverse isolates will be subjected to high-throughput sequencing using 454 technology. Genomes of 250-300 E. coli strains (obtained de novo or available in public databases) will be analyzed for footprints of positive selection by detecting hot-spot mutations, that is, repeated (phylogenetically unlinked) mutations affecting the same amino acid position. Hot-spot mutations are indicative of convergent evolution, one of the strongest indications that such changes are adaptive in specific environments. We expect several hundred core genes to be affected by positively selected hot-spot mutations. The genetic association of specific hot-spot mutations with different pathotypes will then be validated on a population-wide level. We will also test the clonal resolution of novel E. coli genotyping markers for potential application in clinical and environmental diagnostics. Finally, the functional effects of positively selected variations will be investigated for genes involved in the regulation of E. coli virulence factors.
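The hot-spot criterion described above (repeated, phylogenetically unlinked mutations at the same amino acid position) can be illustrated with a minimal Python sketch. It is not the project's actual pipeline. It assumes a pre-computed protein alignment for one core gene and a hypothetical clonal-lineage label per strain, and it uses "variant residue seen in at least two distinct lineages" as a crude stand-in for phylogenetic unlinkedness. A real analysis would test unlinkedness against a reconstructed phylogeny. The function name, the `min_lineages` parameter, and the example data are all invented for illustration.

```python
from collections import Counter, defaultdict


def find_hotspot_candidates(aligned, lineage_of, min_lineages=2):
    """Flag alignment columns where non-consensus residues occur in
    at least `min_lineages` distinct clonal lineages.

    aligned    -- dict: strain name -> aligned protein sequence (equal lengths)
    lineage_of -- dict: strain name -> clonal lineage label (hypothetical input)
    Returns a dict: 1-based position -> {lineage: set of variant residues}.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    hotspots = {}
    for pos in range(length):
        # Residue per strain at this column, ignoring alignment gaps.
        column = {s: seq[pos] for s, seq in aligned.items() if seq[pos] != "-"}
        if not column:
            continue
        # The most common residue serves as the presumed ancestral state
        # (an assumption; ancestral reconstruction would be more rigorous).
        consensus = Counter(column.values()).most_common(1)[0][0]
        variants_by_lineage = defaultdict(set)
        for strain, residue in column.items():
            if residue != consensus:
                variants_by_lineage[lineage_of[strain]].add(residue)
        if len(variants_by_lineage) >= min_lineages:
            hotspots[pos + 1] = dict(variants_by_lineage)
    return hotspots


# Toy data: six strains in two hypothetical lineages, A and B.
example_alignment = {
    "s1": "MKTAY",
    "s2": "MKTAY",
    "s3": "MRTAY",
    "s4": "MKTAY",
    "s5": "MRTAF",
    "s6": "MKTAF",
}
example_lineages = {"s1": "A", "s2": "A", "s3": "A",
                    "s4": "B", "s5": "B", "s6": "B"}
example_hits = find_hotspot_candidates(example_alignment, example_lineages)
```

In the toy data, the K-to-R change at position 2 appears in both lineages and is flagged, whereas the Y-to-F change at position 5 is confined to lineage B and is not, since it is consistent with a single mutational event.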
PUBLIC HEALTH RELEVANCE: The E. coli Variome Project will compile all genetic variations across major diarrheal and extra-intestinal E. coli pathogens and analyze footprints of positive selection in genes shared by multiple strains. The genetic association of specific hot-spot mutations in core genes with different pathotypes, and the performance of novel genotyping markers, will be validated on a population-wide level. Finally, the functional effects of pathogenicity-adaptive variations will be investigated for genes involved in the regulation of E. coli virulence factors.