DESCRIPTION Recent outbreaks of food-borne infections caused by Escherichia coli O157:H7 have underscored the importance of pathogenic E. coli to public health. The search for the genes underlying the virulent phenotype of this pathogen would be greatly accelerated by the availability of an extensive or complete genomic sequence of O157:H7. We propose a cost-effective means of achieving a 95 to 99% complete sequence by comparative sequence sampling against the genome of E. coli K-12, a non-pathogenic laboratory strain whose complete genome sequence is being determined in this laboratory (scheduled completion June 1997). The cost advantages of sampling are threefold. First, coverage is lower. Second, sampling avoids the expensive genetic engineering steps needed to subdivide a genome for sequencing. Third, a large expenditure need not be invested in finishing sequence that has already been determined. These resources can instead be concentrated on directed sequencing of regions that are new. We anticipate that this will provide a highly efficient way to discover new potential pathogenicity determinants.