Gastroenteritis caused by the bacterium Clostridium difficile is a global healthcare problem and a common complication of antibiotic therapy. A subset of toxin-producing C. difficile strains has recently emerged to cause large nosocomial outbreaks and an ongoing C. difficile pandemic. One hypothesis attributes the emergence of pandemic strains to the overproduction of toxins and an increased ability to form environmentally resistant spores. However, recent studies that include data from a larger set of representative isolates challenge this hypothesis and illustrate the need for novel genetic characterizations that improve the surveillance and molecular epidemiology of these pathogens. The work proposed here will provide genome sequences of a broad diversity of C. difficile isolates and test the hypothesis that certain genomic traits are significantly associated with the severity of human C. difficile infection (CDI). Isolates will be collected from symptomatic hospital inpatients as part of an ongoing C. difficile clinical study at the University of Michigan. A total of 1,500 isolates will be genotyped using high-throughput fluorescent-PCR ribotyping and subtyped using multi-locus variable number tandem repeat analysis (MLVA). Based on these characterizations, 150 representative isolates spanning the genotypic diversity of isolates and the clinical spectrum of CDI severity will be selected for genome sequencing. Isolates from an additional ~200 CDI cases in our collection are presently being sequenced as part of a parallel project. Sequence data for all isolates (~350) will be processed bioinformatically and analyzed with respect to clinical parameters from the patient record database. Results from this analysis will epidemiologically link C. difficile genetic diversity in the form of single nucleotide polymorphisms (SNPs) with CDI severity and response to therapy.
SNP data will be used to develop a novel phylogenetically based typing system for the rapid genomic characterization of C. difficile strains. SNP typing will be used to characterize the genomic diversity of a large collection of isolates (n=5,000) and assess the bacterial genome-based correlates of CDI severity. The completion of this research requires that the candidate complete a didactic training program designed specifically to build mastery of concepts and tools from the fields of bioinformatics, computational biology, and genomics. A combination of coursework, mentorship from leaders in the field, and experiential knowledge to be gained from in-depth data analysis will ensure the success of the research and the future career of the candidate.