Project Summary

Transmission trees, which define how pathogens have spread through a host network, are immensely valuable to epidemiology, yet for many diseases they are difficult or impossible to obtain with existing methods that compare pathogen genomes. This is because the phylogenetic tree of the infectious agents is not necessarily equivalent to the transmission tree. For many pathogens the infecting population can harbor substantial nucleotide diversity that is not adequately characterized by the genomes of one or a few isolates, and which is predicted to mislead attempts to reconstruct transmission chains. An alternative source of data for inferring transmission is 'shared rare variants': polymorphic sites at which more than one nucleotide is present within the infection, and which are shared among a small number of cases. The reasoning is that these variants pass through a transmission bottleneck that admits more than one genotype, whereas the same variant site is vanishingly unlikely to be found by chance in unrelated cases. Preliminary simulations that model pathogen evolution on a transmission network indicate that this approach is greatly superior to existing methods. This is further supported by recent work on viral pathogens, including influenza and HIV, that correlates shared rare variants with host networks, but these methods have not been tested by experiment or applied to bacteria. The proposed research uses deep sequencing to assay shared rare variants in populations of three bacterial pathogens, through experimental transmission of Citrobacter rodentium in mice, a longitudinal cohort study of MRSA transmission in a high-burden setting, and analysis of tuberculosis outbreaks. Preliminary data from the transmission experiments indicate that multiple polymorphisms have arisen over relatively short transmission chains (20 animals).
The MRSA study will use samples from 4 body sites collected from ~600 US Army recruits undergoing basic training, and will test whether shared rare variants are more likely to be found among close contacts, reflecting the host network. These data can be used to determine whether some body sites are more likely to transmit, and variants found in carriage samples can be compared with those from cases of skin and soft tissue infection to determine which body site is the likely source. The new 10X Genomics platform, which tags single molecules and can thereby increase resolution beyond the basic strategy, will be trialed to test whether it further discriminates between potential sources. Finally, deep sequence data from two retrospectively identified and analyzed outbreaks of TB will be used to develop means of inferring the presence of unsampled links, which can then be applied to samples prospectively collected and sequenced by collaborators. Taken together, this program of research will provide unparalleled insight into the processes of infection within the host, which will inform contact tracing and help identify missed links in the transmission chain, allow new approaches to the study of risk factors, and allow better estimates of parameters for disease modeling.
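
The definition of shared rare variants given above can be illustrated with a minimal Python sketch. It is not the project's analysis pipeline: the input layout (per-case allele read counts by position), the function names, and the thresholds `min_freq`, `min_depth`, and `max_cases` are all hypothetical choices made for illustration. It flags sites where a second nucleotide is present within an infection, keeps the variants carried by only a small number of cases, and counts how many such variants each pair of cases shares.

```python
from collections import defaultdict
from itertools import combinations


def polymorphic_sites(allele_counts, min_freq=0.02, min_depth=100):
    """Return {(position, minor_allele)} for sites with within-infection diversity.

    allele_counts: {position: {base: read_count}} for one case (assumed layout).
    min_freq and min_depth are illustrative thresholds, not values from the proposal.
    """
    sites = set()
    for pos, counts in allele_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to call a minor variant
        major = max(counts, key=counts.get)
        for base, n in counts.items():
            if base != major and n / depth >= min_freq:
                sites.add((pos, base))
    return sites


def shared_rare_variants(cases, max_cases=3, **thresholds):
    """Keep variants carried by at least 2 and at most max_cases cases."""
    carriers = defaultdict(set)
    for case_id, allele_counts in cases.items():
        for variant in polymorphic_sites(allele_counts, **thresholds):
            carriers[variant].add(case_id)
    return {v: ids for v, ids in carriers.items() if 2 <= len(ids) <= max_cases}


def pair_scores(shared):
    """Count shared rare variants for every pair of cases."""
    scores = defaultdict(int)
    for ids in shared.values():
        for a, b in combinations(sorted(ids), 2):
            scores[(a, b)] += 1
    return dict(scores)


# Toy data (invented): cases A and B both carry a minor G at position 101.
cases = {
    "A": {101: {"A": 950, "G": 50}, 202: {"C": 1000}},
    "B": {101: {"A": 900, "G": 100}, 202: {"C": 990, "T": 10}},
    "C": {101: {"A": 1000}, 202: {"C": 1000}},
}
shared = shared_rare_variants(cases)
scores = pair_scores(shared)
```

In this toy input, the minor T at position 202 in case B falls below the frequency threshold, so only the variant at position 101 links cases A and B. A real analysis would also need to account for sequencing error and for sites that are recurrently variable.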