Toxoplasma gondii is an important protist pathogen of humans. A genome sequence is available, but critical assembly errors greatly hamper the research community's ability to discover new gene duplications associated with virulence and to study known ones. This proposal focuses on the ascertainment and resolution of genome compressions and assembly errors in the reference genome sequence for Toxoplasma gondii ME49. Copy number variation is linked to differences in phenotype and virulence in many pathogens. The goal of this project is to identify and disambiguate local genome segment duplications that were collapsed or merged as an artifact of current genome assembly algorithms. Bioinformatics analysis of the assembled reference genome sequence for T. gondii strain ME49 reveals no segmental genome duplications, a highly anomalous result that indicates the extent to which replicate regions have been collapsed. In contrast, an analysis of sequence data from 62 T. gondii strains released by the community T. gondii genome project has revealed an excess of SNPs and an excess of sequence reads at many genomic locations when compared to the T. gondii ME49 reference. This finding indicates the existence of multiple repetitive regions in the genome assemblies, as well as strain differences in the number of repetitive regions and in the SNPs each contains. The critical step needed to correct the reference genome sequence and disambiguate replicated regions is to obtain long-read single-molecule sequences (5-10 kb) that span the genome sufficiently well to cover duplicated regions. Two aims are proposed that focus on resolving duplicated genome regions via long-read sequences for several key T. gondii strains, with an emphasis on the reference strain, ME49. Resolved sequences will be compared with each other to catalog the extent and types of replicated regions. Finally, comparisons of the duplicated regions themselves will be used to catalog affected genes.
All data will be released to the community via ToxoDB.org and archived appropriately. This study will provide a reference genome sequence with greatly reduced errors and much-needed insight into the scope and potential significance of genome duplications in the evolution of Toxoplasma gondii strains.
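The excess-read signal described above can be illustrated with a minimal sketch. It assumes per-window read depths have already been computed from reads aligned to the ME49 reference. The function name, the 1.5x threshold, and the depth values are hypothetical choices for illustration, not the project's actual pipeline or data. Windows whose depth is well above the genome-wide median are flagged as candidate collapsed duplications, and the depth ratio gives a rough copy-number estimate.

```python
from statistics import median


def flag_collapsed_windows(depths, min_ratio=1.5):
    """Flag windows whose read depth suggests a collapsed duplication.

    depths    -- mean read depth per fixed-size window along a contig
    min_ratio -- hypothetical cutoff: flag windows at or above this
                 multiple of the median depth (an assumption, not a
                 value taken from the proposal)

    Returns a list of (window_index, depth_ratio, estimated_copies).
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    flagged = []
    for index, depth in enumerate(depths):
        ratio = depth / baseline
        if ratio >= min_ratio:
            # A collapsed region with N true copies should attract
            # roughly N times the single-copy read depth.
            flagged.append((index, ratio, round(ratio)))
    return flagged


# Invented example depths: most windows sit near 30x, two are elevated.
window_depths = [30, 31, 29, 62, 91, 30, 28, 30]
candidates = flag_collapsed_windows(window_depths)

for index, ratio, copies in candidates:
    print(f"window {index}: {ratio:.2f}x median depth, ~{copies} copies")
```

Depth alone cannot recover the sequence or arrangement of the individual copies, which is why the proposal relies on long reads that span the duplicated regions.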