Trichomonas vaginalis is a protozoan parasite that causes trichomoniasis, the most common non-viral sexually transmitted disease worldwide. The parasite is responsible for an estimated 5 million cases annually in North America alone, with over 170 million cases reported worldwide. T. vaginalis infections have been associated with preterm delivery, low birth weight and increased infant mortality, as well as predisposition to HIV/AIDS and cervical cancer. Its abundance as a pathogen, the increased incidence of HIV transmission in T. vaginalis-infected individuals and the increase in drug-resistant strains underscore the societal value of obtaining the complete genome sequence of this parasite. In addition, T. vaginalis is one of the deepest-branching eukaryotes known. Given the tremendous evolutionary distance between the human host and this pathogen, its genome sequence is likely to reveal a number of candidate genes encoding potential chemotherapeutic and vaccine targets specific for the parasite. Furthermore, from a purely academic viewpoint, the complete genomic sequence of T. vaginalis will offer significant insights into the evolution of deep-branching eukaryotic organisms and will help to answer many new evolutionary questions. We propose to sequence, assemble and annotate the approximately 16 Mb genome of T. vaginalis strain G3, using a whole-genome shotgun (WGS) strategy. A variety of computer programs and algorithms will be used to provide a comprehensive and current annotation of the T. vaginalis genome. This will include identifying genes through similarity searches of current databases, as well as analyzing sequences for signal peptide motifs and other motifs. Sequence data from the Entamoeba histolytica and Giardia lamblia genome projects will be used in a comparative approach to gene identification. In addition, we will generate 30,000 Expressed Sequence Tags (ESTs) to aid in gene identification.
A publicly accessible, user-friendly web site will be created for access to the genome and EST data during the project, and for access to the final finished genome sequence and annotation at the conclusion of the project.