This is a proposal to examine the genome structure of the amitochondriate parasitic protist Giardia lamblia. The selection of G. lamblia as a model organism for genome analysis is based on its well-recognized impact on human health, its relatively modest genome of 12 million base pairs distributed over five chromosomes, its basal position in molecular phylogenies, and its lack of several prominent organelles (e.g., mitochondria and peroxisomes) that characterize most eukaryotic cells. The investigators will determine 900-1000 base pair sequences from both ends of at least 22,500 randomly selected Bluescript clones containing 3-3.5 kb inserts. Collectively, these reads will provide a greater than threefold "pass", or primary data for 97 percent of the genome. The investigators will combine these sequence data with existing and future mapping data to assemble contigs for each of G. lamblia's five chromosomes. The initial sequence data will allow them to assemble the cosmid library into contigs covering 98 percent of the Giardia genome. The investigators will use directed strategies to construct complete physical maps. They will use primer walking on existing cosmid and plasmid clones to determine complete double-stranded sequences for coding regions, and their flanking sequences, that correspond to genetic elements conserved in more recently diverged eukaryotes or among the three primary domains (Eukarya, Bacteria and Archaea). Regions that correspond to variable surface gene clusters or developmentally expressed sequences (DESTs) will also be sequenced on both strands. Molecular phylogenetic techniques will be employed to assess patterns of molecular evolution for conserved genes. Analyses of genomic sequences from G. lamblia will be coupled with the functional genomics of this IRPG proposed by Francis Gillin.
Together, these applications may yield novel insights into the evolution of key pathways and organelles, variable surface proteins used by parasites to avoid host defense mechanisms, and novel genetic elements that lead to eukaryotic genome organization. The investigators estimate that the total direct cost of the sequencing and mapping proposal (including informatics and phylogenetics) translates into less than $0.20/base.
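
The coverage figures quoted above (at least 22,500 clones, reads of 900-1000 base pairs from both insert ends, a 12 million base pair genome) can be checked with a short calculation. The sketch below is illustrative only: the Poisson (Lander-Waterman style) model for the fraction of the genome sampled is an assumption made here, not a method stated in the proposal, and the function names are hypothetical. Under that assumption, the stated read counts give a little over threefold redundancy and roughly 97 percent of the genome sampled at least once, in line with the numbers in the abstract.

```python
import math

# Figures taken from the abstract.
GENOME_BP = 12_000_000   # genome size in base pairs
CLONES = 22_500          # minimum number of randomly selected clones
READS_PER_CLONE = 2      # one read from each end of the insert


def coverage(read_len_bp: int) -> float:
    """Average sequencing redundancy: total bases read / genome size."""
    return CLONES * READS_PER_CLONE * read_len_bp / GENOME_BP


def fraction_covered(c: float) -> float:
    """Expected fraction of the genome hit at least once.

    Assumption (not from the proposal): reads land uniformly at random,
    so the chance that a given base is missed is exp(-c).
    """
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    for read_len in (900, 1000):
        c = coverage(read_len)
        print(f"{read_len} bp reads: {c:.2f}x redundancy, "
              f"{fraction_covered(c):.1%} of genome sampled")
```

Real shotgun libraries deviate from uniform sampling (cloning bias, repeats), so these values are best read as a rough consistency check on the abstract's figures rather than a prediction.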