Parasitic nematodes infect over half the world's population, resulting in significant morbidity and mortality. Characterization of nematode genomes provides fundamental molecular information about these parasites, accelerating both basic research and the development of effective diagnostics, vaccines, and new drugs. Since completing the C. elegans genome, Washington University's Genome Sequencing Center (GSC) has generated and immediately made public over 210,000 expressed sequence tags (ESTs) from 25 nematode species, including representatives of all the major groups of human parasites, such as hookworms, filarial worms, whipworms, and ascarids. This application seeks to extend the available parasitic nematode sequence data and enhance its value to the research community through three aims. First, we will generate 125,000 new sequences from parasitic nematodes at less than half the cost per read of our original 1999 proposal. Sequencing efforts will focus on normalized libraries from the prevalent human and animal geohelminths Ascaris and hookworm, which together infect over 2 billion people, with the goal of identifying over half of the genes in each species. An Ascaris microarray will be produced and used to examine differences in gene expression over the course of embryogenesis and in adult organs. Second, we will use nematode.net and wormbase.org to provide the parasitology community with bioinformatics databases and tools that are user-friendly, integrated, and lasting. Features to be created or expanded include the NemaGene cluster consensus sequences assembled from all available nematode sequence data, the NemPep database of nematode peptide sequences, Gene Ontology and InterPro protein domain classifications, and codon usage tables. Beyond its use by parasitologists, this information will provide important evolutionary context for developmental and genetic studies in the model organism C. elegans.
Third, we will investigate novel areas of nematode biology by identifying nematode-specific protein domains with currently unknown functions. Regions of NemaGene sequences lacking InterPro domain coverage will be clustered and aligned by amino acid sequence to create new candidate domains, with the goal of generating 250 novel, high-quality domain models for submission to Pfam/InterPro. Working with collaborators, we will characterize a limited number of novel domains in more detail, with an emphasis on postulating molecular or cellular function.