Robust genome sequencing technology has resulted in over 180 completed genomes, with sequencing projects for an additional 700+ organisms in progress. The difficult and important problem of experimentally determining the proteins encoded by these genomes lags far behind. We propose to complement existing messenger-RNA-based approaches with high-throughput mass spectrometry of the entire protein complement of a complex animal, the nematode Caenorhabditis elegans. Our approach combines open-reading-frame (ORF) analysis of the fully sequenced C. elegans genome with high-throughput mass spectrometry using multidimensional protein identification technology (MudPIT). Our long-term goal is to develop these methods to the point that at least 80% of all proteins in a newly sequenced organism can be identified in a few months of concerted effort by a small group of investigators. This goal requires development of the following tools: 1) efficient evolutionary analysis of genomic ORFs to identify a computationally manageable set of candidate peptides for mass spectrum matching; 2) a robust method for biochemical fractionation of intact proteins from whole organisms or tissues; and 3) analytical approaches for assessing the significance of MudPIT matches to specific candidate peptides. Peptide cleavage, fractionation, and 2-dimensional (2D) mass spectrometry methods are established in our labs and, with the addition of these tools, are sufficient to achieve our goal. Our specific milestone for this 2-year grant period is the identification of at least 10,000 unique proteins (>50% of all predicted proteins in C. elegans) and the validation by orthogonal methods of at least 50 of the proteins that are not yet supported by other data. The end result will be both an extensive map of the C. elegans proteome and a high-throughput pipeline that will allow similar analysis of any complex animal or plant proteome whose genome sequence is available.
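As a rough illustration of the first tool (deriving candidate peptides from genomic ORFs), the following minimal sketch enumerates stop-to-stop reading frames on both strands and cleaves each translated ORF in silico. It is not the proposed pipeline: the function names are hypothetical, the cleavage rule (after K or R unless followed by P, as for trypsin) is an assumption because the abstract does not name a protease, and the evolutionary filtering step is omitted entirely.

```python
import re

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna, min_len=20):
    """Yield stop-to-stop ORF translations of at least min_len residues.

    Assumption: an ORF is any stretch between stop codons; no start codon
    or splicing model is applied.
    """
    dna = dna.upper()
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield segment


def candidate_peptides(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]
```

A caller might collect `candidate_peptides(orf)` for every `orf` in `six_frame_orfs(genome)` to build the search set. The length cutoffs are illustrative defaults, not values taken from the proposal.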