We conduct research on the mass spectrometric characterization of proteins, with collaboration with groups in NICHD as our first priority; we also conduct independent investigations in mass spectrometric protein characterization. A major aspect of this work is the identification of proteins isolated in the biochemical investigations of other PIs. To identify unknown proteins, the MS data are used to query genomic databases with the general question, "Do any of the protein sequences present in the database have expected proteolytic cleavage products with theoretical masses that match the empirically determined masses of the peptides generated from the unknown?" Three mass spectrometric approaches are available for this effort: matrix-assisted laser desorption/ionization (MALDI) with time-of-flight (TOF) mass analysis; liquid chromatography (LC) followed by electrospray ionization with mass analysis in an instrument capable of using fragmentation reactions to generate peptide sequences, i.e., LC-MS/MS; and MALDI followed by tandem TOF analysis for the determination of peptide sequences from fragment ion spectra. With this combination of instrumentation, we are confident that, given enough material in a gel band to make as much as 100 fmol available for analysis, a positive identification can be made for a protein that is described in a database. Several areas of development are being pursued to improve protein characterization capabilities. First, we have developed a novel approach to providing sequence information for proteins that are not described in databases because of database errors, incompleteness, splice variants, or SNPs; this incompleteness is associated most frequently with organisms having unknown or partially characterized genomes, e.g., X. laevis. We have termed this approach "De Novo Peptide Sequencing through Exhaustive Enumeration of Peptide Compositions" (EEPC).
This approach is novel in comparison with the other widely used methods, in which a so-called "sequence tag" for a peptide is found within peptide fragmentation spectra, typically with mass accuracies on the order of 0.5-1 Da. Our approach requires measurement of fragment ion masses to 0.05 Da or better and uses ions arising from the decomposition of energetic ions alone, without the collision-induced dissociation that all other current methods employ. We then extend a base sequence or match potential amino acid compositions against those found in a database generated for this work. This database represents a totally novel aspect of our work and consists of an exhaustive listing of all amino acid compositions up to a maximum mass of 2 kDa. The database, a Length-Indexed Peptide Composition lookUp Table (LIPCUT), is indexed by both peptide length and mass to facilitate access during execution of the extension or matching de novo algorithms. To date, five algorithms have been devised, of which two, a simple extension approach and a bit-mapped matching approach, are slated for further development. Using spectra from the MALDI tandem TOF instrument, we have demonstrated the ability to obtain correct sequences for 21 different peptides, ranging in length from 7 to 15 residues, from several different protein sources. Comparison with several commercially available de novo algorithms shows that our approach gives superior results. The use of a sample-spotting robot enables spatial separation of LC eluents onto a MALDI target plate, reducing the number of near-isobaric interferences and permitting more extensive coverage of the protein being investigated. In an area related to protein identification and sequencing, we have made major strides in characterizing the C-terminal post-translational modifications of tubulins.
Specific progress has centered on improvements in the cyanogen bromide digestion protocol and on the development of software for the assignment of glycylation, glutamylation, and detyrosination mass spectral peaks within the families of both alpha- and beta-tubulins in samples containing multiple isotypes. These improvements have been demonstrated with the assignment of more than 60 peaks in spectra of rat brain tubulins, the most complex of all tissue tubulin types with the exception of that found in testes. In addition, progress continues to be made in the detection of phosphorylation sites using our differential MALDI spectra approach, which compares positive and negative ion spectra. The method employs the esterification of carboxylic acid sites using methanolic HCl; the esterified acidic residues do not ionize efficiently in negative ion MALDI, but the phosphorylated peptides are unaffected. Preliminary studies suggest that it may be possible to quantify endogenous levels of phosphorylation. The project for characterizing the protein mass fingerprints of amniotic fluid from patients who have undergone premature labor has progressed to the stage of clearly identifying patients who present with evidence of a bacterial infection along with premature labor. The overall goal of the project, however, is to be able to differentiate premature labor leading to pre-term delivery from such labor that does not result in pre-term delivery. The methodology being developed employs comparisons of MALDI mass spectra in the range of 2-10 kDa obtained from diluted amniotic fluid samples that have been desalted and then applied directly to the MALDI sample stage. Our experimental design characterizes the variance of spectra arising from a variety of methodological factors.
We have developed a mathematical/statistical approach in MATLAB that automates both ANOVA and principal component analysis (PCA) and reliably differentiates between classes of samples. The same mass spectrometric and mathematical approaches are being used to characterize cells isolated by an improved form of laser capture microdissection.
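
As a rough illustration of the ANOVA part of the statistical approach described above, the following sketch computes a one-way ANOVA F statistic for each m/z bin across sample classes and ranks the bins by how strongly they separate the classes. It is written in Python rather than MATLAB, it assumes the spectra have already been aligned onto a shared m/z grid, and the function names and toy intensities are hypothetical; it is not the implementation used in this project.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_bins_by_f(spectra_by_class):
    """Rank m/z bins by F statistic, highest first.

    spectra_by_class maps a class label to a list of spectra; each spectrum
    is a list of intensities on the same (assumed, pre-aligned) m/z grid.
    Returns a list of (F, bin_index) tuples.
    """
    classes = list(spectra_by_class.values())
    n_bins = len(classes[0][0])
    scores = []
    for j in range(n_bins):
        groups = [[spectrum[j] for spectrum in spectra] for spectra in classes]
        scores.append((one_way_anova_f(groups), j))
    return sorted(scores, reverse=True)


# Toy intensities (invented for illustration): bin 0 differs between
# the two classes, bin 1 does not.
toy = {
    "class_a": [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]],
    "class_b": [[4.0, 5.0], [5.0, 5.5], [6.0, 4.5]],
}
ranking = rank_bins_by_f(toy)  # -> [(13.5, 0), (0.0, 1)]
```

Bins with large F values are candidates for distinguishing sample classes; in practice the F values would be converted to p-values and corrected for the number of bins tested.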