We conduct research on the mass spectrometric characterization of proteins, first and foremost in collaboration with groups in NICHD, and we also pursue independent investigations in this area. A major aspect of this work is the identification of proteins isolated in the biochemical investigations of other PIs. To identify an unknown protein, the MS data are used to query genomic databases with the general question, "Do any of the protein sequences present in the database have expected proteolytic cleavage products with theoretical masses that match the empirically determined masses of the peptides generated from the unknown?" Three mass spectrometric approaches are available for this effort: Matrix-Assisted Laser Desorption/Ionization (MALDI) with Time-of-Flight (TOF) mass analysis; liquid chromatography (LC) followed by electrospray ionization with mass analysis in an instrument capable of using fragmentation reactions to generate peptide sequences, i.e., LC-MS/MS; and MALDI followed by tandem TOF analysis for the determination of peptide sequences from fragment ion spectra. With this combination of instrumentation, we are confident that, given enough material in a gel band to make as much as 100 fmol available for analysis, a positive identification can be made for a protein that is described in a database. Several areas of development are being pursued to improve our protein characterization capabilities. First, we have begun to address the problem of obtaining sequence information on proteins that are not described in databases because of database error or incompleteness; this incompleteness is associated most frequently with organisms having unknown or partially characterized genomes. We are taking an approach that we have termed "Complete de novo Sequencing of Peptides".
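The database query posed above, in which theoretical proteolytic peptide masses are matched against measured peptide masses, can be illustrated with a minimal Python sketch. It is not the search software used in this work. It assumes tryptic cleavage with no missed cleavages, unmodified residues, singly protonated monoisotopic masses, and a hypothetical 50 ppm tolerance; all function names are illustrative.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def tryptic_peptides(sequence):
    """Cleave after K or R, except before P (assumes no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(sequence, observed, tol_ppm=50.0):
    """Count observed masses that match any theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )


def rank_database(database, observed, tol_ppm=50.0):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = {
        name: count_matches(seq, observed, tol_ppm)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_database` with a dictionary of candidate sequences and a list of measured [M+H]+ values returns the candidates ordered by match count. A production search engine would additionally score the statistical significance of the matches and allow for missed cleavages and modifications, which this sketch deliberately omits.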
This de novo approach is novel in comparison with the other widely used methods, in which a so-called "sequence tag" for a peptide is found. A sequence tag consists of 2 to 5 amino acid residues determined from a peptide fragmentation mass spectrum and is used, along with the parent mass, to search a database. Our approach instead requires the determination of the amino acid sequence of the entire peptide and requires the use of a MALDI tandem TOF instrument. When the MALDI tandem TOF spectra are analyzed with software we have written for this task, the resulting sequencing capability is unique at NIH and is not available in most laboratories doing mass spectrometric characterization of proteins. Our procedure relies on the decomposition of metastable peptide ions, without subsequent collisions, in the time frame established by the selection of candidate ions after the first mass analyzer but prior to the so-called "collision cell". The method yields very reliable sequences for as many as seven peptides in the range of 7 to 18 residues taken from the tryptic proteolysis of proteins analyzed as unknowns; in the case of a 30 kDa protein, this would amount to an absolute determination of about 30% of the entire sequence of the protein. In the past, this approach has been limited by near-isobaric interferences, i.e., precursor ion masses that cannot be resolved by the mass selection electronics; these lead to fragmentation spectra that arise from multiple precursor ions and are thus uninterpretable. A recently installed sample-spotting robot, which enables LC gradients to be spatially separated onto a MALDI target plate, should reduce the number of these interferences and thus allow more extensive coverage of absolute sequences.
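The principle of reading a peptide sequence from a fragment ion spectrum can be illustrated with a minimal Python sketch. It is not the software written for this work. It assumes an idealized, complete ladder from a single series of singly charged ions (y-ions here), unmodified residues, and a hypothetical mass tolerance of 0.02 Da; the function names are illustrative. Leucine and isoleucine have identical masses, so the sketch reports them as an ambiguous call.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ion_ladder(peptide):
    """Theoretical singly charged y-ion m/z values, y1 first."""
    masses, total = [], WATER + PROTON
    for aa in reversed(peptide):      # y-ions grow from the C-terminus
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def residues_for(delta, tol=0.02):
    """All residues whose mass equals a peak spacing within the tolerance."""
    return [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]


def read_ladder(peaks, tol=0.02):
    """Translate consecutive peak spacings of one ion series into residues.

    For a y-ion ladder the calls run from the C-terminal side toward the
    N-terminus. Spacings that match no residue are reported as "?".
    """
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        matches = residues_for(high - low, tol)
        calls.append("/".join(matches) if matches else "?")
    return calls
```

Applied to the theoretical ladder of a peptide, `read_ladder(y_ion_ladder(peptide))` recovers every residue except the C-terminal one, which is contained in the first peak rather than in a spacing. A "?" call marks a spacing that no single residue explains, which is the kind of result a spectrum arising from multiple precursor ions would be expected to produce.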
In an area very closely related to the Complete de novo Sequencing work, we have developed and implemented software that increases our ability to pinpoint certain types of peptides: those containing post-translational modifications and those arising from less commonly used proteases, particularly pepsin. In addition, we have made substantial progress in the detection of phosphorylation sites using a differential MALDI approach that compares positive- and negative-ion spectra. The method employs the esterification of carboxylic acid sites using methanolic HCl; the esterified acidic residues do not ionize efficiently in negative-ion MALDI, but the phosphorylated peptides are unaffected. Within the past year we have also begun a project to characterize the protein mass fingerprints of amniotic fluid from patients who have undergone premature labor. The hypothesis of the investigation is that mass spectra can be used to differentiate premature labor that leads to pre-term delivery from premature labor that does not. The methodology under development compares MALDI mass spectra in the range of 2-20 kDa obtained from diluted amniotic fluid samples that have been desalted and then applied directly to the MALDI sample stage. We have developed an experimental design that allows us to characterize the variance in these spectra arising from a variety of experimental parameters. We have also developed a mathematical/statistical approach in MATLAB that automates both ANOVA and Principal Component Analysis and reliably differentiates between classes of samples. Preliminary results show that we are able to differentiate between the sources of amniotic fluid in groups of patients.
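The kind of per-peak variance analysis described above can be illustrated with a minimal sketch of a one-way ANOVA F statistic computed at each m/z bin. It is written in Python using only the standard library and is not the MATLAB code developed for this project. The data layout (a dictionary mapping a class label to a list of spectra on a shared m/z axis), the class labels, and the function names are all assumptions made for illustration; Principal Component Analysis is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of intensities."""
    k = len(groups)                              # number of classes
    n = sum(len(g) for g in groups)              # total number of spectra
    grand = mean(x for g in groups for x in g)   # grand mean intensity
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # No within-class scatter: F is undefined or unbounded.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_bins(spectra_by_class):
    """Rank m/z bins by how strongly they separate the sample classes.

    spectra_by_class maps a (hypothetical) class label to a list of
    spectra; every spectrum is a list of intensities on the same m/z axis.
    Returns (bin indices sorted by decreasing F, F value for each bin).
    """
    classes = list(spectra_by_class.values())
    n_bins = len(classes[0][0])
    f_values = [
        one_way_anova_f([[s[i] for s in spectra] for spectra in classes])
        for i in range(n_bins)
    ]
    order = sorted(range(n_bins), key=lambda i: f_values[i], reverse=True)
    return order, f_values
```

Bins with large F values are those in which the between-class differences are large relative to the within-class scatter, so they are natural candidates for closer inspection. In practice the spectra would first need baseline correction, normalization, and alignment onto a common m/z axis, none of which is handled in this sketch.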