Mass spectrometry has become a method of choice for identifying and characterizing small quantities of proteins in complex mixtures. However, the ability to perform the identification in a high-throughput fashion has depended on the availability of protein sequence databases. This means that proteins from organisms with unsequenced genomes (e.g. peptide toxins) and proteins that modify their primary sequence rapidly in response to the environment (e.g. antibodies) have been excluded from high-throughput analysis. We propose to develop algorithms and software, and to improve laboratory methods, so that sequencing of antibodies and peptide toxins becomes a fast and low-cost effort. This will allow us to access the circulating antibody repertoire of individuals for clinical applications including vaccine development, and to access the vast number of bioactive venom components for basic research and ion-channel drug development. For the laboratory improvements, antibody peptides and toxins will be chemically labeled to improve spectral quality, and we will use different types of mass spectrometric fragmentation. Data acquisition will be optimized to facilitate identification of diagnostically relevant peptides, and a gas-phase digestion strategy will be used to increase the sequence coverage for larger peptides. We propose to develop improved algorithms for sequencing of antibodies and peptide toxins. These will integrate de novo and database sequencing and will include candidate generation algorithms incorporating multiple channels of information: spectra from different charge states and fragmentation methods, homology constraints, composition constraints, and in silico mutation of databases. Improved scoring algorithms will also be developed using subtle spectral cues that are currently used only in manual de novo sequencing. We will produce prototype software and benchmark it against manually annotated mass spectra.
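As an illustration of what candidate generation by in silico mutation of a database could look like, the following is a minimal sketch and not the proposed software. It assumes a single homologous template peptide, considers only single-residue substitutions, and filters candidates by precursor mass alone; the function names (`peptide_mass`, `single_substitutions`, `candidates`) and the 10 ppm default tolerance are hypothetical choices made for this example. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the residue sum for a free peptide


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def single_substitutions(seq):
    """Yield every sequence that differs from seq by one residue."""
    for i, old in enumerate(seq):
        for new in MONO:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def candidates(template, observed_mass, tol_ppm=10.0):
    """Single-substitution variants of template matching observed_mass.

    Hypothetical filter: precursor mass only. A real pipeline would go on
    to score each candidate against the fragment spectrum.
    """
    tol = observed_mass * tol_ppm * 1e-6
    hits = {
        variant
        for variant in single_substitutions(template)
        if abs(peptide_mass(variant) - observed_mass) <= tol
    }
    return sorted(hits)
```

For example, `candidates("PEPTIDE", peptide_mass("PEPTIDK"))` returns both `PEPTIDK` and `PKPTIDE`: the two sequences share a precursor mass, so mass alone cannot localize the substitution. Ambiguities of this kind are where additional channels of information, such as fragment spectra and homology constraints, would be needed to choose among candidates.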
The software will then be applied to automatically sequence a large set of antibody data from HIV long-term non-progressors, as well as spider and cone snail toxin data. PUBLIC HEALTH RELEVANCE: We propose to develop algorithms and software that make sequencing of antibodies and peptide toxins a fast and low-cost effort. This will enable the routine generation of large numbers of antibody and peptide toxin sequences, a critical step for vaccine and ion-channel drug development, respectively.