Mass spectrometry has become a method of choice for identifying and characterizing small quantities of proteins in complex mixtures. However, high-throughput identification has depended on the availability of high-quality protein sequence databases. As a result, proteins from organisms with unsequenced or poorly sequenced genomes (e.g., peptide toxins) and proteins whose primary sequences change rapidly in response to the environment (e.g., antibodies) have been excluded from high-throughput analysis. The widespread availability of Next Generation Sequencing (NGS) has not alleviated this problem; rather, it has led to a proliferation of lower-quality, uncurated protein sequence databases, including personalized databases, cancer databases, and databases with uncertain assembly. The traditional division between database-search proteomics and de novo peptide sequencing no longer holds; many of the most interesting biological questions are now best addressed by data analysis that combines the strengths of both techniques. In this Phase II STTR project, we propose to develop two commercial software products for sequencing biologically interesting peptides and proteins, regardless of the quality of the available sequence databases. One product will be aimed at the peptide level, with applications to variable regions of circulating antibodies, peptide toxins, and human leukocyte antigens. The other product will be aimed at the protein level, with applications to end-to-end sequencing of purified proteins, especially therapeutic monoclonal antibodies. This protein-level system will include the peptide-level sequencer as a component, as well as tools for assembling the peptides into the full sequence and for visualization and manual validation. The proposed project has the potential for great impact on human health in areas such as vaccine development, therapeutic antibody development, and cancer immunotherapies.