The recent explosion of nucleic acid sequencing capacity has given rise to routine high-throughput analysis of transcriptional and genomic variation. The goal of the proposed research is to create a toolkit of molecular reagents to enable similarly multiplexed single molecule sequencing of proteins and peptides. Recent advances in single molecule detection make this a feasible goal. However, in contrast to nucleic acid sequencing, nature has not provided us with suitable enzymes and amino acid-identifying proteins to perform this analysis. Protein design and engineering must be utilized to generate the necessary molecular reagents. The long-range strategy we envision for protein sequencing is to perform Edman degradation on single molecules. Protein engineering will be used to adapt naturally occurring proteins with intrinsic affinity and specificity for free amino acids to serve as sequence-specific binders of N-terminal residues in peptides. Visualization will be performed with single molecule fluorescence microscopy. A cysteine protease will be engineered to remove terminal amino acids to regenerate a new peptide N-terminus for subsequent rounds of sequencing. The ready availability of proteins that recognize post-translationally modified amino acids, as well as the twenty canonical amino acids, suggests that this method can be applied to study the post-translational state, as well as the content, of the proteome. The specific aims are 1) to engineer tRNA synthetases to serve as N-terminal sequencing reagents, 2) to modify a cysteine protease to remove N-terminal amino acids that have been modified with the Edman reagent, and 3) to engineer a set of three proteins to enable the sequencing of phosphorylated amino acids. Preliminary results demonstrate that these aims are feasible. Completion of this research will move next-generation protein sequencing much closer to being a reality.
PUBLIC HEALTH RELEVANCE: The complete and quantitative analysis of the proteomic inventory is crucial for identifying and measuring biomarkers, those proteins whose levels differ between diseased and unaffected tissues. The ability to identify medical problems as early as possible improves outcomes. Determining phosphorylation state in this analysis allows even finer distinctions between the biologically relevant states of different samples. The molecules we propose to engineer for protein sequencing could enable proteomic analysis that has excellent dynamic range, is inherently quantitative, and is sensitive to amino acid phosphorylation state.