PROJECT SUMMARY
Specific protein-protein interactions are responsible for organizing the cell, for processing biological signals and information, and for the chemistry of life. Thus, understanding biological mechanisms relies on understanding the interactions that occur between proteins. An important long-term goal is to develop methods for reliably predicting and rationally modifying protein-protein interactions. Such capabilities would provide insight into the molecular details of pathology and highlight opportunities for disease treatment. This proposal describes an integrated experimental/computational technology platform that will provide predictive models of protein interaction specificity. The experimental component involves constructing randomized libraries of proteins or peptides that will be sorted according to their affinities for binding a particular receptor. The identities and binding affinities of very large numbers of library members will be decoded using high-throughput sequencing methods. The data, consisting of up to 10^7 {sequence, affinity} pairs per sequencing run, will be used as input to computational machine learning methods. Models will be generated that capture the relationship between sequence and interactions, and the predictive power of these models will be tested experimentally. The work described in this proposal emphasizes technology development and application of the new platform to study two general types of protein complexes. First are interactions of short helical ligands with mid-sized globular proteins, here studied using anti-apoptotic Bcl-2 and Ca2+-binding EF-hand proteins. Second are interactions of short linear peptides with modular interaction domains, here PDZ and SH3 domains. These four protein families mediate an enormous number of important molecular recognition events in human cells, and the resulting models will provide valuable support for the study of their biological functions.
This work will also provide a stringent test of the capabilities of the proposed technology, which can then be applied to a much wider variety of molecular complexes, e.g., protein-protein, protein-small molecule and protein-nucleic acid assemblies. Given the paucity of high-throughput methods for accurately measuring protein-protein interactions, and the primitive capabilities of most computational models for predicting protein binding, the proposed technology platform has the potential to dramatically transform the study of protein interaction specificity.