Large-scale biological datasets (e.g., genetic and protein-protein interactions) are becoming easier to produce systematically in a variety of organisms, but it remains difficult to extract from them testable hypotheses about how individual proteins function. The overall objective of this research is to develop an experimental and computational platform that helps to bridge this gap between high-throughput information at the genomic scale and detailed mechanistic analysis of biological processes at the protein, protein domain and amino acid residue scale. To achieve this, the proposal integrates the complementary expertise of two investigators at the University of California-San Francisco in structural biophysics and computational protein modeling and design (Tanja Kortemme) and in large-scale, quantitative genetic and protein-protein interaction mapping strategies (Nevan Krogan). This work will focus on the specificity and promiscuity of protein recognition domains, which mediate a considerable fraction of interactions in all biological processes. The central hypothesis this project will test is that there exist biologically important differences between the functional and biochemical overlap of members of a domain family. To test for such differences, we will simultaneously characterize the functional processes in which all members of a major domain family are involved, and how these functions relate to the intrinsic protein recognition preferences of the family members. As a proof of principle, we aim to interrogate the family of 23 SH3 domain-containing proteins in the model organism S. cerevisiae. SH3 domains have considerable biological importance: they are involved in several critical processes in signal transduction, reorganization of the actin cytoskeleton, stress response and endocytosis. More practically, SH3 domains were selected as a manageable model system because of the amount of structural and biochemical data accumulated for this domain family.
Aim 1 uses an unbiased, large-scale genetic interaction mapping strategy to interrogate SH3 domain deletions in all SH3-containing proteins in budding yeast so that their in vivo relevance can be studied. Aim 2 proposes to use this information, along with previously published physical interaction data, to aid in structure-based predictions of the recognition specificity of individual SH3 domains. Computational strategies using RosettaDesign will be used to reengineer domains to tune interaction specificity and promiscuity. These predictions will be tested in Aim 3 using biochemical, functional and genetic approaches, and the resulting data will be used to refine the models generated in Aim 2. In the future, we intend to extend our findings and the experimental platform that this project seeks to establish to other species, initially to fission yeast, but ultimately to higher organisms. We expect the resulting framework to be broadly informative for applications in molecular reengineering as well as for the development of therapeutics acting on interconnected protein networks. PUBLIC HEALTH RELEVANCE: While our knowledge of specific disease-causing pathways has been increasing rapidly, our ability to harness this knowledge for therapeutic strategies has been hampered by the complex, interconnected nature of the different pathways. We aim to develop a widely applicable experimental and computational platform that will enable a more comprehensive understanding of how disease-relevant pathways are interconnected and therefore how they influence each other. We expect that widespread use of our platform will contribute to the development of more effective therapeutics with fewer unwanted side effects.