Project Summary/Abstract

Allostery is action at a distance in proteins: a perturbation at one site of a protein causes an effect at a distant site of the same protein. As nature's biological switches, allosteric proteins regulate virtually every major cellular process, including catalysis, transcription, transport and signaling. Kinases, G-protein-coupled receptors and nuclear receptors are therapeutically important allosteric proteins that play a major role in human health. Disruption of allosteric communication by mutations in these proteins is strongly associated with many diseases, including cancer. Since the discovery of allostery in the 1960s, the molecular mechanism by which a perturbation at one end of a protein is transmitted to the other end has remained a mystery. Understanding the molecular "rules" governing allostery is a fundamental problem in protein biochemistry and biophysics. Such rules may provide deep insight into how proteins work through interactions between residues.

Current approaches to studying allostery, whether biophysical, structural or computational, have two limitations. First, they provide an incomplete picture. Allostery involves the interplay of protein dynamics, structural changes and their effects on function. Probing each independently, as most studies do, does not give a complete understanding: structure alone cannot explain dynamics, dynamical movement does not imply a functional role, and functional studies may not provide insight into the underlying molecular causes. Second, detailed mechanistic studies of individual allosteric proteins, while invaluable, are tedious and cannot be scaled up to investigate many types of allosteric proteins. As a result, unifying heuristic rules are difficult to infer from such individual case studies.
The goal of my research is to develop a generalized, scalable method that integrates structure, function and dynamics to understand, quantify and predict the molecular drivers of protein allostery. With allosteric transcription factors as a model system, we will determine the residues important for allostery ("hotspots") and their connectivity ("pathways") by deep mutational scanning, a method for large-scale functional characterization. We will establish a critical link between structure and function by computationally modeling each mutation with Rosetta. We will then apply machine learning to this rich sequence-structure-function dataset to recognize common molecular features of allosteric hotspot residues (van der Waals interactions, electrostatics, hydrogen bonds, etc.) and to build a predictive molecular model of allostery. Thus, we will integrate high-throughput functional studies, structure-based modeling and machine learning to understand the molecular rules governing allostery. Any allosteric protein whose activity can be coupled to a high-throughput screen, of which there are many, is amenable to our approach. As the number and diversity of allosteric protein datasets increase over time, we expect the accuracy and generalizability of predictions to improve continuously. Our long-term objective is to (a) predict the functional impact of the thousands of mutations in disease-associated allosteric proteins revealed by genome sequencing and (b) discover novel allosteric sites that improve the selectivity and efficacy of drugs.