C. Project Description

C.1 Intellectual Merit

The proposal team has more than ten years of successful collaboration, supported by the NSF, NIH, and NIGMS, focused on computational protein docking, one of the most challenging problems in computational structural biology. This collaboration has led to the development of novel mathematical models and algorithms that, beyond proving effective in protein docking, are applicable to several other application domains. At the same time, it can be argued that computational protein docking, as practiced by the global community of researchers, has achieved some degree of maturity without having fully attained its goals. The goal of the current proposal is twofold: (i) take advantage of the particular opportunity accorded by the DMS/NIGMS Initiative to develop a set of new mathematical methods and algorithms that can be the basis of the next generation of tools applied to the computational docking problem, and (ii) address a set of specific problems in computational protein docking that are identified in the proposal.

The problem of protein docking is defined as predicting the three-dimensional structure of the docked complex based on the structures of the individual components. Experimental techniques for this purpose are often expensive, time-consuming, and in some cases not feasible; hence the need for computational docking methods. The problem of finding the docked complex (the native conformation) is generally formulated as the minimization of an energy-based scoring function. The scoring function is often composed of multiple energy terms that act on different spatial scales and exhibit multi-frequency behavior, leading to an enormous number of local minima. Furthermore, the process of docking/binding involves conformational changes to the component molecules, leading to a highly complex search space for the optimization problem.
These features render the optimization problem extremely difficult. Most state-of-the-art docking protocols, including ours, employ a multi-stage and multi-scale approach. They begin with a global search of the conformational space using a simplified scoring function in order to identify promising areas of the space. This stage is followed by local optimization using a more detailed and complete scoring function in order to remove clashes. In the final, so-called refinement, stage, promising areas found in the first two stages are explored further using a medium space-scale search in order to provide a set of final solutions.

It has recently become evident that, due to the inaccuracy of the scoring function/energy potentials, the optimization stages outlined above can only lead to solutions that lie in some neighborhood of the real solution/native conformation, and that they invariably generate a number of false positives at the final phase, namely, conformations that have a low score but are far from the native conformation. This motivates us to introduce in this proposal learning methods that combine energy with additional features in order to rank clusters of conformations at the refinement stage and improve the final solutions. These methods will also be used in the proposed project to distinguish between binders and non-binders, a problem of great importance that goes beyond structure prediction.

The methodology development part of the proposal has two distinct thrusts: optimization and learning. On the optimization front, the methods that the group has already developed are based on formulating the optimization problem as an optimization on manifolds, more specifically on Lie groups. This formulation has been the basis for a dimensionality reduction approach that has led to more efficient algorithms and, equally important, to a biologically informative explanation of the docking process.
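
To make the idea of optimization on a Lie group concrete, the following is a minimal sketch, not the algorithms developed by the group. It assumes a toy quadratic "energy" (the squared distance between rotated points and target points) in place of a docking scoring function, restricts the search to the rotation group SO(3), and uses plain gradient descent with the exponential map. All function names (`exp_so3`, `energy`, `descend_on_so3`), the sample points, and the step-size rule are hypothetical choices made for illustration.

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: the rotation matrix exp(hat(w)) for a rotation vector w."""
    theta = math.sqrt(sum(c * c for c in w))
    K = [[0.0, -w[2], w[1]],
         [w[2], 0.0, -w[0]],
         [-w[1], w[0], 0.0]]
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def energy(R, xs, ys):
    """Toy energy (an assumption, not a docking score): 0.5 * sum |R x - y|^2."""
    return 0.5 * sum(sum((p - q) ** 2 for p, q in zip(matvec(R, x), y))
                     for x, y in zip(xs, ys))

def descend_on_so3(xs, ys, iters=200):
    """Gradient descent on SO(3): every iterate is an exact rotation matrix."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Conservative step size for this toy energy (hypothetical choice).
    step = 1.0 / sum(sum(c * c for c in x) for x in xs)
    for _ in range(iters):
        g = [0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            p = matvec(R, x)
            r = [pi - yi for pi, yi in zip(p, y)]
            # Derivative of the energy along R -> exp(hat(w)) R at w = 0.
            g = [gi + ci for gi, ci in zip(g, cross(p, r))]
        # Move along the group via the exponential map instead of adding
        # a correction to the matrix entries.
        R = matmul(exp_so3([-step * gi for gi in g]), R)
    return R

# Usage: recover a known rotation of 1 radian about the z-axis.
xs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
R_true = exp_so3([0.0, 0.0, 1.0])
ys = [matvec(R_true, x) for x in xs]
R_est = descend_on_so3(xs, ys)
print(energy(R_est, xs, ys))
```

The point of the sketch is the update rule: the search variable stays on the group at every step, so no constraint handling or re-orthogonalization is needed, and the search direction lives in a three-dimensional tangent space rather than in the nine matrix entries.
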
In one of the tasks of this proposal we present a new representation of the Lie groups involved and a new Riemannian metric, and we argue that they are better suited to the optimization problems at hand and will address some of the shortcomings of the existing algorithms. On the learning front, using novel robust optimization techniques, we introduce a new and more rigorous approach to robust regression, classification, and outlier detection, in order to (i) obtain improved ranking of clusters in the refinement stage, and (ii) address the important problem of distinguishing between