The broad goal is to develop and apply computational methods for building data-derived models of the structure and dynamics of proteins and their assemblies. These models can give insights into how the assemblies work, how they evolved, how they can be controlled, and how similar functionality can be designed. One successful approach, integrative structure modeling, casts the building of such models as a computational optimization problem in which all knowledge about the assembly is encoded into the scoring function used to evaluate candidate models. We propose to extend and enhance the open-source Integrative Modeling Platform (IMP; http://integrativemodeling.org/), which provides programmatic support for developing and distributing integrative structure modeling protocols. IMP allows representation of molecules at a variety of resolutions, use of scoring functions based on many types of data, and searches for solutions by a variety of sampling algorithms. In addition, IMP is easily extensible to add support for new data sources and algorithms, and is distributed under an open-source license, with more than 300 unique downloads since March 2010. So far, it has been applied mostly to data from electron microscopy, small-angle X-ray scattering, and various proteomics methods. The package will be extended to address a greater range of biological problems and to make it more generally useful to the scientific community. Specifically, the traditional scoring functions used by IMP will be supplemented with inference-based scoring functions that extract the maximum possible information from the data. The formulation of these functions will follow a Bayesian approach with minimal assumptions and approximations, to account for errors and incompleteness in the data as well as for sample heterogeneity.
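As a rough illustration of what an inference-based score could look like, the sketch below scores a candidate model by the negative log of a posterior probability. It is not IMP code and does not use the IMP API: the function name `neg_log_posterior`, the Gaussian noise model, and the Jeffreys prior on the noise level `sigma` are all assumptions chosen for illustration.

```python
import math


def neg_log_posterior(model_distances, observed_distances, sigma):
    """Hypothetical Bayesian score: -log[p(data | model, sigma) * p(sigma)].

    Assumes independent Gaussian noise with unknown width sigma and a
    Jeffreys prior p(sigma) proportional to 1/sigma. Lower is better.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(model_distances) != len(observed_distances):
        raise ValueError("model and data must have the same length")

    n = len(observed_distances)
    # Sum of squared deviations between the data and the model's prediction.
    chi2 = sum((obs - mod) ** 2
               for obs, mod in zip(observed_distances, model_distances))
    # Negative log of the Gaussian likelihood, including its normalization.
    neg_log_likelihood = (n * math.log(sigma)
                          + chi2 / (2.0 * sigma ** 2)
                          + 0.5 * n * math.log(2.0 * math.pi))
    # Negative log of the Jeffreys prior (up to an additive constant).
    neg_log_prior = math.log(sigma)
    return neg_log_likelihood + neg_log_prior


# Toy data: two observed distances and two candidate models.
observed = [10.0, 20.0]
good_model = [10.0, 20.0]
bad_model = [15.0, 25.0]

score_good = neg_log_posterior(good_model, observed, sigma=1.0)
score_bad = neg_log_posterior(bad_model, observed, sigma=1.0)
score_bad_noisy = neg_log_posterior(bad_model, observed, sigma=5.0)
```

Because `sigma` enters the score as a parameter rather than a fixed constant, it can be sampled together with the structure, so the data themselves indicate how noisy they are: in this toy case the poorly fitting model scores better under a larger assumed error.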
Sampling of the scoring function landscape will be improved by a method that efficiently divides the complete set of degrees of freedom into potentially overlapping subsets, finds optimal and suboptimal solutions for each subset independently using traditional optimizers or enumeration, and then combines compatible solutions to obtain guaranteed best-scoring solutions for the whole system. IMP will also be extended to make the best use of the wealth of information provided by mass spectrometry. To maximize the impact of IMP and its utility to the community, it will be interfaced with other packages, including structure viewers such as Chimera, structure prediction and design programs such as Rosetta, and web portals such as the Protein Model Portal. Finally, the software will be well tested and documented, and the growing IMP community will be supported with mailing lists, examples, demonstrations at workshops, and hosting of selected users at UCSF.

PUBLIC HEALTH RELEVANCE (Project Narrative): We propose to extend IMP, a computer program that can describe the three-dimensional shapes of large macromolecular machines whose structures cannot be determined by any single experimental technique. These structures will allow us to better understand the workings of the cell under both normal and disease conditions.
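The divide-and-conquer sampling idea described in the summary above can be illustrated with a toy example. The sketch below is not IMP code: the function names, the discrete one-dimensional "positions", and the three restraints are hypothetical. It assumes every restraint depends only on the variables of a single subset and is assigned to exactly one subset, and it keeps all partial solutions rather than pruning, which is what makes the optimum guaranteed in this small case.

```python
from itertools import product


def enumerate_subset(variables, states, restraints):
    """Score every assignment of one subset of the degrees of freedom."""
    solutions = []
    for values in product(*(states[v] for v in variables)):
        assignment = dict(zip(variables, values))
        score = sum(restraint(assignment) for restraint in restraints)
        solutions.append((score, assignment))
    return solutions


def compatible(a, b):
    """Two partial solutions are compatible if shared variables agree."""
    return all(b[key] == value for key, value in a.items() if key in b)


def merge(solutions_a, solutions_b):
    """Combine every compatible pair of partial solutions, adding scores."""
    return [(score_a + score_b, {**a, **b})
            for score_a, a in solutions_a
            for score_b, b in solutions_b
            if compatible(a, b)]


def best_solution(subsets, states):
    """Solve each subset independently, then merge into a global optimum."""
    combined = [(0.0, {})]
    for variables, restraints in subsets:
        combined = merge(combined,
                         enumerate_subset(variables, states, restraints))
    return min(combined, key=lambda solution: solution[0])


# Hypothetical restraints on three particles placed on a 1D grid.
def anchor_a(s):
    return (s["A"] - 0) ** 2                  # A should sit at position 0


def distance_ab(s):
    return (abs(s["A"] - s["B"]) - 1) ** 2    # A and B should be 1 apart


def distance_bc(s):
    return (abs(s["B"] - s["C"]) - 2) ** 2    # B and C should be 2 apart


states = {name: [0, 1, 2, 3] for name in "ABC"}
# Two overlapping subsets that share the variable B.
subsets = [(["A", "B"], [anchor_a, distance_ab]),
           (["B", "C"], [distance_bc])]

best_score, best_assignment = best_solution(subsets, states)
```

Here each subset is enumerated over 16 assignments instead of the 64 needed for the full system, and only pairs that agree on the shared variable `B` are combined. For realistic problem sizes, keeping every partial solution would be too expensive, so some bounded form of pruning would be needed.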