Given the key role that crystallography plays in structural biology, the difficulty of protein crystallization remains a significant bottleneck for a broad range of research programs. We are developing a data-mining framework, PROSPERO, that can predict crystallization success and crystal diffraction quality from specific protein properties that can be characterized before large-scale crystallization trials begin. This will increase the efficiency and overall success rates of diffraction studies, both in individual research programs and in genome-scale projects. PROSPERO will perform a meta-analysis of many individual predictors based on statistical and machine-learning methods. A key feature of this framework is that it can dynamically re-estimate success and failure rates based on the current contents of the underlying database and on the set of physical characterization data provided by individual users. The design will be modular: we will define a standard set of application programming interfaces (APIs) for supplying new categories of data to the core data storage, meta-analysis, and prediction components. This will allow PROSPERO to be tailored to individual research programs and to target-specific physical properties, and to incorporate new physical characterization techniques. Our long-term goal is to grow a user community that will benefit from the continually improving predictions made by a central PROSPERO web server, contribute new input modules based on data produced by standard laboratory protocols and apparatus, and help populate the underlying database of results used for prediction.

PUBLIC HEALTH RELEVANCE: X-ray crystallography is a core technique in fundamental research programs that seek to understand disease mechanisms based on the three-dimensional structure of individual proteins, of large multi-protein complexes, and of larger assemblies of proteins and nucleic acids that form key components of the cell. It is also a core technique in highly targeted research programs such as the design of new drugs. This work will increase the efficiency and overall success rate of these research programs by easing a key bottleneck: the difficulty of obtaining high-quality crystals of the biological entity being studied.