The majority of our knowledge of biological structure comes from crystals. With the structure of a macromolecule one can visualize how it works and how it interacts with other macromolecules: one can see life on an atomic scale. Among the methods employed to reveal the details of molecular structure, none rivals single-crystal X-ray diffraction for its generality of application (it accounts for 85% of the contents of the Protein Data Bank), clarity of view, and lack of ambiguity in interpretation. Crystallization, however, is a significant bottleneck in structural biology. The Hauptman-Woodward Medical Research Institute (HWI) operates a mature High-Throughput crystallization-Screening (HTS) laboratory for the general biomedical community and Structural Genomics groups. Macromolecular samples are screened against 1,536 different chemical cocktails that encompass both an incomplete factorial sampling of chemical space and examples of commercially available screens. Images of every crystallization experiment are recorded at weekly intervals for six weeks and archived. To date we have built up a library of over 90 million time-resolved images of almost 16 million crystallization experiments, covering over 10,000 biological macromolecules, each combined with the different chemical cocktails. We hypothesize that by analyzing the outcomes in terms of chemical space and dynamics, a temporal phase diagram of solubility can be constructed. From this phase diagram, optimized crystallization conditions, and the factors that drive that optimization, can be predicted. In a multidisciplinary approach, we will use this data to develop an expert crystallization knowledge system. Our aims are to (1) make the data archive readily and rapidly accessible; (2) continuously acquire and update data as it becomes available; (3) use the data to establish trends and guide crystallization; and (4) develop new crystallization knowledge from the data.
The cocktails used for crystallization screening chemically decrease the solubility of the macromolecules, driving the system to a state of supersaturation that can lead to crystallization. We will focus on structural genomics samples (~40% of our data), for which complete information about the sample is available. We will incorporate an X-ray feedback mechanism to supplement the visual data for characterization of both crystal and precipitate. Initial studies show that analyzing the outcomes in terms of chemical space and dynamics produces an empirical phase diagram of solubility over time. From these preliminary studies, which used a limited amount of the data, we have defined trajectories that traverse this space effectively, rationally guiding successful crystallization. Using screening data and historical trends, we will generate specific chemical advice, based on statistical and probabilistic analysis of the whole dataset, describing how to crystallize and optimize individual samples. We will also identify trends in crystallization behavior as a function of biochemistry. This approach will greatly improve the transfer of information from the crystallization-screening laboratory, immediately benefiting the almost 900 different laboratories that currently make use of the service. By incorporating commercially available screens, we can relate in-house data to screening results from other laboratories, expanding our analysis to develop crystallization and optimization strategies for samples beyond those we set up in the high-throughput crystallization-screening laboratory. Our data analysis will improve the success rate of crystallization in general and enable structural studies of a wider range of biologically and medically relevant macromolecules.
PUBLIC HEALTH RELEVANCE: The majority of our knowledge of biological structure comes from crystals (85% of the structures deposited in the Protein Data Bank). Crystal growth has repeatedly been identified as the rate-limiting step in macromolecular structure determination. By analyzing and using a unique data set of over 16 million crystallization experiments and 90 million images, we can improve the crystallization process by providing specific advice on optimization, and establish general predictive information for the biomedical structural biology community.