In one group of studies we have been examining the most highly conserved parts of proteins, which make up a small fraction of the total structure, typically 6-20 residues. At the same time we are investigating how various molecular properties, such as packing, correlate with the sequence-conserved regions. Understanding the structural and physical basis of sequence conservation would enable us to make critical predictions of protein cores from sequence alone. We have studied several proteins, most recently lysozyme and α-lactalbumin, and in each we find a nucleus of residues that connects all of the secondary structure elements and could act as a critical nucleus. In addition, we are developing several new approaches to threading that consider cores and sequence conservation comprehensively. We have also been studying protein folding intermediates by fluorescence, measuring intramolecular distances in order to compare them with the corresponding distances in the native structure. In the cases examined to date, the distances in the molten globule intermediates are similar to those in the native state. Another goal of computational biology is to understand molecular mechanisms. Conventional molecular dynamics simulations of protein structures have not been very informative about large-scale motions. We are therefore investigating protein dynamics with a new coarse-grained model that has only one point per residue. This approach offers a simple way to infer functional behavior from structure: it considers fluctuations about known protein structures based on a Gaussian network model. The procedure has been shown to sample satisfactorily the distribution of residue fluctuations around the native conformation and to yield remarkably good agreement with crystallographic temperature factors and hydrogen exchange data for a broad variety of protein and nucleic acid structures. Because the method is simple, its results are intuitive and compelling.
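To make the Gaussian network model concrete, the following is a minimal sketch, not our production code. It assumes one point per residue at the Cα atom, a single uniform spring constant, and a contact cutoff of 7.3 Å (an assumed, commonly used value). The Kirchhoff matrix Γ has Γ_ij = -1 for residue pairs within the cutoff and Γ_ii equal to the number of contacts of residue i. Mean-square fluctuations are proportional to the diagonal of the pseudo-inverse of Γ, so they can be compared with temperature factors through B_i = (8π²/3)⟨ΔR_i²⟩, up to the overall 3kT/γ scale. The function names and toy coordinates are hypothetical.

```python
import numpy as np

def kirchhoff_matrix(coords, cutoff=7.3):
    """GNM Kirchhoff (connectivity) matrix from one point per residue.

    coords: (N, 3) array, assumed to be C-alpha positions in angstroms.
    cutoff: contact distance in angstroms (7.3 A is an assumed default).
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)                 # -1 for each contacting pair
    np.fill_diagonal(gamma, contact.sum(axis=1))   # diagonal = number of contacts
    return gamma

def gnm_modes(gamma):
    """Nonzero eigenvalues and eigenvectors, slowest (most global) mode first."""
    eigvals, eigvecs = np.linalg.eigh(gamma)
    # A single connected structure has exactly one zero eigenvalue; drop it.
    return eigvals[1:], eigvecs[:, 1:]

def gnm_fluctuations(gamma):
    """Relative mean-square fluctuations and normalized cross-correlations."""
    vals, vecs = gnm_modes(gamma)
    pinv = (vecs / vals) @ vecs.T        # pseudo-inverse: sum_k v_k v_k^T / lambda_k
    msf = np.diag(pinv).copy()           # proportional to <dR_i^2> and hence to B_i
    corr = pinv / np.sqrt(np.outer(msf, msf))
    return msf, corr

# Toy example: a straight 20-residue chain with 3.8 A spacing (hypothetical coordinates).
chain = np.column_stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)])
gamma = kirchhoff_matrix(chain)
msf, corr = gnm_fluctuations(gamma)
slowest_mode = gnm_modes(gamma)[1][:, 0]
```

On this toy chain the ends fluctuate more than the middle, and the slowest mode changes sign exactly once, at the center; such sign changes in the slow modes are where hinge sites would be sought in a real structure.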
The approach yields a series of modes of motion, typically hinge-bending motions, including even the slowest, most global motions. In a recent methodological development we have extended the calculations from scalar to vector fluctuations, so that we can now follow the directions of translational deformations and not only their magnitudes. This opens new prospects for understanding the full functional dynamics of extremely large, even supramolecular, structures. In another computational improvement, the time required for the calculations has been reduced by a factor proportional to the size of the molecule. Our studies with this approach have included: 1) subunit communication within tryptophan synthase; 2) reverse transcriptase, in which we showed how the anticorrelated motions of the fingers/thumb binding site and the ribonuclease H site could lead to a stepwise processing mechanism, moving the nucleic acid chain through the enzyme in a release-pull-turn series of motions; 3) tRNA, free and bound to its cognate synthetase (both show similar motions whether separate or in the complex); 4) topoisomerase II, to infer connections between individual modes of motion and the enzyme's functional steps; 5) the GroEL-GroES chaperone system, an extremely large system (8800 residues), to show how the cavity is compressed in different ways and how the available binding surface changes through these motions; and 6) tubulin, where dimerization is critical for enhancing the cooperativity of motions, and the dimer's motions include a wobble between the subunits, elongation and compression along the long axis of the dimer, and twisting of the two monomers in opposite directions. Future targeted applications will include further studies of tubulin in its fibrillar form and of several other nucleic acid binding proteins. Anticipated applications include studies of binding and conformational transitions for a broad variety of proteins.
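The vector formulation is not spelled out here; one standard way to make the network model vectorial is an anisotropic network model (ANM), in which each contact contributes a 3×3 super-element to a 3N×3N Hessian. The sketch below assumes that formulation, a uniform spring constant `gamma`, and a 15 Å cutoff (both assumptions); the function names are hypothetical. It returns the nonrigid modes, slowest first, and a normalized cross-correlation map whose negative entries mark anticorrelated regions, the kind of relationship described above between the fingers/thumb and ribonuclease H sites.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """ANM Hessian (3N x 3N) from one point per residue; assumes distinct coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2    # off-diagonal super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block     # diagonal = -(sum of off-diagonals)
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess

def anm_modes(hess, n_zero=6):
    """Drop the six rigid-body modes (assumes a rigid, connected structure)."""
    vals, vecs = np.linalg.eigh(hess)
    return vals[n_zero:], vecs[:, n_zero:]

def anm_cross_correlation(hess, n_zero=6):
    """Normalized <dR_i . dR_j> from the pseudo-inverse over the nonrigid modes."""
    vals, vecs = anm_modes(hess, n_zero)
    pinv = (vecs / vals) @ vecs.T
    n = hess.shape[0] // 3
    # <dR_i . dR_j> is the trace of the (i, j) 3x3 block of the pseudo-inverse.
    cov = pinv.reshape(n, 3, n, 3).trace(axis1=1, axis2=3)
    msf = np.diag(cov)
    return cov / np.sqrt(np.outer(msf, msf))

# Toy example: four mutually contacting points forming a tetrahedron (hypothetical).
tetra = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
hess = anm_hessian(tetra, cutoff=2.0)
vals, vecs = anm_modes(hess)
corr = anm_cross_correlation(hess)
```

For the tetrahedron the Hessian has exactly six zero eigenvalues (three rigid translations and three rotations), which `anm_modes` discards; each remaining eigenvector gives a direction for every residue, which the scalar model cannot provide.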
The same mathematical formalism used to calculate protein motions (singular value decomposition) has also been applied to analyze the cell-line screening data. It was possible to cluster the 122 agents into 25 distinct groups, and to classify the cell lines themselves, in a highly systematic way. The 60 cell lines cluster into 21 groups, with the strongest groupings found for the renal, leukemia and ovarian cancer lines.
Z01 BC 08370-17
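The clustering rule applied to the screening data is not stated here. As a rough illustration only, the sketch below assumes the data form an agents × cell-lines matrix (for example, growth-inhibition values), centers each column, takes the singular value decomposition, and places agents and cell lines in the same space of leading singular vectors. The grouping rule (label each row by its dominant component and the sign of that score) is a hypothetical simplification, and the toy matrix is invented; hierarchical or k-means clustering in the reduced space would be a natural alternative.

```python
import numpy as np

def svd_embed(data, k):
    """Column-center a (rows x columns) matrix and project both axes onto the top-k singular vectors."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)          # remove each cell line's mean response
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    row_scores = u[:, :k] * s[:k]                # agents in the reduced space
    col_scores = vt[:k].T * s[:k]                # cell lines in the same reduced space
    return row_scores, col_scores, s

def group_by_dominant_component(scores):
    """Hypothetical rule: label each row by (index, sign) of its largest-magnitude score."""
    idx = np.argmax(np.abs(scores), axis=1)
    signs = np.sign(scores[np.arange(len(scores)), idx]).astype(int)
    return list(zip(idx.tolist(), signs.tolist()))

# Toy data: 6 hypothetical agents x 8 hypothetical cell lines with two response patterns.
pattern_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
pattern_b = 1.0 - pattern_a
responses = np.vstack([pattern_a] * 3 + [pattern_b] * 3)
agent_scores, line_scores, s = svd_embed(responses, k=2)
agent_groups = group_by_dominant_component(agent_scores)
line_groups = group_by_dominant_component(line_scores)
```

Because the same decomposition yields scores for rows and columns, agents and cell lines can be grouped in one pass, which mirrors how both the 122 agents and the 60 cell lines were classified above.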