DESCRIPTION: The long-term objective of this proposal is to develop algorithms capable of predicting the low-resolution tertiary structure of globular proteins using a threading-based folding approach termed the topology fingerprint method. To facilitate this development, five distinct testing criteria will be formulated. These will allow rapid and objective assessment of the efficacy of a given protein representation and energy parameterization. To identify the variables crucial for fold recognition, a "reverse engineering" approach will be employed. Here, one first assumes that the property of interest is known exactly. If including this property in the threading algorithm greatly enhances sequence-structure recognition, an attempt is then made to predict it at the requisite level of accuracy; if the property proves irrelevant, it is not included. To date, reverse engineering strongly suggests that a major error of our current approach is its failure to include the correct identity of the interacting residue pairs when threading with gaps in the sequence. Thus, an improved treatment of pair contributions to the potential will be developed. Furthermore, we will address a principal limitation of contemporary threading algorithms, namely that an example of the global fold must already be known. To accomplish this, and to generate all possible topologies consistent with established knowledge-based rules for the arrangement of supersecondary structure, the protein is viewed as composed of "U" turns, where the chain reverses its global direction, and of the secondary structural elements, or blocks, lying between such "U" turns. These quantities can be predicted with rather high accuracy using our recently developed algorithms. Once the number of topological elements has been predicted, graph theory is used to enumerate all topologies consistent with this prediction. The predicted structures will be constructed from fragments excised from known protein structures and then recombined.
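As an illustration only, the graph-theoretic enumeration step can be pictured with a deliberately simplified toy model that is our own assumption, not the proposal's method. The predicted blocks are placed at positions 0..n-1 in a single layer, each "U" turn is an edge joining the positions of two sequence-consecutive blocks, and a hypothetical knowledge-based rule caps how many positions apart a single turn may reach. Every allowed topology is then a Hamiltonian path through this position graph, traced in N-to-C order. The function name `enumerate_topologies`, the parameter `max_sep`, the single-layer geometry, and the treatment of mirror images as equivalent are all illustrative assumptions.

```python
from itertools import permutations
from typing import List, Tuple


def enumerate_topologies(n_blocks: int, max_sep: int = 2) -> List[Tuple[int, ...]]:
    """Enumerate single-layer block arrangements (hypothetical toy model).

    A topology is a tuple pos where pos[i] is the layer position of the
    i-th block in sequence order. Because pos is a permutation and every
    step between consecutive blocks must be an allowed edge, each accepted
    tuple is a Hamiltonian path through the position graph.
    """
    topologies: List[Tuple[int, ...]] = []
    seen = set()
    for pos in permutations(range(n_blocks)):
        # Hypothetical rule: one "U" turn may span at most max_sep positions.
        if any(abs(pos[i + 1] - pos[i]) > max_sep for i in range(n_blocks - 1)):
            continue
        # Simplifying assumption: viewing the layer from its other face maps
        # position p to n_blocks - 1 - p; keep one representative per pair.
        mirror = tuple(n_blocks - 1 - p for p in pos)
        key = min(pos, mirror)
        if key in seen:
            continue
        seen.add(key)
        topologies.append(key)
    return topologies


if __name__ == "__main__":
    print(enumerate_topologies(3))             # [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
    print(enumerate_topologies(4, max_sep=1))  # [(0, 1, 2, 3)]
```

Brute-force iteration over permutations grows as n!, which is acceptable only for the small block counts of a toy example; a practical enumerator would prune partial paths as soon as a rule is violated.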
Full-atom models will be built, and their threading energies will be used to validate them and to select the predicted native fold. Finally, a divide-and-conquer strategy is proposed for the rapid screening of massive sequence libraries. A cascade of sequence-based, mixed sequence/threading, and full threading algorithms will be assembled. Sequences whose topology is identified with high reliability by a given protocol are successively filtered out, leaving the most difficult cases. Some of these may be assigned with high reliability to a given topology, while for others a set of possible folds will be proposed. Thus, a robust protocol capable of handling the plethora of sequences provided by the Human Genome Project will be developed.
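The filtering cascade can be sketched as follows, assuming a hypothetical interface in which each stage takes a sequence and returns either a `(fold_label, reliability)` pair or `None`. The function `run_cascade`, the `threshold` parameter, and the two toy stages are illustrative placeholders, not the proposal's actual sequence-based or threading scorers; the mixed sequence/threading stage is omitted for brevity but would simply be another entry in the ordered `stages` list.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage interface: return (fold_label, reliability) or None.
Stage = Callable[[str], Optional[Tuple[str, float]]]


def run_cascade(
    sequences: List[str],
    stages: List[Tuple[str, Stage]],
    threshold: float = 0.9,
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Pass sequences through stages ordered from cheapest to most expensive.

    A sequence leaves the pool as soon as a stage assigns it a fold with
    reliability >= threshold; only unresolved sequences reach later stages.
    Returns (assigned, unresolved), where assigned maps each sequence to
    (fold_label, name_of_the_stage_that_assigned_it).
    """
    assigned: Dict[str, Tuple[str, str]] = {}
    remaining = list(sequences)
    for stage_name, stage in stages:
        unresolved = []
        for seq in remaining:
            result = stage(seq)
            if result is not None and result[1] >= threshold:
                assigned[seq] = (result[0], stage_name)
            else:
                unresolved.append(seq)
        remaining = unresolved
    return assigned, remaining


# Toy placeholder stages standing in for real scorers (purely illustrative).
def toy_sequence_stage(seq: str) -> Optional[Tuple[str, float]]:
    return ("fold_A", 0.95) if seq.startswith("MK") else None


def toy_threading_stage(seq: str) -> Optional[Tuple[str, float]]:
    return ("fold_B", 0.92) if "GG" in seq else ("fold_C", 0.5)


if __name__ == "__main__":
    done, hard = run_cascade(
        ["MKTAY", "AGGLV", "PLVQS"],
        [("sequence", toy_sequence_stage), ("threading", toy_threading_stage)],
    )
    print(done)  # {'MKTAY': ('fold_A', 'sequence'), 'AGGLV': ('fold_B', 'threading')}
    print(hard)  # ['PLVQS']
```

Ordering the stages from cheapest to most expensive means the costly full-threading step only sees sequences that the faster filters could not resolve; the sequences left in `unresolved` are the difficult cases for which a set of candidate folds would be proposed.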