Generalizations of the inverse protein folding approach, which assesses the compatibility of a sequence with a given structure, are proposed. The method uses a "topology fingerprint" defined by the buried/unburied assignment of positions along the sequence and the pattern of interactions between side chains. This technique has successfully identified topological homologues of known proteins whose sequence similarity is essentially at the random level. In particular, the following developments are proposed: (1) The method, as applied to the identification of sequence-structure compatibility, will be extended. The empirical energy functions will be improved by incorporating new structural data and by including interactions with prosthetic groups and ions. Side chain excluded volume, currently neglected in all inverse folding approaches, will be included; this is crucial for the design of optimized sequences for a given topology. In a new approach, the preferences of groups of amino acids to adopt specific side chain packing patterns will be accounted for. In addition, a number of approximations previously made to allow for the introduction of gaps in the sequence-structure alignment will be relaxed. The focus will be on modifying the original fingerprint environment to reflect changes accompanying the introduction of a new sequence into the fold. Each parameter set and topology fingerprint realization will be tested by examining its ability to predict the stability of proteins subject to point mutations, to match the correct sequence to its structure when the fingerprint is screened against a large sequence database, and to match topologically similar but sequentially unrelated proteins. (2) The approach will be applied to identify protein subdomains, i.e., fragments of tertiary structure, which exhibit sequence-structure specificity. The algorithm will be used to recognize other sequences which adopt similar structural fragments.
If successful, this would allow prediction of supersecondary elements such as alpha/beta/alpha fragments. (3) The major limitation of all existing inverse folding approaches, namely the requirement that an experimentally determined example of the structure be present in the fingerprint library, will be addressed. By studying interaction patterns within and between supersecondary elements, it might be possible to build a protein by superimposing predicted supersecondary elements. Indeed, hitherto unknown topologies could be constructed. Furthermore, knowledge of side chain packing patterns between various protein fragments will allow fingerprints to be built from conjectured and/or idealized topology diagrams. Thus, the library of possible folding motifs will be greatly expanded. (4) Applying the techniques of (1)-(3), the hitherto unknown tertiary structures of several proteins which are being studied experimentally (e.g., rusticyanin) will be predicted. Finally, sequences optimized for a given topology will be designed, and the differences between topological and sequence similarity will be addressed. Both local mutations starting from native sequences and the design of optimized sequences starting from random sequences will be undertaken. As a test, a minimal plastocyanin sequence will be designed. Overall, the objective is the development of tools for the partial solution of the protein folding problem.
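
As an illustration only, the sequence-structure compatibility test described above can be sketched as a sum of a one-body burial term and a two-body side chain contact term evaluated over a fingerprint. Everything specific in the sketch is an assumption made for illustration and is not taken from the proposed method: the names `HYDROPHOBIC`, `burial_energy`, `pair_energy` and `fingerprint_score`, the two-class hydrophobic/polar alphabet, the toy energy values, and the gapless one-to-one alignment of sequence to fingerprint positions. The empirical energy functions, gap treatment and excluded volume terms of the actual approach are not represented.

```python
from typing import List, Tuple

# Assumed toy classification of residues (one-letter codes); not an empirical scale.
HYDROPHOBIC = set("AVLIMFWC")


def burial_energy(residue: str, buried: bool) -> float:
    """Toy one-body term: favorable when residue type matches the burial state."""
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if hydrophobic == buried else 1.0


def pair_energy(a: str, b: str) -> float:
    """Toy two-body term: favorable only for hydrophobic-hydrophobic contacts."""
    return -0.5 if (a in HYDROPHOBIC and b in HYDROPHOBIC) else 0.0


def fingerprint_score(
    sequence: str,
    buried: List[bool],
    contacts: List[Tuple[int, int]],
) -> float:
    """Score a sequence threaded onto a fingerprint without gaps.

    buried[i]  : burial assignment of fingerprint position i
    contacts   : pairs (i, j) of positions whose side chains interact
    Lower scores indicate better compatibility in this toy model.
    """
    if len(sequence) != len(buried):
        raise ValueError("gapless sketch: sequence and fingerprint lengths must match")
    one_body = sum(burial_energy(r, b) for r, b in zip(sequence, buried))
    two_body = sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)
    return one_body + two_body


# Hypothetical four-position fingerprint: positions 0 and 2 are buried and in contact.
toy_buried = [True, False, True, False]
toy_contacts = [(0, 2)]

compatible = fingerprint_score("LKVE", toy_buried, toy_contacts)
incompatible = fingerprint_score("KLEV", toy_buried, toy_contacts)
```

In this toy model the sequence that places hydrophobic residues at the buried, contacting positions scores lower than the sequence that exposes them. Screening a fingerprint against a sequence database would amount to ranking such scores over all candidate sequences.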