We are examining theoretical aspects of the structure, function, and evolution of proteins, with emphasis on protein sequences and on those problems for which a computer is essential. Using sequence data, we detect distant relationships among proteins and infer both evolutionary trees of proteins and phylogenetic trees of the species in which they occur. We organize all known sequences into the "Superfamily List," a hierarchical tabulation with five levels of distinction based on sequence similarity. We plan to develop an improved computer model of the evolutionary process by incorporating additional data on point mutations, parameters for deletion-insertion events, and parameters that allow mutability to vary at different positions in the chain. Groups of simulated sequences with known evolutionary distances will be constructed and used to test and improve the performance of our programs for detecting relationships and constructing trees. This grant also partially supports the Atlas of Protein Sequence and Structure Reference Data Center. The Center maintains a complete, currently correct, and continually updated collection of protein sequence data. It also keeps files of background information on evolutionary history, distant relationships, alignments, genetic relationships, and three-dimensional structures. During the 5-year grant period we will prepare the manuscripts for a third supplement to Volume 5 of the Atlas and for a comprehensive sixth volume. With each published volume we will also prepare a computer tape of the protein sequence data and a protein segment dictionary derived from it. Upon request, data searches and other computer services using the up-to-date sequence collection will be performed at cost for other research workers.