In this supplement we propose to add detailed background information from molecular genetics to the protein sequence reference data collection. We also propose to update and maintain our current reference data collection of nucleic acid sequences in computer-readable form for distribution to others on magnetic tape. Finally, we propose to assess the scientific community's need for interactive telephone access to the collection. In the main grant we are examining theoretical aspects of the structure, function, and evolution of proteins, with emphasis on protein sequences and on those problems for which a computer is essential. Using sequence data, we detect distant relationships and infer evolutionary trees of proteins and phylogenetic trees of the species in which they occur. We organize all known sequences into the "Superfamily List," a hierarchical tabulation with five levels of distinction based on sequence similarity. We plan to develop an improved computer model of the evolutionary process by incorporating additional data on point mutations, parameters for deletion-insertion events, and parameters that allow mutability to vary among positions in the chain. Groups of simulated sequences separated by known evolutionary distances will be constructed and used to test and improve the performance of our programs for detecting relationships and constructing trees. This grant also partially supports the Atlas of Protein Sequence and Structure Reference Data Center, which maintains a complete, currently correct, continuing collection of protein sequence data, together with files of background information including evolutionary history, distant relationships, alignments, genetic relationships, and three-dimensional structures. The manuscripts for a third supplement to Volume 5 of the Atlas and for a comprehensive sixth volume will be prepared. A computer tape of the protein segment dictionary derived therefrom will be prepared with each published volume.
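
The kind of simulation described above (point mutations, deletion-insertion events, and position-dependent mutability applied to an ancestral chain) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the model used in the grant: the function name `evolve`, its parameters, and the uniform choice of replacement residue are all hypothetical, and a realistic model would draw replacements from empirical point-mutation data instead.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def evolve(seq, rates, n_events, indel_prob, rng):
    """Apply n_events mutational events to seq (hypothetical helper).

    rates      -- relative mutability of each position (non-negative, not all zero)
    indel_prob -- chance that an event is a deletion or insertion
                  rather than a point mutation
    Assumption: replacement residues are drawn uniformly, which a real
    model would replace with an empirical mutation probability matrix.
    """
    seq = list(seq)
    rates = list(rates)
    for _ in range(n_events):
        # Pick the affected position in proportion to its mutability.
        pos = rng.choices(range(len(seq)), weights=rates)[0]
        r = rng.random()
        if r < indel_prob / 2 and len(seq) > 1:
            # Deletion: remove the residue and its mutability entry.
            del seq[pos]
            del rates[pos]
        elif r < indel_prob:
            # Insertion: the new residue inherits its neighbour's mutability.
            seq.insert(pos, rng.choice(AMINO_ACIDS))
            rates.insert(pos, rates[pos])
        else:
            # Point mutation: replace with a different residue.
            seq[pos] = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return "".join(seq), rates


if __name__ == "__main__":
    rng = random.Random(1)
    ancestor = "ACDEFGHIKLMNPQRSTVWY"
    mutability = [1.0] * len(ancestor)
    # Two descendants at a known number of events from a common ancestor.
    left, _ = evolve(ancestor, mutability, 5, 0.1, rng)
    right, _ = evolve(ancestor, mutability, 5, 0.1, rng)
    print(left)
    print(right)
```

Because the number of events separating each simulated descendant from the ancestor is known by construction, such sequences could serve as test cases whose true relationships are available for comparison with the output of relationship-detection and tree-building programs.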