This proposal aims to produce a major database of protein sequence motifs that are predictive of protein function. Great emphasis is placed on developing computer software to assist in the automatic generation of motifs and in the maintenance of the database. The database will contain links to other molecular biology databases. Its contents will be used to search new sequences for possible motif matches, since a match may suggest a function for the protein encoded by the new sequence. Such matches can be expected to guide the experimentalist trying to establish the significance of a new sequence. We propose that the short motifs present in protein sequences represent the basic building blocks from which proteins have evolved. During the course of this project we expect to develop evidence that tests this hypothesis by exploring the nature of the motifs within the database and their interrelationships. This work will affect all areas of molecular biology, but will be of particular importance to those studying the relationship between sequence, structure, and function in proteins. As a practical tool, the database will be especially important for interpreting the large amounts of sequence data that will accumulate as the human genome initiative proceeds.