This proposal aims to develop software and databases for the analysis of newly derived sequences, and of some existing sequences. One area of software development concerns the detection of errors in sequences. The basic idea rests on comparing the new sequence with known sequences: an anomaly in the new sequence that distinguishes it from similar or related known sequences may indicate an error. Routines to detect these anomalies will make extensive use of the properties of coding regions, since these appear to be subject to more constraints than noncoding regions. Codon use, homology and structure will all be used. Additional routines will be developed that may be of use in detecting errors in noncoding regions. Following a period of testing, software will be written that incorporates all useful routines. A second endeavor will involve the production of a major database of sequence and structure motifs that are found in proteins and that can be correlated with function. This database will be used to search new sequences for possible matches. A match may suggest a function for the protein encoded by the new sequence, which can be expected to guide the experimentalist trying to establish the significance of that sequence. Throughout this work there will be great emphasis on the creation and maintenance of molecular biological databases. Interfaces will be established between existing protein and DNA databases, as well as other databases of information about biological molecules. This work could impact all areas of molecular biology, but will be especially important as the sequence of the human genome becomes available.
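The motif search described above can be illustrated with a minimal sketch. It is not the proposed software. It assumes that each motif can be written as a regular expression and that the database is a small in-memory table. The names `MOTIFS` and `scan_motifs` are hypothetical, the single entry is the widely cited P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST], and the test sequence is an illustrative fragment only.

```python
import re

# Hypothetical motif table: name -> (compiled pattern, suggested function).
# The one entry is the P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST],
# written as a regular expression. A real database would hold many motifs.
MOTIFS = {
    "P-loop": (re.compile(r"[AG].{4}GK[ST]"), "ATP/GTP binding"),
}


def scan_motifs(protein, motifs=MOTIFS):
    """Return (name, start, matched_text, function) for every motif match.

    `start` is a 0-based offset into `protein`. Matches of one pattern
    are non-overlapping, as produced by `finditer`.
    """
    hits = []
    for name, (pattern, function) in motifs.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group(), function))
    return hits


if __name__ == "__main__":
    # Illustrative protein fragment (assumption, used only for the demo).
    for hit in scan_motifs("MTEYKLVVVGAGGVGKSALTIQ"):
        print(hit)
```

A match of this kind only suggests a candidate function. Short patterns can occur by chance, so a hit would still need support from homology or experiment.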