This project has focused on the analysis of amino acid and nucleic acid sequence data as it pertains to molecular biology and molecular evolution. Work has continued in the following areas: i) We have provided further theoretical evidence for contextual constraints on eukaryotic coding sequences and have broadened our understanding of these constraints by quantifying their intensity. ii) We have continued to develop computational tools for molecular biologists, in particular rapid methods of similarity analysis for protein and nucleic acid data bank searches. We have recently modified the protein search algorithm, with significant improvements in speed and sensitivity. These modifications allow sensitive protein database searches to be conducted on widely available microcomputers. iii) We have developed new, more realistic methods for estimating the statistical significance of nucleic acid similarities. iv) We have initiated a new analysis of the relationship of exons to protein function. This research should prove useful in understanding the evolution of exons and introns. v) Cellular automata have been used to model a variety of natural phenomena; we have been studying them and, in particular, developing methods for predicting their behavior when subjected to noise. In addition, efforts continue to inform molecular biologists of the computational tools available for sequence analysis. A heavily attended short course on computational methods for sequence analysis was given this year to molecular biologists on the NIH campus.
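As an illustration of item v, the following is a minimal sketch of what subjecting a cellular automaton to noise can mean in practice. It is not the method developed in this project, which the summary does not describe. It assumes a one-dimensional, two-state, nearest-neighbor automaton with periodic boundaries, and it assumes that noise acts by flipping each updated cell independently with a fixed probability. The function names (step, run, divergence) and all parameters are hypothetical.

```python
import random


def step(cells, rule, noise, rng):
    """Advance one generation, then flip each new cell with probability `noise`.

    `rule` is an integer 0..255 whose bits give the output for each of the
    eight (left, center, right) neighborhoods. The noise model is an
    assumption made for illustration only.
    """
    n = len(cells)
    new = []
    for i in range(n):
        # Encode the neighborhood as a 3-bit index; cells[i - 1] wraps at i = 0.
        idx = (cells[i - 1] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        bit = (rule >> idx) & 1
        if rng.random() < noise:
            bit ^= 1  # noise event: invert the deterministic result
        new.append(bit)
    return new


def run(rule, width, steps, noise, seed=0):
    """Return the list of generations, starting from a single live center cell."""
    rng = random.Random(seed)
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule, noise, rng)
        history.append(cells)
    return history


def divergence(rule, width, steps, noise, seed=0):
    """Fraction of cells that differ between the noisy and noise-free final states."""
    clean = run(rule, width, steps, 0.0, seed)[-1]
    noisy = run(rule, width, steps, noise, seed)[-1]
    return sum(a != b for a, b in zip(clean, noisy)) / width
```

Calling divergence(110, 101, 50, 0.01), for example, gives one crude measure of how far a small amount of noise moves a rule from its noise-free trajectory; repeating the call over several seeds and noise levels would give a rough empirical picture against which a predictive method could be compared.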