This application requests support for our research on identifying and classifying patterns within and between nucleic acid and protein sequences. The emphasis is on developing mathematical, statistical, and computing concepts and methods to help assess and interpret molecular sequence features. Our research program will concentrate on six main areas: 1) development of new statistics, means for assessing statistical significance, and methods of data representation for nucleotide and amino acid sequences that can aid in identifying and interpreting molecular sequence relationships; 2) development of efficient and wide-ranging computer algorithms with which to identify significant word relationships among multiple letter sequences; 3) investigation of the nature of codon and amino acid preferences and patterns with respect to different classifications of genes; 4) extensive analysis of the presence or absence of charge concentrations over different categories of genes in many species, and possible interpretations for function and structure; 5) pursuit of new approaches for phylogenetic constructions using partial ordering criteria and concepts of consensus trees; and 6) specific comparative sequence analyses, to include detailed studies of (a) multigene families (globins, immunoglobulins), (b) the herpesvirus family, especially cross-studies of the complete Epstein-Barr virus and varicella-zoster virus genomes, and (c) the conjunction of a set of retroviruses, hepadnaviruses, and transposable elements. The interplay between theoretical analysis, data analysis, computer algorithms, and interaction with biologists and medical faculty at Stanford has been a key factor in our program. The unique collaboration between our group and members of the biology and medical departments provides an ideal framework for achieving the research objectives defined in this grant.