The accumulation of molecular sequence data is proceeding at an unprecedented pace. Dozens of complete genomes, tens of thousands of proteins, and several hundred distinct nucleic acid and protein structures are now available. The next phase of molecular biology will be increasingly dominated by efforts to characterize, categorize, and analyze these data, with the goal of understanding on a molecular basis the information content of biological systems and its transfer. This proposal aims at a deeper understanding of genome structure, function, and evolution using empirical, descriptive, and interactive statistical and computational methods. We focus on four interrelated primary areas.
I. Genome signature and evolutionary relationships. We will continue to evaluate genome-wide differences and similarities within and among species using dinucleotide relative abundances as a genome signature. Genome comparisons based on dinucleotide relative abundance profiles do not require alignment.
II. Genomic codon usage patterns. Detailed knowledge of codon and residue choices can help in gene prediction, in characterizing properties of a given gene, and in classifying gene families. In conjunction with area I, we propose new ways of probing constraints on codon usage that have implications for evolution, DNA structure, and vector design.
III. Pairwise and multiple alignments of protein sequences. Multiple alignments obtained by our new methods will be interpreted with respect to functional and structural properties and evolution. These alignments will be applied broadly.
IV. Statistical methods for genome analysis. We will seek to characterize genomic heterogeneity within and among species. We will also seek to extend r-scan statistics, which assess anomalies in the distribution of specific markers along biomolecular sequences, so that they accommodate such inhomogeneities. We will further investigate rare and frequent words, motifs, and compositional biases.
Finally, we will continue to develop versatile code that implements all our computational and statistical methods for sequence analysis.
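As an illustration of why the genome signature of area I needs no alignment, the following minimal sketch computes dinucleotide relative abundances from composition alone. The abstract does not state the formula; the sketch assumes the commonly cited definition rho_XY = f_XY / (f_X * f_Y), with frequencies pooled over the sequence and its reverse complement, and an average absolute difference over the 16 dinucleotides as the distance between two signatures. The function names are hypothetical and are not taken from the proposal's software.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols pass through unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def dinucleotide_relative_abundance(seq):
    """Assumed definition: rho_XY = f_XY / (f_X * f_Y), pooled over both strands."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        # Count single bases, skipping ambiguous symbols such as N.
        mono.update(b for b in strand if b in BASES)
        # Count overlapping dinucleotides made only of unambiguous bases.
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence contains no unambiguous dinucleotide")
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        # Undefined when a base is absent from the sequence.
        rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho


def signature_distance(seq_a, seq_b):
    """Average absolute difference between two signatures (assumed distance)."""
    rho_a = dinucleotide_relative_abundance(seq_a)
    rho_b = dinucleotide_relative_abundance(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

Because each profile is a fixed vector of 16 ratios, two sequences of different lengths and unrelated gene content can be compared directly, for example with `signature_distance(genome_a, genome_b)`. Pooling both strands makes the profile independent of which strand was sequenced.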