This Phase I SBIR program in "functional bioinformatics" will test the feasibility of extracting functional information from genome sequence data, starting from a rectified database. Such a database arranges protein sequences according to their "natural" organization (the evolutionary history by which they arose) and describes families of sequence modules by multiple sequence alignments, evolutionary trees, and reconstructed ancestral sequences. The test will use three software tools:

(a) A tool that detects distant homology between protein families by aligning probabilistic ancestral sequences reconstructed near the roots of their evolutionary trees.
(b) A tool that detects distant homology between protein families by comparing the predicted secondary structure models of individual families.
(c) A tool that identifies episodes in the evolutionary history of a protein family where function might have changed, by reconstructing ancestral sequences in the family tree and detecting episodes of rapid sequence evolution between them.

This feasibility test will apply these tools comprehensively to all protein sequences in the extant genomic database. If the tools prove generally useful for analyzing functional divergence within genomic sequence databases, Phase II will incorporate them into a commercial package that helps biological scientists draw inferences about biological function from the genomic database.

PROPOSED COMMERCIAL APPLICATIONS: Pharmaceutical companies, biotechnology companies, genomics companies, Federal agencies, and academic institutions all need software to interpret and use the enormous volumes of protein sequence data emerging from genome sequencing projects. This research will develop such software tools.