This project encompasses two related software development efforts. Both are intended to support curators and users of the Conserved Domain Database (CDD), described separately in project LM000161-03. The first effort involves the development and computational testing of algorithms that produce accurate alignments for diverse protein sequences. The second involves development of interactive tools for aligning and identifying subfamily structure within a protein family. Development of structure-based alignment algorithms builds upon our group's earlier work on protein threading. These methods "thread" a protein sequence through a structural template, scoring alternative alignments by energy calculations that use contact potentials and a sequence profile derived from the template's protein family. The success of these methods was demonstrated at the 1998 CASP3 workshop, where the NCBI team was awarded "first place" in structure prediction by fold recognition, among over 90 international groups entering the competition. To adapt these methods to the high-throughput alignment needed by CDD curators, we have developed more efficient versions of the block-alignment algorithm used in threading. Work this year has shown that this method produces alignments accurate enough for identification of conserved functional sites, and that information loss relative to the original threading method is minimal. In current work, we are evaluating the performance of an automated multiple alignment algorithm that applies structure-based alignment iteratively, across the apparent subfamily tree of CDD groups. Development of interactive tools for structure-based alignment has involved integration of these algorithms into the Cn3D alignment editor. Several new functions have been added, including the latest version of block alignment. This software is in daily use by the CDD curator team, and a new version with these functions has been widely distributed.
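As a rough illustration of the block-alignment idea mentioned above, the following Python sketch places a series of ungapped template blocks onto a query sequence by dynamic programming. It is a simplified example under stated assumptions, not the algorithm used in threading or by CDD curators: it scores only a per-position sequence profile, omits contact potentials and loop-length constraints, and the names `align_blocks` and `block_score`, the default mismatch score, and the toy profiles are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical profile format: one residue -> score mapping per block column.
Profile = List[Dict[str, float]]


def block_score(query: str, start: int, profile: Profile,
                default: float = -1.0) -> float:
    """Score an ungapped placement of one block beginning at query[start]."""
    return sum(col.get(query[start + k], default)
               for k, col in enumerate(profile))


def align_blocks(query: str, blocks: List[Profile]) -> Tuple[float, List[int]]:
    """Place blocks in order and without overlap, maximizing total profile score.

    Loops between blocks are unpenalized in this sketch (an assumption).
    Returns (best_score, start_position_of_each_block).
    """
    n, m = len(query), len(blocks)
    neg = float("-inf")
    # best[i][j]: best score for the first i blocks placed within query[:j].
    best = [[neg] * (n + 1) for _ in range(m + 1)]
    best[0] = [0.0] * (n + 1)
    # placed[i][j]: True if block i ends exactly at position j in that optimum.
    placed = [[False] * (n + 1) for _ in range(m + 1)]

    for i, profile in enumerate(blocks, start=1):
        width = len(profile)
        for j in range(1, n + 1):
            best[i][j] = best[i][j - 1]          # block i ends before j
            start = j - width
            if start >= 0 and best[i - 1][start] > neg:
                cand = best[i - 1][start] + block_score(query, start, profile)
                if cand > best[i][j]:            # block i occupies query[start:j]
                    best[i][j] = cand
                    placed[i][j] = True

    if best[m][n] == neg:
        raise ValueError("query is too short to hold all blocks")

    # Trace back to recover the start position of each block.
    starts: List[int] = []
    i, j = m, n
    while i > 0:
        if placed[i][j]:
            j -= len(blocks[i - 1])
            starts.append(j)
            i -= 1
        else:
            j -= 1
    starts.reverse()
    return best[m][n], starts


if __name__ == "__main__":
    # Toy profiles favoring the motifs "ACD" and "WY" (illustrative values only).
    toy_blocks = [
        [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}],
        [{"W": 3.0}, {"Y": 3.0}],
    ]
    score, starts = align_blocks("GGACDGGGWYGG", toy_blocks)
    print(score, starts)  # 12.0 [2, 8]
```

Restricting gaps to the loops between blocks keeps the search to a table of size (number of blocks) by (query length). This is one plausible reason a block-based formulation suits high-throughput use, although the production methods also draw on structural information that is not modeled here.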
Further work has focused on development of the CDTree alignment hierarchy editing system, which is also in daily use by the CDD curator team. This software implements a suite of tools for molecular evolutionary analysis of protein families in an interactive package. It supports generation of phylogenetic sequence trees using several algorithms from the literature, linked to displays of organism taxonomy trees and summaries of overall protein domain architecture. The software also supports an "update" procedure that automatically searches the daily-updated sequence and structure databases for new members of each family, selecting non-redundant representatives of new similarity and/or taxonomy groups. The CDTree subfamily hierarchy editor communicates seamlessly with the Cn3D alignment editor, allowing curator-users to easily detect and correct "outliers" caused by alignment or sequence errors. While designed for the needs of CDD curators, CDTree may also be used as a simple viewer that presents the sequence, taxonomic and functional diversity within a CDD family hierarchy in intuitive graphical displays. Public release as a viewer is planned for the coming year.