The long-term objective of our project is to develop advanced computational tools for analyzing the immense amounts of data generated by large-scale sequencing projects. The specific aims are (1) to develop a Web-based, user-friendly, customizable, integrated platform for searching, displaying, and manipulating both protein sequences and their annotation; (2) to develop a sequence search module for sequence analysis and family classification; and (3) to develop an annotation search module for functional annotation. Accurate genomic annotation will facilitate the extraction of knowledge basic to elucidating the causes and cures of diseases of humans and of economically important organisms. Such annotation requires multiple complementary approaches, a need that remains inadequately addressed despite the proliferation of new bioinformatics tools. The proposed system, ProAnnotator, will be built upon existing tools and will have several key design elements. The sequence search component will achieve both speed and sensitivity by employing multi-level filter programs for search, classification, and alignment at the superfamily, domain, and motif levels. The annotation search component will use information available from protein databases, such as keywords and taxonomy. The integrated platform will provide interfaces that allow scientists to interact with and customize the system for knowledge discovery.

PROPOSED COMMERCIAL APPLICATIONS: The pharmaceutical and biotechnology industries are making major investments in genome sequencing and bioinformatics as they race to analyze and annotate the flood of data coming out of the Human Genome Project and other large-scale sequencing projects. An integrated system for accurate functional annotation of genes will be invaluable in helping them capitalize on these data for gene and drug discovery, and it therefore has high commercial potential.