As the Human Genome Project accelerates, and as other eukaryotic sequencing projects near or reach completion, the need for automatic annotation of genes and other genomic sequence features is becoming increasingly urgent. Considerable research has been devoted to developing programs that find genes in genomic sequence data from both eukaryotes and prokaryotes, and related efforts have been dedicated to finding signals such as splice sites, ribosome binding sites, terminators, promoters, and other transcription factor binding sites. Programs that find genes by alignment have become a standard tool of gene finding, taking advantage of both the increasing quantity of genomic DNA sequence and large databases of expressed sequence tags (ESTs). The rapid outpouring of complete microbial genomes (16 thus far) has added tens of thousands of genes to the public databases, making gene finding by sequence comparison more effective than ever. However, experience with the most recently completed genomes shows that 30-40 percent of the genes in a new genome show no similarity to any previously catalogued gene. De novo gene prediction therefore remains a crucial tool, and continued research is needed to improve the computational techniques underlying gene finding methods.
This project will develop state-of-the-art gene finding systems that achieve five objectives: (1) integration of the best computational techniques for gene finding, including hidden Markov models (HMMs), interpolated Markov models (IMMs), and decision trees, into a single gene finding system; (2) development of new techniques for recognizing starts of translation, splice junctions, polyadenylation sites, and other sequence patterns, and integration of these techniques into the gene finder; (3) integration of sequence homology information into the gene finding algorithm; (4) development of a platform-independent display tool that shows all annotation, including alternative interpretations of a sequence; and (5) release of this software for free use by the genomics research community. The approach to developing these bioinformatics tools will be interdisciplinary, building on the PI's experience in computer science and his NIH-supported training in genomics.