The technological achievements of the past 20 years have made sequencing a genome a relatively simple task. Decoding this information, however, has proved to be much more difficult, and is one of the great challenges of this century. Advancing our understanding of how genes are structured and regulated will eventually lead to novel therapeutics for combating cancer and other diseases, to cheaper and more nutritious food, to less wasteful materials and energy sources, and to a greater understanding of ourselves. One of the most important and enduring products of the genome era will be the catalog of genes for each organism. Producing these catalogs is a difficult task even under the best of circumstances. The pace of genome sequencing continues to increase, and these new genomes represent a wealth of information if we can understand them. This proposal seeks to improve our knowledge of genomes by advancing the state of the art in computational gene finding. Our algorithms leverage new and previously untapped sources of information, and are expected to improve our ability to find both novel genes and genes with known homologs. Our specific plans are (a) automating the training of gene prediction programs for any genome, (b) developing the first algorithm that merges a generalized hidden Markov model for gene structure with a profile hidden Markov model for protein family structure, (c) creating the first gene finder that incorporates information about DNA duplex stability under superhelical stress, (d) building new algorithms that take advantage of high-throughput transcript profiling technologies such as whole-genome expression arrays and massively parallel sequencing methods, and (e) providing web-based applications and support via the Internet.