A large-scale effort is underway to complete the sequencing of the entire human genome, and the computational identification of functional sequence elements has become increasingly important. The investigators' long-term goal is to use mathematical and statistical methods to identify protein-coding genes in the human genome. This entails locating their approximate positions, bounded by promoters and polyadenylation signals; delineating their organization in terms of exons, introns, and coding sequences; and inferring their structure and function by examining their control and regulatory elements and their encoded proteins. The investigators propose the following three specific aims for this renewal application: (1) To develop databases and computational methods for identifying first exons in human and mouse protein-coding genes. (2) To develop an evolution-based algorithm for detecting proximal promoters by studying a few well-characterized sets of tissue-specific or developmental-stage-specific vertebrate genes. (3) To develop databases and computational methods for identifying alternatively spliced exons and transcripts in human and mouse protein-coding genes. The investigators predict that the methods and tools resulting from this project will help molecular biologists find human genes and interpret their structures and functions more efficiently and more accurately.