A grand effort is underway to sequence the entire human genome. Identification of functional sequence elements by computational tools has become increasingly important. Our long-term goal is to use mathematical and statistical methods to identify human protein-coding genes in the human genome: to locate their approximate positions, bounded by the promoters and the polyadenylation signals; to delineate their organization in terms of exons, introns and coding sequences; and to infer their structure and function by examining their control/regulatory elements and their encoded proteins. We have the following four specific aims for this initial grant: 1) To build and maintain a quality database of human exons of different types, and a separate database of 5' and 3' exons with their flanking sequences from human, mouse, budding yeast and fission yeast genes. 2) To extend our internal coding exon finder MZER to include complete coding region prediction. 3) To develop computational methods for promoter and first exon recognition. 4) To develop low-resolution methods for approximate localization of protein-coding genes in the human genome on a large (chromosomal) scale. The methods and tools resulting from this project will help molecular biologists find human genes and interpret their structure and function more quickly and accurately.