A grand effort is underway to complete the sequencing of the entire human genome. Identifying functional sequence elements with computational tools has become increasingly important. Our long-term goal is to use mathematical and statistical methods to identify protein-coding genes in the human genome. This means locating their approximate positions, bounded by their promoters and polyadenylation signals; delineating their organization in terms of exons, introns, and coding sequences; and inferring their structures and functions by examining their control and regulatory elements and the proteins they encode. This renewal application has three specific aims: (1) to develop database and computational methods for identifying first exons in human and mouse protein-coding genes; (2) to develop an evolution-based algorithm for detecting proximal promoters by studying a few well-characterized sets of tissue-specific and developmentally specific vertebrate genes; and (3) to develop database and computational methods for identifying alternatively spliced exons and transcripts in human and mouse protein-coding genes. The methods and tools resulting from this project will help molecular biologists find human genes and interpret their structures and functions more efficiently and more accurately.