Following the completion of the genomic sequences of Drosophila and other eukaryotes, the next major challenge in genomics is the identification of complete gene and protein sets for each organism. A number of biological realities, such as differentially spliced or terminated genes, complicate the interpretation of genomic sequences, and these difficulties are unlikely to be overcome by computational solutions alone. At the Berkeley Drosophila Genome Project (BDGP), we have recently finished sequencing a large set of expressed sequence tags (ESTs), which has allowed annotation of functionally expressed portions of the Drosophila genome. The ESTs have given rise to the Drosophila Gene Collection (DGC). We propose to essentially complete this unigene DGC. Further, we aim to identify alternatively spliced genes. Our cDNA and EST collections have already accelerated progress in generating a comprehensive transcript map of all Drosophila genes by providing information on intron-exon structure, alternative splicing, and transcriptional start and stop sites. Sequences from our EST and gene collections will assist in the authentication of predicted genes, the discovery of unannotated genes, and the refinement of existing gene models. We anticipate that these studies will provide information on the relative merits of approaches for completing cDNA gene collections, such as the Mammalian Gene Collection (MGC). Having a representative cDNA for every predicted gene would allow characterization of biologically significant genes that are expressed at low levels or in only a few cells. Our goals are to obtain a more detailed understanding of the complete set of proteins encoded by the Drosophila genome and to provide cDNAs and functional genomics resources to the research community.
These studies will provide information and tools that will further our understanding of higher eukaryotes and lay the groundwork for more complete analyses of genomic organization and protein function in Drosophila and other eukaryotes, including humans.