It is likely that the high-throughput production of ESTs has now provided identifying tags for the majority of human genes. Rapid progress towards sequencing of the human genome will soon provide the complete sequence of these genes. However, the most important information that is contained within these sequence data (the primary structure of all gene products) cannot be deciphered readily without the knowledge of full-length cDNA sequences. In addition, the most important utility of these sequence data (defining the function of all gene products) cannot be exploited readily without the clones of full-length cDNAs. Consequently, there is an urgent requirement to generate full-length cDNA sequences and clones on a much larger scale than at present. This developmental R21 proposal describes a novel system for construction and sequencing of cDNA libraries that confers major improvements over current methodologies. The system is based on an optimized reverse-transcription reaction that maximizes the length and yield of first-strand cDNA synthesis. The optimized conditions have a dramatic effect on the proportion of full-length products that can be derived from cellular mRNA. This methodology will be used to construct human cDNA libraries that are enriched for full-length species. Both representative and normalized libraries will be constructed. The cDNAs will be cloned using a novel procedure that overcomes the biased representation of short and partial clones that is inherent to most cDNA libraries. Populations of cDNAs will be concatenated, and 50-150 kb fragments cloned in a BAC vector. The relative representation of cDNAs within the BAC library is unaffected by length, and reflects only their molar ratio in the initial cDNA mixture. The cDNA content of individual BAC clones will be determined by a standard shotgun methodology that has been used routinely at TIGR for sequencing BAC clones of genomic DNA.
This sequencing strategy is economical and highly automated, and it yields sequence of high quality. The overall cloning strategy is predicted to yield libraries with a high content of full-length species that can be sequenced readily and retrieved easily for future functional analysis.