This proposal aims to generate verified, non-redundant arrayed sets of full-length cDNAs that will be made widely available to full-length sequencing programs. Achieving this goal relies on a two-pronged strategy that takes advantage of complementary resources and technologies. The wet-laboratory and informatics components of our proposal are fully integrated, and both are equally important to achieving our goals. Briefly, we propose to use 5' end-enriched random primed libraries in conjunction with oligo-[dT] primed full-length-enriched libraries to generate hybrid-selected libraries that will be greatly enriched for full-length cDNAs. Informatics analysis of 5' EST data generated from these hybrid-selected libraries will allow us to classify cDNAs into three groups: full-length, full-coding, and incomplete. On the basis of these analyses, full-length cDNAs will be identified and re-arrayed. Three such re-arrayed sets of cDNAs will be produced. The first will comprise 15,000 full-length and full-coding hybrid-selected cDNAs from the oligo-[dT] primed full-length-enriched libraries. The second and third sets, which will be non-redundant with respect to the first set, will each contain 7,500 cDNAs derived from a pair of oligo-[dT] primed full-length-enriched and 5' end-enriched random primed libraries. Together, these two sets will provide full-length representation of an additional 15,000 mRNAs. Central to our strategy is the use of the same size-fractionated cytoplasmic mRNA to construct both the full-length-enriched and the 5' end-enriched libraries. One such pair of libraries will be generated for each of four human mRNA size fractions, spanning 2.37 to 9.4 kb.
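The criteria for the three-way classification are not spelled out here. One common way to frame them, assumed purely for illustration, is to compare where each clone's 5' EST begins on a reference mRNA with the transcript's 5' end and its start codon. The Python sketch uses hypothetical names (`EstAlignment`, `classify`, `select_nonredundant`) and an arbitrary 10-base tolerance. It also shows one way a re-arrayed set could be kept non-redundant with respect to genes already covered by an earlier set.

```python
from dataclasses import dataclass
from enum import Enum


class CloneClass(Enum):
    FULL_LENGTH = "full-length"
    FULL_CODING = "full-coding"
    INCOMPLETE = "incomplete"


@dataclass(frozen=True)
class EstAlignment:
    """Hypothetical summary of one clone's 5' EST aligned to a reference mRNA.

    Coordinates are 0-based positions on the reference transcript,
    so the reference 5' end is position 0.
    """
    clone_id: str
    gene_id: str
    est_start: int  # first reference base covered by the 5' EST
    cds_start: int  # position of the annotated start codon


def classify(aln: EstAlignment, tss_tolerance: int = 10) -> CloneClass:
    # Assumed rule: an EST starting downstream of the start codon means
    # the clone is truncated within the coding sequence.
    if aln.est_start > aln.cds_start:
        return CloneClass.INCOMPLETE
    # Assumed rule: starting within tss_tolerance bases of the reference
    # 5' end counts as full-length.
    if aln.est_start <= tss_tolerance:
        return CloneClass.FULL_LENGTH
    # Otherwise the whole ORF is present but part of the 5' UTR is missing.
    return CloneClass.FULL_CODING


def select_nonredundant(alignments, exclude_genes=frozenset()):
    """Keep one full-length or full-coding clone per gene.

    Genes in exclude_genes (e.g. those already in an earlier set) are
    skipped. 'Best' is assumed to mean the clone whose EST starts
    furthest 5'.
    """
    best = {}
    for aln in alignments:
        if aln.gene_id in exclude_genes:
            continue
        if classify(aln) is CloneClass.INCOMPLETE:
            continue
        current = best.get(aln.gene_id)
        if current is None or aln.est_start < current.est_start:
            best[aln.gene_id] = aln
    return best
```

For example, a clone whose EST starts at position 40 of a transcript with its start codon at position 120 is classified as full-coding under these assumed rules. Passing the genes of the first set as `exclude_genes` keeps a later set non-redundant with respect to it.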
Two approaches will be used to construct the 5' end-enriched random primed libraries: oligo-capping and a novel method, "end rescuing", whose development is part of the work proposed in this application. Informatics analysis of 5' EST data from libraries generated by both methods will be instrumental in identifying the best 5' end-enriched libraries for use in our two-pronged strategy for generating arrayed sets of full-length cDNAs.
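The selection criterion for the "best" libraries is not stated. A minimal sketch, assuming each library is scored by the fraction of its 5' ESTs classified as full-length, could rank the oligo-capped and end-rescued libraries as follows. The library names and the scoring rule are hypothetical; counting full-coding clones as well would be an equally plausible choice.

```python
from collections import Counter


def full_length_fraction(labels):
    """Fraction of a library's 5' ESTs labelled 'full-length'.

    labels: per-clone classes, assumed to be one of
    'full-length', 'full-coding' or 'incomplete'.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return counts["full-length"] / total if total else 0.0


def rank_libraries(library_labels):
    """Return (library, fraction) pairs, highest full-length fraction first."""
    scored = [(name, full_length_fraction(labels))
              for name, labels in library_labels.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-library classifications for one size fraction.
libraries = {
    "oligo_capped_A": ["full-length"] * 6 + ["full-coding"] * 2 + ["incomplete"] * 2,
    "end_rescued_A": ["full-length"] * 7 + ["incomplete"] * 3,
}
ranking = rank_libraries(libraries)
```

With these made-up counts, the end-rescued library ranks first (7 of 10 ESTs full-length versus 6 of 10).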