An ever-increasing fraction of the estimated total complement of human genes is being identified. Projects that focus on the biology of a particular functional or structural element, and projects that comprehensively catalogue genes in particular chromosomal regions or tissue types, add daily to the lists of genes. Currently, in excess of 1,500 well-characterized human coding sequences have been identified, with another 3,000 unique sequences identified via analysis of fragments from cDNA libraries. It seems likely that the next decade will see substantial additions to these totals as positional cloning efforts, exon trapping technology, massive sequencing efforts and homology screening programs expand. Project 3 will exploit this ongoing work in three ways to provide genetically useful expressed sequence reagents to the scientific community. First, we will incorporate polymorphic cDNAs identified from the literature into the linkage maps being developed by Project 5. We will use published sequence-based DNA variants for this, as well as new variants identified in Project 2. We will also incorporate into the evolving linkage map short tandem repeat polymorphisms (STRPs) identified in Project 1 that will be derived from subcloned and STRP-enriched cDNA libraries. Second, we will identify the parent cDNA clone for a subset of the STRPs identified from coding sequences in Project 1. This will provide us with a benchmark for how frequently we can expect to identify particular classes of STRPs in true expressed coding sequences. Third, we will provide the linkage maps and polymorphic cDNAs as a resource to outside investigators who wish to characterize their own families or populations using linkage or association studies. As part of this goal, we will pilot test the feasibility of obtaining and characterizing population-based samples from a range of normal and disease-based groups with multiple geographical origins.
This will be done in close collaboration with the ELSI core to explore the ethical, legal and social aspects of issues including cooperation, consent, and the scope and effects of erroneous typing. Project 3 will undertake extensive data validation in concert with Project 4 and error checking with Projects 4 and 5. The ultimate product will be a collection of 2,000 polymorphic cDNAs that have linkage assignments and can serve as candidate genes. Over 4,000 cDNA STSs that can be integrated into a physical map will also be defined. The work will also yield a better understanding of how to carry out registry-based surveys focused on abnormalities of candidate genes.