DESCRIPTION: (Applicant's abstract) This RFA calls for 30,000 to 300,000 genetic markers, based on single-nucleotide polymorphisms (SNPs), to be created over the next few years. In response, we propose a pilot program that we believe can be scaled up to the whole-genome level and that will provide a particularly important category of SNPs: those occurring in cDNA sequences (cSNPs). Many cSNPs can be extracted from the EST databases, and we will exploit these as much as possible. However, we will also fill in the gaps in the databases, which come in two forms: for many of the less abundantly expressed genes, few individuals have been sampled, and many genes are incompletely covered by ESTs. We will therefore scan for cSNPs along the entire length of selected cDNA sequences, across a sample of 25 ethnically diverse individuals, deriving the full-length consensus cDNA sequence when one is not already available. Our cSNP discovery process will be sequence-driven. Although this is probably the most expensive approach, it is also the most comprehensive, as it ensures that nearly all common cSNPs will be found. Such a thorough approach is justifiable because most common cSNPs are likely to be useful for the population-based association studies being planned in the growing effort to understand genetically complex diseases. Over the course of this 3-year grant, we will re-sequence 500 genes and create markers for all the common cSNPs found in the combination of new sequence data and existing EST data. We will create cDNA resources to better sample the full length of the cDNA sequence. We will evaluate a scoring technology (TDI) that has the potential to be arrayed. Finally, we will develop software to facilitate the process of cSNP discovery, marker creation, and TDI scoring.