This research project focuses on developing a comprehensive map of insertion and deletion (INDEL) variation in the human genome and on detecting these INDELs using microarray technology. We estimate that more than 1.6 million insertion and deletion mutations are present in the genome, of which only a fraction have been identified to date. These INDELs are likely to be directly responsible for phenotypic differences in humans, including differences in physical traits, susceptibility to diseases, and physiological responses to the environment. To identify this variation, we will improve the speed and accuracy of our existing method for identifying INDELs from trace sequence data. All available human traces will be obtained from the trace archive at NCBI, and INDELs will be identified with this faster pipeline by comparing the traces with the human genome reference sequence. A method will be developed to use full-length genomic sequences as input to this pipeline, and it will then be used to analyze the Celera genome sequence for further INDEL discovery. The identified INDELs will also be compared to the chimp genome to identify the ancestral allele, where possible. The distribution of these INDELs will also be examined, and rules for INDEL classification will be defined, in particular for troublesome classes such as repeat expansions. A set of microarray probes will be designed to validate a portion of the INDELs we have identified. Microarrays have previously been used for SNP detection, and a similar approach to probe construction should be applicable to INDEL variation as well. These probes will then be analyzed on commercially available custom microarray platforms. Hybridization temperatures and other conditions will be tested to identify optimal parameters. The INDELs chosen for this aim will span a wide variety of classes and lengths so that these methods are tested against a broad spectrum of INDELs.
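
As an illustration of the trace-versus-reference comparison described above, the following is a minimal sketch, not the project's actual pipeline. It assumes that a trace has already been aligned to the reference by some external aligner and that the alignment is given as two equal-length gapped strings with "-" marking gaps. The names `Indel` and `call_indels` are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple


class Indel(NamedTuple):
    kind: str      # "insertion" or "deletion", relative to the reference
    ref_pos: int   # number of reference bases preceding the event (0-based)
    sequence: str  # the inserted or deleted bases


def call_indels(ref_aln: str, read_aln: str) -> List[Indel]:
    """Report gap runs in a gapped pairwise alignment as INDELs.

    Assumption: ref_aln and read_aln are the two rows of one pairwise
    alignment, with '-' used as the gap character.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")

    indels: List[Indel] = []
    ref_pos = 0
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], read_aln[i]
        if r == "-" and q == "-":
            # Column with gaps in both rows carries no information.
            i += 1
        elif r == "-":
            # Gap in the reference row: the read carries extra bases.
            j = i
            while j < n and ref_aln[j] == "-" and read_aln[j] != "-":
                j += 1
            indels.append(Indel("insertion", ref_pos, read_aln[i:j]))
            i = j
        elif q == "-":
            # Gap in the read row: reference bases are missing from the read.
            j = i
            while j < n and read_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            indels.append(Indel("deletion", ref_pos, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        else:
            # Aligned base pair (match or mismatch): advance the reference.
            ref_pos += 1
            i += 1
    return indels


if __name__ == "__main__":
    for event in call_indels("ACGT--ACGTAC", "ACGTTTAC--AC"):
        print(event)
```

A real pipeline would additionally need to filter on trace quality and handle ambiguous placements in repeats, which this sketch deliberately ignores.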