Project 2 focuses on the patterns of linkage disequilibrium (LD) across 17q21, a 12-Mb region of the genome with several interesting characteristics. First, it is relatively gene rich, with 253 genes assigned in the latest assembly (November 2002), although a few stretches of several hundred kb contain no identified genes. Second, there is considerable evidence that it contains at least one region that would qualify as a block with strong LD extending across many kb. The BRCA1 gene appears to be a "block" with LD across at least 81 kb, and in the CEPH families LD was detected over an even longer distance, 8.6 Mb, between D17S1787 and D17S943. Third, the recombination frequencies across the region, based on two large independent mapping studies (Marshfield and Decode), show a large sex difference. The genetic distance between D17S1818 and D17S1877, which just flank 17q21, is estimated at 4.29 cM (Marshfield) and 4.94 cM (Decode) in males, but at 24.01 cM and 17.33 cM, respectively, in females. Finally, although 17q has many low-copy segmental duplications (inter- and intrachromosomal) on either side of 17q21, this region is relatively free of such repeats, and its sequence assembly is relatively unambiguous.
We will examine this region with over 1000 markers spaced at approximately 10-kb intervals, with closer spacing in especially gene-rich regions and somewhat sparser coverage in gene-poor regions. As the boundaries of putative blocks (if they exist as such) become evident, we will add denser coverage to define those boundaries more precisely and to characterize how they vary among populations. All markers will be typed in about 2500 individuals representing at least 40 populations. This will let us determine empirically how generalizable the patterns of LD are across populations, providing one test of the generalizability of the "haplotype map" concept.
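Claims such as "LD across at least 81 kb" rest on pairwise LD measures computed between markers. The minimal Python sketch below computes two standard measures, D' and r², for one pair of biallelic markers. It assumes phased haplotypes with alleles coded 0/1 are already available. Inferring phase from genotype data, for example with an EM algorithm, is a separate step that is not shown. The function name `pairwise_ld` and the toy inputs are hypothetical and are not taken from the project pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple


def pairwise_ld(haplotypes: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic markers (hypothetical helper).

    Each haplotype is a pair (allele at marker 1, allele at marker 2),
    coded 0/1. Phase is assumed to be known already.
    """
    haps = list(haplotypes)
    n = len(haps)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    counts = Counter(haps)
    p_ab = counts[(1, 1)] / n           # frequency of the 1-1 haplotype
    p_a = sum(h[0] for h in haps) / n   # frequency of allele 1 at marker 1
    p_b = sum(h[1] for h in haps) / n   # frequency of allele 1 at marker 2

    d = p_ab - p_a * p_b                # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a marker is monomorphic in this sample")

    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max            # |D'|, between 0 and 1
    r2 = d * d / denom                  # squared allelic correlation
    return d_prime, r2


if __name__ == "__main__":
    # Toy haplotypes, not project data.
    complete = [(1, 1)] * 5 + [(0, 0)] * 5
    one_missing = [(1, 1)] * 4 + [(1, 0)] * 2 + [(0, 0)] * 4
    print(pairwise_ld(complete))
    print(pairwise_ld(one_missing))
```

In the second toy sample D' is 1 but r² is only about 0.44. D' reaches its maximum whenever one of the four haplotypes is absent, whereas r² also depends on how closely the two markers' allele frequencies match. For this reason the two measures can give different impressions of how far LD extends.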
The patterns of LD in the different populations, the levels of heterozygosity, and the population relationships will all be used in analyses designed to estimate the contributions of the multiple factors that cause LD in modern humans.
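The "levels of heterozygosity" compared across populations are commonly summarized per marker as expected heterozygosity, H = 1 - sum of p_i², where p_i are the allele frequencies. The minimal Python sketch below assumes allele calls for one marker have already been grouped by population. The function names and toy data are hypothetical. Nei's n/(n - 1) small-sample correction is shown as one common choice and is not necessarily the estimator the project will adopt.

```python
from collections import Counter
from typing import Dict, Sequence


def expected_heterozygosity(alleles: Sequence[str], unbiased: bool = True) -> float:
    """Expected heterozygosity at one marker (hypothetical helper).

    `alleles` lists one allele per sampled chromosome (two per diploid
    individual). With unbiased=True, Nei's correction n / (n - 1) is applied.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(alleles)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1) if unbiased else h


def heterozygosity_by_population(samples: Dict[str, Sequence[str]]) -> Dict[str, float]:
    """Hypothetical helper: expected heterozygosity per population for one marker."""
    return {pop: expected_heterozygosity(alleles) for pop, alleles in samples.items()}


if __name__ == "__main__":
    # Toy allele calls for a single marker, not project data.
    marker = {
        "pop1": ["A", "A", "G", "G"],
        "pop2": ["A", "A", "A", "A"],
    }
    print(heterozygosity_by_population(marker))
```

Averaging H over all 17q21 markers within each population would give one simple per-population summary to set alongside the LD patterns.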