Genetic Analysis of Complex Inflammatory Disorders

Background

The Genomics Section of the Genetics and Genomics Branch was established to identify genes underlying complex rheumatic and inflammatory diseases. These conditions, such as rheumatoid arthritis (RA) and systemic lupus erythematosus, are thought to be caused by the interaction of multiple genetic susceptibility loci with non-genetic factors. The current approach involves a two-step process. The first step is to localize the chromosomal regions harboring these loci by performing a genome-wide screen for linkage using affected sibling pairs. The second step is to identify specific genes by performing association studies on large panels of cases and controls, using an extremely dense map of single-nucleotide polymorphisms (SNPs) taken from the putative susceptibility regions.

Since 1996, our group has been a member of the North American Rheumatoid Arthritis Consortium (NARAC). This is a large collaborative group whose initial goal has been to collect 1000 sibling pairs concordant for rheumatoid arthritis and then to perform genome-wide linkage studies. NARAC has nearly completed the ascertainment and collection of samples from appropriate sib-pairs, and other laboratories in the consortium have performed genome-wide linkage studies on 667 families. Analysis of genotyping data from these families confirmed a major genetic effect in the HLA region, and also showed evidence of linkage at the level of p < 0.005 for several chromosomal regions: 1p13, 1q43, 6q21, 10q21, 12q12, 17p13, and 18q21.

The second phase of the effort will involve genotyping a panel of approximately 400 cases and 400 controls for a large number of markers derived from one or more of these regions. The regions themselves are relatively large (on the order of 10 million base pairs), and the desired marker density is approximately 1 marker every 5 kb.
Thus, analysis of a given chromosomal region may require genotyping 800 individuals × 2,000 markers (10,000,000/5,000) = 1.6 million genotypes. In some genomic regions, the number of markers may be somewhat reduced if segments (haplotype blocks) are inherited as a unit. We recently established a Sequenom MassARRAY genotyping unit, which relies upon multiple-base extension reactions that are assayed by mass spectrometry. At the end of the present reporting period, this system had a possible throughput of 20,000 genotypes per week (1 million genotypes per year) at a cost of $0.20 per genotype.

Results of the Last Year

Pilot project in the familial Mediterranean fever (FMF) susceptibility region: We have focused on the genomic region encompassing MEFV, the FMF gene, to pilot our genotyping system as well as our ability to discern haplotype block structure. We first constructed a dense SNP map and attempted to develop assays for 80 biallelic variant nucleotides (74 SNPs and 6 mutations) in a 160 kb region of chromosome 16p that includes the 15 kb MEFV gene. Using SNP assay development software (SpectroDesigner), assays were developed for 62 SNPs and all 6 mutations, but could not be designed for 12 SNPs. Usable genotype data were obtained for 54 of the 68 assays (48 SNPs, 6 mutations). Thirteen of the SNP assays failed to produce genotypes, produced too few genotypes for use in subsequent analyses, or generated extension products and genotype calls in the absence of genomic DNA. One assay generated an error of Mendelian inheritance. Forty-six of the 48 good SNP assays (2 were omitted) and all 6 FMF mutation assays were run on 39 unrelated individuals who do not carry FMF mutations, 7 mutation carriers, and 99 FMF patients. Of the 46 SNPs, 7 were not polymorphic enough (minor allele frequency less than 5%) for haplotype block structure analysis.
Two of the 39 polymorphic SNP markers exhibited significant deviation from Hardy-Weinberg equilibrium and were also excluded from the analysis. Genotypes of the remaining 37 SNP markers were used to predict 2 haplotypes for each of the 39 non-carriers using a Bayesian algorithm (PHASE). These haplotypes were then used to determine the absolute value of D', a measure of linkage disequilibrium (LD), for all pairs of markers (Arlequin). This 160 kb genomic region of chromosome 16p exhibited 4 blocks of strong LD (average within-block D' = 0.96, average out-of-block D' = 0.34). The MEFV gene spans the end of the first LD block and the beginning of the second, with the first 2 exons of MEFV located within the first LD block and exons 3 to 10 located within the second.

LD analysis of the RA-susceptibility region on chromosome 18q: This region was chosen because of positive combined RA genome scan data from NARAC, the UK, and France. Our strategy has been to develop dense SNP maps around the candidate genes. Genotypes from 384 probands and 384 matched controls were used to determine the LD structure of the region and to identify the common haplotypes of each LD block. Haplotype frequencies in probands were then compared with those in controls. We have completed the analysis of 4 candidate gene regions: the receptor activator of NF-kappa B, RANK (TNFRSF11A); a mucosa-associated lymphoid tissue lymphoma translocation gene, MALT1; a basic helix-loop-helix transcription factor gene expressed predominantly in pre-B cells, TCF4; and a gene encoding a protein induced in T cells by phorbol-12-myristate-13-acetate, PMAIP1. Our results demonstrated that these 4 genes, covering 544 kb of genomic DNA, were organized in 11 LD blocks. Within each block there were 3-7 common haplotypes. One haplotype of the second LD block of the RANK gene was increased in RA probands relative to controls (see below). None of the common haplotypes of the other 3 genes were associated with RA susceptibility.
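As an illustration of the pairwise LD measure referred to above, the following Python sketch computes |D'| for two biallelic markers from phased haplotypes. It is not the Arlequin implementation; the function name d_prime, the 0/1 allele coding, and the toy haplotypes are assumptions made for this example only.

```python
def d_prime(haplotypes, i, j):
    """Return |D'| between biallelic markers i and j.

    haplotypes: list of sequences of alleles coded 0/1 (hypothetical coding),
    one sequence per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n            # freq. of allele 1 at marker i
    p_b = sum(h[j] for h in haplotypes) / n            # freq. of allele 1 at marker j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n

    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("D' is undefined when a marker is monomorphic")

    d = p_ab - p_a * p_b                               # raw disequilibrium coefficient D
    if d == 0:
        return 0.0
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)  # largest D possible given allele freqs.
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Toy data (not from the study): four chromosomes, two markers.
complete_ld = [(0, 0), (0, 0), (1, 1), (1, 1)]   # only two haplotypes observed
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]         # all four haplotypes, equal frequency
```

With these toy inputs, d_prime(complete_ld, 0, 1) is 1.0 and d_prime(no_ld, 0, 1) is 0.0. Averaging such pairwise values within and between candidate blocks is one way to arrive at summaries like the within-block and out-of-block figures quoted above.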
Possible association of a 14 kb haplotype from the RANK gene on chromosome 18q with susceptibility to RA: RANK is involved in osteoclast differentiation and regulation of apoptosis in immune cells. To evaluate whether variants in this gene might be associated with RA susceptibility, we undertook a case-control haplotype association study using 414 NARAC RA probands and 373 ethnically matched controls. We genotyped 15 SNPs in this gene. Two haplotypes were inferred for each individual using the PHASE Bayesian algorithm. LD was then measured between each pair of SNPs. The results demonstrated that the RANK gene was organized into 4 LD blocks. Within each block there were 3-5 common haplotypes. One haplotype of the second LD block of the RANK gene was found at a frequency of 42% in the RA probands and only 36% in the controls (uncorrected p value = 0.01). The outermost limits of this haplotype block included 14 kb of the RANK gene, encompassing a large portion of the first intron and the second exon. Suggestive evidence for an association of the genotypes of a pair of SNPs located within the block was also found.

Conclusions and Significance

The data obtained during the last year establish the feasibility of high-throughput SNP genotyping in our facility. Our pilot study on the MEFV gene may form the basis for estimating the age of various FMF mutations and for determining whether selection has influenced the frequency of common FMF mutations. Our data on the possible association of a RANK haplotype with RA susceptibility are also encouraging, but will ultimately need to be evaluated in the context of a much larger dataset. During the next year, our objectives will be: 1) further increases in throughput (to 3 million genotypes per year, or 60,000 per week) and decreases in the cost of SNP genotyping (to less than 10 cents per genotype); 2) completion of the analysis of the chromosome 18q candidate region; 3) initiation of SNP analysis in another chromosomal interval.
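The throughput and cost targets can be put in perspective with a short calculation. The Python sketch below uses only figures quoted in this report (800 individuals, a 10 Mb region at one marker per 5 kb, 20,000 genotypes per week at $0.20 per genotype, and the targets of 60,000 per week at 10 cents). The function and variable names are illustrative, and the 10-cent figure is taken as an upper bound for the "less than 10 cents" target.

```python
def region_workload(individuals, region_bp, marker_spacing_bp,
                    genotypes_per_week, cents_per_genotype):
    """Estimate genotype count, weeks of genotyping, and cost for one region."""
    markers = region_bp // marker_spacing_bp          # e.g. 10,000,000 / 5,000 = 2,000
    genotypes = individuals * markers                 # one genotype per individual per marker
    weeks = genotypes / genotypes_per_week
    cost_dollars = genotypes * cents_per_genotype / 100
    return genotypes, weeks, cost_dollars


# Throughput and cost at the end of the reporting period.
current = region_workload(800, 10_000_000, 5_000, 20_000, 20)
# Next-year targets (10 cents used as the upper bound on cost).
target = region_workload(800, 10_000_000, 5_000, 60_000, 10)

for label, (genotypes, weeks, cost) in (("current", current), ("target", target)):
    print(f"{label}: {genotypes:,} genotypes, {weeks:.1f} weeks, ${cost:,.0f}")
```

Under these assumptions, a single 10 Mb region corresponds to 1.6 million genotypes, about 80 weeks of genotyping and $320,000 at the current rate, and about 27 weeks and at most $160,000 at the target rate. Any reduction in marker number from haplotype block structure would lower these figures proportionally.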