Background

The Complex Disease Genetics Unit (CDGU) of the Genetics and Genomics Branch was established three years ago to identify genes conferring susceptibility to genetically complex rheumatic and inflammatory diseases. The strategy often entails nonparametric linkage analysis of large numbers of sibling pairs concordant for the disease in question, followed by association studies on collections of independent patients and controls using a dense map of markers derived from the chromosomal regions highlighted in the first phase of the analysis. This latter linkage-disequilibrium (LD)-based approach can also be used to screen candidate genes and, with improving technology, may be applied across the genome. The common requirement in all cases is the capacity to genotype large numbers of samples at many genetic loci efficiently and cost-effectively. The CDGU has focused on the study of rheumatoid arthritis (RA), which affects up to 1 percent of the population worldwide and is associated with considerable morbidity and mortality. This work builds on our long-standing participation in the North American Rheumatoid Arthritis Consortium (NARAC), a large collaborative group that has collected samples from over 1000 sibling pairs concordant for RA, as well as from cohorts of singleton RA cases and ethnically matched controls. Genome-wide linkage data from previous reporting periods confirmed a major genetic effect in the HLA region and also showed evidence of linkage (p < 0.005) for chromosomal regions 1p13, 1q43, 6q21, 10q21, 12q12, 17p13, and 18q21. Because linkage to the 18q21 region has been replicated in independent French and Canadian cohorts, we have concentrated our LD-based association studies on this segment of the genome. Our Unit uses the Sequenom MassARRAY platform for high-throughput genotyping of single-nucleotide polymorphisms (SNPs), with a capacity of 2 million genotypes per year at a cost of approximately $0.05 per genotype.
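The LD-based approach described above relies on measuring how strongly alleles at two nearby markers are correlated. The sketch below is illustrative only and is not the Unit's analysis pipeline. It computes the standard pairwise measures D, D' and r^2 for two biallelic markers, assuming that the haplotype frequency is already known (in practice it is estimated from unphased genotypes). The function name and the input frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A at marker 1
           and allele B at marker 2
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    """
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # keeps the sign of D

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies, chosen only to show the call.
d, d_prime, r2 = pairwise_ld(p_ab=0.10, p_a=0.13, p_b=0.20)
print(f"D = {d:.4f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D' close to 1 in magnitude indicates little historical recombination between the markers, while r^2 indicates how well one marker can stand in for the other in an association test.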
Results of the Last Year

Analysis of a novel gene in the chromosome 18q21 candidate region: During the previous reporting period, NARAC conducted a dense SNP analysis of the chromosome 18q21 candidate region using a bead-based optical technology (Illumina), examining 460 cases and 460 controls. The average spacing of the successfully typed SNPs was 4.3 kb across the 10 Mb region. Four LD clusters were identified, the strongest of which had a signature SNP with a p value of 3.61 × 10^-6 and an odds ratio of 1.46. Genotype association results were even more significant. The haplotype block containing this SNP covers approximately 60 kb of genomic DNA and contains a novel gene, AK127787, of unknown function. AK127787 encodes a transcript with a predicted protein product of 133 amino acids. Within the AK127787 coding sequence there is a SNP that would result in the substitution of glutamine for arginine at residue 95 (R95Q). The frequency of the minor allele (Q) was 0.13 in 790 RA cases and 0.18 in controls (p = 0.00015), suggesting that the Q allele may be protective against the development of RA. With the exception of HeLa cells, we could detect transcripts of the novel gene only in tissues and cells derived from the lung. We failed to detect transcripts in RA or osteoarthritis (OA) synovium, in activated or resting fractionated blood cells, or in a number of immune and non-immune normal tissues. We are attempting to replicate the association in a large unrelated North American Caucasian RA cohort before further characterizing the associated gene.

Analysis of unclustered 18q singleton SNPs that demonstrated either allele-frequency-based or genotype-frequency-based association with RA: We genotyped 25 18q SNPs in additional samples from a total of 656 independent RA cases and 750 healthy controls.
Three SNPs exhibited nominally significant association with disease susceptibility (p < 0.005): (1) A SNP located in the first intron of the TCF4 gene exhibited genotype-frequency-based association with RA. Minor allele homozygotes were found at 2 percent frequency in cases and 5 percent frequency in controls (p = 0.004). This SNP was in long-range LD with the AK127787 coding SNP described above; (2) A SNP located in the first intron of the CCBE1 gene, which has predicted collagen and calcium-binding domains, exhibited allele-frequency-based association with RA susceptibility. Minor allele homozygotes were found at 7.5 percent frequency in cases and 4.9 percent frequency in controls (p = 0.0044); (3) A SNP located in the NEDD4L (neural precursor cell expressed, developmentally down-regulated 4-like) gene exhibited genotype-frequency-based association with RA susceptibility. Minor allele homozygotes were found at 34 percent frequency in cases and 24 percent frequency in controls (p = 0.00014). Allele frequency was also associated, with a minor allele frequency of 56 percent in cases and 49 percent in controls (p = 0.0002).

Analysis of candidate genes outside the 18q and HLA regions: Earlier studies from another group indicated that a specific variant of the peptidyl arginine deiminase, type IV (PADI4) gene is associated with RA in the Japanese population. During the previous reporting period we found a different PADI4 haplotype associated with RA among Caucasians. In the NARAC cohort (571 independent cases and 750 controls) we found nominal evidence for association of the most strongly associated SNP reported in the Japanese population (minor allele frequency was 0.45 in NARAC cases and 0.40 in controls, p = 0.03). Because we found a nearby marker in LD with this SNP to be more strongly associated with RA, we assumed that the weak association was the result of LD with more strongly associated markers.
By genotyping haplotype-tagging SNPs through the region, we found 3 SNPs, all located in the first intron of PADI4, with minor allele frequencies of 5-6 percent in controls and 10-11 percent in cases (p = 0.00006 to 0.00008). Current studies are focused on determining whether these non-coding SNPs are associated with a difference in transcriptional efficiency. Several other candidate genes have also been evaluated. We studied 2 SNPs in FCRL3 (Fc receptor-like 3) that were strongly associated with RA in the Japanese population, but we found no evidence for association in the NARAC cohort. We evaluated 1 SNP in MHC2TA (MHC class II transactivator) found to be associated with RA in the Swedish population, as well as 12 additional SNPs located throughout the gene, and did not find evidence for association in the NARAC cohort. We studied 3 SNPs in PDCD1 (programmed cell death 1), including one associated with systemic lupus erythematosus, and failed to find an association with RA. We evaluated 7 SNPs in TNFAIP3 (tumor necrosis factor alpha-induced protein 3), including one non-synonymous coding variant, and failed to find an association with RA. Finally, we evaluated the M55V variant of SUMO4 (a regulator of NF-kappa B previously associated with type 1 diabetes) and failed to find an association with RA.

Conclusions and Significance

The data from the last year support the notion that multiple non-HLA genes confer modest levels of risk for RA, and continue to support the likelihood of an RA susceptibility locus on chromosome 18q. During the next year, we plan to continue studies of this region, particularly of the AK127787 gene. We will also continue studies of genes not on chromosome 18, including PADI4. As part of the NARAC collaboration, we will also continue to provide in-depth analysis of regions identified by more global screens.
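The allele-frequency-based comparisons reported above can be illustrated with a simple allelic test. The sketch below is a minimal example, not the method used for the reported results, which this report does not specify. It assumes a 2 x 2 table of minor and major allele counts, two chromosomes per individual, complete genotyping of every sample, and a Pearson chi-square test with 1 degree of freedom and no continuity correction. The function name is hypothetical. Because allele counts are reconstructed from rounded frequencies, the output should not be expected to reproduce the reported p values.

```python
import math


def allelic_association(maf_cases, n_cases, maf_controls, n_controls):
    """Allelic chi-square test (1 df) and odds ratio from minor allele frequencies.

    Each individual contributes two chromosomes, so allele totals are 2 * n.
    Returns (chi2, p_value, odds_ratio).
    """
    a = maf_cases * 2 * n_cases          # minor alleles in cases
    b = 2 * n_cases - a                  # major alleles in cases
    c = maf_controls * 2 * n_controls    # minor alleles in controls
    d = 2 * n_controls - c               # major alleles in controls
    n = a + b + c + d

    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds of carrying the minor allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Rounded figures quoted above for the PADI4 SNP first reported in the
# Japanese population: 571 cases, 750 controls, minor allele frequency
# 0.45 versus 0.40.
chi2, p_value, odds_ratio = allelic_association(0.45, 571, 0.40, 750)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

An odds ratio above 1 indicates that the minor allele is more common in cases, and one below 1 indicates that it is more common in controls, as with the R95Q variant of AK127787. Genotype-frequency-based tests, which compare the three genotype classes rather than allele counts, would require the genotype counts themselves.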