Summary: We have characterized DNA repair kinetics in XRCC1 and Pol β knockout cell lines following exposure to gamma irradiation and MMS. In collaboration with Drs. Samuel Wilson and Julie Horton in the Laboratory of Structural Biology, we used the comet assay to show that XRCC1-/- mouse fibroblasts have a significant delay in DNA repair following 5 Gy of gamma radiation compared with isogenic wildtype cells (Horton et al 2008). Repair in the XRCC1-/- cells is still surprisingly rapid (>65% repair by 30 minutes), suggesting the existence of robust alternative repair mechanisms for IR damage. In contrast, Pol β-/- cells are no different from isogenic wildtype cells in their ability to repair IR-induced damage, suggesting that Pol β is not critical for this repair. When we exposed these same cells to MMS and followed repair over time, we saw evidence that XRCC1-/- cells and, to a lesser extent, Pol β-/- cells had delayed repair compared with their isogenic wildtype controls, suggesting that both are deficient in cellular BER of MMS-induced damage. These results parallel findings from the Wilson lab showing higher cell growth sensitivity to MMS in XRCC1-/- than in Pol β-/- cell lines. We also looked at the effect of inhibiting PARP-1, the first protein to interact with nicked DNA during BER, by treating cells with the potent PARP-1 inhibitor 4-amino-1,8-naphthalimide (4-AN). Using a 10-fold lower exposure to MMS than in the previous experiments, we found no evidence of damage in any of the 4 cell lines tested in the absence of 4-AN, but dramatically increased damage in all 4 cell lines with 4-AN. Both XRCC1-/- and particularly Pol β-/- cells had higher levels of damage than their respective isogenic controls. This finding raises the question of whether the human codon 399 SNP in the PARP-interacting BRCT I domain of XRCC1 might reveal a more pronounced phenotypic difference with the addition of 4-AN.
The combination of dense genetic data and complex phenotypic data has provided opportunities to develop new statistical approaches. The comet assay generates data on hundreds of individual cells for each exposure and time-point. DNA damage within this population of measured cells is not normally distributed, and the distribution changes with the degree of damage and the amount of repair. In some cases, statistical models based on changes in the tails or shape of the distributions could provide a better metric than the mean or median. This feature of the data has provided the basis for collaboration with Dr. David Dunson, formerly in the NIEHS Biostatistics Branch, who has used the comet data to develop Bayesian approaches both for inferring quantiles of unknown sample distributions (Dunson and Taylor 2005) and for weighted finite mixture models (Rodriguez, Dunson, and Taylor 2009). The collaboration continues now that Dr. Dunson has moved to Duke University, and it is the central feature of his R01 proposal to use Bayesian methods to assess this special case of gene-environment interaction.
The dense genetic data available for these cell lines lend themselves to approaches other than our primary haplotype-based approach of examining genotype-phenotype associations. Using gamma irradiation and γH2A.X foci formation, we have characterized double-strand break (DSB) repair phenotypes for 5 non-synonymous SNPs (nsSNPs) predicted to be damaging by in silico methods. γH2A.X foci formation proved to be a sensitive indicator of DSBs, doubling above baseline values after only 0.5 Gy of gamma irradiation. The number of foci returned to baseline within 24 hours of a 1.5 Gy dose, but the size and intensity of foci at 24 hours remained significantly elevated, suggesting that these features are more persistent markers of exposure.
None of the nsSNPs in ATM, BRCA1, LIG4, PNKP, and WRN showed evidence of any effect on damage and repair phenotypes compared to controls. This was surprising to us because all of these amino acid changes had been predicted to be damaging by one or more of the in silico functional prediction methods that we used: SIFT, PolyPhen, or SNPs3D (Markunas et al, 2008).
We developed a web application that incorporates GWAS data, functional predictions, and LD information from multiple ethnic groups to select the most promising SNP candidates for association studies. We designed and implemented a set of SNP selection pipelines that allow an investigator to specify genes or linkage regions and select SNPs based on GWAS results, linkage disequilibrium (LD), and predicted functional characteristics of both coding and non-coding SNPs. We incorporated a variety of functional predictions, including effects on protein structure, gene regulation, splicing, and miRNA binding. In doing so, we considered not only whether a SNP was in a putative functional region, but also whether alternative alleles of the SNP were likely to have differential effects on function. We also allowed user-assigned weights for different functional categories of SNPs so that investigators may tailor SNP selection to their area of interest (e.g. toward SNPs that affect miRNA binding). A central feature of the algorithm is its use of GWAS SNP P-value data in prioritizing SNPs. We explicitly considered all SNPs in high LD with GWAS SNPs, and thus automatically considered a much larger set of SNPs than the GWAS itself. The algorithm excludes SNPs in high LD with large P-value GWAS SNPs and chooses from SNPs in high LD with small P-value GWAS SNPs. It also chooses from novel SNPs that are not in high LD with any SNP in the GWAS and thus had previously been excluded from consideration.
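The selection logic described above can be illustrated with a minimal Python sketch. This is not the code behind the web application. The thresholds (`p_small`, `r2_high`), the `Snp` record, the `prioritize` function, and the scoring rule (user weight times a functional score) are all hypothetical and chosen only for illustration. The sketch also assumes that a candidate in high LD with several GWAS SNPs is judged by the smallest of their P-values, a detail the summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    snp_id: str
    category: str      # hypothetical functional category label, e.g. "miRNA"
    func_score: float  # hypothetical functional prediction score in [0, 1]


def prioritize(snps, gwas_p, r2, weights, p_small=0.05, r2_high=0.8):
    """Return (snp_id, status, score) tuples sorted by descending score.

    snps    : list of candidate Snp records
    gwas_p  : dict mapping GWAS SNP id -> association P-value
    r2      : dict mapping (candidate id, GWAS SNP id) -> pairwise LD r^2
    weights : dict mapping functional category -> user-assigned weight
    p_small, r2_high : illustrative thresholds, not values from the source
    """
    ranked = []
    for snp in snps:
        # GWAS SNPs in high LD with this candidate (a GWAS SNP tags itself).
        partners = [
            g for g in gwas_p
            if g == snp.snp_id or r2.get((snp.snp_id, g), 0.0) >= r2_high
        ]
        if partners:
            best_p = min(gwas_p[g] for g in partners)
            if best_p > p_small:
                # In high LD only with large P-value GWAS SNPs: exclude.
                continue
            status = "gwas_supported"
        else:
            # Not in high LD with any GWAS SNP: a novel candidate.
            status = "novel"
        score = weights.get(snp.category, 1.0) * snp.func_score
        ranked.append((snp.snp_id, status, score))
    ranked.sort(key=lambda item: -item[2])
    return ranked
```

In this sketch, raising the weight for one category (for example "miRNA") moves SNPs in that category up the ranking without changing which SNPs are excluded on the basis of GWAS P-values.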
In addition, the algorithm accounts for the LD structure of different populations, including all 11 HapMap populations, and chooses appropriate tag SNPs for one or more ethnic groups. We evaluated the utility of this application using prostate cancer data, starting with a set of a priori candidate genes, prostate cancer GWAS data, and a set of linkage regions. We used our algorithms to select a small panel of 1,400 SNPs, of which 700 were novel SNPs and 700 were part of the GWAS panel. We compared this panel against the results of a large GWAS validation study that screened the much larger set of 27,000 SNPs with P-values <0.07 in the prostate cancer GWAS. Our small panel included 5 of the 7 SNPs that the validation study subsequently showed to be associated with prostate cancer. This result provides an important proof-of-principle for our selection algorithm. In addition to the selection algorithms that use GWAS data, we provide three additional tools: 1) TagSNP, which allows a user to find and list SNPs, choose LD tag SNPs for multiple ethnic groups, and produce high-quality LD figures for individual genes or chromosome regions; 2) FuncPred, which allows a user to find and query SNP functional prediction results by SNP, gene, or chromosome region; and 3) SNPseq, which allows a user to visualize all SNPs in DNA sequence context for an individual SNP, gene, or region of a chromosome. The web application is publicly available at http://niehs.nih.gov/snpinfo and a manuscript describing it has been published.
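Choosing LD tag SNPs for more than one ethnic group, as mentioned above for TagSNP, can be sketched with a common greedy approach. The summary does not describe how TagSNP itself works, so this is an assumption-laden illustration rather than its implementation. The function name `greedy_tag_snps`, the `r2_min` threshold of 0.8, and the input layout are hypothetical.

```python
def greedy_tag_snps(snps, ld_by_pop, r2_min=0.8):
    """Greedily pick tag SNPs so every SNP is tagged in every population.

    snps      : list of SNP ids
    ld_by_pop : dict mapping population -> dict of frozenset({a, b}) -> r^2
    r2_min    : illustrative LD threshold for one SNP to tag another
    Returns the chosen tag SNPs in the order they were picked.
    """
    def covers(tag, pop):
        # SNPs tagged by `tag` in `pop`; every SNP tags itself.
        ld = ld_by_pop[pop]
        return {
            s for s in snps
            if s == tag or ld.get(frozenset((tag, s)), 0.0) >= r2_min
        }

    # Each (SNP, population) pair must be covered by some chosen tag.
    uncovered = {(s, pop) for s in snps for pop in ld_by_pop}
    tags = []
    while uncovered:
        def gain(tag):
            # Number of still-uncovered pairs this tag would cover.
            return sum(
                1
                for pop in ld_by_pop
                for s in covers(tag, pop)
                if (s, pop) in uncovered
            )

        best = max(snps, key=gain)  # ties go to the earlier SNP in `snps`
        tags.append(best)
        for pop in ld_by_pop:
            for s in covers(best, pop):
                uncovered.discard((s, pop))
    return tags
```

Because LD patterns differ between populations, a SNP that tags its neighbors in one population may fail to do so in another, which is why the sketch tracks coverage per (SNP, population) pair rather than per SNP.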