We previously developed a method called TRIMM (triad multi-marker) for qualitative traits, which uses multiple-SNP genotypes from affected individuals and their parents. The testing approach is robust against bias due to population stratification. We further extended the approach to allow testing for haplotype-by-environment interaction, via a method we call GEI-TRIMM. The paper describing this approach and characterizing its performance through simulations was published this year. In another project, we are estimating the asymmetry that would exist in family history data as a consequence of a maternally-mediated genetic effect. We applied this strategy to family history data from the Sister Study and found evidence that maternal grandmothers of young-onset (under age 50) cases of breast cancer were more likely to have had breast cancer than were paternal grandmothers. This suggests that there may be maternally-mediated genetic risk factors for breast cancer, that there may be imprinted genes related to risk, or that mitochondrial variants play a role. Epigenetics could also be important for breast cancer. A particularly important design we are now considering involves a tetrad structure, with one affected and one unaffected offspring in addition to the two parents. This design has been implemented in the Two Sister Study, which is assessing the joint role of genetic and environmental risk factors in young-onset (under age 50) breast cancer. The discordant sib pair allows estimation of the effects of exposures, while the embedded case-parent triad allows detection of haplotypes that confer either protection or risk. Analyzed together, the tetrad should provide a powerful design for assessing gene-by-environment interaction. We have been developing and evaluating methods for use with the tetrad design.
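The maternal versus paternal grandmother comparison described earlier in this section can be illustrated with a simple paired test. The sketch below is not the estimation strategy used in the project; it only shows one elementary way to test for asymmetry, assuming each case contributes a grandmother pair and that only discordant pairs (exactly one grandmother affected) are informative. The function name and the counts in the usage example are hypothetical.

```python
from math import comb


def exact_sign_test(maternal_only: int, paternal_only: int) -> float:
    """Two-sided exact binomial p-value for asymmetry in discordant pairs.

    maternal_only: cases whose maternal (but not paternal) grandmother
                   had breast cancer.
    paternal_only: cases whose paternal (but not maternal) grandmother
                   had breast cancer.
    Under the null hypothesis of no asymmetry, each discordant pair is
    equally likely to fall in either group (Binomial(n, 0.5)).
    """
    n = maternal_only + paternal_only
    if n == 0:
        return 1.0  # no informative pairs, so no evidence either way
    k = max(maternal_only, paternal_only)
    # Upper-tail probability P(X >= k) under Binomial(n, 0.5).
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not Sister Study data).
    p_value = exact_sign_test(maternal_only=60, paternal_only=40)
    print(f"two-sided exact p-value: {p_value:.4f}")
```

A test of this kind ignores differences in age, survival, and reporting accuracy between the two family lines, all of which a real analysis would need to address.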
The Two Sister Study is continuing to enroll nuclear families in which one daughter developed breast cancer before age 50 and the other daughter is unaffected. We have currently enrolled almost 1,500 such families. The study is described under a separate project. Inherited genotypes, together with tumor characteristics, will need to be explored to investigate factors that predict the clinical course following treatment, and improved statistical methods will also need to be developed in that context. We are undertaking a genome-wide association study based on these data through a contract with the Center for Inherited Disease Research at Johns Hopkins. It will allow us to explore gene-by-environment effects on risk of young-onset breast cancer and also to look at maternally-mediated effects and possible parent-of-origin effects on risk. The Illumina platform to be used is the human OmniExpress plus Exome array, and the exome typing will require us to develop further methods appropriate for rare alleles. We have developed a method for studying gene-by-environment interaction using the tetrad structure, and we carried out extensive simulations to document its performance under a range of scenarios, some with and some without exposure-involved population structure. To our surprise, we learned that all of the existing gene-by-environment interaction methods are subject to bias if the population has exposure-involved population structure. This happens when there are subpopulations that differ both in their frequency of the marker allele under study and in their exposure prevalence. The resulting bias is best understood as follows: with that kind of structure, the exposure can serve as a surrogate for the degree of linkage disequilibrium (which also varies across subpopulations) between the marker under study and a causative SNP or haplotype. This bias can be extreme.
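The defining condition of exposure-involved population structure, subpopulations that differ in both marker allele frequency and exposure prevalence, can be made concrete with a small calculation. The sketch below is not the simulation study described above and does not model the linkage-disequilibrium mechanism. It shows only a simpler, related consequence: genotype and exposure that are independent within every subpopulation become associated once the subpopulations are pooled. That matters, for example, for a case-only analysis, which relies on gene and exposure being independent in the source population. The function name and all numeric values are hypothetical.

```python
def pooled_ge_odds_ratio(strata):
    """Gene-exposure odds ratio in a pooled population.

    strata: iterable of (weight, carrier_freq, exposure_prev) tuples, one
            per subpopulation. Weights are subpopulation proportions and
            should sum to 1; frequencies must lie strictly between 0 and 1.
    Carrier status and exposure are assumed independent WITHIN each
    subpopulation, so any pooled association comes from the mixture alone.
    """
    # Joint probability of (carrier, exposed) cells in the pooled population.
    cells = {(g, e): 0.0 for g in (0, 1) for e in (0, 1)}
    for weight, carrier_freq, exposure_prev in strata:
        for g in (0, 1):
            for e in (0, 1):
                p_g = carrier_freq if g else 1.0 - carrier_freq
                p_e = exposure_prev if e else 1.0 - exposure_prev
                cells[(g, e)] += weight * p_g * p_e
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


if __name__ == "__main__":
    # Hypothetical subpopulations: (proportion, carrier frequency, exposure prevalence).
    structured = [(0.5, 0.1, 0.1), (0.5, 0.5, 0.5)]
    # Same exposure difference, but identical carrier frequency in both groups.
    allele_matched = [(0.5, 0.3, 0.1), (0.5, 0.3, 0.5)]
    print("both differ:         ", pooled_ge_odds_ratio(structured))
    print("only exposure differs:", pooled_ge_odds_ratio(allele_matched))
```

When only one of the two quantities differs across subpopulations, the pooled odds ratio stays at 1; the spurious association appears only when both differ, which matches the condition stated above.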
We have now developed some remedies for avoiding this bias while preserving good statistical power, in work that was recently published. A robust procedure uses a case-only approach but augments it with exposure data from a randomly sampled unaffected sibling. Together with a graduate student from UNC Biostatistics, Alison Wise, we are working on a machine-learning approach to finding complex epistatic and gene-by-environment interactions based on case-parent triads. We downloaded case-parent triad data on oral clefts from dbGaP, sanitized the data to remove real effects, and are using those genomes to generate simulated case-parent triad data with known GxGxGxG interactions. We are working to develop an algorithm that can search the enormous space of 3-way choices of SNPs from the GWAS data and identify the right multi-SNP model, even when the attributable risk is very small. This work is in progress.
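To give a sense of why the search space of multi-SNP choices is described as enormous, the sketch below counts the unordered SNP subsets an exhaustive search would have to consider. It is a counting illustration only and says nothing about the algorithm under development. The panel size of 500,000 SNPs is a hypothetical round number, not the size of the oral cleft GWAS panel, and the function name is likewise an assumption.

```python
from math import comb


def count_snp_subsets(n_snps: int, order: int) -> int:
    """Number of unordered SNP subsets of a given size (n choose order)."""
    return comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 500_000  # hypothetical GWAS panel size, for illustration only
    for order in (2, 3, 4):
        subsets = count_snp_subsets(n_snps, order)
        print(f"{order}-way subsets of {n_snps:,} SNPs: {subsets:.3e}")
```

Because the count grows roughly as the panel size raised to the interaction order, each additional SNP in the model multiplies the number of candidates by several orders of magnitude, which is why exhaustive enumeration quickly becomes impractical.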