Genome-wide association studies involving hundreds of thousands of Single Nucleotide Polymorphisms (SNPs) require innovation in both study design and statistical analysis. The objective of this application is to develop statistical methods and computationally efficient algorithms that make the best use of diverse data resources and of the strengths of different study designs. In many association studies, interest is not limited to characterizing individual SNPs or haplotypes associated with a disease outcome; it also includes identifying interactions, whether between SNPs within a gene (haplotype effects), between genes (epistasis), or between genes and environmental exposures such as drugs, smoking, and alcohol consumption. The first aim of this application is to investigate the situations, including study designs, in which it is possible to identify these different types of interactions, as well as to construct predictive models based on several single SNPs or haplotypes. The proposed statistical methods will use stage-wise or regularization strategies to control statistical over-fitting in the context of high-dimensional SNP data. Study design also plays a critical role in this setting. Two common designs for association studies are the population-based case-control design and the family-based design. The population-based case-control design is popular because it is cost-effective, but it can be sensitive to population stratification. Family-based studies, which use family members as controls, are more robust to stratification and allow for the evaluation of maternal or parent-of-origin effects on the disease. However, they can be inefficient because of over-matching on genotypes, and ascertainment bias in sampling can substantially complicate the analysis. For these reasons, hybrid association studies that use both designs can increase the power to detect disease-associated SNPs.
The second aim of this application is to develop unified statistical estimation and inference procedures for combining data resources, taking into account different ascertainment schemes and potential bias due to population stratification. In particular, we focus on methods that can be readily adapted to high-dimensional SNP data by exploiting the computational techniques developed under the first aim.