Project Summary/Abstract
Genome-wide association (GWA) studies have begun to reveal the genetic architecture of many common, complex diseases. Variants identified in GWA studies are proving useful for predicting disease risk, refining diagnoses, and optimizing treatment regimens. Because much of the heritability of most complex diseases remains unaccounted for, improved statistical methodology has great potential to uncover as-yet-undetected signals in new and existing GWA studies. As the clinical use of whole-genome sequencing and electronic medical records becomes standard, the high-dimensional inference procedures developed for GWA studies will find renewed use in whole-genome and phenome-wide association studies. Motivated by challenges arising in the study of obstructive sleep apnea (OSA), I propose to develop novel statistical methods that address outstanding gaps in methodology for GWA testing of non-normal and missing phenotypes. My first aim addresses the challenge of conducting GWA analysis of quantitative traits with non-normal residuals. This work is motivated by analysis of the apnea-hypopnea index (AHI), a skewed phenotype used in the diagnosis of OSA. A common recourse when analyzing phenotypes with skewed or heavy-tailed distributions is to apply the rank-based inverse normal transformation (INT). However, it is unclear how best to apply the INT to optimize power, or even whether INT-based association testing is always valid. Preliminary results indicate that two variations on INT-based testing are indeed valid, but that neither is uniformly most powerful. In aim 1, I combine these approaches into a robust, well-powered, and general-purpose omnibus test. The omnibus test will enable researchers to easily perform GWA analysis of quantitative traits without having to check distributional assumptions on the residuals. 
I will apply the omnibus test to GWA analysis of AHI in several cohorts, including the Hispanic Community Health Study/Study of Latinos (SOL), the Multi-Ethnic Study of Atherosclerosis (MESA), and the Sleep Heart Health Study (SHHS). My second aim addresses the challenge of conducting GWA analysis using a surrogate of the target phenotype. This work is motivated by the setting where the target phenotype (AHI) is available for only a subset of subjects, but surrogates of the target phenotype are available for all subjects. Restricting the analysis to subjects with complete data reduces power and has the potential to introduce bias. An existing approach to retaining all subjects imputes the target phenotype from the surrogate phenotype. However, this approach neither makes full use of the information contained in the data, nor appropriately propagates the uncertainty due to imputation of the target phenotype. In aim 2, I develop a surrogate phenotype association test using the expectation-maximization (EM) algorithm. I will apply the resulting test to leverage the SHHS, where AHI is measured, for performing GWA analysis in the vast UK Biobank, where only surrogate measurements (sleep duration and excessive daytime sleepiness) are available.
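
For readers unfamiliar with the rank-based INT referred to under aim 1, the following is a minimal sketch of the transformation itself, not of the proposed omnibus test. It assumes the commonly used rank offset of 3/8 and mid-ranks for tied values; the function name `rank_inverse_normal` and the input values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Rank-based inverse normal transformation (INT).

    Each observation is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1). The offset of 3/8 is an
    assumption here; other offsets are also used in practice.
    Tied observations receive the average of their ranks.
    """
    n = len(values)
    # Indices of the observations, sorted by value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block over all observations tied with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # 1-based average rank of the block
        for k in range(start, end + 1):
            ranks[order[k]] = mid_rank
        start = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    denom = n - 2 * offset + 1
    return [std_normal.inv_cdf((r - offset) / denom) for r in ranks]


# Made-up, right-skewed values used only to exercise the function;
# these are not measurements from any cohort.
skewed = [0.5, 1.2, 3.0, 4.1, 7.9, 15.3, 42.0]
transformed = rank_inverse_normal(skewed)
```

Because the transformation depends only on ranks, the output is symmetric about zero regardless of how skewed the input is, which is why the INT is attractive for phenotypes such as AHI.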