Liability threshold modeling of genes and environment in case-control studies

Abstract

Gene-environment interaction, which we define as the joint effect of genes and environment that cannot be explained by their independent marginal effects, is broadly recognized as one potential source of missing heritability in genome-wide association studies. Indeed, even if genetic effects are not biologically mediated by environmental factors, one can expect to see higher genetic risks (higher SNP odds ratios) in disease cases carrying lower environmental risks than in disease cases carrying higher environmental risks. Our applied work on type 2 diabetes has provided compelling evidence of this type of interaction. The current proposal focuses on this type of interaction, motivated by the established idea that the main goal of studying gene-environment interaction is not to identify interactions per se, but rather to identify genes that would not be identified by standard marginal tests. Surprisingly, despite the large potential for improvement in power, methods that optimally account for this type of gene-environment interaction have yet to be applied to case-control studies. In particular, as we show, standard approaches such as using environmental risk factors as covariates, as well as previously developed statistical tests for gene-environment interaction, all fail to capture the available increase in statistical power. In this proposal, we will develop methods based on liability threshold modeling that attain superior power in the presence of interaction effects of this type. We will apply these methods to large type 2 diabetes and rheumatoid arthritis data sets involving tens of thousands of samples.
PUBLIC HEALTH RELEVANCE: Susceptibility to type 2 diabetes, rheumatoid arthritis, and a wide range of other diseases is known to be due to a combination of genetic and environmental factors, but association studies have had only partial success in identifying the underlying genetic risk variants, so the search continues. Because disease may be due to either genetic or environmental factors, diseased individuals with low environmental risks are likely to harbor increased genetic risk relative to diseased individuals with high environmental risks. In this proposal, we develop and apply new methodology to exploit this statistical gene-environment interaction in order to identify genetic risk factors with increased statistical power.
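
The statistical interaction described above can be illustrated with a small simulation. The sketch below is not the methodology of this proposal. It only assumes a simple liability threshold model in which a SNP and a binary environmental exposure contribute additively to a normally distributed liability, and disease occurs when liability exceeds a threshold. The function name and all parameter values (allele frequency, effect sizes, prevalence, sample size) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_case_genotypes(n=200_000, maf=0.3, beta=0.3, env_effect=1.0,
                            prevalence=0.08, seed=0):
    """Simulate an additive liability threshold model (illustrative only).

    All parameter values are hypothetical and not taken from the proposal.
    Returns the mean risk-allele count in low-exposure cases, in
    high-exposure cases, and in the whole simulated population.
    """
    rng = np.random.default_rng(seed)

    # Risk-allele count (0, 1 or 2) at a single SNP.
    genotype = rng.binomial(2, maf, size=n)
    # Binary environmental exposure: 0 = low risk, 1 = high risk.
    env = rng.integers(0, 2, size=n)
    # Residual liability from all other factors.
    noise = rng.standard_normal(n)

    # Genes and environment act additively on liability: no biological
    # interaction is built into the model.
    liability = beta * genotype + env_effect * env + noise

    # Individuals above the threshold are cases; the threshold is set so
    # that the simulated prevalence matches the requested one.
    threshold = np.quantile(liability, 1.0 - prevalence)
    is_case = liability > threshold

    low_env_cases = is_case & (env == 0)
    high_env_cases = is_case & (env == 1)
    return (genotype[low_env_cases].mean(),
            genotype[high_env_cases].mean(),
            genotype.mean())


if __name__ == "__main__":
    low, high, population = simulate_case_genotypes()
    print(f"mean risk alleles, low-exposure cases:  {low:.3f}")
    print(f"mean risk alleles, high-exposure cases: {high:.3f}")
    print(f"mean risk alleles, population:          {population:.3f}")
```

Under these assumptions, cases with low environmental exposure carry more risk alleles on average than cases with high exposure, even though the model contains no biological interaction term. This is the kind of signal that the proposal aims to exploit for increased power.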