A major project of this section is the development of new statistical genetics methodology, as prompted by the needs of our applied studies, and the testing and comparison of novel and existing statistical methods. The project to develop propensity scores in linkage analyses as a method for including covariate effects has continued in conjunction with Dr. Betty Doan. This method appears promising in that it is generally more powerful than including the covariates directly in the model and does not have strongly inflated Type I error rates. We have created programs for calculating permutation p-values for the linkage results obtained when using propensity scores in LODPAL, in the S.A.G.E. program package, and are currently applying these methods to Dr. Bailey-Wilson's lung cancer data. Results will be presented at an upcoming international meeting. We continue to explore the utility of various machine learning methods in genome-wide association studies (GWAS) and in analyses of whole-exome sequence data, particularly with respect to power and detection of gene-gene and gene-environment interactions. We previously published a study using GWAS genotype data from the Framingham Heart Study data repository with computer-simulated trait data, which allowed us to show that these methods may be able to detect interaction effects in suitably powered studies. We have also evaluated the power of several of these methods in whole-exome sequence data from the 1000 Genomes Project using computer-simulated phenotypes as part of Genetic Analysis Workshop 17 (GAW17). A paper presenting these results is in press (1). In addition, Dr. Bailey-Wilson has co-authored a review paper on machine learning methods that is in press (2). Dr. 
Bailey-Wilson is also first and corresponding author on a paper, also in press (3), summarizing the findings of the GAW17 group on Regression and Data Mining Methods for Analyses of Multiple Rare Variants. We have also used the GAW17 simulated whole-exome sequence (WES) data to develop novel tools for analysis and interpretation of WES data, including strategies for combining linkage and sequence results, various schemes for collapsing rare variants in genes and gene networks to improve the power of sequence analysis, and methods for integrating sequence analyses with existing genomics databases. Two additional papers presenting these results are in press (4-5). Development of these analysis methods and tools is ongoing, driven by our own WES and targeted sequence data from multiple studies of complex traits. Given the limitations of the GAW16 and GAW17 simulated datasets, we have started developing our own simulation programs to generate genome-wide association data with realistic haplotype block structures that will be representative of (at least) European Caucasian and African-American populations. These simulations will allow us to test and compare methods across a wide array of biological models, including complex trait models with gene-gene and gene-environment interactions.
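
For readers unfamiliar with the two generic techniques mentioned above, collapsing rare variants within a gene and computing a permutation p-value, the following is a minimal illustrative sketch. It is not the section's software, and it does not reproduce the LODPAL propensity-score procedure or any GAW17 analysis. The function names, the 1% minor allele frequency threshold, the carrier/non-carrier indicator, and the mean-difference test statistic are all assumptions chosen for illustration.

```python
import random


def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse one gene's variants into a per-individual burden indicator.

    genotypes: one row per individual, one minor-allele count (0/1/2) per variant.
    mafs: minor allele frequency of each variant (same order as the columns).
    maf_threshold: hypothetical cutoff; variants below it are treated as rare.
    Returns 1 for individuals carrying at least one rare allele, else 0.
    """
    rare_idx = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [1 if any(row[j] > 0 for j in rare_idx) else 0 for row in genotypes]


def mean_difference(burden, phenotype):
    """Illustrative statistic: mean trait in carriers minus mean in non-carriers."""
    carriers = [y for b, y in zip(burden, phenotype) if b == 1]
    noncarriers = [y for b, y in zip(burden, phenotype) if b == 0]
    if not carriers or not noncarriers:
        return 0.0  # statistic is undefined when one group is empty
    return sum(carriers) / len(carriers) - sum(noncarriers) / len(noncarriers)


def permutation_p_value(burden, phenotype, n_perm=1000, seed=0):
    """Two-sided empirical p-value obtained by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = abs(mean_difference(burden, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(burden, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one permutation,
    # so the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (made up): 4 individuals, 3 variants in one gene.
    genotypes = [[0, 1, 0], [0, 0, 0], [2, 0, 1], [0, 0, 0]]
    mafs = [0.2, 0.005, 0.001]
    phenotype = [2.0, 1.0, 3.0, 1.0]
    burden = collapse_rare_variants(genotypes, mafs)
    print(burden, permutation_p_value(burden, phenotype))
```

Shuffling phenotypes breaks any genotype-trait relationship while preserving the genotype data, which is why permutation is often used when the null distribution of a statistic is not known analytically. Related individuals, as in linkage studies, would need a permutation scheme that respects pedigree structure, which this sketch does not attempt.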