For seventeen years, this grant has supported the development of statistical and computational tools vital to gene mapping. During that period, technology and genomic data have changed dramatically. Expression and genotyping chips have become standard scientific tools, and the full genomes of a host of organisms, including humans, have been sequenced. Computers have continued to grow exponentially in speed and memory. These parallel advances have powered hundreds of successful human gene mapping studies for both Mendelian and complex traits. Unfortunately, these successes have shed light on only a small fraction of the genetic heritability of complex traits. This is hardly surprising, since current technology stresses common SNPs and selection tends to drive common deleterious mutations to extinction. There are several candidates for the missing dark matter of genetic epidemiology. Among these are (a) copy number variants, (b) polygenes of small effect, (c) missed interactions among genes and between genes and the environment, (d) epigenetic effects, (e) variation across populations, (f) rare variants, and (g) non-coding RNA. As sequencing costs continue to decline rapidly, the search for rare variants via large-scale sequencing is perhaps the most promising new route to disease gene discovery.

In the next cycle of this grant, we plan to build on our successes, with particular stress on mining the growing avalanche of sequence data. The statistical analysis of sequence data is surely one of the most complex undertakings in all of modern biology, and the data currently being generated are in danger of being squandered for lack of good analysis tools. Beyond raw sequence data, interesting connections are being forged by the bioinformatics and functional genomics communities, and we desperately need to bring this accumulated knowledge of mutation severity prediction and gene interactions to bear on gene mapping. In our opinion, the extraordinarily fast coordinate descent forms of penalized regression are the best candidate tools for successful analysis of high-dimensional sequencing data. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants.

Our first aim is to extend our penalized regression algorithms to incorporate prior biological knowledge at the variant level, distinguish modes of inheritance at the gene level, capture multivariate phenotypes, and exploit network information in interaction testing. Additional aims of this proposal include new methods that use sequence data to rule out variants' involvement in Mendelian traits, extensions to our tests for intergenerational effects, and more efficient algorithms for genome-wide association tests based on pedigree data. Finally, we will implement all of these innovations in our mature, freely distributed statistical genetics package MENDEL.

PUBLIC HEALTH RELEVANCE: The Human Genome Project and its offshoots have dramatically increased the amount of available genetic data. Indeed, our ability to collect genetic information now far outstrips our ability to use that information to understand the basis of disease and human diversity.
Our aim is to develop, implement, and freely distribute new, efficient computational and statistical approaches that make full use of these vast amounts of genetic data and thus improve genetics researchers' ability to map and characterize the genes that lead to human disease and trait variation.
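To give concrete flavor to the coordinate descent penalized regression machinery emphasized above, the sketch below implements cyclic coordinate descent for lasso-penalized least squares in Python. It is a minimal illustration under simple assumptions (simulated genotypes, an arbitrary penalty, a squared-error loss), not the algorithm as implemented in MENDEL; all function names and parameter values in it are hypothetical.

```python
# Illustrative sketch only (NOT the MENDEL implementation): cyclic coordinate
# descent for the lasso, 0.5 * ||y - X beta||^2 + lam * ||beta||_1.
import numpy as np

def soft_threshold(z, gamma):
    # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()                 # residual y - X beta, for beta = 0
    col_norms = (X ** 2).sum(axis=0)    # precomputed x_j' x_j
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of x_j with the partial residual that excludes predictor j
            rho = X[:, j] @ residual + col_norms[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_norms[j]
            residual += X[:, j] * (beta[j] - new_bj)   # keep residual current
            beta[j] = new_bj
    return beta

# Toy data (hypothetical): 200 subjects, 1000 variants, 5 with true effects.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
true_beta = np.zeros(1000)
true_beta[:5] = 1.0
y = X @ true_beta + rng.standard_normal(200)
print("selected variants:", np.flatnonzero(lasso_coordinate_descent(X, y, lam=60.0)))
```

The appeal of this style of algorithm, as the summary notes, is speed: once the residual vector is maintained, each coordinate update reduces to a closed-form soft-thresholding step costing only O(n) arithmetic, so full sweeps over hundreds of thousands of variants remain computationally affordable.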