Technological innovations arising from the HapMap Project have dramatically increased the speed and accuracy of genotyping while greatly reducing cost. Public and private efforts are beginning to release an unprecedented volume of human genotype and DNA sequence data into the public domain. To allow the best inferences about human variation and past human evolution from these data, we propose a series of investigations centered on four aims. First, we will develop novel statistical methods for population genetic inference from high-throughput DNA sequencing platforms. Pyrosequencing technology will generate assembled alignments that represent a sampling of sequence reads across individuals (multinomial) and across homologous chromosomes within an individual (binomial), producing a complex mixture. Inference of population genetic parameters from such data will demand novel statistical approaches, and we outline a set of plans to develop statistically rigorous methods. Second, we will develop methods to reverse-engineer the ascertainment biases of SNPs on widely used genotyping panels so as to enable population genetic inference. SNPs on the high-throughput genotyping platforms of Affymetrix and Illumina were ascertained in diverse and often irretrievable ways. Statistically sound population genetic inference from these data requires an understanding of the nature of the ascertainment bias of these platforms. We will reverse-engineer the ascertainment using ENCODE and other dense resequencing data, and use these inferences to correct high-density SNP platform data for ascertainment bias. Third, we will develop novel methods for inference of natural selection from patterns of haplotype diversity within and among human populations and apply these approaches to publicly available data sets. Methods of inference of natural selection from SNP frequency and haplotype diversity continue to gain in power and specificity.
Optimizing these methods demands correction for the effects of ascertainment, demography, and local variation in recombination, as well as imputation of missing data and of haplotype phase. We will make use of Markov-Hidden Markov models for jointly estimating the magnitude, location, and age of selective sweeps. Finally, we will develop novel approaches for predicting the functional consequences of nucleotide substitutions in putatively functional regions of the human genome. Whole-genome association tests will gain power and specificity from prior inference of the likelihood that a SNP has a damaging effect on a gene's function. In addition, genome-wide association tests will be followed by extensive resequencing of candidate regions, where inference of the likelihood of deleterious effects of the many rare variants will also be useful. We propose methods that have advantages over existing approaches, making use of comparative genomic data, protein structure, cis-regulatory information, and patterns of segregating variation.

Project Narrative: This project will develop methods of statistical inference from human DNA resequencing and SNP genotype data that will allow accurate estimation of critical parameters that describe the structure of variation in human populations. These inferences can provide vital clues to identifying genes that are associated with risk of complex genetic disorders.
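
As an illustration of the binomial sampling of reads across homologous chromosomes mentioned under the first aim, the following is a minimal sketch, not the proposed method. It assumes a biallelic site, reads drawn independently and with equal probability from the two chromosomes of a diploid individual, and a symmetric per-read error rate; the function name `genotype_likelihoods` and the `error_rate` default are hypothetical choices made for this example only.

```python
from math import comb


def genotype_likelihoods(n_alt, n_total, error_rate=0.01):
    """Binomial likelihood of seeing n_alt alternate-allele reads out of
    n_total reads, for genotypes carrying 0, 1, or 2 alternate alleles.

    Assumptions (illustrative only): biallelic site, independent reads,
    each chromosome sampled with probability 1/2, and a symmetric error
    that flips the observed allele with probability error_rate.
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        # Chance that a read originates from an alternate-allele chromosome.
        p_true = alt_copies / 2
        # Chance that a read is observed as alternate, allowing for error.
        p_read = p_true * (1 - error_rate) + (1 - p_true) * error_rate
        likelihoods.append(
            comb(n_total, n_alt)
            * p_read ** n_alt
            * (1 - p_read) ** (n_total - n_alt)
        )
    return likelihoods


if __name__ == "__main__":
    # Five alternate reads out of ten: the heterozygote should fit best.
    print(genotype_likelihoods(5, 10))
```

With low coverage the three likelihoods can be close to one another, which is one reason inference from such reads calls for methods that carry genotype uncertainty forward rather than fixing a single called genotype.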