The shift in attention toward rare alleles, and the concomitant need for DNA sequence data from large samples drawn from human populations, have driven the need to accurately describe the patterns of human DNA sequence variation and to understand the forces that affect it. At the same time, SNP genotyping platforms are increasing in SNP density, and unprecedented sample sizes are accumulating from GWAS. In order to foster rigorous inferences about human variation and past human evolution from these data, we propose a series of investigations centered on four aims. First, we will develop novel statistical methods for population genetic inference from next-generation DNA sequence data. Starting from alignments of sequence reads from multiple individuals, we will develop methods of parameter estimation and hypothesis testing that integrate over the likelihoods of genotypes conditional on the data. The optimal balance of sample size versus sequencing coverage will be analyzed for several distinct experimental problems. The methods will be thoroughly tested and applied to several resequencing data sets to which we have access. Second, we will develop and extend methods for ancestry inference from SNP and genome sequence data of admixed individuals and employ them to infer past demographic history, including migration. Our method of ancestry inference based on Principal Components Analysis will be extended to accommodate data uncertainty and ascertainment bias. We will model a range of admixture scenarios, from single-pulse to continuous influx, in order to determine whether genetic data allow more refined inference of the history of mixing of two ancestral populations. Third, we will develop methods for estimation of joint identity-by-descent (IBD) relationships across multiple individuals. Existing methods take discrete genotype calls as a starting point and do not accommodate platform-specific error.
There is a significant need to develop methods for inference of shared IBD regions genome-wide across multiple individuals in large population samples. Through a combination of heuristic approaches and graph-theory-based computational algorithms, we will develop and test such methods. Finally, we will use IBD sharing inferred across individuals in a sample to estimate population genetic parameters in models of demography and selection. Just as demographic changes affect the site frequency spectrum of SNPs, so too will they affect the pattern of IBD sharing in a sample. Turning this problem around, we will develop approaches for inference of population genetic parameters, such as those describing demography, rates of inbreeding, levels of purifying and positive selection, admixture, and migration, based only on the patterns of IBD sharing. These will be contrasted with approaches that use phased haplotype information for demographic inference.

PUBLIC HEALTH RELEVANCE: This project aims to understand the population-level forces acting on the human genome through analysis of genome-wide SNP data and next-generation sequence data using newly developed statistical methods. Model parameters will be estimated from alignments of next-generation sequence reads in a way that accommodates base-calling uncertainty, and segment-wise inference of ancestry in admixed genomes will be applied to understand past admixture history. Identity-by-descent methods will be pursued to allow the most reliable inferences about demography, natural selection, and other population forces acting on human genetic variation.
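
As an illustration of what the first aim means by integrating over the likelihoods of genotypes rather than relying on discrete genotype calls, the sketch below estimates the allele frequency at a single biallelic site by expectation-maximization. It is not the method proposed here, only a minimal example of the general idea. It assumes Hardy-Weinberg proportions and takes as input a per-individual table of read likelihoods for the three genotypes. The function name `em_allele_frequency` and its parameters are hypothetical.

```python
import numpy as np

def em_allele_frequency(gl, init=0.2, tol=1e-10, max_iter=1000):
    """Estimate the alternate-allele frequency at one biallelic site.

    gl: array-like of shape (n_individuals, 3); gl[i][g] is the likelihood of
        individual i's reads given genotype g (0, 1, or 2 alternate alleles).
    Assumes Hardy-Weinberg genotype proportions (an illustrative assumption).
    """
    gl = np.asarray(gl, dtype=float)
    n = gl.shape[0]
    dosage = np.array([0.0, 1.0, 2.0])
    p = init
    for _ in range(max_iter):
        # Keep p strictly inside (0, 1) so the genotype prior has no zeros.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        # Hardy-Weinberg prior over genotypes given the current frequency.
        prior = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        # E-step: posterior genotype probabilities for each individual.
        post = gl * prior
        post /= post.sum(axis=1, keepdims=True)
        # M-step: expected alternate-allele count divided by 2n chromosomes.
        p_new = float((post @ dosage).sum() / (2 * n))
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Three individuals with low-coverage data: no genotype is called outright,
# each contributes in proportion to its posterior genotype probabilities.
example = [[0.90, 0.10, 0.00],
           [0.20, 0.70, 0.10],
           [0.05, 0.45, 0.50]]
print(em_allele_frequency(example))
```

When every individual's likelihoods put all weight on one genotype, this estimator reduces to simple allele counting. When the likelihoods are flat, the data carry no information and the estimate stays at its starting value.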