Statistical Methods for Next-Generation Sequencing in Disease Association Studies

Through this project we propose to develop statistical approaches and software for genotype calling and association testing in next-generation sequence data. The field is driven by molecular advances that allow affordable, massively parallel sequencing, and statistical methods for next-generation sequence data in disease studies must develop rapidly to keep pace with the advancing molecular technology. Next-generation sequencing is based on random, short-read technology; thus the coverage of any nucleotide is highly variable, and the reads themselves are subject to error. Distinguishing random error from truly variable sites is required for "SNP-calling". One step beyond this is identifying the individual's actual genotype at the site. This is a fundamentally statistical problem, and we have yet to see it addressed in a statistically rigorous manner.

The solution that we propose, and what makes our approach novel, assumes that we have a sample of individuals, each with next-generation sequence data. We anticipate that sequencing may ultimately replace GWAS SNP arrays for disease-association studies. While this may be several years away for whole-genome sequencing, individually sequencing enough people for a small association study is already becoming practical with target capture arrays. We can leverage the information from a sample of individuals with next-generation sequence data to estimate each individual's genotype and the position-specific error rate more accurately. Our approach is to express the genotype probabilities and error rate in a likelihood framework, so that standard statistical theory can guide genotype calling. This approach should perform better than the current practice of calling genotypes one individual at a time based on an arbitrary filter.
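To illustrate how pooling a sample of individuals can inform genotype calls at one position, here is a minimal sketch. It is not the proposal's method. It rests on several assumptions of our own. The site is biallelic. Reads are independent, and each shows the alternate allele with probability err, 0.5 or 1 - err for genotypes 0, 1 and 2. Genotypes follow Hardy-Weinberg proportions with alternate-allele frequency f. The functions read_likelihood, hwe_prior, posteriors, fit_site and call_genotypes are hypothetical names. An EM loop estimates f and the site-specific error rate err jointly across all individuals, and each person's posterior genotype probabilities then drive the call.

```python
import math

# Hypothetical sketch of one biallelic site. Genotype g = number of alternate
# alleles (0, 1, 2). Assumed model: each read independently shows the
# alternate allele with probability err, 0.5 or 1 - err for g = 0, 1, 2.

def read_likelihood(k, n, g, err):
    """P(k alternate reads among n | genotype g, site error rate err).
    The binomial coefficient is omitted because it cancels in posteriors."""
    p_alt = (err, 0.5, 1.0 - err)[g]
    return p_alt ** k * (1.0 - p_alt) ** (n - k)

def hwe_prior(f):
    """Assumed Hardy-Weinberg genotype prior for alternate allele frequency f."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)

def posteriors(reads, f, err):
    """Posterior P(g | reads) for each individual's (k, n) read counts."""
    out = []
    for k, n in reads:
        w = [pr * read_likelihood(k, n, g, err)
             for g, pr in enumerate(hwe_prior(f))]
        total = sum(w)
        out.append([x / total for x in w])
    return out

def fit_site(reads, f=0.1, err=0.01, iters=200):
    """EM estimates of (f, err) pooled across all individuals at the site."""
    for _ in range(iters):
        post = posteriors(reads, f, err)  # E-step
        # M-step for f: expected alternate alleles / (2 * individuals).
        f = sum(p[1] + 2 * p[2] for p in post) / (2 * len(reads))
        # M-step for err: expected error reads (alternate reads in g = 0,
        # reference reads in g = 2) over expected reads in homozygotes.
        num = sum(p[0] * k + p[2] * (n - k) for p, (k, n) in zip(post, reads))
        den = sum((p[0] + p[2]) * n for p, (k, n) in zip(post, reads))
        # Keep estimates off the boundaries; err < 0.5 keeps labels identifiable.
        f = min(max(f, 1e-6), 1 - 1e-6)
        err = min(max(num / den if den > 0 else err, 1e-6), 0.49)
    return f, err

def call_genotypes(reads, f, err):
    """Most probable genotype and its posterior probability per individual."""
    calls = []
    for p in posteriors(reads, f, err):
        g = max(range(3), key=lambda i: p[i])
        calls.append((g, p[g]))
    return calls

# Toy (alternate reads, total reads) per individual at one position.
reads = [(0, 10), (1, 12), (0, 8), (0, 15), (5, 10),
         (4, 9), (6, 11), (10, 10), (0, 9), (0, 7)]
f, err = fit_site(reads)
calls = call_genotypes(reads, f, err)
```

Because err is shared across individuals at the position, the single alternate read in the (1, 12) individual is explained as error rather than as a heterozygote. A per-individual filter would have no comparable way to learn this. The posterior probability returned with each call also gives a principled confidence threshold in place of an arbitrary filter.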
A distinct advantage of this statistical framework is that the uncertainty in the genotype calls can be incorporated directly into our disease-association tests (e.g., case-control and rare variant analysis). In this way we will increase the power of our association tests and reduce bias due to error or systematic missingness. Incorporating next-generation sequence data directly into the association tests provides a complete analysis pipeline from sequence to association.
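One common way to carry genotype uncertainty into a case-control test is to replace hard calls with each individual's expected alternate-allele count (dosage), computed from the posterior genotype probabilities. A 1-df score (trend) test of logistic regression is then applied to the dosages. The sketch below illustrates that idea only and is not the proposal's method. The names expected_dosage and dosage_score_test, and the toy posteriors, are hypothetical. The posteriors could come from any genotype-calling model.

```python
import math

def expected_dosage(post):
    """E[g] = P(g=1) + 2 * P(g=2) from posterior genotype probabilities."""
    return post[1] + 2.0 * post[2]

def dosage_score_test(posts, y):
    """1-df score test of logistic regression of y (1 = case, 0 = control)
    on expected dosage. Returns (chi-square statistic, p-value)."""
    d = [expected_dosage(p) for p in posts]
    n = len(y)
    ybar = sum(y) / n
    dbar = sum(d) / n
    u = sum(di * (yi - ybar) for di, yi in zip(d, y))             # score
    v = ybar * (1.0 - ybar) * sum((di - dbar) ** 2 for di in d)  # information
    if v == 0.0:
        return 0.0, 1.0  # no variation in dosage or phenotype
    stat = u * u / v
    # Upper tail of chi-square(1): P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Toy posteriors (P(g=0), P(g=1), P(g=2)); the first case is an uncertain call.
posts = [(0.99, 0.01, 0.0), (0.95, 0.05, 0.0), (0.2, 0.7, 0.1),
         (0.0, 0.1, 0.9), (0.98, 0.02, 0.0), (0.9, 0.1, 0.0)]
y = [0, 0, 1, 1, 0, 1]
stat, p = dosage_score_test(posts, y)
```

An uncertain call thus contributes a fractional dosage instead of being forced to 0, 1 or 2 or dropped as missing. Using the expected dosage as a fixed covariate is only one option. An alternative is to integrate over the unobserved genotypes within the association likelihood itself.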