Genome-wide association studies have recently been applied to detect genetic variants that contribute to predisposition to human diseases such as diabetes, heart disease, and cancer. By design, these associations may be obtained indirectly, as the disease-influencing genotypes may not be observed. Rather, the hope is that genetic risk factors are sufficiently correlated with the observed genotype data for an association to be established. Standard marker sets, which capture common variation, have good power to detect associations when the risk variants are also common, but rare genetic variants that influence disease phenotypes may remain undetected. However, specific combinations of genotypes can effectively "tag" the genotypes at a risk-predisposing genetic locus, facilitating the detection of association. While the correlations among genotypes at nearby loci are essential for detecting an association, they are burdensome when trying to identify the ultimate sources of association, i.e., the causative genetic loci. In this application, we outline a haplotype-based statistical approach to detecting and dissecting association between a binary phenotype and rare genetic variants. In addition, we use our model to identify individual carriers of the risk alleles. This serves both to aid in characterizing the genetic influence on a complex phenotype and to provide a tool for formulating preliminary risk models. We will apply these methods to data from a genome-wide association study of lung cancer. Our methods, which are computationally tractable for large existing and forthcoming data sets, will be incorporated into the widely used and freely available fastPHASE software package.
PUBLIC HEALTH RELEVANCE: Identifying the genetic causes of complex human disease requires efficient use of available data from genome-wide surveys of variation. Haplotype variation provides information for detecting the influence of rare genetic factors on disease, and can be particularly helpful in predicting which individuals are at highest risk of developing disease.