When case-parent triads are used to study the association of single nucleotide polymorphisms (SNPs) with disease, the phenotype of the child is used but the phenotypes of the parents are ignored, even though all three family members are genotyped. When parental phenotypes for the disease under study are available, including them in the analysis brings more information to bear on disease-gene associations. We have developed and evaluated new approaches for using parental phenotypes together with the usual data from case-parents studies to increase power for detecting associations. Our approach uses parental phenotypes to assess association independently of the usual test based on offspring genotypes. Moreover, our procedure for using parental phenotypes is robust to bias from hidden genetic population structure because our statistical model employs strata defined by the pair of parental genotypes. Our simulations support this claim of robustness and show that incorporating information about parental phenotypes can enhance power compared to using offspring phenotypes alone.

We are in the process of developing methods for using exposures measured in pooled specimens from several individuals, together with genotypes measured separately on each individual, to study gene-environment interactions. Suppose one has a case-control study and has genotyped each individual at a panel of SNPs. Suppose that one also has biological specimens (e.g., serum or urine) from the same individuals but lacks the budget to assay each individual specimen for an exposure of interest. Pooling specimens and assaying the resulting pooled specimens will not only save assay costs but also preserve specimen volume for future uses. In the past, we have developed methods for analyzing case-control studies with exposures measured in pooled specimens.
Those methods assume, reasonably, that the measured value on the pooled specimen is the average of the values for the individual specimens. With those methods, testing gene-environment interactions at a single SNP required creating specimen pools within strata of individuals who all had the same genotype for that SNP. To study gene-environment interactions for a panel of SNPs, our previous methods would require creating new pooled specimens for each SNP studied, and the potential savings in assay costs would disappear. The approach that we are developing regards the individual measurements as missing data and uses the pooled specimens in a principled way to impute those missing data. With a given set of imputed data in hand, we can use standard statistical methods for case-control data to estimate gene-environment interactions. In practice, we use a multiple-imputation approach: creating multiple sets of imputed data, doing a case-control analysis for each set, and combining the results from the multiple analyses. This approach has shown some promise, but some problems remain to be resolved, and work on them is ongoing.

Identification of causative SNPs in a genome-wide study can be challenging when individual SNPs have small marginal effects because testing thresholds must reflect the large number of SNPs under study. For complex diseases, particular combinations of SNPs may dramatically increase risk, a kind of epistasis or gene-gene interaction. We are currently investigating the use of a machine learning technique for the discovery of sets of SNPs that together cause disease (causative SNPs) in case-parents data. First, we devised a way to use actual case-parent triad genotypes to create simulated genome-wide data sets that reflect realistic linkage disequilibrium structure and are seeded with known sets of causative SNPs.
Second, we implemented an existing stochastic search algorithm (called GA-KNN) that is based on an evolutionary algorithm to find multiple sets of k SNPs that are predictive of disease (here k is a small number, say 2 or 4). By cataloguing those SNPs that appear most frequently among the sets that are predictive of disease, we hope to uncover the sets of causative SNPs. In preliminary trials on simulated data seeded with two interacting sets of four SNPs each, our approach shows promise. In ongoing work, we are attempting to speed up the algorithm and to see whether the promising performance is maintained in more complex situations. (See also Z01 ES040007; PI Clare Weinberg; Min Shi is also a within-lab collaborator on this project; her time is allocated in Weinberg's project but not in this one.)
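
The cataloguing step described above, counting how often each SNP appears among the predictive sets, can be illustrated with a minimal Python sketch. It assumes the stochastic search has already returned a list of predictive k-SNP sets. The function name, the SNP identifiers, and the example sets are hypothetical and are not taken from the GA-KNN software or from our data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_snps_by_frequency(
    selected_sets: Iterable[Iterable[str]],
) -> List[Tuple[str, int]]:
    """Count how often each SNP appears across the selected k-SNP sets."""
    counts: Counter = Counter()
    for snp_set in selected_sets:
        # set() ensures a SNP is counted at most once per selected set.
        counts.update(set(snp_set))
    # Sort by descending count, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical output of repeated search runs with k = 2 (made-up identifiers).
example_sets = [
    ("snp_A", "snp_B"),
    ("snp_A", "snp_C"),
    ("snp_B", "snp_A"),
    ("snp_D", "snp_C"),
]

ranking = rank_snps_by_frequency(example_sets)
print(ranking)
```

SNPs at the top of such a ranking would be the candidates for membership in a causative set. How many top-ranked SNPs to retain is a separate choice that this sketch does not address.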