Methods for Genetic Epidemiology

We published methods of analysis for case-control family data in which probands (cases or controls) are genotyped and time-to-onset information is available for their first-degree relatives. Methods to estimate relative risks, cumulative risks and residual familial aggregation are given. The work indicates that the case-control family design is robust to misspecification of the copula models used to accommodate residual familial correlation, whereas designs that sample only case probands are not robust to such misspecification.

We published methods for cohort and nested case-control studies to estimate relative risks associated with haplotypes. Based on a hazard function derived from the observed genotype data, we developed a semiparametric method for joint estimation of the relative-risk parameters and the cumulative baseline hazard function. The method performs well in simulations.

We published a class of TDT-type methods that can jointly analyze haplotypes from multiple linked or unlinked candidate genes. Our approach first uses a linear signed rank statistic to compare, for each gene, the structural similarity among transmitted haplotypes with that among non-transmitted haplotypes. The results of these ranked comparisons across all genes considered are then combined into global statistics that test the association of the whole set of genes with disease simultaneously. In simulation studies, the proposed tests yielded correct type I error rates in stratified populations. Compared with gene-by-gene tests, the new global tests were more powerful when all candidate genes were associated with the disease.

To take advantage of high-density SNP maps across the genome, various candidate-gene association tests have been developed to compare multilocus genotypes or estimated haplotypes between cases and controls.
We viewed the two-sample testing problem from the perspective of supervised machine learning and proposed a new association test. The approach adopts the flexible and easy-to-interpret classification tree model as the learning machine and uses the estimated prediction error of the resulting prediction rule as the test statistic. Besides providing an association test, the procedure generates a prediction rule that can help in understanding the mechanisms underlying complex disease. In related work, we published a study of the performance of various types of prediction error estimators for the stochastic gradient boosting model.

The boundaries of common haplotype blocks in HapMap constructions can be ambiguous, and so can the choice of the associated tagSNPs. We addressed this issue by defining a marker ambiguity score (MAS) and evaluated it in simulations based on real data. We showed that the MAS method can assess boundary ambiguity caused by ethnic variation, by the limited sample sizes used for HapMap construction, and by disease aggregation. We found striking differences in the overall patterns of blocks between blacks and whites.

We published a resampling approach to control the family-wise error rate of multiple testing procedures used to detect genetic associations in case-control data. An omnibus test combines single nucleotide polymorphism (SNP)-based and haplotype-based procedures and has good power whether disease susceptibility is conferred by individual SNPs or by haplotypes. We also developed a related two-stage procedure that controls the false discovery rate.

Methods and procedures were developed to facilitate collaboration among members of consortia working to identify low-penetrance alleles associated with breast and prostate cancer.
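The case-control family work uses copula models for residual familial correlation, but the copula family is not named in this summary. As an illustration only, the sketch below assumes a Clayton copula applied to the marginal survival functions of a proband and a relative, S(t1, t2) = (S1(t1)^(-theta) + S2(t2)^(-theta) - 1)^(-1/theta) with theta > 0, and computes the relative's cumulative risk given an affected proband. The function names are hypothetical.

```python
import math


def clayton_joint_survival(s1: float, s2: float, theta: float) -> float:
    """P(T1 > t1, T2 > t2) under an assumed Clayton survival copula.

    s1 and s2 are the marginal survival probabilities S1(t1) and S2(t2);
    theta > 0 sets the strength of residual familial dependence
    (theta -> 0 approaches independence).
    """
    if theta <= 0:
        raise ValueError("theta must be positive for the Clayton copula")
    if s1 == 0.0 or s2 == 0.0:
        return 0.0  # limiting value when either margin is exhausted
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)


def conditional_cumulative_risk(s_proband: float, s_relative: float, theta: float) -> float:
    """P(relative has onset by t2 | proband had onset by t1) under the Clayton copula."""
    joint_survival = clayton_joint_survival(s_proband, s_relative, theta)
    # Inclusion-exclusion: P(T1 <= t1, T2 <= t2) = 1 - S1 - S2 + S(t1, t2).
    both_affected = 1.0 - s_proband - s_relative + joint_survival
    return both_affected / (1.0 - s_proband)


def kendall_tau(theta: float) -> float:
    """Kendall's tau implied by a Clayton copula with parameter theta."""
    return theta / (theta + 2.0)


# Proband and relative each have a 10% marginal cumulative risk by the ages considered.
print(conditional_cumulative_risk(0.9, 0.9, theta=1.0))  # about 0.182, versus 0.10 under independence
print(kendall_tau(1.0))
```

With theta = 1 the relative's risk rises from 10% to about 18% when the proband is affected, which is the kind of residual aggregation the copula is meant to absorb; a different copula family would change this figure, which is why robustness to copula misspecification matters.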
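The haplotype relative-risk method jointly estimates relative-risk parameters and the cumulative baseline hazard. Its hazard is derived from observed, phase-unknown genotypes, which this sketch does not attempt. It assumes instead that each subject's haplotype dosage z is known and that beta is held fixed, and computes the classical Breslow estimator Lambda0(t) = sum over event times t_i <= t of d_i / sum_{j: t_j >= t_i} exp(beta * z_j), plus the implied cumulative risk 1 - exp(-Lambda0(t) * exp(beta * z)). In a full semiparametric fit, beta and Lambda0 would be updated together; the names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def breslow_cumulative_hazard(
    times: Sequence[float],
    events: Sequence[int],
    dosage: Sequence[float],
    beta: float,
) -> List[Tuple[float, float]]:
    """Breslow estimate of the cumulative baseline hazard Lambda0(t).

    Assumes lambda(t | z) = lambda0(t) * exp(beta * z) with a known haplotype
    dosage z and a fixed beta. Returns (event time, Lambda0) at each distinct event time.
    """
    risk_scores = [math.exp(beta * z) for z in dosage]
    event_times = sorted({t for t, d in zip(times, events) if d})
    cumulative = 0.0
    estimate = []
    for t in event_times:
        n_events = sum(1 for ti, di in zip(times, events) if di and ti == t)
        # Risk set: everyone still under observation just before t.
        at_risk = sum(r for ti, r in zip(times, risk_scores) if ti >= t)
        cumulative += n_events / at_risk
        estimate.append((t, cumulative))
    return estimate


def cumulative_risk(baseline_cum_hazard: float, beta: float, z: float) -> float:
    """Absolute risk of onset by t for dosage z: 1 - exp(-Lambda0(t) * exp(beta * z))."""
    return 1.0 - math.exp(-baseline_cum_hazard * math.exp(beta * z))


# Toy data: onset times, event indicators (0 = censored) and hypothetical haplotype dosages.
est = breslow_cumulative_hazard([1, 2, 3, 4], [1, 1, 0, 1], [0, 1, 0, 1], beta=math.log(2.0))
print(est)  # approximately [(1, 1/6), (2, 1/6 + 1/5), (4, 1/6 + 1/5 + 1/2)]
print(cumulative_risk(est[-1][1], math.log(2.0), z=1))
```

Setting beta = 0 reduces the estimator to the Nelson-Aalen estimator, a convenient sanity check.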
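For the TDT-type methods, one standard linear signed rank statistic is the Wilcoxon signed rank statistic. The sketch assumes that a per-family difference d = (similarity among transmitted haplotypes) - (similarity among non-transmitted haplotypes) has already been computed for each gene; the similarity measure itself is hypothetical, since the summary does not specify it. Each gene's statistic is standardized with its null mean and variance given the ranks, and the genes are then combined with a Stouffer-type sum. That combination assumes independent per-gene statistics (reasonable for unlinked genes, not for linked ones), so the authors' global statistics may be built differently.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_z(diffs: Sequence[float]) -> float:
    """Standardized Wilcoxon signed rank statistic for one gene (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    if not nonzero:
        return 0.0
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    # Under H0 each sign is +/- with probability 1/2 independently of the ranks.
    mean = sum(ranks) / 2.0
    var = sum(r * r for r in ranks) / 4.0
    return (w_plus - mean) / math.sqrt(var)


def global_stouffer_z(per_gene_z: Sequence[float]) -> float:
    """Combine per-gene z statistics, assuming they are independent under H0."""
    return sum(per_gene_z) / math.sqrt(len(per_gene_z))


def one_sided_p(z: float) -> float:
    """Upper-tail normal p-value: transmitted haplotypes more similar than expected."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-family similarity differences for two genes.
gene_a = [0.40, 0.10, -0.05, 0.30, 0.20, 0.25, -0.10, 0.15]
gene_b = [0.05, -0.20, 0.35, 0.10, 0.00, 0.30, 0.15, -0.05]
z_global = global_stouffer_z([signed_rank_z(gene_a), signed_rank_z(gene_b)])
print(z_global, one_sided_p(z_global))
```

A summed statistic gains power when every gene shifts in the same direction, which matches the setting where the global tests outperformed gene-by-gene testing.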
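The tree-based association test can be sketched with the simplest classification tree, a single-split "stump" on genotype codes 0/1/2 (minor-allele counts). This is a swapped-in simplification: the summary does not describe the tree-growing rules, the error estimator, or how significance is assessed. Here the test statistic is the k-fold cross-validated misclassification error, and its null distribution is approximated by permuting case-control labels. All names are hypothetical.

```python
import random
from typing import Sequence, Tuple

# A depth-1 tree: (SNP index, threshold, label if genotype <= threshold, label otherwise).
Stump = Tuple[int, float, int, int]


def _majority(labels: Sequence[int], default: int) -> int:
    """Majority 0/1 label, falling back to `default` for empty nodes or ties."""
    ones = sum(labels)
    if 2 * ones > len(labels):
        return 1
    if 2 * ones < len(labels):
        return 0
    return default


def fit_stump(X: Sequence[Sequence[int]], y: Sequence[int]) -> Stump:
    """Choose the single genotype split with the fewest training misclassifications."""
    overall = _majority(y, 0)
    best: Stump = (0, 0.5, overall, overall)
    best_err = sum(v != overall for v in y)
    for j in range(len(X[0])):
        for threshold in (0.5, 1.5):  # dominant-type and recessive-type splits
            left = [v for row, v in zip(X, y) if row[j] <= threshold]
            right = [v for row, v in zip(X, y) if row[j] > threshold]
            lab_left, lab_right = _majority(left, overall), _majority(right, overall)
            err = sum(v != lab_left for v in left) + sum(v != lab_right for v in right)
            if err < best_err:
                best, best_err = (j, threshold, lab_left, lab_right), err
    return best


def predict(stump: Stump, row: Sequence[int]) -> int:
    j, threshold, lab_left, lab_right = stump
    return lab_left if row[j] <= threshold else lab_right


def cv_error(X, y, k: int, rng: random.Random) -> float:
    """k-fold cross-validated misclassification rate of the stump."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        stump = fit_stump([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(stump, X[i]) != y[i] for i in test)
    return wrong / len(y)


def tree_association_test(X, y, k: int = 5, n_perm: int = 199, seed: int = 0):
    """Return (observed CV error, permutation p-value); low error suggests association."""
    rng = random.Random(seed)
    observed = cv_error(X, y, k, rng)
    permuted = list(y)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # breaks any genotype-phenotype link
        if cv_error(X, permuted, k, rng) <= observed:
            at_least_as_good += 1
    return observed, (1 + at_least_as_good) / (n_perm + 1)


# Toy data: SNP 0 fully determines status (carriers of >= 1 minor allele are cases).
X = [[i % 3, (i // 3) % 3] for i in range(40)]
y = [1 if row[0] >= 1 else 0 for row in X]
error, p_value = tree_association_test(X, y)
print(error, p_value)
```

Using prediction error as the statistic means the same fitted rule that drives the test can be inspected afterwards, which is the dual use the approach emphasizes.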
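The resampling approach to family-wise error control is not detailed in this summary. One widely used permutation method is the Westfall-Young single-step maxT procedure. It permutes case-control labels, records the maximum statistic over all markers in each permutation, and compares each observed statistic with that null distribution of maxima. The sketch assumes an Armitage-type trend statistic in its N * r^2 form for every column. The columns can hold SNP genotype codes or hypothetical pre-estimated haplotype counts, so the maximum spans both SNP- and haplotype-based tests in the spirit of the omnibus test. It is not the authors' procedure.

```python
import random
from typing import List, Sequence


def trend_statistic(x: Sequence[float], y: Sequence[int]) -> float:
    """Armitage-type trend statistic N * r^2 (r = Pearson correlation of code and status)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or single-class sample: no evidence
    return n * sxy * sxy / (sxx * syy)


def maxt_adjusted_pvalues(
    columns: Sequence[Sequence[float]], y: Sequence[int], n_perm: int = 999, seed: int = 0
) -> List[float]:
    """Single-step maxT adjusted p-values obtained by permuting case-control labels."""
    rng = random.Random(seed)
    observed = [trend_statistic(col, y) for col in columns]
    exceed = [0] * len(columns)
    permuted = list(y)
    for _ in range(n_perm):
        rng.shuffle(permuted)
        # The maximum over all markers preserves their correlation (e.g. LD) under H0.
        max_null = max(trend_statistic(col, permuted) for col in columns)
        for j, t_obs in enumerate(observed):
            if max_null >= t_obs:
                exceed[j] += 1
    return [(1 + e) / (n_perm + 1) for e in exceed]


# Toy data: 30 cases then 30 controls; one associated column and one balanced column.
y = [1] * 30 + [0] * 30
strong = [2] * 25 + [1] * 5 + [0] * 25 + [1] * 5
balanced = [i % 3 for i in range(60)]
print(maxt_adjusted_pvalues([strong, balanced], y, n_perm=199))
```

Taking the maximum inside each permutation is what yields family-wise (rather than per-test) control, and it adapts automatically to correlation between SNP and haplotype statistics.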
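The two stages of the false-discovery-rate procedure are not described in this summary. As a hedged reference point only, the sketch shows the standard Benjamini-Hochberg step-up rule, a common building block for FDR control. It finds the largest rank k with p_(k) <= k * q / m and rejects the k smallest p-values; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; returns a reject flag per hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the largest rank that meets its threshold, even after failures.
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(benjamini_hochberg([0.001, 0.04, 0.045, 0.049]))
```

The second call shows the step-up behavior: 0.04 misses its own threshold at rank 2, yet it is rejected because a larger rank (0.049 at rank 4) meets its threshold.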