Investigations have been conducted using data from current genome-wide association studies to assess the genetic architecture of cancer and the likely yield of future genome-wide association studies. One project explored the distribution of allele frequencies and effect sizes, and their interrelationship, for common susceptibility SNPs using discoveries from existing genome-wide association studies. It used novel methods to correct for bias, because variants with larger effect sizes are currently over-represented among discoveries owing to their greater statistical power for detection. The analysis identified several intriguing patterns that may have implications for the design and analysis of future genetic association studies. A second project explored the potential utility of future discoveries from larger genome-wide association studies for building risk-prediction models that could be used to target high-risk groups for cancer screening. It found that although many discoveries are expected from future genome-wide association studies, risk-prediction models based only on discovered SNPs are unlikely to identify a small portion of the population that would give rise to the large majority of future cases. Several projects involved the development of statistical methods for exploring gene-gene and gene-environment interactions using data from genome-wide association studies. A new method was developed for modeling the interaction of an environmental exposure with multiple SNPs within a genomic region using a Bayesian latent variable modeling approach. Another method exploited an assumption of gene-environment independence in the underlying population to improve the power of the test for gene-environment interaction on the absolute risk of a disease from case-control studies. Another report used simulation studies to investigate the power of various alternative methods for conducting genome-wide interaction scans.
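The bias toward large effects arises because the chance that a SNP is discovered depends on its allele frequency and effect size. The sketch below illustrates this idea and one generic correction (inverse-power, Horvitz-Thompson style weighting). It is a minimal sketch, assuming a normal approximation for the per-allele log odds ratio estimate and a genome-wide threshold of alpha = 5e-8. The function names are hypothetical, and this is not presented as the project's actual method. For simplicity, it plugs in the observed odds ratios, which are themselves inflated by the same selection effect.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def discovery_power(maf, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided 1-df allelic test for one SNP.

    Assumes Var(log OR estimate) ~ (1/n_cases + 1/n_controls) / (2 * maf * (1 - maf)),
    a rough approximation that ignores the allele-frequency shift in cases.
    """
    se = math.sqrt((1 / n_cases + 1 / n_controls) / (2 * maf * (1 - maf)))
    ncp = abs(math.log(odds_ratio)) / se          # non-centrality of the z statistic
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)   # genome-wide critical value
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def estimated_number_of_loci(discovered, n_cases, n_controls, alpha=5e-8):
    """Inverse-power count: each discovered SNP, given as (maf, odds_ratio),
    stands in for 1 / power undiscovered-or-discovered loci like it."""
    return sum(1 / discovery_power(maf, or_, n_cases, n_controls, alpha)
               for maf, or_ in discovered)
```

With 10,000 cases and 10,000 controls, a SNP with allele frequency 0.3 and odds ratio 1.10 has low power at alpha = 5e-8. Each such discovery therefore receives a large weight, while a SNP with a large effect receives a weight close to 1. This weighting is what rebalances the over-representation of large effects.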
General statistical methods
Several studies have been conducted to evaluate efficient design and analysis strategies for epidemiologic studies that use complex sampling designs. One study focuses on the efficient use of specimen repositories for evaluating new diagnostic tests and comparing them with existing tests. Typically, all pre-existing diagnostic tests will already have been performed on all specimens. It was proposed that retesting only a judicious subsample of the specimens with the new diagnostic test could minimize study costs and specimen consumption while retaining adequate statistical efficiency for estimates of agreement or diagnostic accuracy. Another project explored efficient analysis methods for case-cohort designs. These designs select a random sample of a cohort (the subcohort) to serve as controls, together with the cases arising during follow-up of the cohort. Analyses of case-cohort studies with time-varying exposures that use Cox partial likelihood methods can be computationally intensive. A new, computationally simple method has been developed using a piecewise-exponential approach. In this approach, Poisson regression model parameters are estimated from a pseudo-likelihood, and the corresponding variances are derived by applying the Taylor linearization methods used in survey research. Several studies have involved the development of regression models in settings that potentially involve a large number of predictor variables. A Bayesian variable selection method has been developed for settings where the number of independent variables (predictors) in a dataset is much larger than the available sample size. Most existing methods allow some degree of correlation among predictors but do not use these correlations for variable selection. The proposed method, by contrast, accounts for correlations among the predictors in variable selection.
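For the case-cohort analysis described above, the piecewise-exponential idea can be sketched as follows. Follow-up is split into intervals with a constant baseline hazard in each, and a Poisson model with a log person-time offset is then fitted to a weighted pseudo-likelihood. This is a minimal sketch under stated assumptions: the interval cut points, the weighting scheme (weight 1 for cases, 1 / sampling fraction for non-case subcohort members), a single time-fixed covariate, and all function names are illustrative choices, not the project's implementation. The Taylor-linearization variance step is not shown.

```python
import numpy as np


def split_follow_up(time, event, cuts):
    """Expand each subject's follow-up into one row per interval (cuts[k], cuts[k+1]].

    Returns rows (subject index, interval index, person-time, event indicator).
    Assumes cuts[0] == 0 and cuts[-1] >= max(time).
    """
    rows = []
    for i, (t, d) in enumerate(zip(time, event)):
        for k in range(len(cuts) - 1):
            lo, hi = cuts[k], cuts[k + 1]
            if t <= lo:
                break  # follow-up ended before this interval
            # The event is credited to the interval that contains the exit time.
            rows.append((i, k, min(t, hi) - lo, int(d == 1 and t <= hi)))
    return np.array(rows, dtype=float)


def fit_weighted_poisson(X, y, offset, w, n_iter=100, tol=1e-10):
    """Maximize sum_i w_i * (y_i * eta_i - exp(eta_i)), eta = X @ beta + offset,
    by Newton-Raphson with step halving."""

    def loglik(b):
        eta = X @ b + offset
        return np.sum(w * (y * eta - np.exp(eta)))

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(info, score)
        scale = 1.0
        while loglik(beta + scale * step) < loglik(beta) and scale > 1e-8:
            scale /= 2
        beta = beta + scale * step
        if np.max(np.abs(scale * step)) < tol:
            break
    return beta


def case_cohort_log_hr(time, event, x, in_subcohort, sampling_fraction, cuts):
    """Point estimate of the log hazard ratio for covariate x (no variance).

    Cases get weight 1; non-cases in the subcohort get 1 / sampling_fraction;
    non-cases outside the subcohort are not sampled and are dropped.
    """
    time, event, x = (np.asarray(a, dtype=float) for a in (time, event, x))
    in_sub = np.asarray(in_subcohort, dtype=bool)
    idx = np.flatnonzero((event == 1) | in_sub)
    w_subj = np.where(event == 1, 1.0, 1.0 / sampling_fraction)
    rows = split_follow_up(time[idx], event[idx], cuts)
    subj = idx[rows[:, 0].astype(int)]  # map rows back to original subject ids
    k = rows[:, 1].astype(int)
    n_int = len(cuts) - 1
    # One log-baseline-hazard column per interval, plus the covariate.
    X = np.column_stack([np.eye(n_int)[k], x[subj]])
    beta = fit_weighted_poisson(X, rows[:, 3], np.log(rows[:, 2]), w_subj[subj])
    return beta[-1]
```

With `sampling_fraction=1.0` and every subject in the subcohort, this reduces to an ordinary full-cohort piecewise-exponential fit. This makes a convenient check that the case-cohort weighting changes only the sampling, not the model.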
The Bayesian variable selection method can be applied to continuous, binary, ordinal, and count outcome data. Another method combines several predictors (markers) that are measured repeatedly over time into a composite marker score without assuming a regression model, requiring only a mild condition on the predictor distribution. The method assumes that the first and second moments of the predictors can be decomposed into a time component and a marker component via a Kronecker product structure that accommodates the longitudinal nature of the predictors. Under this assumption, it uses first-moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the predictors on the outcome. These linear transformations can then be combined into a score that has better predictive performance than a score built under a general model that ignores the longitudinal structure of the data. The methods can be applied to either continuous or categorical outcome measures. Several studies have developed methodology for models that predict the absolute risk of disease, along with applications of such models. One study developed two criteria to assess the usefulness of models that predict the risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of the population at highest risk. The second criterion, the proportion needed to follow-up, PNF(p), is the proportion of the general population at highest risk that one needs to follow so that a proportion p of those destined to become cases will be followed. New methods of inference were developed to compare the PCFs and PNFs of two risk models evaluated on the same validation data.
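The two criteria can be made concrete with a minimal empirical sketch on validation data with observed outcomes. Subjects are ranked by predicted risk. PCF(q) counts the share of cases in the top 100q%, and PNF(p) finds how far down the ranking one must go to capture 100p% of the cases. The function names, the rounding of q·n to a whole subject, and the tie-breaking rule are illustrative assumptions. The influence-function inference used to compare two models is not shown.

```python
import math


def _by_decreasing_risk(risk):
    # Indices sorted from highest to lowest predicted risk;
    # sorted() is stable, so ties keep their original order.
    return sorted(range(len(risk)), key=lambda i: -risk[i])


def pcf(risk, is_case, q):
    """Empirical PCF(q): fraction of all cases found among the top 100q%
    of subjects ranked by predicted risk (q * n rounded to a whole subject)."""
    n_top = round(q * len(risk))
    top = _by_decreasing_risk(risk)[:n_top]
    return sum(is_case[i] for i in top) / sum(is_case)


def pnf(risk, is_case, p):
    """Empirical PNF(p): smallest fraction of subjects, followed in order of
    decreasing risk, that includes at least 100p% of all cases."""
    needed = math.ceil(round(p * sum(is_case), 9))  # round() guards float noise
    if needed == 0:
        return 0.0
    found = 0
    for n_followed, i in enumerate(_by_decreasing_risk(risk), start=1):
        found += is_case[i]
        if found >= needed:
            return n_followed / len(risk)
    return 1.0
```

For example, with predicted risks `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]` and outcomes `[1, 1, 0, 1, 0, 0, 1, 0, 0, 0]`, following the top 20% captures half of the cases (PCF(0.2) = 0.5). Capturing all cases requires following 70% of the population (PNF(1.0) = 0.7). Computing both functions for two models on the same data gives the quantities the comparison targets.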
A second project developed a linear-expit regression model (LEXPIT) that incorporates both linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT generalizes both the binomial linear and the logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, while the exponentiated coefficients of the nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. The method was applied to estimate the absolute five-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus (HPV) test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to an abnormal Pap test in HPV-negative women that was not detected with logistic regression. The R package blm was developed to provide free, easy-to-use software for fitting the LEXPIT model.
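The LEXPIT risk model has the form P(Y = 1 | x, z) = x'β + expit(z'γ). Here β holds additive risk differences and exp(γ) holds residual odds ratios. The following minimal Python sketch fits this form by direct maximum likelihood with SciPy. It is not the blm package's algorithm, whose fitting and constraint handling may differ. Clipping probabilities into (0, 1), the starting values, the assumption that the first column of Z is an intercept, and the function names are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def lexpit_negloglik(theta, X, Z, y):
    """Bernoulli negative log-likelihood and gradient for p = X @ beta + expit(Z @ gamma)."""
    p_lin = X.shape[1]
    beta, gamma = theta[:p_lin], theta[p_lin:]
    s = expit(Z @ gamma)
    # Clip so that the log-likelihood stays finite if a trial step leaves (0, 1).
    p = np.clip(X @ beta + s, 1e-10, 1 - 1e-10)
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    g = y / p - (1 - y) / (1 - p)  # d loglik / d p for each subject
    grad = -np.concatenate([X.T @ g, Z.T @ (g * s * (1 - s))])
    return nll, grad


def fit_lexpit(X, Z, y):
    """Return (beta, gamma): risk-difference and log-odds-ratio coefficients."""
    X, Z, y = (np.asarray(a, dtype=float) for a in (X, Z, y))
    theta0 = np.zeros(X.shape[1] + Z.shape[1])
    # Start the (assumed) intercept in Z at the logit of the overall event rate.
    theta0[X.shape[1]] = np.log(y.mean() / (1 - y.mean()))
    res = minimize(lexpit_negloglik, theta0, args=(X, Z, y),
                   jac=True, method="L-BFGS-B")
    return res.x[:X.shape[1]], res.x[X.shape[1]:]
```

In this parameterization, the linear part has no intercept of its own; the baseline risk lives in the expit part. A fitted β of 0.1 would therefore read as an adjusted absolute risk increase of 10 percentage points.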