This project supports research on statistical methods for conducting epidemiologic and surveillance analyses of cancer and other diseases using national health surveys. We developed innovative statistical methods and software for displaying scatter plots and for estimating kernel density smoothers to obtain conditional mean and percentile plots from weighted cluster samples. These methods have been useful in examining residual plots from multiple linear, logistic, and Cox regressions to identify data points that might be disproportionately influencing the results of the analysis. We developed methods for constructing confidence intervals for rare binary outcomes observed in a survey. Because binomial theory breaks down when the data are weighted and correlated within sampled clusters, our approach modifies the methods used to obtain exact binomial confidence limits. These methods involve determining an effective sample size that accounts for the complex sample design's inflation of the variance and using that effective sample size in the exact confidence limit formulas. A problem arising in logistic regression analysis of risk factors for disease is determining how well the estimated logistic model fits the data. We have developed a method to test the goodness-of-fit of a logistic regression model with survey data. In this approach, the distribution of a Wald test that compares the observed and expected counts within deciles of risk is simulated under the null hypothesis. This approach is particularly promising for logistic models with small numbers of outcomes, where the asymptotic distribution of the Wald test is not accurate. We are extending this simulation approach to tests of regression coefficients from logistic regression when the outcomes in covariate cells are sparse. The simulation approach is being compared with score tests under these same sparse data conditions.
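The effective-sample-size idea for rare binary outcomes described above can be illustrated with a minimal Python sketch. It is not the project's software. It assumes the design effect (`deff`, the ratio of the design-based variance to the simple-random-sampling variance) is already known, divides the nominal sample size by it, and plugs the resulting, possibly non-integer, effective sample size into the beta-distribution form of the exact binomial limits. The function names are hypothetical, and any further refinements of the published method are omitted.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def beta_quantile(q, a, b):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if beta_cdf(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effective_n_exact_ci(p_hat, n, deff, alpha=0.05):
    """Exact-style limits for a weighted proportion using n_eff = n / deff.

    p_hat: weighted estimate of the proportion
    n:     nominal sample size
    deff:  design effect, assumed to be supplied by the analyst
    """
    n_eff = n / deff
    x = p_hat * n_eff  # effective number of events, need not be an integer
    lower = 0.0 if x <= 0.0 else beta_quantile(alpha / 2, x, n_eff - x + 1.0)
    upper = 1.0 if x >= n_eff else beta_quantile(1 - alpha / 2, x + 1.0, n_eff - x)
    return lower, upper
```

For example, `effective_n_exact_ci(0.02, 1000, 2.0)` treats the 1,000 respondents as roughly 500 independent observations, so the interval is wider than the one obtained with `deff=1.0`, which reduces to the ordinary exact binomial interval.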
When we use regression analysis such as multiple linear, logistic, or Cox regression, it is useful to estimate, for each level of a risk factor, the average predicted response that the regression would give if everyone in the population had been exposed to that level of risk. This is called a predictive margin. We have developed variance estimates for predictive margins when the sample data are from a survey. This methodology has been used to analyze the relationship of cancer screening to type of health insurance. We wrote a graduate-level textbook for instructing students and researchers in public health and epidemiology on how to analyze national health survey data. Additional research is underway on the use of survey methods for analyzing two-stage case-control studies. We have been examining the application of jackknife replication methods, which are used in survey research for estimating variances, to the problem of estimating the variance of logistic regression coefficients from two-stage case-control studies. Another area of research interest is methods for making inferences about superpopulation parameters. We developed adjustments to classical finite population variance estimators that can provide accurate variances for superpopulation means. We are extending these variance estimators to ratio and regression parameters and applying them to the National Health Interview Survey, the National Hospital Discharge Survey, and the Third National Health and Nutrition Examination Survey. We are also developing methods that use latent class theory to analyze dietary survey data. These methods are being used to estimate the proportion of individuals who meet NCI guidelines for consuming fruits and vegetables.
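The predictive margin defined at the start of this paragraph can be sketched in a few lines of Python. The sketch assumes a logistic model whose coefficients have already been estimated, and it computes only the point estimate: each respondent's exposure is set to the level of interest, the predicted probabilities are recomputed, and they are averaged using the survey weights. The function name, the variable names (`insured`, `age`), and the coefficient values are hypothetical, and the survey-based variance estimation that the project developed is not shown.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def predictive_margins(rows, weights, coef, exposure, levels):
    """Weighted average predicted probability for each exposure level.

    rows:     list of dicts mapping covariate name -> value, one per respondent
    weights:  survey weights, one per respondent
    coef:     dict of fitted logistic coefficients, including "intercept"
    exposure: name of the risk factor to be set to each level in turn
    levels:   exposure values at which to evaluate the margin
    """
    total_weight = sum(weights)
    margins = {}
    for level in levels:
        acc = 0.0
        for row, w in zip(rows, weights):
            # Counterfactual copy of the respondent with the exposure fixed.
            fixed = dict(row)
            fixed[exposure] = level
            eta = coef["intercept"] + sum(
                b * fixed[name] for name, b in coef.items() if name != "intercept"
            )
            acc += w * expit(eta)
        margins[level] = acc / total_weight
    return margins


# Hypothetical toy data: two respondents with different weights.
rows = [{"insured": 1, "age": 0.0}, {"insured": 0, "age": 1.0}]
weights = [3.0, 1.0]
coef = {"intercept": -1.0, "insured": 0.8, "age": 0.5}
margins = predictive_margins(rows, weights, coef, "insured", [0, 1])
```

Because every respondent contributes to every level, the difference between `margins[1]` and `margins[0]` compares the two exposure levels over the same weighted covariate distribution, rather than over the differing covariate mixes of the observed exposure groups.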