This project supports research on statistical methods for conducting cancer and other disease epidemiologic and surveillance analyses of data from national health surveys. We have developed statistical methods and software for a wide range of analyses of data from weighted cluster samples drawn with complex survey designs. These include displaying scatter plots of weighted data, estimating kernel density smoothers to obtain conditional mean and percentile plots, estimating directly adjusted estimates (predictive margins) from linear and nonlinear models, and estimating population variance components.

A problem arising in logistic regression analysis of risk factors for disease is determining how well the estimated logistic model fits the data. We have developed a method to test the goodness of fit of a logistic regression model with survey data. In this approach, the distribution under the null hypothesis of a Wald test that compares observed and expected counts across deciles of risk is obtained by simulation. The approach is particularly promising for logistic models with small numbers of outcomes, where the asymptotic distribution of the Wald test is not accurate. We are extending this simulation approach to testing regression coefficients in logistic regression when the numbers of outcomes in covariate cells are sparse, and we are comparing it with score tests under the same sparse-data conditions.

In regression analyses such as multiple linear, logistic, or Cox regression, it is useful to estimate the average predicted response for each level of a risk factor as if everyone in the population had been exposed to that level. This quantity is called a predictive margin. We have developed variance estimates for predictive margins when the sample data come from a survey. We are also developing methods for making inferences about superpopulation parameters.
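To make the predictive margin concrete, here is a minimal sketch of its point estimate for a logistic model, assuming the coefficients have already been estimated with sampling weights and that the risk factor enters as categorical levels with log-odds effects relative to a reference level. The function names, coefficient values, and data are hypothetical, and the survey variance estimate for the margin, which is the subject of the work described above, is not shown.

```python
import math
from typing import Sequence


def expit(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predictive_margin(
    intercept: float,
    level_effects: Sequence[float],
    covariate_coefs: Sequence[float],
    covariates: Sequence[Sequence[float]],
    weights: Sequence[float],
    level: int,
) -> float:
    """Weighted average predicted probability if every sampled person were
    assigned risk-factor `level` while keeping their own covariate values.

    level_effects[level] is the log-odds effect of that level relative to
    the reference level (whose effect is 0). All inputs are hypothetical.
    """
    total_w = 0.0
    total_wp = 0.0
    for x_i, w_i in zip(covariates, weights):
        # Linear predictor with the exposure forced to `level` for everyone.
        eta = intercept + level_effects[level] + sum(
            b * x for b, x in zip(covariate_coefs, x_i)
        )
        total_wp += w_i * expit(eta)
        total_w += w_i
    return total_wp / total_w


if __name__ == "__main__":
    # Hypothetical fitted coefficients and a tiny weighted sample.
    b0 = -1.0
    levels = [0.0, 0.7]      # reference level, exposed level
    coefs = [0.5]            # one covariate, e.g. standardized age
    X = [[-1.0], [0.0], [1.0], [2.0]]
    w = [120.0, 80.0, 60.0, 40.0]
    for lv in range(len(levels)):
        print(lv, round(predictive_margin(b0, levels, coefs, X, w, lv), 4))
```

Averaging predictions over the sample's own covariate distribution, rather than plugging in mean covariate values, is what distinguishes a predictive margin from a prediction at the covariate means in nonlinear models such as the logistic.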
We have developed adjustments to classical finite population variance estimators that provide accurate variances for estimators of superpopulation means. We have extended these variance estimators to ratio and regression parameters and are applying them to the National Health Interview Survey, the National Hospital Discharge Survey, and the Third National Health and Nutrition Examination Survey.

We are researching methods for using latent class theory to analyze dietary survey data. We have developed jackknife methods for estimating standard errors of estimators of latent class parameters and have investigated Wald procedures for testing hypotheses about these parameters. These methods have been applied successfully to dietary intake data from the USDA Continuing Survey of Food Intakes by Individuals to estimate the proportion of individuals who meet NCI guidelines for consuming fruits and vegetables.

We are developing design-based consistent estimators of population variance components that improve on existing inconsistent estimators. Simulation studies are under way to investigate the small-sample properties of these design-based estimators.

For cluster samples, the distributions of robust Wald test statistics that use sandwich variance estimators are not well approximated by their asymptotic distributions when the number of clusters is small. We have developed practical modifications to the robust Wald tests that yield more accurate type I error rates than the usual robust Wald tests.
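The jackknife standard errors and ratio parameters mentioned above can be illustrated generically with a delete-one-cluster (JK1) jackknife for a weighted ratio estimate from a single-stratum cluster sample. This is a minimal sketch under stated assumptions: it is not the latent class estimator itself, the record layout and function names are hypothetical, and the variance is centered at the mean of the replicate estimates (centering at the full-sample estimate is a common alternative).

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical record layout: (cluster id, sampling weight, y, x).
Record = Tuple[int, float, float, float]


def weighted_ratio(records: Sequence[Record], factor: Dict[int, float]) -> float:
    """Ratio sum(w*y) / sum(w*x), with each cluster's weights scaled by factor."""
    num = sum(factor[c] * w * y for c, w, y, _ in records)
    den = sum(factor[c] * w * x for c, w, _, x in records)
    return num / den


def jk1_standard_error(
    records: Sequence[Record],
    estimator: Callable[[Sequence[Record], Dict[int, float]], float],
) -> Tuple[float, float]:
    """Return (full-sample estimate, JK1 jackknife standard error)."""
    clusters = sorted({c for c, *_ in records})
    n = len(clusters)
    if n < 2:
        raise ValueError("need at least two clusters")
    full = estimator(records, {c: 1.0 for c in clusters})
    replicates = []
    for dropped in clusters:
        # Drop one cluster; inflate the remaining clusters' weights by n/(n-1).
        factor = {c: (0.0 if c == dropped else n / (n - 1)) for c in clusters}
        replicates.append(estimator(records, factor))
    mean_rep = sum(replicates) / n
    var = (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
    return full, math.sqrt(var)


if __name__ == "__main__":
    # Tiny hypothetical sample: three clusters, two people each.
    data = [
        (1, 1.0, 2.0, 1.0), (1, 1.5, 3.0, 2.0),
        (2, 2.0, 5.0, 2.0), (2, 1.0, 1.0, 1.0),
        (3, 1.0, 4.0, 3.0), (3, 0.5, 2.0, 1.0),
    ]
    est, se = jk1_standard_error(data, weighted_ratio)
    print(f"ratio = {est:.4f}, jackknife SE = {se:.4f}")
```

Passing the estimator as a function keeps the replication loop independent of the statistic, so the same loop could wrap any estimator that accepts cluster-level weight factors. In a stratified design the analogous JKn jackknife drops one cluster at a time within each stratum and uses stratum-specific multipliers; that extension is not shown.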