Project Summary: Statistical methods for analysis of longitudinal, clustered, and time-to-event data and for making causal inference from observational data are central to health sciences research in cancer, HIV, cardiovascular disease (CVD), and a host of other areas. The objective of the projects in this application is to develop new procedures to address existing and emerging challenges in these contexts, motivated by issues arising in the Principal Investigator's collaborations. Linear and generalized linear mixed effects models are popular among practitioners for analysis of longitudinal and other clustered data, but there is little work on variable selection in this context. In the first project, we propose a unified, practically accessible framework that simultaneously addresses parameter estimation and variable selection. However, there may be settings where such parametric mixed models are not adequate to represent the outcome-time/covariate relationship. Semi- and nonparametric mixed effects models are popular, but, again, there is little in the literature on variable selection. We will also develop unified practical procedures for these important classes of models. Mixed effects and measurement error models often invoke parametric assumptions on latent random quantities such as random effects and true, error-prone covariates; normality is a standard such assumption. In the second project, we propose new, accessible, practical methods for evaluating and handling departures from such assumptions. Inference on causal treatment effects from observational data is a fundamental goal of epidemiologic and outcomes research, but, despite the critical importance of variable selection for regression models in this context, there is a paucity of work in the literature on formal strategies for such variable selection.
In the third project, we will develop and study systematically a formal strategy based on methods particularly well-suited to this objective, culminating in concrete guidance for practitioners. Methods for analysis of censored survival data are traditionally non- or semiparametric. We have demonstrated in the previous project period that, under mild "smoothness" assumptions, computationally convenient methods handling arbitrary censoring patterns are possible. In the fourth project, we will adapt this approach to further challenges with an eye toward a unified, accessible framework. Relevance: The research proposed in this application will provide public health researchers with new tools to learn about relationships among subject characteristics, such as physiologic, demographic, and genetic attributes, and disease outcomes, and to determine those with the strongest associations. New methods to be developed will also help researchers learn about the effects of treatments from data that do not come from randomized clinical trials.