Longitudinal data methods have matured over the past 20 years. It has become increasingly common for epidemiologists to fit their data with longitudinal models such as the GEE regression model or the mixed effects longitudinal model. The result has been a more efficient use of precious longitudinal data. In this proposal, we seek to extend and evaluate the use of cutting-edge longitudinal methods on real datasets at our disposal in the areas of cardiovascular disease and cancer. One issue, introduced by Jamie Robins, is the difficulty of attributing causal effects to time-varying exposures in observational studies. For example, hypertension and diabetes are both intermediate variables and confounders in studying the relationship between obesity and cardiovascular disease (CVD). We propose to apply and extend the techniques of Rosenbaum (1995) and Robins (1989) to assess this relationship using data from the Nurses' Health Study. Another important issue that appears regularly in the statistical literature but infrequently in epidemiological practice is the proper handling of missing data. Several commonly used methods, such as the complete case method and the missing indicator method, are easy to implement but have conceptual problems in some instances. Imputation methods are more prevalent in the statistical literature but are difficult to implement and seldom used in practice. We propose to apply these methods to data from both the Nurses' Health Study and the Trial of Hypertension Prevention II (TOHP 2), a clinical trial enrolling subjects with high normal DBP (80-89), where missing data are often informative and indicative of subjects entering the hypertensive range (DBP=90+). Another topic in longitudinal data is the fitting of growth curves (e.g., of body mass index by age). One issue is that the variance of BMI increases with age, which makes it difficult to fit such data with standard linear models. Instead, since interest is often in extreme percentiles (e.g., the
90th percentile by age), one wishes to fit a quantile regression of the 90th percentile of BMI by age. In this proposal, we consider several methods for fitting these regressions. An important dimension in fitting longitudinal models is that of goodness of fit. For logistic regression, this is often measured by the percent of correct classification of diseased and non-diseased subjects using an SAS-based ROC curve derived from the logistic regression. We propose to extend this methodology to longitudinal data, where one uses GEE regression to handle the correlated data on the same person at different time points. Finally, for incidence data, Cox regression is the standard methodology. However, the assumption of proportional hazards is violated when applying this technique to breast cancer incidence data, since some risk factors (e.g., age, parity) have different relative risks at different ages. We have developed the log incidence model to handle non-proportional hazards in breast cancer incidence data (Rosner, 1996). In this application, we propose to extend this methodology to model the incidence of benign breast disease, an important risk factor for breast cancer, among participants in the Nurses' Health Study, who experienced about 20,000 incident events from 1976 to 1994.