Continuing technological advancements allow researchers and clinicians to measure an increasingly diverse array of physiological, molecular and genetic markers, rapidly increasing our understanding of disease processes. The wide range of newly available markers holds great potential for the personalization of medical care through accurate prediction of clinical outcomes. Traditional statistical methods for using a patient's marker values to make personalized predictions are derived under the strong assumption that the true model relating markers to clinical responses can be identified, at least with large enough samples. In practice, however, it is difficult if not impossible even to identify a class of models containing the truth. To make meaningful predictions for individual patients, it is therefore important to modify the standard methods and develop new statistical procedures for model estimation and evaluation when the model may not be correctly specified. When possible we will modify standard methods, but we will also address some fundamental issues in statistical inference that require the development of new procedures. Specifically, we seek to develop procedures for predicting future observations and for evaluating and comparing prediction rules. The key contribution of the proposed procedures will be the production of valid inferences even when the fitted models are incorrect. We will focus on the following three aims. In Aim 1, we will develop robust procedures for evaluating and comparing the accuracy of prediction rules constructed under various working models for continuous, binary and censored event time outcomes. In Aim 2, we will develop procedures that generate optimal robust prediction intervals for future observations without assuming that the model is correct. Implementing such inference procedures often requires approximating the sampling distribution of estimated model parameters and accuracy measures.
This can be rather challenging in settings where the model is not assumed to contain the truth. In Aim 3, we will develop numerically efficient resampling methods to facilitate inference under possibly incorrect working models. The proposed methodological research will be guided by a wide variety of real datasets from the Multi-Ethnic Study of Atherosclerosis and cancer clinical trials sponsored by the Eastern Cooperative Oncology Group, to which we have access. Our aims will require, in most cases, the development of large-sample distribution theory, simulation studies of small-sample behavior and applications to real data. The developed methods will use existing statistical software packages whenever possible and will be fully implemented otherwise.
PUBLIC HEALTH RELEVANCE: In medical research, it is often of interest to explore the effect of various factors, such as patient characteristics or environmental exposures, on clinical outcomes. For example, an important step in discovering new diagnostic biomarkers is to quantify the ability of the biomarkers to predict disease risk. It is therefore important to develop statistical procedures for constructing and evaluating empirical prediction models for clinical outcomes. The project will systematically study the consequences of model misspecification and how to make clinical decisions based on empirical statistical models. The proposed methodological research will be guided by a wide variety of real problems of clinical interest.