A reliable and precise prognosis is fundamental to successful disease management and treatment selection. More aggressive intervention can be given to patients who are at high risk of early disease onset, while patients who are unlikely to respond to one treatment should be considered for alternative options. With the rapid advancement of technology, a wide range of biological and genomic markers have emerged as potential tools for improving the prediction of disease and treatment outcomes, and these markers may lead to personalized, tailored medicine. New technologies such as DNA sequencing and microarrays are generating detailed data with exponentially increasing dimensionality and complexity. These data present unprecedented opportunities and great challenges for accurately predicting clinical outcomes. To take full advantage of such data, this proposal aims to develop statistical approaches to efficiently construct and evaluate prognostic tools for disease risk assessment and treatment selection. Specifically, in Aim 1, we will develop accurate risk prediction models by incorporating complex interactive effects via a kernel machine regression framework. We will also provide non-parametric procedures for assessing the predictive performance of the resulting models. In Aim 2, we will propose inference procedures for absolute risks and for the prediction performance of new markers using two-phase studies. In Aim 3, we will develop systematic procedures for identifying subgroups of patients who may or may not benefit from a new treatment using patient-level baseline marker information. In Aim 4, we will focus on high-dimensional regression and develop regularized resampling methods to construct confidence intervals and hypothesis testing procedures for regression coefficients and for the prediction performance of estimated models.
To increase the practical impact of our research, in addition to creating software for public use, we will apply the proposed procedures to predict individual risk of developing (i) rheumatoid arthritis among women using the Nurses' Health Study (NHS); (ii) cardiovascular disease (CVD) among diabetic patients using the NHS and the Health Professionals Follow-up Study; (iii) AIDS-defining events among HIV-infected patients using a large immunogenetic study; and (iv) coronary heart disease (CHD) or stroke using the Women's Health Initiative (WHI) study. We also plan to develop algorithms to identify cases of various autoimmune diseases using electronic medical record (EMR) data from two large hospitals in Boston. The identified cases will be used for subsequent genetic case-control studies of the corresponding diseases. Such algorithms will enable the direct use of EMR clinical data for discovery research. In addition, we will develop treatment selection strategies for HIV-infected patients using randomized ACTG clinical trials and for dietary intervention in preventing CVD using WHI clinical trials. Incorporating genetic profiles and modifiable risk factors, along with biologic markers, into risk models is likely to improve the prediction of clinical outcomes and ultimately lead to personalized medicine.
PUBLIC HEALTH RELEVANCE: This research proposal addresses the pressing need for advanced statistical tools that meet the challenges in developing prediction models for disease risk and treatment benefit. By providing statistical tools that enable clinical investigators to effectively develop personalized disease management strategies, this proposal will join prior and ongoing research activities toward the goal of efficient and cost-effective personalized medicine.