Biomarkers are considered central to the expected improvements in cancer prevention, detection, treatment and monitoring. They are potentially useful in many different types of studies and for many different purposes. The critical questions are whether they are valid to use, how they can be used in a valid and efficient way, and, once they are used, how confident one can be in the resulting conclusions. The use of biomarkers to advance understanding in cancer science has great potential, but also carries some risks. Biomarkers are subject to measurement uncertainty, they may not measure exactly the quantity of interest, and, because they are not direct measures of symptoms, their use to aid decision making or to evaluate therapies in a clinical setting is subject to uncertainty. Thus careful analysis of data from studies that involve biomarkers is crucial, and many statistical challenges arise in such studies. This application is concerned with developing, evaluating and applying statistical methods for data that involve biomarkers. The first aim is concerned with adding biomarkers to prediction models that may be used to stratify or classify patients. In this aim we develop approaches for integrating data from other sources to improve the prediction models. This research will have broad applicability. Innovative aspects involve the use of targeted ridge regression, multi-kernel machine modeling, and importance sampling to incorporate information from the literature. The second aim is concerned with clinical trials in which the biomarker is to be used as a surrogate endpoint to evaluate a therapy. Because of the nature of the scientific question, causal modeling is very natural in this context. We propose to develop both potential outcomes and structural causal models. We will investigate both single-trial and multi-trial settings with different endpoint types.
The third aim is concerned with therapies that may be effective only for a subgroup of patients; to be useful, this subgroup must be determined by a small number of predictive biomarkers. For data from randomized clinical trials we suggest a unified modeling approach, and will investigate the use of single index models with variable selection and multivariate partial least squares to aid in subgroup identification. Inference following subgroup identification is challenging, and we suggest an innovative scheme to simulate data under an appropriate null distribution. All three aims in this proposal address fundamental and significant problems in translational oncology research. Successful completion of the aims will have an impact both on understanding and utilizing biomarkers and on developing statistical methodology that is more broadly applicable to other fields.