The paradox of early cancer detection is that while some patients benefit, others receive a diagnosis that does them more harm than good. A prime example is mammography and ductal carcinoma in situ (DCIS), a noninvasive breast cancer. DCIS, which most frequently presents as a non-palpable lesion, was rarely detected before the advent of modern mammography. Since 1983 there has been a 290% increase in DCIS incidence in women under 50 and a 500% increase in those over 50. Given that only 5-10% of DCIS cases progress to invasive cancer, with a 10-year mortality rate of 1-2%, DCIS experts suggest breast conservation for the majority of patients. However, these women continue to be overtreated with mastectomy and radiation, at rates comparable to those for women with invasive cancer. The inability to distinguish women at low risk from those at high risk is due in part to non-reproducible study results and to inadequate statistical methods for risk prediction and validation. We have collected a population-based DCIS cohort with the goal of identifying the women least likely to recur with invasive cancer, who are therefore appropriate candidates for less aggressive treatments. Recently we established risk indices and published the corresponding absolute risk estimates by type of recurrence. However, two features of the study design, namely the presence of competing risks and the use of a stratified case-cohort design, constrained us to crude empirical methods of analysis and left us unable to validate the clinical utility of our models. The overarching goal of this proposal is to develop a unified, principled statistical framework for building, selecting, and evaluating clinically relevant risk indices, permitting refinement and validation of existing risk prediction models in our DCIS study and beyond.
We face multiple challenges, including how to objectively build risk indices from relevant variables; how to estimate the corresponding risks (competing or not) under various subsample study designs; and how to validate the resulting risk prediction models. Recently, we developed partDSA, a tree-based method that affords tremendous flexibility in building predictive models and provides an ideal foundation for developing a clinician-friendly tool for accurate stratification and risk prediction. In its current form, partDSA cannot estimate absolute risk in the presence of competing risks while accounting for subsample study designs. Here we extend partDSA to such clinically relevant scenarios (Aim 1). We also propose aggregate learning for risk prediction to increase prediction accuracy and, subsequently, to build more stable yet easily interpretable risk models (Aim 2). Finally, we propose the necessary methods for validating the resulting models (Aim 3). Our proposal has two immediate public health benefits: first, these novel statistical methods will result in a clinician-friendly, publicly available tool for accurate risk prediction, stratification, and validation in numerous clinical settings; second, current DCIS risk models will be refined and validated, with the expectation of better delineating those at low risk, who are strong candidates for conservative treatments including active surveillance.
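To make the estimation problem concrete, the sketch below shows one standard way to estimate absolute risk when competing risks are present and subjects were sampled with unequal probabilities: a weighted nonparametric cumulative incidence estimator of the Aalen-Johansen type. It is an illustration only and is not the partDSA extension proposed here. The function name, the event coding (0 for censored, positive integers for event types), and the use of inverse sampling probabilities as weights are all assumptions made for this example.

```python
from typing import Sequence


def weighted_cumulative_incidence(
    times: Sequence[float],
    events: Sequence[int],
    weights: Sequence[float],
    cause: int,
    horizon: float,
) -> float:
    """Estimate the absolute risk of `cause` by `horizon` under competing risks.

    Assumptions (hypothetical, for illustration):
      - events[i] == 0 means subject i was censored at times[i];
        any positive integer labels the type of event observed.
      - weights[i] is a positive sampling weight, e.g. the inverse of the
        probability that subject i was selected into the subsample.
    """
    if not (len(times) == len(events) == len(weights)):
        raise ValueError("times, events, and weights must have equal length")

    # Distinct times at which any event (of any cause) occurred up to the horizon.
    event_times = sorted(
        {t for t, e in zip(times, events) if e != 0 and t <= horizon}
    )

    event_free = 1.0  # weighted probability of no event of any cause so far
    risk = 0.0        # cumulative incidence of the cause of interest

    for t in event_times:
        # Weighted number of subjects still under observation just before t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events at t: all causes, and the cause of interest.
        d_all = sum(
            w for ti, e, w in zip(times, events, weights) if ti == t and e != 0
        )
        d_cause = sum(
            w for ti, e, w in zip(times, events, weights) if ti == t and e == cause
        )
        # Risk increment: being event-free until t, then failing from `cause` at t.
        risk += event_free * d_cause / at_risk
        # Any event, including a competing one, removes the subject from risk.
        event_free *= 1.0 - d_all / at_risk

    return risk
```

For example, calling `weighted_cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], [1, 1, 1, 1], cause=1, horizon=4)` treats the second subject's event as a competing risk and the third subject as censored. Treating competing events as events rather than as censoring is the design choice that keeps the cause-specific risks from summing to more than one.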